Feat/config optimization sources by aa1ex · Pull Request #222 · kaasops/vector-operator

aa1ex · 2026-06-12T13:36:30Z

Every kubernetes_logs source in Vector runs its own apiserver clients: 3 watch streams (pods, namespaces, nodes) and a separate pod metadata cache. The operator generates one source per pipeline, so an agent on a cluster with N pipelines holds 3×N watch connections and N copies of the node's pod metadata, and reconnects all of them every ~290s. On our cluster with ~2000 pipelines, Vector agents generate 90-95% of all kube-apiserver requests (55-60M per hour), and at ~500 sources an agent doesn't even start with the default resource limits.

This adds an opt-in operator flag, --enable-config-optimization. A single Vector CR can be opted out with the vector-operator.kaasops.io/config-optimization: disabled annotation, e.g. for a staged rollout. The CRD is not changed; the flag is expected to become the default and eventually be removed. With the flag off, the generated config doesn't change.

When enabled, sources that differ only in the watched namespace are collapsed into a single source with a kubernetes.io/metadata.name in (...) selector (groups are split at 1000 namespaces to keep the selector reasonable), and route transforms split the stream back per namespace: a flat route for up to 16 namespaces, md5-bucketed two-level routing above that. Inputs of pipeline transforms and sinks are rewired automatically. Sources with different settings are left alone. An event matching several pipelines still reaches all of them. Source names are derived from a hash of the group's settings and don't depend on the namespace list, so file checkpoints survive adding or removing pipelines.
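The collapsing and routing rules described above can be illustrated with a small Python sketch. This is not the operator's code (the operator is not written in Python, and its internals are not shown in this PR description); the function names, the dictionary layout, the `collapsed-` name prefix, the chunk-index suffix, the use of SHA-256 for the settings hash, and the bucket count of 16 are all assumptions made for illustration. Only the 1000-namespace split, the 16-namespace flat-route limit, the selector shape, and the use of md5 for bucketing come from the description.

```python
import hashlib
import json
from collections import defaultdict

CHUNK_SIZE = 1000      # max namespaces per collapsed source (from the description)
FLAT_ROUTE_LIMIT = 16  # flat route up to this many namespaces (from the description)
BUCKETS = 16           # assumption: the bucket count is not stated in the PR


def settings_hash(settings):
    """Short, stable hash of source settings; the namespace is not part of it."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]


def collapse(pipelines):
    """Group pipelines by source settings and emit one source per chunk."""
    groups = defaultdict(set)
    settings_by_hash = {}
    for pipeline in pipelines:
        key = settings_hash(pipeline["settings"])
        settings_by_hash[key] = pipeline["settings"]
        groups[key].add(pipeline["namespace"])

    sources = []
    for key in sorted(groups):
        namespaces = sorted(groups[key])
        for start in range(0, len(namespaces), CHUNK_SIZE):
            chunk = namespaces[start:start + CHUNK_SIZE]
            sources.append({
                # The name uses the settings hash and the chunk index only.
                "name": f"collapsed-{key}-{start // CHUNK_SIZE}",
                "settings": settings_by_hash[key],
                "selector": "kubernetes.io/metadata.name in (" + ",".join(chunk) + ")",
                "namespaces": chunk,
            })
    return sources


def bucket_of(namespace, buckets=BUCKETS):
    """Map a namespace to a bucket using the leading 32 bits of its md5."""
    digest = hashlib.md5(namespace.encode()).hexdigest()
    return int(digest[:8], 16) % buckets


def routing_plan(namespaces):
    """Flat route for small groups, two-level bucketed routing above the limit."""
    ordered = sorted(namespaces)
    if len(ordered) <= FLAT_ROUTE_LIMIT:
        return {"kind": "flat", "routes": ordered}
    by_bucket = defaultdict(list)
    for ns in ordered:
        by_bucket[bucket_of(ns)].append(ns)
    return {"kind": "bucketed", "buckets": dict(by_bucket)}
```

In this sketch, adding a pipeline with the same settings changes the selector but not the name of the first chunk, which is the property that lets file checkpoints survive. How the operator names sources when a group is split at 1000 namespaces is not described here, so the chunk-index suffix is only a placeholder.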

Numbers from a test bench (1000 pipelines, single-node kind, Vector 0.48, same workload):

| metric | before | after |
|---|---|---|
| agent watch requests to apiserver, per 10 min | 6014 | 6 |
| agent memory | 2802 MiB | 119 MiB |
| agent CPU / delivery throughput at nominal load | | no change |
| delivery integrity, 600k numbered events | | 0 lost, 0 duplicated |
| kube-apiserver memory (separate run on the same bench, ES sink) | 2218 MiB | 1403 MiB |
| apiserver 429 responses | 0.24/s steady | 0 |
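As a rough consistency check, the watch-request counts line up with the 3 streams per source and the ~290s reconnect interval mentioned in the description. The sketch below assumes that every stream reconnects exactly once per interval, that each reconnect counts as one watch request, and that the 1000 bench pipelines collapse into a single source; none of these is stated explicitly, and the function name is made up for illustration.

```python
def expected_watch_requests(sources, window_s, streams_per_source=3, reconnect_s=290.0):
    """Watch (re)connects in a window if every stream reconnects each interval."""
    return sources * streams_per_source * window_s / reconnect_s


before = expected_watch_requests(sources=1000, window_s=600)  # about 6207
after = expected_watch_requests(sources=1, window_s=600)      # about 6.2
print(f"before: {before:.0f}, after: {after:.1f}")
```

Under those assumptions the estimate is about 6207 requests per 10 minutes before and about 6.2 after, close to the measured 6014 and 6.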

Rollout note: enabling or disabling the optimization renames the sources, so Vector re-reads the log files retained on the nodes once (no losses, one-time duplicates). Checkpoint migration via an init container is planned as a follow-up PR. Docs: docs/config-optimization.md.

Closes the "Vector config optimization" roadmap item from the README.

sakateka · 2026-06-14T18:26:25Z

Hi! Could you please explain what will happen if I enable optimization and one of the sinks gets stuck and can’t send logs, while its retry setting is left at the default (retry indefinitely), and I have buffers configured with when_full: block at every level?

Vector explicitly describes this behavior: “A source only sends events as fast as the slowest sink that is configured to provide backpressure (buffer.when_full = block)” (see the concepts documentation).

Will all pipelines built by this optimization under a single source suffer because of one slow sink?
Is there any best practice for avoiding this?

aa1ex · 2026-06-15T09:21:59Z

Hi, @sakateka! Good question, and yes, that's the expected behavior.

With optimization enabled, pipelines that share identical source settings are collapsed onto a single kubernetes_logs source. Since Vector emits only as fast as the slowest sink applying backpressure (when_full: block), a stuck sink with indefinite retries will stall every pipeline sharing that source, not just its own. Pipelines in other groups and sources that aren't collapsed are unaffected.

This is an inherent trade-off of a shared source: the savings come from collapsing the per-pipeline watchers and readers, while in Vector, backpressure isolation comes only from a non-blocking buffer or bounded retries. So the cleanest fix is on the affected sink itself. Setting buffer.when_full: drop_newest (or a disk buffer with overflow) and/or a bounded request.retry_max_duration_secs stops the backpressure from propagating back to the shared source, keeps the optimization's savings, and only impacts the misbehaving sink.
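A minimal sketch of that sink-side change, written as a Python helper that patches a sink definition held as a plain dictionary. The sink name, input name, endpoint, buffer size, and the helper itself are hypothetical. The comment above names request.retry_max_duration_secs; this sketch bounds the retry count with request.retry_attempts instead, which is an assumption about which option fits, and the available request options differ between sinks and Vector versions, so the sink reference should be checked before relying on either.

```python
import copy
import json


def isolate_sink(sink, retry_attempts=5):
    """Return a copy of a sink config that sheds load instead of blocking."""
    patched = copy.deepcopy(sink)
    # Drop new events when the buffer is full rather than applying backpressure.
    patched.setdefault("buffer", {})["when_full"] = "drop_newest"
    # Assumption: bound retries by count; the value 5 is illustrative only.
    patched.setdefault("request", {})["retry_attempts"] = retry_attempts
    return patched


# Hypothetical sink that currently blocks the shared source when it stalls.
slow_sink = {
    "type": "elasticsearch",
    "inputs": ["team-a-route"],
    "endpoints": ["http://es.example:9200"],
    "buffer": {"type": "memory", "max_events": 500, "when_full": "block"},
}

print(json.dumps({"sinks": {"team_a_es": isolate_sink(slow_sink)}}, indent=2))
```

The trade-off is explicit: once the buffer fills, this sink loses events, but the other pipelines sharing the source keep flowing.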

If you instead need hard isolation for a specific pipeline (a known-unreliable destination where dropping isn't acceptable): the optimizer hasn't been released yet, so before the release we'll add a way to exclude such a pipeline from optimization so that it keeps its own dedicated source. We'll document this behavior and the recommendations as well.

Thanks for the detailed report!

aa1ex added 5 commits June 12, 2026 16:06

feat: collapse agent kubernetes_logs sources with identical settings (configOptimization)

3b734d9

feat(config-optimization): dedupe routes per namespace, chunk large groups, add docs

461fae8

fix(config-optimization): use mod() instead of % in bucketer VRL, add hierarchical routing e2e

bc76f8e

refactor: config optimization as controller flag with per-CR annotation opt-out

6dea6aa

fix(config-optimization): dedupe rewired inputs, guard group hash collisions

a5ab772

aa1ex assigned dkhachyan, zvlb and aa1ex Jun 12, 2026

aa1ex merged commit 5f863d5 into kaasops:main Jun 12, 2026
5 checks passed

aa1ex deleted the feat/config-optimization-sources branch June 12, 2026 14:20

aa1ex mentioned this pull request Jun 15, 2026

Feat/checkpoint migration #224

Merged
