[addon-operator] add queue head info metric and critical flag to module info#771

Draft
diyliv wants to merge 1 commit into main from feature/queue-head-info-metric

diyliv (Contributor) commented Jun 2, 2026

What this PR does

Adds a new metric and a new label on an existing metric that together let us replace the flat D8DeckhouseQueueIsHung alert with severity-differentiated alerts.

New metric: tasks_queue_head_info

A gauge (value=1) with labels queue, module, task_type, hook. Published every 5 seconds for each non-empty queue. Old series are expired when the head changes -> no phantom metrics remain.
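
The expire-on-head-change behavior can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code in this PR: `headInfoStore`, `headLabels`, and `observe` are hypothetical names, and the map stands in for whatever metric storage addon-operator actually uses. In the real operator, `observe` would presumably be driven by a 5-second ticker.

```go
package main

import "fmt"

// headLabels mirrors the label set named in the description.
type headLabels struct {
	Queue, Module, TaskType, Hook string
}

// headInfoStore is a hypothetical in-memory stand-in for the metric storage.
type headInfoStore struct {
	series map[headLabels]float64 // series currently exposed
	last   map[string]headLabels  // last published head per queue
}

func newHeadInfoStore() *headInfoStore {
	return &headInfoStore{
		series: make(map[headLabels]float64),
		last:   make(map[string]headLabels),
	}
}

// observe records the current head of a queue. A nil head means the queue
// is empty, so any series previously published for it is dropped.
func (s *headInfoStore) observe(queue string, head *headLabels) {
	prev, seen := s.last[queue]
	if head == nil {
		if seen {
			delete(s.series, prev)
			delete(s.last, queue)
		}
		return
	}
	cur := *head
	cur.Queue = queue
	if seen && prev != cur {
		// The head changed: expire the old series so it does not linger.
		delete(s.series, prev)
	}
	s.series[cur] = 1
	s.last[queue] = cur
}

func main() {
	s := newHeadInfoStore()
	s.observe("main", &headLabels{Module: "module-a", TaskType: "ModuleRun"})
	s.observe("main", &headLabels{Module: "module-b", TaskType: "ModuleRun"})
	fmt.Println(len(s.series)) // 1: only the current head remains
	s.observe("main", nil)
	fmt.Println(len(s.series)) // 0: empty queues expose nothing
}
```

Tracking the last label set per queue makes expiry an exact delete of one series, instead of a reset of the whole metric on every tick.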

Label cleanup:

  • ParallelModuleRun synthetic names like "Parallel run for a, b, c" -> normalized to empty string (would otherwise produce a bad join with deckhouse_mm_module_info)
  • Global tasks (ConvergeModules, GlobalHookRun, DiscoverHelmReleases, ApplyKubeConfigValues) -> module is empty, which is correct since these are not module-specific
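
A minimal sketch of the normalization described above. The function name `normalizeModuleLabel` and the prefix constant are assumptions inferred from the example name in this description; the actual implementation may detect parallel runs differently (for example, by task type).

```go
package main

import (
	"fmt"
	"strings"
)

// parallelRunPrefix is assumed from the example "Parallel run for a, b, c".
const parallelRunPrefix = "Parallel run for "

// normalizeModuleLabel returns the value to publish in the module label.
// Synthetic ParallelModuleRun names are mapped to the empty string so they
// never pretend to be a real module name in a join.
func normalizeModuleLabel(name string) string {
	if strings.HasPrefix(name, parallelRunPrefix) {
		return ""
	}
	return name
}

func main() {
	for _, name := range []string{"Parallel run for a, b, c", "module-a", ""} {
		fmt.Printf("%q -> %q\n", name, normalizeModuleLabel(name))
	}
}
```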

New label: critical on deckhouse_mm_module_info

Value "true" or "false" from BasicModule.GetCritical() (the critical: true property in module.yaml). Added additively -> existing queries are unaffected.

Why it's needed

The old D8DeckhouseQueueIsHung alert had two problems:

  • No way to see what's stuck -> only the queue name was visible, not the module, task type, or hook
  • Same severity for everything -> all hung queues alerted at severity 7 regardless of how critical the module was

With the new metric and the new label, we can create three separate alerts:

| Alert | Severity | Triggers for |
| --- | --- | --- |
| D8DeckhouseQueueIsHungCritical | 4 | critical="true" modules |
| D8DeckhouseQueueIsHung | 6 | critical="false" modules |
| D8DeckhouseQueueIsHungGlobal | 4 | global tasks (module="") |
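
The routing in the table can be illustrated in Go as below. The real alerts would be Prometheus rules that join tasks_queue_head_info with deckhouse_mm_module_info; this sketch only mirrors the decision logic. `classify`, `alert`, and the `critical` map (standing in for the joined label) are hypothetical.

```go
package main

import "fmt"

// alert pairs an alert name with its severity, as listed in the table.
type alert struct {
	Name     string
	Severity int
}

// classify picks the alert for a hung queue head. The critical map stands in
// for the critical label on deckhouse_mm_module_info. The second result is
// false when the module is unknown, mirroring a join that matches no series.
func classify(module string, critical map[string]bool) (alert, bool) {
	if module == "" {
		return alert{"D8DeckhouseQueueIsHungGlobal", 4}, true
	}
	isCritical, known := critical[module]
	if !known {
		return alert{}, false
	}
	if isCritical {
		return alert{"D8DeckhouseQueueIsHungCritical", 4}, true
	}
	return alert{"D8DeckhouseQueueIsHung", 6}, true
}

func main() {
	critical := map[string]bool{"module-a": true, "module-b": false}
	for _, m := range []string{"module-a", "module-b", ""} {
		a, ok := classify(m, critical)
		fmt.Println(m, a.Name, a.Severity, ok)
	}
}
```

Under this sketch a normalized ParallelModuleRun head also has an empty module and therefore lands in the global bucket. Whether the actual rules additionally filter on task_type is not stated here.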

diyliv self-assigned this Jun 2, 2026
diyliv marked this pull request as draft June 2, 2026 17:03
diyliv changed the title from "add queue head info metric and critical flag to module info" to "[addon-operator] add queue head info metric and critical flag to module info" Jun 2, 2026
diyliv force-pushed the feature/queue-head-info-metric branch from 2b91642 to 14ed834 June 2, 2026 17:11