feat(modules): HealthMonitor — scheduled probes with pluggable sinks (CORE-006)#2

Open
typelicious wants to merge 2 commits into master from feat/health-monitor-module

@typelicious
Contributor

Adds a generic, pure-config HealthMonitor module to the OSS template. Org layovers declare endpoints and sinks in YAML; no per-org handler code is needed.

Why

Some layovers (e.g. capacium-ops .github/workflows/health-monitor.yml) appended health-check lines to a tracked file inside the source repo and pushed them back every 3 hours, polluting main with auto-commits. This module provides the same observability via proper sinks — stdout, file (ephemeral CI path, not the repo), webhook, or a single labeled GitHub Issue on failure — without ever writing to the source repo.

What

  • src/ops_engine/config_loader.py — HealthCheck, HealthSink, HealthMonitorConfig (Pydantic).
  • src/ops_engine/modules/health_monitor.py — HealthMonitor.run(config) + CLI (python -m ops_engine.modules.health_monitor --config <yml>). Default User-Agent so CF-fronted endpoints don't 403 python-urllib. Exit 1 on failure.
  • README: new entry in the modules table.

Org-layover usage

MyOrg:
  health_monitor:
    enabled: true
    checks:
      - { name: api,  url: https://api.example.com/health, expect_status: 200 }
      - { name: web,  url: https://example.com }
    sinks:
      - { type: stdout }
      - { type: github_issue, issue_label: health-alert, only_on_failure: true }
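
To make the shape of this config concrete, here is a minimal sketch of how the `health_monitor` block could map onto typed objects. It uses stdlib dataclasses as a stand-in for the Pydantic models in config_loader.py, so it is not the module's actual code. The class names and the fields shown come from this PR; `parse_health_monitor`, the default `expect_status` of 200, and the defaults for `enabled` and `only_on_failure` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class HealthCheck:
    name: str
    url: str
    expect_status: int = 200  # assumed default; the `web` check above omits it


@dataclass
class HealthSink:
    type: str  # "stdout", "file", "webhook" or "github_issue"
    issue_label: Optional[str] = None
    only_on_failure: bool = False  # assumed default


@dataclass
class HealthMonitorConfig:
    enabled: bool = False  # assumed default
    checks: list[HealthCheck] = field(default_factory=list)
    sinks: list[HealthSink] = field(default_factory=list)
    fail_run_on_error: bool = True  # the commit message states this defaults to true


def parse_health_monitor(raw: dict[str, Any]) -> HealthMonitorConfig:
    """Hypothetical helper: build the config from an already-parsed YAML mapping."""
    return HealthMonitorConfig(
        enabled=raw.get("enabled", False),
        checks=[HealthCheck(**c) for c in raw.get("checks", [])],
        # stdout is described as the default sink, so fall back to it.
        sinks=[HealthSink(**s) for s in raw.get("sinks", [{"type": "stdout"}])],
        fail_run_on_error=raw.get("fail_run_on_error", True),
    )


# The same data as the YAML above, as a YAML parser would hand it over.
raw = {
    "enabled": True,
    "checks": [
        {"name": "api", "url": "https://api.example.com/health", "expect_status": 200},
        {"name": "web", "url": "https://example.com"},
    ],
    "sinks": [
        {"type": "stdout"},
        {"type": "github_issue", "issue_label": "health-alert", "only_on_failure": True},
    ],
}
config = parse_health_monitor(raw)
```

Unlike Pydantic, dataclasses do not validate field types, so the real models will reject malformed input that this sketch would accept.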

Tested

Live smoke against docs.capacium.xyz + api.capacium.xyz/health (both CF-fronted, both 200 with the UA fix) + a synthetic 404 path (correctly flagged failed).
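
For readers unfamiliar with the UA fix, the following is a rough sketch of a probe that sends an explicit User-Agent and treats an unexpected status as a failure. It is an illustration under stated assumptions, not the code in health_monitor.py: `build_request`, `probe`, the result dictionary layout, and the `DEFAULT_USER_AGENT` string are all hypothetical.

```python
import urllib.error
import urllib.request

# Hypothetical value; the PR does not state the actual default string.
DEFAULT_USER_AGENT = "ops-engine-health-monitor/0.1"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Create a GET request that replaces urllib's default Python-urllib User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url: str, expect_status: int = 200, timeout: float = 10.0) -> dict:
    """Fetch the URL once and report whether the status matched expect_status."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout: no status to compare.
        return {"url": url, "status": None, "ok": False, "error": str(exc)}
    return {"url": url, "status": status, "ok": status == expect_status, "error": None}
```

With this shape, a 404 on a check that expects 200 comes back with `ok` set to false instead of raising, which matches the "correctly flagged failed" behaviour described above.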

Follow-up

capacium-ops will switch its workflow to call this module and drop the auto-commit step.

…(CORE-006)

Replaces the anti-pattern of committing health-check log files back to the source
repo. Pure-infrastructure module, config-driven (Pydantic), no business logic.

Module: src/ops_engine/modules/health_monitor.py
  • HealthMonitor.run(config) probes each endpoint and fans out to sinks.
  • Sinks: stdout (default), file (ephemeral CI path, not the repo), webhook,
    github_issue (creates/updates a single labeled issue on failure).
  • Probe sends a sensible default User-Agent so CF/WAF-fronted endpoints don't
    403 python-urllib's default.
  • CLI: python -m ops_engine.modules.health_monitor --config <org-layover.yml>
  • Exit code 1 on any failed check (fail_run_on_error, default true) so the
    runner reflects health in its CI status.

Config: HealthCheck + HealthSink + HealthMonitorConfig in config_loader.py.
Org-only by design (not repo-level) — declare in your org-layover config.yml.

Usage example in the module docstring; see capacium-ops for a live config.
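
The fan-out and exit-code behaviour described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the module's implementation: `run_checks`, `stdout_sink`, the injected `probe` callable, and the representation of sinks as `(callable, only_on_failure)` pairs are assumptions made to keep the example self-contained.

```python
from typing import Callable, Iterable

Sink = Callable[[list], None]


def run_checks(
    checks: Iterable[dict],
    probe: Callable[[dict], dict],
    sinks: Iterable[tuple[Sink, bool]],
    fail_run_on_error: bool = True,
) -> int:
    """Probe every check, hand the results to each sink, and return an exit code."""
    results = [probe(check) for check in checks]
    any_failed = any(not r["ok"] for r in results)
    for emit, only_on_failure in sinks:
        # Sinks marked only_on_failure stay silent on a fully healthy run.
        if only_on_failure and not any_failed:
            continue
        emit(results)
    # Exit code 1 lets the CI runner reflect health in its status.
    return 1 if (any_failed and fail_run_on_error) else 0


def stdout_sink(results: list) -> None:
    """Print one line per check result."""
    for r in results:
        print(f"{'OK' if r['ok'] else 'FAIL'} {r['name']}")
```

A CLI entry point could pass the returned value to `sys.exit`, so a failed check fails the workflow run without anything being written to the source repo.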
dev-bot-langevc added the enhancement and needs-triage labels on May 31, 2026
2 participants