feat: port operational hardening (health probes, metrics, logging, batching, GPU monitoring) by davidamacey · Pull Request #3 · davidamacey/OpenProcessor

davidamacey · 2026-07-04T04:08:37Z

Ports production-proven hardening back to the public main branch:

Health: /live (pure liveness) + /ready (real Triton is_server_live() gRPC probe + OpenSearch HTTP probe, 2s bound, 503 with per-service detail). /health stays as a back-compat alias of /ready; the container HEALTHCHECK moves to /live so a degraded dependency can't cascade through depends_on: service_healthy.
Metrics: http_request_duration_seconds histogram (method/route/status) + GET /metrics, with optional PROMETHEUS_MULTIPROC_DIR aggregation; Prometheus scrape job included.
Logging: request-id contextvar moved to src.core.logging (importable by out-of-process workers), merge_contextvars wired into the structlog processor chain, and a real foreign_pre_chain formatter bug fixed (stdlib log records crashed with TypeError: 'tuple' object does not support item deletion).
Clients: gRPC message caps 100→512 MB (large raw detector heads); cluster_distance sort tolerates missing fields/mappings.
Triton tuning: 25 ms batch queue delay + 3 YOLO instances (fires batch 8-16 under ingest instead of batch≈1).
Monitoring: dcgm-exporter (all GPUs) + a GPU Metrics Grafana dashboard.
Guardrails: max-file-size pre-commit ratchet (700 LOC, existing oversize modules grandfathered).
Tests: pytest scaffolding + integration tests for health, metrics, request-id, and prometheus scrape topology (14 tests, no live stack required).

New scripts/codegen/check_file_size.py enforces a per-file line ceiling over src/ and scripts/ so module splits don't silently regress into monoliths. Existing oversize modules are grandfathered in the hook's exclude list until they are split.
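A minimal sketch of the kind of check such a hook performs. It assumes "lines" means physical lines and that pre-commit passes the staged filenames as arguments. The names count_lines, find_oversize, main and MAX_LINES are illustrative, not the contents of scripts/codegen/check_file_size.py. In the PR the grandfathered modules sit in the hook's exclude list, so pre-commit filters them before the script runs; the explicit grandfathered argument here only makes the behavior easy to test.

```python
from pathlib import Path
from typing import Iterable, List, Mapping

MAX_LINES = 700  # ceiling stated in the PR description


def count_lines(path: Path) -> int:
    """Count physical lines in a text file (assumption: LOC = physical lines)."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in fh)


def find_oversize(
    line_counts: Mapping[str, int],
    limit: int = MAX_LINES,
    grandfathered: Iterable[str] = (),
) -> List[str]:
    """Return files over `limit` that are not grandfathered, sorted for stable output."""
    exempt = set(grandfathered)
    return sorted(
        path for path, n in line_counts.items() if n > limit and path not in exempt
    )


def main(paths: List[str], grandfathered: Iterable[str] = ()) -> int:
    """Hook-style entry point: nonzero exit status if any staged file is too long."""
    counts = {p: count_lines(Path(p)) for p in paths}
    offenders = find_oversize(counts, grandfathered=grandfathered)
    for p in offenders:
        print(f"{p}: {counts[p]} lines > {MAX_LINES}")
    return 1 if offenders else 0
```

Because the ceiling is fixed and new files are never added to the exclude list, the check only ratchets one way: a grandfathered module drops off the list once it is split, and cannot grow back past 700 lines unnoticed.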

…reign_pre_chain formatter bug

- request_id_ctx / get_request_id now live in src.core.logging (with new bind_request_id / clear_request_id helpers), so service modules and out-of-process workers can import them without pulling in the FastAPI app; src.main re-exports them for backward compatibility.
- merge_contextvars added first in the processor chain so request_id auto-attaches to every structlog event on the task.
- ProcessorFormatter.wrap_for_formatter removed from foreign_pre_chain: it wraps the event dict in a tuple and is only valid as the LAST structlog-native step; in the pre-chain it crashed stdlib log records (uvicorn, opensearch-py) with TypeError: 'tuple' object does not support item deletion.
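A sketch of the relocated request-id helpers, assuming a plain stdlib ContextVar. The exact signatures (bind_request_id returning a Token, clear_request_id taking an optional token) and the add_request_id processor are hypothetical. add_request_id stands in for structlog's own merge_contextvars, which merges values bound through structlog.contextvars rather than arbitrary ContextVars; the point is only that a module with no FastAPI import can own the state.

```python
from __future__ import annotations

import contextvars
from typing import Any, MutableMapping, Optional

# Each asyncio task / thread context sees its own value of this variable.
request_id_ctx: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "request_id", default=None
)


def get_request_id() -> Optional[str]:
    return request_id_ctx.get()


def bind_request_id(request_id: str) -> contextvars.Token:
    """Set the id for the current context; keep the token to restore it later."""
    return request_id_ctx.set(request_id)


def clear_request_id(token: Optional[contextvars.Token] = None) -> None:
    """Restore the previous value if a token is given, otherwise reset to None."""
    if token is not None:
        request_id_ctx.reset(token)
    else:
        request_id_ctx.set(None)


def add_request_id(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog-style processor: attach request_id, never overwriting an explicit one."""
    rid = request_id_ctx.get()
    if rid is not None:
        event_dict.setdefault("request_id", rid)
    return event_dict
```

A middleware would call bind_request_id at the start of a request and clear_request_id(token) in a finally block; an out-of-process worker can do the same around each job by importing only this module.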

- /live: pure process liveness, never gated on dependencies. The Dockerfile HEALTHCHECK now targets it, so a degraded downstream dependency cannot mark the container unhealthy and cascade through depends_on: service_healthy.
- /ready: probes Triton via a real is_server_live() gRPC round-trip and OpenSearch via HTTP, each bounded to 2s; returns 503 with per-service detail when any dependency is down. A real round-trip is needed because the pool's active_connections stays at 0 until the first inference, so gating on it deadlocked dependent containers at startup.
- /health: kept as a backward-compat alias for /ready.
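A stdlib-only sketch of the /ready aggregation logic, assuming each dependency check is exposed as an async callable that returns True when healthy. The probe names, the status strings, and the (status, detail) return shape are hypothetical. A blocking client call, such as a synchronous gRPC stub, would need to be wrapped (for example with asyncio.to_thread) so it does not block the event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

PROBE_TIMEOUT_S = 2.0  # per-dependency bound stated in the PR

Probe = Callable[[], Awaitable[bool]]


async def _run_probe(probe: Probe, timeout: float) -> str:
    """Run one probe under a timeout and map the outcome to a status string."""
    try:
        ok = await asyncio.wait_for(probe(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"
    except Exception as exc:  # any probe failure counts against readiness
        return f"error: {type(exc).__name__}"
    return "ok" if ok else "down"


async def readiness(
    probes: Dict[str, Probe], timeout: float = PROBE_TIMEOUT_S
) -> Tuple[int, Dict[str, str]]:
    """Probe all dependencies concurrently: 200 if every one is ok, else 503 with detail."""
    names = list(probes)
    results = await asyncio.gather(*(_run_probe(probes[n], timeout) for n in names))
    detail = dict(zip(names, results))
    status = 200 if all(v == "ok" for v in detail.values()) else 503
    return status, detail


async def liveness() -> Tuple[int, Dict[str, str]]:
    """Pure process liveness: never touches dependencies."""
    return 200, {"status": "alive"}
```

Running the probes with gather means the worst-case /ready latency is roughly one timeout rather than the sum of them, and a hung OpenSearch shows up as "timeout" in the detail instead of stalling the response.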

- src/core/metrics.py: prometheus_client Histogram labeled by method, route template, and status; render_metrics() supports the optional PROMETHEUS_MULTIPROC_DIR aggregation mode for multi-worker uvicorn.
- http_duration_middleware records every request using the matched route template, with a low-cardinality path fallback for 404s.
- GET /metrics exposition endpoint + Prometheus scrape job for yolo-api.
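The cardinality argument behind the route label can be sketched as follows, assuming the middleware already knows the matched route template (or None when routing failed). UNMATCHED_ROUTE, route_label and histogram_labels are hypothetical names, and the PR's actual 404 fallback may differ. With prometheus_client, the resulting dict would feed Histogram.labels(**labels).observe(seconds).

```python
from typing import Dict, Optional

# Hypothetical label for requests that matched no route (typically 404s).
UNMATCHED_ROUTE = "__unmatched__"


def route_label(route_template: Optional[str]) -> str:
    """Return a low-cardinality value for the histogram's `route` label.

    Matched requests use the template (e.g. "/items/{item_id}") rather than
    the raw path, so every distinct id maps to the same time series.
    Unmatched requests carry arbitrary client-supplied paths, so they all
    collapse into a single series.
    """
    return route_template if route_template else UNMATCHED_ROUTE


def histogram_labels(
    method: str, route_template: Optional[str], status: int
) -> Dict[str, str]:
    """Build the method/route/status label set for one observation."""
    return {
        "method": method.upper(),
        "route": route_label(route_template),
        "status": str(status),
    }
```

Using raw paths instead would create one series per distinct URL, so a crawler probing random paths could grow the exposition without bound.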

…nce sort

- Triton gRPC send/receive caps 100 MB -> 512 MB: raw detector heads can emit hundreds of MB per max-batch response, and the 100 MB cap rejected legitimate full-batch replies.
- cluster_distance sort gains missing: _last + unmapped_type: double, so paging a cluster tolerates docs without the field and freshly created indices without the mapping (previously a 400 shard failure).
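Both client changes reduce to small, checkable pieces. The sketch assumes the channel is built with the standard grpc.max_send_message_length / grpc.max_receive_message_length options (as accepted by grpc.insecure_channel(target, options=...)), treats MB as MiB, and uses a hypothetical float32 head of shape (64, 84, 8400), a YOLOv8-style 640 px output at batch 64, to show a response that clears 100 MiB. The ascending order in cluster_distance_sort is also an assumption; missing and unmapped_type are standard OpenSearch sort options.

```python
from typing import Dict, List, Sequence, Tuple

MiB = 1024 * 1024
MAX_GRPC_MESSAGE_BYTES = 512 * MiB  # new cap from the PR (previously 100 MB)

# Standard gRPC channel-argument keys for message-size limits.
GRPC_CHANNEL_OPTIONS: List[Tuple[str, int]] = [
    ("grpc.max_send_message_length", MAX_GRPC_MESSAGE_BYTES),
    ("grpc.max_receive_message_length", MAX_GRPC_MESSAGE_BYTES),
]


def tensor_bytes(shape: Sequence[int], itemsize: int = 4) -> int:
    """Raw payload size of a dense tensor (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


def cluster_distance_sort(order: str = "asc") -> Dict[str, dict]:
    """OpenSearch sort clause that tolerates missing values and unmapped indices."""
    return {
        "cluster_distance": {
            "order": order,
            "missing": "_last",         # docs without the field sort after all others
            "unmapped_type": "double",  # indices lacking the mapping don't fail the shard
        }
    }
```

For the hypothetical head, tensor_bytes((64, 84, 8400)) is 180,633,600 bytes: above a 100 MiB cap (104,857,600) but well under 512 MiB (536,870,912).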

With the previous 5 ms queue delay, the per-image arrival rate under sustained ingest was too slow to fill preferred-size batches, so Triton fired at batch ≈ 1. A 25 ms delay reaches batch 8-16 within the latency budget; a third YOLO instance (~1.5 GB) pipelines concurrent ingest batches through more GPU streams.
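A back-of-envelope model of why the longer queue delay helps. It assumes steady arrivals at a hypothetical rate and that a batch fires when the delay expires, so batch ≈ 1 + rate × delay, capped at the max batch (16 here, also an assumption). It ignores preferred-size early firing and the extra instances. In Triton's model config the two knobs correspond to dynamic_batching { max_queue_delay_microseconds: 25000 } and instance_group [ { count: 3, kind: KIND_GPU } ].

```python
def expected_batch(arrivals_per_s: int, queue_delay_us: int, max_batch: int = 16) -> int:
    """Estimate batch size when a batch fires at queue-delay expiry.

    The first request opens the window; requests arriving during the delay
    join it. Integer arithmetic keeps the estimate exact and floors partial
    arrivals.
    """
    joined = arrivals_per_s * queue_delay_us // 1_000_000
    return min(max_batch, 1 + joined)
```

At a hypothetical 400 images/s, a 5 ms delay gives an estimated batch of 3 while 25 ms gives 11, which matches the direction of the change described above rather than its exact numbers.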

dcgm-exporter publishes per-card utilization, VRAM, power, and temperature for every host GPU (read-only, no compute reservation); prometheus scrapes it and the new 'GPU Metrics' Grafana dashboard renders GPU + host panels.

…uest-id, scrape config

- conftest.py excludes standalone live-deployment scripts from pytest collection (they define test_*(name, ...) helpers that pytest miscollects).
- Health probe tests run against the router with probes monkeypatched; no live Triton/OpenSearch needed.
- Metrics + request-id tests exercise the real app middleware wiring.
- Scrape-config test pins the Prometheus job topology and validates every static target against docker-compose services.
- Synthetic fixture image (generated shapes, 640x480).
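A stdlib-only sketch of the scrape-config check, assuming prometheus.yml and docker-compose.yml have already been loaded into dicts (for example with PyYAML's yaml.safe_load). The job list, the yolo-api port 8000, and the helper names are hypothetical; 9400 is dcgm-exporter's default port. A target such as localhost:9090, for Prometheus scraping itself, would need an explicit allowance.

```python
from typing import Dict, List, Mapping


def static_targets(prom_config: Mapping) -> Dict[str, List[str]]:
    """Map job_name -> 'host:port' static targets from a parsed prometheus.yml."""
    jobs: Dict[str, List[str]] = {}
    for job in prom_config.get("scrape_configs", []):
        targets: List[str] = []
        for sc in job.get("static_configs", []):
            targets.extend(sc.get("targets", []))
        jobs[job["job_name"]] = targets
    return jobs


def unknown_targets(prom_config: Mapping, compose: Mapping) -> List[str]:
    """Return static targets whose host is not a docker-compose service name."""
    services = set(compose.get("services", {}))
    bad: List[str] = []
    for job, targets in static_targets(prom_config).items():
        for target in targets:
            host = target.rsplit(":", 1)[0]  # strip the port
            if host not in services:
                bad.append(f"{job}: {target}")
    return bad


# Hypothetical parsed configs, for illustration only.
PROM = {
    "scrape_configs": [
        {"job_name": "yolo-api", "static_configs": [{"targets": ["yolo-api:8000"]}]},
        {"job_name": "dcgm-exporter", "static_configs": [{"targets": ["dcgm-exporter:9400"]}]},
    ]
}
COMPOSE = {"services": {"yolo-api": {}, "dcgm-exporter": {}, "prometheus": {}}}
```

A test then pins the topology by comparing set(static_targets(PROM)) with the expected job names and asserting that unknown_targets(PROM, COMPOSE) is empty, so renaming a compose service without updating the scrape job fails in CI rather than silently dropping metrics.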

davidamacey added 8 commits July 3, 2026 23:35

davidamacey mentioned this pull request on Jul 4, 2026 in feat: Triton 26.06 + TensorRT 11 upgrade and CVE remediation #4 (open).

davidamacey wants to merge 8 commits into main from feat/port-operational-hardening
