feat: port operational hardening (health probes, metrics, logging, batching, GPU monitoring)#3
Open
davidamacey wants to merge 8 commits into
New scripts/codegen/check_file_size.py enforces a per-file line ceiling over src/ and scripts/ so module splits don't silently regress into monoliths. Existing oversize modules are grandfathered in the hook's exclude list until they are split.
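A per-file line ceiling of this kind can be sketched roughly as follows. This is not the contents of scripts/codegen/check_file_size.py: the function name `oversize_files`, the `MAX_LINES` value, and the shape of the exclude list are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical ceiling; the PR does not state the real value.
MAX_LINES = 500


def oversize_files(
    roots: Iterable[Path],
    max_lines: int = MAX_LINES,
    exclude: frozenset = frozenset(),
) -> list:
    """Return (path, line_count) for every .py file above the ceiling.

    `exclude` holds POSIX-style paths of grandfathered modules that are
    skipped until they are split.
    """
    offenders = []
    for root in roots:
        for path in sorted(root.rglob("*.py")):
            if path.as_posix() in exclude:
                continue  # grandfathered, not checked yet
            with path.open(encoding="utf-8", errors="replace") as fh:
                count = sum(1 for _ in fh)
            if count > max_lines:
                offenders.append((path, count))
    return offenders
```

A hook built on this would exit non-zero when the returned list is non-empty, for example `sys.exit(1 if oversize_files([Path("src"), Path("scripts")]) else 0)`.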
…reign_pre_chain formatter bug
- request_id_ctx / get_request_id now live in src.core.logging (with new bind_request_id / clear_request_id helpers) so service modules and out-of-process workers can import them without pulling in the FastAPI app; src.main re-exports for backward compatibility.
- merge_contextvars added first in the processor chain so request_id auto-attaches to every structlog event on the task.
- ProcessorFormatter.wrap_for_formatter removed from foreign_pre_chain: it wraps the event dict in a tuple and is only valid as the LAST structlog-native step; in the pre-chain it crashed stdlib log records (uvicorn, opensearch-py) with 'tuple' object does not support item deletion.
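The request-id helpers named above could look something like the stdlib-only sketch below. The four helper names come from the commit message; their bodies are assumptions. `add_request_id` is a hypothetical hand-written processor that stands in for the effect of merge_contextvars and is not structlog's implementation.

```python
import contextvars
import uuid
from typing import Optional

# Context variable holding the current request id (None outside a request).
request_id_ctx: contextvars.ContextVar = contextvars.ContextVar(
    "request_id", default=None
)


def get_request_id() -> Optional[str]:
    """Return the request id bound in the current context, if any."""
    return request_id_ctx.get()


def bind_request_id(request_id: Optional[str] = None) -> str:
    """Bind a request id, generating one when the caller has none."""
    rid = request_id or uuid.uuid4().hex
    request_id_ctx.set(rid)
    return rid


def clear_request_id() -> None:
    """Reset the request id so it does not leak into later work."""
    request_id_ctx.set(None)


def add_request_id(logger, method_name, event_dict):
    """Processor-shaped function: attach request_id to an event dict.

    Hypothetical stand-in for what merge_contextvars achieves in the
    real processor chain.
    """
    rid = get_request_id()
    if rid is not None:
        event_dict.setdefault("request_id", rid)
    return event_dict
```

Because nothing here imports the web framework, an out-of-process worker can call `bind_request_id(...)` at the start of a job and `clear_request_id()` in a `finally` block.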
- /live: pure process liveness, never gated on dependencies. The Dockerfile HEALTHCHECK now targets it so a degraded downstream dep cannot mark the container unhealthy and cascade through depends_on: service_healthy.
- /ready: probes Triton via a real is_server_live() gRPC round-trip (pool active_connections is 0 until the first infer and deadlocks dependent containers at startup) and OpenSearch via HTTP, each bounded to 2s; 503 with per-service detail when any dep is down.
- /health: kept as a backward-compat alias for /ready.
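The split between liveness and bounded readiness can be illustrated without the web framework. In this sketch the probes are plain async callables; `run_probe`, `readiness`, `liveness`, and the status strings are hypothetical names, and the real router presumably wraps the Triton and OpenSearch clients instead.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

PROBE_TIMEOUT_S = 2.0  # the 2s bound described in the commit message

Probe = Callable[[], Awaitable[bool]]


async def run_probe(probe: Probe, timeout: float = PROBE_TIMEOUT_S) -> str:
    """Run one dependency probe, never letting it exceed `timeout`."""
    try:
        ok = await asyncio.wait_for(probe(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout"
    except Exception as exc:  # a failing probe must not crash /ready
        return f"error: {type(exc).__name__}"
    return "ok" if ok else "down"


async def readiness(
    probes: Dict[str, Probe], timeout: float = PROBE_TIMEOUT_S
) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, per-service detail); 503 if any dep is not ok."""
    names = list(probes)
    results = await asyncio.gather(
        *(run_probe(probes[name], timeout) for name in names)
    )
    detail = dict(zip(names, results))
    status = 200 if all(v == "ok" for v in detail.values()) else 503
    return status, detail


def liveness() -> Tuple[int, Dict[str, str]]:
    """Liveness never consults dependencies: the process answering is enough."""
    return 200, {"status": "alive"}
```

Running the probes concurrently keeps the worst-case /ready latency near one timeout rather than the sum of all of them.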
- src/core/metrics.py: prometheus_client Histogram labeled by method, route template, and status; render_metrics() supports the optional PROMETHEUS_MULTIPROC_DIR aggregation mode for multi-worker uvicorn.
- http_duration_middleware records every request using the matched route template, with a low-cardinality path fallback for 404s.
- GET /metrics exposition endpoint + prometheus scrape job for yolo-api.
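The point of labeling by route template is to keep label cardinality bounded. A minimal sketch of that labeling step follows; the names `route_label`, `metric_labels`, and the `__unmatched__` value are assumptions, and the PR's actual fallback may differ.

```python
from typing import Dict, Optional

# Hypothetical constant used when no route matched (e.g. a 404), so that
# arbitrary request paths never become distinct label values.
UNMATCHED_ROUTE = "__unmatched__"


def route_label(matched_template: Optional[str]) -> str:
    """Prefer the route template ("/items/{id}") over the concrete path."""
    return matched_template if matched_template else UNMATCHED_ROUTE


def metric_labels(
    method: str, matched_template: Optional[str], status_code: int
) -> Dict[str, str]:
    """Build the method/route/status label set for one request."""
    return {
        "method": method.upper(),
        "route": route_label(matched_template),
        "status": str(status_code),
    }
```

With this shape, requests to `/items/1` and `/items/2` share one time series, and a scanner probing thousands of nonexistent paths adds only a single `route` value.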
…nce sort
- Triton gRPC send/receive caps 100MB -> 512MB: raw detector heads can emit hundreds of MB per max-batch response; 100MB rejected legitimate full-batch replies.
- cluster_distance sort gains missing:_last + unmapped_type:double so paging a cluster tolerates docs without the field and freshly created indices without the mapping (previously a 400 shard failure).
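As an illustration of the sort change, a search body with the two options might be built like this. `missing` and `unmapped_type` are standard OpenSearch sort options; the `cluster_id` field, the function name, and the paging parameters are hypothetical.

```python
def cluster_page_query(cluster_id: str, size: int = 50, offset: int = 0) -> dict:
    """Build a search body that pages one cluster ordered by distance."""
    return {
        "from": offset,
        "size": size,
        # "cluster_id" is an assumed field name, used only for illustration.
        "query": {"term": {"cluster_id": cluster_id}},
        "sort": [
            {
                "cluster_distance": {
                    "order": "asc",
                    # Docs lacking the field sort after all others.
                    "missing": "_last",
                    # Indices with no mapping for the field treat it as a
                    # double instead of failing the shard.
                    "unmapped_type": "double",
                }
            }
        ],
    }
```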
Under sustained ingest, the per-image arrival rate is too slow to fill preferred-size batches within 5ms, so Triton fired near batch=1. At 25ms it reaches batch 8-16 within the latency budget; a third YOLO instance (~1.5 GB) pipelines concurrent ingest batches through more GPU streams.
dcgm-exporter publishes per-card utilization, VRAM, power, and temperature for every host GPU (read-only, no compute reservation); prometheus scrapes it and the new 'GPU Metrics' Grafana dashboard renders GPU + host panels.
…uest-id, scrape config
- conftest.py excludes standalone live-deployment scripts from pytest collection (they define test_*(name,...) helpers pytest miscollects).
- Health probe tests run against the router with probes monkeypatched; no live Triton/OpenSearch needed.
- Metrics + request-id tests exercise the real app middleware wiring.
- Scrape-config test pins the prometheus job topology and validates every static target against docker-compose services.
- Synthetic fixture image (generated shapes, 640x480).
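The scrape-config check can be pictured as a pure function over the two parsed config files. This sketch takes already-loaded dicts so it stays dependency-free; the function name is hypothetical, and it assumes each target's hostname equals a docker-compose service name.

```python
def unknown_scrape_targets(prometheus_cfg: dict, compose_cfg: dict) -> list:
    """List "job:target" entries whose host is not a compose service."""
    services = set(compose_cfg.get("services", {}))
    unknown = []
    for job in prometheus_cfg.get("scrape_configs", []):
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                # "host:port" -> "host"; a bare host is left unchanged.
                host = target.rsplit(":", 1)[0]
                if host not in services:
                    unknown.append(f"{job.get('job_name')}:{target}")
    return unknown
```

A test would load both YAML files, call the function, and assert that the result is an empty list, so renaming a service without updating the scrape job fails fast.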
Ports production-proven hardening back to the public main branch:
- `/live` (pure liveness) + `/ready` (real Triton `is_server_live()` gRPC probe + OpenSearch HTTP probe, 2s bound, 503 with per-service detail). `/health` stays as a back-compat alias of `/ready`; the container HEALTHCHECK moves to `/live` so a degraded dependency can't cascade through `depends_on: service_healthy`.
- `http_request_duration_seconds` histogram (method/route/status) + `GET /metrics`, with optional `PROMETHEUS_MULTIPROC_DIR` aggregation; Prometheus scrape job included.
- `src.core.logging` (importable by out-of-process workers), `merge_contextvars` wired, and a real `foreign_pre_chain` formatter bug fixed (stdlib records crashed with tuple item-deletion).
- `cluster_distance` sort tolerates missing fields/mappings.