feat: YOLO26 + YOLO11 dual serving with per-family export toolchains by davidamacey · Pull Request #5 · davidamacey/OpenProcessor

davidamacey · 2026-07-04T04:30:13Z

Stacked on #4 (auto-retargets when it merges).

Serves YOLO11 and YOLO26 side by side in the same Triton + API instance:

- Dual export toolchains in one image: the proven YOLO11 EfficientNMS end2end path keeps its exact production pin (ultralytics==8.3.253) in an isolated /opt/venv-y11 (CPU torch, ~5 GB saved); export_models.py transparently re-execs into it, so documented CLI commands are unchanged. The main env moves to ultralytics 8.4.x and gains export/export_yolo26.py: native NMS-free fused export (single (batch,300,6) tensor, no plugin), with config.pbtxt written from the introspected ONNX output.
- Signature-based adapter registry (src/clients/model_adapters.py): the output contract is resolved from Triton model metadata, not names, so EfficientNMS 4-tensor and fused single-tensor engines both parse to the same normalized result. Hardcoded model literals removed from triton_client.py.
- YOLO_MODEL env selects the default detector (default unchanged: yolov11_small_trt_end2end); /detect?model_name= selects per request; /models/{name}/load|unload add/remove either family at runtime.
- /models upload+export API auto-routes end2end (YOLO26) .pt uploads to the native toolchain (detected via the model's end2end attribute).
- Ships models/yolo26_small_trt/ repo entry; README + migration docs; 6 adapter behavior tests (20 integration tests total pass).
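To make the signature-based resolution concrete, here is a minimal sketch of picking a contract from output metadata rather than from the model name. It is an illustration, not the code in src/clients/model_adapters.py: the function name `resolve_contract`, the returned strings, and the dict layout (a `name` and a `shape` per output, mimicking a Triton model-metadata response) are assumptions.

```python
from typing import Optional

# Output names of the EfficientNMS 4-tensor contract, as listed in this PR.
END2END_OUTPUTS = {"num_dets", "det_boxes", "det_scores", "det_classes"}


def resolve_contract(outputs: list[dict]) -> Optional[str]:
    """Return 'end2end', 'fused', or None for an unrecognised signature.

    `outputs` is assumed to look like the `outputs` list of a model-metadata
    response: each entry has a `name` and a `shape` (-1 marks a dynamic axis).
    """
    names = {o["name"] for o in outputs}
    if names == END2END_OUTPUTS:
        return "end2end"
    # Fused contract: exactly one tensor whose last axis holds 6 values.
    if len(outputs) == 1 and outputs[0]["shape"] and outputs[0]["shape"][-1] == 6:
        return "fused"
    return None  # caller should reject the model instead of guessing
```

Because nothing here looks at the model name, a renamed or custom engine resolves the same way as the shipped ones, which is what lets both families share one endpoint.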

- Serving/main env moves to ultralytics>=8.4.82,<8.5 (YOLO26 support: native NMS-free export + 8.4-era .pt deserialization).
- The proven YOLO11 EfficientNMS export path keeps its exact production pin (ultralytics==8.3.253) in an isolated /opt/venv-y11 built from requirements-export-y11.txt (CPU torch: engine builds use the tensorrt wheels, not torch, saving ~5 GB).
- export_models.py gains a toolchain guard: launched under >=8.4 it transparently re-execs into /opt/venv-y11, so the documented CLI keeps working unchanged; without the venv it fails with a clear message.
- Both toolchains compile engines against tensorrt-cu13==11.0.0.114 (Triton 26.06 match). EfficientNMS_TRT verified present in the 26.06 image's libnvinfer_plugin.so.11.0.0.
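The toolchain guard described above could be sketched roughly as follows. This is an assumption-laden illustration, not the guard in export_models.py: the interpreter path `/opt/venv-y11/bin/python`, the helper names, and the error wording are hypothetical; only the venv location and the 8.4 threshold come from this PR.

```python
import os
import sys

# Assumed interpreter location inside the isolated venv named in the PR.
Y11_PYTHON = "/opt/venv-y11/bin/python"


def needs_y11_venv(ultralytics_version: str) -> bool:
    """True when the running ultralytics is 8.4 or newer."""
    major, minor = (int(part) for part in ultralytics_version.split(".")[:2])
    return (major, minor) >= (8, 4)


def guard_toolchain(ultralytics_version: str) -> None:
    """Re-exec the current command under the pinned YOLO11 toolchain if needed."""
    if not needs_y11_venv(ultralytics_version):
        return  # already running under the 8.3.x pin; nothing to do
    if not os.path.exists(Y11_PYTHON):
        sys.exit("YOLO11 end2end export needs /opt/venv-y11, which was not found")
    # Replace this process, keeping the original script path and arguments.
    os.execv(Y11_PYTHON, [Y11_PYTHON, *sys.argv])
```

After the re-exec the same script sees the 8.3.x version and returns early from the guard, so the hand-off happens at most once.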

…rving
- export/export_yolo26.py: fused single-tensor (batch,300,6) export via stock ultralytics >=8.4 (no plugin, no patch), TRT-11 engine build with dynamic batch profile, config.pbtxt written from the introspected ONNX output, labels from model.names, --custom-model support.
- src/clients/model_adapters.py: detection adapters resolved from Triton model METADATA (not names): End2EndNMSAdapter (num_dets/det_boxes/det_scores/det_classes) and FusedDetAdapter ((...,6) single tensor), so YOLO11 end2end and YOLO26 engines serve side by side through the same endpoints.
- triton_client: all YOLO paths go through the adapter registry; the two hardcoded 'yolov11_small_trt_end2end' literals are gone; batch path accepts model_name.
- YOLO_MODEL env var selects the default detector; /detect model_name param selects per request.
- /models upload+export API routes end2end (YOLO26) uploads to the native toolchain automatically (validated via the model's end2end attribute) and auto-loads the fused engine.
- Ships models/yolo26_small_trt/ repo entry (config + labels).
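As an illustration of writing config.pbtxt from introspected values instead of a hand-maintained template, the sketch below renders a Triton config from tensor names and dims passed in by the caller. The function name, the 3x640x640 input shape, and the default data types are assumptions; in the real exporter the names and dims would presumably come from the ONNX graph (for example via `onnx.load(path).graph.output`).

```python
def render_config(model_name: str, input_name: str, output_name: str,
                  output_dims: list[int], max_batch_size: int,
                  data_type: str = "TYPE_FP32") -> str:
    """Render a minimal Triton config.pbtxt for a single-output TensorRT engine.

    With max_batch_size > 0, Triton dims exclude the batch axis, so the
    fused (batch,300,6) tensor is declared as [ 300, 6 ].
    """
    dims = ", ".join(str(d) for d in output_dims)
    return f"""name: "{model_name}"
platform: "tensorrt_plan"
max_batch_size: {max_batch_size}
input [
  {{ name: "{input_name}" data_type: {data_type} dims: [ 3, 640, 640 ] }}
]
output [
  {{ name: "{output_name}" data_type: {data_type} dims: [ {dims} ] }}
]
"""


# Hypothetical tensor names; batch cap of 32 as stated for yolo26_small.
cfg = render_config("yolo26_small_trt", "images", "output0", [300, 6], 32)
```

Generating the output block from the graph keeps the config in step with whatever the export actually produced.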

- README + migration guide: YOLO26 export/load/serve alongside YOLO11, YOLO_MODEL default switch, runtime load/unload.
- tests/integration/test_model_adapters.py: signature resolution (both contracts + reject), end2end truncation, fused zero-score padding drop (6 tests).
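The two parsing behaviours those tests cover (end2end truncation and fused zero-score padding drop) can be sketched on plain lists. The helper names and the row layout `(x1, y1, x2, y2, score, class)` for the fused tensor are assumptions for illustration; the real adapters live in src/clients/model_adapters.py.

```python
def parse_end2end(num_dets, boxes, scores, classes):
    """Keep only the first `num_dets` slots of the fixed-size EfficientNMS outputs."""
    n = int(num_dets)
    return [(*boxes[i], scores[i], classes[i]) for i in range(n)]


def parse_fused(rows, min_score=0.0):
    """Drop padding rows (score <= min_score) from a fused (K, 6) result."""
    return [tuple(row) for row in rows if row[4] > min_score]


# Both contracts normalise to the same (x1, y1, x2, y2, score, class) tuples.
e2e = parse_end2end(
    1,
    [[10, 20, 30, 40], [0, 0, 0, 0]],  # second slot is unused padding
    [0.9, 0.0],
    [3, 0],
)
fused = parse_fused([
    [10, 20, 30, 40, 0.9, 3],
    [0, 0, 0, 0, 0.0, 0],  # zero-score padding row
])
```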

…guards

TRT 11 removed BuilderFlag.FP16, Builder.platform_has_fast_fp16, and trtexec --fp16 (strongly-typed builds only; precision follows ONNX dtypes):
- trt_utils.enable_fp16(): guarded no-op on TRT 11 (typed FP32 engines run with TF32 tensor cores on Ampere+); all six legacy export scripts route through it instead of touching the removed flag.
- YOLO26: FP16 baked into the ONNX via NVIDIA ModelOpt AutoCast (ultralytics helper), then built with an explicit profile that bounds EVERY dynamic axis. The export leaves H/W dynamic, and an unbounded spatial axis makes TRT budget 12+ GB activation tactics that fail on consumer GPUs. yolo26_small capped at batch 32 for 12 GB cards.
- nvidia-modelopt[onnx] added to requirements (onnx bound raised to <1.22 to match); venv-y11 keeps its own tighter pins.
- paddleocr_rec exporter: trtexec path fixed for 26.06 (/usr/bin), container name env-overridable (TRITON_CONTAINER), --fp16 dropped.

Verified on GPU 1: all 8 engines build and save under TRT 11.0.0.114 (yolo11 end2end 41 MB, yolo26 FP16 23 MB, scrfd, arcface, mobileclip x2, paddleocr det+rec).
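A guarded no-op of the kind described for trt_utils.enable_fp16() might look like the sketch below. The signature (taking the `trt` module as an argument so it can be exercised without TensorRT installed) and the boolean return value are assumptions, and the stand-in objects are fakes, not TensorRT types.

```python
from types import SimpleNamespace


def enable_fp16(config, trt) -> bool:
    """Set the FP16 builder flag only if this TensorRT still exposes it.

    Returns True when the flag was set and False when the call is a no-op
    (the attribute is missing, as this PR reports for TRT 11).
    """
    flag = getattr(getattr(trt, "BuilderFlag", None), "FP16", None)
    if flag is None:
        return False
    config.set_flag(flag)
    return True


class FakeConfig:
    """Stand-in for a builder config; records the flags it was given."""

    def __init__(self):
        self.flags = []

    def set_flag(self, flag):
        self.flags.append(flag)


# Fake modules: one that still has BuilderFlag.FP16, one that does not.
legacy_trt = SimpleNamespace(BuilderFlag=SimpleNamespace(FP16="FP16"))
typed_trt = SimpleNamespace(BuilderFlag=SimpleNamespace())
```

Routing every exporter through one helper means the version difference is handled in a single place rather than in each script.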

Verified end-to-end on live hardware (all 8 engines READY, 20/20 endpoint tests, dual-family detect, dynamic load/unload):
- FP16 baking per model family: EfficientNMS graphs use the onnxconverter-common rewrite (plugin op block-listed); YOLO26 and the CLIP image encoder use ModelOpt AutoCast via the ultralytics wrapper (needs ORT-executable graphs + calibration shapes); CLIP text encoder stays typed FP32 (token-id input); paddleocr rec baked + built via trtexec (container name now env-overridable via TRITON_CONTAINER).
- apt-mark hold on TensorRT packages in Dockerfile.triton: NVIDIA's apt repo offers TRT 11.1 over the image's 11.0.0.114 and a silent upgrade invalidates every client-built engine.
- yolov11 end2end config: det_boxes/det_scores are TYPE_FP16 (the EfficientNMS plugin emits at baked precision); exporter template aligned.
- /detect confidence filter is now unconditional: NMS-free YOLO26 engines emit all top-K candidates (near-zero scores included) and rely on it; a no-op for end2end YOLO11 (0.25 baked at export).
- Instance counts right-sized as a 12 GB-card baseline with scale-up comments (yolo 2, scrfd/arcface/clip 2, rec/bls 1).
- test_endpoints.sh accepts the /health->/ready status contract.

…ans clean

Trivy (fixable HIGH/CRITICAL, --ignore-unfixed):
- API image: CLEAN
- Triton image: was 13 HIGH (11 Go-stdlib CVEs in the Nsight Systems profiler CLI, which is dev tooling unused at inference and has been removed, and 2 in starlette 0.49.3, upgraded to >=1.3.1); now CLEAN.

Zero fixable HIGH/CRITICAL across both shipped images.

…ep assertion

Policy change per review: engines are always re-exported from ONNX at deployment, so the image now takes the newest TensorRT from NVIDIA's apt repo instead of holding the NGC tag's stock version. A build-time assertion (TRT_VERSION arg) fails the image build the moment apt brings a different TRT than the client-side tensorrt-cu13 pip pins, so the two can only ever move together, in one commit.
- tensorrt-cu13==11.1.0.106 in requirements.txt, requirements-export-y11 and pyproject (onnx bound synced to <1.22).
- All 8 engines re-exported and E2E-verified on TRT 11.1: models READY, YOLO11 + YOLO26 detection parity, 20/20 endpoint tests.
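The idea behind the version assertion can be sketched in Python, although the PR implements it at image build time via the TRT_VERSION arg. The function names are hypothetical, and the sketch assumes the installed version and the pip pin are compared in the same dotted form.

```python
import re
from typing import Optional


def pinned_trt_version(requirements_text: str) -> Optional[str]:
    """Extract the tensorrt-cu13 pin from requirements-style text."""
    match = re.search(r"^tensorrt-cu13==([\w.]+)", requirements_text, re.MULTILINE)
    return match.group(1) if match else None


def assert_trt_in_sync(installed: str, requirements_text: str) -> None:
    """Abort when the installed TensorRT differs from the client-side pin."""
    pinned = pinned_trt_version(requirements_text)
    if pinned != installed:
        raise SystemExit(
            f"TensorRT mismatch: image has {installed}, pip pin is {pinned}"
        )


reqs = "onnx<1.22\ntensorrt-cu13==11.1.0.106\n"
assert_trt_in_sync("11.1.0.106", reqs)  # matching versions: no error
```

Failing loudly at build time is what forces the apt side and the pip side to be bumped in the same commit.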

- tests/test_endpoints.sh gains a 'dual' target: dynamically load yolo26_small_trt, detect with both families side by side, unload and assert inference is refused, reload. Skips gracefully when the yolo26 engine hasn't been exported. Full suite: 25 passed, 0 failed, 0 skipped against the live 26.06/TRT-11.1 stack.
- FP16 bakes now respect each exporter's precision flag; paddleocr_det defaults to FP32, because borderline text detection is threshold-sensitive to FP16 (verified: synthetic caution_sign detected at FP32, missed at FP16) and the engine is small.
- OCR synthetic fixtures generated via scripts/create_ocr_test_images.py (test_images/ is gitignored; the suite skips when absent).

davidamacey added 8 commits July 4, 2026 00:15

davidamacey changed the base branch from feat/triton-2606-cve-refresh to main July 4, 2026 15:06

davidamacey merged commit 49111d6 into main Jul 4, 2026

davidamacey deleted the feat/yolo26-dual-serving branch July 4, 2026 15:07
