fix: resolve ExecuTorch TRT target_device per partition (coalesced multi-engine) by shoumikhin · Pull Request #4350 · pytorch/TensorRT

shoumikhin · 2026-06-18T04:35:01Z

Stacked on #4349. The first commit here is from #4349 (the engine-constant lookup fix); this PR adds the per-partition target_device commit on top. Please review the second commit (fix: resolve ExecuTorch TRT target_device per partition) — once #4349 merges, this diff reduces to just that change.

What's broken

When coalescing TensorRT with other delegates into one .pte, the graph has multiple TensorRT engines. TensorRTPartitioner resolved target_device once for the whole program via _get_engine_info_from_edge_program(), which requires exactly one engine node. With more than one engine it raised, so every TensorRT partition fell back to cuda:0 with a spurious warning:

Could not derive target_device from the TensorRT engine (... expects exactly 1
engine node per partition, found 2); falling back to cuda:0.

On a single GPU this is just noise, but a multi-GPU graph cannot label each delegate with its own device.

The fix

Extract _get_engine_info_for_node() (single-node engine-info extraction) out of _get_engine_info_from_edge_program() — the latter keeps its one-engine contract used by preprocess() — and resolve target_device per partition from that partition's own engine node.

Single-GPU behavior is unchanged (still cuda:0), minus the spurious warning.
Multi-engine / multi-GPU graphs now get a correct per-delegate device label.
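
To make the split concrete, here is a minimal, torch-free sketch of the idea, not the code in this PR. `EngineNode`, its `device_index` field, and the function names below are hypothetical stand-ins for `_get_engine_info_for_node()` and `_get_engine_info_from_edge_program()`; the real helpers work on exported-program nodes and serialized engine info.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EngineNode:
    """Hypothetical stand-in for a graph node that runs one TensorRT engine."""
    name: str
    device_index: int  # GPU index the engine was built for (assumed field)


def engine_info_for_node(node: EngineNode) -> Dict[str, str]:
    """Single-node extraction: reads only the node it is given."""
    return {"engine": node.name, "target_device": f"cuda:{node.device_index}"}


def engine_info_from_program(engine_nodes: List[EngineNode]) -> Dict[str, str]:
    """Whole-program variant: keeps the exactly-one-engine contract."""
    if len(engine_nodes) != 1:
        raise ValueError(
            f"expects exactly 1 engine node per partition, found {len(engine_nodes)}"
        )
    return engine_info_for_node(engine_nodes[0])


def resolve_target_devices(partitions: Dict[str, EngineNode]) -> Dict[str, str]:
    """Resolve one device label per partition from that partition's own node."""
    return {
        tag: engine_info_for_node(node)["target_device"]
        for tag, node in partitions.items()
    }


# Two TRT partitions in one coalesced graph, each with its own engine node.
partitions = {
    "trt_0": EngineNode("engine_a", device_index=0),
    "trt_1": EngineNode("engine_b", device_index=1),
}
devices = resolve_target_devices(partitions)
print(devices)
```

Calling the whole-program variant on both nodes at once still raises, which mirrors why the old single lookup fell back to cuda:0 for every partition.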

Test

Verified on a coalesced TensorRT -> Another -> TensorRT model (two TRT engines plus one other delegate): the .pte still contains both TensorRTBackend and AnotherBackend, and the "found 2 engines" warning no longer fires.

Saving a partially-TRT-compiled program to ExecuTorch (output_format="executorch") via the modern torch.export path (retrace=True) aborts with:

RuntimeError: execute_engine node 'execute_engine': placeholder engine 'obj__run_on_acc_0_engine' not found in exp_program.constants

even though the engine is present. torch.export lifts the TRT engine ScriptObject as a custom-object constant keyed by its graph-signature FQN (InputSpec.target) and renames the placeholder node (an obj_ prefix), so the existing constants[node.name] / constants[node.target] lookup misses. The legacy exporter (retrace=False) only worked by accident: it kept the placeholder name equal to the constants key.

Resolve the placeholder via the canonical ExportGraphSignature.inputs_to_lifted_custom_objs mapping, falling back to the direct lookup only for legacy programs that lack it, and unwrap a FakeScriptObject to its real object. A shared helper in dynamo/_exporter.py is used by both the save serializer (_compile.py) and the backend engine-info extractor (executorch/backend.py), which carried the same latent lookup.

Adds CPU-only unit tests for the resolver (no GPU/executorch required). This unblocks coalescing TensorRT + CUDA delegates into one .pte via the modern exporter.
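
The lookup order described in this commit message can be sketched without torch as follows. This is an illustration under stated assumptions, not the helper added in dynamo/_exporter.py: `resolve_engine_constant` and `FakeScriptObjectStandIn` are hypothetical names, the signature is mocked with a `SimpleNamespace`, the `real_obj` attribute is assumed to be what a FakeScriptObject exposes for its wrapped object, and the FQN `_run_on_acc_0_engine` is an assumed example key.

```python
from types import SimpleNamespace
from typing import Any, Mapping


class FakeScriptObjectStandIn:
    """Hypothetical stand-in for a fake wrapper around a real ScriptObject."""

    def __init__(self, real_obj: Any) -> None:
        self.real_obj = real_obj  # assumed attribute holding the wrapped object


def resolve_engine_constant(
    node_name: str,
    node_target: str,
    constants: Mapping[str, Any],
    graph_signature: Any,
) -> Any:
    """Map a placeholder node to its engine constant.

    1. Prefer the signature's placeholder-name -> FQN mapping.
    2. Fall back to direct name/target lookup for legacy programs.
    3. Unwrap a fake wrapper to the real object if one is present.
    """
    lifted = getattr(graph_signature, "inputs_to_lifted_custom_objs", None) or {}
    if node_name in lifted:
        key = lifted[node_name]
    elif node_name in constants:
        key = node_name
    elif node_target in constants:
        key = node_target
    else:
        raise KeyError(f"placeholder engine {node_name!r} not found in constants")
    obj = constants[key]
    return getattr(obj, "real_obj", obj)


real_engine = object()

# Modern export: placeholder renamed with an obj_ prefix, constant keyed by FQN.
modern_sig = SimpleNamespace(
    inputs_to_lifted_custom_objs={"obj__run_on_acc_0_engine": "_run_on_acc_0_engine"}
)
modern_constants = {"_run_on_acc_0_engine": FakeScriptObjectStandIn(real_engine)}
resolved = resolve_engine_constant(
    "obj__run_on_acc_0_engine", "obj__run_on_acc_0_engine", modern_constants, modern_sig
)

# Legacy export: no mapping, placeholder name equals the constants key.
legacy_sig = SimpleNamespace()
legacy_constants = {"_run_on_acc_0_engine": real_engine}
legacy = resolve_engine_constant(
    "_run_on_acc_0_engine", "_run_on_acc_0_engine", legacy_constants, legacy_sig
)
```

Because the signature mapping is consulted first, the direct lookup only runs when the mapping is missing or has no entry for the placeholder, which corresponds to the legacy case above.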

TensorRTPartitioner resolved target_device once for the whole exported program via _get_engine_info_from_edge_program(), which requires exactly one engine node. A coalesced graph (TensorRT + CUDA delegates) has multiple TRT engines, so that call raised and every TRT partition fell back to cuda:0 with a spurious "expects exactly 1 engine node per partition, found N" warning; multi-GPU graphs also could not be labeled per partition.

Extract _get_engine_info_for_node() (single-node engine-info extraction) from _get_engine_info_from_edge_program() and resolve target_device per partition from that partition's own engine node. Single-GPU behavior is unchanged (still cuda:0) minus the warning; multi-engine/multi-GPU graphs now label each delegate correctly.

meta-cla Bot added the cla signed label Jun 18, 2026

github-actions Bot added the component: tests, component: core, component: api [Python], and component: dynamo labels Jun 18, 2026

github-actions Bot requested a review from lanluo-nvidia June 18, 2026 04:35

shoumikhin force-pushed the fix/executorch-per-partition-device branch from 4323b2f to 380e295 Compare June 18, 2026 04:41

shoumikhin force-pushed the fix/executorch-per-partition-device branch from 380e295 to 6971383 Compare June 18, 2026 05:07

shoumikhin wants to merge 2 commits into pytorch:main from shoumikhin:fix/executorch-per-partition-device