fix: resolve ExecuTorch TRT target_device per partition (coalesced multi-engine) #4350
Open
shoumikhin wants to merge 2 commits into
Saving a partially-TRT-compiled program to ExecuTorch
(output_format="executorch") via the modern torch.export path (retrace=True)
aborts with:
RuntimeError: execute_engine node 'execute_engine': placeholder engine
'obj__run_on_acc_0_engine' not found in exp_program.constants
even though the engine is present. torch.export lifts the TRT engine
ScriptObject as a custom-object constant keyed by its graph-signature FQN
(InputSpec.target) and renames the placeholder node (an obj_ prefix), so the
existing constants[node.name] / constants[node.target] lookup misses. The
legacy exporter (retrace=False) only worked by accident: it kept the
placeholder name equal to the constants key.
Resolve the placeholder via the canonical
ExportGraphSignature.inputs_to_lifted_custom_objs mapping, falling back to the
direct lookup only for legacy programs that lack it, and unwrap a
FakeScriptObject to its real object. A shared helper in dynamo/_exporter.py is
used by both the save serializer (_compile.py) and the backend engine-info
extractor (executorch/backend.py), which carried the same latent lookup.
Adds CPU-only unit tests for the resolver (no GPU/executorch required).
This unblocks coalescing TensorRT + CUDA delegates into one .pte via the
modern exporter.
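A minimal sketch of the lookup order described above, for illustration only. The function name `resolve_engine_constant`, the example FQN, and the `SimpleNamespace` stand-ins are hypothetical and are not the helper added in dynamo/_exporter.py. It assumes the signature exposes `inputs_to_lifted_custom_objs` as a mapping from placeholder name to constants key, and that a FakeScriptObject carries its real object on a `real_obj` attribute.

```python
from types import SimpleNamespace


def resolve_engine_constant(exp_program, node):
    """Hypothetical resolver: return the engine object behind a placeholder node."""
    constants = exp_program.constants
    # Preferred path: graph signature maps placeholder name -> constants FQN.
    # Legacy programs may lack the mapping, so treat it as optional.
    lifted = getattr(exp_program.graph_signature, "inputs_to_lifted_custom_objs", None) or {}
    fqn = lifted.get(node.name)
    if fqn is not None and fqn in constants:
        obj = constants[fqn]
    else:
        # Fallback: the direct lookup that only matches legacy programs.
        for key in (node.name, node.target):
            if key in constants:
                obj = constants[key]
                break
        else:
            raise RuntimeError(
                f"placeholder engine '{node.name}' not found in exp_program.constants"
            )
    # Unwrap a fake wrapper to its real object (assumed attribute: real_obj).
    return getattr(obj, "real_obj", obj)


# Stand-ins for an exported program; the FQN below is an assumed example value.
real_engine = object()
program = SimpleNamespace(
    graph_signature=SimpleNamespace(
        inputs_to_lifted_custom_objs={"obj__run_on_acc_0_engine": "_run_on_acc_0_engine"}
    ),
    constants={"_run_on_acc_0_engine": SimpleNamespace(real_obj=real_engine)},
)
placeholder = SimpleNamespace(
    name="obj__run_on_acc_0_engine", target="obj__run_on_acc_0_engine"
)
resolved = resolve_engine_constant(program, placeholder)
```

Here the renamed placeholder misses both direct keys, so only the signature mapping finds the engine, which is the failure mode the commit message describes.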
TensorRTPartitioner resolved target_device once for the whole exported program via _get_engine_info_from_edge_program(), which requires exactly one engine node. A coalesced graph (TensorRT + CUDA delegates) has multiple TRT engines, so that call raised and every TRT partition fell back to cuda:0 with a spurious "expects exactly 1 engine node per partition, found N" warning; multi-GPU graphs also could not be labeled per partition. Extract _get_engine_info_for_node() (single-node engine-info extraction) from _get_engine_info_from_edge_program() and resolve target_device per partition from that partition's own engine node. Single-GPU behavior is unchanged (still cuda:0) minus the warning; multi-engine/multi-GPU graphs now label each delegate correctly.
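The per-partition resolution can be sketched roughly as follows. This is an illustration under stated assumptions, not the partitioner's code: `engine_device_for_node` is a hypothetical stand-in for `_get_engine_info_for_node()`, the `"execute_engine"` target string and the `meta["target_device"]` field are assumed shapes, and nodes are plain `SimpleNamespace` objects rather than real graph nodes.

```python
from types import SimpleNamespace

DEFAULT_DEVICE = "cuda:0"  # fallback named in the commit message


def engine_device_for_node(node):
    """Hypothetical single-node extraction: read the device recorded on one engine node."""
    return node.meta.get("target_device")  # assumed metadata field


def resolve_partition_devices(partitions):
    """Map each partition id to a device using only that partition's own engine node."""
    devices = {}
    for partition_id, nodes in partitions.items():
        engines = [n for n in nodes if n.target == "execute_engine"]
        if len(engines) == 1:
            # Other partitions' engines are never consulted, so a program with
            # several engines no longer trips a whole-program one-engine check.
            devices[partition_id] = engine_device_for_node(engines[0]) or DEFAULT_DEVICE
        else:
            devices[partition_id] = DEFAULT_DEVICE
    return devices


def engine(device):
    """Build a stand-in engine node carrying an assumed device label."""
    return SimpleNamespace(target="execute_engine", meta={"target_device": device})


# Two TensorRT partitions in one coalesced program, each with its own engine.
partitions = {
    0: [engine("cuda:0")],
    1: [SimpleNamespace(target="getitem", meta={}), engine("cuda:1")],
}
devices = resolve_partition_devices(partitions)
```

Scoping the lookup to one partition keeps the single-GPU result at cuda:0 while letting a multi-GPU graph give each delegate its own label.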
What's broken
When coalescing TensorRT with other delegates into one .pte, the graph has multiple TensorRT engines. TensorRTPartitioner resolved target_device once for the whole program via _get_engine_info_from_edge_program(), which requires exactly one engine node. With more than one engine it raised, so every TensorRT partition fell back to cuda:0 with a spurious "expects exactly 1 engine node per partition, found N" warning. On a single GPU this is just noise, but a multi-GPU graph cannot label each delegate with its own device.
The fix
Extract _get_engine_info_for_node() (single-node engine-info extraction) out of _get_engine_info_from_edge_program(), which keeps its one-engine contract used by preprocess(), and resolve target_device per partition from that partition's own engine node.
Single-GPU behavior is unchanged (still cuda:0), minus the spurious warning. Multi-engine and multi-GPU graphs now label each delegate correctly.
Test
Verified on a coalesced TensorRT -> Another -> TensorRT model (two TRT engines plus one other delegate): the .pte still contains both TensorRTBackend and AnotherBackend, and the "found 2 engines" warning no longer fires.