[AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4 by seungrokj · Pull Request #1629 · SemiAnalysisAI/InferenceX

seungrokj · 2026-05-31T05:33:49Z

Summary

Update dsr1-fp4-mi355x-atom-mtp image: rocm/atom:rocm7.2.3_..._atom20260511 → rocm/atom-dev:nightly_202605301523
Update model: amd/DeepSeek-R1-0528-MXFP4 → amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
Switch to TP=4 only (drop TP=8 search-space rows); enable previously-commented isl=8192 TP=4 sweep
Source: ATOM upstream benchmark run #26690241645 (all jobs passed)

Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp4, spec_method=mtp, TP=4, 4×MI355X)

isl=1024, osl=1024

Conc	InferenceX tput/GPU	ATOM tput/GPU	Delta
4	285.71	313.73	+9.8%
8	443.18	516.84	+16.6%
16	714.89	755.43	+5.7%
32	1059.71	1167.40	+10.2%
64	1560.14	1747.02	+12.0%
128	2374.74	2359.81	-0.6%
256	3174.40	3112.98	-1.9%

isl=8192, osl=1024 (new — previously disabled in config)

Conc	ATOM tput/GPU
4	1210.53
8	1769.24
16	2503.06
32	3285.07
64	4294.76
128	4962.98
256	5406.06

Test plan

Verify server starts with rocm/atom-dev:nightly_202605301523 and model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
Confirm TP=4 benchmark numbers match expected tput/GPU values above

🤖 Generated with Claude Code

Note

Low Risk
Benchmark and CI config/script changes only; no production app logic, auth, or data-path changes.

Overview
Updates the DeepSeek-R1 FP4 MI355X ATOM MTP benchmark to a newer ATOM image (rocm7.2.4 / atom0.1.3) and model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4.

dsr1-fp4-mi355x-atom-mtp in amd-master.yaml: drops TP=8 sweeps, keeps TP=4 with MTP for 1k/1k and 8k/1k, and extends concurrency to 1024 (including the previously disabled isl=8192 row).

dsr1_fp4_mi355x_atom_mtp.sh: builds server args via PARALLEL_ARGS (optional DP attention + expert parallel) and SPEC_ARGS (--method mtp, 3 speculative tokens); drops the old ISL-based --max-model-len logic except in eval mode.

Documents the change and expected throughput deltas in perf-changelog.yaml.

^{Reviewed by Cursor Bugbot for commit 14da3ee. Bugbot is set up for automated code reviews on this repo. Configure here.}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-31T05:33:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-05-31T05:39:30Z

+- config-keys:
+    - dsr1-fp4-mi355x-atom-mtp
+  description:
+    - "Update ATOM image from rocm7.2.3_..._atom20260511 to nightly_202605301523"
+    - "Update model from amd/DeepSeek-R1-0528-MXFP4 to amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4"
+    - "Switch to TP=4 only (drop TP=8 search-space); enable isl=8192 TP=4 sweep"
+    - "isl=1024/osl=1024 TP=4: +5.7% to +16.6% improvement at conc 4-64 vs prior InferenceX numbers"
+    - "isl=1024/osl=1024 TP=4: -0.6% to -1.9% at high concurrency (conc 128-256)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER


🔴 The new dsr1-fp4-mi355x-atom-mtp perf-changelog entry (lines 3333-3341) is inserted before the existing PR #1616 kimik2.5-fp4-mi355x-vllm block (lines 3343-3347) rather than appended to the end of the file. AGENTS.md §1.1 explicitly requires: "perf-changelog.yaml ... is read chronologically (oldest at top, newest at bottom) - new entries MUST be appended to the END, never inserted in the middle or prepended." Please move the new block to after line 3347 so it becomes the new tail.

Extended reasoning...

What's wrong

The PR adds a new perf-changelog entry for dsr1-fp4-mi355x-atom-mtp but places it in the middle of the file (lines 3333-3341), immediately after the PR #1624 dsv4-fp4-mi355x-vllm entry and before the existing PR #1616 kimik2.5-fp4-mi355x-vllm entry (lines 3343-3347). After this PR merges, the file's last entry is still PR #1616's block, not the new PR #1629 block.

Why this violates the documented convention

AGENTS.md is explicit on two points:

Line 54 — "perf-changelog.yaml triggers which configs to benchmark and is read chronologically (oldest at top, newest at bottom) - new entries MUST be appended to the END, never inserted in the middle or prepended."

Line 144 — "Never delete or modify whitespace in perf-changelog.yaml - CI depends on exact whitespace."

The reason PR #1616 currently sits at the end of the file (after #1624) is not a numbering accident — git log shows commit f9aafa9 (PR #1616, kimik2.5-fp4-mi355x-vllm image bump) was merged after c353742 (PR #1624, dsv4-fp4-mi355x-vllm image bump). So the existing tail correctly reflects merge order: …, #1602, #1624, #1616. This PR (#1629) is being merged after #1616, so its new block belongs after #1616, not before it.

Step-by-step proof

Run wc -l perf-changelog.yaml on the PR branch → 3347 lines total.

sed -n '3343,3347p' → the kimik2.5-fp4-mi355x-vllm block referencing pr-link: .../pull/1616. This is the tail.

sed -n '3333,3341p' → the new dsr1-fp4-mi355x-atom-mtp block referencing pr-link: .../pull/PLACEHOLDER. This was inserted between the PR [Klaud Cold] Update dsv4-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1624 block (ending line 3331) and the PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616 block (starting line 3343).

git log --oneline shows f9aafa9 (PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616) was committed after c353742 (PR [Klaud Cold] Update dsv4-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1624) and before this PR's commit 4af1764. So before this PR, the file ended with PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616, and the new entry should have been appended after line 3347.

Impact

This violates an explicit MUST rule documented in AGENTS.md. Per the agent doc the file is intended to be read chronologically and AGENTS.md §1.1 also calls out that the prescribed merge-conflict resolution for this file is specifically to re-append at the tail — which presumes append-only ordering. While utils/process_changelog.py reads added lines from the git diff (not the whole file in order), so the CI benchmark trigger logic won't be broken by this misplacement, it still corrupts the chronological invariant that the convention exists to maintain and that future tooling / conflict-resolution may rely on.

Fix

Move the 9-line new block (currently lines 3333-3341) plus its leading blank line so it sits after the existing PR #1616 block — i.e. cut the block from its current position and paste it at the end of the file, separated by one blank line, so the file's final entry is the new dsr1-fp4-mi355x-atom-mtp block. No other changes needed; the block contents are fine as-is.

claude · 2026-05-31T05:39:30Z

+    - "Switch to TP=4 only (drop TP=8 search-space); enable isl=8192 TP=4 sweep"
+    - "isl=1024/osl=1024 TP=4: +5.7% to +16.6% improvement at conc 4-64 vs prior InferenceX numbers"
+    - "isl=1024/osl=1024 TP=4: -0.6% to -1.9% at high concurrency (conc 128-256)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER


🔴 The new perf-changelog entry at line 3341 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER — the literal token PLACEHOLDER was never substituted with the actual PR number. Every other entry in the file uses a real numeric PR id (the surrounding entries reference 1598, 1602, 1607, 1616, 1624), so after merge this URL will 404 and break changelog traceability. Replace PLACEHOLDER with 1629 before merging.

Extended reasoning...

What the bug is\n\nThe diff adds a new perf-changelog.yaml entry for dsr1-fp4-mi355x-atom-mtp whose pr-link is the literal string:\n\nyaml\npr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER\n\n\nThe token PLACEHOLDER was never substituted with this PR's actual number (1629). Once merged, the link in the changelog will resolve to https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER, which returns a 404 — the changelog entry loses traceability back to the PR that introduced the change.\n\n### How the bug manifests / Step-by-step proof\n\n1. Open perf-changelog.yaml at line 3341 on this PR's branch. The line reads exactly:\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER\n2. Compare to the immediately neighboring entries in the same file: pull/1598, pull/1602, pull/1607, pull/1616, pull/1624 — every other entry in the surrounding context uses an actual numeric PR id.\n3. After this PR merges to main, anyone clicking that link in the rendered changelog will hit /pull/PLACEHOLDER, which is not a valid PR slug on GitHub and returns 404.\n\n### Why existing tooling does not catch this\n\nThe repo has a canonical placeholder marker for unknown PR numbers — XXX — documented in AGENTS.md ("Use https://github.com/SemiAnalysisAI/InferenceX/pull/XXX ... replace XXX with the real PR number before merge") and .github/workflows/claude.yml ("Use XXX as the PR-link placeholder until the PR exists"). The merge-conflict helper merge_with_reuse.sh only auto-rewrites pr-links that contain XXX or already end with the real /pull/{pr} suffix. The literal token PLACEHOLDER matches neither convention, so the helper will silently skip this entry and the broken URL will land on main as-is on a clean merge.\n\n### Impact\n\nNon-functional but user-visible: the changelog is the canonical record of which PR introduced each perf-config change. A broken link here:\n- Breaks the round-trip from changelog entry → PR (cannot review the original context).\n- Pollutes main with a known-bad URL that future tooling and humans will have to clean up.\n- Will not be caught by the existing XXX-rewrite path in merge_with_reuse.sh.\n\n### Fix\n\nReplace PLACEHOLDER with 1629 on line 3341:\n\nyaml\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1629\n\n\nAlternatively, change PLACEHOLDER to the canonical XXX so future automation (e.g. merge_with_reuse.sh) can substitute it — but a one-line manual fix before merge is strictly simpler.

…attn support - Switch image from nightly_202605301523 to stable atom0.1.3 - Expand concurrency search space from 4-256 to 4-1024 - Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern - Remove ISL/OSL-based max-model-len calculation - Remove WIP comment from config - Update perf-changelog image reference Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 14da3ee. Configure here.}

cursor · 2026-05-31T05:42:54Z

-if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
-    CALCULATED_MAX_MODEL_LEN=""
-else
-    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "


Missing max-model-len for 8192

High Severity

The script always leaves CALCULATED_MAX_MODEL_LEN empty for normal throughput runs, but this commit also turns on the isl=8192 / osl=1024 scenario in amd-master.yaml. The previous ISL/OSL branch passed --max-model-len 10240 for that case; sibling ATOM scripts still do. Without a raised limit, the ATOM server may reject or truncate 8192-token prompts.

^{Reviewed by Cursor Bugbot for commit 14da3ee. Configure here.}

feat(atom/dsr1-fp4-mi355x-mtp): update image + model, switch to TP=4

4af1764

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team May 31, 2026 05:33

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners May 31, 2026 05:33

github-project-automation Bot added this to InferenceMAX Board May 31, 2026

chore: update perf-changelog pr-link to #1629

e705ffc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

seungrokj changed the title ~~feat(atom): update dsr1-fp4-mi355x-atom-mtp image to nightly_202605301523, model to MTP-MoEFP4, TP=4~~ [AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4 May 31, 2026

claude Bot reviewed May 31, 2026

View reviewed changes

functionstackx added the full-sweep-enabled label May 31, 2026

cursor Bot reviewed May 31, 2026

View reviewed changes

seungrokj added the AMD label May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4#1629

[AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4#1629
seungrokj wants to merge 3 commits into
mainfrom
atom/dsr1-fp4-mi355x-atom-mtp-nightly-20260530

seungrokj commented May 31, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

claude Bot May 31, 2026

Uh oh!

claude Bot May 31, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

seungrokj commented May 31, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp4, spec_method=mtp, TP=4, 4×MI355X)

isl=1024, osl=1024

isl=8192, osl=1024 (new — previously disabled in config)

Test plan

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

claude Bot May 31, 2026

Choose a reason for hiding this comment

What's wrong

Why this violates the documented convention

Step-by-step proof

Impact

Fix

Uh oh!

claude Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 31, 2026

Choose a reason for hiding this comment

Missing max-model-len for 8192

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

seungrokj commented May 31, 2026 •

edited by cursor Bot

Loading