Skip to content

[AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4#1629

Open
seungrokj wants to merge 3 commits into
mainfrom
atom/dsr1-fp4-mi355x-atom-mtp-nightly-20260530
Open

[AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4#1629
seungrokj wants to merge 3 commits into
mainfrom
atom/dsr1-fp4-mi355x-atom-mtp-nightly-20260530

Conversation

@seungrokj
Copy link
Copy Markdown
Collaborator

@seungrokj seungrokj commented May 31, 2026

Summary

  • Update dsr1-fp4-mi355x-atom-mtp image: rocm/atom:rocm7.2.3_..._atom20260511rocm/atom-dev:nightly_202605301523
  • Update model: amd/DeepSeek-R1-0528-MXFP4amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
  • Switch to TP=4 only (drop TP=8 search-space rows); enable previously-commented isl=8192 TP=4 sweep
  • Source: ATOM upstream benchmark run #26690241645 (all jobs passed)

Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp4, spec_method=mtp, TP=4, 4×MI355X)

isl=1024, osl=1024

Conc InferenceX tput/GPU ATOM tput/GPU Delta
4 285.71 313.73 +9.8%
8 443.18 516.84 +16.6%
16 714.89 755.43 +5.7%
32 1059.71 1167.40 +10.2%
64 1560.14 1747.02 +12.0%
128 2374.74 2359.81 -0.6%
256 3174.40 3112.98 -1.9%

isl=8192, osl=1024 (new — previously disabled in config)

Conc ATOM tput/GPU
4 1210.53
8 1769.24
16 2503.06
32 3285.07
64 4294.76
128 4962.98
256 5406.06

Test plan

  • Verify server starts with rocm/atom-dev:nightly_202605301523 and model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
  • Confirm TP=4 benchmark numbers match expected tput/GPU values above

🤖 Generated with Claude Code


Note

Low Risk
Benchmark and CI config/script changes only; no production app logic, auth, or data-path changes.

Overview
Updates the DeepSeek-R1 FP4 MI355X ATOM MTP benchmark to a newer ATOM image (rocm7.2.4 / atom0.1.3) and model amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4.

dsr1-fp4-mi355x-atom-mtp in amd-master.yaml: drops TP=8 sweeps, keeps TP=4 with MTP for 1k/1k and 8k/1k, and extends concurrency to 1024 (including the previously disabled isl=8192 row).

dsr1_fp4_mi355x_atom_mtp.sh: builds server args via PARALLEL_ARGS (optional DP attention + expert parallel) and SPEC_ARGS (--method mtp, 3 speculative tokens); drops the old ISL-based --max-model-len logic except in eval mode.

Documents the change and expected throughput deltas in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit 14da3ee. Bugbot is set up for automated code reviews on this repo. Configure here.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@seungrokj seungrokj changed the title feat(atom): update dsr1-fp4-mi355x-atom-mtp image to nightly_202605301523, model to MTP-MoEFP4, TP=4 [AMD] Add DeepSeek-R1-0528 FP4 MI355X ATOM MTP3 benchmark, TP4 May 31, 2026
Comment thread perf-changelog.yaml Outdated
Comment on lines +3333 to +3341
- config-keys:
- dsr1-fp4-mi355x-atom-mtp
description:
- "Update ATOM image from rocm7.2.3_..._atom20260511 to nightly_202605301523"
- "Update model from amd/DeepSeek-R1-0528-MXFP4 to amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4"
- "Switch to TP=4 only (drop TP=8 search-space); enable isl=8192 TP=4 sweep"
- "isl=1024/osl=1024 TP=4: +5.7% to +16.6% improvement at conc 4-64 vs prior InferenceX numbers"
- "isl=1024/osl=1024 TP=4: -0.6% to -1.9% at high concurrency (conc 128-256)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new dsr1-fp4-mi355x-atom-mtp perf-changelog entry (lines 3333-3341) is inserted before the existing PR #1616 kimik2.5-fp4-mi355x-vllm block (lines 3343-3347) rather than appended to the end of the file. AGENTS.md §1.1 explicitly requires: "perf-changelog.yaml ... is read chronologically (oldest at top, newest at bottom) - new entries MUST be appended to the END, never inserted in the middle or prepended." Please move the new block to after line 3347 so it becomes the new tail.

Extended reasoning...

What's wrong

The PR adds a new perf-changelog entry for dsr1-fp4-mi355x-atom-mtp but places it in the middle of the file (lines 3333-3341), immediately after the PR #1624 dsv4-fp4-mi355x-vllm entry and before the existing PR #1616 kimik2.5-fp4-mi355x-vllm entry (lines 3343-3347). After this PR merges, the file's last entry is still PR #1616's block, not the new PR #1629 block.

Why this violates the documented convention

AGENTS.md is explicit on two points:

  • Line 54 — "perf-changelog.yaml triggers which configs to benchmark and is read chronologically (oldest at top, newest at bottom) - new entries MUST be appended to the END, never inserted in the middle or prepended."
  • Line 144 — "Never delete or modify whitespace in perf-changelog.yaml - CI depends on exact whitespace."

The reason PR #1616 currently sits at the end of the file (after #1624) is not a numbering accident — git log shows commit f9aafa9 (PR #1616, kimik2.5-fp4-mi355x-vllm image bump) was merged after c353742 (PR #1624, dsv4-fp4-mi355x-vllm image bump). So the existing tail correctly reflects merge order: …, #1602, #1624, #1616. This PR (#1629) is being merged after #1616, so its new block belongs after #1616, not before it.

Step-by-step proof

  1. Run wc -l perf-changelog.yaml on the PR branch → 3347 lines total.
  2. sed -n '3343,3347p' → the kimik2.5-fp4-mi355x-vllm block referencing pr-link: .../pull/1616. This is the tail.
  3. sed -n '3333,3341p' → the new dsr1-fp4-mi355x-atom-mtp block referencing pr-link: .../pull/PLACEHOLDER. This was inserted between the PR [Klaud Cold] Update dsv4-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1624 block (ending line 3331) and the PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616 block (starting line 3343).
  4. git log --oneline shows f9aafa9 (PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616) was committed after c353742 (PR [Klaud Cold] Update dsv4-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1624) and before this PR's commit 4af1764. So before this PR, the file ended with PR [Klaud Cold] Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.22.0 #1616, and the new entry should have been appended after line 3347.

Impact

This violates an explicit MUST rule documented in AGENTS.md. Per the agent doc the file is intended to be read chronologically and AGENTS.md §1.1 also calls out that the prescribed merge-conflict resolution for this file is specifically to re-append at the tail — which presumes append-only ordering. While utils/process_changelog.py reads added lines from the git diff (not the whole file in order), so the CI benchmark trigger logic won't be broken by this misplacement, it still corrupts the chronological invariant that the convention exists to maintain and that future tooling / conflict-resolution may rely on.

Fix

Move the 9-line new block (currently lines 3333-3341) plus its leading blank line so it sits after the existing PR #1616 block — i.e. cut the block from its current position and paste it at the end of the file, separated by one blank line, so the file's final entry is the new dsr1-fp4-mi355x-atom-mtp block. No other changes needed; the block contents are fine as-is.

Comment thread perf-changelog.yaml Outdated
- "Switch to TP=4 only (drop TP=8 search-space); enable isl=8192 TP=4 sweep"
- "isl=1024/osl=1024 TP=4: +5.7% to +16.6% improvement at conc 4-64 vs prior InferenceX numbers"
- "isl=1024/osl=1024 TP=4: -0.6% to -1.9% at high concurrency (conc 128-256)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new perf-changelog entry at line 3341 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER — the literal token PLACEHOLDER was never substituted with the actual PR number. Every other entry in the file uses a real numeric PR id (the surrounding entries reference 1598, 1602, 1607, 1616, 1624), so after merge this URL will 404 and break changelog traceability. Replace PLACEHOLDER with 1629 before merging.

Extended reasoning...

What the bug is\n\nThe diff adds a new perf-changelog.yaml entry for dsr1-fp4-mi355x-atom-mtp whose pr-link is the literal string:\n\nyaml\npr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER\n\n\nThe token PLACEHOLDER was never substituted with this PR's actual number (1629). Once merged, the link in the changelog will resolve to https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER, which returns a 404 — the changelog entry loses traceability back to the PR that introduced the change.\n\n### How the bug manifests / Step-by-step proof\n\n1. Open perf-changelog.yaml at line 3341 on this PR's branch. The line reads exactly:\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/PLACEHOLDER\n2. Compare to the immediately neighboring entries in the same file: pull/1598, pull/1602, pull/1607, pull/1616, pull/1624 — every other entry in the surrounding context uses an actual numeric PR id.\n3. After this PR merges to main, anyone clicking that link in the rendered changelog will hit /pull/PLACEHOLDER, which is not a valid PR slug on GitHub and returns 404.\n\n### Why existing tooling does not catch this\n\nThe repo has a canonical placeholder marker for unknown PR numbers — XXX — documented in AGENTS.md ("Use https://github.com/SemiAnalysisAI/InferenceX/pull/XXX ... replace XXX with the real PR number before merge") and .github/workflows/claude.yml ("Use XXX as the PR-link placeholder until the PR exists"). The merge-conflict helper merge_with_reuse.sh only auto-rewrites pr-links that contain XXX or already end with the real /pull/{pr} suffix. The literal token PLACEHOLDER matches neither convention, so the helper will silently skip this entry and the broken URL will land on main as-is on a clean merge.\n\n### Impact\n\nNon-functional but user-visible: the changelog is the canonical record of which PR introduced each perf-config change. A broken link here:\n- Breaks the round-trip from changelog entry → PR (cannot review the original context).\n- Pollutes main with a known-bad URL that future tooling and humans will have to clean up.\n- Will not be caught by the existing XXX-rewrite path in merge_with_reuse.sh.\n\n### Fix\n\nReplace PLACEHOLDER with 1629 on line 3341:\n\nyaml\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1629\n\n\nAlternatively, change PLACEHOLDER to the canonical XXX so future automation (e.g. merge_with_reuse.sh) can substitute it — but a one-line manual fix before merge is strictly simpler.

…attn support

- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 4-1024
- Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern
- Remove ISL/OSL-based max-model-len calculation
- Remove WIP comment from config
- Update perf-changelog image reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 14da3ee. Configure here.

if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
CALCULATED_MAX_MODEL_LEN=""
else
CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing max-model-len for 8192

High Severity

The script always leaves CALCULATED_MAX_MODEL_LEN empty for normal throughput runs, but this commit also turns on the isl=8192 / osl=1024 scenario in amd-master.yaml. The previous ISL/OSL branch passed --max-model-len 10240 for that case; sibling ATOM scripts still do. Without a raised limit, the ATOM server may reject or truncate 8192-token prompts.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 14da3ee. Configure here.

@seungrokj seungrokj added the AMD label May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants