[Klaud Cold] Update dsv4-fp8-h200-vllm (+mtp) vLLM image to v0.22.0 by functionstackx · Pull Request #1597 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-30T22:09:11Z

Summary

Update vLLM image from v0.21.0 to v0.22.0

Recipes touched: dsv4-fp8-h200-vllm, dsv4-fp8-h200-vllm-mtp

Test plan

full-sweep-enabled sweep passes.

🤖 Generated with Claude Code

Note

Low Risk
Config-only container version bump for two benchmark recipes; no serve flags, auth, or application logic changes.

Overview
Bumps the vLLM OpenAI container for the DeepSeek-V4-Pro H200 FP8 fixed-seq-len recipes dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp from v0.21.0 to v0.22.0 in nvidia-master.yaml. Benchmark scenarios, TP/EP search spaces, and MTP settings are unchanged.

Documents the image bump in perf-changelog.yaml for both config keys.

Reviewed by Cursor Bugbot for commit e8b0d15. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T22:09:19Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-30T22:09:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26696211719
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26696211719

claude

LGTM — straightforward image bump for two recipes with matching changelog entry.

Extended reasoning...

Overview

This PR is a config-only bump of the vLLM image from v0.21.0 to v0.22.0 for the dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp benchmark recipes in .github/configs/nvidia-master.yaml, plus a matching entry appended to perf-changelog.yaml. No serve flags, scenarios, TP/EP search spaces, or MTP settings are touched.

Security risks

None. The change only updates a public image tag string in YAML; no auth, crypto, permissions, or executable code paths are affected.

Level of scrutiny

Low. This is a mechanical version bump in benchmark recipe config, the kind of change the full-sweep-enabled label is designed to validate empirically. The blast radius is contained to the two recipes' sweep runs; any regression would surface in the sweep, not in production-critical code.

Other factors

Only one finding from the bug hunter, and it's a pre-existing nit about stale rationale comments above both entries (the comments still reference cu129 and v0.20.1 tags that were superseded in an earlier bump). The staleness predates this PR and has no runtime impact — worth fixing in a follow-up but not blocking. The changelog entry is properly formatted and links the PR.

claude · 2026-05-30T22:17:03Z

 # Uses the cu129 image. H200 has no FP4 path, so the FP4 indexer cache
 # flag is omitted. Max-model-len is pinned at 800k per the recipe.
 dsv4-fp8-h200-vllm:
-  image: vllm/vllm-openai:v0.21.0
+  image: vllm/vllm-openai:v0.22.0


🟡 The rationale comments above both dsv4-fp8-h200-vllm (lines 2883–2885) and dsv4-fp8-h200-vllm-mtp (lines 2907–2909) are stale and now contradict the image lines this PR is editing. The non-MTP block still says "Uses the cu129 image" but the entry pins the stock vllm/vllm-openai:v0.22.0 tag, and the MTP block claims it uses "the canonical v0.20.1 image" while the non-MTP entry above is "still on the deepseekv4-cu129 tag" — both factually wrong post-PR (MTP is v0.22.0, non-MTP is v0.22.0). Pre-existing staleness (the prior v0.21.0 bump already failed to refresh these), but since this PR edits both image lines, it would be a natural place to refresh the adjacent rationale. Severity: nit.

Extended reasoning...

What the bug is

This PR bumps both dsv4-fp8-h200-vllm (line 2887) and dsv4-fp8-h200-vllm-mtp (line 2911) from vllm/vllm-openai:v0.21.0 to vllm/vllm-openai:v0.22.0. Both entries have explanatory comment blocks immediately above them whose factual claims no longer match the image they document.

Comment #1 — above dsv4-fp8-h200-vllm (lines 2883–2885):

 # Uses the cu129 image. H200 has no FP4 path, so the FP4 indexer cache
 # flag is omitted. Max-model-len is pinned at 800k per the recipe.
 dsv4-fp8-h200-vllm:
   image: vllm/vllm-openai:v0.22.0   # <-- not a cu129-tagged image

The comment says "Uses the cu129 image" — historically this referred to a custom deepseekv4-cu129 build. The pinned tag is now the stock public semver v0.22.0, so the rationale-for-image-choice no longer applies.

Comment #2 — above dsv4-fp8-h200-vllm-mtp (lines 2907–2909):

 # MTP variant of dsv4-fp8-h200-vllm. Uses the canonical v0.20.1 image
 # (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
 # --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
 dsv4-fp8-h200-vllm-mtp:
   image: vllm/vllm-openai:v0.22.0

Both factual claims are wrong post-PR: (1) the MTP image is v0.22.0, not v0.20.1; (2) the non-MTP entry directly above is also v0.22.0, not deepseekv4-cu129.

Step-by-step proof

Open .github/configs/nvidia-master.yaml after this PR is applied.

Read lines 2883–2887. The comment claims "Uses the cu129 image"; the next non-comment line is image: vllm/vllm-openai:v0.22.0 — a stock public tag, not a cu129 build. Contradiction confirmed.

Read lines 2907–2911. The comment claims "Uses the canonical v0.20.1 image", but line 2911 reads image: vllm/vllm-openai:v0.22.0. Contradiction #1 confirmed.

The same comment claims "the non-MTP entry above is still on the deepseekv4-cu129 tag", but line 2887 (the non-MTP entry) reads image: vllm/vllm-openai:v0.22.0. Contradiction #2 confirmed.

git log -p on this file shows PR #1461 bumped these entries from the original tags to v0.21.0 without updating the comments; the staleness is therefore pre-existing. This PR widens the gap (v0.21.0 → v0.22.0) but is editing the exact image lines the comments describe, so it is the natural place to refresh the rationale in the same change.

Impact

No runtime impact — these are YAML comments only. The hazard is reviewer/maintainer confusion: someone reading the file to understand why a specific image was chosen will get a misleading answer, and the MTP block in particular invites a "wait, the non-MTP entry must still be on cu129" mistake during the next bump.

How to fix

Either (a) drop the now-obsolete provenance sentences entirely and keep only the still-accurate parts (no FP4 path on H200, max-model-len 800k, MTP adds --speculative-config), or (b) replace the stale lines with neutral rationale that doesn't pin to a specific tag — e.g. for the MTP block, just say "MTP variant of dsv4-fp8-h200-vllm; adds --speculative-config ...." Avoid embedding image tags or cross-references between entries in prose, since those go stale on every bump.
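A minimal sketch of option (a), assuming the two comment blocks contain only the text quoted above and that every other key under each entry stays untouched (those keys are omitted here). The surviving comment wording is illustrative, not the maintainers' chosen phrasing:

```yaml
# H200 has no FP4 path, so the FP4 indexer cache
# flag is omitted. Max-model-len is pinned at 800k per the recipe.
dsv4-fp8-h200-vllm:
  image: vllm/vllm-openai:v0.22.0
  # ... remaining keys unchanged (omitted in this sketch)

# MTP variant of dsv4-fp8-h200-vllm. Adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
dsv4-fp8-h200-vllm-mtp:
  image: vllm/vllm-openai:v0.22.0
  # ... remaining keys unchanged (omitted in this sketch)
```

Neither comment names an image tag or the sibling entry in this form, so a later bump would only need to touch the two image lines.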

Severity

Nit / pre-existing. The staleness was introduced before this PR (the v0.21.0 bump in #1461 already mismatched the cu129/v0.20.1 prose), and the runtime behavior is unaffected. Flagging because the PR is directly editing both image lines and is in the natural position to refresh the adjacent rationale in the same change.

github-actions · 2026-05-31T03:42:03Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26696213252
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26696213252

functionstackx requested a review from a team May 30, 2026 22:09

functionstackx added the full-sweep-enabled label May 30, 2026

functionstackx requested review from jgangani and kedarpotdar-nv as code owners May 30, 2026 22:09

github-project-automation Bot added this to InferenceMAX Board May 30, 2026

[Klaud Cold] Update dsv4-fp8-h200-vllm (+mtp) vLLM image to v0.22.0

e8b0d15

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

functionstackx force-pushed the klaud-cold/dsv4-fp8-h200-vllm-v0.22.0 branch from 19b2c60 to e8b0d15 Compare May 30, 2026 22:09

claude Bot reviewed May 30, 2026

functionstackx wants to merge 1 commit into main from klaud-cold/dsv4-fp8-h200-vllm-v0.22.0
