[Klaud Cold] Update dsv4-fp8-h200-vllm (+mtp) vLLM image to v0.22.0 #1597

Open
functionstackx wants to merge 1 commit into main from klaud-cold/dsv4-fp8-h200-vllm-v0.22.0

@functionstackx functionstackx commented May 30, 2026

Summary

Update vLLM image from v0.21.0 to v0.22.0

Recipes touched: dsv4-fp8-h200-vllm, dsv4-fp8-h200-vllm-mtp

Test plan

  • full-sweep-enabled sweep passes.

🤖 Generated with Claude Code


Note

Low Risk
Config-only container version bump for two benchmark recipes; no serve flags, auth, or application logic changes.

Overview
Bumps the vLLM OpenAI container for the DeepSeek-V4-Pro H200 FP8 fixed-seq-len recipes dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp from v0.21.0 to v0.22.0 in nvidia-master.yaml. Benchmark scenarios, TP/EP search spaces, and MTP settings are unchanged.

Documents the image bump in perf-changelog.yaml for both config keys.

Reviewed by Cursor Bugbot for commit e8b0d15. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@functionstackx functionstackx force-pushed the klaud-cold/dsv4-fp8-h200-vllm-v0.22.0 branch from 19b2c60 to e8b0d15 Compare May 30, 2026 22:09
@github-actions

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@claude claude Bot left a comment

LGTM — straightforward image bump for two recipes with matching changelog entry.

Extended reasoning...

Overview

This PR is a config-only bump of the vLLM image from v0.21.0 to v0.22.0 for the dsv4-fp8-h200-vllm and dsv4-fp8-h200-vllm-mtp benchmark recipes in .github/configs/nvidia-master.yaml, plus a matching entry appended to perf-changelog.yaml. No serve flags, scenarios, TP/EP search spaces, or MTP settings are touched.

Security risks

None. The change only updates a public image tag string in YAML; no auth, crypto, permissions, or executable code paths are affected.

Level of scrutiny

Low. This is a mechanical version bump in benchmark recipe config, the kind of change the full-sweep-enabled label is designed to validate empirically. The blast radius is contained to the two recipes' sweep runs; any regression would surface in the sweep, not in production-critical code.

Other factors

Only one finding from the bug hunter, and it's a pre-existing nit about stale rationale comments above both entries (the comments still reference cu129 and v0.20.1 tags that were superseded in an earlier bump). The staleness predates this PR and has no runtime impact — worth fixing in a follow-up but not blocking. The changelog entry is properly formatted and links the PR.

Comment on lines 2884 to +2887
  # Uses the cu129 image. H200 has no FP4 path, so the FP4 indexer cache
  # flag is omitted. Max-model-len is pinned at 800k per the recipe.
  dsv4-fp8-h200-vllm:
-   image: vllm/vllm-openai:v0.21.0
+   image: vllm/vllm-openai:v0.22.0

🟡 The rationale comments above both dsv4-fp8-h200-vllm (lines 2883–2885) and dsv4-fp8-h200-vllm-mtp (lines 2907–2909) are stale and now contradict the image lines this PR is editing. The non-MTP block still says "Uses the cu129 image" but the entry pins the stock vllm/vllm-openai:v0.22.0 tag, and the MTP block claims it uses "the canonical v0.20.1 image" while the non-MTP entry above is "still on the deepseekv4-cu129 tag" — both factually wrong post-PR (MTP is v0.22.0, non-MTP is v0.22.0). Pre-existing staleness (the prior v0.21.0 bump already failed to refresh these), but since this PR edits both image lines, it would be a natural place to refresh the adjacent rationale. Severity: nit.

Extended reasoning...

What the bug is

This PR bumps both dsv4-fp8-h200-vllm (line 2887) and dsv4-fp8-h200-vllm-mtp (line 2911) from vllm/vllm-openai:v0.21.0 to vllm/vllm-openai:v0.22.0. Both entries have explanatory comment blocks immediately above them whose factual claims no longer match the image they document.

Comment #1 — above dsv4-fp8-h200-vllm (lines 2883–2885):

# Uses the cu129 image. H200 has no FP4 path, so the FP4 indexer cache
# flag is omitted. Max-model-len is pinned at 800k per the recipe.
dsv4-fp8-h200-vllm:
  image: vllm/vllm-openai:v0.22.0   # <-- not a cu129-tagged image

The comment says "Uses the cu129 image" — historically this referred to a custom deepseekv4-cu129 build. The pinned tag is now the stock public semver v0.22.0, so the rationale-for-image-choice no longer applies.

Comment #2 — above dsv4-fp8-h200-vllm-mtp (lines 2907–2909):

# MTP variant of dsv4-fp8-h200-vllm. Uses the canonical v0.20.1 image
# (the non-MTP entry above is still on the deepseekv4-cu129 tag) and adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
dsv4-fp8-h200-vllm-mtp:
  image: vllm/vllm-openai:v0.22.0

Both factual claims are wrong post-PR: (1) the MTP image is v0.22.0, not v0.20.1; (2) the non-MTP entry directly above is also v0.22.0, not deepseekv4-cu129.

Step-by-step proof

  1. Open .github/configs/nvidia-master.yaml after this PR is applied.
  2. Read lines 2883–2887. The comment claims "Uses the cu129 image"; the next non-comment line is image: vllm/vllm-openai:v0.22.0 — a stock public tag, not a cu129 build. Contradiction confirmed.
  3. Read lines 2907–2911. The comment claims "Uses the canonical v0.20.1 image", but line 2911 reads image: vllm/vllm-openai:v0.22.0. Contradiction #1 confirmed.
  4. The same comment claims "the non-MTP entry above is still on the deepseekv4-cu129 tag", but line 2887 (the non-MTP entry) reads image: vllm/vllm-openai:v0.22.0. Contradiction #2 confirmed.
  5. git log -p on this file shows PR #1461 bumped these entries from the original tags to v0.21.0 without updating the comments; the staleness is therefore pre-existing. This PR widens the gap (v0.21.0 → v0.22.0) but is editing the exact image lines the comments describe, so it is the natural place to refresh the rationale in the same change.

Impact

No runtime impact — these are YAML comments only. The hazard is reviewer/maintainer confusion: someone reading the file to understand why a specific image was chosen will get a misleading answer, and the MTP block in particular invites a "wait, the non-MTP entry must still be on cu129" mistake during the next bump.

How to fix

Either (a) drop the now-obsolete provenance sentences entirely and keep only the still-accurate parts (no FP4 path on H200, max-model-len 800k, MTP adds --speculative-config), or (b) replace the stale lines with neutral rationale that doesn't pin to a specific tag — e.g. for the MTP block, just say "MTP variant of dsv4-fp8-h200-vllm; adds --speculative-config ...." Avoid embedding image tags or cross-references between entries in prose, since those go stale on every bump.
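As a rough sketch of option (a), the two comment blocks could be trimmed to the claims that remain true after the bump. The comment wording here is only illustrative, and the keys below each image line are elided because they are not part of this change.

```yaml
# H200 has no FP4 path, so the FP4 indexer cache flag is omitted.
# Max-model-len is pinned at 800k per the recipe.
dsv4-fp8-h200-vllm:
  image: vllm/vllm-openai:v0.22.0
  # ... remaining keys unchanged ...

# MTP variant of dsv4-fp8-h200-vllm; adds
# --speculative-config '{"method":"mtp","num_speculative_tokens":2}'.
dsv4-fp8-h200-vllm-mtp:
  image: vllm/vllm-openai:v0.22.0
  # ... remaining keys unchanged ...
```

Neither comment names an image tag or refers to the other entry, so a future image bump would only need to touch the two image lines.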

Severity

Nit / pre-existing. The staleness was introduced before this PR (the v0.21.0 bump in #1461 already mismatched the cu129/v0.20.1 prose), and the runtime behavior is unaffected. Flagging because the PR is directly editing both image lines and is in the natural position to refresh the adjacent rationale in the same change.
