perf(d-4 bootstrap): workflow_dispatch automation for baseline collection (standards#99) #26
Draft
hyperpolymath wants to merge 1 commit into
Phase D-4 of the single-lane HCG tier-2 channel (standards#91 / #99)
is "real baseline numbers populated in bench/baseline.json and the
perf-regression gate armed by flipping _status to active". The
rebaseline ritual in docs/perf-contract.md § Baseline lifecycle
defines step 2 as `just bench-collect` on a CI-equivalent target,
and the same doc names `ubuntu-latest` GHA runners as the published
reference. The ritual was nevertheless authored as a manual local
step that requires the operator to have an Elixir 1.19 / OTP 28
toolchain on their machine. That gap blocked D-4 from being
executable by anyone without a matching local dev env.
This PR adds the missing on-the-published-target automation. It does
NOT collect numbers or flip _status itself; those are still the
maintainer's deliberate acts. It provides the means.
What landed
───────────
* `.github/workflows/perf-rebaseline.yml` — manual workflow_dispatch
workflow that runs `mix run bench/gateway_latency.exs` on
ubuntu-latest (same target perf-regression.yml uses, so numbers
are comparable), pipes results through bench/rebaseline.exs, and
opens a `perf: rebaseline (standards#99)` PR with the regenerated
bench/baseline.json. Uses the same SHA-pinned actions and the
same `runner.os-perf-${hashFiles(mix.lock)}` cache key as
perf-regression.yml so the first rebaseline run primes off the
warm cache. Deliberately NO concurrency cancel-in-progress: an
operator-initiated rebaseline should run to completion, and a
newer dispatch does not make an in-flight run obsolete the way a
new push does for per-PR runs.
* `bench/rebaseline.exs` — reads bench/results.json + bench/baseline.json,
writes a new bench/baseline.json with real p50/p95/p99/ips per
scenario. Preserves _comment, _schema_version, tolerance, and
per-scenario _comment_* fields. Leaves _status as
"scaffold-placeholder" — flipping to "active" is the maintainer's
separate review step. Field order is preserved end-to-end via
Jason.OrderedObject (decode with `objects: :ordered_objects`,
encode back; Jason 1.4+ already in mix.lock) so the rebaseline
diff is review-grade (numbers move; structure does not).
* `Justfile` — adds `just rebaseline` (runs harness + regeneration
script) so the same two-step sequence the workflow runs is also
available locally for operators previewing a rebaseline. The
`just bench-collect` message is updated to point at `just rebaseline`.
* `docs/perf-contract.md` — splits § Baseline lifecycle into
"Automated (preferred — D-4 bootstrap)" and "Manual (for local
previews or operators without GHA access)". Both paths leave
_status as scaffold-placeholder; flipping to active is called out
as a separate deliberate decision in either path.
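The field-preservation behaviour described for `bench/rebaseline.exs` can be sketched roughly as below. This is a minimal illustration, not the script from this PR: the JSON shapes (a top-level `scenarios` object keyed by scenario name, and a results map keyed the same way), the sample numbers, and the `RebaselineSketch` module are all assumptions made for the example. Only `Jason.decode!/2` with `objects: :ordered_objects`, `Jason.OrderedObject`, and `Jason.encode!/2` are taken from Jason itself.

```elixir
# Hypothetical sketch; the real bench/rebaseline.exs may differ.
Mix.install([{:jason, "~> 1.4"}])

defmodule RebaselineSketch do
  # Metric keys named in the PR text; every other key is left untouched.
  @metrics ~w(p50 p95 p99 ips)

  # baseline: %Jason.OrderedObject{} decoded with objects: :ordered_objects
  # results:  plain map of scenario name => measured metrics (assumed shape)
  def merge(%Jason.OrderedObject{values: top} = baseline, results) do
    updated =
      Enum.map(top, fn
        {"scenarios", %Jason.OrderedObject{values: scenarios} = obj} ->
          {"scenarios", %{obj | values: Enum.map(scenarios, &merge_scenario(&1, results))}}

        other ->
          # _comment, _schema_version, _status, tolerance pass through as-is.
          other
      end)

    %{baseline | values: updated}
  end

  defp merge_scenario({name, %Jason.OrderedObject{values: fields} = obj}, results) do
    measured = Map.get(results, name, %{})

    new_fields =
      Enum.map(fields, fn {key, old} ->
        # Overwrite only metric keys that were measured; position is kept.
        if key in @metrics, do: {key, Map.get(measured, key, old)}, else: {key, old}
      end)

    {name, %{obj | values: new_fields}}
  end
end

# Made-up inputs for illustration only.
baseline_json = """
{
  "_comment": "placeholder numbers",
  "_status": "scaffold-placeholder",
  "scenarios": {
    "example_scenario": {"_comment_p50": "median", "p50": 0, "p95": 0, "p99": 0, "ips": 0}
  }
}
"""

results_json = """
{"example_scenario": {"p50": 120, "p95": 180, "p99": 240, "ips": 8300}}
"""

merged =
  baseline_json
  |> Jason.decode!(objects: :ordered_objects)
  |> RebaselineSketch.merge(Jason.decode!(results_json))

IO.puts(Jason.encode!(merged, pretty: true))
```

Because the sketch maps over the ordered key/value pairs instead of rebuilding a map, keys come back out in the order they went in, so a diff of the regenerated file would show only changed numbers. `_status` is never touched here, which matches the PR's stance that flipping it is a separate step.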
What is deliberately NOT in this PR
────────────────────────────────────
* The real baseline numbers themselves — those land in the generated
`perf: rebaseline (standards#99)` PR after the maintainer dispatches
the workflow.
* Flipping bench/baseline.json `_status` to "active" — maintainer
judgement on noise/spread; may land in the same generated PR or in
a follow-up.
* Tightening tolerance ratios — also a maintainer judgement, post-D-4.
* Schema-drift hardening in bench/compare.exs (a scenario present in
results but absent from the baseline silently passes in active
mode): separate defensive D-3 follow-up, not coupled to D-4
collection.
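The schema-drift gap named in the last item can be illustrated with a short sketch. It is not code from `bench/compare.exs`: the `DriftSketch` module, the plain-map input shapes, and the scenario names are assumptions for the example, and the eventual hardening may take a different form.

```elixir
# Hypothetical sketch; not the logic in bench/compare.exs.
defmodule DriftSketch do
  # Returns scenario names that appear on one side but not the other.
  def drift(baseline_scenarios, result_scenarios) do
    base = MapSet.new(Map.keys(baseline_scenarios))
    res = MapSet.new(Map.keys(result_scenarios))

    %{
      missing_from_baseline: MapSet.difference(res, base) |> Enum.sort(),
      missing_from_results: MapSet.difference(base, res) |> Enum.sort()
    }
  end
end

# Made-up inputs: scenario_b was benchmarked but has no baseline entry.
baseline = %{"scenario_a" => %{"p50" => 100}}
results = %{"scenario_a" => %{"p50" => 101}, "scenario_b" => %{"p50" => 50}}

report = DriftSketch.drift(baseline, results)
IO.inspect(report)
```

A comparison that only walks the baseline's scenarios would never look at `scenario_b`, which is one way a silent pass of this kind could arise. A hardened gate could fail when either list in the report is non-empty.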
Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99
(NOT Closes #99: joint-close is owner-only, and D-4 baseline collection
+ the _status flip remain pending under #99 even after this lands. This PR
makes D-4 executable; the generated rebaseline PR + the flip-to-active
PR jointly close it.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Phase D-4 of the single-lane HCG tier-2 channel (standards#91/#99) is "real baseline numbers populated in `bench/baseline.json` and the perf-regression gate armed by flipping `_status` to `active`". The rebaseline ritual in `docs/perf-contract.md` § Baseline lifecycle defines step 2 as `just bench-collect` on a CI-equivalent target, but the published reference per the same doc is `ubuntu-latest` GHA runners, and the ritual was authored as a manual local step that requires the operator to have an Elixir 1.19 / OTP 28 toolchain on their machine. That gap blocked D-4 from being executable by anyone without a matching local dev env.
This PR adds the missing on-the-published-target automation. It does not collect numbers or flip `_status` itself; those remain the maintainer's deliberate acts. It provides the means.
Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99
What landed
* `.github/workflows/perf-rebaseline.yml`: `workflow_dispatch`-only workflow that runs `mix run bench/gateway_latency.exs` on `ubuntu-latest` (same target `perf-regression.yml` uses, so numbers are comparable), pipes results through `bench/rebaseline.exs`, and opens a `perf: rebaseline (standards#99)` PR with the regenerated `bench/baseline.json`. SHA-pinned actions match `perf-regression.yml`; reuses the same `runner.os-perf-${hashFiles(mix.lock)}` cache key so the first rebaseline primes off the warm cache. Deliberately NO `concurrency: cancel-in-progress`: an operator-initiated rebaseline should complete on its own (this workflow has no obsolescence relationship the way per-PR runs do).
* `bench/rebaseline.exs`: reads `bench/results.json` + `bench/baseline.json`, writes a new `bench/baseline.json` with real `p50`/`p95`/`p99`/`ips` per scenario. Preserves `_comment`, `_schema_version`, `tolerance`, and per-scenario `_comment_*` fields. Leaves `_status` as `scaffold-placeholder`; flipping to `active` is the maintainer's separate review step. Field order is preserved end-to-end via `Jason.OrderedObject` (decode with `objects: :ordered_objects`, encode back; Jason 1.4+ is already in `mix.lock`) so the rebaseline diff is review-grade (numbers move; structure does not).
* `Justfile`: adds `just rebaseline` (runs harness + regeneration script) so the same two-step sequence the workflow runs is also available locally for operators previewing a rebaseline. The `just bench-collect` message is updated to point at `just rebaseline`.
* `docs/perf-contract.md`: splits § Baseline lifecycle into "Automated (preferred — D-4 bootstrap)" and "Manual (for local previews or operators without GHA access)". Both paths leave `_status` as `scaffold-placeholder`; flipping to `active` is called out as a separate deliberate decision in either path.
What is deliberately NOT in this PR
* The real baseline numbers themselves: those land in the generated `perf: rebaseline (standards#99)` PR after the maintainer dispatches the workflow.
* Flipping `bench/baseline.json` `_status` to `active`: maintainer judgement on noise/spread; may land in the same generated PR or in a follow-up.
* Tightening tolerance ratios: also a maintainer judgement, post-D-4.
* Schema-drift hardening in `bench/compare.exs` (new scenario in results but not baseline silently passes in `active` mode): separate defensive D-3 follow-up, not coupled to D-4 collection.
Test plan
* … `perf-regression.yml` will also fire; it stays in scaffold mode since `bench/baseline.json _status` is unchanged.
* … `Perf Rebaseline` workflow does NOT auto-fire (`workflow_dispatch` only) on this PR's push.
* … `Perf Rebaseline` workflow from the Actions tab → it should run the harness, regenerate `bench/baseline.json`, push a `perf/rebaseline-<run-id>` branch, and open a `perf: rebaseline (standards#99)` PR.
* … `bench/results.json` + `bench/console.log` are attached as `perf-rebaseline-results` artefact for 30 days.
* … `_status` → `active` in the same PR (single-PR D-4 + D-3 close) or merge as-is and follow up with a `_status` flip PR.
Downstream unblock
The boj-server rollout-prerequisite checklist in `docs/integration/hcg-tier2-rollout-runbook.md` § 1.1 lists D-3 (gate armed) and D-4 (numbers populated) as the remaining open items gating Phase E rollout. After this PR lands, the path to ticking both boxes is one workflow dispatch + one (or two) maintainer review/merge events, no local Elixir/OTP toolchain required.
Owner merges; not for admin-merge.
🤖 Generated with Claude Code