Make the orchestrator's in-flight work durable across a crash by mirroring it to two GitHub labels — agent:in-progress (a sandbox is implementing the issue) and agent:in-review (the AI reviewer is running on the open PR). Design + rationale: ADR-0021 (docs/adr/0021-durable-agent-lifecycle-labels.md, PR #78).
Problem
Today an issue's lifecycle has four phases, but only the endpoints leave a durable artifact: ready-for-agent (workable) and a closed issue (done). The two middle phases (sandbox implementing, and reviewer running on the open PR) live only in the orchestrator's in-memory State.inFlight. If the orchestrator is killed mid-flight, the issue has already been claimed (ready-for-agent removed) and carries no label, so it is silently abandoned. #76 re-queues on a SandboxFailed event, but process death is not an event, so the crash gap is real and uncovered.
Goal (end-to-end value)
A crashed orchestrator's successor reconciles abandoned work from GitHub labels at startup: a half-implemented issue is re-queued, an in-review PR is re-reviewed — no human babysitting. State.inFlight stays the runtime source of truth; the labels are a durable mirror written at each transition and read only at boot.
Acceptance criteria
Labels exist: create agent:in-progress and agent:in-review (gh label create, distinct colors). ready-for-agent is kept as-is (no rename — it's the READY_LABEL constant referenced across reduce.ts, CLAUDE.md, to-issues, and open issues).
Transitions live in the pure reducer (reduce.ts), emitting label actions the orchestrator carries out (extend the existing Relabel action or add SetLabel/RemoveLabel actions as needed; keep the reducer pure):
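Claim (Tick→StartSandbox): ready-for-agent → agent:in-progress.
PR opened (SandboxFinished): agent:in-progress → agent:in-review.
Verdict (ReviewFinished): remove agent:in-review (downstream, the PR + CI status are the artifact; pass→EnableAutoMerge, changes-requested/fail-safe→WaitForHuman, unchanged from ADR-0020).
SandboxFailed: agent:in-progress → ready-for-agent (retry) or unlabelled + exhaustion comment (cap reached). The existing retry logic is unchanged; only the label it removes/adds is adjusted.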
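At boot, agent:in-progress found, no open PR → re-queue to ready-for-agent (a normal tick re-claims and starts fresh; resetAgentBranch (#23) already wipes the stale agent/issue-N branch).
At boot, agent:in-review found, PR already open → re-run the read-only review on the existing PR (idempotent: the reviewer reads the PR diff and only comments). Do not re-queue (that spawns a duplicate sandbox + second PR) and do not leave it stuck.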
Labels are NOT the hot-path source of truth: State.inFlight stays authoritative at runtime; the labels are read from GitHub only at boot (no per-tick GitHub-API label reads).
Happy path unchanged: a normal afk/hitl run with no crash behaves exactly as today, just with the two extra labels appearing/disappearing as it progresses.
Doc: mark ADR-0021 Accepted and add the two labels + ready-for-agent to the CLAUDE.md label/lifecycle notes.
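The label transitions in the criteria above can be sketched as a pure mapping from a lifecycle event to label actions. This is a minimal illustration only: the type names (Label, LifecycleEvent, LabelAction), the Claimed event standing in for Tick→StartSandbox, the retriesLeft field, and the choice to add the new label before removing the old one are all assumptions, not the actual shapes in reduce.ts.

```typescript
// Hypothetical shapes for illustration; the real types in reduce.ts may differ.
type Label = "ready-for-agent" | "agent:in-progress" | "agent:in-review";

type LifecycleEvent =
  | { type: "Claimed"; issue: number } // stands in for Tick -> StartSandbox
  | { type: "SandboxFinished"; issue: number }
  | { type: "ReviewFinished"; issue: number }
  | { type: "SandboxFailed"; issue: number; retriesLeft: number };

type LabelAction =
  | { type: "SetLabel"; issue: number; label: Label }
  | { type: "RemoveLabel"; issue: number; label: Label };

// Pure: no GitHub calls here, only a description of what the orchestrator should do.
function labelActions(event: LifecycleEvent): LabelAction[] {
  const { issue } = event;
  const set = (label: Label): LabelAction => ({ type: "SetLabel", issue, label });
  const remove = (label: Label): LabelAction => ({ type: "RemoveLabel", issue, label });

  switch (event.type) {
    case "Claimed":
      return [set("agent:in-progress"), remove("ready-for-agent")];
    case "SandboxFinished":
      return [set("agent:in-review"), remove("agent:in-progress")];
    case "ReviewFinished":
      return [remove("agent:in-review")];
    case "SandboxFailed":
      // Assumption: retriesLeft > 0 means the retry cap has not been reached.
      return event.retriesLeft > 0
        ? [set("ready-for-agent"), remove("agent:in-progress")]
        : [remove("agent:in-progress")]; // exhaustion comment is emitted elsewhere
  }
}
```

Adding before removing is one possible ordering: if the process dies between the two GitHub calls, the issue still carries at least one lifecycle label. Whether the real reducer orders the actions this way is not stated in the source.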
Non-goals (state, do not build — per ADR-0021)
Atomic double-claim prevention / a hard cross-process lock. Skipping agent:in-progress/agent:in-review issues is a best-effort defensive guardrail only; GitHub labels have no compare-and-swap and there is no concurrent-orchestrator scenario today. Single-writer-per-project holds. Do not build locking on labels.
Renaming ready-for-agent. Full-pipeline labels (agent:ready/…merging). The open PR + CI already are the artifact for post-review phases.
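For contrast with a real lock, the best-effort guardrail mentioned above could be as small as a pure filter over an issue's labels. This is a sketch under assumptions: isClaimable, IssueLabels, and AGENT_OWNED_LABELS are hypothetical names, and only READY_LABEL and the three label strings come from this issue.

```typescript
// READY_LABEL mirrors the constant named in this issue; the other names are hypothetical.
const READY_LABEL = "ready-for-agent";
const AGENT_OWNED_LABELS = ["agent:in-progress", "agent:in-review"];

type IssueLabels = { issue: number; labels: string[] };

// Best-effort only: this is a read-then-decide check, not a compare-and-swap,
// so it cannot prevent two concurrent writers from claiming the same issue.
function isClaimable(candidate: IssueLabels): boolean {
  const ready = candidate.labels.includes(READY_LABEL);
  const owned = candidate.labels.some((label) => AGENT_OWNED_LABELS.includes(label));
  return ready && !owned;
}
```

An issue that still carries ready-for-agent alongside an agent: label (for example after an interrupted relabel) is skipped by this filter rather than claimed a second time.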
Testing (highest, single seam — logic in TypeScript)
Reuse the reducer seam — reduce.test.ts (node:test + node:assert/strict):
Each transition emits the right label action: claim → set agent:in-progress; SandboxFinished → swap to agent:in-review; ReviewFinished → remove agent:in-review; SandboxFailed → back to ready-for-agent/exhausted (existing #76 cases, "Re-queue failed sandboxes: handle SandboxFailed with bounded retry", still pass).
Reconcile (pure): agent:in-progress (no PR) → re-queue action; agent:in-review (PR open) → re-review action, not re-queue; a clean snapshot → no actions. Mirror parseBlockedBy / reviewVerdict pure-mapper prior art.
Live execution stays behind the gated integration.test.ts (SANDCASTLE_INTEGRATION=1); default npm test and the sandcastle CI job are unaffected. Do not add a new seam.
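The pure reconcile mapping described in the testing notes could look roughly like the following. It is a sketch, not the actual implementation: the snapshot shape (LabelledIssue with an openPr field), the action names Requeue and Rereview, and the function name reconcile are assumptions made for illustration.

```typescript
// Hypothetical snapshot shape: what a single boot-time GitHub read might yield per issue.
type LabelledIssue = {
  issue: number;
  labels: string[];
  openPr: number | null; // number of the open PR for this issue, if any
};

type ReconcileAction =
  | { type: "Requeue"; issue: number }
  | { type: "Rereview"; issue: number; pr: number };

// Pure: takes a snapshot, returns actions; the caller performs the GitHub writes.
function reconcile(snapshot: LabelledIssue[]): ReconcileAction[] {
  const actions: ReconcileAction[] = [];
  for (const { issue, labels, openPr } of snapshot) {
    if (labels.includes("agent:in-review") && openPr !== null) {
      // The review is read-only and idempotent, so re-running it is safe.
      actions.push({ type: "Rereview", issue, pr: openPr });
    } else if (labels.includes("agent:in-progress") && openPr === null) {
      // Half-implemented work: hand the issue back to the normal tick.
      actions.push({ type: "Requeue", issue });
    }
    // Other combinations are not specified in this issue and are left untouched here.
  }
  return actions;
}
```

Because the function takes plain data and returns plain data, the three cases listed above (in-progress without a PR, in-review with an open PR, clean snapshot) can be asserted in reduce.test.ts with deep-equality checks and no network access.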
Unchanged (do not touch)
The review gate verdict mapping (#74/ADR-0020), SandboxFailed retry counting (#76), merge mechanics, dependency unblocking (#2). This only adds the durable label mirror + boot reconcile.
Constraints (per repo workflow)
Implement with /tdd per criterion, run shell via /exec, scope to reduce.ts + reduce.test.ts + the main.ts boot-reconcile wiring + the label setup + ADR/CLAUDE.md, no new dependencies, no pushing to main.