Durable agent lifecycle labels (agent:in-progress / agent:in-review) for crash recovery #79

Description

@lsfera

Make the orchestrator's in-flight work durable across a crash by mirroring it to two GitHub labels — agent:in-progress (a sandbox is implementing the issue) and agent:in-review (the AI reviewer is running on the open PR). Design + rationale: ADR-0021 (docs/adr/0021-durable-agent-lifecycle-labels.md, PR #78).

Problem

Today an issue's lifecycle has four phases but only the endpoints leave a durable artifact: ready-for-agent (workable) and a closed issue (done). The two middle phases — sandbox implementing and reviewer running on the open PR — live only in the orchestrator's in-memory State.inFlight. If the orchestrator is killed mid-flight, that issue is claimed (label removed) but unlabelled, and is silently abandoned. #76 re-queues on a SandboxFailed event, but process death is not an event, so the crash gap is real and uncovered.

Goal (end-to-end value)

A crashed orchestrator's successor reconciles abandoned work from GitHub labels at startup: a half-implemented issue is re-queued, an in-review PR is re-reviewed — no human babysitting. State.inFlight stays the runtime source of truth; the labels are a durable mirror written at each transition and read only at boot.

Acceptance criteria

  • Labels exist: create agent:in-progress and agent:in-review (gh label create, distinct colors). ready-for-agent is kept as-is (no rename — it's the READY_LABEL constant referenced across reduce.ts, CLAUDE.md, to-issues, and open issues).
  • Transitions live in the pure reducer (reduce.ts), which emits label actions for the orchestrator to carry out (extend the existing Relabel action, or add SetLabel/RemoveLabel actions, as needed; keep the reducer pure):
    • Claim (TickStartSandbox): ready-for-agent → agent:in-progress.
    • PR opened (SandboxFinished): agent:in-progress → agent:in-review.
    • Verdict (ReviewFinished): remove agent:in-review (downstream the PR + CI status are the artifact; pass→EnableAutoMerge, changes-requested/fail-safe→WaitForHuman, unchanged from ADR-0020).
    • SandboxFailed (#76, "Re-queue failed sandboxes: handle SandboxFailed with bounded retry"): agent:in-progress → ready-for-agent (retry), or unlabelled + exhaustion comment (cap reached). The existing retry logic is unchanged; only the label it removes/adds is adjusted.
  • Startup reconcile (a pure function over the boot-time label snapshot → events/actions, so it is unit-testable):
    • agent:in-progress found, no open PR → re-queue to ready-for-agent (the normal tick re-claims and starts fresh; resetAgentBranch, from #23 "Orchestrator reuses stale agent/issue-N branches, causing merge conflicts", already wipes the stale agent/issue-N branch).
    • agent:in-review found, PR already open → re-run the read-only review on the existing PR (idempotent — the reviewer reads the PR diff and only comments). Do not re-queue (that spawns a duplicate sandbox + second PR) and do not leave it stuck.
  • Labels are NOT the hot-path source of truth: State.inFlight stays authoritative at runtime; the labels are read from GitHub only at boot (no per-tick GitHub-API label reads).
  • Happy path unchanged: a normal afk/hitl run with no crash behaves exactly as today, just with the two extra labels appearing/disappearing as it progresses.
  • Doc: mark ADR-0021 Accepted and add the two labels + ready-for-agent to the CLAUDE.md label/lifecycle notes.
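
The transition criteria above could be sketched as a pure event-to-label-action mapper. This is only an illustration under stated assumptions: the event names come from this issue, but the event payloads, the LabelAction shape, the retriesLeft field, and the labelActionsFor helper are hypothetical and may not match the real reduce.ts.

```typescript
// Hypothetical shapes: the real reduce.ts events and actions may differ.
const READY_LABEL = "ready-for-agent";
const IN_PROGRESS_LABEL = "agent:in-progress";
const IN_REVIEW_LABEL = "agent:in-review";

type LifecycleEvent =
  | { type: "TickStartSandbox"; issue: number }
  | { type: "SandboxFinished"; issue: number }
  | { type: "ReviewFinished"; issue: number }
  | { type: "SandboxFailed"; issue: number; retriesLeft: number }; // retriesLeft is assumed

type LabelAction =
  | { type: "SetLabel"; issue: number; label: string }
  | { type: "RemoveLabel"; issue: number; label: string };

// Pure: no I/O here. The orchestrator carries out the returned actions.
function labelActionsFor(event: LifecycleEvent): LabelAction[] {
  const { issue } = event;
  switch (event.type) {
    case "TickStartSandbox": // claim
      return [
        { type: "RemoveLabel", issue, label: READY_LABEL },
        { type: "SetLabel", issue, label: IN_PROGRESS_LABEL },
      ];
    case "SandboxFinished": // PR opened
      return [
        { type: "RemoveLabel", issue, label: IN_PROGRESS_LABEL },
        { type: "SetLabel", issue, label: IN_REVIEW_LABEL },
      ];
    case "ReviewFinished": // verdict reached; the PR + CI status are the artifact
      return [{ type: "RemoveLabel", issue, label: IN_REVIEW_LABEL }];
    case "SandboxFailed":
      // Retry: back to ready-for-agent. Cap reached: leave the issue unlabelled
      // (the exhaustion comment would be a separate action, omitted here).
      return event.retriesLeft > 0
        ? [
            { type: "RemoveLabel", issue, label: IN_PROGRESS_LABEL },
            { type: "SetLabel", issue, label: READY_LABEL },
          ]
        : [{ type: "RemoveLabel", issue, label: IN_PROGRESS_LABEL }];
  }
}
```

Returning plain action values keeps the mapping assertable in reduce.test.ts without touching GitHub; whether these actions are folded into the existing Relabel action is left open, as the criterion allows either.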

Non-goals (state, do not build — per ADR-0021)

  • Atomic double-claim prevention / a hard cross-process lock. Skipping agent:in-progress/agent:in-review issues is a best-effort defensive guardrail only; GitHub labels have no compare-and-swap and there is no concurrent-orchestrator scenario today. Single-writer-per-project holds. Do not build locking on labels.
  • Renaming ready-for-agent. Full-pipeline labels (agent:ready/…merging). The open PR + CI already are the artifact for post-review phases.
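
For contrast with a real lock, the best-effort guardrail mentioned above might amount to nothing more than a filter over whatever issue list the orchestrator already holds. The IssueSummary type and the claimable helper are hypothetical names used for illustration only; this sketch gives no atomicity and is not a substitute for single-writer-per-project.

```typescript
// Hypothetical shape for an issue as the orchestrator already sees it.
type IssueSummary = { number: number; labels: string[] };

const AGENT_OWNED_LABELS = new Set(["agent:in-progress", "agent:in-review"]);

// Best-effort only: skips issues that look already claimed.
// There is no compare-and-swap, so this cannot prevent a double claim.
function claimable(issues: IssueSummary[]): IssueSummary[] {
  return issues.filter(
    (issue) =>
      issue.labels.includes("ready-for-agent") &&
      !issue.labels.some((label) => AGENT_OWNED_LABELS.has(label)),
  );
}
```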

Testing (highest, single seam — logic in TypeScript)

Reuse the reducer seam — reduce.test.ts (node:test + node:assert/strict):

  • Each transition emits the right label action: claim → set agent:in-progress; SandboxFinished → swap to agent:in-review; ReviewFinished → remove agent:in-review; SandboxFailed → back to ready-for-agent/exhausted (existing #76 cases still pass).
  • Reconcile (pure): agent:in-progress (no PR) → re-queue action; agent:in-review (PR open) → re-review action, not re-queue; a clean snapshot → no actions. Mirror parseBlockedBy / reviewVerdict pure-mapper prior art.
    Live execution stays behind the gated integration.test.ts (SANDCASTLE_INTEGRATION=1); default npm test and the sandcastle CI job are unaffected. Do not add a new seam.
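
The reconcile cases listed above could take the shape of a pure mapper like the following. The SnapshotEntry and ReconcileAction types and the reconcile function name are assumptions for illustration; the real boot-time snapshot format is not specified in this issue. Label/PR combinations the issue does not cover (for example agent:in-review with no open PR) are deliberately left untouched here.

```typescript
// Hypothetical snapshot entry: one labelled issue plus its open PR number, if any.
type SnapshotEntry = { issue: number; labels: string[]; openPr: number | null };

type ReconcileAction =
  | { type: "Requeue"; issue: number } // back to ready-for-agent
  | { type: "ReReview"; issue: number; pr: number }; // re-run the read-only review

// Pure: boot-time label snapshot in, reconcile actions out.
function reconcile(snapshot: SnapshotEntry[]): ReconcileAction[] {
  const actions: ReconcileAction[] = [];
  for (const { issue, labels, openPr } of snapshot) {
    if (labels.includes("agent:in-review") && openPr !== null) {
      // Re-review the existing PR; never re-queue (that would spawn a second PR).
      actions.push({ type: "ReReview", issue, pr: openPr });
    } else if (labels.includes("agent:in-progress") && openPr === null) {
      // Half-implemented issue: hand it back to the normal tick.
      actions.push({ type: "Requeue", issue });
    }
    // Any other combination is not specified by the issue and is skipped.
  }
  return actions;
}
```

In reduce.test.ts, each case then reduces to an assert.deepEqual on the returned array, with a clean snapshot expected to yield an empty array.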

Unchanged (do not touch)

The review gate verdict mapping (#74/ADR-0020), SandboxFailed retry counting (#76), merge mechanics, dependency unblocking (#2). This only adds the durable label mirror + boot reconcile.

Constraints (per repo workflow)

Implement with /tdd per criterion and run shell via /exec. Scope: reduce.ts, reduce.test.ts, the main.ts boot-reconcile wiring, the label setup, and ADR/CLAUDE.md. No new dependencies; no pushing to main.
