Skip to content

Reviewer produce pass re-runs up to 8× — make it single-pass #80

Description

@lsfera

The AI reviewer's produce pass re-runs the entire review up to 8 times per PR because it has no completion signal. A review is a single-pass job — read the diff, check acceptance criteria, emit a verdict — so the redundant iterations waste tokens/time and, by lengthening each review, widen the window to hit the transient claude-code exited with code 1 that already parked PR #77 (#76's review) via the fail-safe. This makes the gate (ADR-0020) fragile in practice.

Evidence

A live run of ReviewerAdapter.review() against an open PR produced this log:

Iteration 1/8 … Agent started … [complete review + verdict] … Agent stopped
Iteration 2/8 … Agent started … [re-reviews the whole diff again]

reviewer-adapter.ts pass-1 calls run({ …, maxIterations: 8 }) with no completionSignal, so sandcastle keeps re-invoking the agent toward a completion it never declares. The "final" verdict becomes whatever the last iteration emitted, not the first complete one.

Goal (end-to-end value)

The reviewer's produce pass runs exactly one complete review, then hands off to the existing extraction pass. Reviews are ~8× faster/cheaper and far less likely to hit a mid-run transient, while the verdict and fail-safe behavior are unchanged.

Acceptance criteria

  • Single-pass produce: reviewer-adapter.ts pass-1 stops after one complete review — either maxIterations: 1, or a completionSignal the prompt emits (reference run-with-extraction pattern). Prefer whichever keeps the adapter simplest; document the choice in a code comment.
  • Pass-2 extraction unchanged: the resume→Output.object extraction (and the !reviewResult.resume fallback to a fresh run) is untouched.
  • Fail-safe unchanged: any pass-1/pass-2 exception still returns nullreviewVerdict(null)changes-requested (never auto-merge). Do not touch reduce.ts.
  • Verdict quality preserved: one pass is sufficient — the agent reads the diff + ADRs + acceptance criteria and emits a verdict in a single run (as observed in the evidence log).
  • Verify: a live/integration run shows exactly one Iteration 1/1 (or one completion-signalled pass) in .sandcastle/logs/…-review-issue-*.log, not a re-review.

Testing (highest, single seam — logic in TypeScript)

The adapter spawns a live sandbox, so the testable seam is the pure run-config builder, mirroring buildAgentInput in sandbox-runner.ts:

  • Extract a small pure function that returns the pass-1 run options (model, sandbox, promptFile, name, and the iteration/completion config) from the adapter's inputs — no Docker, no network.
  • Unit-test (in a *.test.ts using node:test + node:assert/strict) that the produce-pass config caps at a single pass (maxIterations === 1 or a non-empty completionSignal). Prior art: buildAgentInput is already a pure builder tested without spawning.
  • Keep live reviewer execution behind the gated integration.test.ts (SANDCASTLE_INTEGRATION=1); default npm test and the sandcastle CI job stay green. Do not add a new seam beyond the pure config builder.

Unchanged (do not touch)

reduce.ts (verdict mapping / review gate, #74/ADR-0020), the extraction pass, parseDiffLines/filterInlineComments, the noSandbox() + permissionMode: "auto" choices, and the SandboxFailed retry (#76).

Constraints (per repo workflow)

Implement with /tdd, run shell via /exec, scope to reviewer-adapter.ts + the new pure builder + its test (+ a one-line code comment), no new dependencies, no pushing to main.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions