You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The AI review gate (ADR-0020) never produces a real verdict — its extraction pass crashes deterministically, so every review fail-safes to changes-requested and parks for a human. In practice /afk has been silently degrading to /hitl on every PR. Discovered dogfooding #80 (the reviewer ran on PR #81, pass-1 succeeded, then pass-2 crashed).
The bug
In reviewer-adapter.ts, review() runs a two-pass produce→extract flow. Pass-2 resumes the pass-1 session with an inline prompt string:
resume() carries the pass-1 run's promptArgs into this inline-prompt call, and sandcastle rejects the combination:
(FiberFailure) PromptError: promptArgs is only supported with promptFile.
Inline prompts (prompt: "...") are passed to the agent as-is — interpolate values
directly in JavaScript, or switch to promptFile to use {{KEY}} substitution.
at @ai-hero/sandcastle/src/run.ts:661
review() catches it → returns null → reviewVerdict(null) → changes-requested → WaitForHuman. Deterministic on every review. (Note: review-extraction.md has no {{KEY}} placeholders, so the extraction pass needs no promptArgs at all.)
Goal (end-to-end value)
The reviewer's extraction pass succeeds and returns the schema-validated structured verdict, so a genuine pass actually enables auto-merge instead of every PR parking for a human. The gate becomes functional, not a disguised /hitl.
Acceptance criteria
Extraction pass no longer crashes: the pass-2 resume invocation does not combine promptArgs with an inline prompt. Fix per the sandcastle error's guidance — pass extraction via promptFile (preferred: keeps review-extraction.md as the source of truth and {{KEY}}-substitution available), or pass the prompt inline with nopromptArgs. The session-resume (so the agent recalls its pass-1 review) must be preserved — do not silently fall back to a fresh run() that re-reviews from scratch.
Real verdict round-trips: a successful review returns the validated ReviewOutput ({verdict, summary, comments}); a genuine pass → EnableAutoMerge. The existing validate-then-filter (parseDiffLines/filterInlineComments) and fail-safe (null → changes-requested) are unchanged.
Fail-safe still intact: a real extraction failure (timeout, malformed <output>, schema violation) still returns null → changes-requested. Don't widen the catch into a fail-open.
Gated integration (SANDCASTLE_INTEGRATION=1, integration.test.ts): run review() against a tiny real PR branch and assert it returns a non-nullReviewOutput (proves pass-2 actually round-trips). Keep this out of the default npm test / sandcastle CI job so they stay green without network.
#80's unit tests assert the pass-1 config (maxIterations, promptArgs, name) — they never invoke the live resume(), where this bug lives. Hence the gated integration check above.
Constraints (per repo workflow)
Implement with /tdd, run shell via /exec, scope to reviewer-adapter.ts + its test (+ a gated integration case), no new dependencies, no pushing to main.
The AI review gate (ADR-0020) never produces a real verdict — its extraction pass crashes deterministically, so every review fail-safes to
changes-requestedand parks for a human. In practice/afkhas been silently degrading to/hitlon every PR. Discovered dogfooding #80 (the reviewer ran on PR #81, pass-1 succeeded, then pass-2 crashed).The bug
In
reviewer-adapter.ts,review()runs a two-pass produce→extract flow. Pass-2 resumes the pass-1 session with an inline prompt string:resume()carries the pass-1 run'spromptArgsinto this inline-prompt call, and sandcastle rejects the combination:review()catches it → returnsnull→reviewVerdict(null)→changes-requested→WaitForHuman. Deterministic on every review. (Note:review-extraction.mdhas no{{KEY}}placeholders, so the extraction pass needs no promptArgs at all.)Goal (end-to-end value)
The reviewer's extraction pass succeeds and returns the schema-validated structured verdict, so a genuine
passactually enables auto-merge instead of every PR parking for a human. The gate becomes functional, not a disguised/hitl.Acceptance criteria
promptArgswith an inline prompt. Fix per the sandcastle error's guidance — pass extraction viapromptFile(preferred: keepsreview-extraction.mdas the source of truth and {{KEY}}-substitution available), or pass the prompt inline with nopromptArgs. The session-resume (so the agent recalls its pass-1 review) must be preserved — do not silently fall back to a freshrun()that re-reviews from scratch.ReviewOutput({verdict, summary, comments}); a genuinepass→EnableAutoMerge. The existingvalidate-then-filter(parseDiffLines/filterInlineComments) and fail-safe (null→changes-requested) are unchanged.<output>, schema violation) still returnsnull→changes-requested. Don't widen the catch into a fail-open.buildReviewerPassOneConfig,maxIterations: 1) stays as-is. Mirror it with a pure config builder for pass-2 if it makes the fix testable.reduce.ts(verdict mapping / gate, Review gate before auto-merge in /afk (AI Reviewer step) #74/ADR-0020) or pass-1.Testing (highest seam — logic in TypeScript, plus one gated live check)
The bug lives in the live sandcastle resume interaction, which a pure unit test cannot fully catch — so cover it at two levels:
npm test, no Docker/network): extract abuildReviewerPassTwoConfig-style pure function (mirroringbuildReviewerPassOneConfigfrom Reviewer produce pass re-runs up to 8× — make it single-pass #80) returning the extraction invocation shape, and assert it does not passpromptArgsalongside an inline prompt (e.g. it usespromptFile, or sets nopromptArgs). Prior art:reviewer-adapter.test.ts(Reviewer produce pass re-runs up to 8× — make it single-pass #80).SANDCASTLE_INTEGRATION=1,integration.test.ts): runreview()against a tiny real PR branch and assert it returns a non-nullReviewOutput(proves pass-2 actually round-trips). Keep this out of the defaultnpm test/sandcastleCI job so they stay green without network.Why the #80 tests didn't catch this
#80's unit tests assert the pass-1 config (
maxIterations, promptArgs, name) — they never invoke the liveresume(), where this bug lives. Hence the gated integration check above.Constraints (per repo workflow)
Implement with
/tdd, run shell via/exec, scope toreviewer-adapter.ts+ its test (+ a gated integration case), no new dependencies, no pushing to main.