Skip to content

AI review gate non-functional: pass-2 extraction crashes (promptArgs + inline resume prompt) #82

Description

@lsfera

The AI review gate (ADR-0020) never produces a real verdict — its extraction pass crashes deterministically, so every review fail-safes to changes-requested and parks for a human. In practice /afk has been silently degrading to /hitl on every PR. Discovered dogfooding #80 (the reviewer ran on PR #81, pass-1 succeeded, then pass-2 crashed).

The bug

In reviewer-adapter.ts, review() runs a two-pass produce→extract flow. Pass-2 resumes the pass-1 session with an inline prompt string:

const extractResult = reviewResult.resume
  ? await reviewResult.resume(
      await loadPromptFile(join(_dir, "review-extraction.md")), // inline prompt STRING
      { output: Output.object({ tag: "output", schema: reviewOutputSchema }), cwd },
    )
  : await run({ /* fallback: promptFile, output, ... */ });

resume() carries the pass-1 run's promptArgs into this inline-prompt call, and sandcastle rejects the combination:

(FiberFailure) PromptError: promptArgs is only supported with promptFile.
Inline prompts (prompt: "...") are passed to the agent as-is — interpolate values
directly in JavaScript, or switch to promptFile to use {{KEY}} substitution.
  at @ai-hero/sandcastle/src/run.ts:661

review() catches it → returns nullreviewVerdict(null)changes-requestedWaitForHuman. Deterministic on every review. (Note: review-extraction.md has no {{KEY}} placeholders, so the extraction pass needs no promptArgs at all.)

Goal (end-to-end value)

The reviewer's extraction pass succeeds and returns the schema-validated structured verdict, so a genuine pass actually enables auto-merge instead of every PR parking for a human. The gate becomes functional, not a disguised /hitl.

Acceptance criteria

  • Extraction pass no longer crashes: the pass-2 resume invocation does not combine promptArgs with an inline prompt. Fix per the sandcastle error's guidance — pass extraction via promptFile (preferred: keeps review-extraction.md as the source of truth and {{KEY}}-substitution available), or pass the prompt inline with no promptArgs. The session-resume (so the agent recalls its pass-1 review) must be preserved — do not silently fall back to a fresh run() that re-reviews from scratch.
  • Real verdict round-trips: a successful review returns the validated ReviewOutput ({verdict, summary, comments}); a genuine passEnableAutoMerge. The existing validate-then-filter (parseDiffLines/filterInlineComments) and fail-safe (nullchanges-requested) are unchanged.
  • Fail-safe still intact: a real extraction failure (timeout, malformed <output>, schema violation) still returns nullchanges-requested. Don't widen the catch into a fail-open.
  • Builds on Reviewer produce pass re-runs up to 8× — make it single-pass #80: the pass-1 single-iteration fix (buildReviewerPassOneConfig, maxIterations: 1) stays as-is. Mirror it with a pure config builder for pass-2 if it makes the fix testable.
  • Don't touch reduce.ts (verdict mapping / gate, Review gate before auto-merge in /afk (AI Reviewer step) #74/ADR-0020) or pass-1.

Testing (highest seam — logic in TypeScript, plus one gated live check)

The bug lives in the live sandcastle resume interaction, which a pure unit test cannot fully catch — so cover it at two levels:

  • Pure seam (default npm test, no Docker/network): extract a buildReviewerPassTwoConfig-style pure function (mirroring buildReviewerPassOneConfig from Reviewer produce pass re-runs up to 8× — make it single-pass #80) returning the extraction invocation shape, and assert it does not pass promptArgs alongside an inline prompt (e.g. it uses promptFile, or sets no promptArgs). Prior art: reviewer-adapter.test.ts (Reviewer produce pass re-runs up to 8× — make it single-pass #80).
  • Gated integration (SANDCASTLE_INTEGRATION=1, integration.test.ts): run review() against a tiny real PR branch and assert it returns a non-null ReviewOutput (proves pass-2 actually round-trips). Keep this out of the default npm test / sandcastle CI job so they stay green without network.

Why the #80 tests didn't catch this

#80's unit tests assert the pass-1 config (maxIterations, promptArgs, name) — they never invoke the live resume(), where this bug lives. Hence the gated integration check above.

Constraints (per repo workflow)

Implement with /tdd, run shell via /exec, scope to reviewer-adapter.ts + its test (+ a gated integration case), no new dependencies, no pushing to main.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions