docker · Sayt-0 · Jul 3, 2026 · Jul 2, 2026 · Jul 2, 2026 · Jul 3, 2026
diff --git a/.github/workflows/review-pr.yml b/.github/workflows/review-pr.yml
@@ -46,6 +46,11 @@ on:
         required: false
         type: string
         default: 'moderate'
+      incremental:
+        description: "When true (default), re-reviews only the commits pushed since the last completed docker-agent review instead of the full PR diff. Falls back to a full review when no previous review exists, the history was rewritten (force-push/rebase), or the base branch was merged in. Set to false to force full reviews."
+        required: false
+        type: boolean
+        default: true
       trigger-run-id:
         description: "Workflow run ID from pr-review-trigger.yml — used to download the event context artifact. When set, resolve-context job downloads the artifact and routes to the appropriate job."
         required: false
@@ -418,6 +423,7 @@ jobs:
           exclude-paths: ${{ inputs.exclude-paths }}
           max-diff-lines: ${{ inputs.max-diff-lines }}
           confidence-threshold: ${{ inputs.confidence-threshold }}
+          incremental: ${{ inputs.incremental }}
           model: ${{ inputs.model }}
           github-token: ${{ env.GITHUB_APP_TOKEN || github.token }}
           anthropic-api-key: ${{ env.ANTHROPIC_API_KEY_FROM_SSM || secrets.ANTHROPIC_API_KEY }}

diff --git a/AGENTS.md b/AGENTS.md
@@ -39,10 +39,18 @@ Anything else here (workflows under `.github/workflows/`, scripts, tests) exists
 │   │   ├── aws-credentials.ts
 │   │   ├── github-app.ts            # Reads docker-agent-action/github-app from Secrets Manager; exports GITHUB_APP_TOKEN (a PAT) + ORG_MEMBERSHIP_TOKEN.
 │   │   └── __tests__/
+│   ├── dedupe-findings/             # Drops review comments duplicating findings from previous review cycles.
+│   │   ├── index.ts                 # CLI entry → bundled to dist/dedupe-findings.js (staged at /tmp/dedupe-findings.js for the agent).
+│   │   ├── dedupe-findings.ts       # Core dedupeComments() pure function (path + line proximity + heading similarity).
+│   │   └── __tests__/
 │   ├── filter-diff/                 # Strips excluded-path sections from a unified diff.
 │   │   ├── index.ts                 # CLI entry → bundled to dist/filter-diff.js
 │   │   ├── filter-diff.ts           # Core filterDiff() pure function + applyFilter() I/O wrapper.
 │   │   └── __tests__/
+│   ├── incremental-review/          # Narrows pr.diff to commits since the last completed review.
+│   │   ├── index.ts                 # CLI entry → bundled to dist/incremental-review.js
+│   │   ├── incremental-review.ts    # Core planIncrementalReview()/findLastReviewedSha() pure functions.
+│   │   └── __tests__/
 │   ├── score-confidence/            # Per-finding confidence scoring for the PR review pipeline.
 │   │   ├── index.ts                 # CLI entry → bundled to dist/score-confidence.js
 │   │   ├── score-confidence.ts      # Core scoreFinding()/scoreFindings() pure functions + posting policy.
@@ -172,6 +180,7 @@ The action runs untrusted input (PR titles, bodies, comments, diffs) through an
   - `pull_request` action `review_requested` when `github.event.requested_reviewer.login == 'docker-agent'`
   - `@docker-agent` mentions on PR/issue comments — these run the `.github/actions/mention-reply` handler (sets `should-reply` and builds the context prompt) and then the `review-pr/mention-reply` sub-action (referenced from a pinned SHA, not present as a local path on every commit). The `pr-review-mention-reply.yaml` agent handles the actual reply.
 - Diffs over 1500 lines are **chunked at file boundaries** in `review-pr/action.yml` (see "Split diff into chunks"). Per-file **risk scoring** (security paths, line counts, error-handling patterns) prioritizes verifier attention.
+- **Incremental reviews** (default on, `incremental` input): a re-review only diffs `last-reviewed-SHA..HEAD` instead of the full PR diff. The last reviewed SHA is read from the `commit_id` GitHub records on the bot's completed reviews (assessment or LGTM bodies only — timeout/failure fallbacks don't count), so state survives across runs with no extra writes. `src/incremental-review/` plans the mode and falls back to a full review on force-push/rebase (SHA missing or not an ancestor of HEAD), base-branch merge-ins, net-zero changes, or any error. In incremental mode the original full diff is preserved at `pr_full.diff` for stale-thread resolution and suggestion-anchor validation (GitHub validates anchors against the full PR diff). On re-reviews, `src/dedupe-findings/` (staged at `/tmp/dedupe-findings.js`, run by the agent per `posting-format.md` against the pre-fetched `/tmp/existing_review_comments.json`) drops findings matching already-posted bot comments by file path + line proximity (±3) + finding-heading similarity, so a full re-review after a rebase doesn't duplicate threads.
 - Per-finding **confidence scoring** assigns each verified finding a precise 0–100 score (band: strong/moderate/weak/negligible) from the verifier's `verdict`, `evidence_strength`, and `context_completeness`, plus drafter↔verifier severity concordance and scope. `src/score-confidence/score-confidence.ts` is the **single source of truth** for the model (weights, bands, threshold, posting policy); the "Confidence Scoring" section of `review-pr/agents/pr-review.yaml` mirrors it as a strict lookup table so the orchestrator can apply it inline (the gitignored `dist/` is not available at agent runtime). Change one, change both — the unit tests pin every value. Security and high-severity CONFIRMED/LIKELY findings are always posted regardless of score; below-threshold findings are surfaced in a summary rather than silently dropped. The inline-posting cutoff is configurable via the `confidence-threshold` action input (a band name or a number clamped to 30–100, default `moderate` = 55); the action resolves it by invoking the bundled `dist/score-confidence.js resolve-threshold` CLI (so the resolution logic stays in TypeScript, not bash) and injects it into the agent prompt, and `scoreFinding`/`scoreFindings` accept a matching `postThreshold` option.
 - Stale review threads on lines no longer in the diff are auto-resolved via GraphQL `resolveReviewThread`. Threads with no `<!-- docker-agent-review -->` marker are never touched.
 

diff --git a/review-pr/README.md b/review-pr/README.md
@@ -262,6 +262,7 @@ When using `docker/docker-agent-action/.github/workflows/review-pr.yml`:
 | `model`             | Model override (e.g., `anthropic/claude-haiku-4-5`)                    | -       |
 | `add-prompt-files`  | Comma-separated files to append to the prompt                          | -       |
 | `confidence-threshold` | Min confidence to post a finding inline: band (`strong`/`moderate`/`medium`/`weak`) or a number (clamped to 30–100) | `moderate` |
+| `incremental`       | Review only the commits pushed since the last completed review (falls back to a full review when unsafe)               | `true`  |
 
 ### `review-pr` (Composite Action)
 
@@ -285,6 +286,7 @@ PR number and comment ID are auto-detected from `github.event` when not provided
 | `github-token`             | GitHub token                                                     | No       |
 | `add-prompt-files`         | Comma-separated files to append to the prompt                    | No       |
 | `confidence-threshold`     | Min confidence to post a finding inline: band (`strong`/`moderate`/`medium`/`weak`) or a number clamped to 30–100 (default `moderate`) | No       |
+| `incremental`              | Review only commits since the last completed review (default `true`; falls back to full when unsafe) | No       |
 
 \*API keys are optional when using the reusable workflow (credentials are fetched via OIDC). Only required when using the composite action directly without OIDC.
 
@@ -357,6 +359,44 @@ AGENTS.md + PR Diff → Drafter (hypotheses) → Verifier (confirm + evidence si
                     → Confidence score (0–100) → Post Comments
 ```
 
+### Incremental Reviews
+
+By default, a re-review only covers the **commits pushed since the last completed
+review** instead of re-reviewing the full PR diff. This saves tokens, avoids
+duplicate comments, and skips code that has not changed since the previous cycle.
+
+How it works:
+
+1. The last reviewed commit is read from the metadata GitHub records on every
+   posted review (`commit_id` on the Reviews API is the PR head SHA at posting
+   time), so the state survives across workflow runs with no extra bookkeeping.
+   Only completed reviews count — a timed-out or failed run does not mark
+   commits as reviewed.
+2. The diff handed to the agent becomes `git diff <last-reviewed-sha>..HEAD`,
+   restricted to files that are still part of the full PR diff (so inline
+   comments never anchor outside what GitHub accepts).
+3. The action **falls back to a full review** whenever incremental diffing
+   would be unsafe or meaningless:
+
+| Situation | Behavior |
+| --------- | -------- |
+| No previous completed review               | Full review |
+| Force-push or rebase rewrote the history   | Full review |
+| Base branch merged into the PR since last review | Full review |
+| Last reviewed SHA missing from the clone   | Full review |
+| Changes since last review cancel out       | Full review |
+| `incremental: false` input                 | Full review |
+
+4. On any full re-review, findings are **deduplicated** against the comments
+   already posted on the PR (matched by file path, line proximity, and finding
+   heading similarity — see [`src/dedupe-findings/`](../src/dedupe-findings/dedupe-findings.ts)),
+   so a rebase does not produce duplicate threads. The plumbing that decides
+   between incremental and full mode is implemented and unit-tested in
+   [`src/incremental-review/`](../src/incremental-review/incremental-review.ts).
+
+Set `incremental: false` (workflow or action input) to force a full review on
+every trigger.
+
 Each verified finding gets a precise **confidence score** (0–100) and a band
 (strong / moderate / weak / negligible), computed deterministically from the
 verifier's verdict, evidence strength, and context completeness, plus the

diff --git a/review-pr/action.yml b/review-pr/action.yml
@@ -74,6 +74,10 @@ inputs:
     description: "Minimum confidence for posting a finding as an inline comment. Accepts a band name (`strong`=80, `moderate`/`medium`=55, `weak`=30) or a number (clamped to 30-100; the weak-band floor is the lowest meaningful cutoff, so e.g. `10` is raised to `30`). Findings that score below the threshold (and are not security or high-severity) are collapsed into the lower-confidence summary instead of posted inline; they are never silently dropped. Security findings and high-severity CONFIRMED/LIKELY findings are always posted regardless. Default: moderate (55), which preserves the prior behavior."
     required: false
     default: "moderate"
+  incremental:
+    description: "When 'true' (default), re-reviews only the commits pushed since the last completed docker-agent review instead of the full PR diff. Falls back to a full review when no previous review exists, the history was rewritten (force-push/rebase), or the base branch was merged in. Set to 'false' to force a full review."
+    required: false
+    default: "true"
 
 outputs:
   exit-code:
@@ -266,6 +270,25 @@ runs:
           fi
         fi
 
+    # Incremental review: when a previous completed review exists, narrow pr.diff
+    # to the commits pushed since it. The original full diff is preserved at
+    # pr_full.diff for stale-thread resolution and suggestion-anchor validation.
+    # All decision logic (state tracking via review commit_id metadata, rebase and
+    # force-push detection, base-merge detection) lives in src/incremental-review.
+    - name: Compute incremental review range
+      if: hashFiles('pr.diff') != '' && steps.lock-check.outputs.skip != 'true'
+      id: incremental
+      shell: bash
+      env:
+        ACTION_PATH: ${{ github.action_path }}
+        GITHUB_TOKEN: ${{ steps.resolve-token.outputs.token }}
+        PR_NUMBER: ${{ steps.resolve-context.outputs.pr-number }}
+        INCREMENTAL: ${{ inputs.incremental }}
+      run: |
+        BASE_REF=$(jq -r '.baseRefName // ""' pr_metadata.json 2>/dev/null || echo "")
+        export BASE_REF
+        node "$ACTION_PATH/../dist/incremental-review.js" pr.diff
+
     - name: Filter excluded paths from diff
       if: hashFiles('pr.diff') != '' && inputs.exclude-paths != ''
       shell: bash
@@ -398,8 +421,14 @@ runs:
         # A. Parse diff → file:line set
         # Tracks current file from "diff --git" headers, line numbers from @@ hunks,
         # and emits "file:line" for each added (+) line.
-        if [ ! -f pr.diff ]; then
-          echo "ℹ️  No pr.diff found — skipping stale thread resolution"
+        # In incremental mode pr.diff only covers commits since the last review, so
+        # staleness is judged against the preserved full PR diff (pr_full.diff).
+        DIFF_FILE=pr.diff
+        if [ -f pr_full.diff ]; then
+          DIFF_FILE=pr_full.diff
+        fi
+        if [ ! -f "$DIFF_FILE" ]; then
+          echo "ℹ️  No $DIFF_FILE found — skipping stale thread resolution"
           exit 0
         fi
 
@@ -426,7 +455,7 @@ runs:
           }
           /^ / { line++ }
           /^-/ { next }
-        ' pr.diff | sort -u > $DIFF_LINES_FILE
+        ' "$DIFF_FILE" | sort -u > $DIFF_LINES_FILE
 
         DIFF_LINE_COUNT=$(wc -l < $DIFF_LINES_FILE | tr -d ' ')
         echo "📄 Found $DIFF_LINE_COUNT unique file:line pairs in diff"
@@ -709,6 +738,8 @@ runs:
         REPO: ${{ github.repository }}
         CONFIDENCE_SCORE: ${{ steps.resolve-confidence.outputs.score }}
         CONFIDENCE_LABEL: ${{ steps.resolve-confidence.outputs.label }}
+        INCREMENTAL_MODE: ${{ steps.incremental.outputs.mode }}
+        LAST_REVIEWED_SHA: ${{ steps.incremental.outputs.last-reviewed-sha }}
       run: |
         title=$(jq -r '.title' pr_metadata.json)
         author=$(jq -r '.author.login' pr_metadata.json)
@@ -740,6 +771,15 @@ runs:
           echo "- **Inline confidence threshold**: ${CONFIDENCE_SCORE}/100 (${CONFIDENCE_LABEL} band). When applying the Confidence Scoring posting policy, post a non-forced finding as an inline comment only when its confidence score is **${CONFIDENCE_SCORE} or higher**. A finding that is in scope and not dismissed but scores **below ${CONFIDENCE_SCORE}** must NOT be posted inline — surface it under \"Lower-confidence findings (not posted inline)\" in the review body instead, exactly as the posting policy already prescribes for below-threshold findings (the policy's negligible-band rules still govern what is summarized vs. dropped)."
           echo "- This threshold does NOT override the security floor or the high-severity always-post rule: \`security\` findings and high-severity CONFIRMED/LIKELY findings are ALWAYS posted inline regardless of the threshold."
           echo ""
+          if [ "$INCREMENTAL_MODE" = "incremental" ]; then
+            echo "## Incremental Review"
+            echo ""
+            echo "This is an INCREMENTAL review: \`pr.diff\` contains only the changes pushed since the last completed review (commit \`${LAST_REVIEWED_SHA}\`). Earlier commits were already reviewed — do NOT re-review them or fetch the full PR diff yourself."
+            echo "- The \"Changed Files\" list above still describes the whole PR; review only the files present in \`pr.diff\`."
+            echo "- Start the review body with a note that this review covers only the commits since \`${LAST_REVIEWED_SHA:0:12}\`, and scope the assessment label to those changes."
+            echo "- If \`pr.diff\` is missing, do NOT fall back to fetching the full diff — post a COMMENT review explaining the incremental diff was unavailable."
+            echo ""
+          fi
           echo "---"
           echo ""
           echo "## Instructions"
@@ -786,6 +826,34 @@ runs:
         else
           echo "::warning::validate-suggestions.js not found in dist — suggestion validation will be skipped"
         fi
+        # Stage the finding deduplicator the same way (see posting-format.md).
+        if [ -f "$ACTION_PATH/../dist/dedupe-findings.js" ]; then
+          cp "$ACTION_PATH/../dist/dedupe-findings.js" /tmp/dedupe-findings.js
+        else
+          echo "::warning::dedupe-findings.js not found in dist — finding deduplication will be skipped"
+        fi
+
+    # Pre-fetch the PR's existing inline review comments so the agent can drop
+    # findings already posted in a previous review cycle (the dedupe CLI is
+    # fail-open: a missing or empty file simply skips deduplication).
+    - name: Fetch existing review comments
+      if: steps.lock-check.outputs.skip != 'true'
+      continue-on-error: true
+      shell: bash
+      env:
+        GH_TOKEN: ${{ steps.resolve-token.outputs.token }}
+        PR_NUMBER: ${{ steps.resolve-context.outputs.pr-number }}
+        REPO: ${{ github.repository }}
+      run: |
+        set -o pipefail
+        if gh api --paginate "repos/$REPO/pulls/$PR_NUMBER/comments" \
+          | jq -s 'map(if type == "array" then . else [.] end) | add // []' \
+          > /tmp/existing_review_comments.json; then
+          echo "✅ Fetched $(jq 'length' /tmp/existing_review_comments.json) existing review comment(s)"
+        else
+          rm -f /tmp/existing_review_comments.json
+          echo "::warning::Failed to fetch existing review comments — deduplication will be skipped"
+        fi
 
     # ========================================
     # RUN REVIEW using root docker-agent-action

diff --git a/review-pr/agents/pr-review.yaml b/review-pr/agents/pr-review.yaml
@@ -242,7 +242,10 @@ agents:
            security findings (review manually)" citing the verifier's stated mitigation.
          Before posting, verify each inline comment per Confidence Scoring: its body ends with a
          valid confidence table, and no non-forced comment scores below the inline threshold T.
-         Then post the review.
+         Then run the deduplication step from the posting template
+         (`node /tmp/dedupe-findings.js /tmp/review_comments.json /tmp/existing_review_comments.json`)
+         so findings already posted in a previous review cycle are not posted twice, and post
+         the review.
       10. Order inline comments forced-first: list the forced comments (high-severity and
           security CONFIRMED/LIKELY) first, then the remaining (non-forced) comments by confidence,
           highest first — so a forced finding is never buried beneath a higher-scoring non-forced
@@ -462,8 +465,10 @@ agents:
       otherwise keep the suggestion as prose. See `/tmp/refs/posting-format.md` for the exact
       field layout. GitHub is strict: a suggestion anchored outside the diff or on a deleted
       line makes the WHOLE review fail (422). The mandatory validation step in the posting
-      template (`node /tmp/validate-suggestions.js /tmp/review_comments.json pr.diff`) strips
-      any malformed suggestion block before posting — run it; do not skip it.
+      template strips any malformed suggestion block before posting — run it; do not skip it.
+      It validates against `pr_full.diff` when that file exists (incremental mode preserves
+      the full PR diff there — the diff GitHub validates anchors against) and `pr.diff`
+      otherwise; the posting template selects the right file.
 
       ## Domain-Specific Review Output