Add defect_root_cause template: multi-reasoner quality-defect RCA #89

Open
cafzal wants to merge 5 commits into main from claude/zen-matsumoto-6f619f

@cafzal cafzal commented Jun 16, 2026

Collaborator

What

A new Manufacturing template implementing quality-defect root-cause analysis as a backward multi-reasoner chain — the inverse of bom-reachability. Given a final-test defect spike, it traces each unit's genealogy backward through the bill of materials, contrast-scores candidate factors against good units, and solves a minimal set-cover MILP for the smallest, most specific set of root causes.

Why it's complementary to the portfolio

  • Archetype: introduces abduction (effects → minimal causes). The rest of the portfolio reasons forward — allocate, schedule, forecast, trace blast-radius downstream.
  • Technique: the first set-cover / minimal-hitting-set MILP in the repo.
  • Industry: grows Manufacturing (the smallest bucket, 4 → 5), as the clean inverse of bom-reachability (same BOM, run backward from failures).

The chain (Graph → Rules → Prescriptive)

  1. Graph — transitive backward reachability over the lot genealogy (Unit.touches_lot), attributing a contaminated lot two tiers down to every finished unit that carries it.
  2. Rules — contrast scoring by defect lift vs. baseline (Factor.lift, Factor.is_suspect), screening out high-coverage / low-lift trunk factors.
  3. Prescriptive — minimal-diagnosis set-cover MILP (Factor.is_root_cause, DiagnosisResult), preferring the single deep root over the proximate sub-assemblies that merely carry it.
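
The Prescriptive stage can be illustrated with a small pure-Python sketch. This is not the template's MILP: it swaps in an exhaustive search over a toy instance, and it requires every failure to be covered, which the real diagnosis does not. The unit IDs, the `PCB-A` and `PCB-B` factors, the incidence data, and the `minimal_diagnosis` helper are hypothetical.

```python
from itertools import combinations

# Toy incidence (hypothetical): failed units each suspect factor touches.
touches = {
    "SP-0423": {"u1", "u2", "u3"},  # deep root: contaminated paste lot
    "PCB-A": {"u1", "u2"},          # proximate sub-assembly carrying the lot
    "PCB-B": {"u3"},                # proximate sub-assembly carrying the lot
    "REF-02": {"u4", "u5"},         # reflow oven
}
failed = {"u1", "u2", "u3", "u4", "u5"}


def minimal_diagnosis(touches, failed):
    """Return a smallest set of factors whose touched units cover all failures.

    Candidate sets are tried in order of increasing size, so the first
    cover found is a minimum cover. Returns None if no cover exists.
    """
    factors = sorted(touches)
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            covered = set().union(*(touches[f] for f in combo))
            if failed <= covered:
                return set(combo)
    return None


diagnosis = minimal_diagnosis(touches, failed)
print(sorted(diagnosis))  # ['REF-02', 'SP-0423']
```

On this toy instance the single deep lot replaces the two sub-assemblies that carry it, because covering the same failures with `PCB-A` and `PCB-B` would need a larger set.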

On the bundled corpus (2,500 units, 6.28% failure rate), the chain diagnoses {SP-0423 (paste lot), REF-02 (reflow oven)} explaining 90% of failures — correctly resisting the high-volume trunk factors that dominate a naive raw-count scan.
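
The difference between a raw-count scan and lift-based scoring can be sketched on toy numbers. The definition of lift used here (failure rate among units touching a factor, divided by the overall failure rate) is an assumption about what `Factor.lift` computes, and the `LINE-1` trunk factor and all counts are hypothetical.

```python
# Toy counts per factor (hypothetical): (units touching it, failures among them).
stats = {
    "LINE-1": (900, 60),   # trunk factor: nearly every unit passes through it
    "SP-0423": (100, 40),  # specific lot: few units, many failures
}
total_units, total_failures = 1000, 62
baseline = total_failures / total_units  # overall failure rate


def lift(touched, failures):
    """Failure rate among units touching the factor, relative to baseline."""
    return (failures / touched) / baseline


# A raw-count scan ranks by failures touched; contrast scoring ranks by lift.
by_raw_count = max(stats, key=lambda f: stats[f][1])
by_lift = max(stats, key=lambda f: lift(*stats[f]))
print(by_raw_count, by_lift)  # LINE-1 SP-0423
```

The trunk factor touches the most failures only because it touches almost every unit; its lift stays close to 1, while the specific lot's lift is several times the baseline.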

Verification

  • Runs end-to-end against a live engine (HiGHS); py_compile and ruff clean.
  • Runbook blind-paste-tested: a fresh agent reproduced the same diagnosis from the prompts + rai-* skills + data alone, never seeing the solution script.
  • All Factor.touches lot incidence independently checked against a pure-Python genealogy closure.
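
A pure-Python genealogy closure of the kind used for this cross-check might look like the sketch below. The edge data, the item IDs other than SP-0423, and the `backward_closure` helper are hypothetical; the actual check script is not shown in this pull request.

```python
# Toy genealogy (hypothetical): item -> items consumed into it.
consumes = {
    "UNIT-1": ["PCB-A"],
    "UNIT-2": ["PCB-B"],
    "PCB-A": ["SP-0423"],  # contaminated paste lot, two tiers below UNIT-1
    "PCB-B": ["SP-0001"],
}


def backward_closure(start, consumes):
    """Return every item reachable from `start` via consumption edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in consumes.get(node, []):
            if child not in seen:  # the seen-set also guards against cycles
                seen.add(child)
                stack.append(child)
    return seen


touched = {unit: backward_closure(unit, consumes) for unit in ("UNIT-1", "UNIT-2")}
print(sorted(touched["UNIT-1"]))  # ['PCB-A', 'SP-0423']
```

Comparing a closure like this against the engine's incidence, unit by unit, is one way to confirm that transitive attribution reaches lots more than one tier down.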

Files

  • v1/defect_root_cause/ — README, runbook, pyproject.toml (relationalai==1.13.0), defect_root_cause.py (3-stage pipeline), generate_data.py, and 10 data CSVs.
  • Manufacturing index updated in README.md and v1/README.md.

Backward RCA chain (Graph reachability -> Rules contrast scoring -> Prescriptive set-cover MILP) over a serialized electronics-assembly genealogy. Diagnoses a final-test defect spike to a minimal set of root causes -- the inverse of bom-reachability. Adds the abduction archetype and the first set-cover / minimal-diagnosis MILP to the portfolio; grows Manufacturing 4->5. Engine-validated end-to-end; runbook blind-paste-tested.

github-actions Bot commented Jun 16, 2026


The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-ntpnvgjz0-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/8JuWrPoj8N4yMPauRRGzQLDiVuHt

Each named cause is now reported with its defect signature (paste->cold solder, reflow->bridging), and a kind-specific tell -- supplier and receipt date for a lot, calibration age for a machine -- read back from the ontology. Adds Lot.received_date. Diagnosis unchanged; this makes the script produce the corroboration the runbook already interprets.
…porally coherent

Adds a descriptive 'when did the spike start?' examine step (failure rate by build week) ahead of the genealogy chain, and wires the data so both root causes share a week-2 incident onset: the contaminated paste lot is received then, the oven drifts then, and pre-onset units cannot consume the contaminated boards. The timeline is now a real signal (1.9 -> 6.6 -> 8.3%), and the paste lot's receipt date aligns exactly with the onset. Diagnosis unchanged in shape -- {SP-0423, REF-02} explain 120/142 (85%). Regenerated data and indexes; README and runbook resynced.
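
The 'when did the spike start?' step amounts to a failure rate grouped by build week. A minimal sketch on toy records follows; the records and variable names are hypothetical and do not reproduce the bundled data or its 1.9 -> 6.6 -> 8.3% series.

```python
from collections import Counter

# Toy (build_week, failed) records (hypothetical).
units = (
    [(1, False)] * 49 + [(1, True)] * 1
    + [(2, False)] * 46 + [(2, True)] * 4
)

built = Counter(week for week, _ in units)
failures = Counter(week for week, is_failed in units if is_failed)

# Failure rate per build week, in week order.
rate_by_week = {week: failures[week] / built[week] for week in sorted(built)}
print(rate_by_week)  # {1: 0.02, 2: 0.08}
```

A jump between consecutive weeks, as in week 2 here, marks the candidate onset to compare against lot receipt dates and machine calibration history.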
…sses PR #80 review feedback)

Rewrites the README to the sample-template standard the reviewer is applying to new templates: 'What this template is for' is now a broad-audience problem statement and motivation with a bold reasoning-type sentence; adds assumed knowledge, an Access/Tools Prerequisites split, a Model overview, and Customize subsections; and uses words and sentences rather than symbols and fragments throughout the prose. Also drops trailing commas after the last where(...) condition per the cleanup-template-code convention (a no-op; re-run confirms identical output). Validated: sample output matches the live run line-for-line, all code snippets are verbatim from the script, and the runbook reproduced blind end-to-end.