23 changes: 23 additions & 0 deletions data-fabrication-anomaly-assistant/README.md
@@ -0,0 +1,23 @@
# Data Fabrication Anomaly Assistant

This is a focused slice of the AI-Powered Research Assistant Suite for issue #16. It reviews synthetic manuscript data packets before AI peer-review output is trusted and emits a release, review, or hold decision based on data-forensics red flags.
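This PR does not show how individual findings roll up into a decision. Below is a minimal sketch, assuming the worst finding severity decides: any `high` finding yields `HOLD`, any `medium` finding yields `REVIEW`, and no findings yields `RELEASE`. That matches the two sample reports (two `high` plus three `medium` findings give `HOLD`; zero findings give `RELEASE`), but the `REVIEW` branch and the hypothetical `SEVERITY_RANK` table and `decide` function are assumptions, not the module's code.

```javascript
// Hypothetical rank table: "medium" and "high" are the only severities seen in the reports.
const SEVERITY_RANK = { medium: 1, high: 2 };

// Assumed roll-up rule: the worst severity present picks the decision.
function decide(findings) {
  // Unknown severities fail closed (treated as high) rather than slipping through.
  const ranks = findings.map((f) => SEVERITY_RANK[f.severity] ?? SEVERITY_RANK.high);
  const worst = Math.max(0, ...ranks); // 0 when there are no findings
  if (worst >= SEVERITY_RANK.high) return "HOLD";
  if (worst >= SEVERITY_RANK.medium) return "REVIEW"; // assumed; not shown in the sample reports
  return "RELEASE";
}
```

Failing closed on an unrecognized severity keeps a typo in a finding from silently releasing a packet.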

The assistant checks:

- repeated measurement rows that need raw-data verification
- invalid collection timestamps that cannot be audited
- terminal-digit preference that suggests rounding or synthetic entry
- unusually smooth measurement series
- perfect group separation that needs preregistered exclusion and raw-row provenance
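The first two checks can be sketched in a few lines. This is a minimal illustration, not the module's implementation: it assumes a hypothetical row shape `{ group, value, collectedAt }`, 1-based row numbers (as in evidence such as `#12/#13`), that a "duplicate" means an identical group, value, and timestamp, and that "auditable" means an ISO-8601 date-time with an explicit offset that `Date.parse` accepts.

```javascript
// Hypothetical row shape: { group, value, collectedAt }. Row numbers are 1-based.
// Assumed "auditable" form: ISO-8601 date-time with seconds optional and an explicit Z or offset.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})$/;

// Pairs of [firstRow, repeatRow] whose group, value, and timestamp are identical.
function findDuplicateRows(rows) {
  const firstSeen = new Map(); // row key -> first 1-based row number
  const pairs = [];
  rows.forEach((row, index) => {
    const key = `${row.group}|${row.value}|${row.collectedAt}`;
    if (firstSeen.has(key)) {
      pairs.push([firstSeen.get(key), index + 1]);
    } else {
      firstSeen.set(key, index + 1);
    }
  });
  return pairs;
}

// 1-based row numbers whose timestamp is missing, non-ISO, or unparseable.
function findInvalidTimestamps(rows) {
  const invalid = [];
  rows.forEach((row, index) => {
    const ts = row.collectedAt;
    const ok = typeof ts === "string" && ISO_8601.test(ts) && !Number.isNaN(Date.parse(ts));
    if (!ok) invalid.push(index + 1);
  });
  return invalid;
}

const rows = [
  { group: "control", value: 10.5, collectedAt: "2024-03-14T09:30:00Z" },
  { group: "control", value: 11, collectedAt: "2024-03-14T09:40:00Z" },
  { group: "control", value: 11, collectedAt: "2024-03-14T09:40:00Z" },
  { group: "treated", value: 20, collectedAt: "14/03/2024 09:50" },
];
console.log(findDuplicateRows(rows), findInvalidTimestamps(rows));
```

Keying on the full row keeps the duplicate check conservative; a real data-forensics check might also compare rows that differ only in timestamp.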

All fixtures are synthetic. The module does not call external AI services, publisher APIs, private datasets, credential stores, payment systems, or live manuscript systems.

## Run

```sh
npm run check
npm test
npm run demo
```

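The demo writes reviewer artifacts under `reports/`: `clean-review.json`, `risky-review.json`, `risky-review.md`, and `summary.svg`. `npm run demo:video` additionally renders `reports/demo.mp4`; it requires `ffmpeg` on `PATH` and, as written, loads the Windows font `C:/Windows/Fonts/arial.ttf`.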
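The Markdown report is a direct projection of the JSON review. The function below, `renderMarkdownSketch`, is a hypothetical re-creation based on the layout of `reports/risky-review.md` in this PR. It is not the module's `summarizeReview`, whose source is not shown, and it assumes the `Evidence:` and `Action:` continuation lines are indented by two spaces.

```javascript
// Hypothetical re-creation of the Markdown layout; not the module's summarizeReview.
function renderMarkdownSketch(review) {
  const out = [
    `# Data fabrication anomaly review: ${review.manuscriptTitle}`,
    "",
    `Decision: **${review.decision}**`,
    `Audit digest: \`${review.auditDigest}\``,
    "",
    "## Findings",
    "",
  ];
  for (const f of review.findings) {
    // One bullet per finding, with evidence and action as indented continuation lines.
    out.push(`- **${f.severity.toUpperCase()}** ${f.title} (${f.id})`);
    out.push(`  Evidence: ${f.evidence}`);
    out.push(`  Action: ${f.action}`);
  }
  return `${out.join("\n")}\n`;
}
```

Because every field comes straight from the JSON, the `.json` file remains the source of truth and the `.md` file can be regenerated from it at any time.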
31 changes: 31 additions & 0 deletions data-fabrication-anomaly-assistant/demo.js
@@ -0,0 +1,31 @@
import fs from "node:fs";
import path from "node:path";
import {
  evaluateFabricationAnomalyPacket,
  renderSummarySvg,
  summarizeReview,
} from "./src/assistant.js";
import { cleanPacket, riskyPacket } from "./src/samplePackets.js";

const reportsDir = path.join(process.cwd(), "reports");
fs.mkdirSync(reportsDir, { recursive: true });

const clean = evaluateFabricationAnomalyPacket(cleanPacket);
const risky = evaluateFabricationAnomalyPacket(riskyPacket);

fs.writeFileSync(
  path.join(reportsDir, "clean-review.json"),
  `${JSON.stringify(clean, null, 2)}\n`,
);
fs.writeFileSync(
  path.join(reportsDir, "risky-review.json"),
  `${JSON.stringify(risky, null, 2)}\n`,
);
fs.writeFileSync(path.join(reportsDir, "risky-review.md"), summarizeReview(risky));
fs.writeFileSync(path.join(reportsDir, "summary.svg"), renderSummarySvg(risky));

console.log("Wrote data fabrication anomaly assistant reports:");
console.log("- reports/clean-review.json");
console.log("- reports/risky-review.json");
console.log("- reports/risky-review.md");
console.log("- reports/summary.svg");
50 changes: 50 additions & 0 deletions data-fabrication-anomaly-assistant/make-demo-video.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
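import fs from "node:fs";
import { spawnSync } from "node:child_process";
import path from "node:path";
import { riskyPacket } from "./src/samplePackets.js";
import { evaluateFabricationAnomalyPacket } from "./src/assistant.js";

const reportsDir = path.join(process.cwd(), "reports");
// Create reports/ so this script also works when `npm run demo` has not run first.
fs.mkdirSync(reportsDir, { recursive: true });
const demoMp4 = path.join(reportsDir, "demo.mp4");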
const resultPacket = evaluateFabricationAnomalyPacket(riskyPacket);

function escapeDrawtext(text) {
  return text.replaceAll("\\", "\\\\").replaceAll(":", "\\:").replaceAll("'", "\\'");
}

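// Windows-specific font path; the drive-letter colon is backslash-escaped for the
// ffmpeg filter string. Point this at a local .ttf file on other platforms.
const font = "C\\:/Windows/Fonts/arial.ttf";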
const lines = [
  "Data Fabrication Anomaly Assistant",
  `Decision ${resultPacket.decision} | Findings ${resultPacket.findingCount}`,
  "Flags duplicate rows, invalid timestamps, digit preference, smooth series, and perfect separation.",
  `Audit digest ${resultPacket.auditDigest}`,
];
const drawText = lines
  .map(
    (line, index) =>
      `drawtext=fontfile='${font}':text='${escapeDrawtext(line)}':x=48:y=${64 + index * 72}:fontsize=${index === 0 ? 34 : 24}:fontcolor=${index === 1 ? "0xffdddd" : "white"}`,
  )
.join(",");

const result = spawnSync(
  "ffmpeg",
  [
    "-y",
    "-f",
    "lavfi",
    "-i",
    "color=c=0x111827:s=960x540:r=12",
    "-t",
    "4",
    "-vf",
    `${drawText},format=yuv420p`,
    "-an",
    demoMp4,
  ],
  { encoding: "utf8" },
);

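if (result.error) {
  // spawnSync sets `error` (and leaves `status` null) when ffmpeg cannot be started, e.g. not on PATH.
  throw result.error;
}
if (result.status !== 0) {
  throw new Error(result.stderr || "ffmpeg failed to render demo.mp4");
}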

console.log(`Wrote ${path.relative(process.cwd(), demoMp4)}`);
13 changes: 13 additions & 0 deletions data-fabrication-anomaly-assistant/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
{
  "name": "data-fabrication-anomaly-assistant",
  "version": "1.0.0",
  "private": true,
  "description": "Synthetic data-forensics review assistant for SCIBASE AI research review packets.",
  "type": "module",
  "scripts": {
    "check": "node --check src/assistant.js && node --check src/samplePackets.js && node --check test.js && node --check demo.js && node --check make-demo-video.js",
    "test": "node test.js",
    "demo": "node demo.js",
    "demo:video": "node make-demo-video.js"
  }
}
9 changes: 9 additions & 0 deletions data-fabrication-anomaly-assistant/reports/clean-review.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
{
  "packetId": "clean-growth-study",
  "manuscriptTitle": "Enzyme response in replicated plant growth chambers",
  "decision": "RELEASE",
  "findingCount": 0,
  "findings": [],
  "auditDigest": "5db12124295e2085",
  "assistantScope": "Synthetic data-fabrication anomaly red flags before AI peer-review output is trusted."
}
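The PR does not show how `auditDigest` is computed, so the sketch below will not reproduce `5db12124295e2085` or any other digest in these reports. As one hypothetical, dependency-free way to derive a stable 16-hex-character fingerprint of a review, it applies 64-bit FNV-1a to the UTF-8 bytes of the review JSON with the digest field itself excluded. `fnv1a64Hex` and `auditDigestSketch` are illustrative names, and FNV-1a is a non-cryptographic hash: it detects accidental changes, not deliberate tampering.

```javascript
// 64-bit FNV-1a constants (offset basis and prime), using BigInt for exact 64-bit math.
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

// Hash the UTF-8 bytes of `text` and return 16 lowercase hex characters.
function fnv1a64Hex(text) {
  let hash = FNV_OFFSET;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= BigInt(byte);
    hash = (hash * FNV_PRIME) & MASK_64;
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical: fingerprint everything except the digest field. Key order matters,
// because JSON.stringify serializes properties in insertion order.
function auditDigestSketch(review) {
  const { auditDigest: _ignored, ...rest } = review;
  return fnv1a64Hex(JSON.stringify(rest));
}
```

For audit use against adversarial edits, a truncated SHA-256 from `node:crypto` would be the safer choice.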
Binary file not shown.
45 changes: 45 additions & 0 deletions data-fabrication-anomaly-assistant/reports/risky-review.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
{
  "packetId": "risky-cytokine-study",
  "manuscriptTitle": "Cytokine response claims from a disputed assay batch",
  "decision": "HOLD",
  "findingCount": 5,
  "findings": [
    {
      "id": "duplicate-measurement-rows",
      "severity": "high",
      "title": "Repeated measurement rows need raw-data verification",
      "evidence": "1 duplicate row pair(s): #12/#13",
      "action": "Hold AI review output until source instruments, audit trail, and import logs explain the repeated rows."
    },
    {
      "id": "invalid-collection-timestamps",
      "severity": "high",
      "title": "Collection timestamps are not machine-auditable",
      "evidence": "Invalid timestamp rows: #14",
      "action": "Require normalized ISO-8601 collection timestamps before the assistant cites this dataset."
    },
    {
      "id": "terminal-digit-preference",
      "severity": "medium",
      "title": "Terminal digit distribution is unusually concentrated",
      "evidence": "50% of numeric measurements end in 0",
      "action": "Ask for raw instrument exports or an explanation of rounding rules before trusting fine-grained effects."
    },
    {
      "id": "over-smooth-measurement-series",
      "severity": "medium",
      "title": "Measurement series is smoother than expected for independent observations",
      "evidence": "control delta sd/value sd=0",
      "action": "Route to statistical reviewer for instrument drift, interpolation, or synthetic-row checks."
    },
    {
      "id": "perfect-between-group-separation",
      "severity": "medium",
      "title": "Groups separate perfectly without overlap",
      "evidence": "control: n=6, range=10-12.5; treated: n=8, range=20-23",
      "action": "Require preregistered exclusion rules and raw row provenance before strong causal or classification claims are released."
    }
  ],
  "auditDigest": "6466f6cc655b23a7",
  "assistantScope": "Synthetic data-fabrication anomaly red flags before AI peer-review output is trusted."
}
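Two of the medium findings quote statistics: "50% of numeric measurements end in 0" and "control delta sd/value sd=0". The sketch below shows one plausible reading of each. The definitions are assumptions, and the module's thresholds are not shown. The terminal digit is taken as the last printed digit of each value. The smoothness ratio divides the standard deviation of successive differences by the standard deviation of the values themselves. For independent, identically distributed observations, `Var(x[i+1] - x[i]) = 2σ²`, so the ratio tends toward √2 ≈ 1.41; a ratio near 0 means the series climbs in near-constant steps. `terminalDigitProfile`, `sd`, and `deltaToValueSdRatio` are hypothetical names.

```javascript
// Hypothetical helpers; not the definitions used by src/assistant.js.

// Most common last printed digit and its share of all numeric values.
function terminalDigitProfile(values) {
  const counts = new Map();
  let total = 0;
  for (const value of values) {
    const text = String(Math.abs(value)); // e.g. 12.5 -> "12.5", 20 -> "20"
    const digit = text[text.length - 1];
    if (digit < "0" || digit > "9") continue; // skips NaN, Infinity
    counts.set(digit, (counts.get(digit) ?? 0) + 1);
    total += 1;
  }
  let topDigit = null;
  let topCount = 0;
  for (const [digit, count] of counts) {
    if (count > topCount) {
      topDigit = digit;
      topCount = count;
    }
  }
  return { topDigit, share: total === 0 ? 0 : topCount / total };
}

// Population standard deviation.
function sd(xs) {
  const mean = xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return Math.sqrt(xs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / xs.length);
}

// SD of successive differences over SD of values, in collection order.
// Returns null when the series is too short or constant.
function deltaToValueSdRatio(series) {
  if (series.length < 3) return null;
  const deltas = series.slice(1).map((x, i) => x - series[i]);
  const valueSd = sd(series);
  return valueSd === 0 ? null : sd(deltas) / valueSd;
}

console.log(terminalDigitProfile([10, 12.5, 20, 23]));
console.log(deltaToValueSdRatio([10, 10.5, 11, 11.5, 12, 12.5]));
```

An evenly stepped series such as 10, 10.5, ..., 12.5 has identical deltas, so its ratio is exactly 0, which is the pattern the "control delta sd/value sd=0" evidence describes.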
22 changes: 22 additions & 0 deletions data-fabrication-anomaly-assistant/reports/risky-review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Data fabrication anomaly review: Cytokine response claims from a disputed assay batch

Decision: **HOLD**
Audit digest: `6466f6cc655b23a7`

## Findings

- **HIGH** Repeated measurement rows need raw-data verification (duplicate-measurement-rows)
  Evidence: 1 duplicate row pair(s): #12/#13
  Action: Hold AI review output until source instruments, audit trail, and import logs explain the repeated rows.
- **HIGH** Collection timestamps are not machine-auditable (invalid-collection-timestamps)
  Evidence: Invalid timestamp rows: #14
  Action: Require normalized ISO-8601 collection timestamps before the assistant cites this dataset.
- **MEDIUM** Terminal digit distribution is unusually concentrated (terminal-digit-preference)
  Evidence: 50% of numeric measurements end in 0
  Action: Ask for raw instrument exports or an explanation of rounding rules before trusting fine-grained effects.
- **MEDIUM** Measurement series is smoother than expected for independent observations (over-smooth-measurement-series)
  Evidence: control delta sd/value sd=0
  Action: Route to statistical reviewer for instrument drift, interpolation, or synthetic-row checks.
- **MEDIUM** Groups separate perfectly without overlap (perfect-between-group-separation)
  Evidence: control: n=6, range=10-12.5; treated: n=8, range=20-23
  Action: Require preregistered exclusion rules and raw row provenance before strong causal or classification claims are released.
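The perfect-separation evidence ("control: n=6, range=10-12.5; treated: n=8, range=20-23") can be produced by summarizing each group's size and range and checking whether the ranges overlap. This is a minimal sketch assuming a hypothetical row shape `{ group, value }`; `checkPerfectSeparation` is an illustrative name, not the module's function. Ordering groups by their minimum means that only neighbouring ranges need comparing: if each range ends strictly below the next one starts, no two ranges can overlap.

```javascript
// Hypothetical row shape: { group, value }. Not the module's implementation.
function checkPerfectSeparation(rows) {
  // Collect values per group, preserving first-seen group order for the evidence string.
  const groups = new Map();
  for (const { group, value } of rows) {
    if (!groups.has(group)) groups.set(group, []);
    groups.get(group).push(value);
  }
  const summaries = [...groups].map(([name, values]) => ({
    name,
    n: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
  }));
  // Sorted by minimum, ranges are disjoint exactly when each ends below the next one's start.
  const byMin = [...summaries].sort((a, b) => a.min - b.min);
  const separated =
    byMin.length >= 2 && byMin.every((s, i) => i === 0 || byMin[i - 1].max < s.min);
  const evidence = summaries
    .map((s) => `${s.name}: n=${s.n}, range=${s.min}-${s.max}`)
    .join("; ");
  return { separated, evidence };
}

const sampleRows = [
  ...[10, 10.5, 11, 11.5, 12, 12.5].map((value) => ({ group: "control", value })),
  ...[20, 21, 22, 23].map((value) => ({ group: "treated", value })),
];
console.log(checkPerfectSeparation(sampleRows));
```

Perfect separation is not proof of fabrication, since large true effects separate cleanly too. That is why the finding asks for preregistered exclusion rules and raw-row provenance rather than rejecting the packet outright.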
11 changes: 11 additions & 0 deletions data-fabrication-anomaly-assistant/reports/summary.svg