Add structured abstract consistency assistant #419
Hardening update pushed in 4ac8b8c: preserved same-code findings for different evidence targets, so methods and results sample-size mismatches are both reported instead of one being collapsed. Validation refreshed locally: `npm run check`, `npm test` (4 tests), `npm run demo`, `npm run video`, `ffprobe` on `reports/demo.mp4`, `git diff --check`, and a sensitive-term scan with no matches.
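For reviewers, here is a minimal sketch of the idea behind keeping same-code findings apart. It is not the code from 4ac8b8c: the finding shape (`code`, `target`, `message`), the `SAMPLE_SIZE_MISMATCH` code name, and the `dedupeFindings` helper are all assumptions made up for illustration. The point is only that de-duplicating on the finding code alone collapses the methods and results mismatches, while de-duplicating on code plus evidence target keeps both.

```javascript
// Hypothetical finding shape: { code, target, message }.
// Keying on `code` alone would merge the methods and results findings;
// keying on [code, target] keeps one finding per evidence target.
function dedupeFindings(findings) {
  const seen = new Map();
  for (const finding of findings) {
    const key = JSON.stringify([finding.code, finding.target]);
    if (!seen.has(key)) {
      seen.set(key, finding); // first occurrence wins, exact repeats are dropped
    }
  }
  return [...seen.values()];
}

// Illustrative input: the third entry repeats the first one exactly.
const exampleFindings = [
  { code: "SAMPLE_SIZE_MISMATCH", target: "methods", message: "methods count differs" },
  { code: "SAMPLE_SIZE_MISMATCH", target: "results", message: "results count differs" },
  { code: "SAMPLE_SIZE_MISMATCH", target: "methods", message: "methods count differs" },
];

console.log(dedupeFindings(exampleFindings).map((f) => `${f.code}@${f.target}`));
```

With this keying, the example keeps one methods finding and one results finding and drops only the exact repeat.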
|
Hardening update pushed in 5e04a92: structured abstract results now must name the actual primary endpoint instead of passing on generic I added a regression that failed before the fix with Validation refreshed locally:
|
|
Follow-up competitive hardening pass for the structured abstract consistency assistant. What changed:
Validation:
|
|
Follow-up competitive hardening pass for the structured abstract consistency assistant. What changed in 8608676:
Validation refreshed locally:
|
|
Follow-up competitive hardening pass for the structured abstract consistency assistant. What changed in 8388acc:
Validation refreshed locally:
|
|
Pushed a targeted hardening update in What changed:
Fresh validation:
|
|
Hardening update pushed in What changed:
Why this matters:
Validation refreshed locally:
|
|
Hardening update pushed in What changed:
Fresh validation:
|
|
Hardening update pushed in Validation refreshed locally:
|
|
Hardening update pushed in Fresh validation from
|
|
Pushed a focused hardening commit for source endpoint reconciliation: New regression now blocks release when methods evidence and results evidence name different primary endpoints, emits |
|
Pushed focused hardening commit ce6eb35 for missing source methods primary-endpoint evidence. New regression now holds release with `MISSING_METHODS_ENDPOINT` when methods evidence omits `primaryEndpoint` while results/abstract evidence names the endpoint, routes `attach_source_evidence`, and marks `methodsAligned=false`. Verification passed: red regression captured first, `npm test` (32), `npm run check`, `npm run demo`, `npm run video`, 20 JSON packet parses including `missing-methods-endpoint-packet.json`, `ffprobe` H.264 1280x720 24fps 7.5s, diff checks, staged allowlist check, and focused restricted-string scan. PR body has the refreshed evidence list.
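A rough sketch of the reconciliation rule, for anyone skimming the thread. It is an illustration under assumptions, not the code in ce6eb35. Only `MISSING_METHODS_ENDPOINT`, `primaryEndpoint`, `attach_source_evidence`, `methodsAligned`, and the two decision names come from this PR. The packet shape, the `reconcileEndpoints` helper, and the `MISSING_RESULTS_ENDPOINT` and `SOURCE_ENDPOINT_MISMATCH` code names are hypothetical.

```javascript
// Normalize an endpoint name for comparison; non-strings count as missing.
function normalizeEndpoint(value) {
  return typeof value === "string" ? value.trim().toLowerCase() : "";
}

// Hypothetical source packet shape:
//   { methods: { primaryEndpoint }, results: { primaryEndpoint } }
function reconcileEndpoints(source) {
  const methods = normalizeEndpoint(source?.methods?.primaryEndpoint);
  const results = normalizeEndpoint(source?.results?.primaryEndpoint);
  const findings = [];

  if (!methods) findings.push("MISSING_METHODS_ENDPOINT");
  if (!results) findings.push("MISSING_RESULTS_ENDPOINT"); // hypothetical code name
  if (methods && results && methods !== results) {
    findings.push("SOURCE_ENDPOINT_MISMATCH"); // hypothetical code name
  }

  const missing = !methods || !results;
  return {
    decision: findings.length === 0 ? "release_peer_review_packet" : "hold_peer_review_packet",
    // Assumption: only missing evidence is routed to attach_source_evidence;
    // the thread does not state a route for mismatches.
    route: missing ? "attach_source_evidence" : null,
    methodsAligned: methods !== "" && methods === results,
    findings,
  };
}

console.log(reconcileEndpoints({
  methods: {},
  results: { primaryEndpoint: "comment triage time" },
}));
```

The sketch treats a present-but-empty `primaryEndpoint` the same as an omitted one, which matches the conservative intent of the gate but is a design choice of this example.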
|
Pushed focused hardening commit Before the patch, Verified locally after the TDD regression:
This is a cooldown/comment-budget exception because the gap was a newly documented high-severity crash before AI peer-review gating. |
|
Hardening update pushed in Fresh validation from
|
|
Hi SCIBASE maintainers, quick check-in on this bounty claim. This PR is still ready from my side: the latest hardening keeps the structured-abstract release gate conservative around invalid assessment timestamps, and the thread has the focused test/demo evidence. Is there anything specific you'd like changed, simplified, or clarified to make review/selection easier? Happy to adjust quickly. Thanks!
|
Hardened structured-abstract assessment timestamps in Fresh gap closed:
Verification refreshed from
|
|
Quick refresh on the current hardened head for this PR: the structured abstract consistency assistant is still ready from my side. The package gates AI peer-review/editor packet release on source evidence, endpoint and sample-size alignment, result/conclusion direction, limitation language, and strict assessment timestamp validity. Current local validation from the package directory: I am leaving the head stable unless maintainers would prefer a narrower shape or specific clarification for review.
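To make "strict assessment timestamp validity" concrete, here is a small sketch of one way to do it. It is an assumption-laden illustration, not necessarily how the package does it. The `isStrictUtcTimestamp` helper and the accepted format (UTC with a trailing `Z`, optional milliseconds) are hypothetical. It relies on the fact that `Date.UTC` normalizes out-of-range components instead of rejecting them, so a round-trip comparison catches calendar-impossible values such as `2026-02-30T10:00:00Z`.

```javascript
// Accepts only YYYY-MM-DDTHH:MM:SS(.mmm)Z; this format choice is an assumption.
const ISO_UTC_PATTERN = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.\d{1,3})?Z$/;

function isStrictUtcTimestamp(value) {
  if (typeof value !== "string") return false;
  const match = ISO_UTC_PATTERN.exec(value);
  if (!match) return false;

  const [year, month, day, hour, minute, second] = match.slice(1, 7).map(Number);
  // Date.UTC rolls impossible values forward (Feb 30 becomes Mar 2),
  // so any component that changes on the round trip marks the input as invalid.
  const date = new Date(Date.UTC(year, month - 1, day, hour, minute, second));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day &&
    date.getUTCHours() === hour &&
    date.getUTCMinutes() === minute &&
    date.getUTCSeconds() === second
  );
}

for (const candidate of ["2026-02-28T10:00:00Z", "2026-02-30T10:00:00Z", "not-a-date"]) {
  console.log(candidate, isStrictUtcTimestamp(candidate));
}
```

Parsing the components by hand avoids depending on how a particular engine's string parser treats out-of-range days.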
Portfolio Comparison Refresh (2026-06-27)
1884f22; no GitHub check runs or status contexts are attached, and Algora remains Pending with Total paid $0.

/claim #16

## Summary

Adds a distinct `structured-abstract-consistency-assistant/` slice for Scientific Bounty System issue #16.

The assistant evaluates structured manuscript abstracts before AI peer-review packets or editor summaries are shown. It checks required abstract sections, affirmed methods design, negated design wording, source methods/results endpoint availability and reconciliation, negated primary endpoint wording, target-specific sample-size counts, primary endpoint naming, bidirectional result direction, certainty overclaims under exploratory or null-crossing evidence, limitation language, safety/adverse-outcome wording, and deterministic audit evidence.

## Hardening Updates

- Holds abstracts when source methods/results evidence packets are missing, so polished abstract text cannot release without authoritative comparison data.
- Holds otherwise complete abstracts with invalid or missing `assessedAt` timestamps, so AI peer-review packets cannot release with unauditable structured-abstract timing evidence.
- Holds ISO-looking but calendar-impossible `assessedAt` timestamps such as `2026-02-30T10:00:00Z`, so JavaScript date normalization cannot release AI peer-review packets with shifted timing evidence.
- Holds present source methods evidence packets that omit `primaryEndpoint`, so abstract/results alignment cannot release without a named methods endpoint anchor.
- Holds present source results evidence packets that omit `primaryEndpoint`, so generic primary-endpoint claims cannot release without a named results evidence anchor.
- Holds methods/results source endpoint disagreements, so an abstract cannot release by following only one source evidence packet.
- Holds abstracts that mention the expected methods design only to deny it, such as `not a retrospective cohort`, so substring matches cannot release contradictory method summaries.
- Holds ordinal measurement wording such as `96th percentile` when it appears where manuscript/participant counts are required.
- Holds hyphenated measurement wording such as `96-hour` or `96-point` when it appears where sample-size counts are required.
- Holds duration/effect measurements such as `96 hours` or `96 minutes` when they masquerade as count evidence.
- Holds abbreviated scientific/time units such as `96 h`, `96 mg`, and `96 mmHg` when they masquerade as count evidence.
- Holds decimal values such as `0.96` and percentage wording such as `96%` unless the actual count is also stated.
- Accepts normal comma-formatted counts such as `1,200` as valid count evidence.
- Blocks generic primary-endpoint wording unless the named endpoint from the result packet is present.
- Holds abstracts that mention the expected primary endpoint only to deny it, such as `not comment triage time`, so substring matches cannot release contradictory result summaries.
- Blocks result and conclusion direction drift in both directions, including improvement claims over no-effect/worse evidence and worse/no-effect wording over improved evidence.
- Blocks no-difference and equivalence outcome wording such as `no meaningful difference`, `no superiority`, or `outcomes were comparable` when the result packet records improvement.
- Blocks result-section or conclusion-section certainty overclaims such as statistically significant, clinically meaningful, robust, proven, or definitive when evidence is exploratory or confidence intervals cross null.
- Treats mixed phrasing such as `not statistically significant but clinically meaningful` as an overclaim instead of clearing it through the negated significant phrase.
- Requires specific limitation language beyond weak hedges such as `may` when evidence is exploratory or null-crossing.
- Allows accurate adverse-outcome wording when the result packet also records a worse direction, while blocking safety-benefit or negated-safety-concern conclusions over worsened adverse outcomes.

## Non-overlap

This is scoped to structured abstract consistency before AI review release. It does not duplicate broad assistant suites, evidence/protocol trace modules, statistics review, research-gap planning, rebuttal packs, ethics/data review, citation context, reporting guidelines, benchmark leakage, figure/table consistency, analysis-variable provenance, domain templates, grant fit, limitations disclosure, uncertainty calibration, supplement readiness, prompt safety, study power, COI/funding, retraction, preregistration, external validity, image integrity, assay-control/calibration, literature freshness, randomization/blinding, Bayesian prior sensitivity, systematic screening drift, sample chain-of-custody, or model-assumption diagnostics slices.

## Validation

- Latest impossible-calendar timestamp regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when `assessedAt` was `2026-02-30T10:00:00Z`.
- Latest invalid-assessment-timestamp regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when an otherwise complete abstract used `assessedAt: not-a-date`.
- Latest missing-methods-endpoint regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when source methods evidence omitted `primaryEndpoint` but the abstract/results evidence named `comment triage time`.
- Latest source-endpoint regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when methods evidence named `reviewer workload score` and results evidence named `comment triage time`.
- Latest missing-results-endpoint regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when source results evidence omitted `primaryEndpoint` but the abstract claimed the primary endpoint improved.
- Latest no-difference/equivalence outcome regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when improved result evidence was summarized as `no meaningful difference` / comparable outcomes.
- Prior abbreviated-unit regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when abstract methods/results used `96 h` duration wording where count evidence was required.
- Prior missing-source-evidence regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when a complete abstract lacked source methods/results evidence packets.
- Prior negated-endpoint regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when an abstract stated `not comment triage time` while the result packet required `comment triage time`.
- Prior negated-design regression failed before implementation with `release_peer_review_packet` instead of `hold_peer_review_packet` when an abstract stated `not a retrospective cohort` while the method packet required `retrospective cohort`.
- `npm test` from `structured-abstract-consistency-assistant` -> structured-abstract-consistency-assistant tests passed (34).
- `npm run check` -> syntax checks passed for index, sample data, test, and demo files.
- `npm run demo` -> regenerated 23 JSON packets plus Markdown/SVG evidence, including `invalid-assessed-at-packet.json` and `malformed-manuscript-packet.json`.
- `npm run video` -> regenerated `reports/demo.mp4`.
- All 23 generated JSON packets parsed successfully.
- `ffprobe` verified `reports/demo.mp4` as H.264, 1280x720, 24 fps, 7.5s, 115,991 bytes.
- `git diff --check` and `git diff --cached --check` passed; only Windows line-ending normalization warnings appeared before staging.
- Focused restricted-string scan returned no credential, payout, or token strings.
- GitHub PR merge state after push: `CLEAN`; no checks are reported for this branch.
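The negated-design and negated-endpoint regressions above come down to one idea: a plain substring match is not enough, because the expected phrase can appear only to be denied. The sketch below shows one simple way to express that. It is an illustration under assumptions, not the package's implementation. The `affirmsPhrase` helper, the negation cue list, and the four-word lookback window are all hypothetical choices.

```javascript
// Hypothetical cue list; a real gate would likely need a broader one.
const NEGATION_CUES = /\b(?:not|no|never|without|rather than|instead of)\b/i;

// Returns true only if `phrase` occurs at least once without a negation cue
// in the few words that precede it inside the same clause.
function affirmsPhrase(text, phrase) {
  const haystack = text.toLowerCase();
  const needle = phrase.toLowerCase();
  let index = haystack.indexOf(needle);

  while (index !== -1) {
    // The clause starts after the nearest preceding '.', ';' or ','.
    const clauseStart = Math.max(
      haystack.lastIndexOf(".", index - 1),
      haystack.lastIndexOf(";", index - 1),
      haystack.lastIndexOf(",", index - 1)
    ) + 1;
    const leadWords = haystack
      .slice(clauseStart, index)
      .trim()
      .split(/\s+/)
      .filter(Boolean)
      .slice(-4) // lookback window size is an assumption
      .join(" ");
    if (!NEGATION_CUES.test(leadWords)) return true; // affirmed occurrence found
    index = haystack.indexOf(needle, index + needle.length);
  }
  return false; // phrase absent, or every occurrence was negated
}

console.log(affirmsPhrase("This was not a retrospective cohort.", "retrospective cohort"));
console.log(affirmsPhrase("Design: retrospective cohort of 96 manuscripts.", "retrospective cohort"));
```

A window-based check like this is deliberately conservative: it can hold an abstract that a human would read as affirmed, which seems consistent with the gate's bias toward holding.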
## Demo Artifacts

- `structured-abstract-consistency-assistant/reports/blocked-packet.json`
- `structured-abstract-consistency-assistant/reports/revision-packet.json`
- `structured-abstract-consistency-assistant/reports/negated-design-packet.json`
- `structured-abstract-consistency-assistant/reports/negated-primary-endpoint-packet.json`
- `structured-abstract-consistency-assistant/reports/missing-source-evidence-packet.json`
- `structured-abstract-consistency-assistant/reports/result-certainty-packet.json`
- `structured-abstract-consistency-assistant/reports/mixed-certainty-packet.json`
- `structured-abstract-consistency-assistant/reports/conclusion-certainty-packet.json`
- `structured-abstract-consistency-assistant/reports/weak-limitation-packet.json`
- `structured-abstract-consistency-assistant/reports/percentage-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/decimal-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/duration-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/abbreviated-unit-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/hyphenated-measurement-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/ordinal-sample-size-packet.json`
- `structured-abstract-consistency-assistant/reports/no-difference-outcome-packet.json`
- `structured-abstract-consistency-assistant/reports/missing-results-endpoint-packet.json`
- `structured-abstract-consistency-assistant/reports/missing-methods-endpoint-packet.json`
- `structured-abstract-consistency-assistant/reports/source-endpoint-mismatch-packet.json`
- `structured-abstract-consistency-assistant/reports/invalid-assessed-at-packet.json`
- `structured-abstract-consistency-assistant/reports/impossible-assessed-at-packet.json`
- `structured-abstract-consistency-assistant/reports/clean-packet.json`
- `structured-abstract-consistency-assistant/reports/abstract-consistency-report.md`
- `structured-abstract-consistency-assistant/reports/summary.svg`
- `structured-abstract-consistency-assistant/reports/demo.mp4`

Synthetic data only. No external services, credentials, live databases, private manuscripts, or payment data are used. AI-assisted with OpenAI Codex; I reviewed and locally verified the diff before submitting.
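The `*-sample-size-packet.json` artifacts all exercise the same rule: a number only counts as sample-size evidence if it is a plain or comma-grouped integer that is not really a measurement. Below is a minimal sketch of that rule for readers who want the gist without opening the packets. It is a simplified illustration under assumptions, not the package's code. The `extractCounts` and `hasCountEvidence` helpers and the `UNIT_WORDS` list are hypothetical.

```javascript
// Hypothetical, deliberately short unit list (lowercased).
const UNIT_WORDS = new Set([
  "h", "hr", "hrs", "hour", "hours", "min", "mins", "minute", "minutes",
  "s", "sec", "seconds", "day", "days", "week", "weeks",
  "mg", "g", "kg", "ml", "mmhg", "percent", "point", "points",
]);

function extractCounts(text) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const counts = [];
  tokens.forEach((raw, i) => {
    // Trim surrounding brackets, quotes and sentence punctuation, then an "n=" prefix.
    const token = raw
      .replace(/^[("'\[]+|[)"'\].,;:]+$/g, "")
      .replace(/^n=/i, "");
    // Only plain integers or comma-grouped integers qualify; this rejects
    // "96th", "96-hour", "0.96" and "96%" in one step.
    if (!/^(?:\d{1,3}(?:,\d{3})+|\d+)$/.test(token)) return;
    // A following unit word or percent sign means the number is a measurement.
    const next = (tokens[i + 1] || "").replace(/[^A-Za-z%]/g, "").toLowerCase();
    if (next === "%" || UNIT_WORDS.has(next)) return;
    counts.push(Number(token.replace(/,/g, "")));
  });
  return counts;
}

function hasCountEvidence(text, expectedCount) {
  return extractCounts(text).includes(expectedCount);
}

console.log(hasCountEvidence("We reviewed 1,200 manuscripts.", 1200));
console.log(hasCountEvidence("Follow-up lasted 96 h.", 96));
```

One known gap in this sketch: a count that ends a sentence and is followed by a unit-like word in the next sentence is rejected, which errs on the side of holding.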