feat(runtime): publish the optimization suite + corpus flywheel wiring #209

Merged
drewstone merged 2 commits into main from feat/publish-strategy-suite
Jun 9, 2026
@drewstone

Contributor

What

Closes packaging gap G1 from the integration playbook and wires the corpus flywheel (the across-run layer's write side) — the two parallel tracks.

Published suite (graduates from bench/ R&D into src/, exported via /loops):

  • src/runtime/strategy.ts — the domain-blind core: the Environment seam (AgenticSurface), the canonical depth/breadth drivers (Supervisor + observe() — the +16.4pp configuration), open Strategy + sample/refine, defineStrategy (author a loop in ~20 lines from shot()+critique()), adaptiveRefine, runAgentic.
  • src/runtime/run-benchmark.ts: runBenchmark/printBenchmarkReport; paired lift now via agent-eval's pairedBootstrap (substrate stats, not a bench-local copy).
  • bench consumers rewired to package imports; bench/src/agentic.ts + run-benchmark.mts deleted (−1000 LOC of R&D duplication). Products can now import { Environment, defineStrategy, runBenchmark } from '@tangle-network/agent-runtime/loops' — the playbook's step 3/4 unblocked.
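
To make the "author a loop from shot()+critique()" idea concrete, here is a minimal, self-contained sketch. It does not import the published package: every type and the `defineStrategy` body below are hypothetical stand-ins written for illustration, and the real signatures in src/runtime/strategy.ts may differ. The assumption is that a strategy alternates an attempt (`shot`) with feedback (`critique`) and keeps the best-scoring attempt.

```typescript
// Hypothetical stand-ins: these are NOT the package's real signatures.
interface Environment {
  // Scores a candidate answer for a task; higher is better.
  score(task: string, answer: string): number;
}

interface Strategy {
  name: string;
  run(env: Environment, task: string, budget: number): { answer: string; score: number };
}

interface StrategySteps {
  name: string;
  // Produce an attempt, optionally conditioned on feedback from the previous one.
  shot(task: string, feedback?: string): string;
  // Turn a scored attempt into feedback for the next shot.
  critique(task: string, answer: string, score: number): string;
}

// Assumed shape: wrap shot() + critique() in a keep-the-best refine loop.
function defineStrategy(steps: StrategySteps): Strategy {
  return {
    name: steps.name,
    run(env, task, budget) {
      let best = { answer: "", score: -Infinity };
      let feedback: string | undefined;
      for (let i = 0; i < budget; i++) {
        const answer = steps.shot(task, feedback);
        const score = env.score(task, answer);
        if (score > best.score) best = { answer, score };
        feedback = steps.critique(task, answer, score);
      }
      return best;
    },
  };
}

// Toy environment: the score is the fraction of the target matched as a prefix.
const targetWord = "hello";
const toyEnv: Environment = {
  score: (_task, answer) => {
    let n = 0;
    while (n < targetWord.length && answer[n] === targetWord[n]) n++;
    return n / targetWord.length;
  },
};

// Toy strategy: each critique reveals one more correct character.
const reveal = defineStrategy({
  name: "reveal-one-char",
  shot: (_task, feedback) => feedback ?? "",
  critique: (_task, answer, _score) => targetWord.slice(0, answer.length + 1),
});

const revealResult = reveal.run(toyEnv, "spell the word", 6);
console.log(revealResult);
```

The toy environment and strategy are deliberately trivial. The point is only that the loop author supplies two small functions, while the driver owns budgeting and best-attempt bookkeeping.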

Corpus flywheel wiring (Track A's substrate):

  • AgenticOptions.corpus/corpusTags — the analyst's observe() pass appends trace-derived facts (zero extra LLM calls); priming = the caller folds corpus.query() facts into the task prompt.
  • bench/src/eops-corpus-ab.mts — THE across-run experiment (primed-vs-cold, same stream, equal compute; slope + paired lift + frozen holdout; the four falsifiers from layer-across-run.md designed in). Smoke verified the loop: run 1 wrote 3 facts, run 2 injected them. Full n=16+holdout run in flight — result reported separately.
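
The write/read split described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `MemoryCorpus`, `Fact`, `TraceStep`, `observeTrace` and `primePrompt` are hypothetical names, and the rule "one fact per failed tool call" is invented for the example. What it is meant to show is that the write side derives facts from the trace without any model call, and the read side is plain prompt assembly done by the caller.

```typescript
// Hypothetical in-memory corpus; the real corpus interface is not shown in this PR.
interface Fact {
  text: string;
  tags: string[];
}

class MemoryCorpus {
  private facts: Fact[] = [];

  append(fact: Fact): void {
    this.facts.push(fact);
  }

  // Return up to k facts that share at least one tag with the query.
  query(tags: string[], k: number): Fact[] {
    return this.facts
      .filter((f) => f.tags.some((t) => tags.includes(t)))
      .slice(0, k);
  }
}

interface TraceStep {
  tool: string;
  ok: boolean;
}

// Write side (assumed rule): one fact per failed step, derived from the trace
// alone, so no extra LLM call is needed.
function observeTrace(trace: TraceStep[], corpus: MemoryCorpus, corpusTags: string[]): number {
  let written = 0;
  for (const step of trace) {
    if (!step.ok) {
      corpus.append({ text: `tool ${step.tool} failed in an earlier run`, tags: corpusTags });
      written++;
    }
  }
  return written;
}

// Read side: the caller folds queried facts into the task prompt.
function primePrompt(
  basePrompt: string,
  corpus: MemoryCorpus,
  tags: string[],
  k: number,
): { prompt: string; injected: number } {
  const facts = corpus.query(tags, k);
  if (facts.length === 0) return { prompt: basePrompt, injected: 0 };
  const lines = facts.map((f) => `- ${f.text}`).join("\n");
  return {
    prompt: `${basePrompt}\n\nKnown facts from earlier runs:\n${lines}`,
    injected: facts.length,
  };
}

// Two consecutive runs over one shared corpus.
const corpus = new MemoryCorpus();
const corpusTags = ["eops"];

const run1 = primePrompt("Solve the task.", corpus, corpusTags, 5); // corpus is empty
const factsWritten = observeTrace(
  [
    { tool: "search", ok: true },
    { tool: "deploy", ok: false },
    { tool: "rollback", ok: false },
  ],
  corpus,
  corpusTags,
);
const run2 = primePrompt("Solve the task.", corpus, corpusTags, 5); // now primed

console.log(run1.injected, factsWritten, run2.injected);
```

The trace and fact texts are made up. The sketch only mirrors the shape of the smoke check (first run writes, second run reads), not its data.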

Test

typecheck clean (both packages) · 680 tests pass · strategy-demo runs all four strategies end-to-end through the published exports · corpus A/B smoke verified write+read.

drewstone added 2 commits June 9, 2026 17:16
AgenticOptions gains corpus/corpusTags — the analyst's observe() pass now appends
trace-derived facts (the flywheel write side) with zero extra LLM calls; priming (the
read side) is the caller folding corpus.query() facts into the task systemPrompt.

eops-corpus-ab.mts: THE across-run experiment (layer-across-run.md). Two arms over the
same task stream, same order, canonical depth: cold (fresh every run) vs primed
(query top-k facts before, observe-append after). Equal compute by construction.
Reports paired lift, the SLOPE (first-half vs second-half — the flywheel signature is
a growing advantage), fact-injection counts, and a frozen holdout (fresh tasks, corpus
read-only). Smoke verified: run 1 wrote 3 facts, run 2 injected them.
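
As a rough illustration of the slope report described above, the sketch below computes the paired lift and a first-half versus second-half comparison over a run stream. The type and function names are hypothetical and the scores are made-up numbers chosen for easy arithmetic, not results from this experiment. It assumes the slope is simply the difference between the two halves' mean paired lift.

```typescript
// One position in the task stream, scored under both arms (hypothetical shape).
interface PairedRun {
  cold: number;
  primed: number;
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Mean of per-task differences (primed minus cold).
function pairedLift(runs: PairedRun[]): number {
  return mean(runs.map((r) => r.primed - r.cold));
}

// Assumed definition: slope = lift over the second half minus lift over the first half.
// A positive value means the primed arm's advantage grows as the corpus fills.
function liftSlope(runs: PairedRun[]): { firstHalf: number; secondHalf: number; slope: number } {
  const mid = Math.floor(runs.length / 2);
  const firstHalf = pairedLift(runs.slice(0, mid));
  const secondHalf = pairedLift(runs.slice(mid));
  return { firstHalf, secondHalf, slope: secondHalf - firstHalf };
}

// Made-up scores for illustration only.
const stream: PairedRun[] = [
  { cold: 0.5, primed: 0.5 },
  { cold: 0.5, primed: 0.5 },
  { cold: 0.5, primed: 0.75 },
  { cold: 0.5, primed: 0.75 },
];

const overallLift = pairedLift(stream);
const halves = liftSlope(stream);
console.log(overallLift, halves);
```

With an empty corpus at the start, a flat or negative slope would count against the flywheel hypothesis even if the overall lift were positive, which is why the two numbers are reported separately.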
…defineStrategy/runBenchmark

Closes packaging gap G1 (docs/research/product-integration-playbook.md): the suite
graduates from bench/ R&D into the published package, importable by any product.

- src/runtime/strategy.ts — the domain-blind core moved from bench/src/agentic.ts:
  the AgenticSurface seam (Environment), the canonical depth/breadth drivers
  (Supervisor + observe() — the +16.4pp configuration), open Strategy + sample/refine,
  defineStrategy with the shot()/critique() steps, adaptiveRefine, runAgentic, and the
  corpus flywheel threading (AgenticOptions.corpus → the analyst's observe() appends
  trace-derived facts).
- src/runtime/run-benchmark.ts — runBenchmark/printBenchmarkReport over an Environment;
  the paired lift now uses agent-eval's pairedBootstrap (the substrate's stats) instead
  of a bench-local copy.
- Exported via /loops. bench consumers (agentic-eops, agentic-run, eops-gepa,
  eops-corpus-ab, examples/strategy-demo) now import from the package;
  bench/src/agentic.ts + run-benchmark.mts deleted (-1000 LOC of R&D duplication).
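
For readers unfamiliar with the statistic behind the paired lift, here is a generic percentile paired bootstrap. It is a sketch of the general technique only: the function name, parameters and return shape are assumptions and are not agent-eval's `pairedBootstrap` signature. A small seeded generator keeps the example deterministic, and the input scores are invented.

```typescript
// Small seeded PRNG (mulberry32-style) so resampling is reproducible.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: percentile bootstrap over per-task score differences.
function pairedBootstrapSketch(
  baseline: number[],
  treatment: number[],
  resamples = 2000,
  seed = 1,
): { lift: number; lo: number; hi: number } {
  if (baseline.length === 0 || baseline.length !== treatment.length) {
    throw new Error("baseline and treatment must be non-empty and the same length");
  }
  // Pairing: each task contributes one difference, so task difficulty cancels out.
  const deltas = treatment.map((t, i) => t - baseline[i]);
  const n = deltas.length;
  const lift = deltas.reduce((a, b) => a + b, 0) / n;

  const rand = seededRandom(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    // Resample task indices with replacement, keeping each pair intact.
    for (let i = 0; i < n; i++) sum += deltas[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);

  // Percentile interval: the 2.5th and 97.5th percentiles of resampled means.
  const at = (q: number): number => means[Math.min(resamples - 1, Math.floor(q * resamples))];
  return { lift, lo: at(0.025), hi: at(0.975) };
}

// Made-up scores for illustration only.
const baselineScores = [0.5, 0.5, 0.5, 0.5];
const treatmentScores = [0.75, 0.625, 0.75, 1.0];
const bootstrap = pairedBootstrapSketch(baselineScores, treatmentScores);
console.log(bootstrap);
```

Resampling whole pairs rather than each arm independently is what makes the interval a statement about per-task improvement. An interval that excludes zero is the usual reading of a reliable lift.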

Verified: typecheck clean both packages; 680 tests pass; strategy-demo runs all four
strategies end-to-end through the published exports.

@tangletools tangletools left a comment

Contributor

✅ Auto-approved PR — d879ac93

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-09T23:21:07Z

@drewstone drewstone merged commit 8bb31a0 into main Jun 9, 2026
1 check passed
@drewstone drewstone deleted the feat/publish-strategy-suite branch June 9, 2026 23:22
2 participants