feat(runtime): publish the optimization suite + corpus flywheel wiring #209
Merged
AgenticOptions gains corpus/corpusTags: the analyst's observe() pass now appends trace-derived facts (the flywheel write side) with zero extra LLM calls; priming (the read side) is the caller folding corpus.query() facts into the task systemPrompt.

eops-corpus-ab.mts: THE across-run experiment (layer-across-run.md). Two arms run over the same task stream, in the same order, at canonical depth: cold (fresh every run) vs primed (query top-k facts before, observe-append after). Compute is equal by construction. It reports paired lift, the SLOPE (first half vs second half; the flywheel signature is a growing advantage), fact-injection counts, and a frozen holdout (fresh tasks, corpus read-only). Smoke verified: run 1 wrote 3 facts, run 2 injected them.
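The paired-lift and SLOPE readouts reduce to simple arithmetic over per-task scores. A minimal sketch, assuming each arm yields one numeric score per task and that both arrays are aligned by index (same stream, same order). The names `pairedLift` and `liftSlope` are hypothetical, not the eops-corpus-ab.mts code:

```typescript
// Hypothetical sketch: cold and primed per-task scores, aligned by index
// because both arms run the same task stream in the same order.

/** Mean of a non-empty array. */
function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("mean of empty array");
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

/** Per-task paired differences (primed minus cold). */
function pairedDiffs(cold: number[], primed: number[]): number[] {
  if (cold.length !== primed.length) throw new Error("arms must be paired");
  return primed.map((p, i) => p - cold[i]);
}

/** Paired lift: mean of the per-task differences. */
function pairedLift(cold: number[], primed: number[]): number {
  return mean(pairedDiffs(cold, primed));
}

/**
 * Slope: second-half lift minus first-half lift. A positive value means the
 * primed arm's advantage grows as the corpus accumulates facts.
 * With an odd task count the middle task is excluded so both halves match in size.
 */
function liftSlope(cold: number[], primed: number[]): number {
  const d = pairedDiffs(cold, primed);
  const half = Math.floor(d.length / 2);
  if (half === 0) throw new Error("need at least 2 paired tasks");
  return mean(d.slice(d.length - half)) - mean(d.slice(0, half));
}
```

Comparing halves of a paired-difference series is a deliberately crude trend test; it needs no regression fit and stays meaningful at small n.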
…defineStrategy/runBenchmark

Closes packaging gap G1 (docs/research/product-integration-playbook.md): the suite graduates from bench/ R&D into the published package, importable by any product.

- src/runtime/strategy.ts: the domain-blind core, moved from bench/src/agentic.ts. It holds the AgenticSurface seam (Environment), the canonical depth/breadth drivers (Supervisor + observe(), the +16.4pp configuration), open Strategy + sample/refine, defineStrategy with the shot()/critique() steps, adaptiveRefine, runAgentic, and the corpus flywheel threading (AgenticOptions.corpus → the analyst's observe() appends trace-derived facts).
- src/runtime/run-benchmark.ts: runBenchmark/printBenchmarkReport over an Environment; the paired lift now uses agent-eval's pairedBootstrap (the substrate's stats) instead of a bench-local copy.
- Exported via /loops.

bench consumers (agentic-eops, agentic-run, eops-gepa, eops-corpus-ab, examples/strategy-demo) now import from the package; bench/src/agentic.ts and run-benchmark.mts are deleted (-1000 LOC of R&D duplication).

Verified: typecheck clean in both packages; 680 tests pass; strategy-demo runs all four strategies end-to-end through the published exports.
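runBenchmark now takes its paired lift from agent-eval's pairedBootstrap. That function's signature is not shown in this PR, so the following is a self-contained sketch of the general paired-bootstrap technique, not agent-eval's implementation: resample tasks with replacement while keeping each task's (baseline, candidate) pair together, then read a percentile interval off the resampled mean differences. `pairedBootstrapSketch`, its parameters, and the seeded `mulberry32` PRNG are hypothetical choices made so results are reproducible:

```typescript
// Hypothetical sketch of a paired bootstrap; NOT agent-eval's pairedBootstrap API.

/** Small seeded PRNG (mulberry32); returns values in [0, 1). */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface PairedBootstrapResult {
  meanDiff: number; // observed mean of (candidate - baseline)
  lo: number; // lower percentile bound of the resampled means
  hi: number; // upper percentile bound of the resampled means
}

function pairedBootstrapSketch(
  baseline: number[],
  candidate: number[],
  iterations = 2000,
  alpha = 0.05,
  seed = 1,
): PairedBootstrapResult {
  const n = baseline.length;
  if (n === 0 || n !== candidate.length) throw new Error("need non-empty paired arrays");
  // Pairing: resample per-task differences, so a task's two scores always move together.
  const diffs = candidate.map((c, i) => c - baseline[i]);
  const meanDiff = diffs.reduce((a, b) => a + b, 0) / n;
  const rand = mulberry32(seed);
  const means: number[] = [];
  for (let b = 0; b < iterations; b++) {
    let sum = 0;
    for (let j = 0; j < n; j++) sum += diffs[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  // Percentile interval; indices are clamped into [0, iterations - 1].
  const idx = (q: number) => Math.min(iterations - 1, Math.max(0, Math.floor(q * iterations)));
  return { meanDiff, lo: means[idx(alpha / 2)], hi: means[idx(1 - alpha / 2)] };
}
```

Resampling differences rather than each arm independently is what makes the bootstrap paired: per-task difficulty cancels out, which tightens the interval when both arms see the same tasks.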
tangletools approved these changes on Jun 9, 2026
✅ Auto-approved PR — d879ac93
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-09T23:21:07Z
What
Closes packaging gap G1 from the integration playbook and wires the corpus flywheel (the write side of the across-run layer). These are the two parallel tracks.
Published suite (graduates from `bench/` R&D into `src/`, exported via `/loops`):

- `src/runtime/strategy.ts`: the domain-blind core. It provides the `Environment` seam (`AgenticSurface`), the canonical depth/breadth drivers (Supervisor + `observe()`, the +16.4pp configuration), open `Strategy` + `sample`/`refine`, `defineStrategy` (author a loop in ~20 lines from `shot()` + `critique()`), `adaptiveRefine`, and `runAgentic`.
- `src/runtime/run-benchmark.ts`: `runBenchmark`/`printBenchmarkReport`; the paired lift now comes from agent-eval's `pairedBootstrap` (substrate stats, not a bench-local copy).
- `bench/src/agentic.ts` + `run-benchmark.mts` deleted (−1000 LOC of R&D duplication).

Products can now `import { Environment, defineStrategy, runBenchmark } from '@tangle-network/agent-runtime/loops'`, which unblocks steps 3/4 of the playbook.

Corpus flywheel wiring (Track A's substrate):
- `AgenticOptions.corpus`/`corpusTags`: the analyst's `observe()` pass appends trace-derived facts (zero extra LLM calls); priming means the caller folds `corpus.query()` facts into the task prompt.
- `bench/src/eops-corpus-ab.mts`: THE across-run experiment (primed vs cold, same stream, equal compute; slope + paired lift + frozen holdout; the four falsifiers from layer-across-run.md designed in). Smoke verified the loop: run 1 wrote 3 facts, run 2 injected them. Full n=16 + holdout run in flight; result reported separately.

Test
typecheck clean (both packages) · 680 tests pass · strategy-demo runs all four strategies end-to-end through the published exports · corpus A/B smoke verified write+read.
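The corpus write+read smoke check (run 1 writes facts, run 2 injects them) can be pictured with an in-memory stand-in. This is a sketch under stated assumptions: `InMemoryCorpus`, `append`, `query(tags, k)`, `primeSystemPrompt`, and `smoke` are hypothetical names. The PR names `corpus.query()` and `corpusTags` but does not show their signatures, and the newest-first ordering below is an arbitrary illustration, not the corpus's real top-k ranking:

```typescript
// Hypothetical in-memory stand-in for AgenticOptions.corpus; the real
// corpus interface is not shown in this PR, so append/query are assumptions.

interface Fact {
  text: string;
  tags: string[];
  run: number; // which run wrote the fact
}

class InMemoryCorpus {
  private facts: Fact[] = [];

  /** Write side: what the analyst's observe() pass would do after a run. */
  append(fact: Fact): void {
    this.facts.push(fact);
  }

  /** Read side: up to k facts sharing at least one tag, newest first. */
  query(tags: string[], k: number): Fact[] {
    return this.facts
      .filter((f) => f.tags.some((t) => tags.includes(t)))
      .reverse()
      .slice(0, k);
  }
}

/** Priming: the caller folds queried facts into the task's system prompt. */
function primeSystemPrompt(base: string, facts: Fact[]): string {
  if (facts.length === 0) return base;
  const lines = facts.map((f) => `- ${f.text}`).join("\n");
  return `${base}\n\nFacts from prior runs:\n${lines}`;
}

/** Smoke check: run 1 starts cold and writes facts; run 2 is primed with them. */
function smoke(): { run1Prompt: string; injected: number; run2Prompt: string } {
  const corpus = new InMemoryCorpus();
  const tags = ["eops"];
  const base = "You are an operator.";
  // Run 1: the corpus is empty, so nothing is injected.
  const run1Prompt = primeSystemPrompt(base, corpus.query(tags, 5));
  // observe(): append trace-derived facts (placeholder strings here).
  for (const text of ["fact A", "fact B", "fact C"]) corpus.append({ text, tags, run: 1 });
  // Run 2: query the corpus before executing and fold the facts into the prompt.
  const facts = corpus.query(tags, 5);
  return { run1Prompt, injected: facts.length, run2Prompt: primeSystemPrompt(base, facts) };
}
```

Keeping priming in the caller (rather than inside the runtime) means the read side can be switched off per run, which is exactly what the cold arm and the read-only holdout need.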