docs(research): optimization-space suite — axes, layer stress-tests, operator playbook #207
Merged
…er stress tests, operator playbook

Answers "does GEPA/steerers/HALO contextualize everything we should think about?": no. Reframes the program as a 6-axis space (timescale · target · objective · validity scope · serving architecture · authorship) instead of a single ladder, maps every cell to its evidence status, and stress-tests each layer against the canon (architecture.md / learning-flywheel.md / eval-substrate.md / roadmap-rsi.md).

Index: optimization-space.md, which contains:
- the taxonomy
- the evidence map (the program over-sampled within-run × single-objective × itsm while the canon's own success criterion, the across-run flywheel, has n=0)
- the canon-compatibility audit (compatible; two corrections forced: "steering is NEGATIVE on stateless, not null"; the multi-objective mandate is the largest practice-vs-canon inconsistency)
- the ranked portfolio

Layer docs:
- within-run: boundary law settled; topology = the one open lever
- across-run: the corpus A/B design + four falsifiers; THE priority
- economics: lift-per-dollar; tool augmentation +70pp dominates
- domain-generality: n=1-domain exposure; csm/hr replication nearly free
- intelligence-serving: Tangle Intelligence is export-only today; split by timescale (in-loop critic local, across-run memory served); the server-side judge firewall is non-negotiable
- agent-authored: defineStrategy skillification, R0→R3 ladder, two structural safety properties

product-integration-playbook.md:
- the 8-step product wiring (gtm first)
- the operator role table (humans own "what good means" + the ship decision; the system owns the rest)
- the three packaging gaps (publish the suite from bench/ to src/, corpus inflow from production traces, the first product Environment)
tangletools
approved these changes
Jun 9, 2026
tangletools
left a comment
Contributor
✅ Auto-approved PR — 754bfb8a
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-09T22:43:42Z
The strategy map requested after the steering/GEPA gate series: rethink the layers, stress-test them against the vision docs, and write the product-integration operator playbook.
8 docs under docs/research/ (indexed in the README):
- optimization-space.md: the 6-axis taxonomy (timescale · target · objective · validity-scope · serving-architecture · authorship) replacing the single-ladder frame; the evidence map showing the program over-sampled one cell (within-run × single-objective × itsm) while the canon's own success criterion (across-run, Gate B) has n=0; the canon-compatibility audit; the ranked experiment portfolio.
- layer-*.md stress tests: each with its evidence table, strongest objections, and concrete next experiments. Highlights: the within-run boundary law is settled (negative stateless / positive stateful+keep-best); across-run is the priority (the corpus A/B + 4 falsifiers); the multi-objective mandate is the largest practice-vs-canon inconsistency; tool augmentation (+70pp) is the largest effect measured anywhere; Tangle Intelligence is export-only today and is the natural home of the across-run memory (with the server-side judge firewall as the non-negotiable); defineStrategy makes agent-authored strategies feasible (R0→R3 ladder).
- product-integration-playbook.md: the 8-step wiring for gtm/tax/creative/etc., the operator role table (humans own the deployable checks, thresholds, and the weekly ship decision; the system owns the rest), and the 3 packaging gaps (publish the suite from bench/ to src/, production-trace→corpus inflow, the first product Environment).

Canon verdict: the new framing is compatible with architecture.md/learning-flywheel.md, and the audit forced two corrections onto it (steering is negative-not-null on stateless; the ladder is one path through the axis space). Two documentation-debt items flagged for follow-up, not edited here.
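
For readers who want a concrete picture of the "evidence map" idea, here is a minimal TypeScript sketch of counting experiments per cell and listing the cells with n=0. It is not taken from the docs in this PR. It projects onto only three of the six axes, and the value sets, the `Experiment` shape, the function names, and the sample records are all hypothetical placeholders rather than the repository's real data.

```typescript
// Hypothetical sketch. Axis names follow the PR description; the value
// sets and sample records are illustrative assumptions, not real data.
const TIMESCALES = ["within-run", "across-run"] as const;
const OBJECTIVES = ["single-objective", "multi-objective"] as const;
const SCOPES = ["itsm", "csm", "hr"] as const;

interface Experiment {
  timescale: (typeof TIMESCALES)[number];
  objective: (typeof OBJECTIVES)[number];
  scope: (typeof SCOPES)[number];
}

// Build the cell key used by the map, e.g. "within-run x single-objective x itsm".
function cellKey(e: Experiment): string {
  return `${e.timescale} x ${e.objective} x ${e.scope}`;
}

// Count experiments per cell, starting every cell at zero so that
// unsampled cells are visible rather than merely absent.
function evidenceMap(experiments: Experiment[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const timescale of TIMESCALES) {
    for (const objective of OBJECTIVES) {
      for (const scope of SCOPES) {
        counts.set(cellKey({ timescale, objective, scope }), 0);
      }
    }
  }
  for (const e of experiments) {
    const key = cellKey(e);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Return the cells with no evidence at all (n = 0).
function unsampledCells(counts: Map<string, number>): string[] {
  return Array.from(counts.entries())
    .filter(([, n]) => n === 0)
    .map(([key]) => key);
}

// Made-up sample: every run lands in the same cell.
const sample: Experiment[] = [
  { timescale: "within-run", objective: "single-objective", scope: "itsm" },
  { timescale: "within-run", objective: "single-objective", scope: "itsm" },
  { timescale: "within-run", objective: "single-objective", scope: "itsm" },
];

const counts = evidenceMap(sample);
console.log(unsampledCells(counts));
```

With a sample like this, every across-run cell appears in the unsampled list, which is the shape of gap the description calls out. A real map would also carry the target, serving-architecture, and authorship axes.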