From ea61b4197afe9be8b6677e2de4da48233ac7738c Mon Sep 17 00:00:00 2001
From: Drew Stone
Date: Tue, 9 Jun 2026 16:58:12 -0600
Subject: [PATCH] docs(research): adopt the agent-genome frame + block-coordinate credit assignment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

External convergence (GEPA/SkillOpt/OPRO/Reflexion/Voyager/DSPy lineage):
an agent is a policy induced by editable external state; optimize the
genome from trajectories.

Two upgrades folded into the map: (1) the Target axis becomes the full
genome decomposition (prompt · skills · tool grants · topology ·
memory/retrieval · routing/policy · verifier · curriculum); (2)
block-coordinate credit assignment as standing discipline — attribute
failures via counterfactual reruns (the /autopsy move systematized),
then edit the implicated coordinate; never re-descend a flat one.

Reinterprets the GEPA holdout tie as a flat COORDINATE, not a flat
landscape, and publishes the measured gradient table (tool grants +70pp
≫ architecture ~20pp ≫ model > strategy > prompt ~0) as the empirical
prior.

Two corrections imposed on the frame before adopting its mechanisms:
deployable checkers only in the reward vector; selector≠judge still
binds (reflection may see the judge, steering/selection may not).

Mixture-of-genomes + bandit routing noted as gated.
---
 docs/research/optimization-space.md | 36 ++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/docs/research/optimization-space.md b/docs/research/optimization-space.md
index d471157..231b128 100644
--- a/docs/research/optimization-space.md
+++ b/docs/research/optimization-space.md
@@ -20,7 +20,7 @@ L0→L1→L2 rungs) is one *path* through it — not the space itself.
 | Axis | Values | Where this repo is today |
 |---|---|---|
 | **Timescale** | within-run · across-run · meta (optimizer-of-optimizer) | almost all effort within-run; across-run n=0 |
-| **Target** | prompt (content) · topology/strategy (structure) · knowledge/corpus (memory) · policy (routing, ask-vs-act, budget) · tasks (curriculum) | prompt = measured (tie); topology = open; the rest untouched |
+| **Target** | the **agent genome**: prompt · skills · tool schemas/grants · topology/strategy · knowledge/corpus (memory + retrieval policy) · routing/policy (ask-vs-act, budget, model config) · verifier · tasks (curriculum) | prompt = measured (flat); topology = open; tool grants = the largest measured effect; the rest untouched |
 | **Objective** | single score · multi-objective vector (correct·fast·secure·cheap) | every gate so far single-objective — **in tension with the canon** (see audit) |
 | **Validity scope** | one domain · cross-domain · live product | n=1 domain (EOPS-itsm) for the headline result |
 | **Serving architecture** | in-process (observe()/Corpus) · platform-served (Tangle Intelligence) | all in-process; Intelligence is export-only today |
@@ -32,6 +32,40 @@ hides: objective shape, validity scope, serving topology, authorship. Both frame
 compatible; the ladder answers "is level n real?" (lift on level n−1), the axes answer
 "where is the unexplored headroom?".
 
+## The genome frame + block-coordinate credit assignment (adopted 2026-06-09)
+
+External convergence: the "agent genome" framing (an agent = a policy induced by
+editable external state θ; optimize θ from trajectories — the GEPA/SkillOpt/OPRO/
+Reflexion/Voyager/DSPy lineage) matches this map, and contributes one discipline we
+adopt: **block-coordinate credit assignment**. A failed trajectory can be caused by any
+genome component; *attribute first* (counterfactual reruns — the `/autopsy` discipline
+systematized: rerun with ONE coordinate changed), then edit the implicated coordinate.
+Never re-descend a coordinate measured flat.
+
+Under this frame the GEPA holdout tie is reinterpreted: not "optimization is flat" but
+"**the analyst-prompt coordinate is flat**." And the program's measured effect sizes
+already form the empirical gradient table the frame calls for:
+
+| genome coordinate | measured gradient |
+|---|---|
+| tool/harness grants | **+70pp** (search tool → cheap models reach parity) |
+| loop architecture | **~20pp swing** (flat vs canonical loop, same model/domain) |
+| domain/state structure | sign-flipping (the boundary law) |
+| model choice | ~10–35pp |
+| strategy (at fixed architecture) | +16.4pp where stateful; negative stateless |
+| analyst prompt | **~0** (frozen-holdout tie) |
+
+The gradient is concentrated at the top — which re-weights the portfolio toward the
+augmentation sweep (layer-economics) and keeps prompt search retired.
+
+Two corrections this map imposes on the genome frame before any adoption of its
+mechanisms: (1) every reward component must be a **deployable checker** (no un-firewalled
+LLM-judge fields in the vector — eval-substrate discipline); (2) **selector ≠ judge**
+still binds: judge-informed *reflection* (outer loop) is legitimate GEPA design;
+judge-informed *steering or selection* is the Goodhart hole. The frame's
+mixture-of-genomes + contextual-bandit routing is promising but mechanism-rich — gated
+on cheap evidence that specialized variants dominate one tuned agent on task slices.
+
 ## The map with evidence status (2026-06-09)
 
 | Region | Evidence | Verdict |