From ea61b4197afe9be8b6677e2de4da48233ac7738c Mon Sep 17 00:00:00 2001
From: Drew Stone <drewstone329@gmail.com>
Date: Tue, 9 Jun 2026 16:58:12 -0600
Subject: [PATCH] docs(research): adopt the agent-genome frame +
 block-coordinate credit assignment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

External convergence (GEPA/SkillOpt/OPRO/Reflexion/Voyager/DSPy lineage): an agent is a
policy induced by editable external state; optimize the genome from trajectories. Two
upgrades folded into the map: (1) the Target axis becomes the full genome decomposition
(prompt · skills · tool grants · topology · memory/retrieval · routing/policy · verifier
· curriculum); (2) block-coordinate credit assignment as standing discipline — attribute
failures via counterfactual reruns (the /autopsy move systematized), then edit the
implicated coordinate; never re-descend a flat one. Reinterprets the GEPA holdout tie as
a flat COORDINATE, not a flat landscape, and publishes the measured gradient table
(tool grants +70pp ≫ architecture ~20pp ≳ model > strategy > prompt ~0) as the empirical
prior. Two corrections imposed on the frame before adopting its mechanisms: deployable
checkers only in the reward vector; selector≠judge still binds (reflection may see the
judge, steering/selection may not). Mixture-of-genomes + bandit routing noted as gated.
---
 docs/research/optimization-space.md | 36 ++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/docs/research/optimization-space.md b/docs/research/optimization-space.md
index d471157..231b128 100644
--- a/docs/research/optimization-space.md
+++ b/docs/research/optimization-space.md
@@ -20,7 +20,7 @@ L0→L1→L2 rungs) is one *path* through it — not the space itself.
 | Axis | Values | Where this repo is today |
 |---|---|---|
 | **Timescale** | within-run · across-run · meta (optimizer-of-optimizer) | almost all effort within-run; across-run n=0 |
-| **Target** | prompt (content) · topology/strategy (structure) · knowledge/corpus (memory) · policy (routing, ask-vs-act, budget) · tasks (curriculum) | prompt = measured (tie); topology = open; the rest untouched |
+| **Target** | the **agent genome**: prompt · skills · tool schemas/grants · topology/strategy · knowledge/corpus (memory + retrieval policy) · routing/policy (ask-vs-act, budget, model config) · verifier · tasks (curriculum) | prompt = measured (flat); topology = open; tool grants = the largest measured effect; the rest untouched |
 | **Objective** | single score · multi-objective vector (correct·fast·secure·cheap) | every gate so far single-objective — **in tension with the canon** (see audit) |
 | **Validity scope** | one domain · cross-domain · live product | n=1 domain (EOPS-itsm) for the headline result |
 | **Serving architecture** | in-process (observe()/Corpus) · platform-served (Tangle Intelligence) | all in-process; Intelligence is export-only today |
@@ -32,6 +32,40 @@ hides: objective shape, validity scope, serving topology, authorship. Both frame
 compatible; the ladder answers "is level n real?" (lift on level n−1), the axes answer
 "where is the unexplored headroom?".
 
+## The genome frame + block-coordinate credit assignment (adopted 2026-06-09)
+
+External convergence: the "agent genome" framing from the GEPA/SkillOpt/OPRO/Reflexion/
+Voyager/DSPy lineage (an agent is a policy induced by editable external state θ, and θ
+is optimized from trajectories) matches this map and contributes one discipline we
+adopt: **block-coordinate credit assignment**. Any genome component can cause a failed
+trajectory, so *attribute first* with counterfactual reruns that change ONE coordinate
+at a time (the `/autopsy` discipline, systematized), then edit the implicated coordinate.
+Never re-descend a coordinate that has been measured flat.
+
+Under this frame the GEPA holdout tie is reinterpreted: not "optimization is flat" but
+"**the analyst-prompt coordinate is flat**." And the program's measured effect sizes
+already form the empirical gradient table the frame calls for:
+
+| genome coordinate | measured gradient |
+|---|---|
+| tool/harness grants | **+70pp** (search tool → cheap models reach parity) |
+| loop architecture | **~20pp swing** (flat vs canonical loop, same model/domain) |
+| domain/state structure | sign-flipping (the boundary law) |
+| model choice | ~10–35pp |
+| strategy (at fixed architecture) | +16.4pp where stateful; negative where stateless |
+| analyst prompt | **~0** (frozen-holdout tie) |
+
+The gradient is concentrated at the top — which re-weights the portfolio toward the
+augmentation sweep (layer-economics) and keeps prompt search retired.
+
+This map imposes two corrections on the genome frame before adopting any of its
+mechanisms: (1) every reward component must be a **deployable checker** (no un-firewalled
+LLM-judge fields in the vector, per eval-substrate discipline); (2) **selector ≠ judge**
+still binds: judge-informed *reflection* (outer loop) is legitimate GEPA design, while
+judge-informed *steering or selection* is the Goodhart hole. The frame's
+mixture-of-genomes + contextual-bandit routing is promising but mechanism-rich, so it is
+gated on cheap evidence that specialized variants dominate one tuned agent on task slices.
+
 ## The map with evidence status (2026-06-09)
 
 | Region | Evidence | Verdict |