I build systems that make AI agents predictable. Not smarter. Predictable.
Agents fail when they have to guess. The fix isn't better prompts; it's better contracts. Explicit scope. Typed learnings. Tiered routing based on complexity. The agent doesn't interpret. It executes against constraints.
Systems architecture applied to language models. Poet's precision for the prompts; engineer's rigor for the infrastructure.
Stop Copying Other People's AI Setups. Build One That's Actually Yours.
A rule you copied never had to prove itself to you. Extract every rule as a falsifiable hypothesis, mine your own session history for evidence, score it against a standard set in advance, keep only what survives. Most popular workflows don't.
What an AI Detector Actually Measures
AI detectors measure predictability, not authorship. Low perplexity flags machine writing, careful human prose, and familiar phrasing identically, which pushes the false-alarm rate far past the advertised number. A tool that can't define its own score is very confident about something it can't measure. People lose jobs over it.
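A minimal sketch of the quantity in question, assuming the detector already has a per-token probability from some language model. The `flag_as_machine` helper and its `threshold` parameter are hypothetical, not any real detector's API; perplexity itself is the standard exponential of the mean negative log-probability.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-probability of the tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def flag_as_machine(token_probs: list[float], threshold: float) -> bool:
    """Hypothetical detector rule: predictable text gets flagged.

    Nothing here knows who wrote the text. It only sees how
    predictable each token was to the scoring model.
    """
    return perplexity(token_probs) < threshold
```

Any text the scoring model finds predictable falls under the threshold, whoever wrote it. That is the whole problem.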
Opus 4.8 vs 4.7 vs Sonnet vs Haiku
Twenty-one real tasks across every tier. For extraction and writing, the cheap models land within a hair of Opus at a fraction of the cost. Reaching for the biggest model by default is the most expensive habit in AI — reserve it for genuinely multi-step reasoning, not everything.
Agents need constraints, not capability.
Most AI workflows operate on vibes. You describe a task. The agent interprets. If the output matches what you imagined, ship it. If not, iterate until exhausted.
I build systems that eliminate the interpretation step:
- Contracts before code: Explicit acceptance criteria, scope boundaries, failure conditions. Written before implementation, not discovered during review.
- Learnings as database objects: Not notes. Typed rows: failures, patterns, gotchas. Queryable by category. Evidence-gated. Surviving session boundaries.
- Tiered routing: Complexity determines the pipeline. 1-2 files: execute directly. 3-5 files: contract required. 6+ files: full verification with adversarial QA.
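The routing tiers above can be sketched as a single function. This is an illustration, not KERNEL's actual code: `Pipeline` and `route` are hypothetical names, and file count stands in for complexity exactly as the thresholds in the list describe.

```python
from enum import Enum


class Pipeline(Enum):
    DIRECT = "execute directly"
    CONTRACT = "contract required"
    FULL_VERIFICATION = "full verification with adversarial QA"


def route(files_touched: int) -> Pipeline:
    """Pick a pipeline from the number of files a task touches."""
    if files_touched < 1:
        raise ValueError("a task must touch at least one file")
    if files_touched <= 2:
        return Pipeline.DIRECT
    if files_touched <= 5:
        return Pipeline.CONTRACT
    return Pipeline.FULL_VERIFICATION
```

The point of the sketch: the agent never decides how careful to be. The number decides.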
The agents aren't getting smarter. They're getting access to what their predecessors learned. That turns out to be enough.
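A rough sketch of what "typed rows" could look like, using Python's standard `sqlite3` module. The table name, column names, and helper functions are assumptions made for illustration; the real schema may differ. The evidence gate is modeled here as a `CHECK` constraint that rejects empty evidence.

```python
import sqlite3

# Hypothetical schema: category is a closed set, evidence cannot be blank.
SCHEMA = """
CREATE TABLE IF NOT EXISTS learnings (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL CHECK (category IN ('failure', 'pattern', 'gotcha')),
    summary  TEXT NOT NULL,
    evidence TEXT NOT NULL CHECK (length(trim(evidence)) > 0)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store. A file path survives session boundaries."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, category: str, summary: str, evidence: str) -> int:
    """Insert one learning; SQLite raises IntegrityError if a gate fails."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO learnings (category, summary, evidence) VALUES (?, ?, ?)",
            (category, summary, evidence),
        )
    return cur.lastrowid


def by_category(conn: sqlite3.Connection, category: str) -> list[tuple[str, str]]:
    """Return (summary, evidence) pairs for one category, oldest first."""
    rows = conn.execute(
        "SELECT summary, evidence FROM learnings WHERE category = ? ORDER BY id",
        (category,),
    )
    return rows.fetchall()
```

A note with no evidence never makes it into the table, and a successor agent queries by category instead of rereading a markdown file.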
I measure how models compute, not whether they're correct.
latent-diagnostics: Representation-level analysis of LLMs via SAE attribution graphs. Grammar tasks show higher influence than reasoning tasks (effect size d = 1.08). After length control, genuine computational regime differences emerge.
universal-spectroscopy-engine: Treats LLM activations as light spectra. 52% reduction in SAE reconstruction loss with structured syntax versus natural-language syntax. LLMs are vector computers pretending to be text processors.
experiments: Append-only specimen archive for LLM experiments.
| Project | Description |
| --- | --- |
| Brink | iOS journaling with private AI and biometrics. SwiftUI, HealthKit. |
| HeyContent | Content management for creators. Cross-platform insights, conversational persona generation. |
| ModelMind | Duolingo-style app for genuinely understanding how AI works — not how to prompt it, how it thinks. In development. |
| KERNEL | Claude Code that learns from itself. SQLite-backed agent memory, multi-agent orchestration, and an experiment engine that proves which workflows actually hold up. Active development, open source. Cursor version. |
| conductor | MCP server bridging Claude Desktop and Claude Code. |
| metabrain | Zero-dependency SQLite memory layer for AI agents. The "Stop Writing Markdown" architecture as a drop-in library. |
| memory-pool | Memory isn't a timeline. Structured architecture for persistent AI context. |
| event-horizon | Physics-informed encryption. SYK scrambling, chimera camouflage, resonance locking. |
Currently writing Intelligence Architecture — a principles-first book on building with AI. In progress.
Engineering the Soul: AI consciousness read through four novels. Every answer sounds like the person asking the question.
The Agent-Ready Web: Making a site legible to AI agents — llms.txt, MCP, markdown negotiation, content-signals. Half the checks are policy, not config.
Self-Learning Agent Civilization: The original system that started everything.
Stop Building Chatbots: Why the chat interface is a dead end.
How to Make Claude Code Actually Work: The full guide to KERNEL — structure, memory, and multi-agent workflows.
KERNEL: Configuration that adapts as you work.
Semantic Drift is Quantum Decoherence: Multi-agent coordination through physics.
Why Prompt Engineering Can't Fix Hallucinations: The case for mechanistic intervention.
Python · TypeScript · Swift
FastAPI · Next.js · SvelteKit
Claude · SAEs · Modal



