
fix([cos-agent-output-write-amplification]): batch streamed agent output to end per-line state-file rewrites #612

Merged: atomantic merged 4 commits into main from claim/cos-agent-output-write-amplification on Jun 1, 2026

@atomantic (Owner)

Summary

The audit item asked us to verify that every producer calling appendAgentOutput directly is human-pace, not a hot loop — and migrate any that aren't. It was not a no-op audit: two hot producers were still doing a full agent-state loadState+saveState per output line/chunk.

  • CoS Runner path (subAgentSpawner.js): the runner emits agent:output per parsed line (cos-runner/index.js), and the handler did one full state write per event. This is the primary output path when the runner is available (production default).
  • Direct-CLI path (agentCliSpawning.js): the non-stream stdout/stderr branches wrote state per chunk; the stream-json/codex chunk branches batched per chunk but without the documented ~250ms debounce.

Only the TUI spawner fully honored the CLAUDE.md "High-frequency state writes must batch" convention.

Change

  • Add a shared createAgentOutputBatcher(agentId) to cosAgents.js — a 250ms-debounced batcher over appendAgentOutputLines with in-flight re-schedule and swallow+log on write failure (so neither the timer nor a caller's await flush() can throw into a child-process/timer callback). Callers await flush() in their finish/cleanup path so the final lines land before the completion event.
  • Route the runner agent:output handler through a per-agent batcher map, drained on agent:completed / agent:error / agents:orphaned.
  • Route the direct-CLI stdout/stderr (all branches) + close-handler parser/codex flush through a per-spawn batcher, drained in the close and error handlers before finalize.

No change to the live output tail (events still emit per line); only the state-write cadence changes, from dozens of whole-file rewrites per second to a few (roughly four per agent at most with a 250ms window).
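For readers who have not opened the diff, the batcher described above can be sketched roughly as follows. This is an illustrative sketch, not the code in cosAgents.js: only the names createAgentOutputBatcher and appendAgentOutputLines come from this PR. The injected `appendLines` writer, the `delayMs` parameter, the `push` method name, and the fixed-window timer behaviour are all assumptions made so the example runs on its own.

```javascript
// Illustrative sketch only. The real helper presumably calls
// appendAgentOutputLines directly; here the writer is injected (assumption)
// so the example is self-contained.
function createAgentOutputBatcher(agentId, appendLines, delayMs = 250) {
  let pending = [];    // lines waiting for the next state write
  let timer = null;    // handle for the pending debounce timer
  let inFlight = null; // promise for the write currently running, if any

  function clearTimer() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
  }

  // Write everything queued so far as ONE state update. Never rejects.
  async function drainOnce() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    try {
      await appendLines(agentId, batch);
    } catch (err) {
      // Swallow and log so neither the timer nor a caller's await can throw.
      console.error(`❌ output flush failed for ${agentId}: ${err?.message ?? err}`);
    }
  }

  function schedule() {
    if (timer) return; // a write is already scheduled for this window
    timer = setTimeout(() => {
      timer = null;
      if (inFlight) {
        schedule(); // in-flight re-schedule: try again after the current write
        return;
      }
      void flush();
    }, delayMs);
  }

  // Drain until nothing is queued, including lines pushed mid-drain.
  async function flush() {
    clearTimer();
    while (inFlight || pending.length > 0) {
      if (!inFlight) {
        inFlight = drainOnce().finally(() => {
          inFlight = null;
        });
      }
      await inFlight;
    }
    clearTimer(); // nothing left to write, so drop any timer set mid-drain
  }

  function push(line) {
    pending.push(line);
    schedule();
  }

  return { push, flush };
}
```

A caller would `push()` each line as it streams in and `await flush()` in its close or error handler before emitting the completion event. Because `drainOnce` catches its own errors, that `await` cannot reject inside a child-process or timer callback.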

Test plan

  • cosAgents.test.js — new createAgentOutputBatcher suite: coalesces N lines into one saveState, no-op flush when empty, captures lines pushed mid-drain, swallows+logs a write failure (flush never rejects).
  • agentCliSpawning.test.js — the two stream-containment tests updated for the batched flow: a failed batch flush on close is logged with a ❌ prefix and never leaks as an unhandled rejection (stdout + stderr).
  • Full suites pass: cosAgents (5), agentCliSpawning (9), agentTuiSpawning (19), subAgentSpawner (85).

atomantic added 4 commits June 1, 2026 14:38
…put to end per-line state-file rewrites

The CoS Runner emits agent:output per parsed line and the direct-CLI
non-stream stdout/stderr handlers fired per chunk; each call ran a full
loadState+saveState of the agent-state JSON. On a chatty agent that is
dozens of whole-file rewrites per second — the exact write-amplification
the CLAUDE.md streaming-producer convention warns about, which only the
TUI spawner and the stream-json chunk paths were honoring.

Add a shared createAgentOutputBatcher to cosAgents.js (250ms debounce,
in-flight re-schedule, swallow+log on write failure) and route the two
hot producers through it: subAgentSpawner's runner agent:output handler
(per-agent batcher map, drained on completed/error/orphaned) and
agentCliSpawning's stdout/stderr/parser-flush paths (per-spawn batcher,
drained in the close + error handlers before finalize).
…unner-batcher Map leak, drain-loop on flush

- subAgentSpawner: flush the runner batcher before deleting its Map entry
  (a line racing in during the awaited flush lands in the same batcher,
  not an orphaned new one) and drop agent:output for agents no longer in
  runnerAgents — the runner registers the agent before spawning, so this
  only ignores post-completion strays that would otherwise leak a batcher.
  (claude + agy review)
- cosAgents: flush() now drains in a while-loop so a push racing in during
  the second drain is cleared synchronously rather than stranded to the
  debounce timer. (agy review)
- agentLifecycle: drop the now-dead appendAgentOutput import. (agy + claude)
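The ordering fix in this commit (flush first, then delete the Map entry, and ignore output for unknown agents) might look roughly like the sketch below. Only `runnerAgents` is a name taken from this PR; `runnerBatchers`, `makeBatcher`, `onAgentOutput`, and `drainAndDropBatcher` are hypothetical names, and the stand-in batcher omits the debounce timer to keep the example short.

```javascript
// Illustrative sketch only; see the note above for which names are assumed.
const runnerAgents = new Map();   // agentId -> agent record (registered before spawn)
const runnerBatchers = new Map(); // agentId -> batcher (hypothetical name)

// Minimal stand-in batcher: queues lines and writes them on flush().
function makeBatcher(agentId, write) {
  let pending = [];
  return {
    push(line) {
      pending.push(line);
    },
    async flush() {
      // Loop so a line pushed while a write is awaited is drained too.
      while (pending.length > 0) {
        const batch = pending;
        pending = [];
        await write(agentId, batch);
      }
    },
  };
}

// Returns false when the line is dropped as a post-completion stray.
function onAgentOutput(agentId, line, write) {
  if (!runnerAgents.has(agentId)) return false; // would otherwise leak a batcher
  let batcher = runnerBatchers.get(agentId);
  if (!batcher) {
    batcher = makeBatcher(agentId, write);
    runnerBatchers.set(agentId, batcher);
  }
  batcher.push(line);
  return true;
}

async function drainAndDropBatcher(agentId) {
  const batcher = runnerBatchers.get(agentId);
  if (!batcher) return;
  // Flush BEFORE deleting: a line racing in during the awaited flush still
  // finds this batcher in the Map instead of creating an orphaned new one.
  await batcher.flush();
  runnerBatchers.delete(agentId);
}
```

Deleting the entry first would let a late `agent:output` event create a fresh batcher that no completion handler ever drains, which is the leak the review comments pointed at.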
…user terminate/kill before marking complete

codex review flagged that the new ~250ms output batching opened a window
where a user terminate/kill marks the agent complete before its pending
output is drained (and, on the runner path, could leak the batcher Map
entry if no later completion event arrives).

- agentCliSpawning: expose a flushOutput() hook on the activeAgents entry.
- agentManagement terminateAgent/killAgent (direct): await agent.flushOutput?.()
  before completeAgent.
- agentManagement terminateRunnerAgent: drain + drop the runner batcher
  (flushRunnerOutputBatcher, dynamic-imported to avoid the static cycle)
  before completeAgent + runnerAgents.delete.
- subAgentSpawner: export flushRunnerOutputBatcher.
- Add source-contract tests asserting flush precedes completeAgent on all
  three paths.
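The ordering these tests assert can be illustrated with a small sketch of the direct terminate path. The `flushOutput` hook, `activeAgents`, and `completeAgent` are named in this PR; the function signature, the `kill` hook, and passing the collaborators in as arguments are assumptions made so the example is runnable in isolation.

```javascript
// Illustrative sketch only; the real terminateAgent lives in agentManagement
// and its signature is not shown in this PR.
async function terminateAgent(agentId, activeAgents, completeAgent) {
  const agent = activeAgents.get(agentId);
  if (!agent) return false;

  agent.kill?.(); // hypothetical hook that stops the child process

  // Drain batched output first so the final lines are persisted before the
  // agent is marked complete. Entries without the hook are skipped safely.
  await agent.flushOutput?.();

  await completeAgent(agentId);
  activeAgents.delete(agentId);
  return true;
}
```

Without the awaited flush, up to one debounce window of output (about 250ms) could still be queued in memory when the agent is marked complete.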
@atomantic atomantic force-pushed the claim/cos-agent-output-write-amplification branch from 050d033 to 4e76ba3 Compare June 1, 2026 21:39
@atomantic atomantic merged commit 7930b05 into main Jun 1, 2026
2 checks passed
@atomantic atomantic deleted the claim/cos-agent-output-write-amplification branch June 1, 2026 21:40