feat(prompts): Kiro-style agent identity + explicit edit contract by JessicaMulein · Pull Request #566 · cecli-dev/cecli

JessicaMulein · 2026-06-08T18:27:40Z

Summary

Rewrites the agent-family system prompts so models follow the editing contract up front instead of learning it only after a failed edit. The previous agent prompt was a thin directive list that (a) encouraged long exploration loops, (b) never stated the #1 cause of failed turns (edit ordering), and (c) had no identity/voice.

Changes

- prompts/agent.yml main_system: expert-engineer identity; investigate-before-claiming; scope discipline; failure-loop recognition (stop after two failures, diagnose); and an explicit editing contract (ContextManager -> ReadRange -> EditText, one file per EditText call, @000/000@ markers for empty files). Drops the loop-encouraging 'no task takes too long' line.
  - Bugfix: the agent prompt never referenced {final_reminders}, so overeager_prompt and the MCP tool_prompt were silently dropped for the agent coder. It is now referenced exactly once.
- prompts/subagent.yml: now inherits the agent identity/contract instead of re-overriding main_system with stale directives; keeps only the sub-agent-specific verbose Yield summary.
- prompts/ask.yml / prompts/architect.yml: same direct voice and ground-answers-in-code discipline; architect plans now name verification steps and edge cases.
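The subagent.yml change relies on prompt inheritance: a child prompt set lists only the keys it changes and takes everything else from its parent. The sketch below illustrates that resolution rule under stated assumptions; the `_inherits` field, the in-memory `PROMPT_FILES` dict, and the `finish_guidance` key are hypothetical stand-ins, not cecli's actual loader or schema.

```python
"""Hypothetical sketch of prompt-set inheritance.

ASSUMPTION: cecli's real loader and key names may differ; `_inherits`,
PROMPT_FILES, and `finish_guidance` are illustrative only.
"""
from typing import Dict, List, Mapping, Optional, Set

# Illustrative stand-ins for parsed prompts/*.yml files (hypothetical contents).
PROMPT_FILES: Dict[str, Dict[str, str]] = {
    "base": {"main_system": "You are a helpful assistant.", "system_reminder": ""},
    "agent": {
        "_inherits": "base",
        "main_system": "You are an expert engineer.\n{final_reminders}",
    },
    "subagent": {
        "_inherits": "agent",
        # main_system is NOT redefined here, so it is inherited from "agent".
        "finish_guidance": "End with a verbose Yield summary for the parent.",
    },
}


def resolve(name: str, files: Mapping[str, Mapping[str, str]] = PROMPT_FILES) -> Dict[str, str]:
    """Merge a prompt set with its ancestors; child keys override parent keys."""
    chain: List[Mapping[str, str]] = []
    seen: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        seen.add(current)
        layer = files[current]
        chain.append(layer)
        current = layer.get("_inherits")
    merged: Dict[str, str] = {}
    for layer in reversed(chain):  # root first, so descendants win
        merged.update({k: v for k, v in layer.items() if k != "_inherits"})
    return merged
```

Because main_system is resolved from the agent set rather than copied, a later improvement to the agent prompt reaches sub-agents automatically; a re-declared copy in the child is exactly what drifts.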

Test plan

- New tests/basic/test_agent_prompt_contract.py (deterministic, no LLM): prompts render via str.format with no stray braces and no unknown keys; {final_reminders} appears exactly once across main_system + system_reminder; the edit contract is stated (ReadRange before EditText, one file per call, empty-file markers); the sub-agent inherits the agent main_system; the legacy 'no task takes too long' directive is gone.
- pytest tests/basic/test_prompts.py: 27 pass (inheritance chains unchanged).
- pytest tests/coders/test_copypaste_coder.py tests/subagents/: pass.
- Behaviorally validated end-to-end against a real local model (qwen3-coder:30b) via the consuming app's harness: the scoped edit task completed with zero edit failures, and ReadRange preceded the edit.
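A minimal sketch of the render checks listed above (placeholders parse cleanly, only known keys appear, and {final_reminders} is referenced exactly once across main_system and system_reminder), using the standard library's `string.Formatter` to enumerate placeholders. `ALLOWED_KEYS` and the sample strings are hypothetical; the real test reads cecli's prompt YAML files and its actual set of format keys.

```python
"""Sketch of deterministic, no-LLM prompt render checks.

ASSUMPTION: ALLOWED_KEYS is hypothetical; the real test derives its keys
from cecli's prompt files and coder code.
"""
from string import Formatter
from typing import List, Set

ALLOWED_KEYS: Set[str] = {"final_reminders", "language"}  # hypothetical


def field_names(template: str) -> List[str]:
    """Return every {placeholder} name; raises ValueError on a stray brace."""
    return [name for _, name, _, _ in Formatter().parse(template) if name is not None]


def check_prompts(main_system: str, system_reminder: str) -> None:
    names = field_names(main_system) + field_names(system_reminder)
    unknown = set(names) - ALLOWED_KEYS
    assert not unknown, f"unknown format keys: {sorted(unknown)}"
    # Zero references drop the reminders; two or more duplicate them.
    assert names.count("final_reminders") == 1, names
    # Rendering must succeed when every allowed key is supplied.
    values = {key: f"<{key}>" for key in ALLOWED_KEYS}
    main_system.format(**values)
    system_reminder.format(**values)
```

Escaped braces such as `{{` and `}}` are yielded as literal text by `Formatter.parse`, so they pass; an unmatched single brace raises `ValueError`.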

Notes for reviewers (scope & how to split)

These changes fall into two buckets, and I'm happy to split if you'd prefer to land them separately:

Objective fixes (recommend landing regardless):
- {final_reminders} was never referenced in the agent prompt, so overeager_prompt and the MCP tool_prompt were silently dropped for the agent coder. Now referenced exactly once.
- subagent.yml re-overrode main_system with a stale copy, so sub-agents diverged from the main agent every time it improved. Now inherits.
- The editing-contract block restates rules that come straight from cecli's own edit_text.py / read_range.py error messages (ReadRange-before-EditText, one file per call, @000/000@), so it should reduce failed turns broadly, especially on smaller/local models.
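To make the contract concrete, here is a hypothetical checker over a recorded sequence of tool calls. It flags ReadRange before ContextManager, EditText without a prior ReadRange on the same file, and EditText calls that touch more than one file. The `ToolCall` shape is an assumption for illustration, not cecli's tool-call format, and the @000/000@ empty-file case is deliberately not modeled.

```python
"""Hypothetical checker for the edit contract:
ContextManager -> ReadRange -> EditText, one file per EditText call.

ASSUMPTION: ToolCall is illustrative, not cecli's real representation;
the @000/000@ empty-file markers are not modeled.
"""
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class ToolCall:
    name: str               # e.g. "ContextManager", "ReadRange", "EditText"
    paths: Tuple[str, ...]  # files the call touches


def contract_violations(calls: Sequence[ToolCall]) -> List[str]:
    """Return a human-readable list of contract violations, in call order."""
    in_context: Set[str] = set()
    read: Set[str] = set()
    problems: List[str] = []
    for i, call in enumerate(calls):
        if call.name == "ContextManager":
            in_context.update(call.paths)
        elif call.name == "ReadRange":
            for path in call.paths:
                if path not in in_context:
                    problems.append(f"call {i}: ReadRange on {path} before ContextManager")
            read.update(call.paths)
        elif call.name == "EditText":
            if len(call.paths) != 1:
                problems.append(f"call {i}: EditText touches {len(call.paths)} files, expected 1")
            for path in call.paths:
                if path not in read:
                    problems.append(f"call {i}: EditText on {path} without a prior ReadRange")
    return problems
```

A multi-file EditText with no prior reads reports one count violation plus one missing-read violation per file, which keeps each message actionable on its own.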
Opinionated / negotiable:
- The identity/voice rewrite is a matter of taste.
- main_system got longer (~2x); for tiny-context local models that's a real token cost. Happy to trim.
- I removed the "no task takes too long / be persistent" line because it encouraged exploration loops in our headless harness. A pure-CLI user may prefer to keep it; if so, let me know and I'll restore it.

If you want only the objective bucket, I can move the persona/voice and length changes into a follow-up and keep this PR to the two bugfixes plus the contract block.

Honest scope of validation: the new test proves the prompt states the right things; behavioral improvement was validated end-to-end on one model (qwen3-coder:30b) on one scoped edit task (completed, zero edit failures, ReadRange preceded the edit). It is not a multi-model benchmark.

Rewrite the agent-family system prompts so models follow the editing contract up front instead of learning it after a failure:
- agent.yml main_system: expert-engineer identity, investigate-before-claiming, scope discipline, failure-loop recognition, and an explicit editing contract (ContextManager -> ReadRange -> EditText, one file per EditText call, @000/000@ markers for empty files). Reference {final_reminders} so overeager_prompt and the MCP tool_prompt actually reach the agent coder.
- subagent.yml: inherit the agent identity/contract instead of re-overriding main_system with stale directives; keep only sub-agent-specific finishing guidance (verbose Yield summary for the parent).
- ask.yml / architect.yml: same direct voice and ground-answers-in-code discipline; architect plans now name verification steps and edge cases.

Adds tests/basic/test_agent_prompt_contract.py: deterministic, no-LLM checks that the prompts render via str.format with no stray braces, that {final_reminders} appears exactly once, that the edit contract is stated, and that the sub-agent inherits the agent identity.

…ntract

Pin cecli submodule to a653ce9f0 (dev-integration), which carries the Kiro-style agent prompt rewrite + explicit edit contract and the {final_reminders}/sub-agent-inheritance fixes (upstream PR cecli-dev/cecli#566).

Adds the BrightVision-side prompt-quality eval harness:
- bright_vision_core/agent_eval.py: objective behavioral scorer reusing the agent_turn.py signal parsers (edit failures, ReadRange-before-edit, ls-spam, token limit, rounds) + tests/core/test_agent_eval.py.
- bright_vision_core/agent_judge.py: opt-in LLM-as-judge rubric (scope, directness, investigation, summary quality) with robust JSON parsing + tests/core/test_agent_judge.py.
- tests/core/test_agent_prompt_eval.py + 'eval:prompts' script: real-Ollama behavioral eval scoring one scoped edit turn (E2E_LLM, BV_PROMPT_JUDGE).
- docs: ROADMAP #54, TESTING 'Measuring prompt quality' section; .gitignore for the regenerated eval workspace.
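The parsing approach used in agent_judge.py is not shown here. As one possible approach, the sketch below scans a judge reply for the first decodable JSON object (tolerating surrounding prose or code fences) with `json.JSONDecoder.raw_decode`, then validates the four rubric scores. The key names and the 1-5 clamp are assumptions, not the harness's actual schema.

```python
"""Sketch of tolerant JSON extraction for an LLM-as-judge reply.

ASSUMPTIONS: RUBRIC_KEYS and the 1-5 score range are hypothetical;
bright_vision_core/agent_judge.py may use a different schema and strategy.
"""
import json
from typing import Dict, Optional

RUBRIC_KEYS = ("scope", "directness", "investigation", "summary_quality")


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object in text, skipping prose and broken fragments."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON at this brace; try the next one
        if isinstance(obj, dict):
            return obj
    return None


def parse_rubric(text: str) -> Dict[str, int]:
    """Validate and clamp the rubric scores found in a judge reply."""
    obj = extract_json_object(text)
    if obj is None:
        raise ValueError("no JSON object in judge reply")
    scores: Dict[str, int] = {}
    for key in RUBRIC_KEYS:
        value = obj.get(key)
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            raise ValueError(f"missing or non-numeric score for {key!r}")
        scores[key] = max(1, min(5, int(round(value))))  # clamp to assumed 1-5 range
    return scores
```

Failing loudly on a missing or non-numeric score, rather than defaulting it, keeps a malformed judge reply from silently skewing the eval.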

JessicaMulein force-pushed the pr/agent-prompt-kiro branch from 886097a to be997ee on June 8, 2026 18:31

JessicaMulein wants to merge 1 commit into cecli-dev:main from Digital-Defiance:pr/agent-prompt-kiro
