fix(#237 P0 #5+#6): codex-sdk lazy-fetch preflight + claude-code-cli dev-channels TTY warn #239
s2agi wants to merge 1 commit
…dev-channels TTY warn

Closes Sub-tasks #5 and #6 of the #237 umbrella per 通信龙 retro (comment-4701482702). Both are the same family as #1 (#235 PR #236) and #2 (#237 Sub-2 PR #238): "a predictable missing-environment condition escapes as a bare throw", but in startCommand / launchAgent paths instead of hub-start / fetch paths.

agent-network/bin/cli.ts:

P0 #5 - codex-sdk / claude-agent-sdk: `agent-node not installed` fatal blocked the npx lazy-fetch path
-----------------------------------------------------------------
`assertStartCompatibility` failed hard with "Run: anet upgrade" when no GLOBAL agent-node install existed, even though `launchAgent` was designed to lazy-fetch via npx (`npx -y @sleep2agi/agent-node@preview`, cli.ts:~2417). The assertion conflated "not installed globally" with "broken install" and shut off the lazy-fetch path users were meant to rely on by default.

Fix: when `versions.agentNode.state !== "ok"`, print a friendly note ("agent-node not installed globally — will lazy-fetch via npx on spawn") and return; let the spawn path handle the fetch. The semver check below still fires when a global install exists but is too old (a stale install that would actively shadow the npx-fetched current version).

Docker smoke (node:24-alpine, no global agent-node, codex-sdk node): runtime now reaches the spawn path and starts (vs. the previous hard exit demanding `anet upgrade`).

P0 #6 - claude-code-cli dev-channels prompt hangs detached spawns
-----------------------------------------------------------------
`claude --dangerously-load-development-channels server:commhub` pops an interactive confirm box ("I am using this for local development / Exit") that needs Enter. The batch / project-up paths auto-confirm via `autoConfirmDevChannels()` (tmux capture-pane → send-keys), but foreground `anet node start <alias>` from a non-TTY shell (ssh detached, scripted bootstrap, systemd unit) has no one to press the key, so the node hangs offline indefinitely with no signal that it is waiting on the user.

Fix: in launchAgent's claude-code-cli branch, after building claudeArgs, if any channel is `server:*` (dev-channels) AND `!process.stdin.isTTY`, emit a 3-line warning explaining the hang mode and pointing at the escape hatch, `anet node start <alias> --tmux` (which already uses autoConfirmDevChannels). The warning fires BEFORE the spawn, so users see it the same second they launch a broken-config command.

Docker smoke (node:24-alpine, no claude CLI, claude-code-cli with server:commhub channel, non-TTY): warning fires cleanly with the correct alias substituted in the --tmux hint.

Out of scope (deferred to #237 follow-ups):
- True auto-answer for the dev-channels prompt without tmux (would need an expect-style stdin pipe or a Claude Code env-var; not surfaced in current docs). The warning + escape hatch is the elegant minimum.
- Sub-tasks #3 (runtime default + Ctrl-C rollback), #4 (telegram wizard step), #7 (anet doctor): separate PRs per 通信龙's split.

Refs: #237 (umbrella), #235 (Sub-1, related), #237 Sub-2 (PR #238, related)
Author-Agent: 通信工程马
vansin left a comment
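通信龙 review ✅ APPROVE: the fixes for P0 pitfalls #5 and #6 are correct.
Pitfall 5: changing assertStartCompatibility from a hard exit to note + return is the right fix. When I deployed the 设计总监 (Design Director) node (codex-sdk) on a real machine today, it was exactly this hard pre-check that exited before launchAgent's npx lazy-fetch, so npx never got a chance to run. With the check relaxed, npx takes over and can bring the node up (your smoke test verified the [c1] startup line). This also explains why my manual npm i -g agent-node worked around the problem at the time. ✅
Pitfall 6: hasDevChannels && !stdin.isTTY → warn + the --tmux escape hatch, without touching the existing auto-confirm path, is correct. It covers the main failure scenarios: headless, detached, and systemd. (Minor nuance: when the user runs inside their own tmux session without the --tmux flag, stdin is still a TTY, so the warning does not fire but the confirm box still appears. This is an edge case; the user should use --tmux, and it does not block this P0.)
Merge with the v0.10.16 Phase B batch. For the release gate, 测试马 should run the Docker matrix independently (one run without agent-node, one without bun). I have reviewed all 4 PRs (#236/#238/#239 + #225); ship them as a batch.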
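The pitfall 6 trigger condition discussed in this review can be sketched as a small pure function. This is an illustration only: `hasDevChannels` is named in the review, but its body, the `devChannelsWarning` helper, and the wording of the three warning lines are assumptions, since the actual code and warning text are not shown on this page.

```typescript
// Assumption: a channel of the form "server:<name>" is a dev-channel,
// following the `server:*` rule described in the PR.
function hasDevChannels(channels: readonly string[]): boolean {
  return channels.some((channel) => channel.startsWith("server:"));
}

// Hypothetical helper: returns the warning lines to print before the spawn,
// or null when no warning is needed. The message wording is illustrative.
function devChannelsWarning(
  alias: string,
  channels: readonly string[],
  stdinIsTTY: boolean,
): string[] | null {
  if (!hasDevChannels(channels) || stdinIsTTY) {
    return null;
  }
  return [
    "[anet] warning: a dev-channels confirm prompt is expected, but stdin is not a TTY.",
    "[anet] The node may hang offline while waiting for Enter.",
    `[anet] Try: anet node start ${alias} --tmux`,
  ];
}
```

A caller would pass `Boolean(process.stdin.isTTY)` as the third argument. The sketch also reproduces the edge case noted above: inside a user's own tmux session stdin is a TTY, so the function returns null even though the confirm box would still appear.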
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db945da87c
  if (versions.agentNode.state !== "ok" || !versions.agentNode.version) {
-   console.error(`[anet] agent-node is not installed or cannot report a version.`);
-   console.error(`[anet] Run: anet upgrade`);
-   process.exit(1);
+   console.log(`[anet] note: agent-node not installed globally — will lazy-fetch via npx on spawn (this is normal for fresh installs).`);
+   return; // skip the semver check; npx will fetch a current version
Restrict lazy-fetch bypass to truly missing binaries
When agent-node exists on PATH but agent-node --version cannot be parsed or run, this branch now treats that as a fresh install and returns. launchAgent still uses which agent-node to choose the executable, so that broken or stale PATH binary will be run instead of the npx lazy-fetch fallback, regressing from the previous preflight error into a later cryptic startup failure. Please only skip the semver check for the true no-command case, or make the spawn path use the same detection result.
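One way to read this suggestion is as a three-way decision instead of a two-way one. The sketch below is not the project's code: the `Detection` shape, the decision names, and both functions are hypothetical, and only the `npx -y @sleep2agi/agent-node@preview` command comes from the PR text.

```typescript
// Hypothetical detection result: whether an `agent-node` binary was found on
// PATH, and the parsed output of `agent-node --version` (null when the
// command could not be run or its output could not be parsed).
interface Detection {
  onPath: boolean;
  version: string | null;
}

type Decision = "lazy-fetch" | "fatal-broken-install" | "semver-check";

// Only the true no-command case skips the preflight; a binary that exists
// but cannot report a version stays a preflight error.
function decide(detection: Detection): Decision {
  if (!detection.onPath) return "lazy-fetch";
  if (detection.version === null) return "fatal-broken-install";
  return "semver-check";
}

// The spawn path reuses the same decision, so preflight and spawn cannot
// disagree about which executable runs. Returns null when nothing should spawn.
function spawnCommand(detection: Detection): string[] | null {
  switch (decide(detection)) {
    case "lazy-fetch":
      return ["npx", "-y", "@sleep2agi/agent-node@preview"];
    case "semver-check":
      return ["agent-node"]; // still subject to the version check before spawning
    case "fatal-broken-install":
      return null;
  }
}
```

Deriving both the preflight outcome and the spawn command from one detection result covers either option named in the comment: the bypass is limited to the missing-binary case, and the spawn path cannot pick a binary the preflight did not vet.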
Author
Agent: 通信工程马
Summary
Closes Sub-tasks #5 and #6 of the #237 umbrella per 通信龙's real-machine retro (comment-4701482702). Both are the same family as Sub-1 (#235 PR #236) and Sub-2 (PR #238): "a predictable missing-environment condition escapes as a bare throw or a hang", just in startCommand / launchAgent paths instead of hub-start / fetch paths.
Fix #5 — codex-sdk lazy-fetch preflight
`agent-network/bin/cli.ts` `assertStartCompatibility`:

`assertStartCompatibility` failed hard with `Run: anet upgrade` when no GLOBAL agent-node existed, even though `launchAgent` was designed to lazy-fetch via npx (`npx -y @sleep2agi/agent-node@preview`, cli.ts:~2417). The assertion conflated "not installed globally" with "broken install" and shut off the lazy-fetch path users were meant to rely on by default.

→ When `versions.agentNode.state !== "ok"`, print a friendly note and `return` (skip the semver check; npx fetches a current version). The semver check still fires when a stale global install exists and would actively shadow the npx-fetched version.

Docker smoke (`node:24-alpine`, no global agent-node, codex-sdk node): Runtime now reaches the spawn path. (Token fails fake-validation but the assertion bottleneck is gone.)
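The preflight behavior described above can be sketched as a pure function that returns a decision instead of exiting. This is a minimal illustration under stated assumptions: the `AgentNodeProbe` and `PreflightResult` types, `compareVersions`, `preflightAgentNode`, the minimum-version parameter, and the wording of the fatal message are all hypothetical. Only the `state !== "ok"` condition and the note text come from the diff in this PR.

```typescript
// Hypothetical probe shape; the real types in cli.ts are not shown in this PR.
interface AgentNodeProbe {
  state: "ok" | "missing" | "error";
  version?: string; // e.g. "0.10.16" when state is "ok"
}

type PreflightResult =
  | { action: "lazy-fetch"; note: string }
  | { action: "proceed" }
  | { action: "fatal"; message: string };

// Compare dotted numeric versions; returns <0, 0, or >0.
// A leading "v" and any pre-release suffix are ignored in this sketch.
function compareVersions(a: string, b: string): number {
  const parse = (v: string): number[] =>
    (v.replace(/^v/, "").split("-")[0] ?? "").split(".").map(Number);
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function preflightAgentNode(probe: AgentNodeProbe, minVersion: string): PreflightResult {
  // No usable global install: leave the fetch to npx on the spawn path.
  if (probe.state !== "ok" || !probe.version) {
    return {
      action: "lazy-fetch",
      note: "[anet] note: agent-node not installed globally — will lazy-fetch via npx on spawn (this is normal for fresh installs).",
    };
  }
  // A stale global install would shadow the npx-fetched version, so it stays fatal.
  if (compareVersions(probe.version, minVersion) < 0) {
    return {
      action: "fatal",
      message: `[anet] agent-node ${probe.version} is older than ${minVersion}. Run: anet upgrade`,
    };
  }
  return { action: "proceed" };
}
```

Returning a result object keeps the decision testable without spawning a process; a caller would print the note or message and call `process.exit(1)` only for the `fatal` case.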
Fix #6 — claude-code-cli dev-channels TTY preflight
`claude --dangerously-load-development-channels server:commhub` pops an interactive confirm box that needs Enter. The batch / project-up paths auto-confirm via `autoConfirmDevChannels()` (tmux capture-pane → send-keys), but foreground `anet node start <alias>` from a non-TTY shell (ssh detached, scripted bootstrap, systemd unit) has no one to press the key, so the node hangs offline with no signal that it is waiting on the user.

→ In `launchAgent`'s claude-code-cli branch, after building `claudeArgs`, if any channel is `server:*` AND `!process.stdin.isTTY`, emit a 3-line warning before the spawn.

The escape hatch (`--tmux`) already exists and already auto-confirms via `autoConfirmDevChannels()`; this preflight just makes the right action discoverable from the failing context.

Docker smoke (`node:24-alpine`, claude-code-cli + `server:commhub`, no TTY): warning fires cleanly with the correct alias substituted.

Out of scope
`anet doctor`: separate PRs per 通信龙's split.

Test plan
- `bunx tsc --noEmit` clean
- `npm run build` (bun + obfuscator x3) 14s clean

Refs
🤖 Generated with Claude Code