fix(#237 P0 #5+#6): codex-sdk lazy-fetch preflight + claude-code-cli dev-channels TTY warn#239

Open
s2agi wants to merge 1 commit into main from fix/237-p0-5-6

Conversation

@s2agi s2agi commented Jun 14, 2026

Agent: 通信工程马

Summary

Closes Sub-tasks #5 and #6 of the #237 umbrella, per 通信龙's real-machine retro (comment-4701482702). Both belong to the same family as Sub-1 (#235, PR #236) and Sub-2 (PR #238): "a predictable missing-environment condition surfaces as a bare throw or a hang". Here it happens in the startCommand / launchAgent paths rather than the hub-start / fetch paths.

Fix #5 — codex-sdk lazy-fetch preflight

agent-network/bin/cli.ts assertStartCompatibility:

assertStartCompatibility failed hard with Run: anet upgrade when no GLOBAL agent-node existed, even though launchAgent was designed to lazy-fetch via npx (npx -y @sleep2agi/agent-node@preview, cli.ts:~2417). The assertion conflated "not installed globally" with "broken install" and shut off the lazy-fetch path users were meant to rely on by default.

→ When versions.agentNode.state !== "ok", print a friendly note and return (skip the semver check; npx fetches a current version). The semver check still fires when a stale global install exists and would actively shadow the npx-fetched version.
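The decision above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in cli.ts: the `VersionProbe` and `Preflight` shapes, the names `preflightAgentNode`, `parseCore`, and `isOlder`, and the simplified numeric version comparison are all hypothetical. Only the `state !== "ok"` test and the gist of the note come from this PR.

```typescript
// Hypothetical probe result. Only the `state !== "ok"` test comes from the PR.
type VersionProbe = { state: string; version?: string };

type Preflight =
  | { action: "lazy-fetch"; note: string }
  | { action: "proceed" }
  | { action: "fail"; reason: string };

type Core = { major: number; minor: number; patch: number };

// Parse the numeric core of "1.2.3" or "1.2.3-preview.4"; null when unparseable.
// Simplification: pre-release tags are ignored, unlike a full semver comparison.
function parseCore(v: string): Core | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) } : null;
}

function isOlder(a: Core, b: Core): boolean {
  if (a.major !== b.major) return a.major < b.major;
  if (a.minor !== b.minor) return a.minor < b.minor;
  return a.patch < b.patch;
}

function preflightAgentNode(probe: VersionProbe, minVersion: string): Preflight {
  // No usable global install: do not exit, let the spawn path fetch via npx.
  if (probe.state !== "ok" || !probe.version) {
    return {
      action: "lazy-fetch",
      // Wording abbreviated from the note printed by the PR.
      note: "[anet] note: agent-node not installed globally; will lazy-fetch via npx on spawn.",
    };
  }
  const have = parseCore(probe.version);
  const need = parseCore(minVersion);
  if (!have || !need) {
    return { action: "fail", reason: `cannot compare versions: ${probe.version} vs ${minVersion}` };
  }
  // A stale global install would shadow the npx-fetched version, so it still fails.
  return isOlder(have, need)
    ? { action: "fail", reason: `agent-node ${probe.version} is older than ${minVersion}` }
    : { action: "proceed" };
}
```

Returning a decision value instead of calling `process.exit` inside the check keeps the logic testable; the caller would print the note or the failure reason and exit as appropriate.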

Docker smoke (node:24-alpine, no global agent-node, codex-sdk node):

[anet] note: agent-node not installed globally — will lazy-fetch via npx on spawn (this is normal for fresh installs).
[anet] Token: ntok_fak...
[agent-node] Config: /work/.anet/nodes/c1/config.json
[10:41:55] [INFO ] [c1] 启动
[10:41:55] [INFO ] [c1]   alias:   c1 [from: --alias flag]
[10:41:55] [INFO ] [c1]   runtime: codex-sdk

The runtime now reaches the spawn path. (The fake token later fails validation, but the assertion no longer blocks startup.)

Fix #6 — claude-code-cli dev-channels TTY preflight

claude --dangerously-load-development-channels server:commhub pops an interactive confirm box that needs Enter. The batch / project-up paths auto-confirm via autoConfirmDevChannels() (tmux capture-pane → send-keys), but foreground anet node start <alias> from a non-TTY shell (ssh detached, scripted bootstrap, systemd unit) has no one to press the key — node hangs offline with no signal that it's waiting on the user.

→ In launchAgent's claude-code-cli branch, after building claudeArgs, if any channel is server:* AND !process.stdin.isTTY, emit a multi-line warning before the spawn:

[anet] ⚠ claude-code-cli with --dangerously-load-development-channels needs an interactive TTY to confirm Claude Code's dev-channels prompt.
[anet]   This shell's stdin is not a TTY → the spawned claude process will hang on the confirm box and the node will stay offline.
[anet]   Fix: re-run with `anet node start 'c2' --tmux` (anet auto-confirms in tmux mode via capture-pane).
[anet]   Or attach a TTY (interactive ssh) and run again, then hit Enter on the prompt.

The escape hatch (--tmux) already exists and already auto-confirms via autoConfirmDevChannels() — this preflight just makes the right action discoverable from the failing context.
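The preflight condition can be sketched as a pure helper that returns the lines to print. This is a hedged illustration, not the code in cli.ts: the function name `devChannelsTtyWarning`, its parameters, and the `other:thing` channel in the usage note are hypothetical. The `server:` prefix test, the non-TTY condition, and the warning text are taken from the description above.

```typescript
// Returns the warning lines to print, or an empty array when no warning applies.
// `stdinIsTTY` would be Boolean(process.stdin.isTTY) at the call site.
function devChannelsTtyWarning(
  channels: string[],
  stdinIsTTY: boolean,
  alias: string,
): string[] {
  // Dev channels are the ones written as server:<name>, e.g. server:commhub.
  const hasDevChannels = channels.some((c) => c.startsWith("server:"));
  if (!hasDevChannels || stdinIsTTY) return [];
  return [
    "[anet] ⚠ claude-code-cli with --dangerously-load-development-channels needs an interactive TTY to confirm Claude Code's dev-channels prompt.",
    "[anet]   This shell's stdin is not a TTY → the spawned claude process will hang on the confirm box and the node will stay offline.",
    "[anet]   Fix: re-run with `anet node start '" + alias + "' --tmux` (anet auto-confirms in tmux mode via capture-pane).",
    "[anet]   Or attach a TTY (interactive ssh) and run again, then hit Enter on the prompt.",
  ];
}
```

A caller would print each returned line before spawning. With `["server:commhub"]`, a non-TTY stdin, and alias `c2`, the helper returns the four lines shown above; with a TTY, or with only a non-dev channel such as the made-up `other:thing`, it returns nothing.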

Docker smoke (node:24-alpine, claude-code-cli + server:commhub, no TTY): warning fires cleanly with the correct alias substituted.

Out of scope

Test plan

Refs

🤖 Generated with Claude Code

…dev-channels TTY warn

Closes Sub-tasks #5 and #6 of the #237 umbrella per 通信龙 retro
(comment-4701482702). Both are the same family as #1 (#235 PR #236)
and #2 (#237 Sub-2 PR #238): "a predictable missing-environment
condition surfaces as a bare throw", but in startCommand /
launchAgent paths instead of hub-start / fetch paths.

agent-network/bin/cli.ts:

P0 #5 — codex-sdk / claude-agent-sdk: `agent-node not installed`
fatal blocked the npx lazy-fetch path
-----------------------------------------------------------------
`assertStartCompatibility` failed hard with "Run: anet upgrade" when
no GLOBAL agent-node install existed, even though `launchAgent` was
designed to lazy-fetch via npx (`npx -y @sleep2agi/agent-node@preview`,
cli.ts:~2417). The assertion conflated "not installed globally" with
"broken install" and shut off the lazy-fetch path users were meant to
rely on by default.

Fix: when `versions.agentNode.state !== "ok"`, print a friendly note
("agent-node not installed globally — will lazy-fetch via npx on
spawn") and return; let the spawn path handle the fetch. The semver
check below still fires when a global install exists but is too old
(stale install that would actively shadow the npx-fetched current
version).

Docker smoke (node:24-alpine, no global agent-node, codex-sdk node):
runtime now reaches the spawn path and starts (vs. previous hard exit
demanding `anet upgrade`).

P0 #6 — claude-code-cli dev-channels prompt hangs detached spawns
-----------------------------------------------------------------
`claude --dangerously-load-development-channels server:commhub` pops
an interactive confirm box ("I am using this for local development /
Exit") that needs Enter. The batch / project-up paths auto-confirm
via `autoConfirmDevChannels()` (tmux capture-pane → send-keys), but
foreground `anet node start <alias>` from a non-TTY shell (ssh
detached, scripted bootstrap, systemd unit) has no one to press the
key — node hangs offline indefinitely with no signal that it's
waiting on the user.

Fix: in launchAgent's claude-code-cli branch, after building
claudeArgs, if any channel is `server:*` (dev-channels) AND
`!process.stdin.isTTY`, emit a multi-line warning explaining the hang
mode and pointing at the escape hatch — `anet node start <alias>
--tmux` (which already uses autoConfirmDevChannels). The warning
fires BEFORE the spawn, so users see it the same second they
launch a broken-config command.

Docker smoke (node:24-alpine, no claude CLI, claude-code-cli with
server:commhub channel, non-TTY): warning fires cleanly with the
correct alias substituted in the --tmux hint.

Out of scope (deferred to #237 follow-ups):
- True auto-answer for the dev-channels prompt without tmux (would
  need expect-style stdin pipe or a Claude Code env-var; not surfaced
  in current docs). The warning + escape hatch is the elegant minimum.
- Sub-tasks #3 (runtime default + Ctrl-C rollback), #4 (telegram
  wizard step), #7 (anet doctor) — separate PRs per 通信龙's split.

Refs: #237 (umbrella), #235 (Sub-1, related), #237 Sub-2 (PR #238,
related)

Author-Agent: 通信工程马

@vansin vansin left a comment


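通信龙 review ✅ APPROVE: the fixes for P0 pitfalls #5 and #6 are correct.

Pitfall #5: replacing the hard exit in assertStartCompatibility with a note plus return is the right fix. When I deployed 设计总监 (codex-sdk) on a real machine today, this hard pre-check exited before launchAgent's npx lazy-fetch, so npx never got a chance to run. With the check relaxed, npx takes over and brings the node up (your smoke test reached the [c1] 启动 line). This also explains why my manual npm i -g agent-node worked around the problem at the time. ✅

Pitfall #6: hasDevChannels && !stdin.isTTY → warn plus the --tmux escape hatch, without touching the existing auto-confirm path, is correct. It covers the main failure scenarios: headless, detached, and systemd. (Minor nuance: when you run inside your own tmux session without the --tmux flag, stdin is still a TTY, so the warning does not fire but the confirm box still appears. This is an edge case, the user should use --tmux, and it does not block this P0.)

Merge with the v0.10.16 Phase B batch. For the release gate, 测试马 should run the Docker matrix independently (once without agent-node and once without bun). I have reviewed all 4 PRs (#236/#238/#239 + #225); ship them as a batch.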

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db945da87c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread agent-network/bin/cli.ts
Comment on lines 907 to +909
  if (versions.agentNode.state !== "ok" || !versions.agentNode.version) {
-   console.error(`[anet] agent-node is not installed or cannot report a version.`);
-   console.error(`[anet] Run: anet upgrade`);
-   process.exit(1);
+   console.log(`[anet] note: agent-node not installed globally — will lazy-fetch via npx on spawn (this is normal for fresh installs).`);
+   return; // skip the semver check; npx will fetch a current version


[P2] Restrict lazy-fetch bypass to truly missing binaries

When agent-node exists on PATH but agent-node --version cannot be parsed or run, this branch now treats that as a fresh install and returns. launchAgent still uses which agent-node to choose the executable, so that broken or stale PATH binary will be run instead of the npx lazy-fetch fallback, regressing from the previous preflight error into a later cryptic startup failure. Please only skip the semver check for the true no-command case, or make the spawn path use the same detection result.
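One way to read this suggestion is to separate "no command on PATH" from "command present but unusable" before deciding whether to skip the semver check. The sketch below is an assumption-laden illustration, not code from this PR: the `Detection` states other than `"ok"`, the names `classifyAgentNode` and `shouldSkipSemverCheck`, and the idea of passing in the resolved path and the raw `--version` output are all hypothetical.

```typescript
// Hypothetical three-way detection result; the PR itself only tests state !== "ok".
type Detection = "missing" | "broken" | "ok";

// pathOnDisk: result of a PATH lookup (e.g. `which agent-node`), or null when not found.
// versionOutput: raw stdout of `agent-node --version`, or null when the command failed.
function classifyAgentNode(
  pathOnDisk: string | null,
  versionOutput: string | null,
): Detection {
  if (!pathOnDisk) return "missing";
  const looksLikeVersion =
    versionOutput !== null && /^v?\d+\.\d+\.\d+/.test(versionOutput.trim());
  return looksLikeVersion ? "ok" : "broken";
}

// Only a truly missing binary is safe to hand over to the npx lazy-fetch,
// because a broken binary on PATH would still be picked up by the spawn path.
function shouldSkipSemverCheck(d: Detection): boolean {
  return d === "missing";
}
```

Under this reading, a `"broken"` result would keep the earlier preflight error, while `"missing"` prints the friendly note and returns.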

Useful? React with 👍 / 👎.


Development

Successfully merging this pull request may close these issues.

[docs] Feedback on documentation errors
[feature] Improve the front-end rendering of the network topology graph in Agent Network Dashboard

3 participants