feat: CLI telemetry for all commands with centralized lifecycle #122
nicknisi wants to merge 35 commits into
Closes security-audit finding #1 on PR #122 (telemetry message sanitization).
`error.message` was flowing into 4 capture sites unsanitized, leaking homedir paths (and rarely, credentials) to the WorkOS gateway.
- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the leading `Error.stack` line, so message-only sanitization was insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route all 4 capture sites through it (`captureException`, `stepCompleted`, `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent` inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline truncation, providing defense-in-depth for callers that bypass the crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons every capture method with homedir + Bearer + sk_live_ + JWT, asserts no marker reaches the serialized queue.
Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
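The sanitization steps listed in this commit (homedir strip, secret redaction, truncation) can be pictured with a minimal sketch. The real crash-reporter.ts is not shown in this thread, so the regular expressions, the placeholder strings, and the `homeDir` parameter below are assumptions for illustration only; only the function names `sanitizeMessage` and `redactSecrets` come from the commit message.

```typescript
// Hypothetical sketch; not the PR's actual implementation.
const MAX_MESSAGE_CHARS = 1024; // stands in for the "1KB truncation" in the commit message

// Redact common secret shapes. The patterns are illustrative assumptions.
function redactSecrets(text: string): string {
  return text
    // JWT-shaped tokens: three dot-separated base64url segments starting with "eyJ".
    .replace(/\beyJ[\w-]+\.[\w-]+\.[\w-]+/g, '<redacted-jwt>')
    // "Bearer <token>" pairs, whatever the token looks like.
    .replace(/\bBearer\s+[^\s"']+/gi, 'Bearer <redacted>')
    // sk_* style API keys such as sk_live_... or sk_test_...
    .replace(/\bsk_[A-Za-z0-9_]+/g, '<redacted-key>');
}

// homeDir is passed in so the function stays pure and easy to test (an assumption).
function sanitizeMessage(message: string, homeDir: string): string {
  // Replace every occurrence of the home directory so usernames never leave the machine.
  const withoutHome = homeDir ? message.split(homeDir).join('~') : message;
  const redacted = redactSecrets(withoutHome);
  // Truncates by UTF-16 code units; a byte-accurate cap would need TextEncoder.
  return redacted.length > MAX_MESSAGE_CHARS ? redacted.slice(0, MAX_MESSAGE_CHARS) : redacted;
}

const sampleMessage = 'ENOENT: /Users/alice/.workos/config.json while sending Bearer abc123';
console.log(sanitizeMessage(sampleMessage, '/Users/alice'));
```

In a Node process the `homeDir` argument would typically come from `os.homedir()`. Applying the same `redactSecrets` helper to the stack as well as the message matters because, as the commit notes, Node echoes `.message` into the first line of `Error.stack`.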
No actionable comments were generated in the recent review.
Walkthrough
Adds CLI telemetry pipeline (command/session/crash), device-id persistence and store-and-forward, crash sanitization and reporting, telemetry client queue/persist/flush/auth modes, analytics event emission (emitCommandEvent/captureUnhandledCrash), structured CliExit/exit-code mapping, API error enrichment, CLI middleware/handler wrapping, and corresponding tests/docs.
Actionable comments posted: 9
🧹 Nitpick comments (4)
src/utils/telemetry-types.ts (1)
58-58: ⚡ Quick win
Align `startTimestamp` optionality with backward-compat contract.
`telemetry-schema.spec.ts` explicitly accepts `step`/`agent.tool` without `startTimestamp`, but these interfaces currently require it. Making both optional avoids contract drift.
Suggested patch
  export interface StepEvent extends TelemetryEvent {
    type: 'step';
    name: string;
-   startTimestamp: string;
+   startTimestamp?: string;
    durationMs: number;
    success: boolean;
    error?: {
@@
  export interface AgentToolEvent extends TelemetryEvent {
    type: 'agent.tool';
    toolName: string;
-   startTimestamp: string;
+   startTimestamp?: string;
    durationMs: number;
    success: boolean;
  }
Also applies to: 70-70
src/utils/telemetry-client.spec.ts (1)
60-105: ⚡ Quick win
Add coverage for claim-token auth headers.
`setClaimTokenAuth()` is new behavior in this PR, but there's no test asserting `x-workos-claim-token` + `x-workos-client-id` emission (and bearer-token precedence when both exist).
Also applies to: 124-218
src/utils/telemetry-store-forward.ts (1)
1-3: ⚡ Quick win
Use async fs APIs for startup recovery path.
`recoverPendingEvents()` (Line 25+) is async but uses sync disk calls (Lines 27-56), which blocks startup and conflicts with the project rule for TS files.
As per coding guidelines "Avoid Node-specific sync APIs (crypto, fs sync) unless necessary".
Also applies to: 27-56
src/utils/output.ts (1)
123-124: ⚡ Quick win
Protect process termination from telemetry failures
`analytics.recordTermination(...)` on Line 123 can throw and prevent Line 124 from executing. Error exits should remain deterministic even when telemetry is unhealthy.
🔧 Proposed fix
    const reason = error.apiContext ? 'api_error' : codeReason;
-   analytics.recordTermination(reason, error.code, error.apiContext);
-   process.exit(exit);
+   try {
+     analytics.recordTermination(reason, error.code, error.apiContext);
+   } finally {
+     process.exit(exit);
+   }
  }
  try {
    if (fs.existsSync(filePath)) {
      const raw = fs.readFileSync(filePath, 'utf8').trim();
      if (UUID_REGEX.test(raw)) {
        cached = raw;
        return raw;
      }
    }

    const id = crypto.randomUUID();
    fs.mkdirSync(path.dirname(filePath), { recursive: true, mode: 0o700 });
    fs.writeFileSync(filePath, id, { encoding: 'utf8', mode: 0o600 });
    cached = id;
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Replace sync fs usage in the command path.
Lines 31–43 and Line 49 use synchronous fs/crypto APIs on a hot CLI path. Please move this to async (node:fs/promises) and cache a pending promise so concurrent callers share one disk read and call sites stay simple.
As per coding guidelines src/**/*.ts: “Avoid Node-specific sync APIs (crypto, fs sync) unless necessary”.
Also applies to: 49-50
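The "cache a pending promise" idea in this suggestion can be sketched as follows. This is a minimal illustration rather than the PR's code: `createDeviceIdGetter`, `IdLoader`, and the injected `loadOrCreate` and `fallback` callbacks are hypothetical names, chosen so the example stays independent of the file system. In the real module, `loadOrCreate` would presumably wrap `readFile`, `mkdir`, and `writeFile` from `node:fs/promises`.

```typescript
// Hypothetical sketch: names and the injected loader are assumptions, not code from this PR.
type IdLoader = () => Promise<string>;

function createDeviceIdGetter(loadOrCreate: IdLoader, fallback: () => string): () => Promise<string> {
  let pending: Promise<string> | null = null;
  return () => {
    // Reuse the in-flight (or already settled) promise so the load runs at most once,
    // even when several callers ask for the id at the same time.
    if (!pending) {
      // On IO failure, fall back to a one-shot id instead of rejecting.
      pending = loadOrCreate().catch(() => fallback());
    }
    return pending;
  };
}

// Usage: two concurrent callers trigger a single load.
let loadCount = 0;
const getDeviceId = createDeviceIdGetter(
  async () => {
    loadCount += 1;
    return 'example-device-id';
  },
  () => 'example-session-id',
);
void Promise.all([getDeviceId(), getDeviceId()]).then(([first, second]) => {
  console.log(first === second, loadCount);
});
```

Catching inside the cached promise keeps the getter from ever rejecting, which matches the "never throws" behavior described elsewhere in this PR for device-id IO failures.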
    async flush(): Promise<boolean> {
      if (this.events.length === 0) return true;
      if (!this.gatewayUrl) {
        debug('[Telemetry] No gateway URL configured, skipping flush');
-       return;
+       return false;
      }

-     const payload: TelemetryRequest = { events: [...this.events] };
-     this.events = [];
+     const count = this.events.length;
+     const payload: TelemetryRequest = { events: this.events.slice(0, count) };
Guard against concurrent flush() calls to prevent event loss/duplication.
Line 85 currently allows overlapping flushes. Two concurrent calls can send the same snapshot twice, and later splice(0, count) can remove events queued after the first flush.
Suggested patch
export class TelemetryClient {
private events: TelemetryEvent[] = [];
+ private flushInFlight: Promise<boolean> | null = null;
@@
async flush(): Promise<boolean> {
+ if (this.flushInFlight) return this.flushInFlight;
+ this.flushInFlight = this.flushInternal();
+ try {
+ return await this.flushInFlight;
+ } finally {
+ this.flushInFlight = null;
+ }
+ }
+
+ private async flushInternal(): Promise<boolean> {
if (this.events.length === 0) return true;
@@
- }
+ }
}
Also applies to: 135-137, 143-144
Greptile Summary
This PR replaces the fragile provisional-event/patch-chain telemetry design with a centralized
Confidence Score: 4/5
Safe to merge with awareness of two edge cases in the telemetry subsystem that do not affect CLI command correctness.
The core lifecycle change is well-structured and the crash reporter correctly handles
src/utils/telemetry-store-forward.ts and src/commands/dev.ts warrant a second look before the next release.
31829ec to 0475ea8
Actionable comments posted: 2
…amps, and new event types
Add environment fingerprint fields (OS, Node version, CI detection, shell) to session.start and session.end events. Add startTimestamp to step and agent.tool events for span reconstruction. Define command and crash event types with stub emission methods. Add discriminated union Zod schema validation tests mirroring the API schema.

… persistence
Wire up yargs middleware that emits a provisional command event before each handler runs, then replaces it with actual duration/success on completion. This covers the ~25 process.exit() call sites without modifying them.
- Command telemetry middleware with canonical name resolution and flag extraction
- Crash reporter with sanitized stack traces (sync handlers, no async)
- Store-forward: persist unsent events to temp file on exit, recover on next run
- Fix flush() to retain events until HTTP success (was clearing before fetch)
- Auto-wrap handlers in registerSubcommand() (single change point)
- Shared COMMAND_ALIASES map for telemetry and help-json
- analytics.initForNonInstaller() sets gatewayUrl + JWT from stored creds

- Enable debug output for non-installer commands via env var
- Log telemetry event details (type, name, duration, attributes) on flush
- Register in debug command's env var catalog

- Wrap inline command handlers (seed, setup-org, doctor, etc.) with wrapCommandHandler so they report real duration/success
- Skip provisional telemetry event for install command (has own session telemetry)
- Add claim -> env.claim to canonical alias map
- Defer store-forward file deletion until after successful flush

Client errors (401, 403) are permanent failures that won't succeed on retry. Only retain events for 5xx (transient server errors) and network failures where store-forward retry is meaningful.

- flush() returns true (sent/dropped) or false (retryable) so callers can act on the result
- Use splice(0, count) instead of clearing all events, protecting events queued concurrently during the fetch
- wrapCommandHandler flushes in-process so events are sent immediately instead of always deferring to next invocation via store-forward
- Store-forward recovery deletes files after loading into memory (events are re-persisted by exit handler if flush fails)
- Skip provisional events for dashboard and $0 (installer entry points)
- Add 4xx drop test coverage
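The retry rules from the two commits above (drop on 4xx, retain on 5xx and network failure, remove only the snapshot with `splice(0, count)`) can be sketched as two small pure functions. `FlushOutcome`, `classifyResponse`, and `settleQueue` are hypothetical names used only for this illustration, and treating every non-2xx status below 500 as "dropped" is an assumption; the commits only mention 401 and 403 explicitly.

```typescript
// Hypothetical sketch: not the TelemetryClient from this PR.
type FlushOutcome = 'sent' | 'dropped' | 'retry';

// statusCode is undefined when the request failed before any response arrived.
function classifyResponse(statusCode: number | undefined): FlushOutcome {
  if (statusCode === undefined) return 'retry'; // network failure
  if (statusCode >= 200 && statusCode < 300) return 'sent';
  if (statusCode >= 500) return 'retry'; // transient server error
  return 'dropped'; // e.g. 401/403: permanent, retrying will not help
}

// Returns the same boolean contract described for flush():
// true when the snapshot was sent or dropped, false when it should be retried.
function settleQueue<T>(queue: T[], snapshotCount: number, outcome: FlushOutcome): boolean {
  if (outcome === 'retry') return false; // keep everything for store-forward
  // Remove only the events that were in the snapshot; events queued during
  // the request sit after index snapshotCount and stay in the queue.
  queue.splice(0, snapshotCount);
  return true;
}

// Usage: two events were snapshotted, a third arrived while the request was in flight.
const exampleQueue = ['command', 'session.end', 'late-event'];
const settled = settleQueue(exampleQueue, 2, classifyResponse(403));
console.log(settled, exampleQueue);
```

Splicing by the snapshot count instead of clearing the array is what protects late arrivals, but it is only safe when a single flush is in flight at a time, which is the concern raised in the review comment about concurrent `flush()` calls.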

Add a section to CLAUDE.md explaining which commands auto-emit telemetry (registerSubcommand) versus which need manual wrapCommandHandler wrapping (inline top-level .command() calls). Add a pointer comment in bin.ts near the workflow commands block. Prevents new top-level commands from silently emitting duration=0 telemetry.

- Add workos.user_id to command and crash events (from stored credentials or unclaimed environment clientId) so dashboards can count unique users
- Add cli.version to command and crash events for release adoption tracking
- Support claim-token auth path on the telemetry client, so unclaimed environments' telemetry reaches the API (guard accepts this path too)
- Rename CrashEvent's installer.version to cli.version (crashes happen outside the installer too)
- initForNonInstaller() now wires up user_id and claim-token auth
Introduces two additive telemetry signals requested by the signals spec:
- device.id — persistent per-install UUID stored at ~/.workos/device-id.
File IO failures fall through to a one-shot session UUID; never throws.
- auth.mode — 4-state enum ('jwt' | 'claim_token' | 'api_key' | 'none')
derived during initForNonInstaller with JWT > claim_token > api_key
priority. Installer flow sets it in run-with-core after credentials
resolve.
Both fields are injected via getEnvFingerprint so they appear on every
event that already carries env context (session.start, session.end,
command, crash). No change to downstream gateway — existing deliveries
accept the new attrs as optional.
Implements Phase 1 of docs/ideation/telemetry-signals/spec-phase-1.md.
Gateway-side type mirroring is tracked separately in the gateway repo.
…and events
Replace the boolean `command.success` as the primary outcome dimension with a structured `termination.reason` enum (success | cancelled | auth_required | validation_error | api_error | crash) plus an `error.code` string.
- `exitWithError`, `exitWithCode`, `exitWithAuthRequired` now patch the provisional command event via `analytics.recordTermination` before calling `process.exit`. Previously the queued provisional event (success true, duration 0) was persisted as-is by store-forward, producing misleading dashboard data.
- `exitWithError` now honors the string error code for exit mapping: `auth_required` -> 4, `cancelled` -> 2, `http_*`/`not_found`/`unknown_error` -> 1 (api_error), everything else -> 1 (validation_error). Previously it always exited 1.
- New `TelemetryClient.patchLastEventOfType` helper mutates a queued event in place. Unlike `replaceLastEventOfType`, it preserves the event so multiple callers can update fields incrementally.
- `wrapCommandHandler` records `reason: 'success'` on clean completion and `reason: 'crash'` with `error.name` on uncaught throw.
- `command.success` remains on CommandEvent for backward-compat.
- `api.status`/`api.code`/`api.resource` fields added to CommandEvent and plumbed through `recordTermination` for Phase 3 consumption.
Review passed (2 medium + 1 low non-blocking; low fixed in same commit for crash-path test symmetry).
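The error-code mapping described in this commit message can be written out as a small lookup. This sketch reflects only the mapping as stated in this one commit; `resolveExit` and `ExitResolution` are hypothetical names, and the PR's actual `ERROR_CODE_MAP` is not shown in this thread (a later commit in the PR says `not_found` and `unknown_error` were subsequently removed from it).

```typescript
// Hypothetical sketch of the mapping described in this commit message; not the PR's code.
type TerminationReason =
  | 'success'
  | 'cancelled'
  | 'auth_required'
  | 'validation_error'
  | 'api_error'
  | 'crash';

interface ExitResolution {
  exitCode: number;
  reason: TerminationReason;
}

function resolveExit(errorCode: string): ExitResolution {
  if (errorCode === 'auth_required') return { exitCode: 4, reason: 'auth_required' };
  if (errorCode === 'cancelled') return { exitCode: 2, reason: 'cancelled' };
  if (errorCode.startsWith('http_') || errorCode === 'not_found' || errorCode === 'unknown_error') {
    return { exitCode: 1, reason: 'api_error' };
  }
  // Generic fallback: any other string code is treated as a validation error.
  return { exitCode: 1, reason: 'validation_error' };
}

console.log(resolveExit('auth_required'), resolveExit('http_500'), resolveExit('missing_flag'));
```

Note that `api_error` and `validation_error` share exit code 1 here, so the shell-visible exit code cannot distinguish them; only the telemetry `termination.reason` does.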
…mmand events
Wires `createApiErrorHandler` through to the provisional command event via an optional `apiContext` param on `exitWithError`, populating `api.status`, `api.code`, and `api.resource` attributes on API-failure command events. Also treats `apiContext` presence as ground truth for `api_error` termination reason: WorkOS error codes like `rate_limited` or `validation_error` would otherwise fall through `resolveErrorCode` to `validation_error`, hiding legitimate API failures from api_error dashboards. The `CommandEvent` type already had the `api.*` attribute slots (added as forward-compat in Phase 2); only the wiring and tests are new.

- Fix duration=0 on exit-path events: middleware now sets analytics.commandStartTime so recordTermination can compute real duration when exitWithError/exitWithCode bypass the wrapper
- Remove not_found/unknown_error from ERROR_CODE_MAP to prevent local config misses (env.ts) from being misclassified as API errors
- Resolve auth.mode before sessionStart in installer flow so the session.start event carries the correct credential source
- Make recordTermination fully idempotent: clears stale api.* and error.code fields when called without them
- Tighten device-id UUID validation to proper RFC 4122 v4 regex
- Only override termination reason to api_error when the code-based classification falls to the generic validation_error fallback, preserving more specific reasons like auth_required

Wrap the `api` command handler with `wrapCommandHandler()` so it reports real duration and success/failure instead of the provisional defaults (duration=0, success=true).
Replace raw `process.exit()` calls with `exitWithCode`/`exitWithError` across 10 command files so `recordTermination` fires before exit. This ensures `termination.reason` and `error.code` are populated on every command event. Long-running server handlers (dev/emulate SIGINT) are left as-is.
Fix help-json.ts alias drift by importing from the shared `COMMAND_ALIASES` map instead of maintaining a private copy.
Add a coverage test that scans bin.ts for inline handlers missing `wrapCommandHandler`, catching the class of regression that let the `api` command slip through.
0475ea8 to 8003788
Actionable comments posted: 1
…ze env.shell
Make `commandExecuted` private on Analytics, exposing only `queueProvisionalCommand` for the middleware provisional-event path. Prevents accidental rogue command events that break the swap pattern.
Record only `basename(SHELL)` in env.shell to avoid leaking homedir paths (e.g. ~/.local/bin/fish).
Wrap `migrations` command with `wrapCommandHandler` for telemetry.
…OfType from TelemetryClient
…ndler, and auto-wrapping
Wraps the yargs chain in runCli(), captures the canonical command name via middleware, emits a single command telemetry event from the lifecycle try/catch/finally (success, CliExit, or crash), and flushes the telemetry client before exiting. Removes per-handler wrapCommandHandler() wrappers and the legacy commandTelemetryMiddleware. Also drops process.exit(0) calls in handlers that no longer need to short-circuit since exit is now centralized.

Migrate 12 spec files from process.exit spies to CliExit throw assertions, matching the exitWithCode/exitWithError refactor.

- .fail() now re-throws CliExit so handler exits preserve their context (reason, errorCode, apiContext) instead of being replaced with generic validation_error
- Remove unconditional process.exit() from finally block so long-running commands (dev, emulate) stay alive after wiring listeners; their signal handlers still call process.exit() directly
- doctor.ts catch block re-throws CliExit so exitWithCode(SUCCESS) doesn't get caught and re-classified as GENERAL_ERROR

exitWithError outside runCli() threw CliExit into the crash reporter, recording intentional validation errors as crash events. Now caught and exited cleanly before the crash reporter sees it.
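The centralized lifecycle described in these commits (one try/catch/finally that distinguishes success, an intentional `CliExit`, and a crash, then emits a single event and flushes) can be sketched as below. This is an illustration under assumptions, not the PR's `runCli()`: the `CliExit` constructor shape follows the `new CliExit(1, { reason: 'crash' })` call visible in this thread, while `runLifecycle`, `CommandOutcome`, and the injected `emit` and `flush` callbacks are hypothetical.

```typescript
// Hypothetical sketch: only the CliExit call shape is taken from this thread.
class CliExit extends Error {
  constructor(
    readonly exitCode: number,
    readonly context: { reason: string },
  ) {
    super(`exit ${exitCode}`);
  }
}

interface CommandOutcome {
  reason: string;
  exitCode: number;
  durationMs: number;
}

async function runLifecycle(
  handler: () => Promise<void>,
  emit: (outcome: CommandOutcome) => void,
  flush: () => Promise<void>,
): Promise<number> {
  const startedAt = Date.now();
  let reason = 'success';
  let exitCode = 0;
  try {
    await handler();
  } catch (err) {
    if (err instanceof CliExit) {
      // Intentional exit: keep the handler's own reason and exit code.
      reason = err.context.reason;
      exitCode = err.exitCode;
    } else {
      // Anything else is an unexpected failure.
      reason = 'crash';
      exitCode = 1;
    }
  } finally {
    // One command event per invocation, whichever path was taken.
    emit({ reason, exitCode, durationMs: Date.now() - startedAt });
    await flush();
  }
  // The caller decides whether to exit; the sketch never calls process.exit itself.
  return exitCode;
}

void runLifecycle(
  async () => {
    throw new CliExit(4, { reason: 'auth_required' });
  },
  (outcome) => console.log(outcome.reason, outcome.exitCode),
  async () => {},
);
```

Returning the exit code instead of exiting inside `finally` mirrors the fix noted above, where the unconditional `process.exit()` was removed so that long-running commands stay alive after wiring their listeners.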
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/utils/command-telemetry.ts (1)
12-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Harden flag extraction to avoid invalid telemetry flags.
The current filter accepts `--` (which becomes empty string after processing) and numeric short flags like `-1`, `-2` (which become `'1'`, `'2'`) that pollute the telemetry `command.flags` attribute with invalid entries.
🛡️ Suggested fix from previous review
  export function extractUserFlags(rawArgs: string[]): string[] {
    const passedFlags = rawArgs
-     .filter((arg) => arg.startsWith('--') || (arg.startsWith('-') && arg.length === 2))
-     .map((arg) => arg.replace(/^-+/, '').split('=')[0]);
+     .filter((arg) => {
+       if (arg === '--') return false;
+       if (/^--[A-Za-z][\w-]*(=.*)?$/.test(arg)) return true;
+       if (/^-[A-Za-z]$/.test(arg)) return true;
+       return false;
+     })
+     .map((arg) => arg.replace(/^-+/, '').split('=')[0])
+     .filter(Boolean);
    return [...new Set(passedFlags)];
  }
      clack.log.info(`Debug logs: ${logPath}`);
    }
-   process.exit(1);
+   throw new CliExit(1, { reason: 'crash' });
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Use handler exit helpers instead of constructing CliExit directly.
Line 57 throws CliExit manually; command handlers should terminate via exitWithCode() / exitWithError() to keep lifecycle behavior consistent.
Suggested patch
-import { CliExit } from '../utils/cli-exit.js';
+import { ExitCode, exitWithCode } from '../utils/exit-codes.js';
@@
- throw new CliExit(1, { reason: 'crash' });
+ exitWithCode(ExitCode.GENERAL_ERROR);
As per coding guidelines src/commands/**/*.ts: Use exitWithError() or exitWithCode() from handlers; they throw CliExit, which the lifecycle catches.
…dation
exitWithError outside runCli() threw CliExit into the crash reporter, recording intentional validation errors as crash events. Use outputError + process.exit directly since this runs before the centralized lifecycle exists.
Actionable comments posted: 1
…API caps
- crash-reporter: cap sanitized stack (marker included) at 4096 so the API's
per-attribute Zod limit can't silently drop oversized crash events; also
collapse Windows node_modules/dist/src paths; treat a CliExit that reaches
the global handlers as an intentional exit rather than a crash (fixes false
crash telemetry from dev's fire-and-forget child `error` listener)
- exit-codes: map `no_api_key` to `auth_required`, not `validation_error`
- analytics: sanitize error message before the WORKOS_DEBUG log line
- doctor: emit structured `{ error: { code, message } }` in JSON mode
- command-telemetry: harden extractUserFlags (ignore `--` and negative values)
- telemetry-client: persist store-forward file with 0700/0600 modes
- README: scope command-event wording to telemetry-enabled commands
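The first bullet of this commit caps the sanitized stack at 4096 with the truncation marker counted inside the limit. A minimal sketch of that arithmetic follows; `capStack`, the marker text, and measuring the cap in UTF-16 code units are assumptions, since the actual crash-reporter.ts is not shown here.

```typescript
// Hypothetical sketch: the marker text and function name are assumptions.
const MAX_STACK_CHARS = 4096; // cap named in the commit message
const TRUNCATION_MARKER = '\n...[truncated]';

function capStack(stack: string): string {
  if (stack.length <= MAX_STACK_CHARS) return stack;
  // Reserve room for the marker so the final string never exceeds the cap.
  const keep = MAX_STACK_CHARS - TRUNCATION_MARKER.length;
  return stack.slice(0, keep) + TRUNCATION_MARKER;
}

const longStack = 'Error: boom\n' + '    at frame (file.ts:1:1)\n'.repeat(500);
console.log(capStack(longStack).length <= MAX_STACK_CHARS);
```

Counting the marker inside the limit is the point of the fix: truncating to the cap and then appending a marker would push the attribute over the API's per-attribute limit, which is the failure mode the commit describes.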
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/utils/analytics.spec.ts (1)
518-548: ⚡ Quick win
Improve type safety in mock call assertions.
The `any` type annotation in the `find()` callbacks bypasses TypeScript's type checking. Consider defining a type for the queued event structure or using a more specific type.
♻️ Proposed improvement
Define a helper type at the top of the test file:
+type QueuedEvent = {
+  type: string;
+  attributes: Record<string, unknown>;
+  [key: string]: unknown;
+};
Then update the find callbacks:
-    const event = mockQueueEvent.mock.calls.find((c: any) => c[0].type === 'command')[0];
+    const event = mockQueueEvent.mock.calls.find((c: [QueuedEvent]) => c[0].type === 'command')?.[0];
Apply the same pattern to lines 528, 539, and 548.
Telemetry now posts to a dedicated /cli/telemetry endpoint (WORKOS_TELEMETRY_URL, default https://api.workos.com/cli) rather than sharing the LLM gateway route. WORKOS_LLM_GATEWAY_URL stays scoped to doctor/install LLM proxy traffic. The client sends x-workos-api-key when auth mode is API key and no JWT or claim-token transport is available, so API-key-only cohorts (CI, headless) finally deliver telemetry. Auth precedence: JWT > claim token > API key.
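The auth precedence described above (JWT > claim token > API key) can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: `selectTelemetryAuth`, `TelemetryCredentials`, and `AuthChoice` are hypothetical names, and the `authorization` and `x-workos-claim-token` headers are assumptions. Only `x-workos-api-key` is named in the commit message.

```typescript
// Hypothetical credential bag; the real client's fields are not shown in this PR excerpt.
type TelemetryCredentials = {
  jwt?: string;
  claimToken?: string;
  apiKey?: string;
};

type AuthChoice = {
  mode: 'jwt' | 'claim-token' | 'api-key' | 'none';
  headers: Record<string, string>;
};

// Picks exactly one transport, in the order JWT > claim token > API key.
function selectTelemetryAuth(creds: TelemetryCredentials): AuthChoice {
  if (creds.jwt) {
    // Assumption: the JWT travels as a Bearer token.
    return { mode: 'jwt', headers: { authorization: `Bearer ${creds.jwt}` } };
  }
  if (creds.claimToken) {
    // Assumption: this header name is invented for the sketch.
    return { mode: 'claim-token', headers: { 'x-workos-claim-token': creds.claimToken } };
  }
  if (creds.apiKey) {
    // Header name taken from the commit message; used only as a last resort.
    return { mode: 'api-key', headers: { 'x-workos-api-key': creds.apiKey } };
  }
  // No credentials at all: the caller decides whether to skip sending.
  return { mode: 'none', headers: {} };
}
```

With this shape, an API-key-only environment such as CI falls through to the last branch with credentials, which matches the "API-key-only cohorts" case the commit describes.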
Record the detected framework on session.end (read from the final state machine snapshot) so the API can break install metrics down by framework and compute success rate per integration. Absent when a session aborts before detection runs.
```ts
  try {
    if (!existsSync(PENDING_DIR)) return;
    const files = readdirSync(PENDING_DIR).filter((f) => f.startsWith('pending-') && f.endsWith('.json'));

    const recoveredFiles: string[] = [];
    for (const file of files) {
      const filePath = join(PENDING_DIR, file);
      try {
        const raw = readFileSync(filePath, 'utf-8');
        const events = JSON.parse(raw);
        if (Array.isArray(events) && events.length > 0) {
          telemetryClient.queueEvents(events);
          recoveredFiles.push(filePath);
        } else {
          // Empty file — delete immediately
          try {
            unlinkSync(filePath);
          } catch {
            /* ignore */
          }
        }
      } catch {
        // Corrupted file — delete and move on
        try {
          unlinkSync(filePath);
        } catch {
          /* ignore */
        }
      }
    }

    // Delete source files — events are now in memory regardless of flush outcome.
    // If flush succeeds: events sent, done.
    // If flush fails: events stay in memory, exit handler re-persists to new PID file.
    for (const filePath of recoveredFiles) {
      try {
        unlinkSync(filePath);
      } catch {
        /* ignore */
      }
    }

    // Flush all recovered events in one batch
```
Pending events bounce indefinitely when telemetry is disabled
When a user sets WORKOS_TELEMETRY=false after a crash has written a pending file, the loop never resolves: recoverPendingEvents reads the file, calls telemetryClient.queueEvents(events) (no WORKOS_TELEMETRY_ENABLED guard here), then calls flush() which returns false because initForNonInstaller never set gatewayUrl. The source files are deleted before the flush attempt, so the events are orphaned in memory. On process exit, persistToFile sees this.events.length > 0 and writes them to a new PID file. Every subsequent invocation with telemetry disabled repeats this cycle.
The simplest fix is to bail out early when telemetry is disabled — either by checking WORKOS_TELEMETRY_ENABLED at the top of recoverPendingEvents, or by guarding the call site in bin.ts with the same constant.
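The early bail-out suggested here might look something like the sketch below. It is written against an injected in-memory store so it stays self-contained. The `PendingStore` and `Queue` types and this `recoverPendingEvents` signature are assumptions made for illustration, not the function's real shape in the PR.

```typescript
// Hypothetical abstraction over the pending-*.json files on disk.
type PendingStore = {
  list(): string[];              // names of pending files
  read(name: string): unknown[]; // parsed events from one file
  remove(name: string): void;    // delete one file
};

// Hypothetical stand-in for the telemetry client's queue.
type Queue = { queueEvents(events: unknown[]): void };

// Returns the number of events moved from disk into the in-memory queue.
function recoverPendingEvents(enabled: boolean, store: PendingStore, queue: Queue): number {
  // Guard first: with telemetry disabled nothing is read, queued, or deleted,
  // so the exit handler has no in-memory events to re-persist.
  if (!enabled) return 0;

  let recovered = 0;
  for (const name of store.list()) {
    const events = store.read(name);
    if (events.length > 0) {
      queue.queueEvents(events);
      recovered += events.length;
    }
    // Source file is removed once its events are in memory (or it was empty).
    store.remove(name);
  }
  return recovered;
}
```

One consequence of guarding this way is that pending files written before telemetry was disabled stay on disk untouched. Whether they should instead be deleted on opt-out is a separate decision that the review comment does not settle.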
Summary
Adds telemetry coverage to every CLI command with a centralized lifecycle that owns timing, success/failure classification, and event emission from one place.
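As a rough illustration of what "one place" can mean, the sketch below wraps a handler in a single try/catch, classifies the outcome, and emits exactly one event. It is a simplified stand-in, not the PR's `runCli()`: `runWithLifecycle`, `CommandEvent`, and the `emit` callback are hypothetical, and the `CliExit` fields shown here are assumed.

```typescript
// Hypothetical typed exit error carrying an exit code and a reason.
class CliExit extends Error {
  readonly exitCode: number;
  readonly reason: string;
  constructor(exitCode: number, reason: string) {
    super(`exit ${exitCode}: ${reason}`);
    this.name = 'CliExit';
    this.exitCode = exitCode;
    this.reason = reason;
  }
}

// Hypothetical event shape for the sketch.
type CommandEvent = {
  command: string;
  success: boolean;
  exitCode: number;
  durationMs: number;
  reason?: string | undefined;
};

// Owns timing, outcome classification, and the single emit call.
async function runWithLifecycle(
  command: string,
  handler: () => Promise<void>,
  emit: (event: CommandEvent) => void,
): Promise<number> {
  const startedAt = Date.now();
  let exitCode = 0;
  let reason: string | undefined;
  try {
    await handler();
  } catch (err) {
    if (err instanceof CliExit) {
      // Structured exit: the handler chose the code and reason.
      exitCode = err.exitCode;
      reason = err.reason;
    } else {
      // Anything else is treated as a crash in this sketch.
      exitCode = 1;
      reason = 'crash';
    }
  }
  emit({ command, success: exitCode === 0, exitCode, durationMs: Date.now() - startedAt, reason });
  return exitCode;
}
```

Because handlers throw instead of calling `process.exit()`, the wrapper always regains control. That is why a forgotten per-handler wrapper can no longer produce a misleading event.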
Telemetry infrastructure
- `uncaughtException`/`unhandledRejection` handlers with sanitized stack traces
- `WORKOS_DEBUG=1` env var for verbose debug logging on all commands

Centralized command lifecycle (`runCli()`)
- `yargs.exitProcess(false)` + `parseAsync()` with a single try/catch
- `exitWithCode()`/`exitWithError()` throw `CliExit` (typed error carrying exit code + telemetry context) instead of calling `process.exit()`
- `emitCommandEvent()` call per command outcome (success, structured exit, crash)
- `wrapCommandHandler()` wrappers, provisional events, and event patching

What this replaces
The previous design required every handler to be wrapped with `wrapCommandHandler()` and used a provisional-event/replace/patch chain across 4 layers (middleware, wrapper, exit helpers, analytics patching). That design was fragile (forgotten wrappers produced misleading success=true/duration=0 events) and required a regex guardrail test to enforce.

Design decisions
- `install`, `dashboard`, and `$0` are excluded from command telemetry (own session-based telemetry)
- (`dev`, `emulate`) keep the event loop alive via server/child listeners; their signal handlers call `process.exit()` directly
- (`--mode`) uses `outputError()` + `process.exit()` directly since `runCli()` doesn't exist yet at that point

Test plan
- `pnpm typecheck` passes
- `pnpm test` passes (146 files, 1926 tests)
- `pnpm build` passes
- `workos doctor --json --skip-ai --skip-api` exits 0 with clean JSON (no CliExit leak)
- `workos emulate --json --port 0` stays alive and serves `/health`
- `workos org list` (no API key) produces structured `no_api_key` error with correct `termination.reason`
- `workos --mode robot doctor` exits 1 with structured error, no crash event in store-forward

Summary by CodeRabbit
New Features
Bug Fixes
Documentation