feat: CLI telemetry for all commands with centralized lifecycle #122

Open
nicknisi wants to merge 35 commits into main from nicknisi/telemetry

@nicknisi nicknisi commented Apr 14, 2026

Summary

Adds telemetry coverage to every CLI command with a centralized lifecycle that owns timing, success/failure classification, and event emission from one place.

Telemetry infrastructure

  • Command events for every yargs command (name, duration, success/failure, flags, termination reason, error code, API context)
  • Crash reporting via global uncaughtException/unhandledRejection handlers with sanitized stack traces
  • Store-and-forward persistence to PID-based temp files on exit; next invocation recovers and sends
  • WORKOS_DEBUG=1 env var for verbose debug logging on all commands
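
The store-and-forward bullet above can be sketched roughly as follows. This is a minimal illustration, not the PR's telemetry-store-forward.ts: the `PendingEvent` shape, the `telemetry-pending-` prefix, and the function names are assumptions, and recovery is kept synchronous only for brevity.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical event shape and file-name prefix, for illustration only.
interface PendingEvent {
  type: string;
  name: string;
}
const PREFIX = 'telemetry-pending-';

// Write unsent events to a PID-scoped file. Synchronous on purpose:
// 'exit' listeners cannot wait for asynchronous work.
function persistPending(events: PendingEvent[], dir: string = os.tmpdir()): string | null {
  if (events.length === 0) return null;
  const file = path.join(dir, `${PREFIX}${process.pid}.json`);
  fs.writeFileSync(file, JSON.stringify(events), { encoding: 'utf8', mode: 0o600 });
  return file;
}

// Load every pending file left by earlier invocations, then delete it.
// Corrupt files are dropped rather than retried forever.
function recoverPending(dir: string = os.tmpdir()): PendingEvent[] {
  const recovered: PendingEvent[] = [];
  for (const entry of fs.readdirSync(dir)) {
    if (!entry.startsWith(PREFIX) || !entry.endsWith('.json')) continue;
    const file = path.join(dir, entry);
    try {
      const parsed: unknown = JSON.parse(fs.readFileSync(file, 'utf8'));
      if (Array.isArray(parsed)) recovered.push(...(parsed as PendingEvent[]));
    } catch {
      // Ignore unreadable content; the file is still removed below.
    }
    fs.rmSync(file, { force: true });
  }
  return recovered;
}
```

In this sketch a caller would register `persistPending` in a `process.on('exit', ...)` listener and queue the result of `recoverPending` at the next startup; events that still fail to send are simply persisted again under the new PID.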

Centralized command lifecycle (runCli())

  • Uses yargs.exitProcess(false) + parseAsync() with a single try/catch
  • exitWithCode() / exitWithError() throw CliExit (typed error carrying exit code + telemetry context) instead of calling process.exit()
  • One emitCommandEvent() call per command outcome (success, structured exit, crash)
  • Eliminates per-handler wrapCommandHandler() wrappers, provisional events, and event patching
  • Command handlers are plain async functions with no telemetry awareness
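
A minimal sketch of the lifecycle described above, under stated assumptions: `parse` stands in for yargs' `parseAsync()`, `emit` stands in for `emitCommandEvent()`, and the `CliExit` constructor and `CommandOutcome` fields are hypothetical shapes rather than the PR's actual types.

```typescript
type TerminationReason =
  | 'success' | 'cancelled' | 'auth_required' | 'validation_error' | 'api_error' | 'crash';

// Hypothetical typed error carrying the exit code and telemetry context.
class CliExit extends Error {
  constructor(
    readonly exitCode: number,
    readonly reason: TerminationReason,
    readonly errorCode?: string,
  ) {
    super(`CLI exit with code ${exitCode}`);
    this.name = 'CliExit';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface CommandOutcome {
  reason: TerminationReason;
  success: boolean;
  durationMs: number;
  errorCode?: string;
}

// Runs one command and reports exactly one outcome through `emit`.
async function runCli(
  parse: () => Promise<void>,
  emit: (outcome: CommandOutcome) => void,
): Promise<number> {
  const start = Date.now();
  let exitCode = 0;
  let outcome: Omit<CommandOutcome, 'durationMs'> = { reason: 'success', success: true };
  try {
    await parse();
  } catch (err) {
    if (err instanceof CliExit) {
      // Structured exit: the handler chose the code and the reason.
      exitCode = err.exitCode;
      outcome = { reason: err.reason, success: err.exitCode === 0, errorCode: err.errorCode };
    } else {
      // Anything else is classified as a crash.
      exitCode = 1;
      outcome = {
        reason: 'crash',
        success: false,
        errorCode: err instanceof Error ? err.name : 'unknown',
      };
    }
  }
  emit({ ...outcome, durationMs: Date.now() - start });
  return exitCode;
}
```

Because timing and classification live in one function, a handler that forgets to report anything still produces a correct event; it only has to return or throw.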

What this replaces

The previous design required every handler to be wrapped with wrapCommandHandler() and used a provisional-event/replace/patch chain across 4 layers (middleware, wrapper, exit helpers, analytics patching). That design was fragile (forgotten wrappers produced misleading success=true/duration=0 events) and required a regex guardrail test to enforce.

Design decisions

  • install, dashboard, and $0 are excluded from command telemetry (own session-based telemetry)
  • Non-installer telemetry only works for JWT-authenticated users (API-key-only users' events are silently dropped by the gateway guard)
  • Long-running commands (dev, emulate) keep the event loop alive via server/child listeners; their signal handlers call process.exit() directly
  • Pre-lifecycle validation (--mode) uses outputError() + process.exit() directly since runCli() doesn't exist yet at that point

Test plan

  • pnpm typecheck passes
  • pnpm test passes (146 files, 1926 tests)
  • pnpm build passes
  • workos doctor --json --skip-ai --skip-api exits 0 with clean JSON (no CliExit leak)
  • workos emulate --json --port 0 stays alive and serves /health
  • workos org list (no API key) produces structured no_api_key error with correct termination.reason
  • workos --mode robot doctor exits 1 with structured error, no crash event in store-forward

Summary by CodeRabbit

  • New Features

    • CLI telemetry: command/session/crash events with alias-aware command names, stable device ID, environment/device fingerprint, auth-mode selection, and command-duration/flag telemetry.
    • Reliability: store-and-forward persistence with recovery, queued flush semantics, and sanitized crash reporting.
  • Bug Fixes

    • Standardized structured exit/error handling that preserves exit classifications and API context.
  • Documentation

    • Updated telemetry docs and README, including disable instructions.

nicknisi added a commit that referenced this pull request Apr 15, 2026
Closes security-audit finding #1 on PR #122 (telemetry message
sanitization). `error.message` was flowing into 4 capture sites
unsanitized, leaking homedir paths (and rarely, credentials) to the
WorkOS gateway.

- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/
  sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both
  `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the
  leading `Error.stack` line, so message-only sanitization was
  insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route
  all 4 capture sites through it (`captureException`, `stepCompleted`,
  `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent`
  inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline
  truncation, providing defense-in-depth for callers that bypass the
  crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons
  every capture method with homedir + Bearer + sk_live_ + JWT, asserts
  no marker reaches the serialized queue.

Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
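
The sanitization steps in this commit message (homedir strip, Bearer/sk_*/JWT redaction, 1KB truncation) might look something like the sketch below. The regular expressions, the replacement markers, and the 1024-character cap standing in for "1KB" are illustrative assumptions, not the patterns used in crash-reporter.ts.

```typescript
import * as os from 'node:os';

const MAX_LENGTH = 1024; // assumption: characters used as an approximation of 1KB

// Bearer tokens are redacted first so a JWT sent as a Bearer token is caught once.
function redactSecrets(text: string): string {
  return text
    .replace(/Bearer\s+[A-Za-z0-9._~+\/=-]+/g, 'Bearer [REDACTED]')
    .replace(/\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, '[REDACTED_JWT]')
    .replace(/\bsk_[A-Za-z0-9_]+/g, '[REDACTED_KEY]');
}

// Replace every occurrence of the home directory with "~".
function stripHomeDir(text: string, home: string = os.homedir()): string {
  return home ? text.split(home).join('~') : text;
}

function sanitizeMessage(message: string, home: string = os.homedir()): string {
  const cleaned = redactSecrets(stripHomeDir(message, home));
  return cleaned.length > MAX_LENGTH ? cleaned.slice(0, MAX_LENGTH) : cleaned;
}
```

A stack sanitizer can reuse `redactSecrets` and `stripHomeDir` unchanged, which matters because the first line of `Error.stack` repeats the message.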

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd414442-7dc0-4a46-b5b7-3b9b02051d56

📥 Commits

Reviewing files that changed from the base of the PR and between ccf0bc5 and c36368d.

📒 Files selected for processing (2)
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/analytics.spec.ts

📝 Walkthrough

Walkthrough

Adds CLI telemetry pipeline (command/session/crash), device-id persistence and store‑and‑forward, crash sanitization and reporting, telemetry client queue/persist/flush/auth modes, analytics event emission (emitCommandEvent/captureUnhandledCrash), structured CliExit/exit-code mapping, API error enrichment, CLI middleware/handler wrapping, and corresponding tests/docs.

Changes

Telemetry & CLI Exit

| Layer | File(s) | Summary |
| --- | --- | --- |
| Telemetry types & schema | src/utils/telemetry-types.ts, src/utils/telemetry-schema.spec.ts | Adds command/crash event kinds, AuthMode/TerminationReason, session/device fingerprint fields, and startTimestamp on step/tool events; schema tests added. |
| Device ID & store-forward | src/lib/device-id.ts, src/lib/device-id.spec.ts, src/utils/telemetry-store-forward.ts, src/utils/telemetry-store-forward.spec.ts | Implements persistent ~/.workos/device-id with safe fallback and PID-scoped pending-file persistence/recovery for queued telemetry; tests validate behavior and recovery. |
| Crash reporter & sanitization | src/utils/crash-reporter.ts, src/utils/telemetry-sanitize.spec.ts, src/utils/crash-reporter.spec.ts | Adds stack/message sanitizers (home-path, secrets, truncation) and synchronous uncaught/unhandled handlers that report via analytics and then exit; tests ensure redaction/truncation and handler semantics. |
| Telemetry client: queue/flush/persist & auth | src/utils/telemetry-client.ts, src/utils/telemetry-client.spec.ts | Adds queueEvents, setClaimTokenAuth/setApiKeyAuth, snapshot-based flush returning boolean (retry semantics), and persistToFile for store-forward; tests updated for semantics and fs interactions. |
| Analytics extensions & events | src/utils/analytics.ts, src/utils/analytics.spec.ts | Adds initForNonInstaller, auth-mode wiring, environment/device fingerprint enrichment for events, emitCommandEvent, captureUnhandledCrash, sanitized exception extraction, and tests for propagation and auth precedence. |
| CliExit & exit-code mapping | src/utils/cli-exit.ts, src/utils/exit-codes.ts, src/utils/output.ts, src/utils/output.spec.ts, src/utils/exit-codes.spec.ts | Introduces CliExit with structured CliExitContext, resolveErrorCode mapping to TerminationReason and exit values, and exitWithCode/exitWithError now throw CliExit with structured context; tests updated accordingly. |
| API error handler enrichment | src/lib/api-error-handler.ts, src/lib/api-error-handler.spec.ts | Standardizes derived error code and attaches structured apiContext (status, code, resource) to errors passed to exit/reporting; tests assert telemetry/apiContext propagation. |
| Command telemetry helpers & aliases | src/lib/command-aliases.ts, src/utils/command-telemetry.ts, src/utils/command-telemetry.spec.ts | Adds canonical COMMAND_ALIASES, SKIP_TELEMETRY_COMMANDS, resolveCanonicalName, and extractUserFlags to normalize command names and flags for telemetry; tests validate behaviors. |
| CLI bootstrap, middleware & handler wrapping | src/bin.ts, CLAUDE.md | Initializes debug/crash/store-forward/analytics early, recovers pending events, inserts commandTelemetryMiddleware into yargs, and wraps top-level .command() handlers with wrapCommandHandler() for automatic command telemetry; docs updated. |
| Replace process.exit in commands | src/commands/* (many files: api, interactive, dev, emulate, env, doctor, install, install-skill, uninstall-skill, login, seed, etc.) | Replaces direct process.exit calls with exitWithCode/exitWithError/throwing CliExit across commands to provide structured termination and telemetry context; tests updated to assert CliExit. |
| Help/README/docs & settings | README.md, CLAUDE.md, src/utils/help-json.ts, src/lib/settings.ts, src/cli.config.ts | Updates telemetry docs, disable example, help-json command resolution to use shared COMMAND_ALIASES, adds workos.telemetryUrl config and getTelemetryUrl(), and documents WORKOS_DEBUG/WORKOS_TELEMETRY_URL. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (4)
src/utils/telemetry-types.ts (1)

58-58: ⚡ Quick win

Align startTimestamp optionality with backward-compat contract.

telemetry-schema.spec.ts explicitly accepts step/agent.tool without startTimestamp, but these interfaces currently require it. Making both optional avoids contract drift.

Suggested patch
 export interface StepEvent extends TelemetryEvent {
   type: 'step';
   name: string;
-  startTimestamp: string;
+  startTimestamp?: string;
   durationMs: number;
   success: boolean;
   error?: {
@@
 export interface AgentToolEvent extends TelemetryEvent {
   type: 'agent.tool';
   toolName: string;
-  startTimestamp: string;
+  startTimestamp?: string;
   durationMs: number;
   success: boolean;
 }

Also applies to: 70-70

src/utils/telemetry-client.spec.ts (1)

60-105: ⚡ Quick win

Add coverage for claim-token auth headers.

setClaimTokenAuth() is new behavior in this PR, but there’s no test asserting x-workos-claim-token + x-workos-client-id emission (and bearer-token precedence when both exist).

Also applies to: 124-218

src/utils/telemetry-store-forward.ts (1)

1-3: ⚡ Quick win

Use async fs APIs for startup recovery path.

recoverPendingEvents() (Line 25+) is async but uses sync disk calls (Lines 27-56), which blocks startup and conflicts with the project rule for TS files.

As per coding guidelines "Avoid Node-specific sync APIs (crypto, fs sync) unless necessary".

Also applies to: 27-56

src/utils/output.ts (1)

123-124: ⚡ Quick win

Protect process termination from telemetry failures

analytics.recordTermination(...) on Line 123 can throw and prevent Line 124 from executing. Error exits should remain deterministic even when telemetry is unhealthy.

🔧 Proposed fix
   const reason = error.apiContext ? 'api_error' : codeReason;
-  analytics.recordTermination(reason, error.code, error.apiContext);
-  process.exit(exit);
+  try {
+    analytics.recordTermination(reason, error.code, error.apiContext);
+  } finally {
+    process.exit(exit);
+  }
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c45afda9-659c-43a0-9646-cd08dd0cabe3

📥 Commits

Reviewing files that changed from the base of the PR and between 524c709 and 31829ec.

📒 Files selected for processing (27)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/debug.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts

Comment thread src/lib/device-id.ts Outdated
Comment thread src/lib/device-id.ts
Comment on lines +31 to +43
try {
  if (fs.existsSync(filePath)) {
    const raw = fs.readFileSync(filePath, 'utf8').trim();
    if (UUID_REGEX.test(raw)) {
      cached = raw;
      return raw;
    }
  }

  const id = crypto.randomUUID();
  fs.mkdirSync(path.dirname(filePath), { recursive: true, mode: 0o700 });
  fs.writeFileSync(filePath, id, { encoding: 'utf8', mode: 0o600 });
  cached = id;

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace sync fs usage in the command path.

Lines 31–43 and line 49 use synchronous fs/crypto APIs on a hot CLI path. Please move this to async (node:fs/promises) and cache a pending promise to keep call sites simple.

As per coding guidelines src/**/*.ts: “Avoid Node-specific sync APIs (crypto, fs sync) unless necessary”.

Also applies to: 49-50
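
One way the "cache a pending promise" suggestion could look is sketched below. It is not the code in src/lib/device-id.ts: `getDeviceId`, `loadOrCreate`, and the per-path `pending` map are hypothetical names, and `randomUUID()` remains a synchronous call because Node has no asynchronous variant of it.

```typescript
import { randomUUID } from 'node:crypto';
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import * as path from 'node:path';

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// One cached promise per file path, so concurrent callers share a single read or write.
const pending = new Map<string, Promise<string>>();

async function loadOrCreate(filePath: string): Promise<string> {
  try {
    const raw = (await readFile(filePath, 'utf8')).trim();
    if (UUID_V4.test(raw)) return raw;
  } catch {
    // Missing or unreadable file: fall through and create a new ID.
  }
  const id = randomUUID();
  try {
    await mkdir(path.dirname(filePath), { recursive: true, mode: 0o700 });
    await writeFile(filePath, id, { encoding: 'utf8', mode: 0o600 });
  } catch {
    // IO failure: keep the in-memory ID for this session only; never throw.
  }
  return id;
}

function getDeviceId(filePath: string): Promise<string> {
  let cached = pending.get(filePath);
  if (!cached) {
    cached = loadOrCreate(filePath);
    pending.set(filePath, cached);
  }
  return cached;
}
```

Call sites stay a single `await getDeviceId(file)`, and two calls made before the first read finishes resolve to the same ID instead of racing to write different ones.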

Comment thread src/utils/analytics.ts
Comment thread src/utils/command-telemetry.ts
Comment thread src/utils/command-telemetry.ts Outdated
Comment thread src/utils/crash-reporter.ts Outdated
Comment thread src/utils/exit-codes.ts Outdated
Comment on lines +85 to 94
   async flush(): Promise<boolean> {
     if (this.events.length === 0) return true;
     if (!this.gatewayUrl) {
       debug('[Telemetry] No gateway URL configured, skipping flush');
-      return;
+      return false;
     }

-    const payload: TelemetryRequest = { events: [...this.events] };
-    this.events = [];
+    const count = this.events.length;
+    const payload: TelemetryRequest = { events: this.events.slice(0, count) };

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guard against concurrent flush() calls to prevent event loss/duplication.

Line 85 currently allows overlapping flushes. Two concurrent calls can send the same snapshot twice, and later splice(0, count) can remove events queued after the first flush.

Suggested patch
 export class TelemetryClient {
   private events: TelemetryEvent[] = [];
+  private flushInFlight: Promise<boolean> | null = null;
@@
   async flush(): Promise<boolean> {
+    if (this.flushInFlight) return this.flushInFlight;
+    this.flushInFlight = this.flushInternal();
+    try {
+      return await this.flushInFlight;
+    } finally {
+      this.flushInFlight = null;
+    }
+  }
+
+  private async flushInternal(): Promise<boolean> {
     if (this.events.length === 0) return true;
@@
-  }
+  }
 }

Also applies to: 135-137, 143-144
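
To make the interaction between the snapshot, `splice(0, count)`, and the in-flight guard concrete, here is a self-contained sketch. `QueueSketch` and the `Send` callback (which resolves to an HTTP status) are hypothetical stand-ins for `TelemetryClient` and its fetch call, and treating every status below 500 as "sent or dropped" is a simplification of the retry rules described in this PR.

```typescript
// Resolves to an HTTP status code; rejects on network failure.
type Send = (events: readonly string[]) => Promise<number>;

class QueueSketch {
  private events: string[] = [];
  private flushInFlight: Promise<boolean> | null = null;

  constructor(private readonly send: Send) {}

  queue(event: string): void {
    this.events.push(event);
  }

  get size(): number {
    return this.events.length;
  }

  // Concurrent callers share the same in-flight promise instead of sending twice.
  flush(): Promise<boolean> {
    if (!this.flushInFlight) {
      this.flushInFlight = this.flushInternal().finally(() => {
        this.flushInFlight = null;
      });
    }
    return this.flushInFlight;
  }

  private async flushInternal(): Promise<boolean> {
    const count = this.events.length;
    if (count === 0) return true;
    const snapshot = this.events.slice(0, count);
    let status: number;
    try {
      status = await this.send(snapshot);
    } catch {
      return false; // network failure: keep events for a later retry
    }
    if (status >= 500) return false; // transient server error: keep events
    // Delivered, or a client error that will not succeed on retry:
    // remove only the snapshot, not events queued during the request.
    this.events.splice(0, count);
    return true;
  }
}
```

Without the guard, two overlapping calls would both snapshot the same events, and the second `splice` could delete events that were queued after the first snapshot was taken.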

Comment thread src/utils/telemetry-client.ts Outdated

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR replaces the fragile provisional-event/patch-chain telemetry design with a centralized runCli() lifecycle that owns timing, success/failure classification, and event emission from a single try/catch. It also adds crash reporting, store-and-forward persistence, a persistent device ID, and environment fingerprinting across all CLI commands.

  • Centralized lifecycle: yargs.exitProcess(false) + parseAsync() with a single try/catch replaces per-handler wrapCommandHandler() wrappers; exitWithCode/exitWithError throw CliExit instead of calling process.exit so every exit path is intercepted.
  • New infrastructure: crash-reporter.ts, telemetry-store-forward.ts, device-id.ts, and command-telemetry.ts added.
  • Auth enrichment: analytics.initForNonInstaller() resolves JWT → claim-token → API-key auth priority; CommandEvent/CrashEvent types carry env fingerprints and API context.

Confidence Score: 4/5

Safe to merge with awareness of two edge cases in the telemetry subsystem that do not affect CLI command correctness.

The core lifecycle change is well-structured and the crash reporter correctly handles CliExit escaping async listeners. The store-forward implementation has a real defect: when WORKOS_TELEMETRY=false is set after a crash has written a pending file, events cannot be flushed and are re-persisted on every exit indefinitely. The dev.ts async error path also bypasses runCli()'s finally flush. Both issues are confined to the telemetry subsystem.

src/utils/telemetry-store-forward.ts and src/commands/dev.ts warrant a second look before the next release.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/bin.ts | Centralizes command lifecycle in runCli() with a single try/catch; emits command telemetry on success, CliExit, and crash paths. The finally flush is correct for normal exits but is bypassed in dev/emulate fire-and-forget async error paths. |
| src/utils/telemetry-store-forward.ts | New store-and-forward module; introduces a defect where pending events loop indefinitely when WORKOS_TELEMETRY=false is set. Otherwise PID-based naming, 0o600 file modes, and error handling are correct. |
| src/utils/crash-reporter.ts | New crash reporter correctly guards CliExit in both global handlers. isCrashing re-entrancy guard is solid. Sanitization uses shared helpers consistently. |
| src/utils/analytics.ts | Adds emitCommandEvent, captureUnhandledCrash, initForNonInstaller, and env-fingerprint helpers. Auth priority chain is correctly ordered. Minor: basename called on the literal 'unknown' fallback value. |
| src/utils/cli-exit.ts | New CliExit typed error class carrying exit code and telemetry context. Clean and minimal. |
| src/utils/exit-codes.ts | Adds resolveErrorCode, reasonForExitCode, and ERROR_CODE_MAP with the no_api_key → auth_required mapping. Both exit helpers now throw CliExit. |
| src/utils/telemetry-client.ts | Adds claim-token and API-key auth paths, queueEvents, persistToFile, and improved splice-based flush semantics. |
| src/commands/dev.ts | exitWithCode throws CliExit inside fire-and-forget async listeners, bypassing runCli()'s finally flush. |
| src/lib/api-error-handler.ts | Enriches exitWithError calls with apiContext for telemetry. No functional changes to error-handling logic. |
| src/lib/device-id.ts | New persistent device-ID module with UUID validation, in-process cache, and graceful IO fallback. |
| src/utils/command-telemetry.ts | New module with alias-aware name resolution, flag extraction, and SKIP_TELEMETRY_COMMANDS. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[bin.ts startup] --> B[installCrashReporter]
    A --> C[installStoreForward]
    A --> D[analytics.initForNonInstaller]
    A --> E[recoverPendingEvents fire-and-forget]
    B --> F{uncaughtException / unhandledRejection}
    F -->|CliExit| G[process.exit exitCode]
    F -->|real crash| H[reportCrashSync queue crash event]
    E --> I[readdir pending json files]
    I --> J[queueEvents recovered]
    J --> K[telemetryClient.flush]
    K -->|success| L[events removed]
    K -->|fail no gateway| M[events stay in memory]
    M --> N[process.on exit persistToFile new PID]
    A --> O[runCli]
    O --> P[yargs.parseAsync]
    P -->|success| Q[emitCommandEvent success]
    P -->|CliExit thrown| R[emitCommandEvent with reason]
    P -->|unexpected error| S[emitCommandEvent crash]
    Q --> T[finally telemetryClient.flush]
    R --> T
    S --> T
    T --> U[process exits]

Comments Outside Diff (1)

  1. src/commands/dev.ts, line 129-145

    P1 exitWithCode throws CliExit inside fire-and-forget async listeners, bypassing runCli()'s finally block

    Both exitWithCode calls in this file are inside async event listener callbacks (child.on('error', async (err) => {...})). When exitWithCode throws CliExit, the returned promise is rejected with no observer — Node routes it to unhandledRejection, which the crash reporter catches correctly and calls process.exit(exitCode) without emitting a crash event (good). However, process.exit() from the unhandledRejection handler terminates the process synchronously, skipping runCli()'s finally { await telemetryClient.flush() }. The already-queued success event will only reach the backend via store-forward on the next run.
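
    The handler behavior described in this comment can be sketched as below. This is an illustration under assumptions, not the code in crash-reporter.ts: the one-argument `CliExit`, `classifyFatal`, `installFatalHandlers`, and the exit code 1 for real crashes are all hypothetical.

```typescript
// Hypothetical stand-in for the PR's CliExit; only the exit code matters here.
class CliExit extends Error {
  constructor(readonly exitCode: number) {
    super(`CLI exit with code ${exitCode}`);
    this.name = 'CliExit';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface FatalDecision {
  exitCode: number;
  reportCrash: boolean;
}

// Decide what a global uncaughtException/unhandledRejection handler should do.
function classifyFatal(reason: unknown): FatalDecision {
  if (reason instanceof CliExit) {
    // A structured exit that escaped an un-awaited async listener: not a crash.
    return { exitCode: reason.exitCode, reportCrash: false };
  }
  return { exitCode: 1, reportCrash: true };
}

// An async listener that throws returns a rejected promise that nobody awaits,
// which is how a CliExit can reach the unhandledRejection handler.
function installFatalHandlers(onCrash: (reason: unknown) => void): void {
  const handle = (reason: unknown): void => {
    const decision = classifyFatal(reason);
    if (decision.reportCrash) onCrash(reason);
    // Exiting here is synchronous, so any pending `finally` flush is skipped.
    process.exit(decision.exitCode);
  };
  process.on('uncaughtException', handle);
  process.on('unhandledRejection', handle);
}
```

    The final `process.exit()` is the step that skips the lifecycle's flush, which is why the queued event has to wait for store-forward on the next run.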

Reviews (9): Last reviewed commit: "feat(telemetry): tag installer sessions ..."

Comment thread src/utils/output.ts
Comment thread src/bin.ts
Comment thread src/utils/command-telemetry.ts Outdated
Comment thread src/utils/telemetry-client.ts Outdated
nicknisi added a commit that referenced this pull request May 13, 2026
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from 31829ec to 0475ea8 Compare May 13, 2026 03:20
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6b7b384-44b3-47fe-9030-070bd92bf44e

📥 Commits

Reviewing files that changed from the base of the PR and between 31829ec and 0475ea8.

📒 Files selected for processing (39)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/index.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/debug.ts
  • src/commands/dev.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/commands/env.ts
  • src/commands/install-skill.ts
  • src/commands/login.ts
  • src/commands/uninstall-skill.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/help-json.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (1)
  • src/commands/emulate.ts
🚧 Files skipped from review as they are similar to previous changes (22)
  • CLAUDE.md
  • src/lib/command-aliases.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/crash-reporter.spec.ts
  • src/commands/debug.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/lib/device-id.spec.ts
  • src/utils/output.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/command-telemetry.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/device-id.ts
  • src/lib/api-error-handler.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/analytics.ts
  • src/utils/telemetry-types.ts
  • src/utils/exit-codes.ts
  • src/utils/analytics.spec.ts

Comment thread src/commands/doctor.ts
Comment thread src/lib/run-with-core.ts Outdated
Comment thread src/utils/crash-reporter.ts
nicknisi added 16 commits May 27, 2026 16:30
…amps, and new event types

Add environment fingerprint fields (OS, Node version, CI detection, shell)
to session.start and session.end events. Add startTimestamp to step and
agent.tool events for span reconstruction. Define command and crash event
types with stub emission methods. Add discriminated union Zod schema
validation tests mirroring the API schema.
… persistence

Wire up yargs middleware that emits a provisional command event before each
handler runs, then replaces it with actual duration/success on completion.
This covers the ~25 process.exit() call sites without modifying them.

- Command telemetry middleware with canonical name resolution and flag extraction
- Crash reporter with sanitized stack traces (sync handlers, no async)
- Store-forward: persist unsent events to temp file on exit, recover on next run
- Fix flush() to retain events until HTTP success (was clearing before fetch)
- Auto-wrap handlers in registerSubcommand() (single change point)
- Shared COMMAND_ALIASES map for telemetry and help-json
- analytics.initForNonInstaller() sets gatewayUrl + JWT from stored creds
- Enable debug output for non-installer commands via env var
- Log telemetry event details (type, name, duration, attributes) on flush
- Register in debug command's env var catalog
- Wrap inline command handlers (seed, setup-org, doctor, etc.) with
  wrapCommandHandler so they report real duration/success
- Skip provisional telemetry event for install command (has own session telemetry)
- Add claim -> env.claim to canonical alias map
- Defer store-forward file deletion until after successful flush
Client errors (401, 403) are permanent failures that won't succeed on
retry. Only retain events for 5xx (transient server errors) and network
failures where store-forward retry is meaningful.
- flush() returns true (sent/dropped) or false (retryable) so callers
  can act on the result
- Use splice(0, count) instead of clearing all events, protecting
  events queued concurrently during the fetch
- wrapCommandHandler flushes in-process so events are sent immediately
  instead of always deferring to next invocation via store-forward
- Store-forward recovery deletes files after loading into memory
  (events are re-persisted by exit handler if flush fails)
- Skip provisional events for dashboard and $0 (installer entry points)
- Add 4xx drop test coverage
Add a section to CLAUDE.md explaining which commands auto-emit telemetry
(registerSubcommand) versus which need manual wrapCommandHandler wrapping
(inline top-level .command() calls). Add a pointer comment in bin.ts near
the workflow commands block.

Prevents new top-level commands from silently emitting duration=0 telemetry.
- Add workos.user_id to command and crash events (from stored credentials
  or unclaimed environment clientId) so dashboards can count unique users
- Add cli.version to command and crash events for release adoption tracking
- Support claim-token auth path on the telemetry client, so unclaimed
  environments' telemetry reaches the API (guard accepts this path too)
- Rename CrashEvent's installer.version to cli.version (crashes happen
  outside the installer too)
- initForNonInstaller() now wires up user_id and claim-token auth
Closes security-audit finding #1 on PR #122 (telemetry message
sanitization). `error.message` was flowing into 4 capture sites
unsanitized, leaking homedir paths (and rarely, credentials) to the
WorkOS gateway.

- Add `sanitizeMessage()` in crash-reporter.ts: homedir strip + Bearer/
  sk_*/JWT redaction + 1KB truncation.
- Factor secret redaction into shared `redactSecrets()` used by both
  `sanitizeMessage` and `sanitizeStack` (Node echoes `.message` into the
  leading `Error.stack` line, so message-only sanitization was
  insufficient).
- Add private `extractErrorFields()` chokepoint on `Analytics`; route
  all 4 capture sites through it (`captureException`, `stepCompleted`,
  `commandExecuted`, `captureUnhandledCrash`). `replaceLastCommandEvent`
  inherits sanitization via its delegation to `commandExecuted`.
- `captureUnhandledCrash` now uses `sanitizeStack` instead of inline
  truncation, providing defense-in-depth for callers that bypass the
  crash-reporter wrapper.
- Add regression guard test (`telemetry-sanitize.spec.ts`): poisons
  every capture method with homedir + Bearer + sk_live_ + JWT, asserts
  no marker reaches the serialized queue.

Reviewed: ideation:reviewer cycle 1 PASS (0 critical, 0 high).
Introduces two additive telemetry signals requested by the signals spec:

- device.id — persistent per-install UUID stored at ~/.workos/device-id.
  File IO failures fall through to a one-shot session UUID; never throws.
- auth.mode — 4-state enum ('jwt' | 'claim_token' | 'api_key' | 'none')
  derived during initForNonInstaller with JWT > claim_token > api_key
  priority. Installer flow sets it in run-with-core after credentials
  resolve.

Both fields are injected via getEnvFingerprint so they appear on every
event that already carries env context (session.start, session.end,
command, crash). No change to downstream gateway — existing deliveries
accept the new attrs as optional.

Implements Phase 1 of docs/ideation/telemetry-signals/spec-phase-1.md.
Gateway-side type mirroring is tracked separately in the gateway repo.
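
The auth.mode priority described in this commit message (JWT > claim_token > api_key) reduces to a small function. The sketch below assumes a hypothetical `CredentialSources` bag; how the real initForNonInstaller looks up each credential is not shown in this conversation.

```typescript
type AuthMode = 'jwt' | 'claim_token' | 'api_key' | 'none';

// Hypothetical inputs; the real lookup reads stored credentials and environment state.
interface CredentialSources {
  accessToken?: string;
  claimToken?: string;
  apiKey?: string;
}

// The first available credential wins, in priority order.
function resolveAuthMode(sources: CredentialSources): AuthMode {
  if (sources.accessToken) return 'jwt';
  if (sources.claimToken) return 'claim_token';
  if (sources.apiKey) return 'api_key';
  return 'none';
}
```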
…and events

Replace the boolean `command.success` as the primary outcome dimension with
a structured `termination.reason` enum (success | cancelled | auth_required
| validation_error | api_error | crash) plus a `error.code` string.

- `exitWithError`, `exitWithCode`, `exitWithAuthRequired` now patch the
  provisional command event via `analytics.recordTermination` before
  calling `process.exit`. Previously the queued provisional event (success
  true, duration 0) was persisted as-is by store-forward, producing
  misleading dashboard data.
- `exitWithError` now honors the string error code for exit mapping:
  `auth_required` -> 4, `cancelled` -> 2, `http_*`/`not_found`/`unknown_error`
  -> 1 (api_error), everything else -> 1 (validation_error). Previously it
  always exited 1.
- New `TelemetryClient.patchLastEventOfType` helper mutates a queued event
  in place. Unlike `replaceLastEventOfType`, it preserves the event so
  multiple callers can update fields incrementally.
- `wrapCommandHandler` records `reason: 'success'` on clean completion and
  `reason: 'crash'` with `error.name` on uncaught throw.
- `command.success` remains on CommandEvent for backward-compat.
- `api.status`/`api.code`/`api.resource` fields added to CommandEvent and
  plumbed through `recordTermination` for Phase 3 consumption.

Review passed (2 medium + 1 low non-blocking; low fixed in same commit for
crash-path test symmetry).
…mmand events

Wires `createApiErrorHandler` through to the provisional command event via
an optional `apiContext` param on `exitWithError`, populating `api.status`,
`api.code`, and `api.resource` attributes on API-failure command events.

Also treats `apiContext` presence as ground truth for `api_error` termination
reason — WorkOS error codes like `rate_limited` or `validation_error` would
otherwise fall through `resolveErrorCode` to `validation_error`, hiding
legitimate API failures from api_error dashboards.

The `CommandEvent` type already had the `api.*` attribute slots (added as
forward-compat in Phase 2); only the wiring and tests are new.
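
A minimal sketch of how an `apiContext` might be folded into the command event attributes. `buildTerminationAttributes` and the `ApiContext` shape are hypothetical. The override guard follows the narrower rule from a later fix in this thread: only the generic `validation_error` fallback is promoted to `api_error`.

```typescript
type TerminationReason =
  | 'success'
  | 'cancelled'
  | 'auth_required'
  | 'validation_error'
  | 'api_error'
  | 'crash';

// Hypothetical shape of the optional apiContext param.
interface ApiContext {
  status?: number;
  code?: string;
  resource?: string;
}

function buildTerminationAttributes(
  codeBasedReason: TerminationReason,
  errorCode: string,
  apiContext?: ApiContext,
): Record<string, string | number> {
  // Promote only the generic fallback; keep specific reasons such as auth_required.
  const reason: TerminationReason =
    apiContext && codeBasedReason === 'validation_error' ? 'api_error' : codeBasedReason;

  const attrs: Record<string, string | number> = {
    'termination.reason': reason,
    'error.code': errorCode,
  };
  if (apiContext?.status !== undefined) attrs['api.status'] = apiContext.status;
  if (apiContext?.code !== undefined) attrs['api.code'] = apiContext.code;
  if (apiContext?.resource !== undefined) attrs['api.resource'] = apiContext.resource;
  return attrs;
}
```

Under these assumptions, a `rate_limited` error carrying an API context would be reported as `api_error` rather than `validation_error`.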
- Fix duration=0 on exit-path events: middleware now sets
  analytics.commandStartTime so recordTermination can compute real
  duration when exitWithError/exitWithCode bypass the wrapper
- Remove not_found/unknown_error from ERROR_CODE_MAP to prevent local
  config misses (env.ts) from being misclassified as API errors
- Resolve auth.mode before sessionStart in installer flow so the
  session.start event carries the correct credential source
- Make recordTermination fully idempotent: clears stale api.* and
  error.code fields when called without them
- Tighten device-id UUID validation to proper RFC 4122 v4 regex
- Only override termination reason to api_error when the code-based
  classification falls to the generic validation_error fallback,
  preserving more specific reasons like auth_required
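
For the device-id item above, a stricter RFC 4122 version 4 check might look like the sketch below. The regex is the commonly used v4 pattern (version nibble `4`, variant nibble `8`, `9`, `a`, or `b`); `isUuidV4` is a hypothetical name, and the PR's actual regex may differ in detail.

```typescript
// Version nibble must be 4; variant nibble must be 8, 9, a, or b.
const UUID_V4_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidV4(value: string): boolean {
  return UUID_V4_PATTERN.test(value);
}
```

A looser check that only matches the 8-4-4-4-12 hex layout would also accept v1 or nil UUIDs, which is presumably what the tightening is meant to rule out.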
Wrap the `api` command handler with `wrapCommandHandler()` so it
reports real duration and success/failure instead of the provisional
defaults (duration=0, success=true).

Replace raw `process.exit()` calls with `exitWithCode`/`exitWithError`
across 10 command files so `recordTermination` fires before exit. This
ensures `termination.reason` and `error.code` are populated on every
command event. Long-running server handlers (dev/emulate SIGINT) are
left as-is.

Fix help-json.ts alias drift by importing from the shared
`COMMAND_ALIASES` map instead of maintaining a private copy.

Add a coverage test that scans bin.ts for inline handlers missing
`wrapCommandHandler`, catching the class of regression that let the
`api` command slip through.
@nicknisi nicknisi force-pushed the nicknisi/telemetry branch from 0475ea8 to 8003788 May 27, 2026 21:34

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6ce40944-6658-4efe-aa8c-e9db5c81063a

📥 Commits

Reviewing files that changed from the base of the PR and between 0475ea8 and 8003788.

📒 Files selected for processing (39)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/index.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/debug.ts
  • src/commands/dev.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/commands/env.ts
  • src/commands/install-skill.ts
  • src/commands/login.ts
  • src/commands/uninstall-skill.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/api-error-handler.ts
  • src/lib/command-aliases.ts
  • src/lib/device-id.spec.ts
  • src/lib/device-id.ts
  • src/lib/run-with-core.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.spec.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/help-json.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/register-subcommand.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/telemetry-types.ts
✅ Files skipped from review due to trivial changes (3)
  • src/commands/api/index.spec.ts
  • CLAUDE.md
  • src/commands/api/interactive.spec.ts
🚧 Files skipped from review as they are similar to previous changes (31)
  • src/commands/debug.ts
  • src/commands/doctor.ts
  • src/commands/emulate.ts
  • src/utils/telemetry-store-forward.spec.ts
  • src/commands/uninstall-skill.ts
  • src/commands/install-skill.ts
  • src/lib/command-aliases.ts
  • src/commands/api/index.ts
  • src/utils/output.ts
  • src/utils/help-json.ts
  • src/utils/exit-codes.spec.ts
  • src/commands/env.ts
  • src/commands/login.ts
  • src/utils/command-telemetry.ts
  • src/utils/telemetry-types.ts
  • src/utils/crash-reporter.spec.ts
  • src/commands/api/interactive.ts
  • src/commands/dev.ts
  • src/utils/exit-codes.ts
  • src/lib/run-with-core.ts
  • src/utils/telemetry-schema.spec.ts
  • src/utils/telemetry-store-forward.ts
  • src/utils/crash-reporter.ts
  • src/lib/api-error-handler.ts
  • src/utils/telemetry-client.spec.ts
  • src/lib/device-id.spec.ts
  • src/utils/telemetry-client.ts
  • src/lib/api-error-handler.spec.ts
  • src/lib/device-id.ts
  • src/utils/analytics.ts
  • src/bin.ts

Comment thread src/utils/command-telemetry.spec.ts Outdated
…ze env.shell

Make `commandExecuted` private on Analytics, exposing only
`queueProvisionalCommand` for the middleware provisional-event path.
Prevents accidental rogue command events that break the swap pattern.

Record only `basename(SHELL)` in env.shell to avoid leaking homedir
paths (e.g. ~/.local/bin/fish).
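
The `basename(SHELL)` idea can be sketched with Node's `path.basename`. `shellName` is a hypothetical helper name; the real code presumably reads `process.env.SHELL` directly.

```typescript
import { basename } from 'node:path';

// Keep only the executable name so home-directory paths never reach telemetry.
function shellName(shellPath: string | undefined): string | undefined {
  if (!shellPath) return undefined;
  return basename(shellPath);
}
```

For example, `shellName('/home/alice/.local/bin/fish')` yields `fish`, and an unset `SHELL` yields `undefined`.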

Wrap `migrations` command with `wrapCommandHandler` for telemetry.
Comment thread src/utils/analytics.ts Outdated
nicknisi added 12 commits May 27, 2026 18:17
Wraps the yargs chain in runCli(), captures the canonical command name via
middleware, emits a single command telemetry event from the lifecycle
try/catch/finally (success, CliExit, or crash), and flushes the telemetry
client before exiting. Removes per-handler wrapCommandHandler() wrappers and
the legacy commandTelemetryMiddleware. Also drops process.exit(0) calls in
handlers that no longer need to short-circuit since exit is now centralized.
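
A simplified sketch of the centralized lifecycle, with yargs left out so the block stays self-contained. `CliExit`, `exitWithCode`, and `runCli` are names from this PR, but the signatures, the `ExitContext` and `CommandEvent` shapes, and the `emit` callback are assumptions made for illustration.

```typescript
type TerminationReason =
  | 'success' | 'cancelled' | 'auth_required' | 'validation_error' | 'api_error' | 'crash';

interface ExitContext { reason: TerminationReason; errorCode?: string }

// Typed error carrying the exit code plus telemetry context.
class CliExit extends Error {
  readonly exitCode: number;
  readonly context: ExitContext;
  constructor(exitCode: number, context: ExitContext) {
    super(`CLI exit ${exitCode}`);
    this.name = 'CliExit';
    this.exitCode = exitCode;
    this.context = context;
    Object.setPrototypeOf(this, CliExit.prototype); // keeps instanceof working on old targets
  }
}

// Handlers throw instead of calling process.exit().
function exitWithCode(code: number, context?: ExitContext): never {
  throw new CliExit(code, context ?? { reason: code === 0 ? 'success' : 'validation_error' });
}

interface CommandEvent {
  command: string;
  durationMs: number;
  reason: TerminationReason;
  exitCode: number;
  errorCode?: string;
}

// One try/catch/finally owns timing, classification, and the single emit.
async function runCli(
  command: string,
  handler: () => Promise<void>,
  emit: (event: CommandEvent) => void,
): Promise<number> {
  const start = Date.now();
  let reason: TerminationReason = 'success';
  let exitCode = 0;
  let errorCode: string | undefined;
  try {
    await handler();
  } catch (err) {
    if (err instanceof CliExit) {
      reason = err.context.reason;
      exitCode = err.exitCode;
      errorCode = err.context.errorCode;
    } else {
      reason = 'crash';
      exitCode = 1;
      errorCode = err instanceof Error ? err.name : 'unknown';
    }
  } finally {
    emit({ command, durationMs: Date.now() - start, reason, exitCode, errorCode });
  }
  return exitCode;
}
```

The point of the shape is that handlers stay plain async functions: a forgotten wrapper cannot produce a misleading event because no wrapper exists.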
Migrate 12 spec files from process.exit spies to CliExit throw
assertions, matching the exitWithCode/exitWithError refactor.
- .fail() now re-throws CliExit so handler exits preserve their
  context (reason, errorCode, apiContext) instead of being replaced
  with generic validation_error
- Remove unconditional process.exit() from finally block so
  long-running commands (dev, emulate) stay alive after wiring
  listeners; their signal handlers still call process.exit() directly
- doctor.ts catch block re-throws CliExit so exitWithCode(SUCCESS)
  doesn't get caught and re-classified as GENERAL_ERROR
exitWithError outside runCli() threw CliExit into the crash reporter,
recording intentional validation errors as crash events. Now caught
and exited cleanly before the crash reporter sees it.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/utils/command-telemetry.ts (1)

12-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden flag extraction to avoid invalid telemetry flags.

The current filter accepts `--` (which becomes an empty string after processing) and numeric short flags like `-1` and `-2` (which become '1' and '2'). Both pollute the telemetry `command.flags` attribute with invalid entries.

🛡️ Suggested fix from previous review
 export function extractUserFlags(rawArgs: string[]): string[] {
   const passedFlags = rawArgs
-    .filter((arg) => arg.startsWith('--') || (arg.startsWith('-') && arg.length === 2))
-    .map((arg) => arg.replace(/^-+/, '').split('=')[0]);
+    .filter((arg) => {
+      if (arg === '--') return false;
+      if (/^--[A-Za-z][\w-]*(=.*)?$/.test(arg)) return true;
+      if (/^-[A-Za-z]$/.test(arg)) return true;
+      return false;
+    })
+    .map((arg) => arg.replace(/^-+/, '').split('=')[0])
+    .filter(Boolean);
   return [...new Set(passedFlags)];
 }
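
For reference, the suggested hardening can be exercised as a standalone function. This is the reviewer's suggested filter restated outside the diff, with `Array.from` swapped in for the spread purely for portability; it is not necessarily the code that landed.

```typescript
export function extractUserFlags(rawArgs: string[]): string[] {
  const passedFlags = rawArgs
    .filter((arg) => {
      if (arg === '--') return false; // bare separator is not a flag
      if (/^--[A-Za-z][\w-]*(=.*)?$/.test(arg)) return true; // long flag, optional =value
      if (/^-[A-Za-z]$/.test(arg)) return true; // single-letter short flag
      return false; // rejects -1, -2 and other negative values
    })
    .map((arg) => arg.replace(/^-+/, '').split('=')[0])
    .filter(Boolean);
  // De-duplicate while preserving first-seen order.
  return Array.from(new Set(passedFlags));
}
```

Given `['--json', '--', '-1', '-v', '--name=foo', '--json']`, this returns `['json', 'v', 'name']`: the separator and the negative number are dropped, the value after `=` is stripped, and the repeated flag is collapsed.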

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a84b559e-a9da-49e6-a70b-1b60ba4b833d

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 6a80d3f.

📒 Files selected for processing (27)
  • CLAUDE.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/command-telemetry.spec.ts

Comment thread src/commands/install.ts
       clack.log.info(`Debug logs: ${logPath}`);
     }
-    process.exit(1);
+    throw new CliExit(1, { reason: 'crash' });

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use handler exit helpers instead of constructing CliExit directly.

Line 57 throws CliExit manually; command handlers should terminate via exitWithCode() / exitWithError() to keep lifecycle behavior consistent.

Suggested patch
-import { CliExit } from '../utils/cli-exit.js';
+import { ExitCode, exitWithCode } from '../utils/exit-codes.js';
@@
-    throw new CliExit(1, { reason: 'crash' });
+    exitWithCode(ExitCode.GENERAL_ERROR);

As per coding guidelines src/commands/**/*.ts: Use exitWithError() or exitWithCode() from handlers — they throw CliExit which the lifecycle catches.

Comment thread src/commands/dev.ts
nicknisi added 2 commits May 28, 2026 11:45
…dation

exitWithError outside runCli() threw CliExit into the crash reporter,
recording intentional validation errors as crash events. Use
outputError + process.exit directly since this runs before the
centralized lifecycle exists.
@nicknisi nicknisi changed the title feat: CLI telemetry for all commands + crash reporting feat: CLI telemetry for all commands with centralized lifecycle May 28, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4dc74551-d649-4729-a366-dc7e375c36f4

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 41379b1.

📒 Files selected for processing (28)
  • CLAUDE.md
  • README.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
💤 Files with no reviewable changes (2)
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md
🚧 Files skipped from review as they are similar to previous changes (22)
  • src/commands/seed.spec.ts
  • src/utils/cli-exit.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/commands/membership.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/utils/command-telemetry.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/utils/command-telemetry.ts
  • src/commands/install.spec.ts
  • src/utils/cli-exit.ts
  • src/commands/install.ts
  • src/utils/output.ts
  • src/utils/exit-codes.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/commands/connection.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/output.spec.ts
  • src/utils/analytics.ts
  • src/commands/api/index.spec.ts
  • src/commands/env.spec.ts

Comment thread README.md Outdated
Comment thread src/utils/exit-codes.ts
…API caps

- crash-reporter: cap sanitized stack (marker included) at 4096 so the API's
  per-attribute Zod limit can't silently drop oversized crash events; also
  collapse Windows node_modules/dist/src paths; treat a CliExit that reaches
  the global handlers as an intentional exit rather than a crash (fixes false
  crash telemetry from dev's fire-and-forget child `error` listener)
- exit-codes: map `no_api_key` to `auth_required`, not `validation_error`
- analytics: sanitize error message before the WORKOS_DEBUG log line
- doctor: emit structured `{ error: { code, message } }` in JSON mode
- command-telemetry: harden extractUserFlags (ignore `--` and negative values)
- telemetry-client: persist store-forward file with 0700/0600 modes
- README: scope command-event wording to telemetry-enabled commands
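
The stack cap from the crash-reporter item could be sketched like this. The 4096 limit comes from the commit message; the marker text and the `capStack` name are hypothetical, since the actual marker is not shown in this thread.

```typescript
const MAX_STACK_LENGTH = 4096; // per-attribute limit named in the commit message
const TRUNCATION_MARKER = '\n[truncated]'; // hypothetical marker text

// Cap the sanitized stack so that the marker counts toward the limit.
function capStack(stack: string, maxLength: number = MAX_STACK_LENGTH): string {
  if (stack.length <= maxLength) return stack;
  const keep = Math.max(0, maxLength - TRUNCATION_MARKER.length);
  return (stack.slice(0, keep) + TRUNCATION_MARKER).slice(0, maxLength);
}
```

Reserving room for the marker before slicing is what keeps the final string at or under the limit; appending the marker after a 4096-character slice would push the attribute over the cap again.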

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/utils/analytics.spec.ts (1)

518-548: ⚡ Quick win

Improve type safety in mock call assertions.

The `any` type annotation in the `find()` callbacks bypasses TypeScript's type checking. Consider defining a type for the queued event structure or using a more specific type.

♻️ Proposed improvement

Define a helper type at the top of the test file:

+type QueuedEvent = {
+  type: string;
+  attributes: Record<string, unknown>;
+  [key: string]: unknown;
+};

Then update the find callbacks:

-        const event = mockQueueEvent.mock.calls.find((c: any) => c[0].type === 'command')[0];
+        const event = mockQueueEvent.mock.calls.find((c: [QueuedEvent]) => c[0].type === 'command')?.[0];

Apply the same pattern to lines 528, 539, and 548.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3ddcfb0-a669-4a6b-a9d4-f54c043cb0bc

📥 Commits

Reviewing files that changed from the base of the PR and between c7804a2 and 849f632.

📒 Files selected for processing (29)
  • CLAUDE.md
  • README.md
  • src/bin.ts
  • src/commands/api/index.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/commands/connection.spec.ts
  • src/commands/directory.spec.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/install.spec.ts
  • src/commands/install.ts
  • src/commands/membership.spec.ts
  • src/commands/seed.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/analytics.spec.ts
  • src/utils/analytics.ts
  • src/utils/cli-exit.spec.ts
  • src/utils/cli-exit.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/command-telemetry.ts
  • src/utils/crash-reporter.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/exit-codes.ts
  • src/utils/output.spec.ts
  • src/utils/output.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/telemetry-client.ts
  • src/utils/telemetry-sanitize.spec.ts
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (26)
  • src/utils/cli-exit.spec.ts
  • CLAUDE.md
  • src/utils/output.ts
  • src/commands/doctor.ts
  • src/commands/env.spec.ts
  • src/commands/api/interactive.spec.ts
  • src/utils/exit-codes.ts
  • src/commands/connection.spec.ts
  • src/utils/command-telemetry.spec.ts
  • src/utils/cli-exit.ts
  • src/commands/seed.spec.ts
  • src/commands/install.ts
  • src/commands/directory.spec.ts
  • src/utils/telemetry-client.spec.ts
  • src/utils/crash-reporter.ts
  • src/commands/install.spec.ts
  • src/utils/output.spec.ts
  • src/lib/api-error-handler.spec.ts
  • src/utils/exit-codes.spec.ts
  • src/utils/telemetry-sanitize.spec.ts
  • src/commands/uninstall-skill.spec.ts
  • src/commands/membership.spec.ts
  • src/utils/command-telemetry.ts
  • src/commands/api/index.spec.ts
  • src/utils/analytics.ts
  • src/utils/telemetry-client.ts

Telemetry now posts to a dedicated /cli/telemetry endpoint (WORKOS_TELEMETRY_URL,
default https://api.workos.com/cli) rather than sharing the LLM gateway route.
WORKOS_LLM_GATEWAY_URL stays scoped to doctor/install LLM proxy traffic.

The client sends x-workos-api-key when auth mode is API key and no JWT or
claim-token transport is available, so API-key-only cohorts (CI, headless)
finally deliver telemetry. Auth precedence: JWT > claim token > API key.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Record the detected framework on session.end (read from the final state
machine snapshot) so the API can break install metrics down by framework
and compute success rate per integration. Absent when a session aborts
before detection runs.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Comment on lines +26 to +68
try {
  if (!existsSync(PENDING_DIR)) return;
  const files = readdirSync(PENDING_DIR).filter((f) => f.startsWith('pending-') && f.endsWith('.json'));

  const recoveredFiles: string[] = [];
  for (const file of files) {
    const filePath = join(PENDING_DIR, file);
    try {
      const raw = readFileSync(filePath, 'utf-8');
      const events = JSON.parse(raw);
      if (Array.isArray(events) && events.length > 0) {
        telemetryClient.queueEvents(events);
        recoveredFiles.push(filePath);
      } else {
        // Empty file — delete immediately
        try {
          unlinkSync(filePath);
        } catch {
          /* ignore */
        }
      }
    } catch {
      // Corrupted file — delete and move on
      try {
        unlinkSync(filePath);
      } catch {
        /* ignore */
      }
    }
  }

  // Delete source files — events are now in memory regardless of flush outcome.
  // If flush succeeds: events sent, done.
  // If flush fails: events stay in memory, exit handler re-persists to new PID file.
  for (const filePath of recoveredFiles) {
    try {
      unlinkSync(filePath);
    } catch {
      /* ignore */
    }
  }

  // Flush all recovered events in one batch

P1 Pending events bounce indefinitely when telemetry is disabled

When a user sets WORKOS_TELEMETRY=false after a crash has written a pending file, the loop never resolves: recoverPendingEvents reads the file, calls telemetryClient.queueEvents(events) (no WORKOS_TELEMETRY_ENABLED guard here), then calls flush() which returns false because initForNonInstaller never set gatewayUrl. The source files are deleted before the flush attempt, so the events are orphaned in memory. On process exit, persistToFile sees this.events.length > 0 and writes them to a new PID file. Every subsequent invocation with telemetry disabled repeats this cycle.

The simplest fix is to bail out early when telemetry is disabled — either by checking WORKOS_TELEMETRY_ENABLED at the top of recoverPendingEvents, or by guarding the call site in bin.ts with the same constant.
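
A sketch of the early bail-out, under stated assumptions: `recoverPendingEvents` is given the pending directory, an enabled flag, and a queue callback as parameters (the real function reads module-level state such as `PENDING_DIR` and `telemetryClient`), and the demo at the bottom runs against a throwaway temp directory.

```typescript
import { existsSync, mkdtempSync, readdirSync, readFileSync, unlinkSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Returns the number of events queued. Parameters are hypothetical stand-ins
// for the module-level constants used in the PR.
function recoverPendingEvents(
  pendingDir: string,
  telemetryEnabled: boolean,
  queueEvents: (events: unknown[]) => void,
): number {
  // Bail out before touching any file, so nothing is queued and re-persisted.
  if (!telemetryEnabled) return 0;
  if (!existsSync(pendingDir)) return 0;

  let recovered = 0;
  for (const file of readdirSync(pendingDir)) {
    if (!file.startsWith('pending-') || !file.endsWith('.json')) continue;
    const filePath = join(pendingDir, file);
    try {
      const events: unknown = JSON.parse(readFileSync(filePath, 'utf-8'));
      if (Array.isArray(events) && events.length > 0) {
        queueEvents(events);
        recovered += events.length;
      }
    } catch {
      /* corrupted file: fall through and delete it */
    }
    try {
      unlinkSync(filePath);
    } catch {
      /* ignore */
    }
  }
  return recovered;
}

// Demo: one pending file, first with telemetry disabled, then enabled.
const demoDir = mkdtempSync(join(tmpdir(), 'pending-demo-'));
const demoFile = join(demoDir, 'pending-1234.json');
writeFileSync(demoFile, JSON.stringify([{ type: 'command' }]));

const queued: unknown[] = [];
const countWhenDisabled = recoverPendingEvents(demoDir, false, (events) => queued.push(...events));
const fileKeptWhenDisabled = existsSync(demoFile);
const countWhenEnabled = recoverPendingEvents(demoDir, true, (events) => queued.push(...events));
const fileRemovedWhenEnabled = !existsSync(demoFile);
```

With the guard in place the disabled run neither queues events nor rewrites them on exit, which breaks the cycle described above. Whether a disabled run should instead delete the pending files is a separate policy choice this sketch does not settle.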

