fix(telemetry): track preflight error for telemetry #1403

Merged: Hweinstock merged 2 commits into aws:main from Hweinstock:fix/deploy-tui-telemetry on May 28, 2026

Conversation

Contributor

@Hweinstock Hweinstock commented May 27, 2026

Description

Problem

The deploy TUI isn't emitting telemetry when the preflight steps fail. This is because the telemetry wrapper is only wrapping the post-preflight logic.

Solution

Ideally, we would wrap all of the logic in a telemetry wrapper, but the deploy flow relies on React hooks that do not run linearly. Specifically, when the preflight steps finish, they trigger a hook that invokes the main deployment. Because the two flows are connected only through events, a single wrapper cannot span both, so we need another solution.

The solution proposed here is to store the error from the earlier failure, then bail fast and emit it once we reach the core deploy flow. This lets us emit the correct error information, but loses the duration of the preflight failure. We could theoretically store the duration in another ref and emit it directly, but that significantly increases complexity: we would need a way to override the duration on the metric and would have to manage a timer in a React hook that may trigger multiple times.
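To make the store-then-bail idea concrete, here is a minimal sketch. It is not the code in this PR: `withTelemetry`, `preflightErrorRef`, `runDeploy`, the metric shape, and the `AWSCredentialsError` class body are all hypothetical stand-ins, the ref is a plain `{ current }` object rather than a real `useRef`, and everything is synchronous for brevity.

```typescript
// Hypothetical sketch: the names and shapes below are illustrative assumptions.
type Ref<T> = { current: T }; // same shape as the object returned by React's useRef

interface DeployMetric {
  result: "success" | "failure";
  errorName?: string;
  durationMs: number;
}

const emitted: DeployMetric[] = []; // stand-in for the real telemetry sink

// Stand-in telemetry wrapper: times fn, records one metric, rethrows on failure.
function withTelemetry<T>(fn: () => T): T {
  const start = Date.now();
  try {
    const value = fn();
    emitted.push({ result: "success", durationMs: Date.now() - start });
    return value;
  } catch (err) {
    const errorName = err instanceof Error ? err.name : "UnknownError";
    emitted.push({ result: "failure", errorName, durationMs: Date.now() - start });
    throw err;
  }
}

// Written by the preflight flow, read later by the deploy flow.
const preflightErrorRef: Ref<Error | null> = { current: null };

// Preflight runs outside the wrapper, so a failure is stored, not emitted.
function failPreflight(error: Error): void {
  preflightErrorRef.current = error;
}

// Core deploy flow: bail fast on a stored preflight error so the wrapper
// emits it. Only time spent inside the wrapper is measured, which is why
// the preflight duration is lost.
function runDeploy(deploy: () => void): void {
  withTelemetry(() => {
    const stored = preflightErrorRef.current;
    if (stored) throw stored;
    deploy();
  });
}

// Usage: a credentials failure during preflight surfaces in telemetry.
class AWSCredentialsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AWSCredentialsError";
  }
}

failPreflight(new AWSCredentialsError("AWS credentials are invalid."));
try {
  runDeploy(() => {
    // never reached: the stored preflight error is thrown first
  });
} catch {
  // the TUI would render the error here
}
console.log(emitted[0]?.errorName); // logs "AWSCredentialsError"
```

The recorded `durationMs` covers only the bail-fast path, which matches the trade-off described above.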

I also feel fairly certain that we will revisit this deploy code for a significant refactor as part of future work since its current flow is difficult to understand and extend.

Testing

Ran deploy without valid AWS credentials and saw the error appear in telemetry.

> AWS_ACCESS_KEY_ID=FAKEKEY agentcore deploy

  AgentCore Deploy

  [error]   Validate project
            → AWS credentials are invalid.

  To fix this:
    1. Check your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    2. Or Run: aws login

  Log: agentcore/.cli/logs/deploy/deploy-20260528-083049.log

  Esc back · Ctrl+C quit

[audit mode] Telemetry written to [...]/deploy-b6c80bbe-baf3-4bf1-a06b-488f439ec08b.json

The error recorded in the file is AWSCredentialsError.

Related Issue

Closes #

Documentation PR

N/A

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@github-actions github-actions Bot added the size/m PR size: M label May 27, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 27, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 27, 2026
@agentcore-devx-automation
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 27, 2026
@github-actions
Contributor

Package Tarball

aws-agentcore-0.15.0.tgz

How to install

gh release download pr-1403-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.15.0.tgz

@Hweinstock Hweinstock force-pushed the fix/deploy-tui-telemetry branch from b8e3200 to cbcc5ca Compare May 27, 2026 23:49
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 27, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 27, 2026
@agentcore-devx-automation
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 27, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Nice cleanup — consolidating all the setPhase('error') + isRunningRef.current = false paths through failPreflight() is a clear improvement, and the new telemetry on preflight failure plugs a real gap.

One thing worth discussing before merging (inline comment on cancelTeardown).

Comment thread src/cli/tui/hooks/useCdkPreflight.ts Outdated
const cancelTeardown = useCallback(() => {
setPhase('error');
isRunningRef.current = false;
failPreflight(new Error('Teardown cancelled'));

cancelTeardown now routes through failPreflight(new Error('Teardown cancelled')), which means user-initiated cancellation at the teardown-confirm prompt will be emitted to telemetry as a deploy failure with error_name: 'UnknownError' and error_source: 'unknown'. Pre-PR this path was silent.

A couple of concerns:

  1. Signal pollution — every user who cancels a teardown will now show up in error metrics indistinguishably from real failures. Since 'UnknownError'/'unknown' are also where uncategorized real bugs land, this makes those buckets noisier.
  2. It's not really a failure — the user made a deliberate choice to abort.

A few options:

  • Skip telemetry emission entirely when the error is a user cancellation (e.g., have failPreflight take a flag, or have cancelTeardown reset phase without going through the error path).
  • Introduce a dedicated UserCancelledError (subclass of BaseError) and add 'UserCancelledError' to the ErrorName enum in src/cli/telemetry/schemas/common-shapes.ts, so cancellations are at least distinguishable in dashboards.
  • Add a separate "cancelled" exit_reason — bigger change, probably overkill for this PR.

Same consideration applies more broadly to the other failPreflight(new Error('...')) call sites (e.g., 'Some OAuth providers failed to set up', stack-status messages) — those are real failures, so they should be telemetered, but they'll all classify as UnknownError/unknown. Less urgent, but worth tracking if error-bucket signal matters.
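As a rough illustration of the first two options, the sketch below shows how a dedicated cancellation error could be told apart from uncategorized failures. `BaseError`, `ErrorName`, and `UserCancelledError` are named in this comment, but their shapes here are assumptions, as are the helper functions `classify` and `shouldEmit`; the real definitions in `src/cli/telemetry/schemas/common-shapes.ts` may differ.

```typescript
// Hypothetical sketch: shapes and helpers are assumptions, not the project's code.
type ErrorName = "AWSCredentialsError" | "UserCancelledError" | "UnknownError";

class BaseError extends Error {
  readonly errorName: ErrorName;

  constructor(message: string, errorName: ErrorName) {
    super(message);
    // Keeps instanceof working for Error subclasses when compiling to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.errorName = errorName;
    this.name = errorName;
  }
}

class UserCancelledError extends BaseError {
  constructor(message: string) {
    super(message, "UserCancelledError");
  }
}

// A plain Error carries no category, so it lands in the catch-all bucket.
// That is why failPreflight(new Error('...')) would report UnknownError.
function classify(error: unknown): ErrorName {
  return error instanceof BaseError ? error.errorName : "UnknownError";
}

// Variant of the first option: skip emission for deliberate cancellations.
function shouldEmit(error: unknown): boolean {
  return classify(error) !== "UserCancelledError";
}

// Usage: the same message classifies differently depending on the error type.
console.log(classify(new Error("Teardown cancelled"))); // logs "UnknownError"
console.log(classify(new UserCancelledError("Teardown cancelled"))); // logs "UserCancelledError"
console.log(shouldEmit(new UserCancelledError("Teardown cancelled"))); // logs false
```

With a typed error in place, either policy (emit under its own name, or skip entirely) becomes a one-line decision at the emission site.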

Contributor Author

@Hweinstock Hweinstock May 27, 2026

Good call, this will be used elsewhere, so we should model it.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 27, 2026
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels May 28, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 28, 2026
@agentcore-devx-automation
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 28, 2026
@Hweinstock Hweinstock marked this pull request as ready for review May 28, 2026 13:16
@Hweinstock Hweinstock requested a review from a team May 28, 2026 13:16
Contributor

@notgitika notgitika left a comment

LGTM thank you!

@Hweinstock Hweinstock merged commit 13a0391 into aws:main May 28, 2026
31 checks passed
@Hweinstock Hweinstock deleted the fix/deploy-tui-telemetry branch May 28, 2026 21:23

Labels

size/m PR size: M

4 participants