
feat: check for spans in agent log group#1404

Merged
avi-alpert merged 1 commit into
aws:mainfrom
avi-alpert:aalpert/log-group-updates
May 28, 2026

Conversation

@avi-alpert
Contributor

@avi-alpert avi-alpert commented May 28, 2026

Description

Add runtime log group as an additional span source alongside aws/spans for all span query sites.

For every place we check the aws/spans log group, also check for spans in the runtime log group and union the results. These changes are backwards compatible, so commands still work with spans in either location.

Changes:

  • get-trace.ts — fetchSpans now queries both aws/spans and the runtime log group in parallel, concatenates results, swallows ResourceNotFoundException from either
  • span-collector.ts — fetchSessionSpans now queries both aws/spans and the runtime log group for OTEL spans in parallel, concatenates results; added executeQueryGraceful helper
  • run-eval.ts — discoverSessions now queries both aws/spans and the runtime log group in parallel, concatenates results
  • fetch-session-spans.ts (recommendation command) — now makes 3 parallel calls instead of 2: aws/spans for span records, runtime log group for span records, runtime log group for log records
  • ABTestDetailScreen.tsx — debug checks now query both aws/spans and the runtime log group for experiment spans, summing counts across both
  • post-deploy-ab-tests.ts — added runtime log group ARN wildcard to the AB test role's IAM policy so the online eval service can read spans there
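
The pattern shared by the query sites above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RunQuery`, `queryGraceful`, and `fetchFromBoth` are hypothetical names, and the sketch assumes a missing log group surfaces as an error whose `name` is `ResourceNotFoundException`.

```typescript
// Hypothetical shapes; the real helpers in the CLI may differ.
type Row = Record<string, string>;
type RunQuery = (logGroupName: string) => Promise<Row[]>;

// Return [] when the log group does not exist; rethrow any other error.
async function queryGraceful(runQuery: RunQuery, logGroupName: string): Promise<Row[]> {
  try {
    return await runQuery(logGroupName);
  } catch (err) {
    if (err instanceof Error && err.name === 'ResourceNotFoundException') {
      return [];
    }
    throw err;
  }
}

// Query both span sources in parallel and concatenate the rows.
async function fetchFromBoth(
  runQuery: RunQuery,
  spansLogGroup: string,
  runtimeLogGroup: string
): Promise<Row[]> {
  const [spansRows, runtimeRows] = await Promise.all([
    queryGraceful(runQuery, spansLogGroup),
    queryGraceful(runQuery, runtimeLogGroup),
  ]);
  return [...spansRows, ...runtimeRows];
}
```

Swallowing only the not-found case keeps throttling or permission errors visible to the caller.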

I tested by running the following commands on agents writing spans to aws/spans and the agent log group:

  • agentcore traces list and traces get
  • agentcore run eval
  • agentcore run recommendation

Related Issue

Closes #

Documentation PR

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@avi-alpert avi-alpert requested a review from a team May 28, 2026 01:39
@github-actions github-actions Bot added the size/m PR size: M label May 28, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 28, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 28, 2026
@github-actions
Contributor

Package Tarball

aws-agentcore-0.15.0.tgz

How to install

gh release download pr-1404-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.15.0.tgz

@agentcore-devx-automation
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 28, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Nice feature — querying both log groups makes the tooling resilient as the span-emission story evolves. I have one main concern that shows up in three places: when a span exists in both aws/spans (via X-Ray Transaction Search) and the runtime log group (emitted directly by the agent), the new code concatenates the result sets without deduplication. Depending on the user's setup this can lead to duplicate session entries in the picker, double-counted spans sent to evaluators, and duplicate rows in trace output. Inline comments below.

executeQueryGraceful(client, SPANS_LOG_GROUP, query, startTimeSec, endTimeSec),
executeQueryGraceful(client, runtimeLogGroupName, query, startTimeSec, endTimeSec),
]);
const rows = [...spansRows, ...runtimeRows];

When the same sessionId exists in both aws/spans and the runtime log group, the merged rows will produce two SessionInfo entries for that session — one with the count from aws/spans and one from the runtime log group. The TUI session picker (RunEvalScreen, RunBatchEvalFlow, RecommendationScreen) will then show duplicates.

A few ways to handle this:

  1. Merge by sessionId in JS — sum spanCount, take the earlier firstSeen.
  2. Run a single Insights query that lists both log groups (StartQuery accepts logGroupNames: [...]), so the stats ... by sessionId aggregation happens server-side.
  3. Dedupe by sessionId and just keep the row with the higher count.

Option 2 is probably the cleanest since it avoids the double-count of spans entirely.
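
For comparison, option 1 (merge by sessionId in JS) might look roughly like the sketch below. The `SessionInfo` shape here is an assumption based on the fields named in this comment (sessionId, spanCount, firstSeen); the real type may carry more fields, and `firstSeen` is assumed to be a numeric timestamp.

```typescript
// Assumed shape; the actual SessionInfo type in the CLI may differ.
interface SessionInfo {
  sessionId: string;
  spanCount: number;
  firstSeen: number; // assumed to be an epoch timestamp
}

// Collapse rows that share a sessionId: sum the counts, keep the earlier firstSeen.
function mergeSessions(rows: SessionInfo[]): SessionInfo[] {
  const bySession = new Map<string, SessionInfo>();
  for (const row of rows) {
    const existing = bySession.get(row.sessionId);
    if (!existing) {
      bySession.set(row.sessionId, { ...row });
    } else {
      existing.spanCount += row.spanCount;
      existing.firstSeen = Math.min(existing.firstSeen, row.firstSeen);
    }
  }
  return Array.from(bySession.values());
}
```

Summing is only correct if the two log groups hold disjoint records for a session; if the same span could appear in both, the sum would double-count.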

Can we please check this? The TUI screen concern seems valid.

executeQueryGraceful(client, SPANS_LOG_GROUP, spanQuery, startTimeSec, endTimeSec),
executeQueryGraceful(client, runtimeLogGroup, spanQuery, startTimeSec, endTimeSec),
]);
const allSpanRows = [...sharedSpanRows, ...runtimeSpanRows];

Same duplication concern as discoverSessions: if a span lands in both aws/spans and the runtime log group (transaction search + agent-emitted), allSpanRows will contain it twice, the same parsed doc will be pushed to sessionMap.get(sessionId) twice, and the duplicates will be sent to the evaluators. For TRACE/TOOL_CALL evaluators in particular this could meaningfully skew results.

Options:

  • Issue a single StartQuery with both log groups in logGroupNames (server-side dedup is still your responsibility, but at least there's only one result set to walk).
  • Dedupe allSpanRows by spanId (or traceId+spanId) before building sessionMap.
  • Prefer one source over the other (e.g., if any rows came back from aws/spans, ignore the runtime log group rows for spans, and only use the runtime log group for the runtime-log lookup further down).
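
The second option (dedupe by traceId plus spanId before building sessionMap) could be sketched as below. `SpanRow` and `dedupeSpans` are hypothetical names for illustration; the sketch assumes every row carries both identifiers as strings.

```typescript
// Assumed minimal row shape; real rows carry more fields.
interface SpanRow {
  traceId: string;
  spanId: string;
}

// Keep the first occurrence of each (traceId, spanId) pair, preserving order.
function dedupeSpans<T extends SpanRow>(rows: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const row of rows) {
    const key = `${row.traceId}:${row.spanId}`;
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(row);
  }
  return unique;
}
```

Because the first occurrence wins, placing the aws/spans rows first in the concatenation makes that source take precedence.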

} else {
return { success: false, error: result.error };
}
}

allRows may contain the same span twice when it shows up in both log groups, which would result in duplicate CloudWatchSpanRecord entries in the trace returned to the web UI / CLI consumers (and would break parent/child tree rendering since the same spanId would appear twice).

Suggest deduping by spanId after concatenation, e.g.:

const seen = new Set<string>();
const spans: CloudWatchSpanRecord[] = allRows
  .filter(row => row.traceId && row.spanId && !seen.has(row.spanId!) && (seen.add(row.spanId!), true))
  .map(row => ({ ... }));

or pass both log groups to a single StartQuery call.

]);

onProgress?.(`Found ${spanRecords.length} span records, ${logRecords.length} log record candidates`);
const allSpanRecords = [...sharedSpanRecords, ...runtimeSpanRecords];

allSpanRecords is the union of records from both log groups. If a span exists in both (which is the whole reason we're querying both), it gets parsed and pushed to spans twice, which means the OTEL mapper on the recommendation Lambda will see duplicate spans for the same traceId/spanId. That can produce inflated trajectory counts and skew tool-description recommendations.

After parsing, please dedupe by something stable (e.g. traceId + spanId, or JSON.stringify(parsed) for log records that don't have spanIds) before pushing into spans.

A span won't ever be duplicated. However, in the current behavior we do have log events in the agent log group and spans in aws/spans that share the same sessionId.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 28, 2026
@avi-alpert
Contributor Author

Re the automated comments: the same span will never live in both log groups.

Contributor

@Hweinstock Hweinstock left a comment

makes sense to me! Worth getting someone who worked on evals/logs in CLI to look at it.


try {
const createResult = await iamClient.send(
new CreateRoleCommand({

totally out of scope, but any idea why we create a role imperatively here? Is this not supported in the CDK?

],
Resource: [
`${arnPrefix(region)}:logs:*:${accountId}:log-group:/aws/bedrock-agentcore/evaluations/*`,
`${arnPrefix(region)}:logs:*:${accountId}:log-group:/aws/bedrock-agentcore/runtimes/*`,

since this is managed imperatively, is there a risk that this permission isn't added to existing deployed ab-tests? (not familiar with the logic here)

Wondering if this blocks customers who have a deployed ab-test and want to use the new log group.

/**
* Execute a CloudWatch Logs Insights query, returning [] if the log group does not exist.
*/
export async function executeQueryGraceful(

nit: can we wrap the params in an object? I find that with 3+ parameters in TypeScript, calls get hard to read without named arguments.
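
A rough sketch of the options-object style being suggested. `ExecuteQueryOptions` and `buildQueryRequest` are hypothetical; the field names mirror the arguments visible in the diff excerpts, and the function is a stand-in that only builds a request object so the example runs without the AWS SDK.

```typescript
// Hypothetical options type; not the signature used in this PR.
interface ExecuteQueryOptions {
  logGroupName: string;
  query: string;
  startTimeSec: number;
  endTimeSec: number;
}

// Stand-in for the real helper: maps named options onto a query request shape.
function buildQueryRequest({ logGroupName, query, startTimeSec, endTimeSec }: ExecuteQueryOptions) {
  return {
    logGroupName,
    queryString: query,
    startTime: startTimeSec,
    endTime: endTimeSec,
  };
}

// Each argument is labelled at the call site, so two adjacent numbers cannot be swapped silently.
const request = buildQueryRequest({
  logGroupName: 'aws/spans',
  query: 'fields @timestamp, @message',
  startTimeSec: 1700000000,
  endTimeSec: 1700003600,
});
```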

@avi-alpert avi-alpert merged commit 2e94fdd into aws:main May 28, 2026
36 checks passed

Labels

size/m PR size: M

4 participants