feat: check for spans in agent log group#1404
Conversation
Package TarballHow to installgh release download pr-1404-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.15.0.tgz |
|
Claude Security Review: no high-confidence findings. (run) |
agentcore-cli-automation
left a comment
There was a problem hiding this comment.
Nice feature — querying both log groups makes the tooling resilient as the span-emission story evolves. I have one main concern that shows up in three places: when a span exists in both aws/spans (via X-Ray Transaction Search) and the runtime log group (emitted directly by the agent), the new code concatenates the result sets without deduplication. Depending on the user's setup this can lead to duplicate session entries in the picker, double-counted spans sent to evaluators, and duplicate rows in trace output. Inline comments below.
| executeQueryGraceful(client, SPANS_LOG_GROUP, query, startTimeSec, endTimeSec), | ||
| executeQueryGraceful(client, runtimeLogGroupName, query, startTimeSec, endTimeSec), | ||
| ]); | ||
| const rows = [...spansRows, ...runtimeRows]; |
There was a problem hiding this comment.
When the same sessionId exists in both aws/spans and the runtime log group, the merged rows will produce two SessionInfo entries for that session — one with the count from aws/spans and one from the runtime log group. The TUI session picker (RunEvalScreen, RunBatchEvalFlow, RecommendationScreen) will then show duplicates.
A few ways to handle this:
- Merge by
sessionIdin JS — sumspanCount, take the earlierfirstSeen. - Run a single Insights query that lists both log groups (
StartQueryacceptslogGroupNames: [...]), so thestats ... by sessionIdaggregation happens server-side. - Dedupe by
sessionIdand just keep the row with the higher count.
Option 2 is probably the cleanest since it avoids the double-count of spans entirely.
There was a problem hiding this comment.
Can we please check this ? The TUI screen concern seems valid.
| executeQueryGraceful(client, SPANS_LOG_GROUP, spanQuery, startTimeSec, endTimeSec), | ||
| executeQueryGraceful(client, runtimeLogGroup, spanQuery, startTimeSec, endTimeSec), | ||
| ]); | ||
| const allSpanRows = [...sharedSpanRows, ...runtimeSpanRows]; |
There was a problem hiding this comment.
Same duplication concern as discoverSessions: if a span lands in both aws/spans and the runtime log group (transaction search + agent-emitted), allSpanRows will contain it twice, the same parsed doc will be pushed to sessionMap.get(sessionId) twice, and the duplicates will be sent to the evaluators. For TRACE/TOOL_CALL evaluators in particular this could meaningfully skew results.
Options:
- Issue a single
StartQuerywith both log groups inlogGroupNames(server-side dedup is still your responsibility, but at least there's only one result set to walk). - Dedupe
allSpanRowsbyspanId(ortraceId+spanId) before buildingsessionMap. - Prefer one source over the other (e.g., if any rows came back from
aws/spans, ignore the runtime log group rows for spans, and only use the runtime log group for the runtime-log lookup further down).
| } else { | ||
| return { success: false, error: result.error }; | ||
| } | ||
| } |
There was a problem hiding this comment.
allRows may contain the same span twice when it shows up in both log groups, which would result in duplicate CloudWatchSpanRecord entries in the trace returned to the web UI / CLI consumers (and would break parent/child tree rendering since the same spanId would appear twice).
Suggest deduping by spanId after concatenation, e.g.:
const seen = new Set<string>();
const spans: CloudWatchSpanRecord[] = allRows
.filter(row => row.traceId && row.spanId && !seen.has(row.spanId!) && (seen.add(row.spanId!), true))
.map(row => ({ ... }));or pass both log groups to a single StartQuery call.
| ]); | ||
|
|
||
| onProgress?.(`Found ${spanRecords.length} span records, ${logRecords.length} log record candidates`); | ||
| const allSpanRecords = [...sharedSpanRecords, ...runtimeSpanRecords]; |
There was a problem hiding this comment.
allSpanRecords is the union of records from both log groups. If a span exists in both (which is the whole reason we're querying both), it gets parsed and pushed to spans twice, which means the OTEL mapper on the recommendation Lambda will see duplicate spans for the same traceId/spanId. That can produce inflated trajectory counts and skew tool-description recommendations.
After parsing, please dedupe by something stable (e.g. traceId + spanId, or JSON.stringify(parsed) for log records that don't have spanIds) before pushing into spans.
There was a problem hiding this comment.
A span wont ever be duplicated, however we do have log-events and spans having the same sessionId in the agent-log-group and aws/spans respectively in current behavior.
|
re the automated comments: The same span will never live in both log groups |
Hweinstock
left a comment
There was a problem hiding this comment.
makes sense to me! Worth getting someone who worked on evals/logs in CLI to look at it.
|
|
||
| try { | ||
| const createResult = await iamClient.send( | ||
| new CreateRoleCommand({ |
There was a problem hiding this comment.
totally out of scope, but any idea why we create a role imperatively here? Is this not supported in the CDK?
| ], | ||
| Resource: [ | ||
| `${arnPrefix(region)}:logs:*:${accountId}:log-group:/aws/bedrock-agentcore/evaluations/*`, | ||
| `${arnPrefix(region)}:logs:*:${accountId}:log-group:/aws/bedrock-agentcore/runtimes/*`, |
There was a problem hiding this comment.
since this is managed imperatively, is there a risk that this permission isn't added to existing deployed ab-tests? (not familiar with the logic here)
Wondering if this blocks customers who have a deployed ab-test and want to use the new log group.
| /** | ||
| * Execute a CloudWatch Logs Insights query, returning [] if the log group does not exist. | ||
| */ | ||
| export async function executeQueryGraceful( |
There was a problem hiding this comment.
nit: can we wrap the params in an object. I find that with 3+ parameters in typescript it can make things hard to read without named arguments.
Description
Add runtime log group as an additional span source alongside aws/spans for all span query sites.
For every place we check the aws/spans log group, also check for spans in the runtime log group and union the results. These changes are backwards compatible so command still work with spans in either location.
Changes:
recommendationcommand) — now makes 3 parallel calls instead of 2: aws/spans for span records, runtime log group for span records, runtime log group for log recordsI tested by running the following commands on agents writing spans to aws/spans and the agent log group:
Related Issue
Closes #
Documentation PR
Type of Change
Testing
How have you tested the change?
npm run test:unitandnpm run test:integnpm run typechecknpm run lintsrc/assets/, I rannpm run test:update-snapshotsand committed the updated snapshotsChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.