Skip to content

scylla/3.11.x: fix ShardAwarenessTest.correctShardInTracingTest for Scylla 2025.1#931

Open
nikagra wants to merge 1 commit into
scylla-3.xfrom
fix/shard-awareness-test-2025.1
Open

scylla/3.11.x: fix ShardAwarenessTest.correctShardInTracingTest for Scylla 2025.1#931
nikagra wants to merge 1 commit into
scylla-3.xfrom
fix/shard-awareness-test-2025.1

Conversation

@nikagra

@nikagra nikagra commented Jun 23, 2026

Copy link
Copy Markdown

Problem

ShardAwarenessTest.correctShardInTracingTest fails on Scylla 2025.1 (both scylla/3.11.4.0 and scylla/3.11.5.15 matrix entries) with:

Expecting: <"shard 0/sl:default"> to start with: <"shard 1">

Two Scylla 2025.1 changes contribute to this failure:

  1. Tablets enabled by default for NTS keyspaces (primary, confirmed): Scylla 2025.1 enables tablets by default for NetworkTopologyStrategy keyspaces. With tablets, shard assignment is arbitrary — the driver's hash-based getShardId(token) prediction no longer matches the shard that actually handles the request. The test creates shardawaretest without explicitly disabling tablets, so the expected and actual shards diverge.

  2. Service-level suffix in thread names: Since Scylla 2025.1, thread names include a /sl: suffix (e.g. "shard 0/sl:default"). Note: startsWith("shard N") still matches correctly-sharded events — stripping the suffix is defensive hygiene rather than a direct fix.

Pre-existing bug (unmasked): assertThat(anyLocal) is an AssertJ no-op — it wraps the value but never calls .isTrue(). The liveness guard never actually fired even before 2025.1.

Fix

  • Add AND TABLETS = {'enabled': false} to CREATE KEYSPACE shardawaretest to restore deterministic hash-based shard assignment.
  • Narrow the per-event assertion to only the "querying locally" event (the event that executes on the data-owning shard), making the assertion robust to any future addition of coordinator-side events.
  • Strip the /sl:... suffix with replaceFirst("/sl:[^/]*$", "") before startsWith(shard) (defensive hygiene).
  • Promote the liveness guard from assertThat(anyLocal) (no-op) to assertTrue(anyLocal, ...) so the guard actually fires when no local event is found.

Testing

Verified via staging job java-driver-matrix-2025.1 build #2GREEN on Scylla 2025.1.15-0.20260624.b48b97fb10bd, all 4 driver versions (3.11.4.0, 3.11.5.15, 4.18.1.0, 4.19.0.9).

scylla/4.18.1.0 and scylla/4.19.0.9 pass the test today without changes; this PR targets only 3.11.x (via matrix patches in the companion scylla-java-driver-matrix PR #155).

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@nikagra, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 19 minutes and 36 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: afe7e3fa-3667-42aa-8ae5-02fea92bc155

📥 Commits

Reviewing files that changed from the base of the PR and between 1998141 and 9ea57f2.

📒 Files selected for processing (1)
  • driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java
📝 Walkthrough

Walkthrough

The tracing shard-awareness test now normalizes trace thread names by removing the Scylla 2025.1+ service-level suffix before checking shard prefixes, while still requiring at least one local query event. The keyspace setup in the tracing test now disables tablets and is formatted across multiple lines.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: fixing ShardAwarenessTest.correctShardInTracingTest for Scylla 2025.1 on 3.11.x.
Description check ✅ Passed The description is directly related to the test fix and explains the tablets, shard, and tracing-name changes in the patch.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java`:
- Line 95: The test assertion in ShardAwarenessTest currently uses the
no-message assertTrue overload, so update the anyLocal check to include a clear
failure message. Locate the assertion in the shard awareness test and change it
to document that no "querying locally" event was observed, so failures are
easier to diagnose.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d34dc26c-9c6e-4949-8d75-e1d0e8854e1a

📥 Commits

Reviewing files that changed from the base of the PR and between 5f4f988 and 318e0a4.

📒 Files selected for processing (1)
  • driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java

Comment thread driver-core/src/test/java/com/datastax/driver/core/ShardAwarenessTest.java Outdated
@nikagra nikagra requested a review from dkropachev June 25, 2026 21:19
@nikagra nikagra force-pushed the fix/shard-awareness-test-2025.1 branch from 318e0a4 to 1998141 Compare June 26, 2026 11:57
Two Scylla 2025.1 changes break correctShardInTracingTest:

1. Tablets enabled by default for NTS keyspaces make the driver's
   hash-based getShardId() prediction non-deterministic, so the
   hardcoded expected shards ("shard 0" / "shard 1") become
   unreliable.  Fix: ADD TABLETS = {'enabled': false} to the CREATE
   KEYSPACE statement to restore traditional token-based shard
   assignment.

2. A new coordinator-side trace event runs on shard 0 regardless of
   the data shard, causing the previous per-event assertion to fail.
   Fix: only assert the shard for events whose description contains
   "querying locally" (the data-local event).

Also:
- Thread names now include a /sl:<service_level> suffix (e.g.
  "shard 0/sl:default").  Strip it before the startsWith() comparison.
- assertThat(anyLocal) was a no-op BooleanAssert (never called
  .isTrue()).  Replace with assertTrue(anyLocal) so the liveness
  guard actually fires.
@nikagra nikagra force-pushed the fix/shard-awareness-test-2025.1 branch from 1998141 to 9ea57f2 Compare June 26, 2026 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant