
CASSANDRA-21117: Fix unreliable metric "networking cache size" in nodetool info #4898

Open
EvgeniiR wants to merge 1 commit into apache:trunk from EvgeniiR:CASSANDRA-21117-trunk

Conversation

@EvgeniiR

@EvgeniiR EvgeniiR commented Jun 23, 2026


nodetool info reported "Network Cache" size using BufferPool.sizeInBytes() which includes memoryAllocated - a monotonically non-decreasing counter that never drops below the pool's high-water mark. Switch to usedSizeInBytes() which tracks only currently checked-out buffers.
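
To illustrate why the two metrics diverge, here is a minimal toy model. It is not Cassandra's BufferPool code: the class name `PoolSizeSketch`, the `memoryInUse` field, the `take()` method, and the 8 MiB chunk size are assumptions made for this sketch (the chunk size was picked to mirror the 8 MiB value reported on trunk in the examples later in this conversation). Only the idea is taken from the description above: one counter grows and never shrinks, the other follows checked-out buffers.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model for illustration only; this is NOT Cassandra's BufferPool.
class PoolSizeSketch {
    // Hypothetical chunk size, assumed for this sketch.
    static final long CHUNK_BYTES = 8L * 1024 * 1024;

    // Only ever grows: bumped when the pool carves out another chunk.
    private final AtomicLong memoryAllocated = new AtomicLong();
    // Rises on take(), falls on put(): bytes currently checked out.
    private final AtomicLong memoryInUse = new AtomicLong();

    void take(long bytes) {
        long inUse = memoryInUse.addAndGet(bytes);
        // Grow the backing allocation until it covers what is checked out.
        while (memoryAllocated.get() < inUse)
            memoryAllocated.addAndGet(CHUNK_BYTES);
    }

    void put(long bytes) {
        // Returning a buffer lowers usage; the allocation is left untouched.
        memoryInUse.addAndGet(-bytes);
    }

    long sizeInBytes()     { return memoryAllocated.get(); } // high-water style value
    long usedSizeInBytes() { return memoryInUse.get(); }     // current usage

    public static void main(String[] args) {
        PoolSizeSketch pool = new PoolSizeSketch();
        pool.take(4096);
        System.out.println(pool.sizeInBytes() + " / " + pool.usedSizeInBytes());
        pool.put(4096);
        System.out.println(pool.sizeInBytes() + " / " + pool.usedSizeInBytes());
    }
}
```

In this model the first value stays at one full chunk after the buffer is returned, while the second drops back to zero, which is the behaviour an operator would expect from a "size in use" figure.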

Additionally fix BufferPool.LocalPool.putUnusedPortion for non-pooled buffers: the chunk==null branch was decrementing overflowMemoryUsage by the unused portion, but the subsequent put() decremented by the full original capacity, causing overflowMemoryUsage to drift negative over time.
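
The drift can be reproduced with a small stand-alone sketch. This is a simplified model, not the actual `BufferPool.LocalPool` code: the class `OverflowAccountingSketch`, the `buggy` flag, and the `allocateOverflow()` and `roundTrip()` helpers are hypothetical, and the sketch assumes the counter is incremented by the full capacity when a non-pooled buffer is allocated.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of overflow accounting; NOT the real BufferPool.LocalPool.
class OverflowAccountingSketch {
    final AtomicLong overflowMemoryUsage = new AtomicLong();
    private final boolean buggy; // true = model the old chunk == null branch

    OverflowAccountingSketch(boolean buggy) { this.buggy = buggy; }

    // Assumption: a non-pooled buffer is accounted for at its full capacity.
    void allocateOverflow(int capacity) {
        overflowMemoryUsage.addAndGet(capacity);
    }

    // Models putUnusedPortion() for a non-pooled buffer (chunk == null).
    void putUnusedPortion(int capacity, int usedBytes) {
        if (buggy)
            overflowMemoryUsage.addAndGet(-(capacity - usedBytes)); // first decrement
        // Fixed variant: do nothing here, put() releases the whole buffer.
    }

    // Models put(): always releases the full original capacity.
    void put(int capacity) {
        overflowMemoryUsage.addAndGet(-capacity);
    }

    // Runs one allocate / putUnusedPortion / put cycle and returns the counter.
    static long roundTrip(boolean buggy, int capacity, int usedBytes) {
        OverflowAccountingSketch pool = new OverflowAccountingSketch(buggy);
        pool.allocateOverflow(capacity);
        pool.putUnusedPortion(capacity, usedBytes);
        pool.put(capacity);
        return pool.overflowMemoryUsage.get();
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + roundTrip(true, 1024, 100));  // prints "buggy: -924"
        System.out.println("fixed: " + roundTrip(false, 1024, 100)); // prints "fixed: 0"
    }
}
```

Under these assumptions each buggy cycle leaves the counter short by the unused portion (924 bytes here), so repeated cycles push it further below zero, whereas the fixed variant returns to zero after every cycle.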

@smiklosovic

Contributor

@EvgeniiR can you resolve the conflict? Just nuke that change in changes.txt for now

Copilot AI left a comment

Contributor


Pull request overview

Fixes how Cassandra reports the “Network Cache” size in nodetool info by switching from an allocated-capacity-style metric (Size) to an in-use metric (UsedSize), and corrects overflow memory accounting in BufferPool when putUnusedPortion() is invoked for non-pooled (overflow) buffers to prevent the overflow counter drifting negative over time.

Changes:

  • Update nodetool info to report “Network Cache” using UsedSize instead of Size.
  • Fix BufferPool.LocalPool.putUnusedPortion() accounting for chunk == null (non-pooled buffers) to avoid double-decrementing overflow usage.
  • Add unit tests covering overflow accounting balance and validating that usedSize returns to zero when pooled buffers are returned.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/unit/org/apache/cassandra/metrics/BufferPoolMetricsTest.java | Adds regression/behavior tests for UsedSize and overflow accounting around putUnusedPortion() + put(). |
| src/java/org/apache/cassandra/utils/memory/BufferPool.java | Removes incorrect overflow accounting in LocalPool.putUnusedPortion() for non-pooled buffers and clarifies the comment. |
| src/java/org/apache/cassandra/tools/nodetool/Info.java | Switches the displayed “Network Cache” size metric from Size to UsedSize (and updates printed wording). |
| CHANGES.txt | Records the nodetool metric fix under 7.0 changes. |


Comment thread src/java/org/apache/cassandra/tools/nodetool/Info.java
@EvgeniiR EvgeniiR force-pushed the CASSANDRA-21117-trunk branch from 6783122 to f33de20 Compare June 23, 2026 16:21
@EvgeniiR

EvgeniiR commented Jun 23, 2026

Author

> @EvgeniiR can you resolve the conflict? Just nuke that change in changes.txt for now

Done.
Also applied the update suggested by Copilot.

@EvgeniiR EvgeniiR force-pushed the CASSANDRA-21117-trunk branch from f33de20 to bb36949 Compare June 23, 2026 16:39
@smiklosovic

Contributor

@EvgeniiR where? You haven't pushed, or it is the same as it was.

nodetool info reported "Network Cache" size using BufferPool.sizeInBytes()
which includes memoryAllocated — a monotonically non-decreasing counter that
never drops below the maximum ever allocated. Switch to usedSizeInBytes()
which tracks only currently checked-out buffers.

Additionally fix BufferPool.LocalPool.putUnusedPortion for non-pooled
(overflow) buffers: the chunk==null branch was decrementing overflowMemoryUsage
by the unused portion, but the subsequent put() decremented by the full original
capacity, causing overflowMemoryUsage to drift negative over time.

patch by EvgeniiR; reviewed by TBD for CASSANDRA-21117
@EvgeniiR EvgeniiR force-pushed the CASSANDRA-21117-trunk branch from bb36949 to 34dee6d Compare June 23, 2026 16:53
@EvgeniiR

Author

> @EvgeniiR where? You haven't pushed, or it is the same as it was.

My bad, sorry. The latest changes are pushed now. The output value is updated; the tool output keeps the same format.

@smiklosovic smiklosovic requested a review from netudima June 23, 2026 17:41
@smiklosovic

Contributor

cc @netudima

@EvgeniiR

EvgeniiR commented Jun 23, 2026

Author

Examples from local testing:

Info output with fix applied
bin/nodetool info
ID                     : 6d194555-f6eb-41d0-c000-000000000001
Gossip active          : true
Native Transport active: true
Load                   : 123.67 KiB
Uncompressed load      : 174.96 KiB
Generation No          : 1782235131
Uptime (seconds)       : 426
Heap Memory (MB)       : 7661.99 / 15904.00
Off Heap Memory (MB)   : 0.00
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 25, size 1.99 KiB, capacity 100 MiB, 1109 hits, 1131 requests, 0.981 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache          : size 8.09 KiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 16 tokens)
Bootstrap state        : COMPLETED
Bootstrap failed       : false
Decommissioning        : false
Decommission failed    : false
  • The Network Cache size value now rises and falls with actual buffer usage

Info output on trunk
bin/nodetool info
ID                     : 6d194555-f6eb-41d0-c000-000000000001
Gossip active          : true
Native Transport active: true
Load                   : 230.47 MiB
Uncompressed load      : 230.55 MiB
Generation No          : 1782235801
Uptime (seconds)       : 405
Heap Memory (MB)       : 5359.93 / 15904.00
Off Heap Memory (MB)   : 1.33
Data Center            : datacenter1
Rack                   : rack1
Exceptions             : 0
Key Cache              : entries 11, size 904 bytes, capacity 100 MiB, 149 hits, 160 requests, 0.931 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache          : size 8 MiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired       : 100.0%
Token                  : (invoke with -T/--tokens to see all 16 tokens)
Bootstrap state        : COMPLETED
Bootstrap failed       : false
Decommissioning        : false
Decommission failed    : false
  • The Network Cache size jumped to 8 MiB on the first request and did not decrease afterwards

@EvgeniiR EvgeniiR force-pushed the CASSANDRA-21117-trunk branch from ef50a83 to 34dee6d Compare June 23, 2026 19:45

3 participants