CASSANDRA-21117: Fix unreliable metric "networking cache size" in nodetool info #4898
EvgeniiR wants to merge 1 commit into
@EvgeniiR can you resolve the conflict? Just nuke that change in CHANGES.txt for now.
Pull request overview
Fixes how Cassandra reports the “Network Cache” size in nodetool info by switching from an allocated-capacity-style metric (Size) to an in-use metric (UsedSize), and corrects overflow memory accounting in BufferPool when putUnusedPortion() is invoked for non-pooled (overflow) buffers to prevent the overflow counter drifting negative over time.
Changes:
- Update `nodetool info` to report “Network Cache” using `UsedSize` instead of `Size`.
- Fix `BufferPool.LocalPool.putUnusedPortion()` accounting for `chunk == null` (non-pooled buffers) to avoid double-decrementing overflow usage.
- Add unit tests covering overflow accounting balance and validating that `usedSize` returns to zero when pooled buffers are returned.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/unit/org/apache/cassandra/metrics/BufferPoolMetricsTest.java | Adds regression/behavior tests for UsedSize and overflow accounting around putUnusedPortion() + put(). |
| src/java/org/apache/cassandra/utils/memory/BufferPool.java | Removes incorrect overflow accounting in LocalPool.putUnusedPortion() for non-pooled buffers and clarifies the comment. |
| src/java/org/apache/cassandra/tools/nodetool/Info.java | Switches the displayed “Network Cache” size metric from Size to UsedSize (and updates printed wording). |
| CHANGES.txt | Records the nodetool metric fix under 7.0 changes. |
6783122 to
f33de20
Done.
f33de20 to
bb36949
@EvgeniiR where? You haven't pushed, or it is the same as it was.
nodetool info reported "Network Cache" size using BufferPool.sizeInBytes(), which includes memoryAllocated, a monotonically non-decreasing counter that never drops below the maximum ever allocated. Switch to usedSizeInBytes(), which tracks only currently checked-out buffers.

Additionally fix BufferPool.LocalPool.putUnusedPortion for non-pooled (overflow) buffers: the chunk==null branch was decrementing overflowMemoryUsage by the unused portion, but the subsequent put() decremented by the full original capacity, causing overflowMemoryUsage to drift negative over time.

patch by EvgeniiR; reviewed by TBD for CASSANDRA-21117
bb36949 to
34dee6d
My bad, sorry. Latest changes are pushed now. The output value is updated; the tool output keeps the same format.
cc @netudima
Examples from local testing:
Info output with fix applied
bin/nodetool info
ID : 6d194555-f6eb-41d0-c000-000000000001
Gossip active : true
Native Transport active: true
Load : 123.67 KiB
Uncompressed load : 174.96 KiB
Generation No : 1782235131
Uptime (seconds) : 426
Heap Memory (MB) : 7661.99 / 15904.00
Off Heap Memory (MB) : 0.00
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 25, size 1.99 KiB, capacity 100 MiB, 1109 hits, 1131 requests, 0.981 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache : size 8.09 KiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired : 100.0%
Token : (invoke with -T/--tokens to see all 16 tokens)
Bootstrap state : COMPLETED
Bootstrap failed : false
Decommissioning : false
Decommission failed : false
Info output on trunk
bin/nodetool info
ID : 6d194555-f6eb-41d0-c000-000000000001
Gossip active : true
Native Transport active: true
Load : 230.47 MiB
Uncompressed load : 230.55 MiB
Generation No : 1782235801
Uptime (seconds) : 405
Heap Memory (MB) : 5359.93 / 15904.00
Off Heap Memory (MB) : 1.33
Data Center : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 11, size 904 bytes, capacity 100 MiB, 149 hits, 160 requests, 0.931 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Network Cache : size 8 MiB, overflow size: 0 bytes, capacity 128 MiB
Percent Repaired : 100.0%
Token : (invoke with -T/--tokens to see all 16 tokens)
Bootstrap state : COMPLETED
Bootstrap failed : false
Decommissioning : false
Decommission failed : false
ef50a83 to
34dee6d
nodetool info reported "Network Cache" size using BufferPool.sizeInBytes() which includes memoryAllocated - a monotonically non-decreasing counter that never drops below the pool's high-water mark. Switch to usedSizeInBytes() which tracks only currently checked-out buffers.
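
For illustration, the difference between the two readings can be sketched with a toy model. This is not Cassandra's BufferPool: the class name, the 8 MiB slab size, and the slab-growth loop are assumptions made for the example. Only the method names sizeInBytes() and usedSizeInBytes() and the idea that the allocated counter never shrinks come from the description above.

```java
/** Toy model (hypothetical, not Cassandra code): allocated bytes vs. in-use bytes. */
class PoolSizeSketch {
    // Assumed slab size, chosen only for the example.
    static final long SLAB_BYTES = 8L * 1024 * 1024;

    private long memoryAllocated = 0; // grows when a slab is reserved, never shrinks
    private long memoryInUse = 0;     // bytes currently checked out by callers

    /** Check out a buffer, reserving another slab if the reserved space is exhausted. */
    void get(long bytes) {
        memoryInUse += bytes;
        while (memoryAllocated < memoryInUse)
            memoryAllocated += SLAB_BYTES;
    }

    /** Return a buffer; reserved slabs stay with the pool for reuse. */
    void put(long bytes) {
        memoryInUse -= bytes;
    }

    /** Allocated-capacity style reading: stays at the high-water mark. */
    long sizeInBytes() {
        return memoryAllocated;
    }

    /** In-use reading: drops back once buffers are returned. */
    long usedSizeInBytes() {
        return memoryInUse;
    }
}
```

In this model, checking out and returning a single 4 KiB buffer leaves sizeInBytes() at one full slab while usedSizeInBytes() returns to zero, which is the kind of gap visible between the two Network Cache lines in the outputs above.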
Additionally fix BufferPool.LocalPool.putUnusedPortion for non-pooled buffers: the chunk==null branch was decrementing overflowMemoryUsage by the unused portion, but the subsequent put() decremented by the full original capacity, causing overflowMemoryUsage to drift negative over time.
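
A minimal sketch of the drift, assuming a simplified single-threaded counter. The class, constructor flag, and roundTrip helper are hypothetical; only the names overflowMemoryUsage, putUnusedPortion, and put, and the described decrement behavior, come from the text above.

```java
/** Toy model (hypothetical, not Cassandra code) of overflow accounting for non-pooled buffers. */
class OverflowAccountingSketch {
    long overflowMemoryUsage = 0;
    private final boolean decrementInPutUnusedPortion; // true models the old behavior

    OverflowAccountingSketch(boolean decrementInPutUnusedPortion) {
        this.decrementInPutUnusedPortion = decrementInPutUnusedPortion;
    }

    /** Allocate a non-pooled (overflow) buffer and account for its full capacity. */
    void allocateOverflow(int capacity) {
        overflowMemoryUsage += capacity;
    }

    /** The caller keeps only `used` bytes of a buffer allocated with `capacity` bytes. */
    void putUnusedPortion(int capacity, int used) {
        if (decrementInPutUnusedPortion)
            overflowMemoryUsage -= (capacity - used); // old behavior: release the unused tail here
        // Fixed behavior: no accounting here, because put() releases the full capacity.
    }

    /** Return the buffer; accounting releases the full original capacity. */
    void put(int originalCapacity) {
        overflowMemoryUsage -= originalCapacity;
    }

    /** Run one allocate / putUnusedPortion / put cycle and report the resulting counter. */
    static long roundTrip(boolean oldBehavior, int capacity, int used) {
        OverflowAccountingSketch pool = new OverflowAccountingSketch(oldBehavior);
        pool.allocateOverflow(capacity);
        pool.putUnusedPortion(capacity, used);
        pool.put(capacity);
        return pool.overflowMemoryUsage;
    }
}
```

With a 1024-byte buffer of which 256 bytes are used, roundTrip(true, 1024, 256) ends at -768, while roundTrip(false, 1024, 256) ends at 0. Each cycle under the old behavior subtracts the unused portion a second time, so the counter keeps moving further below zero.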