feat(storage): diff-layer state storage with bounded pruning#444
Draft
MegaRedHand wants to merge 5 commits into
2cb4ca3 to
a342fa7
Compare
6600cb2 to
fb1cf37
Compare
a342fa7 to
34d6345
Compare
MegaRedHand added a commit that referenced this pull request Jun 23, 2026
… forever (#453)

## Summary

Relaxes block pruning: instead of deleting old blocks wholesale, keep block headers and bodies **forever** and prune only the signatures of old finalized blocks. This preserves the full block history (for debugging, re-org safety, and state reconstruction) while still reclaiming the heavy signature data (~3KB+ per block).

## Change

- Replaces `prune_old_blocks` (deleted headers, bodies, and signatures beyond a fixed `BLOCKS_TO_KEEP` window) with `prune_old_block_signatures(finalized_slot, tip_slot)`.
- Policy, with `cutoff = tip_slot - SIGNATURE_PRUNING_RANGE`:
  - **healthy finality** (`cutoff <= finalized_slot`): delete signatures for `slot < cutoff` (entirely within finalized history);
  - **deep non-finality** (non-finalized range exceeds the window): prune nothing, so non-finalized signatures are never touched.
- `BlockHeaders` and `BlockBodies` are kept forever; all non-finalized signatures are always retained.
- `get_signed_block` returns `None` when a signature is absent, now including a pruned finalized block (deep historical signed-block serving via BlocksByRoot is lost; peers use checkpoint sync).
- `prune_old_data` derives the tip slot from the head header and runs signature pruning alongside state pruning.

## Key layout (slot-ordered pruning)

- `BlockSignatures` is now keyed by **`slot||root`** (big-endian slot), reusing the shared `encode_slot_root_key` codec also used by `LiveChain`.
- Pruning iterates in slot order and **stops at the first entry past the cutoff** (`take_while`), and no longer does a per-row `BlockHeaders` lookup to recover the slot. Every read site already has the slot: `get_signed_block` loads the header first.
- `InMemoryBackend::prefix_iterator` now returns keys in sorted order to match the RocksDB backend, which these slot-ordered range scans rely on.

## Migration

Changes the `BlockSignatures` key format: existing root-keyed entries are not read after upgrade (old finalized blocks read as pruned). Fine for fresh / checkpoint-synced nodes; no backfill.

## Tests

- `prune_signatures_keeps_recent_window_when_finality_healthy`, `prune_signatures_noop_when_non_finalized_range_exceeds_window`, `prune_signatures_noop_when_tip_within_window`.
- Storage lib suite green (42 tests); clippy `-D warnings` clean.

## Context

Split out of #444 (diff-layer state storage), which depends on this: reconstructing historical states needs block headers retained. #444 stacks on top of this PR.

Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
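The pruning policy and key layout described in this commit message can be illustrated with a minimal sketch. It is not the code from #453: the `BTreeMap` stands in for a sorted key-value backend, the window value `8` is a placeholder (the real `SIGNATURE_PRUNING_RANGE` is not given here), and the saturating subtraction is an assumption about how a tip inside the window is handled. Only the names `encode_slot_root_key`, `prune_old_block_signatures`, and `SIGNATURE_PRUNING_RANGE` come from the text above; their signatures here are guesses.

```rust
use std::collections::BTreeMap;

/// Placeholder window size; the real value is not stated in the PR text.
const SIGNATURE_PRUNING_RANGE: u64 = 8;

/// Key = big-endian slot followed by the 32-byte root, so that
/// lexicographic byte order matches slot order.
fn encode_slot_root_key(slot: u64, root: [u8; 32]) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[..8].copy_from_slice(&slot.to_be_bytes());
    key[8..].copy_from_slice(&root);
    key
}

/// Recovers the slot from the first 8 bytes of a key.
fn decode_slot(key: &[u8; 40]) -> u64 {
    let mut slot = [0u8; 8];
    slot.copy_from_slice(&key[..8]);
    u64::from_be_bytes(slot)
}

/// Deletes signatures for `slot < cutoff` when finality is healthy and
/// returns how many entries were removed. Prunes nothing otherwise.
fn prune_old_block_signatures(
    signatures: &mut BTreeMap<[u8; 40], Vec<u8>>,
    finalized_slot: u64,
    tip_slot: u64,
) -> usize {
    // Assumption: saturate so a tip inside the window yields cutoff 0 (no-op).
    let cutoff = tip_slot.saturating_sub(SIGNATURE_PRUNING_RANGE);
    // Deep non-finality: the cutoff lies past finalized history, so skip.
    if cutoff > finalized_slot {
        return 0;
    }
    // Keys are slot-ordered, so the scan can stop at the first key at or
    // past the cutoff instead of visiting every row.
    let stale: Vec<[u8; 40]> = signatures
        .keys()
        .take_while(|key| decode_slot(key) < cutoff)
        .copied()
        .collect();
    for key in &stale {
        signatures.remove(key);
    }
    stale.len()
}
```

The big-endian prefix is what makes the early stop safe: with a little-endian slot, byte order and slot order would diverge and `take_while` could stop too soon.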
608e97f to
d1a76aa
Compare
Store every non-genesis state as a parent-linked StateDiff (StateDiffs, never pruned) plus a full snapshot (States) written only at 1024-slot anchors (and the bootstrap). Neither table is ever pruned, so the full state history is preserved cheaply.

- get_state returns an anchor snapshot directly, else reconstructs by walking base_root back to the nearest anchor and replaying the appended historical_block_hashes tails; config/validators come from the snapshot and latest_block_header from BlockHeaders.
- Reconstructed and freshly imported states are memoized in an in-memory LRU (STATE_CACHE_CAPACITY), keyed by block root. States are immutable per root, so the cache never needs invalidation; it keeps recent reads (e.g. a child block's parent state) hot without reconstruction.
- DiffBase captures the parent (root, hbh_len, slot) before it is consumed into the post-state. StateDiff/DiffBase live in the storage crate.
- No snapshot eviction and no StateAnchors table: anchors are simply the snapshots in States, so the prune-states scan is gone entirely.
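The reconstruction walk described in this commit can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the `State`, `StateDiff`, and `Store` types here are hypothetical stand-ins that track only `slot` and `historical_block_hashes`, the tables are plain `HashMap`s, and the LRU cache, checkpoints, justification fields, `config`/`validators`, and `latest_block_header` are all omitted.

```rust
use std::collections::HashMap;

type Root = [u8; 32];

/// Simplified full snapshot (hypothetical; the real state has more fields).
#[derive(Clone, Debug, PartialEq)]
struct State {
    slot: u64,
    historical_block_hashes: Vec<Root>,
}

/// Simplified parent-linked diff: the parent's root plus what changed.
struct StateDiff {
    base_root: Root,
    slot: u64,
    appended_hashes: Vec<Root>,
}

struct Store {
    states: HashMap<Root, State>,          // full snapshots (anchors)
    state_diffs: HashMap<Root, StateDiff>, // one diff per non-genesis state
}

impl Store {
    /// Returns a snapshot directly if one exists; otherwise walks
    /// `base_root` back to the nearest snapshot and replays the diffs
    /// forward. Assumes the diffs form a chain that ends at a snapshot.
    fn get_state(&self, root: Root) -> Option<State> {
        let mut chain = Vec::new();
        let mut cursor = root;
        let mut state = loop {
            if let Some(snapshot) = self.states.get(&cursor) {
                break snapshot.clone();
            }
            // A missing diff means the state is unknown.
            let diff = self.state_diffs.get(&cursor)?;
            chain.push(diff);
            cursor = diff.base_root;
        };
        // Diffs were collected newest-first; replay them oldest-first.
        for diff in chain.into_iter().rev() {
            state.slot = diff.slot;
            state
                .historical_block_hashes
                .extend_from_slice(&diff.appended_hashes);
        }
        Some(state)
    }
}
```

Because the sketch trusts `base_root` links, a diff that points at itself would loop forever; a real implementation would need to rule that out, for example by bounding the walk to the anchor interval.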
d1a76aa to
1a060f2
Compare
…ktree-feat+state-diff-layers
The migrated test built its DiffBase from the target post-state itself (DiffBase::from_state(a, &head_state)), so base.slot/base.hbh_len came from the target rather than the parent. That made the anchor-boundary check always false (no snapshot written, contradicting the comment) and left the diff self-referential, passing only via the cache memoization. Diff against the genesis anchor already present in the store instead, so the base correctly describes the parent state.
Summary
Replaces aggressive state pruning with a diff-layer storage model so the full state history stays available cheaply. Builds on the block-signature pruning from #453 (now merged to `main`): keeping block headers/bodies forever is what lets historical states be reconstructed.

State storage

- Each state is stored as a `StateDiff` (`StateDiffs` table, never pruned) plus a full-state snapshot (`States`) at anchors and hot states.
- `StateDiff` stores `slot`, both checkpoints, and the justification fields in full, plus the appended `historical_block_hashes` tail. `config`/`validators` come from the nearest snapshot (they never change); `latest_block_header` is read back from `BlockHeaders` (the stored state caches the real `state_root` there, so it matches byte-for-byte).
- `get_state` returns a snapshot directly, else reconstructs by walking `base_root` to the nearest ancestor snapshot and replaying appended tails forward.
- Anchors (`StateAnchors`, permanent) bound the reconstruction walk.
- Snapshot eviction (`prune_old_states`) keeps the last `SNAPSHOT_HOT_WINDOW = 300` slots + anchors + finalized/justified/head; evicted snapshots leave their diff behind. The `States` table is never written alone (always paired with a `StateDiffs` or `StateAnchors` entry).
- `DiffBase` captures the parent `(root, hbh_len, slot)` before it is consumed into the post-state; `StateDiff`/`DiffBase` live in the storage crate.

Tests

- `StateDiff` build/SSZ round-trip; state reconstruction (single + multi-diff after eviction); anchor recording; snapshot eviction (window/protected/anchors).
- clippy `-D warnings` clean.

Status / follow-ups

- `prune_old_data` runs on the node's finalization path (`blockchain/src/lib.rs`), so snapshot eviction + reconstruction are exercised after ~300 slots.
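As a rough illustration of the snapshot-eviction rule in this description (hot window + anchors + finalized/justified/head), the selection step might look like the sketch below. The function name `snapshots_to_evict`, its parameters, and the exact window boundary (`slot < head_slot - 300` is evicted) are assumptions for illustration; only `SNAPSHOT_HOT_WINDOW = 300` comes from the text.

```rust
use std::collections::{BTreeMap, HashSet};

type Root = [u8; 32];

const SNAPSHOT_HOT_WINDOW: u64 = 300;

/// Hypothetical helper: returns the roots of snapshots that fall outside
/// the hot window and are neither anchors nor protected
/// (finalized/justified/head). Their diffs are left untouched.
fn snapshots_to_evict(
    snapshots: &BTreeMap<Root, u64>, // snapshot root -> slot
    anchors: &HashSet<Root>,
    protected: &HashSet<Root>,
    head_slot: u64,
) -> Vec<Root> {
    // Assumption: saturate so short chains evict nothing.
    let window_start = head_slot.saturating_sub(SNAPSHOT_HOT_WINDOW);
    snapshots
        .iter()
        .filter(|(root, slot)| {
            **slot < window_start
                && !anchors.contains(*root)
                && !protected.contains(*root)
        })
        .map(|(root, _)| *root)
        .collect()
}
```

An evicted state stays readable as long as its diff chain still reaches a retained snapshot, which is why anchors are excluded from eviction.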