fix(flatkv): implement flatkv_only mode state-sync int testings #3545

Open
blindchaser wants to merge 4 commits into main from yiren/flaktv-migrated-state-sync-testing

@blindchaser
Contributor

Summary

Fixes two distinct bugs that broke post-migration flatkv_only state-sync, where a node boots directly in the terminal v3 steady state (all SC writes route to FlatKV, memiavl is never allocated). Both bugs only surface on the WAL-replay / snapshot path, so they were invisible to live execution and to the existing memiavl-backed state-sync coverage. A new end-to-end 4-validator integration test (GIGA_FLATKV_ONLY=true) exercises the full kill → wipe → state-sync → verify loop, backed by deterministic Go regression tests.

  • sei-db/state_db/sc/flatkv/store_apply.go: classifyAndPrefix now normalizes non-delete changeset values through the new nonNilValue helper. A changeset pair is a deletion iff Delete == true; a zero-length value with Delete == false is a legitimate "set this key to an empty value" write. Protobuf cannot distinguish []byte{} from nil, so such a write round-trips through the WAL (catchup, read-only clone, snapshot export, state-sync restore) as Value == nil. Previously the downstream process*Changes helpers, which use the nil value == deletion convention, dropped the key on replay — diverging the per-DB LtHash, the evm_lattice store hash, and ultimately the consensus AppHash from the live chain. The helper guarantees empty-value writes survive replay; true deletes still arrive as nil via the Delete flag and are unaffected.
  • sei-db/state_db/sc/composite/store.go: CompositeCommitStore.Exporter no longer returns the bare FlatKV exporter in flatkv_only mode. It wraps it in the composite SnapshotExporter (NewExporter(nil, flatkvExporter)) so the keys.FlatKVStoreKey module header is emitted ahead of the nodes. The restore-side State Store importer keys off that header to route the snapshot through convertFlatKVNodes; the bare exporter omitted it.
  • sei-db/state_db/sc/composite/exporter.go: SnapshotExporter gains a flatkvHeaderPending flag, set when the stream starts directly in phaseFlatKV (cosmosExporter == nil). With no cosmos→flatkv transition to carry the module header, nextFromFlatKV now emits keys.FlatKVStoreKey once before the first node. This guarantees that a flatkv_only snapshot populates the State Store on restore, so EVM-RPC and historical queries return real data. Before the fix they returned nil even though SC/AppHash were healthy.
  • docker/localnode/scripts/step4_config_override.sh: adds GIGA_FLATKV_ONLY, which writes sc-write-mode = "flatkv_only" and evm-ss-split = false to app.toml, booting the node in the v3 steady state. This mode is mutually exclusive with the GIGA_STORAGE dual-write override.
  • docker/docker-compose.yml, Makefile: thread GIGA_FLATKV_ONLY through the cluster env vars so make docker-cluster-start can boot a flatkv_only cluster.
  • .github/workflows/integration-test.yml: adds a FlatKV Only State Sync matrix entry (GIGA_FLATKV_ONLY=true) that deploys the EVM fixture and runs verify_flatkv_only_statesync.sh.
  • integration_test/contracts/verify_flatkv_only_statesync.sh: end-to-end harness. Asserts the cluster booted FlatKV-only (no memiavl on disk), deploys an EVM fixture, kills and wipes one validator, configures it for state-sync from the surviving validators, and waits for the restored node to catch up. It then verifies that the recovered FlatKV digest matches the donors at a shared height and that EVM-RPC reads (balance, contract storage, code) return the expected values.
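
For readers unfamiliar with the empty-value rule in the store_apply.go bullet, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: `kvPair`, `walRoundTrip`, and `apply` are hypothetical stand-ins, and only the name `nonNilValue` and the "Delete flag decides deletion" rule come from the description above.

```go
package main

import "fmt"

// kvPair is a hypothetical stand-in for a changeset pair. The field names
// follow the PR description, not the real generated protobuf type.
type kvPair struct {
	Key    string
	Value  []byte
	Delete bool
}

// walRoundTrip mimics the behaviour described above: after a protobuf
// round trip, a zero-length value comes back as nil.
func walRoundTrip(p kvPair) kvPair {
	if len(p.Value) == 0 {
		p.Value = nil
	}
	return p
}

// nonNilValue maps nil to an empty, non-nil slice so that code using the
// "nil value means deletion" convention keeps the key.
func nonNilValue(v []byte) []byte {
	if v == nil {
		return []byte{}
	}
	return v
}

// apply writes one pair into state. Deletion is decided by the Delete flag
// alone; every non-delete value is normalized before it is stored.
func apply(state map[string][]byte, p kvPair) {
	if p.Delete {
		delete(state, p.Key)
		return
	}
	state[p.Key] = nonNilValue(p.Value)
}

func main() {
	state := map[string][]byte{}

	// An empty-value write survives the simulated WAL round trip.
	apply(state, walRoundTrip(kvPair{Key: "k", Value: []byte{}}))
	v, ok := state["k"]
	fmt.Println(ok, v != nil, len(v)) // true true 0

	// A true delete still removes the key.
	apply(state, walRoundTrip(kvPair{Key: "k", Delete: true}))
	_, ok = state["k"]
	fmt.Println(ok) // false
}
```

The sketch keeps the deletion decision in one place (the `Delete` flag), which is why normalizing the value cannot turn a real delete into a write.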

Test plan

  • sei-db/state_db/sc/flatkv/empty_value_replay_test.go
    • TestEmptyValueSurvivesWALReplay: writes a key with an empty ([]byte{}, non-delete) value, rebuilds the store from the WAL, and asserts the replayed root hash matches the live store — directly guarding the empty-value-vs-deletion regression.
  • sei-cosmos/storev2/rootmulti/flatkv_snapshot_test.go
    • TestFlatKVOnlySnapshotRestoreAppHashParity: takes a snapshot of a flatkv_only source store and restores into a fresh destination, asserting the restored LastBlockAppHash matches the donor (crash/restore AppHash parity).
    • TestFlatKVOnlySnapshotRestorePopulatesSS: populates acc/bank/EVM state in an SS-enabled flatkv_only source, snapshots, restores into a fresh SS-enabled destination, and reads those keys back through the State Store read path (Prove=false) at the snapshot height — fails without the module-header fix, passes with it.
  • sei-cosmos/storev2/rootmulti/flatkv_helpers_test.go: adds the flatkv_only config factory and shared fixtures/assertions used by the new tests.
  • End-to-end (integration_test/contracts/verify_flatkv_only_statesync.sh, wired into CI as FlatKV Only State Sync): validates the live kill → wipe → state-sync → verify loop on a 4-validator cluster, including post-restore FlatKV digest parity against donors and EVM-RPC read-back. Verified locally against a GIGA_FLATKV_ONLY=true docker cluster.

@cursor

cursor Bot commented Jun 3, 2026

PR Summary

High Risk
Touches consensus-critical FlatKV apply/replay, snapshot export/import, and AppHash/SS restore paths; mitigated by targeted unit tests and a new 4-validator integration scenario.

Overview
Fixes post-migration flatkv_only state-sync and adds CI/docker coverage so a cluster can boot in terminal FlatKV-only mode and recover a wiped validator from peers.

Storage / snapshot path: Non-delete changesets with empty values are normalized via nonNilValue so protobuf/WAL replay no longer treats them as deletions and diverges LtHash/AppHash. The FlatKV snapshot exporter now emits the flatkv module header itself; the composite exporter only concatenates streams (so flatkv_only bare exports still register the module on restore and State Store is populated for RPC/historical queries). Go tests cover AppHash parity, SS query parity, and empty-value WAL replay.

Integration: New GIGA_FLATKV_ONLY env threads through Makefile, docker-compose, and localnode/RPC init scripts (sc-write-mode = flatkv_only, mutually exclusive with migrate/dual-write overrides). CI adds a FlatKV Only State Sync matrix job running verify_flatkv_only_statesync.sh (layout checks, kill/wipe one validator, state-sync, FlatKV digest match, EVM fixture reads).

Reviewed by Cursor Bugbot for commit c84cf4e.

@github-actions

github-actions Bot commented Jun 3, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 4, 2026, 9:37 PM |

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a8ff4db.

Comment thread docker/localnode/scripts/step4_config_override.sh

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8ff4db93e

Comment thread .github/workflows/integration-test.yml
@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.54%. Comparing base (400a907) to head (c84cf4e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3545      +/-   ##
==========================================
- Coverage   59.17%   58.54%   -0.64%     
==========================================
  Files        2219     2161      -58     
  Lines      183185   177213    -5972     
==========================================
- Hits       108395   103742    -4653     
+ Misses      65029    64134     -895     
+ Partials     9761     9337     -424     

| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 65.15% <ø> (?) |
| sei-db | 70.41% <ø> (ø) |
| sei-db-state-db | ? |
| sei-db-state-db-pr | 73.55% <100.00%> (?) |

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-db/state_db/sc/composite/exporter.go | 73.33% <100.00%> (ø) |
| sei-db/state_db/sc/composite/store.go | 67.66% <ø> (ø) |
| sei-db/state_db/sc/flatkv/exporter.go | 81.08% <100.00%> (+1.66%) ⬆️ |
| sei-db/state_db/sc/flatkv/store_apply.go | 91.90% <100.00%> (+0.15%) ⬆️ |

... and 98 files with indirect coverage changes


@blindchaser blindchaser changed the title fix(flatkv): implment flatkv_only state-sync fix(flatkv): implement flatkv_only state-sync Jun 3, 2026
@blindchaser blindchaser changed the title fix(flatkv): implement flatkv_only state-sync fix(flatkv): implement flatkv_only mode state-sync int testings Jun 4, 2026
… SS import)

Two distinct bugs broke post-migration flatkv_only state-sync, surfaced by a
new end-to-end 4-validator state-sync integration test:

1. Empty-value WAL replay: a proto.KVPair{Value: []byte{}, Delete: false} is a
   legitimate "set empty value" write, but it round-trips through the WAL as
   Value=nil and ApplyChangeSets treated nil as a deletion, dropping the key on
   replay/restore. This diverged the FlatKV LtHash (evm_lattice) and broke
   AppHash verification during state sync. Fixed by normalizing non-delete
   values via nonNilValue in classifyAndPrefix.

2. State Store empty after flatkv_only restore: the flatkv_only snapshot stream
   omitted the keys.FlatKVStoreKey module header, so the restore-side SS
   importer never ran convertFlatKVNodes and the State Store came up empty
   (every EVM-RPC/historical query returned nil) even though SC/AppHash were
   healthy. Fixed by making the FlatKV exporter self-describing: KVExporter now
   emits its own keys.FlatKVStoreKey header ahead of its nodes, mirroring the
   memiavl MultiTreeExporter. The composite exporter no longer injects the
   header (dropped flatkvHeaderPending and the cosmos-transition injection), and
   CompositeCommitStore.Exporter returns the bare flatkv exporter in
   flatkv_only. The header is now correct whether the stream is consumed bare or
   appended after the cosmos modules.
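
The "self-describing exporter" idea in item 2 can be illustrated with a small sketch. Everything here is assumed for illustration: `kvExporter`, `node`, `errDone`, `drain`, and the `"flatkv"` value of `flatKVStoreKey` are hypothetical and do not reproduce the real KVExporter API or the real keys.FlatKVStoreKey value. The only point shown is that the exporter emits its module header exactly once, before its first node, regardless of who consumes the stream.

```go
package main

import (
	"errors"
	"fmt"
)

// flatKVStoreKey is a placeholder for keys.FlatKVStoreKey; the real value
// is not shown in this PR text.
const flatKVStoreKey = "flatkv"

// errDone signals the end of the stream in this sketch.
var errDone = errors.New("export done")

// node is a hypothetical snapshot node.
type node struct{ Key, Value []byte }

// kvExporter streams its own module header first, then its nodes.
type kvExporter struct {
	nodes      []node
	next       int
	headerSent bool
}

// Next returns a string (the module header) on the first call, then one
// node per call, then errDone.
func (e *kvExporter) Next() (interface{}, error) {
	if !e.headerSent {
		e.headerSent = true
		return flatKVStoreKey, nil
	}
	if e.next >= len(e.nodes) {
		return nil, errDone
	}
	n := e.nodes[e.next]
	e.next++
	return n, nil
}

// drain reads the whole stream and returns a readable trace of it.
func drain(e *kvExporter) []string {
	var out []string
	for {
		item, err := e.Next()
		if err != nil {
			return out
		}
		switch v := item.(type) {
		case string:
			out = append(out, "module:"+v)
		case node:
			out = append(out, "node:"+string(v.Key))
		}
	}
}

func main() {
	e := &kvExporter{nodes: []node{{Key: []byte("a")}, {Key: []byte("b")}}}
	fmt.Println(drain(e)) // [module:flatkv node:a node:b]
}
```

Because the header travels with the exporter itself, a wrapper that only concatenates streams does not need to know whether a cosmos phase preceded it.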

Adds end-to-end coverage (verify_flatkv_only_statesync.sh + CI matrix entry,
docker/Makefile/app.toml plumbing for GIGA_FLATKV_ONLY) and deterministic Go
regression tests (TestEmptyValueSurvivesWALReplay,
TestFlatKVOnlySnapshotRestorePopulatesSS).

Co-authored-by: Cursor <cursoragent@cursor.com>
@blindchaser blindchaser force-pushed the yiren/flaktv-migrated-state-sync-testing branch from a8ff4db to 0e18f23 Compare June 4, 2026 17:22
@blindchaser blindchaser requested a review from cody-littley June 4, 2026 20:27