fix: improve WAL file sequence number management #27305
Open
Marukome0743 wants to merge 3 commits into
Enhance WAL file sequence number handling to consider both the last known sequence and the newest WAL file on disk, ensuring uniqueness even with corrupt files present.
Closes #26970
Problem
After an unclean shutdown (e.g. power outage), empty (0-byte) WAL files can be left on disk.
On restart, PR #26556 correctly skips these corrupt files during replay with a warning.
However, the WAL file sequence number is not advanced past the skipped files.
When the first new write arrives and the WAL buffer is flushed, the system attempts to persist a WAL file using the same sequence number as one of the skipped corrupt files.
Because the corrupt file still exists on the object store, `put_opts` with `PutMode::Create` returns an `AlreadyExists` error, which triggers an immediate shutdown.
If the service is configured to auto-restart, this creates an infinite crash loop until the corrupt files are manually removed.
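The collision can be illustrated in miniature. The sketch below is a toy model, not the real object store API: `ToyStore`, `put_create`, `PutError`, and `wal_file_name` are hypothetical names, and the 11-digit zero padding is inferred from the file names quoted in the issue.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for the object store's `AlreadyExists` error.
#[derive(Debug, PartialEq)]
pub enum PutError {
    AlreadyExists(String),
}

/// Toy in-memory store (an assumption for illustration, not the real object store).
#[derive(Default)]
pub struct ToyStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ToyStore {
    /// Create-only put: fails if the path is already taken, even by a 0-byte object.
    pub fn put_create(&mut self, path: &str, bytes: Vec<u8>) -> Result<(), PutError> {
        if self.objects.contains_key(path) {
            return Err(PutError::AlreadyExists(path.to_string()));
        }
        self.objects.insert(path.to_string(), bytes);
        Ok(())
    }
}

/// Zero-padded file name in the style of the names quoted in this PR (assumed width: 11).
pub fn wal_file_name(seq: u64) -> String {
    format!("{seq:011}.wal")
}
```

Seeding the toy store with empty objects for 1784 through 1787 and then calling `put_create` with `wal_file_name(1784)` returns `AlreadyExists`, while the same call with 1788 succeeds.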
Reproduction sequence (from the issue logs):
1. Empty (0-byte) WAL files `00000001784.wal` through `00000001787.wal` are left on disk.
2. `new_without_replay()` sets `wal_file_sequence_number` to `last_wal_sequence_number + 1` (= 1784), based solely on the snapshot metadata.
3. `replay()` skips all four corrupt files (`WalFileTooSmall`) but does not update the sequence number.
4. The first flush tries to write `00000001784.wal` → `AlreadyExists` → shutdown.
Fix
This change introduces two layers of defense, both in `influxdb3_wal/src/object_store.rs`:
1. `new_without_replay()`: initialization-time guard
The initial `wal_file_sequence_number` now takes the maximum of:
- `last_wal_sequence_number + 1` (from snapshot metadata, the existing behavior)
- `newest_wal_file_on_disk + 1` (derived from the sorted list of WAL file paths)
This ensures the starting sequence number is always greater than the sequence number of any file that exists on the object store, regardless of whether those files are valid or corrupt.
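As a rough illustration of the rule above, here is a minimal sketch. The function name `initial_wal_sequence` and its signature are assumptions made for this example; the actual code in `new_without_replay()` may be structured differently.

```rust
/// Hypothetical helper: choose the first sequence number for new WAL files.
///
/// `last_wal_sequence_number` comes from snapshot metadata;
/// `newest_wal_file_on_disk` is `None` when no WAL files exist on the object store.
pub fn initial_wal_sequence(
    last_wal_sequence_number: u64,
    newest_wal_file_on_disk: Option<u64>,
) -> u64 {
    // Existing behavior: one past the last sequence number known to the snapshot.
    let from_snapshot = last_wal_sequence_number + 1;
    match newest_wal_file_on_disk {
        // New behavior: also stay ahead of every file on disk, valid or corrupt.
        Some(newest) => from_snapshot.max(newest + 1),
        None => from_snapshot,
    }
}
```

With the numbers from the issue, `initial_wal_sequence(1783, Some(1787))` evaluates to `max(1784, 1788) = 1788`; with no files on disk it falls back to 1784.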
A new helper function `newest_wal_file_num()` (mirroring the existing `oldest_wal_file_num()`) is added to extract the highest sequence number from the sorted WAL file path list.
2. `replay()`: replay-time guard
When a corrupt WAL file is skipped during replay, the code now parses the sequence number from the file path and advances `wal_file_sequence_number` if the skipped file's number is ≥ the current value.
This handles edge cases where corrupt files are interleaved with valid files during replay.
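The replay-time guard can be sketched as follows. This is an illustration under assumptions, not the PR's code: `parse_wal_sequence` and `advance_past_skipped` are hypothetical names, and paths are modeled as plain strings ending in a zero-padded `<number>.wal` file name.

```rust
/// Hypothetical parser: "dir/00000001784.wal" -> Some(1784).
/// Returns `None` for names that do not look like WAL files.
pub fn parse_wal_sequence(path: &str) -> Option<u64> {
    let file_name = path.rsplit('/').next()?;
    let stem = file_name.strip_suffix(".wal")?;
    stem.parse::<u64>().ok()
}

/// After a corrupt file is skipped, return the sequence number to use next.
/// The value only moves forward, and only when the skipped file's number
/// is greater than or equal to the current value.
pub fn advance_past_skipped(next_sequence: u64, skipped_path: &str) -> u64 {
    match parse_wal_sequence(skipped_path) {
        Some(skipped) if skipped >= next_sequence => skipped + 1,
        _ => next_sequence,
    }
}
```

Folding `advance_past_skipped` over a run of skipped files leaves the counter one past the highest skipped number, and a skipped file with a lower number than the current value leaves it unchanged.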
Result
Using the issue's example (corrupt files 1784–1787):
- Before: the first flush collides with `00000001784.wal` → `AlreadyExists` → shutdown.
- After: `new_without_replay()` sets the sequence number to 1788 at initialization; `replay()` would also advance it to 1788 as each corrupt file is skipped. The first flush writes `00000001788.wal` with no conflict.
Tests
Three new tests are added in `influxdb3_wal/src/object_store/tests.rs`:
`test_newest_wal_file_num`
Unit test for the new `newest_wal_file_num()` helper function. Verifies:
- `None` for an empty path list
`test_new_without_replay_advances_past_disk_files`
Reproduces the core scenario from issue #26970.
Sets `last_wal_sequence_number` to 1783 while providing `all_wal_file_paths` containing files 1784–1787 (simulating corrupt files left on disk after an unclean shutdown).
Asserts that `new_without_replay()` initializes `wal_file_sequence_number` to 1788, not 1784.
`test_replay_advances_sequence_number_past_corrupt_files`
End-to-end test that writes actual empty (0-byte) files to an in-memory object store, then runs `replay()` with `fail_on_error=false`.
Verifies that after all four corrupt files are skipped, `wal_file_sequence_number` is correctly advanced to 9 (one past the highest corrupt file number 8).
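For readers without the diff at hand, the behavior exercised by `test_newest_wal_file_num` can be approximated with the stand-in below. It assumes paths are plain strings sorted in ascending order; the name `newest_wal_file_num_sketch` is hypothetical, and the real helper's parameter types are not shown in this description.

```rust
/// Hypothetical stand-in for the helper: read the sequence number from the
/// last entry of an ascending-sorted list of WAL file paths.
pub fn newest_wal_file_num_sketch(sorted_paths: &[String]) -> Option<u64> {
    // An empty list has no newest file.
    let last = sorted_paths.last()?;
    // Keep only the file name, then drop the ".wal" suffix and parse the digits.
    let file_name = last.rsplit('/').next()?;
    file_name.strip_suffix(".wal")?.parse::<u64>().ok()
}
```

Because the list is assumed to be sorted, only the last element needs to be inspected; an unsorted input would require scanning every path for the maximum instead.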