fix: improve WAL file sequence number management#27305

Open: Marukome0743 wants to merge 3 commits into influxdata:main from Marukome0743:Marukome0743-patch-1



Marukome0743 commented Mar 26, 2026

Closes #26970

Problem

After an unclean shutdown (e.g. power outage), empty (0-byte) WAL files can be left on disk.
On restart, PR #26556 correctly skips these corrupt files during replay with a warning.
However, the WAL file sequence number is not advanced past the skipped files.

When the first new write arrives and the WAL buffer is flushed, the system attempts to persist a WAL file using the same sequence number as one of the skipped corrupt files.
Because the corrupt file still exists on the object store, put_opts with PutMode::Create returns an AlreadyExists error, which triggers an immediate shutdown.
If the service is configured to auto-restart, this creates an infinite crash loop until the corrupt files are manually removed.

Reproduction sequence (from the issue logs):

  1. Unclean shutdown leaves 0-byte WAL files: 00000001784.wal through 00000001787.wal
  2. On restart, new_without_replay() sets wal_file_sequence_number to last_wal_sequence_number + 1 (= 1784), based solely on the snapshot metadata
  3. replay() skips all four corrupt files (WalFileTooSmall) but does not update the sequence number
  4. First flush attempts to write 00000001784.wal → AlreadyExists → shutdown
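The collision in the steps above can be sketched with a toy model. This is not the influxdb3_wal code: `ToyStore`, `PutError`, `put_create`, and `next_seq_from_snapshot` are hypothetical names, and the map-backed store only imitates the create-only behavior described for PutMode::Create.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an object store keyed by WAL sequence number.
#[derive(Default)]
pub struct ToyStore {
    files: BTreeMap<u64, Vec<u8>>,
}

/// Hypothetical error type for the toy store.
#[derive(Debug, PartialEq)]
pub enum PutError {
    AlreadyExists(u64),
}

impl ToyStore {
    /// Create-only put: refuses to overwrite an existing file,
    /// even if that file is empty (0 bytes).
    pub fn put_create(&mut self, seq: u64, bytes: Vec<u8>) -> Result<(), PutError> {
        if self.files.contains_key(&seq) {
            return Err(PutError::AlreadyExists(seq));
        }
        self.files.insert(seq, bytes);
        Ok(())
    }
}

/// Pre-fix behavior as described above: the next sequence number is
/// derived from snapshot metadata only, ignoring files on disk.
pub fn next_seq_from_snapshot(last_wal_sequence_number: u64) -> u64 {
    last_wal_sequence_number + 1
}
```

With empty files 1784 through 1787 already in the store and a snapshot ending at 1783, the first write targets 1784 and is rejected, while a write to 1788 would succeed.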

Fix

This change introduces two layers of defense, both in influxdb3_wal/src/object_store.rs:

1. new_without_replay() — initialization-time guard

The initial wal_file_sequence_number now takes the maximum of:

  • last_wal_sequence_number + 1 (from snapshot metadata, the existing behavior)
  • newest_wal_file_on_disk + 1 (derived from the sorted list of WAL file paths)

This ensures the starting sequence number is always greater than any file that exists on the object store, regardless of whether those files are valid or corrupt.

A new helper function newest_wal_file_num() (mirroring the existing oldest_wal_file_num()) is added to extract the highest sequence number from the sorted WAL file path list.
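A minimal sketch of the initialization-time guard follows. It assumes paths are plain strings ending in `<digits>.wal` and that the list is already sorted ascending. The real code in object_store.rs may use different types and signatures, and `wal_file_num` and `initial_wal_file_sequence_number` are hypothetical names introduced for illustration.

```rust
/// Parse the sequence number from a path whose file name is `<digits>.wal`.
/// (Hypothetical helper; string paths are an assumption of this sketch.)
pub fn wal_file_num(path: &str) -> Option<u64> {
    let name = path.rsplit('/').next()?;
    let stem = name.strip_suffix(".wal")?;
    stem.parse::<u64>().ok()
}

/// Highest sequence number in an ascending-sorted path list,
/// or `None` if the list is empty or the last path does not parse.
pub fn newest_wal_file_num(sorted_paths: &[&str]) -> Option<u64> {
    sorted_paths.last().and_then(|p| wal_file_num(p))
}

/// Starting sequence number: the larger of "one past the snapshot"
/// and "one past the newest file on disk".
pub fn initial_wal_file_sequence_number(
    last_wal_sequence_number: u64,
    sorted_paths: &[&str],
) -> u64 {
    let from_snapshot = last_wal_sequence_number + 1;
    // With no files on disk, fall back to 0 so the snapshot value wins.
    let from_disk = newest_wal_file_num(sorted_paths).map_or(0, |n| n + 1);
    from_snapshot.max(from_disk)
}
```

Taking the maximum rather than always trusting the disk listing keeps the existing behavior when the snapshot metadata is ahead of the files present.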

2. replay() — replay-time guard

When a corrupt WAL file is skipped during replay, the code now parses the sequence number from the file path and advances wal_file_sequence_number if the skipped file's number is ≥ the current value.
This handles edge cases where corrupt files are interleaved with valid files during replay.
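The replay-time guard can be sketched in the same spirit. The names `ReplayRead`, `advance_past_skipped`, and `replay_sequence` are hypothetical, and the sketch models only the skip path; whatever replay() does to the sequence number for valid files is deliberately left out.

```rust
/// Hypothetical outcome of reading one WAL file during replay.
pub enum ReplayRead {
    Valid,
    Corrupt,
}

/// If the skipped file's number is >= the current sequence number,
/// move the sequence number to one past the skipped file.
pub fn advance_past_skipped(current: u64, skipped_file_num: u64) -> u64 {
    if skipped_file_num >= current {
        skipped_file_num + 1
    } else {
        current
    }
}

/// Walk files in order and apply the guard to each corrupt one.
/// Valid files are ignored here (an assumption of this sketch).
pub fn replay_sequence(start: u64, files: &[(u64, ReplayRead)]) -> u64 {
    let mut seq = start;
    for (num, read) in files {
        if let ReplayRead::Corrupt = read {
            seq = advance_past_skipped(seq, *num);
        }
    }
    seq
}
```

Because the guard only ever raises the sequence number, a corrupt file numbered below the current value leaves it unchanged.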

Result

Using the issue's example (corrupt files 1784–1787):

  • Before: sequence number stays at 1784 → AlreadyExists → shutdown
  • After: new_without_replay() sets sequence number to 1788 at initialization;
    replay() would also advance it to 1788 as each corrupt file is skipped.
    First flush writes 00000001788.wal with no conflict.

Tests

Three new tests are added in influxdb3_wal/src/object_store/tests.rs:

test_newest_wal_file_num
Unit test for the new newest_wal_file_num() helper function. Verifies:

  • Returns None for an empty path list
  • Returns the correct sequence number for a single-element list
  • Returns the highest (last) sequence number from a sorted multi-element list

test_new_without_replay_advances_past_disk_files
Reproduces the core scenario from issue #26970.
Sets last_wal_sequence_number to 1783 while providing all_wal_file_paths containing files 1784–1787 (simulating corrupt files left on disk after an unclean shutdown).
Asserts that new_without_replay() initializes wal_file_sequence_number to 1788, not 1784.

test_replay_advances_sequence_number_past_corrupt_files
End-to-end test that writes actual empty (0-byte) files to an in-memory object store, then runs replay() with fail_on_error=false.
Verifies that after all four corrupt files are skipped, wal_file_sequence_number is correctly advanced to 9 (one past the highest corrupt file number 8).

  • I've read the contributing section of the project README.
  • Signed CLA (if not already signed).

Enhance WAL file sequence number handling to consider both the last known sequence and the newest WAL file on disk, ensuring uniqueness even with corrupt files present.

Development

Successfully merging this pull request may close these issues.

[v3] influxdb3 core fails to start with corrupt catalog and corrupt wal after unclean shutdown (power outage)
