Skip to content

feat(storage): full object checksum: implement rolling checksum and verification in reads resumption strategy#17262

Open
chandra-siri wants to merge 2 commits into
feat/stream-checksum-metadatafrom
feat/full-object-rolling-checksum
Open

feat(storage): full object checksum: implement rolling checksum and verification in reads resumption strategy#17262
chandra-siri wants to merge 2 commits into
feat/stream-checksum-metadatafrom
feat/full-object-rolling-checksum

Conversation

@chandra-siri
Copy link
Copy Markdown
Contributor

PR 2 of 3: Upgrade _DownloadState to support rolling checksums, and verify full-object crc32c at completion in _ReadResumptionStrategy.

@chandra-siri chandra-siri requested a review from a team as a code owner May 26, 2026 20:54
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for full-object rolling checksum verification during resumed reads, allowing verification once the stream completes. It also adds a flag to conditionally disable checksum validation. The review feedback suggests optimizing performance by bypassing rolling checksum updates when checksums are disabled, and recommends adding a unit test to verify that full-object checksum mismatches are ignored when checksum validation is disabled.

Comment on lines +146 to +147
if read_state.rolling_checksum is not None:
read_state.rolling_checksum.update(data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If checksum_enabled is False, we should avoid updating the rolling checksum to prevent unnecessary CPU overhead, especially when downloading large objects where checksums have been explicitly disabled for performance.

Suggested change
if read_state.rolling_checksum is not None:
read_state.rolling_checksum.update(data)
if checksum_enabled and read_state.rolling_checksum is not None:
read_state.rolling_checksum.update(data)

Comment on lines +408 to +415
def test_update_state_checksum_mismatch_ignored_when_disabled(self):
"""Test that a CRC32C mismatch is ignored when enable_checksum is False."""
self._add_download(_READ_ID)
self.state["enable_checksum"] = False
response = self._create_response(b"data", _READ_ID, offset=0, crc=999999)

# Should NOT raise DataCorruption!
self.strategy.update_state_from_response(response, self.state)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure comprehensive test coverage, we should also verify that the full-object checksum verification is bypassed when enable_checksum is set to False.

    def test_update_state_checksum_mismatch_ignored_when_disabled(self):
        """Test that a CRC32C mismatch is ignored when enable_checksum is False."""
        self._add_download(_READ_ID)
        self.state["enable_checksum"] = False
        response = self._create_response(b"data", _READ_ID, offset=0, crc=999999)

        # Should NOT raise DataCorruption!
        self.strategy.update_state_from_response(response, self.state)

    def test_update_state_full_object_checksum_ignored_when_disabled(self):
        """Test that full object checksum mismatch is ignored when enable_checksum is False."""
        self._add_download(_READ_ID, offset=0, length=9, is_full_object_read=True)
        self.state["enable_checksum"] = False
        self.state["full_obj_server_crc32c"] = 111111  # Wrong server checksum!

        resp1 = self._create_response(b"test", _READ_ID, offset=0)
        self.strategy.update_state_from_response(resp1, self.state)

        resp2 = self._create_response(b"data1", _READ_ID, offset=4, range_end=True)
        # Should NOT raise DataCorruption!
        self.strategy.update_state_from_response(resp2, self.state)

@chandra-siri chandra-siri changed the title feat(storage): implement rolling checksum and verification in reads resumption strategy feat(storage): full object checksum: implement rolling checksum and verification in reads resumption strategy May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant