fix(#2871): Files.load size cap + FileBlobStore unique tmp + BlobStore.verify #92
Merged
ChatGPT audit (#2865) P1 — blob/file operational hardening. Three small,
independent surfaces, fixed together.
Files.load size cap
- New `maxBytes: Long = DEFAULT_MAX_BYTES` parameter on `Files.load`,
`loadOrNull`, `loadAll`, `loadAllOrSkip`. Default 20 MiB.
- Size check via `Files.size(path)` BEFORE bytes are read into memory, so
a 4 GiB upload fails fast without OOMing the JVM instead of being
slurped into a byte array first.
- New `OversizedFileException(path, sizeBytes, maxBytes)` names all
three values in the message so a misconfigured callsite is
debuggable.
- `loadAllOrSkip` deliberately does NOT swallow `OversizedFileException`:
it's a fail-fast safety check, not a soft predicate. Documented in
the KDoc, which suggests pre-filtering or a smaller `maxBytes`.
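
A minimal sketch of the size-before-read check described above. `loadCapped` is a hypothetical stand-in for the real `Files.load`, and the exception's superclass and message wording are assumptions; only the parameter names and the 20 MiB default come from this PR. Here `Files` refers to `java.nio.file.Files`.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Assumed value, matching the 20 MiB default stated in this PR.
const val DEFAULT_MAX_BYTES: Long = 20L * 1024 * 1024

// Hypothetical shape: the message names path, actual size, and cap.
class OversizedFileException(val path: Path, val sizeBytes: Long, val maxBytes: Long) :
    RuntimeException("File $path is $sizeBytes bytes, exceeding maxBytes=$maxBytes")

// Hypothetical stand-in for Files.load.
fun loadCapped(path: Path, maxBytes: Long = DEFAULT_MAX_BYTES): ByteArray {
    // Ask the filesystem for the size first; no file content is in memory yet.
    val sizeBytes = Files.size(path)
    if (sizeBytes > maxBytes) throw OversizedFileException(path, sizeBytes, maxBytes)
    return Files.readAllBytes(path)
}
```

Because the check reads only file metadata, rejecting an oversized file costs a single `stat`-style call regardless of how large the file is.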
FileBlobStore unique tmp filename
- Pre-#2871 the put path used `dir.resolve("$hash.tmp")`. Two threads
putting the SAME hash (rare but valid: same bytes computed
independently) would both write to the same tmp filename and race on
truncate/rename. The final atomic rename works for the WINNING
thread, but the loser could rename a partial file over the winner's
good copy.
- Now: `dir.resolve("$hash.${UUID.randomUUID()}.tmp")`. Per-attempt
unique tmp filename, no collision possible. Atomic rename still
works — target is keyed on `hash`, so the second rename is a
same-bytes overwrite.
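
The write-then-rename pattern above could look roughly like the following. This is a sketch under assumptions: `putBlob` is a hypothetical helper, SHA-256 hex is an assumed hash scheme, and the real `FileBlobStore.put` may differ in both.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption
import java.security.MessageDigest
import java.util.UUID

// Hypothetical helper, not the actual FileBlobStore.put.
fun putBlob(dir: Path, bytes: ByteArray): Path {
    // Assumed hash scheme: lowercase hex of SHA-256 over the content.
    val hash = MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }
    val target = dir.resolve(hash)
    // Each attempt gets a private tmp file, so concurrent writers never share one.
    val tmp = dir.resolve("$hash.${UUID.randomUUID()}.tmp")
    try {
        Files.write(tmp, bytes)
        // Publish only a fully written file under the content-addressed name.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
    } finally {
        // No-op after a successful move; cleans up if the write or move failed.
        Files.deleteIfExists(tmp)
    }
    return target
}
```

Note that `ATOMIC_MOVE` onto an existing target is implementation-specific per the `Files.move` documentation; on POSIX filesystems it replaces the target, which is the same-bytes overwrite described above.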
BlobStore.verify
- New default-method `BlobStore.verify(ref): Boolean`. Re-reads via
`get(ref)` and rehashes; returns `true` when stored bytes still
match the recorded hash, `false` on absence / corruption /
truncation.
- Default impl on the interface so existing `InMemoryBlobStore` and
`FileBlobStore` both work without changes. Backends with cheaper
checksum strategies can override.
- Not on the hot path of `get` — opt-in by the caller (audit-time
integrity scan, snapshot resume sanity check).
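
As an illustration of the default-method approach, here is a minimal sketch. `BlobRef`, `sha256Hex`, `MapBlobStore`, the nullable return of `get`, and the SHA-256 scheme are all assumptions made for the example; only `verify(ref): Boolean` and its re-read-and-rehash behavior come from this PR.

```kotlin
import java.security.MessageDigest
import java.util.concurrent.ConcurrentHashMap

// Hypothetical minimal ref type; the real one likely carries more fields.
data class BlobRef(val hash: String)

// Assumed hash scheme: lowercase hex of SHA-256.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes)
        .joinToString("") { "%02x".format(it) }

interface BlobStore {
    fun put(bytes: ByteArray): BlobRef
    fun get(ref: BlobRef): ByteArray? // assumed to return null when absent

    // Default implementation: re-read and rehash. Backends may override.
    fun verify(ref: BlobRef): Boolean {
        val bytes = get(ref) ?: return false
        return sha256Hex(bytes) == ref.hash
    }
}

// Toy backend that inherits verify unchanged; `blobs` is public only so the
// usage below can simulate corruption.
class MapBlobStore : BlobStore {
    val blobs = ConcurrentHashMap<String, ByteArray>()
    override fun put(bytes: ByteArray): BlobRef =
        BlobRef(sha256Hex(bytes)).also { blobs[it.hash] = bytes }
    override fun get(ref: BlobRef): ByteArray? = blobs[ref.hash]
}
```

With this shape, `store.verify(store.put(bytes))` is `true`, and it turns `false` if the stored bytes are later altered or the ref points at nothing.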
Regression coverage in `FilesBlobStoreHardeningTest` (5 cases):
- `Files.load` oversize → `OversizedFileException` with all three
values named in the diagnostic.
- `Files.load` under-cap success path (256-byte PNG).
- `Files.DEFAULT_MAX_BYTES` pinned at 20 MiB (changing it requires
test update).
- `BlobStore.verify` true / false / missing-blob contract.
- `FileBlobStore.put` 16 concurrent threads writing the same hash —
all succeed, no `.tmp` leftovers in the directory, all refs share
the deterministic hash, final blob verifies.
docs/multimodal.md `Files` reference updated with the `maxBytes`
default and the new `OversizedFileException` / `verify` semantics.
Full ./gradlew test + detekt green. detekt baseline regenerated for
the new test file's package-naming + max-line-length entries (same
shape as the other agents_engine.* test files).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>