harden VBA dir-stream parsing + bound decompression; pre-size ReadChain #12
Open
eilandert wants to merge 1 commit into
Defensive hardening and two allocation cuts, found while fuzzing this parser inside a malware-scanning pipeline. No public API change; the existing test suite and both fuzz targets stay green.

Hardening

- ReadSector/ReadMiniSector/ReadFat/ReadMiniFat/GetStream: do the sector/offset arithmetic in uint64 and bound it before indexing, so a crafted sector or directory index can no longer overflow an int and index out of range.
- Header: reject an unexpected MiniSectorShift, and reject a SectorCount over MAX_SECTORS (was only documented as a limit, never enforced).
- Directory stream: reject a stream whose length is not a multiple of 128 (a partial trailing entry) instead of parsing a short final slot. Pass the real directory SID (slot index) to NewDirectory so the SID is correct even when earlier slots were unallocated.
- DecompressStream: bound output at MAX_DECOMPRESSED (32 MiB). MS-OVBA copy tokens can repeatedly expand the 4096-byte window, so a small crafted stream could amplify to tens of GiB and OOM the process. Also fix an off-by-one in the copy-token bounds check (index == len was treated as in-range).
- ExtractMacros: cap the VBA module count at MAX_MODULES (4096).
- ParseVBAProject: replace the raw 'i += size' cursor advances with a bounds-checked skipBytes()/hasBytes() helper at every record, so a truncated dir stream returns an error instead of slicing past the end.

Performance

- _ReadChain: a first pass walks the FAT chain with ReadFat only (same cycle detection, no sector copy) to count sectors, then allocates the result once instead of growing from zero capacity. This was ~64% of allocations on the extract path in our profiles.
- FindStreamByName: O(1) lookup via a name index built once in NewOLEFile, instead of a linear scan per call.
- GetStream: bounded copy-on-read cache for small (<=4096 byte) streams that callers read repeatedly; at most 32 entries, returns copies so caller mutation cannot poison later reads.
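For reviewers who want the shape of the ParseVBAProject change without opening the diff, here is a minimal sketch of a bounds-checked cursor. The names hasBytes and skipBytes come from the description above, but the `cursor` type, the method signatures, and the `errTruncated` error are assumptions made for illustration and may not match the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errTruncated is a hypothetical sentinel error used only in this sketch.
var errTruncated = errors.New("dir stream truncated")

// cursor is a hypothetical read position over a dir-stream buffer.
// Invariant: 0 <= pos <= len(data).
type cursor struct {
	data []byte
	pos  int
}

// hasBytes reports whether n more bytes are available at the cursor.
// The check is written as a subtraction so that pos+n can never overflow.
func (c *cursor) hasBytes(n int) bool {
	return n >= 0 && n <= len(c.data)-c.pos
}

// skipBytes advances by n bytes, or returns an error and leaves pos unchanged.
func (c *cursor) skipBytes(n int) error {
	if !c.hasBytes(n) {
		return errTruncated
	}
	c.pos += n
	return nil
}

func main() {
	c := &cursor{data: make([]byte, 8)}
	fmt.Println(c.skipBytes(6)) // fits: 6 of 8 bytes consumed
	fmt.Println(c.skipBytes(4)) // only 2 bytes remain, so this is an error
	fmt.Println(c.pos)          // position is still 6 after the failed skip
}
```

Writing the check as `n <= len(c.data)-c.pos` rather than `c.pos+n <= len(c.data)` keeps an attacker-controlled record size from wrapping the addition.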
Hi! We incorporated your oleparse repo into https://github.com/eilandert/mailstrix (https://mailstrix.com and https://deb.myguard.nl/articles/yara-malware-scanning-mailstrix/)
Before we can use third-party code, we have to audit it and give back.
We fixed the following items:
Hardening
Performance
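The count-then-allocate idea behind the _ReadChain change can be sketched roughly as follows. This is not the code from the PR: `chainLength`, `readChain`, the in-memory `sectors` slice, and the "chain longer than the FAT means a cycle" check are all assumptions chosen to keep the example self-contained. The library's own cycle detection may work differently.

```go
package main

import (
	"errors"
	"fmt"
)

// endOfChain is the ENDOFCHAIN marker defined by MS-CFB.
const endOfChain uint32 = 0xFFFFFFFE

// chainLength walks the FAT from start and counts sectors without copying
// any sector data. A chain with more links than FAT entries must loop.
func chainLength(fat []uint32, start uint32) (int, error) {
	n := 0
	for s := start; s != endOfChain; s = fat[s] {
		// Compare in uint64 so a huge sector index cannot wrap an int.
		if uint64(s) >= uint64(len(fat)) {
			return 0, errors.New("sector index out of range")
		}
		n++
		if n > len(fat) {
			return 0, errors.New("cycle in FAT chain")
		}
	}
	return n, nil
}

// readChain counts first, then allocates the result exactly once.
// Assumption for this sketch: sectors holds one data slice per FAT entry.
func readChain(fat []uint32, sectors [][]byte, start uint32, sectorSize int) ([]byte, error) {
	if len(sectors) != len(fat) {
		return nil, errors.New("sector table and FAT differ in length")
	}
	n, err := chainLength(fat, start)
	if err != nil {
		return nil, err
	}
	out := make([]byte, 0, n*sectorSize) // single allocation, no regrowth
	for s := start; s != endOfChain; s = fat[s] {
		out = append(out, sectors[s]...)
	}
	return out, nil
}

func main() {
	fat := []uint32{1, 2, endOfChain, 0}
	sectors := [][]byte{[]byte("ab"), []byte("cd"), []byte("ef"), []byte("gh")}
	out, err := readChain(fat, sectors, 0, 2)
	fmt.Println(string(out), cap(out), err)
}
```

The first pass touches only FAT entries, so it is cheap next to the copy, and the second pass can trust the chain because the first pass already validated every index.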