Skip to content

harden VBA dir-stream parsing + bound decompression; pre-size ReadChain#12

Open
eilandert wants to merge 1 commit into
Velocidex:masterfrom
eilandert:hardening-and-presize-readchain
Open

harden VBA dir-stream parsing + bound decompression; pre-size ReadChain#12
eilandert wants to merge 1 commit into
Velocidex:masterfrom
eilandert:hardening-and-presize-readchain

Conversation

@eilandert

Copy link
Copy Markdown

Hi! We incorporated your oleparse repo into https://github.com/eilandert/mailstrix (https://mailstrix.com and https://deb.myguard.nl/articles/yara-malware-scanning-mailstrix/)

Before we can use 3rd party we have to audit it and give back.

We fixed the following items:

Hardening

  • ReadSector/ReadMiniSector/ReadFat/ReadMiniFat/GetStream: do the sector/offset arithmetic in uint64 and bound it before indexing, so a crafted sector or directory index can no longer overflow an int and index out of range.
  • Header: reject an unexpected MiniSectorShift, and reject a SectorCount over MAX_SECTORS (was only documented as a limit, never enforced).
  • Directory stream: reject a stream whose length is not a multiple of 128 (a partial trailing entry) instead of parsing a short final slot. Pass the real directory SID (slot index) to NewDirectory so the SID is correct even when earlier slots were unallocated.
  • DecompressStream: bound output at MAX_DECOMPRESSED (32 MiB). MS-OVBA copy tokens can repeatedly expand the 4096-byte window, so a small crafted stream could amplify to tens of GiB and OOM the process. Also fix an off-by-one in the copy-token bounds check (index == len was treated as in-range).
  • ExtractMacros: cap the VBA module count at MAX_MODULES (4096).
  • ParseVBAProject: replace the raw 'i += size' cursor advances with a bounds-checked skipBytes()/hasBytes() helper at every record, so a truncated dir stream returns an error instead of slicing past the end.

Performance

  • _ReadChain: a first pass walks the FAT chain with ReadFat only (same cycle detection, no sector copy) to count sectors, then allocates the result once instead of growing from zero capacity. This was ~64% of allocations on the extract path in our profiles.
  • FindStreamByName: O(1) lookup via a name index built once in NewOLEFile, instead of a linear scan per call.
  • GetStream: bounded copy-on-read cache for small (<=4096 byte) streams that callers read repeatedly; at most 32 entries, returns copies so caller mutation cannot poison later reads.

Defensive hardening and two allocation cuts, found while fuzzing this
parser inside a malware-scanning pipeline. No public API change; the
existing test suite and both fuzz targets stay green.

Hardening
- ReadSector/ReadMiniSector/ReadFat/ReadMiniFat/GetStream: do the
  sector/offset arithmetic in uint64 and bound it before indexing, so a
  crafted sector or directory index can no longer overflow an int and
  index out of range.
- Header: reject an unexpected MiniSectorShift, and reject a SectorCount
  over MAX_SECTORS (was only documented as a limit, never enforced).
- Directory stream: reject a stream whose length is not a multiple of
  128 (a partial trailing entry) instead of parsing a short final slot.
  Pass the real directory SID (slot index) to NewDirectory so the SID is
  correct even when earlier slots were unallocated.
- DecompressStream: bound output at MAX_DECOMPRESSED (32 MiB). MS-OVBA
  copy tokens can repeatedly expand the 4096-byte window, so a small
  crafted stream could amplify to tens of GiB and OOM the process. Also
  fix an off-by-one in the copy-token bounds check (index == len was
  treated as in-range).
- ExtractMacros: cap the VBA module count at MAX_MODULES (4096).
- ParseVBAProject: replace the raw 'i += size' cursor advances with a
  bounds-checked skipBytes()/hasBytes() helper at every record, so a
  truncated dir stream returns an error instead of slicing past the end.

Performance
- _ReadChain: a first pass walks the FAT chain with ReadFat only (same
  cycle detection, no sector copy) to count sectors, then allocates the
  result once instead of growing from zero capacity. This was ~64% of
  allocations on the extract path in our profiles.
- FindStreamByName: O(1) lookup via a name index built once in
  NewOLEFile, instead of a linear scan per call.
- GetStream: bounded copy-on-read cache for small (<=4096 byte) streams
  that callers read repeatedly; at most 32 entries, returns copies so
  caller mutation cannot poison later reads.
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants