harden VBA dir-stream parsing + bound decompression; pre-size ReadChain by eilandert · Pull Request #12 · Velocidex/oleparse

eilandert · 2026-06-28T23:21:43Z

Hi! We incorporated your oleparse repo into https://github.com/eilandert/mailstrix (https://mailstrix.com and https://deb.myguard.nl/articles/yara-malware-scanning-mailstrix/)

Before we can use 3rd party we have to audit it and give back.

We fixed the following items:

Hardening

ReadSector/ReadMiniSector/ReadFat/ReadMiniFat/GetStream: do the sector/offset arithmetic in uint64 and bound it before indexing, so a crafted sector or directory index can no longer overflow an int and index out of range.
Header: reject an unexpected MiniSectorShift, and reject a SectorCount over MAX_SECTORS (was only documented as a limit, never enforced).
Directory stream: reject a stream whose length is not a multiple of 128 (a partial trailing entry) instead of parsing a short final slot. Pass the real directory SID (slot index) to NewDirectory so the SID is correct even when earlier slots were unallocated.
DecompressStream: bound output at MAX_DECOMPRESSED (32 MiB). MS-OVBA copy tokens can repeatedly expand the 4096-byte window, so a small crafted stream could amplify to tens of GiB and OOM the process. Also fix an off-by-one in the copy-token bounds check (index == len was treated as in-range).
ExtractMacros: cap the VBA module count at MAX_MODULES (4096).
ParseVBAProject: replace the raw 'i += size' cursor advances with a bounds-checked skipBytes()/hasBytes() helper at every record, so a truncated dir stream returns an error instead of slicing past the end.

Performance

_ReadChain: a first pass walks the FAT chain with ReadFat only (same cycle detection, no sector copy) to count sectors, then allocates the result once instead of growing from zero capacity. This was ~64% of allocations on the extract path in our profiles.
FindStreamByName: O(1) lookup via a name index built once in NewOLEFile, instead of a linear scan per call.
GetStream: bounded copy-on-read cache for small (<=4096 byte) streams that callers read repeatedly; at most 32 entries, returns copies so caller mutation cannot poison later reads.

Defensive hardening and two allocation cuts, found while fuzzing this parser inside a malware-scanning pipeline. No public API change; the existing test suite and both fuzz targets stay green. Hardening - ReadSector/ReadMiniSector/ReadFat/ReadMiniFat/GetStream: do the sector/offset arithmetic in uint64 and bound it before indexing, so a crafted sector or directory index can no longer overflow an int and index out of range. - Header: reject an unexpected MiniSectorShift, and reject a SectorCount over MAX_SECTORS (was only documented as a limit, never enforced). - Directory stream: reject a stream whose length is not a multiple of 128 (a partial trailing entry) instead of parsing a short final slot. Pass the real directory SID (slot index) to NewDirectory so the SID is correct even when earlier slots were unallocated. - DecompressStream: bound output at MAX_DECOMPRESSED (32 MiB). MS-OVBA copy tokens can repeatedly expand the 4096-byte window, so a small crafted stream could amplify to tens of GiB and OOM the process. Also fix an off-by-one in the copy-token bounds check (index == len was treated as in-range). - ExtractMacros: cap the VBA module count at MAX_MODULES (4096). - ParseVBAProject: replace the raw 'i += size' cursor advances with a bounds-checked skipBytes()/hasBytes() helper at every record, so a truncated dir stream returns an error instead of slicing past the end. Performance - _ReadChain: a first pass walks the FAT chain with ReadFat only (same cycle detection, no sector copy) to count sectors, then allocates the result once instead of growing from zero capacity. This was ~64% of allocations on the extract path in our profiles. - FindStreamByName: O(1) lookup via a name index built once in NewOLEFile, instead of a linear scan per call. - GetStream: bounded copy-on-read cache for small (<=4096 byte) streams that callers read repeatedly; at most 32 entries, returns copies so caller mutation cannot poison later reads.

CLAassistant · 2026-06-28T23:21:52Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

harden VBA dir-stream parsing + bound decompression; pre-size ReadChain#12

harden VBA dir-stream parsing + bound decompression; pre-size ReadChain#12
eilandert wants to merge 1 commit into
Velocidex:masterfrom
eilandert:hardening-and-presize-readchain

eilandert commented Jun 28, 2026

Uh oh!

CLAassistant commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

eilandert commented Jun 28, 2026

Uh oh!

CLAassistant commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants