fix(arrow-avro): bound untrusted OCF block size and item counts#10237

Open
miniex wants to merge 4 commits into apache:main from miniex:fix/avro-untrusted-block-bounds

miniex commented Jun 29, 2026

Which issue does this PR close?

Rationale for this change

The Avro OCF reader trusts two length fields straight from the input, so a small crafted file can take down a process that parses untrusted Avro. A block size of i64::MAX reaches Vec::reserve before any payload is read and aborts the process on a huge allocation (#10234). A block count of i64::MAX spins the array/map item loop forever for a zero-byte item type like null, and i64::MIN overflows the negative-count negation (#10235).
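
The `i64::MIN` case follows from two's-complement asymmetry: `i64::MIN` has no positive counterpart in `i64`, so negating it overflows. In Avro, a negative block count means its absolute value is the item count. A minimal sketch of the safe conversion, assuming a hypothetical helper `block_item_count` that is not part of the PR:

```rust
/// Hypothetical helper (not from the PR): magnitude of an Avro block count,
/// which may be negative on the wire.
fn block_item_count(raw: i64) -> u64 {
    // `-raw` overflows for `i64::MIN` (panics in debug builds, wraps in release).
    // `unsigned_abs` is defined for every `i64` and returns the magnitude as `u64`.
    raw.unsigned_abs()
}

/// Shows why plain negation is unsafe: the checked form has no result for `i64::MIN`.
fn negation_is_representable(raw: i64) -> bool {
    raw.checked_neg().is_some()
}
```

With this helper, `block_item_count(i64::MIN)` is `1 << 63` as a `u64`, while `negation_is_representable(i64::MIN)` is `false`.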

What changes are included in this PR?

  • block.rs: reserve only what the current input buffer backs, and let the rest grow as data arrives.
  • record.rs: reject a block item count larger than the bytes left to decode, and take the negative-count magnitude with unsigned_abs.
  • cursor.rs: add AvroCursor::remaining(), used by that bound.
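
The `block.rs` item above could look roughly like the following. This is a sketch under assumptions, not the PR's code: `fill_block` and its parameters are hypothetical names, and the real decoder tracks more state.

```rust
/// Hypothetical sketch: grow `data` toward a declared block size without
/// trusting that size for the up-front allocation.
/// Returns the number of bytes consumed from `input`.
fn fill_block(data: &mut Vec<u8>, declared_size: usize, input: &[u8]) -> usize {
    // Bytes the block still needs according to the (untrusted) header.
    let missing = declared_size.saturating_sub(data.len());
    // Reserve and copy only what the current input buffer can back.
    let take = missing.min(input.len());
    data.reserve(take);
    data.extend_from_slice(&input[..take]);
    take
}
```

Called with a declared size of `usize::MAX` and a three-byte input, this reserves room for three bytes rather than attempting a huge allocation; later calls grow the buffer as more input arrives.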

A count past the remaining bytes can only describe items that are not there. Items that read input each need at least one byte, so only the zero-byte case is rejected and valid blocks keep working.
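
A rough sketch of the count check described here, under assumptions: `Cursor` is a hypothetical stand-in for `AvroCursor`, `check_item_count` is an invented name, and the error type is simplified to `String`. Note that a later commit in this PR replaces this check with a cap on the running item total.

```rust
/// Hypothetical stand-in for the reader's cursor; only `remaining()` matters here.
struct Cursor<'a> {
    buf: &'a [u8],
}

impl Cursor<'_> {
    /// Number of bytes left to decode.
    fn remaining(&self) -> usize {
        self.buf.len()
    }
}

/// Reject a block item count that exceeds the bytes left to decode.
/// Negative counts are legal in Avro, so the magnitude is taken first.
fn check_item_count(cursor: &Cursor<'_>, raw_count: i64) -> Result<u64, String> {
    let count = raw_count.unsigned_abs();
    let remaining = cursor.remaining() as u64;
    if count > remaining {
        return Err(format!(
            "block item count {count} exceeds {remaining} remaining bytes"
        ));
    }
    Ok(count)
}
```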

Are these changes tested?

Yes. The new tests hang or abort before this change and pass after. In reader::block, an i64::MAX block size stays bounded instead of aborting, a negative size errors, and a well-formed block still round-trips. In reader::record, i64::MAX and i64::MIN block counts on an array<null> now error instead of spinning the item loop. fmt --check and clippy are clean.

Are there any user-facing changes?

Malformed Avro OCF input that used to abort or hang now returns a clean AvroError. There are no public API changes.


I'm Korean, so sorry if any wording reads a little awkward.

github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels Jun 29, 2026
Jefffrey (Contributor) commented Jul 1, 2026

run benchmarks avro_reader decoder

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4851718045-766-gfd9k 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fix/avro-untrusted-block-bounds (5907f95) to 7616e10 (merge-base) diff
BENCH_NAME=avro_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench avro_reader
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4851718045-767-m6sf9 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing fix/avro-untrusted-block-bounds (5907f95) to 7616e10 (merge-base) diff
BENCH_NAME=decoder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench decoder
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              fix_avro-untrusted-block-bounds        main
-----                                              -------------------------------        ----
array_creation/string_array_1000_chars             1.00     28.0±0.11µs        ? ?/sec    1.03     28.8±0.20µs        ? ?/sec
array_creation/string_array_100_chars              1.00      7.2±0.12µs        ? ?/sec    1.04      7.5±0.16µs        ? ?/sec
array_creation/string_array_10_chars               1.02      5.3±0.06µs        ? ?/sec    1.00      5.2±0.05µs        ? ?/sec
array_creation/string_view_1000_chars              1.00     29.6±0.30µs        ? ?/sec    1.00     29.5±0.79µs        ? ?/sec
array_creation/string_view_100_chars               1.00      8.6±0.01µs        ? ?/sec    1.00      8.6±0.09µs        ? ?/sec
array_creation/string_view_10_chars                1.00      5.7±0.05µs        ? ?/sec    1.00      5.7±0.01µs        ? ?/sec
avro_reader/string_array_1000_chars                1.00    250.6±2.17µs        ? ?/sec    1.02    255.4±0.73µs        ? ?/sec
avro_reader/string_array_100_chars                 1.01     61.8±0.48µs        ? ?/sec    1.00     60.9±0.28µs        ? ?/sec
avro_reader/string_array_10_chars                  1.02     42.6±0.36µs        ? ?/sec    1.00     41.9±0.17µs        ? ?/sec
avro_reader/string_view_1000_chars                 1.00    239.2±0.99µs        ? ?/sec    1.03    247.1±0.78µs        ? ?/sec
avro_reader/string_view_100_chars                  1.02     62.9±0.46µs        ? ?/sec    1.00     62.0±0.24µs        ? ?/sec
avro_reader/string_view_10_chars                   1.01     42.7±0.39µs        ? ?/sec    1.00     42.4±0.19µs        ? ?/sec
string_operations/string_array_value_1000_chars    1.00     92.3±0.04ns        ? ?/sec    1.00     92.0±0.02ns        ? ?/sec
string_operations/string_array_value_100_chars     1.00     91.8±0.01ns        ? ?/sec    1.06     97.5±2.27ns        ? ?/sec
string_operations/string_array_value_10_chars      1.00     91.9±0.01ns        ? ?/sec    1.01     92.9±1.19ns        ? ?/sec
string_operations/string_view_value_1000_chars     1.00    760.2±1.93ns        ? ?/sec    1.00    759.5±1.10ns        ? ?/sec
string_operations/string_view_value_100_chars      1.00    761.5±1.15ns        ? ?/sec    1.00    760.2±1.15ns        ? ?/sec
string_operations/string_view_value_10_chars       1.10    833.4±1.48ns        ? ?/sec    1.00    759.7±1.71ns        ? ?/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      170.0s               170.0s
Peak memory    15.6 MiB             17.8 MiB
Avg memory     7.1 MiB              10.0 MiB
CPU user       152.6s               151.8s
CPU sys        12.1s                12.4s
Peak spill     0 B                  0 B


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                       fix_avro-untrusted-block-bounds        main
-----                       -------------------------------        ----
Array/100                   1.01      3.9±0.03µs   462.6 MB/sec    1.00      3.9±0.02µs   467.5 MB/sec
Array/10000                 1.02    146.7±0.62µs  1487.0 MB/sec    1.00    144.3±0.55µs  1511.7 MB/sec
Array/1000000               1.01      2.5±0.01ms     9.9 GB/sec    1.00      2.5±0.01ms    10.0 GB/sec
Binary(Bytes)/100           1.00  1444.8±16.70ns  1782.2 MB/sec    1.00  1444.9±20.21ns  1782.1 MB/sec
Binary(Bytes)/10000         1.00     47.8±0.40µs     5.3 GB/sec    1.00     48.1±0.49µs     5.2 GB/sec
Binary(Bytes)/1000000       1.00    767.8±1.44µs    32.8 GB/sec    1.02    780.8±4.59µs    32.2 GB/sec
Boolean/100                 1.01    885.7±4.93ns  1184.4 MB/sec    1.00    880.7±5.11ns  1191.1 MB/sec
Boolean/10000               1.00     30.3±0.20µs     3.4 GB/sec    1.00     30.4±0.16µs     3.4 GB/sec
Boolean/1000000             1.00    481.1±1.87µs    21.3 GB/sec    1.01    484.0±1.30µs    21.2 GB/sec
Date32/100                  1.00    953.2±6.36ns  1136.6 MB/sec    1.00    954.9±7.80ns  1134.5 MB/sec
Date32/10000                1.00     32.7±0.19µs     3.5 GB/sec    1.00     32.8±0.19µs     3.5 GB/sec
Date32/1000000              1.00    538.4±1.41µs    22.5 GB/sec    1.01    544.1±0.85µs    22.2 GB/sec
Decimal128/100              1.00      2.2±0.06µs   514.5 MB/sec    1.00      2.2±0.07µs   515.9 MB/sec
Decimal128/10000            1.00     72.8±0.17µs  1702.3 MB/sec    1.02     74.1±2.61µs  1670.5 MB/sec
Decimal128/1000000          1.01   1167.7±3.69µs    11.1 GB/sec    1.00   1156.8±1.82µs    11.2 GB/sec
Enum(Dictionary)/100        1.00  1438.3±19.80ns   729.4 MB/sec    1.00  1443.1±22.57ns   726.9 MB/sec
Enum(Dictionary)/10000      1.00     37.7±0.44µs     2.7 GB/sec    1.01     38.1±0.34µs     2.7 GB/sec
Enum(Dictionary)/1000000    1.00    597.6±4.20µs    17.1 GB/sec    1.02    610.0±5.56µs    16.8 GB/sec
FixedSizeBinary/100         1.00   983.4±12.73ns     2.5 GB/sec    1.00    979.2±6.61ns     2.5 GB/sec
FixedSizeBinary/10000       1.00     29.8±0.27µs     8.1 GB/sec    1.00     29.9±0.40µs     8.1 GB/sec
FixedSizeBinary/1000000     1.00    480.5±0.72µs    50.4 GB/sec    1.02    490.5±3.55µs    49.4 GB/sec
Float32/100                 1.01   820.0±13.60ns  1628.2 MB/sec    1.00   812.9±11.01ns  1642.5 MB/sec
Float32/10000               1.00     26.5±0.23µs     4.9 GB/sec    1.00     26.5±0.31µs     4.9 GB/sec
Float32/1000000             1.00    419.5±1.94µs    31.1 GB/sec    1.02    426.9±3.05µs    30.5 GB/sec
Float64/100                 1.00   822.6±10.09ns     2.0 GB/sec    1.00   821.3±12.05ns     2.0 GB/sec
Float64/10000               1.00     26.8±0.35µs     6.3 GB/sec    1.01     27.0±0.42µs     6.2 GB/sec
Float64/1000000             1.00    426.7±1.45µs    39.3 GB/sec    1.02    436.4±3.66µs    38.4 GB/sec
Int32/100                   1.01   955.2±12.78ns  1134.2 MB/sec    1.00    950.4±8.97ns  1139.9 MB/sec
Int32/10000                 1.00     32.8±0.21µs     3.5 GB/sec    1.00     32.9±0.28µs     3.4 GB/sec
Int32/1000000               1.00    538.9±1.47µs    22.5 GB/sec    1.01    543.0±1.39µs    22.3 GB/sec
Int32_Id/100                1.00    957.8±5.31ns   633.3 MB/sec    1.00    956.6±5.65ns   634.0 MB/sec
Int32_Id/10000              1.00     32.3±0.24µs     2.1 GB/sec    1.00     32.4±0.21µs     2.1 GB/sec
Int32_Id/1000000            1.00    542.8±1.75µs    13.7 GB/sec    1.01    546.1±1.96µs    13.6 GB/sec
Int64/100                   1.00    950.2±9.26ns  1140.1 MB/sec    1.00   950.0±11.54ns  1140.4 MB/sec
Int64/10000                 1.00     32.5±0.19µs     3.5 GB/sec    1.00     32.6±0.16µs     3.5 GB/sec
Int64/1000000               1.00    542.6±1.75µs    22.3 GB/sec    1.01    546.2±1.38µs    22.2 GB/sec
Interval/100                1.00  1215.0±11.74ns  1726.8 MB/sec    1.00   1212.7±8.22ns  1730.0 MB/sec
Interval/10000              1.00     37.3±0.18µs     5.5 GB/sec    1.01     37.7±0.19µs     5.4 GB/sec
Interval/1000000            1.00    600.5±0.81µs    34.1 GB/sec    1.01    603.7±1.19µs    33.9 GB/sec
Map/100                     1.00      8.5±0.06µs   375.7 MB/sec    1.00      8.5±0.06µs   376.6 MB/sec
Map/10000                   1.01    336.5±2.04µs   989.5 MB/sec    1.00    332.4±1.17µs  1001.6 MB/sec
Map/1000000                 1.01      5.5±0.02ms     6.2 GB/sec    1.00      5.4±0.02ms     6.3 GB/sec
Mixed/100                   1.00      3.3±0.06µs   835.1 MB/sec    1.00      3.3±0.08µs   833.1 MB/sec
Mixed/10000                 1.00    107.4±0.22µs     2.8 GB/sec    1.00    107.7±0.37µs     2.8 GB/sec
Mixed/1000000               1.00   1785.9±3.87µs    18.7 GB/sec    1.01   1807.2±5.05µs    18.5 GB/sec
Nested(Struct)/100          1.01      2.5±0.04µs   770.0 MB/sec    1.00      2.5±0.02µs   781.2 MB/sec
Nested(Struct)/10000        1.00     82.5±0.42µs     2.4 GB/sec    1.00     82.5±0.23µs     2.4 GB/sec
Nested(Struct)/1000000      1.00   1336.1±4.43µs    15.3 GB/sec    1.03   1381.9±2.60µs    14.8 GB/sec
String/100                  1.00  1481.9±14.95ns  1305.1 MB/sec    1.00  1482.3±15.58ns  1304.7 MB/sec
String/10000                1.00     52.4±0.63µs     3.7 GB/sec    1.01     52.8±0.39µs     3.7 GB/sec
String/1000000              1.00    839.8±1.51µs    24.0 GB/sec    1.01    851.4±5.89µs    23.7 GB/sec
StringView/100              1.00      2.5±0.02µs   778.2 MB/sec    1.00      2.5±0.02µs   776.5 MB/sec
StringView/10000            1.00     85.9±0.76µs     2.3 GB/sec    1.01     86.8±0.37µs     2.2 GB/sec
StringView/1000000          1.00   1373.6±5.19µs    14.7 GB/sec    1.01   1392.6±9.60µs    14.5 GB/sec
TimeMicros/100              1.01   994.0±15.56ns  1237.7 MB/sec    1.00   988.4±10.10ns  1244.7 MB/sec
TimeMicros/10000            1.00     34.7±0.22µs     3.7 GB/sec    1.01     35.0±0.23µs     3.7 GB/sec
TimeMicros/1000000          1.00    560.9±1.45µs    24.7 GB/sec    1.01    567.2±2.57µs    24.4 GB/sec
TimeMillis/100              1.00    962.7±6.51ns  1186.7 MB/sec    1.00    967.2±9.80ns  1181.2 MB/sec
TimeMillis/10000            1.00     33.7±0.12µs     3.6 GB/sec    1.00     33.8±0.18µs     3.6 GB/sec
TimeMillis/1000000          1.00    551.8±1.30µs    23.6 GB/sec    1.01    558.0±2.84µs    23.3 GB/sec
TimestampMicros/100         1.00  1181.2±12.24ns  1453.3 MB/sec    1.01  1189.7±16.45ns  1442.9 MB/sec
TimestampMicros/10000       1.00     39.1±0.33µs     4.3 GB/sec    1.00     39.2±0.26µs     4.3 GB/sec
TimestampMicros/1000000     1.00    624.8±2.32µs    26.8 GB/sec    1.02    636.6±5.87µs    26.3 GB/sec
TimestampMillis/100         1.00  1113.1±15.64ns  1370.9 MB/sec    1.00  1108.8±13.10ns  1376.1 MB/sec
TimestampMillis/10000       1.00     36.1±0.31µs     4.1 GB/sec    1.00     36.2±0.32µs     4.1 GB/sec
TimestampMillis/1000000     1.00    579.9±1.68µs    25.7 GB/sec    1.01    586.9±7.48µs    25.4 GB/sec
UUID/100                    1.00      3.5±0.04µs  1269.3 MB/sec    1.00      3.5±0.04µs  1264.8 MB/sec
UUID/10000                  1.00    134.0±1.80µs     3.3 GB/sec    1.01    135.6±1.28µs     3.2 GB/sec
UUID/1000000                1.00      2.1±0.03ms    20.4 GB/sec    1.01      2.2±0.02ms    20.2 GB/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      1820.4s              1825.4s
Peak memory    894.7 MiB            902.3 MiB
Avg memory     395.3 MiB            399.0 MiB
CPU user       0.0s                 1816.4s
CPU sys        0.0s                 2.7s
Peak spill     0 B                  0 B


Comment thread arrow-avro/src/reader/block.rs
Comment thread arrow-avro/src/reader/record.rs Outdated
miniex and others added 4 commits July 1, 2026 22:48
A crafted OCF block size of i64::MAX hit Vec::reserve before any payload
was read, aborting the process. Reserve only what the input backs.

Closes apache#10234

A block count of i64::MAX spun the item loop forever for zero-byte items
like null, and i64::MIN overflowed the negation. Reject counts above the
bytes remaining and negate with unsigned_abs.

Closes apache#10235

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

The old check rejected any block with more items than bytes remaining,
wrongly refusing valid `array<null>` (zero-byte items). Cap the item total
at `i32::MAX` instead; oversized counts still error before the loop.
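
The cap described in the last commit message can be sketched on its own as follows. This is an illustration under assumptions: `add_to_total` is a hypothetical name, and the error type is simplified to `String` rather than `AvroError`.

```rust
/// Hypothetical sketch of a capped running item total.
/// Returns the new total, or an error if it would overflow `usize`
/// or exceed `i32::MAX`.
fn add_to_total(total: usize, count: usize) -> Result<usize, String> {
    total
        .checked_add(count) // `None` on `usize` overflow
        .filter(|&t| t <= i32::MAX as usize) // `None` once past the cap
        .ok_or_else(|| format!("item total {total} + {count} exceeds i32::MAX"))
}
```

Because the check runs before any item is decoded, an oversized count fails immediately regardless of how many bytes each item consumes, so zero-byte items such as `null` cannot keep the loop running.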
miniex force-pushed the fix/avro-untrusted-block-bounds branch from 605a62c to 83f3909 July 1, 2026 13:48
miniex requested a review from Jefffrey July 1, 2026 13:49
Comment on lines +2317 to +2328
/// Decode `count` items, capping the running total at `i32::MAX` (the largest index
/// an Arrow list/map offset holds). Otherwise a crafted `i64::MAX` count of a zero-byte
/// item like `null` spins the loop forever (#10235); byte-consuming items self-terminate
/// on cursor exhaustion, so valid blocks (including `array<null>`) are unaffected.
#[inline]
fn process_block_items(
    buf: &mut AvroCursor,
    count: usize,
    total: usize,
    on_item: &mut impl FnMut(&mut AvroCursor) -> Result<(), AvroError>,
) -> Result<usize, AvroError> {
    let new_total = total.checked_add(count).filter(|&t| t <= i32::MAX as usize);

Contributor

I'm not sure about hard-limiting to i32::MAX either, considering we have LargeLists, which have i64 offsets. I'll refer back to my previous question: how do other Avro implementations deal with this issue?

3 participants