Skip to content

Tally 32-bit Overflow Fix#3960

Open
jtramm wants to merge 11 commits into
openmc-dev:developfrom
jtramm:tally_overflow_fix
Open

Tally 32-bit Overflow Fix#3960
jtramm wants to merge 11 commits into
openmc-dev:developfrom
jtramm:tally_overflow_fix

Conversation

@jtramm
Copy link
Copy Markdown
Contributor

@jtramm jtramm commented Jun 4, 2026

Fix 32-bit integer overflow in tally filter-bin indexing

Problem

A tally with more than 2³¹ (~2.15 billion) filter bins aborts during initialization:

terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()

(Tally::init_resultstensor::Tensor<double> constructor → std::vector(__n = 18446744068490538100).)

This is reachable from ordinary random-ray FW-CADIS runs. The weight-window flux tally
carries one bin per mesh element per energy group, so a 97,860,906-element mesh at 70
groups produces 6,850,263,420 filter bins. The same model at 4 groups (391M bins) runs
fine; the crash appears only once the bin count crosses 2³¹.

I hit this bug from a real world use case with the JET model, where I was creating 70 group weight windows on a (513, 638, 299) resolution mesh. To be clear: the error stems from the normal OpenMC tallying machinery on a mesh of that size with that many energy bins, the error is not coming from the random ray machinery. As such, the error could be hit from normal Monte Carlo usage as well if a large mesh + significant number of energy filter bins are used.

Root cause

Tally::n_filter_bins_ and strides_ are int32_t, and Tally::set_strides()
accumulates the product of the per-filter bin counts in a plain int. At 6.85e9 bins
this overflows and wraps to -1,739,671,172. Tally::init_results() then forms
static_cast<size_t>(n_filter_bins_) * n_scores * 3; the negative value sign-extends to
a huge size_t (× 3 = 18446744068490538100), which exceeds vector::max_size() and
throws.

The same 32-bit width is used at every step that later consumes the flat filter index,
so widening only the allocation would still mis-score above 2³¹ bins:

  • FilterBinIter::index_ — the accumulated flat filter index;
  • TallyTask::filter_idx — the random-ray per-element stored index;
  • the local filter_index accumulators and the filter_index parameters of
    score_general_ce_nonanalog / score_general_ce_analog / score_general_mg.

Separately, random ray's SourceRegionContainer::index() computes sr * negroups_ + g
in 64-bit (since sr is int64_t) but returned int, truncating the flux-array offset
past ~30.7M source regions at 70 groups.

The bin count overflows one layer lower too: StructuredMesh::n_bins() and
n_surface_bins() compute their counts in int (the latter is 4 * n_dim * n_bins, i.e.
~12x the element count in 3D), so a single very large mesh produces a negative count that
reaches init_results() as the same length_error. A surface-current tally hits this
first, near ~179M 3D cells.

Fix

Widen the bin count and the flat result index to int64_t along the whole chain:

  • Sizing/allocation: Tally::n_filter_bins_, strides_, their accessors, and the
    set_strides() accumulator.
  • Scoring index: FilterBinIter::index_; TallyTask::filter_idx (member +
    constructor); the CE/MG filter_index accumulators and score_general_* parameters;
    the random-ray flux-tally volume-normalization loop counter.
  • Random-ray flux index: SourceRegionContainer::index() return type.

tensor::Tensor already computes its size and offsets in size_t, so it is unchanged.
The result index is now 64-bit end-to-end on both the random-ray and the CE/MG scoring
paths.

Mesh bin-count guard. The widening above does not lift the limit on a single mesh:
Filter::n_bins_ and the per-event FilterMatch::bins_ remain int, so one mesh or filter
past 2³¹ bins is still unsupported. StructuredMesh::n_bins() / n_surface_bins() now
compute their counts in int64_t and raise a clear fatal_error (naming the mesh and
count) when the value exceeds int, instead of silently overflowing into the same
length_error. This makes the 32-bit boundary safe and explicit rather than lifting it.
(Unstructured MOAB/LibMesh counts are left as-is.)

Why a guard instead of widening everything?

We widened the indexing chain, so the obvious question is why not also widen
Filter::n_bins_ and FilterMatch::bins_ and support a single mesh or filter past 2³¹ bins
outright, instead of leaving a guarded limit.

Not because that case is unrealistic — a multi-billion-element structured mesh is perfectly
reasonable (e.g. ITER-scale shielding), the mesh itself is cheap to store, and the tally
memory (tens of GB) sits comfortably on a compute node. The reason is that lifting the limit
is a substantially larger, separable change than the two remaining type flips it looks like:

  • It is the entire mesh bin-index type, not two members. A mesh only ever emits a

    2³¹ index if the mesh layer itself computes one in 64-bit: get_bin(),
    get_bin_from_indices(), get_indices(), bins_crossed() / surface_bins_crossed(),
    and the internal for (int ... < n_bins()) loops — across RegularMesh,
    RectilinearMesh, CylindricalMesh, and SphericalMesh. Only then do Filter::n_bins_
    and the per-event FilterMatch::bins_ (vector<int>) matter.

  • It reaches into I/O and the public API. The statepoint HDF5 layout, the C API
    (openmc_mesh_*), and the Python bindings all assume 32-bit bin indices, so it carries
    format and compatibility surface, not just internal types.
  • It is on the hot path and should be measured. FilterMatch::bins_ is written and read
    on every scored event; widening it is a (likely small, but unmeasured) cost on every
    simulation, so it warrants a benchmark rather than an assumption.
  • It can only be validated at scale. Proving a multi-billion-bin tally scores correctly
    end-to-end needs a big-memory integration test, not the kilobyte unit test that covers the
    arithmetic in this PR.

So this PR fixes the crash and makes indexing correct up to the 2³¹-per-filter boundary, and
the guard ensures nothing past that boundary is left as a silent bug: an over-limit tally
now stops immediately with a clear message instead of corrupting indices or dying with a
cryptic length_error. Lifting the boundary outright — full 64-bit mesh and filter
indexing — can be done in the future if and when the need arrises. At that point, hopefully the performance CI can be used to reveal any performance regressions caused by that change.

Testing

The PR adds a Catch2 case to tests/cpp_unit_tests/test_tally.cpp (run with ctest;
the file is already in TEST_NAMES, so no CMakeLists.txt change is needed). It builds
a Tally with two energy filters of 50,000 bins each, whose bin counts multiply to
2.5e9 (above INT32_MAX), then calls Tally::set_strides() directly (it is public and
is what init_results() relies on) and asserts the count is the exact 64-bit product:

constexpr int64_t bins_per_filter = 50000; // two filters -> 2.5e9 > 2^31
...
REQUIRE(tally->n_filter_bins() == bins_per_filter * bins_per_filter);
REQUIRE(tally->n_filter_bins() > 2147483647);

This pins exactly the set_strides() / n_filter_bins_ arithmetic that overflowed,
runs in milliseconds on ~1 MB, and fails on the old int32_t code (the product wraps
negative). The large results_ allocation in init_results() is intentionally not
exercised; the defect is in the index arithmetic, not in the size of the results array.

Does the Fix Work?

The JET model has previously been running fine for weight window generation with random ray but I had been limiting my testing to 4 groups. I have been doing some parameter sensitivity studies and was trying to do 70 groups -- that's when I hit this issue. I can confirm that the fix does indeed work and I'm able to apply the 70 group energy mesh on that mesh now.

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 18) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

jtramm and others added 11 commits June 2, 2026 13:26
A combined mesh+energy tally can exceed 2^31 filter bins (e.g. a 97.8M-cell mesh times 70 energy groups = 6.85e9 bins). With int32_t storage the bin count overflowed to a negative value that init_results() then cast to size_t, requesting a ~1.8e19-element vector and throwing std::length_error before transport even started. Widen n_filter_bins_, the per-filter strides_, and their accessors to int64_t.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
set_strides() multiplied the per-filter bin counts into a plain int accumulator, so the running product overflowed before being stored into the now-int64_t n_filter_bins_. Promote the accumulator to int64_t so the product is formed in 64-bit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
index(sr, g) computes sr * negroups_ + g in 64-bit (sr is int64_t) but truncated the result back to int on return. With more than ~30.7M source regions at 70 groups the flattened element index exceeds 2^31, yielding a wrapped/negative offset and an out-of-bounds access into the correctly-sized flux arrays during the solve. Return int64_t.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FilterBinIter accumulates the flat results_ index across the tally filters in index_. For tallies with more than 2^31 filter bins this overflowed. Widen to int64_t so the random ray tally-task setup, which copies filter_iter.index_ into each TallyTask, records correct indices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three scoring helpers accumulated filter_index in a 32-bit int before indexing results_, which overflows for tallies exceeding 2^31 filter bins. Widen the locals to int64_t. Note: the continuous-energy and multigroup score_general_ce/score_general_mg paths still take filter_index as an int parameter; those are not exercised by the random ray solver and are left for a separate change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TallyTask stores the flat tally results_ index for each source-region/energy-group element. With more than 2^31 filter bins the int member truncated the now-64-bit filter index taken from FilterBinIter::index_, sending random ray scores to the wrong (or out-of-bounds) results_ bins. Widen the member and the constructor parameter to int64_t.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flux-tally volume-normalization loop in random_ray_tally iterated filter bins with an int counter compared against n_filter_bins(), which is now int64_t. For tallies with more than 2^31 bins the int counter cannot reach the end (it overflows and the comparison wraps), so the upper bins would never be volume-normalized. Use an int64_t loop variable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
score_general_ce_nonanalog, score_general_ce_analog, and score_general_mg took the flat results_ filter index as an int parameter. Their callers compute it from FilterBinIter::index_ (now int64_t), so the value was narrowed back to 32 bits at the call boundary and could index the wrong or out-of-bounds bin for a regular Monte Carlo tally exceeding 2^31 filter bins. This path is not used by the random ray solver; widen the parameter so large continuous-energy and multigroup MC tallies are correct too. Each function uses filter_index only for its final tally.results_(filter_index, ...) write, so widening the parameter is sufficient.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Catch2 case to the existing tests/cpp_unit_tests/test_tally.cpp (already in TEST_NAMES, so no CMakeLists change). It builds a tally with two ~50,000-bin energy filters whose bin-count product (2.5e9) exceeds INT32_MAX, calls set_strides(), and asserts n_filter_bins() equals the exact 64-bit product and is positive. This pins the arithmetic fixed on this branch; it runs in milliseconds on ~1 MB and does not allocate the results tensor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
StructuredMesh::n_bins() and n_surface_bins() formed their bin counts in 32-bit int and silently overflowed for very large meshes (n_bins near ~2.1B cells; n_surface_bins ~12x sooner, ~179M cells in 3D). The resulting negative count surfaced downstream as the same cryptic std::length_error this branch fixes, but originating from the mesh layer. Compute both counts in int64_t and raise a clear fatal_error naming the mesh and count when they exceed the 2^31 tally-indexing limit. This does not lift the limit (the per-event bin index and Filter::n_bins_ are still 32-bit) -- it converts a silent overflow into an actionable message. The unstructured (MOAB/LibMesh) counts are left as-is, since a >2^31-element unstructured mesh is not realistic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant