Skip to content

resample: reject out-of-range integer nodata sentinels#2671

Merged
brendancol merged 5 commits into
mainfrom
issue-2660
May 30, 2026
Merged

resample: reject out-of-range integer nodata sentinels#2671
brendancol merged 5 commits into
mainfrom
issue-2660

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2660.

What changed

  • _resolve_nodata in xrspatial/resample.py now checks that an integer nodata sentinel round-trips exactly into agg.dtype before trusting the cast. An out-of-range value (e.g. 999 for uint8, which wraps to 231) raises a ValueError instead of silently masking the wrong cells.
  • Updated the resample() docstring Raises section to cover the new case.

Note on boolean inputs

The original finding also flagged boolean inputs collapsing nonzero sentinels to True. resample() rejects boolean input upstream in _validate_raster ("must have a real numeric dtype"), so that path cannot reach _resolve_nodata through the public API. I did not add bool handling to _resolve_nodata since it would be dead code for the only caller.

Backend coverage

The validation lives in the shared _resolve_nodata helper, which runs before backend dispatch, so it applies to all backends. Tests parametrize numpy and dask+numpy; the boundary-value masking test confirms the dask path keeps data chunked and masks correctly.

Test plan

  • New TestNodataOutOfRange class: uint8-above-max, uint8-negative, int8-above-max, sentinel-via-_FillValue, and an in-range boundary (uint8 255) that must still mask.
  • Full test_resample.py passes (197 tests).
  • flake8 clean on changed lines.

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
An out-of-range integer nodata sentinel used to wrap silently on the
dtype cast in _resolve_nodata (e.g. 999 -> 231 for uint8), masking the
wrong cells. Validate that the sentinel round-trips exactly into the
integer dtype and raise a clear ValueError otherwise.

Boolean inputs are already rejected upstream by _validate_raster, so the
nonzero-collapse case in the finding cannot reach _resolve_nodata via the
public resample() API; no separate bool handling was added.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: resample: reject out-of-range integer nodata sentinels

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/resample.py:1236-1237: For a Python int sentinel larger than the dtype's C-long range (e.g. nodata=2**70 on an int64 raster), np.asarray(nodata).astype(agg.dtype) raises OverflowError ("Python int too large to convert to C long") before the round-trip check runs. The rest of the nodata validation raises ValueError, so this is an inconsistent error type for the same class of bad input. Wrap the cast so an out-of-range int surfaces as the same ValueError ("out of range for integer dtype") instead of a raw OverflowError.

Nits (optional improvements)

None.

What looks good

  • The check lives in _resolve_nodata, which runs before backend dispatch, so the validation covers numpy, dask, cupy, and dask+cupy without per-backend duplication.
  • The round-trip comparison (cast != int(nodata)) is the right test: it catches both unsigned wrap (-1 to 255) and signed overflow (200 to -56), and the boundary value (uint8 255) still passes.
  • Tests parametrize numpy and dask+numpy, cover the _FillValue attr path, and assert the in-range boundary case still masks correctly rather than only checking that bad input raises.
  • The PR correctly identifies that boolean input cannot reach this helper (rejected upstream by _validate_raster) and avoids adding dead code.

Note on the diff

The PR diff shows a change to .claude/sweep-test-coverage-state.csv. That is base divergence, not part of this change. The branch was cut before main updated that file. Merging main into the branch will clear it.

Checklist

  • Algorithm matches the intended fix: yes
  • All implemented backends produce consistent results: yes (validation is pre-dispatch)
  • NaN handling correct: yes (float NaN sentinel path unchanged)
  • Edge cases covered by tests: mostly; the beyond-int64 overflow case is not covered
  • Dask chunk boundaries handled correctly: yes (boundary masking test passes on dask)
  • No premature materialization: yes
  • Benchmark exists or not needed: not needed (validation-only change)
  • README feature matrix updated: not applicable (no new function)
  • Docstrings present and accurate: yes (Raises section updated)

… nodata (#2660)

Range-check the integer sentinel before the numpy cast so a Python int
past the C-long range surfaces as the same out-of-range ValueError as any
other unrepresentable sentinel, instead of a raw OverflowError.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 7097fb0)

The one suggestion from the first pass is resolved.

  • xrspatial/resample.py:1236-1247: The integer sentinel is now range-checked against np.iinfo(agg.dtype) before the numpy cast. A Python int beyond the C-long range (e.g. 2**70) now raises the same out-of-range ValueError as any other unrepresentable sentinel rather than a raw OverflowError. A regression test (test_sentinel_beyond_int64_raises_valueerror) covers it.

Status

  • Blockers: none.
  • Suggestions: 1 raised, 1 fixed.
  • Nits: none.

No new findings. test_resample.py passes (26 nodata tests, 198 total) and resample.py is flake8-clean.

One pre-existing note, not introduced here and out of scope for this PR: a np.float32 sentinel with a fractional part bypasses the Python-float fractional check at line 1228 and would truncate silently. np.float64 is covered because it subclasses Python float. Worth a separate look if float32 attr sentinels are a real input path.

@brendancol
Copy link
Copy Markdown
Contributor Author

Closing this as superseded by #2676, which already merged the same allow_rotated / allow_invalid_nodata VRT forwarding for #2672.

One thing this PR got right that the merged version did not: in the per-source (non-chunked) VRT read path, #2676 forwards allow_invalid_nodata=allow_rotated instead of allow_invalid_nodata=allow_invalid_nodata (xrspatial/geotiff/_vrt.py, around line 59). The chunked path is correct. I'm filing a focused bugfix with a regression test for that one line rather than re-merging this whole branch.

@brendancol
Copy link
Copy Markdown
Contributor Author

Reopening — my earlier close was a mistake. I confused this PR (resample nodata validation, branch issue-2660) with unrelated VRT forwarding work on a different branch, and the "bug from #2676" I referenced does not exist (_vrt.py forwards allow_invalid_nodata correctly on main). Apologies for the noise. This PR stands on its own and remains open.

@brendancol brendancol merged commit cd89281 into main May 30, 2026
7 checks passed
Melissari1997 pushed a commit to Melissari1997/xarray-spatial that referenced this pull request May 31, 2026
…b#2671)

* Repair sweep-test-coverage state CSV

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.

* resample: reject out-of-range integer nodata sentinels (xarray-contrib#2660)

An out-of-range integer nodata sentinel used to wrap silently on the
dtype cast in _resolve_nodata (e.g. 999 -> 231 for uint8), masking the
wrong cells. Validate that the sentinel round-trips exactly into the
integer dtype and raise a clear ValueError otherwise.

Boolean inputs are already rejected upstream by _validate_raster, so the
nonzero-collapse case in the finding cannot reach _resolve_nodata via the
public resample() API; no separate bool handling was added.

* Address review: raise ValueError (not OverflowError) for beyond-int64 nodata (xarray-contrib#2660)

Range-check the integer sentinel before the numpy cast so a Python int
past the C-long range surfaces as the same out-of-range ValueError as any
other unrepresentable sentinel, instead of a raw OverflowError.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

resample: integer/bool nodata sentinels silently wrap and mask valid data

1 participant