refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy#319

Draft
thodson-usgs wants to merge 1 commit into DOI-USGS:main from thodson-usgs:fix/error-taxonomy-followups

Collaborator

@thodson-usgs thodson-usgs commented Jun 3, 2026

What

A small, idiomatic exception taxonomy: every request failure raises a subclass of DataRetrievalError, so a single except dataretrieval.DataRetrievalError clause handles any of them. It adds only what the underlying httpx exceptions can't express.

DataRetrievalError(Exception)
├─ HTTPError                  # .status_code — the server returned an error status
│   └─ TransientError         # .retry_after — retryable (429 / 5xx)
│       ├─ RateLimited        #   429
│       └─ ServiceUnavailable #   5xx
├─ RequestTooLarge            # the request can't fit
│   ├─ URLTooLong             #   414 / client-side over-long URL
│   └─ Unchunkable            #   the Water Data chunker can't split the call
└─ NoSitesError               # a 200 response with no data

A single factory, dataretrieval.exceptions.error_for_status(status, message, *, retry_after), maps a status to its type, and every request path routes through it — the legacy query path (nwis/wqp/nldi), the Water Data chunker, and nadp/streamstats — so a given status surfaces as the same type everywhere.
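To make the shape of the hierarchy and the factory concrete, here is a minimal sketch. The class names and the `error_for_status(status, message, *, retry_after)` signature come from this description. Everything else is an assumption for illustration and may differ from the actual `dataretrieval.exceptions` module: the constructor arguments, the `None` default for `retry_after`, the factory returning (rather than raising) the instance, and 414 mapping to `URLTooLong`.

```python
from __future__ import annotations

from typing import Optional


class DataRetrievalError(Exception):
    """Root of the taxonomy."""


class HTTPError(DataRetrievalError):
    """The server returned an error status."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    """Retryable failure (429 / 5xx)."""

    def __init__(
        self, message: str, status_code: int, retry_after: Optional[float] = None
    ) -> None:
        super().__init__(message, status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    """429."""


class ServiceUnavailable(TransientError):
    """5xx."""


class RequestTooLarge(DataRetrievalError):
    """The request can't fit."""


class URLTooLong(RequestTooLarge):
    """414, or a client-side over-long URL."""


def error_for_status(
    status: int, message: str, *, retry_after: Optional[float] = None
) -> DataRetrievalError:
    """Map an HTTP status to an exception instance (assumed mapping)."""
    if status == 429:
        return RateLimited(message, status, retry_after)
    if 500 <= status < 600:
        return ServiceUnavailable(message, status, retry_after)
    if status == 414:
        return URLTooLong(message)
    # Any other error status: a generic HTTPError carrying the code.
    return HTTPError(message, status)
```

Because `TransientError` subclasses `HTTPError` in this layout, an `except TransientError` clause has to come before `except HTTPError` for the retry branch to be reached.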

  • Fatal 4xx (400/401/403/404/405/…) → a generic HTTPError carrying .status_code. Inspect the code (except HTTPError as e: ... e.status_code == 404) rather than catching a class per code.
  • Retryable (429 / 5xx) → RateLimited / ServiceUnavailable (both TransientError), carrying .retry_after. The Water Data chunker keys its auto-retry/resume on TransientError; the single-shot paths raise the typed error for the caller to handle.
  • Domain failures httpx can't express → Unchunkable (the chunker can't split the call), URLTooLong (a 414 or a client-side over-long URL), NoSitesError (a 200 with no data).
  • Connection-level failures (timeouts, DNS) surface as httpx exceptions on the single-shot paths.

import time

import dataretrieval

try:
    df, md = dataretrieval.waterdata.get_daily(...)
except dataretrieval.TransientError as e:
    time.sleep(e.retry_after or 1)  # 429 / 5xx: back off, then retry
    ...
except dataretrieval.HTTPError as e:
    ...  # any other error status; inspect e.status_code
except dataretrieval.DataRetrievalError:
    ...  # anything else (too-large, no-data, …)
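The retry branch above is left elided. One way a caller on a single-shot path might fill it in is a small bounded-retry wrapper. This is a hedged sketch, not part of the PR: `call_with_retry` and its parameters are hypothetical, and the `TransientError` class below is a stand-in that assumes only the `.retry_after` attribute described above (taken to be seconds, or `None`).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for dataretrieval.TransientError (assumed: has .retry_after)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def call_with_retry(
    fetch: Callable[[], T],
    *,
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on TransientError up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError as e:
            if attempt == max_attempts:
                raise  # out of attempts: let the typed error propagate
            # Honor the server's hint when present, else use the default delay.
            sleep(e.retry_after or default_delay)
    raise AssertionError("unreachable")
```

A caller could then write something like `call_with_retry(lambda: dataretrieval.waterdata.get_daily(...))`, swapping the stand-in class for the real `dataretrieval.TransientError`.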

Breaking changes

  • Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at DataRetrievalError(Exception) — they no longer also inherit ValueError / RuntimeError, so except ValueError / except RuntimeError no longer catch them. Catch DataRetrievalError (or a subclass).
  • A fatal 4xx is an HTTPError (read .status_code); there are no per-code exception types.
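For callers migrating existing handlers, the following sketch illustrates both breaking changes. It uses stand-in classes that mirror the hierarchy described above, and `fetch_site` and `describe_failure` are hypothetical helpers invented for the example, not library functions.

```python
class DataRetrievalError(Exception):
    """Stand-in root; note it does not inherit ValueError or RuntimeError."""


class HTTPError(DataRetrievalError):
    """Stand-in for the generic error-status exception."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def fetch_site(site_id: str) -> dict:
    """Hypothetical request that always fails with a 404."""
    raise HTTPError(f"site {site_id} not found", 404)


def describe_failure(site_id: str) -> str:
    try:
        fetch_site(site_id)
    except ValueError:
        # Pre-change handler: no longer reached for request failures.
        return "caught by the old ValueError handler"
    except HTTPError as e:
        # No per-code classes: branch on the status code instead.
        if e.status_code == 404:
            return "not found"
        return f"HTTP {e.status_code}"
    except DataRetrievalError:
        return "other dataretrieval failure"
    return "ok"
```

Under these assumptions the `except ValueError` branch is dead code for request failures, so the error falls through to the `HTTPError` handler.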

Verification

mypy --strict clean; ruff clean; full suite green (483 passed, 2 skipped); live spot-checks of the error paths; the Water Data chunker's ~40 resume tests pass unchanged (the resume path is untouched). Adds a dataretrieval.exceptions API reference page.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs changed the title from "fix(errors): correct DataRetrievalError taxonomy follow-ups from code review" to "fix(errors): unify HTTP status→exception across all request paths" Jun 3, 2026
Every request failure raises a subclass of DataRetrievalError, so a caller can
handle any of them with a single `except dataretrieval.DataRetrievalError`. The
taxonomy stays small -- it adds only what the underlying httpx exceptions can't
express:

  DataRetrievalError(Exception)
  |- HTTPError                   # .status_code -- the server returned an error status
  |   '- TransientError          # .retry_after -- retryable (429 / 5xx)
  |       |- RateLimited         #   429
  |       '- ServiceUnavailable  #   5xx
  |- RequestTooLarge             # the request can't fit
  |   |- URLTooLong              #   414 / client-side over-long URL
  |   '- Unchunkable             #   the Water Data chunker can't split the call
  '- NoSitesError                # a 200 response with no data

One factory -- error_for_status(status, message, *, retry_after) -- maps a
status to its type, and every request path routes through it (the legacy
`query` path, the Water Data chunker, nldi, nadp, streamstats), so a given
status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError
carrying .status_code (inspect the code rather than a class per code). The
chunker keys retry/resume on TransientError; connection-level failures
(timeouts, DNS) surface as httpx exceptions on the single-shot paths.

BREAKING CHANGES
- Request failures raise typed DataRetrievalError subclasses instead of bare
  ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at
  DataRetrievalError(Exception) and no longer also inherit ValueError /
  RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins.
- A fatal 4xx raises HTTPError (read .status_code); there are no per-code types.

Also adds a dataretrieval.exceptions API docs page.

mypy --strict clean; ruff clean; full suite green (483 passed, 2 skipped); the
Water Data chunker's resume tests pass unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the fix/error-taxonomy-followups branch from 3651f23 to 108b1ca June 3, 2026 16:20
@thodson-usgs thodson-usgs changed the title from "fix(errors): unify HTTP status→exception across all request paths" to "refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy" Jun 3, 2026