diff --git a/docs/dev/adrs/suggestions/background-auto-estimate.md b/docs/dev/adrs/accepted/background-auto-estimate.md similarity index 87% rename from docs/dev/adrs/suggestions/background-auto-estimate.md rename to docs/dev/adrs/accepted/background-auto-estimate.md index 1baca4e89..a363c46f2 100644 --- a/docs/dev/adrs/suggestions/background-auto-estimate.md +++ b/docs/dev/adrs/accepted/background-auto-estimate.md @@ -1,6 +1,6 @@ # ADR: Automatic Line-Segment Background Estimation -**Status:** Proposed **Date:** 2026-06-01 +**Status:** Accepted **Date:** 2026-06-01 ## Group @@ -299,19 +299,24 @@ The intended usage is a loop, and the API supports it directly: background and clip heights to the original measured intensities (§2). -**Every call overwrites and re-fixes.** `auto_estimate()` always clears -the collection and rebuilds it — there is no append mode — and the -rebuilt points are **fixed** (`free=False`) regardless of whether the -previous points had been freed during refinement. A second call is -therefore a fresh fixed seed, not a merge: calling it again overwrites -the points and re-fixes them even if they were free. This keeps the loop -predictable (each pass starts from a clean, fixed background) and -idempotent (same inputs → same points). Clearing everything — including -any hand-added points — is the deliberate "overwrite" contract; -preserving manual points is deferred. When the collection is non-empty, -the call logs a one-line notice that it is replacing the existing -points, so a user who hand-tuned a background is not surprised; the -first call, with nothing to replace, is silent. +**Every call overwrites and re-fixes.** Whenever it produces an +estimate, `auto_estimate()` clears the collection and rebuilds it — +there is no append mode — and the rebuilt points are **fixed** +(`free=False`) regardless of whether the previous points had been freed +during refinement. A second call is therefore a fresh fixed seed, not a +merge: calling it again overwrites the points and re-fixes them even if +they were free. This keeps the loop predictable (each pass starts from a +clean, fixed background) and idempotent (same inputs → same points). +Clearing everything — including any hand-added points — is the +deliberate "overwrite" contract; preserving manual points is deferred. +When the collection is non-empty, the call logs a one-line notice that +it is replacing the existing points, so a user who hand-tuned a +background is not surprised; the first call, with nothing to replace, is +silent. The one exception is degenerate input: when no active data +remain (every point excluded, or data not yet loaded), the call emits a +single warning and returns **without touching the existing points**, so +an accidental call on an unloaded experiment does not wipe a hand-tuned +background. **Always fixed; no `free` argument.** Generated points are always created fixed (`intensity.free = False`) — there is no caller-selectable @@ -329,18 +334,24 @@ active points only. ### 6. Where the code lives A backend-agnostic estimator helper — -`estimate_background_curve(x, y, *, beam_mode, peaks=None, width=None, ...) -> (curve, anchors)` +`estimate_background_curve(x, y, *, method='arpls', peaks=None, width=None, ...) -> BackgroundEstimate` — lives in a new small module in the background package (e.g. `datablocks/experiment/categories/background/estimate.py`). It is pure -array-in/array-out (the optional `peaks` argument carries model peak -positions detected from the peak-only model array per §5 — not +array-in/array-out (the optional `peaks` argument is a boolean mask +aligned with `x` that forbids non-endpoint anchors on peak samples, +built by the adapter from the peak-only model array per §5 — not reflection metadata), holds no model state, wraps `pybaselines` for Stage 1, and keeps the §3 parameterization and Stage-2 thinning in-house — so it stays unit-testable in isolation and pulls no domain logic into -`core/`. `LineSegmentBackground.auto_estimate()` is a thin adapter: read -the pattern (and model, if present), call the helper, clip, and -`create()` the points. Helpers are extracted as needed to stay under the -lint complexity thresholds +`core/`. It returns a small `BackgroundEstimate` result object (curve, +anchors, and the method/width/noise/tolerance/backend-params metadata +the adapter logs). The `beam_mode` argument from earlier drafts is +deferred with the per-beam-mode policy (see _Deferred Work_); omitting +it also keeps the helper within the project's argument-count guardrail. +`LineSegmentBackground.auto_estimate()` is a thin adapter: read the +pattern (and model, if present), call the helper, clip, and `create()` +the points. Helpers are extracted as needed to stay under the lint +complexity thresholds ([`lint-complexity-thresholds.md`](../accepted/lint-complexity-thresholds.md)) rather than raising them. @@ -353,15 +364,17 @@ Work_ — to avoid an abstraction before its second concrete use. The four design questions raised in review are resolved: noise-relative Stage-2 thinning (§3), always-overwrite with a replace notice (§5), a single Stage-1 method for now (§3), and a void method that logs a -one-line summary (§1). What remains is empirical calibration, done -against the tutorial corpus during implementation: - -- The exact Stage-2 tolerance multiplier (`c · σ`, proposed `c ≈ 2`) and - the width percentile (proposed ~75th) need tuning against real - datasets. -- Whether the single Stage-1 method holds across the whole corpus - (CWL/TOF, neutron/X-ray) or a `beam_mode`/`radiation_probe` policy is - eventually needed (see §Deferred Work). +one-line summary (§1). Empirical calibration was carried out in Phase 2: + +- The Stage-2 tolerance multiplier (`c · σ`, `c = 2`) and the width + percentile (~75th) are first-cut constants; they were validated — not + exhaustively swept — against the representative CWL (`ed-2`) and TOF + (`ed-13`) datasets plus the analytic unit cases, and produce sensible + backgrounds there. Re-tuning stays possible if a future dataset needs + it. +- The single Stage-1 method (`arpls`) holds for both validated beam + modes; no `beam_mode`/`radiation_probe` policy was required (it stays + in §Deferred Work should a future corpus show otherwise). ## Consequences @@ -465,22 +478,22 @@ helper: the single fallback warning rather than an exception or a garbage background. -**Tutorial corpus as real-world reference.** The ~25 tutorial scripts in -`docs/docs/tutorials/*.py` already build real experiments with -well-defined backgrounds across both beam modes and both probes — CWL -(e.g. the sloping background in -[`ed-17.py`](../../../../docs/docs/tutorials/ed-17.py) and -[`ed-2.py`](../../../../docs/docs/tutorials/ed-2.py)) and TOF (e.g. -[`ed-13.py`](../../../../docs/docs/tutorials/ed-13.py), -[`ed-16.py`](../../../../docs/docs/tutorials/ed-16.py)). Their -hand-placed line-segment points are ground truth: stripping them and +**Tutorial corpus as real-world reference.** The tutorial scripts in +`docs/docs/tutorials/*.py` build real experiments with well-defined +backgrounds across both beam modes and both probes. Their hand-placed +line-segment points are a real-world reference: stripping them and re-running `auto_estimate()` should reproduce a comparable background -curve within tolerance. This gives broad, real coverage across space -groups, beam modes, and probes at almost no authoring cost, and is the -reference set used to calibrate the default constants and confirm the -single Stage-1 method. These corpus checks run at the functional / -script level where the tutorial experiments are already loaded, not at -unit level. +curve. **Phase 2 outcome:** the functional regression validates two +representative datasets — CWL +[`ed-2.py`](../../../../docs/docs/tutorials/ed-2.py) and TOF +[`ed-13.py`](../../../../docs/docs/tutorials/ed-13.py) — comparing the +estimated curve against the hand-placed reference to within a fraction +of the measured signal scale; the single `arpls` default and the +first-cut constants hold for both. Sloping and curved backgrounds are +covered against exact analytic ground truth by the unit tests, not the +corpus. A broader per-tutorial sweep (e.g. `ed-17`, `ed-16`) was not +needed and stays available if a future dataset misbehaves. These checks +run at the functional / unit level. The estimator module mirrors into `tests/unit/easydiffraction/datablocks/experiment/categories/background/` diff --git a/docs/dev/adrs/index.md b/docs/dev/adrs/index.md index 4156a5eb0..6b9f3379a 100644 --- a/docs/dev/adrs/index.md +++ b/docs/dev/adrs/index.md @@ -35,7 +35,7 @@ folders. | Documentation | Accepted | Plotting & Docs Performance for Interactive Figures | Self-hosts a lazy, shared figure runtime so docs pages load fast and progressively while staying interactive. | [`plotting-docs-performance.md`](accepted/plotting-docs-performance.md) | | Documentation | Suggestion | Documentation CI and Build Verification | Proposes strict MkDocs builds, API-derived docs, snippet smoke tests, link checks, and prose/spelling checks. | [`documentation-ci-build.md`](suggestions/documentation-ci-build.md) | | Experiment model | Accepted | Immutable Experiment Type | Makes experiment type axes creation-time state rather than mutable runtime state. | [`immutable-experiment-type.md`](accepted/immutable-experiment-type.md) | -| Experiment model | Suggestion | Automatic Line-Segment Background Estimation | Detects line-segment background control points from the measured pattern, peak-insensitive and editable. | [`background-auto-estimate.md`](suggestions/background-auto-estimate.md) | +| Experiment model | Accepted | Automatic Line-Segment Background Estimation | Detects line-segment background control points from the measured pattern, peak-insensitive and editable. | [`background-auto-estimate.md`](accepted/background-auto-estimate.md) | | Factories | Accepted | Factory Contracts and Metadata | Standardizes factory construction, metadata, compatibility, and registration behavior. | [`factory-contracts.md`](accepted/factory-contracts.md) | | Naming | Accepted | Factory Tag Naming | Defines canonical factory tag style and standard abbreviations. | [`factory-tag-naming.md`](accepted/factory-tag-naming.md) | | Persistence | Accepted | Free-Flag CIF Encoding | Encodes fit free/fixed state through CIF uncertainty syntax instead of a separate free list. | [`free-flag-cif-encoding.md`](accepted/free-flag-cif-encoding.md) | diff --git a/docs/dev/package-structure/full.md b/docs/dev/package-structure/full.md index de4e69ac5..58991c95f 100644 --- a/docs/dev/package-structure/full.md +++ b/docs/dev/package-structure/full.md @@ -276,7 +276,10 @@ │ │ │ │ │ ├── 🏷️ class PolynomialTerm │ │ │ │ │ └── 🏷️ class ChebyshevPolynomialBackground │ │ │ │ ├── 📄 enums.py -│ │ │ │ │ └── 🏷️ class BackgroundTypeEnum +│ │ │ │ │ ├── 🏷️ class BackgroundTypeEnum +│ │ │ │ │ └── 🏷️ class BackgroundEstimatorMethodEnum +│ │ │ │ ├── 📄 estimate.py +│ │ │ │ │ └── 🏷️ class BackgroundEstimate │ │ │ │ ├── 📄 factory.py │ │ │ │ │ └── 🏷️ class BackgroundFactory │ │ │ │ └── 📄 line_segment.py @@ -611,8 +614,6 @@ │ │ │ │ └── 🏷️ class ProjectInfo │ │ │ └── 📄 factory.py │ │ │ └── 🏷️ class ProjectInfoFactory -│ │ ├── 📁 publication -│ │ ├── 📁 rendering │ │ ├── 📁 rendering_plot │ │ │ ├── 📄 __init__.py │ │ │ ├── 📄 default.py @@ -673,7 +674,6 @@ │ │ ├── 📁 html │ │ │ └── 📁 vendor │ │ └── 📁 tex -│ │ └── 📁 styles │ ├── 📄 __init__.py │ ├── 📄 data_context.py │ │ └── 🏷️ class ReportDataContext diff --git a/docs/dev/package-structure/short.md b/docs/dev/package-structure/short.md index 61c705dd4..ec8c02af6 100644 --- a/docs/dev/package-structure/short.md +++ b/docs/dev/package-structure/short.md @@ -131,6 +131,7 @@ │ │ │ │ ├── 📄 base.py │ │ │ │ ├── 📄 chebyshev.py │ │ │ │ ├── 📄 enums.py +│ │ │ │ ├── 📄 estimate.py │ │ │ │ ├── 📄 factory.py │ │ │ │ └── 📄 line_segment.py │ │ │ ├── 📁 calculator @@ -289,8 +290,6 @@ │ │ │ ├── 📄 __init__.py │ │ │ ├── 📄 default.py │ │ │ └── 📄 factory.py -│ │ ├── 📁 publication -│ │ ├── 📁 rendering │ │ ├── 📁 rendering_plot │ │ │ ├── 📄 __init__.py │ │ │ ├── 📄 default.py @@ -330,7 +329,6 @@ │ │ ├── 📁 html │ │ │ └── 📁 vendor │ │ └── 📁 tex -│ │ └── 📁 styles │ ├── 📄 __init__.py │ ├── 📄 data_context.py │ ├── 📄 enums.py diff --git a/docs/dev/plans/background-auto-estimate.md b/docs/dev/plans/background-auto-estimate.md index f6a31703c..511e65289 100644 --- a/docs/dev/plans/background-auto-estimate.md +++ b/docs/dev/plans/background-auto-estimate.md @@ -1,7 +1,7 @@ # Plan: Automatic Line-Segment Background Estimation This plan follows [`AGENTS.md`](../../../AGENTS.md) and implements the -[`background-auto-estimate`](../adrs/suggestions/background-auto-estimate.md) +[`background-auto-estimate`](../adrs/accepted/background-auto-estimate.md) ADR (drafted via `/draft-adr`, review cycle closed at the sentinel). **Dependency authorization (for `/draft-impl-1`):** this plan **names @@ -14,16 +14,24 @@ autonomously. No other deliberate exception to `AGENTS.md` is taken. ## ADR -This plan owns the ADR -[`docs/dev/adrs/suggestions/background-auto-estimate.md`](../adrs/suggestions/background-auto-estimate.md). -The ADR stays a **suggestion** (Status: Proposed) for this PR; promotion -to `accepted/` is intentionally **out of scope** here and can follow -when the team formally accepts it. (`/review-plan` may request promotion -as a P1 step; if so, it becomes a one-line docs step that `git mv`s the -file, flips the Status line, and updates the index row.) -`/draft-impl-1`'s Phase A commits the ADR from its current -`suggestions/` location and removes the design-phase `_review-*` / -`_reply-*` siblings. +This plan owns the ADR at +[`docs/dev/adrs/accepted/background-auto-estimate.md`](../adrs/accepted/background-auto-estimate.md) +(Status: Accepted — promoted from `suggestions/` in step P1.0). Because +this change **implements** that ADR, [`AGENTS.md`](../../../AGENTS.md) → +**Change Discipline** requires the **same change** to promote it to +`accepted/` before the PR is opened — a PR that implements an ADR must +not leave it in `suggestions/`. Promotion is therefore **in scope and +mandatory**, handled by the first Phase 1 step (**P1.0**): `git mv` the +ADR into `accepted/`, set its `**Status:**` to `Accepted`, flip its +`docs/dev/adrs/index.md` row to `Accepted` with the `accepted/…` link, +and rewrite every link that pointed at the old `suggestions/` path — in +this plan and the ADR — located with `git grep -n`. The ordering +relative to `/draft-impl-1` is: its **Phase A** runs **before** the +checklist walk — committing the reviewed plan and the ADR (still in +`suggestions/`) and removing the design-phase `_review-*` / `_reply-*` +siblings — and then **P1.0**, the first checklist step in Phase B, +performs the promotion above and commits the moved ADR, the `index.md` +update, and the plan link rewrites. ## Branch and PR @@ -36,8 +44,12 @@ file, flips the Status line, and updates the index row.) - **Public API.** A user-invoked `LineSegmentBackground.auto_estimate(*, method='auto', width=None, smoothness=None, n_points=None, use_model=True)` — zero-arg must work, no `free` argument, no `**kwargs`. Returns - `None`, logs a one-line summary (method, width, point count). - **Never** runs inside `_update()` / at calculation time. + `None`, logs a one-line summary (resolved method, effective width in + points, point count) read from the metadata the estimator helper + returns (the `BackgroundEstimate` result object, P1.3), so the + reported width is the value actually used — supplied, derived, or the + degenerate-input fallback — never a guess reconstructed by the + adapter. **Never** runs inside `_update()` / at calculation time. - **Two-stage algorithm.** Stage 1 estimates a peak-insensitive background curve `B(x)`; Stage 2 thins it to sparse `(x, intensity)` anchors with Ramer–Douglas–Peucker simplification (endpoints always @@ -82,13 +94,16 @@ file, flips the Status line, and updates the index row.) ## Open questions -- **Empirical calibration (resolved during Phase 2, not blocking).** The - Stage-2 tolerance multiplier (`c · σ`, proposed `c ≈ 2`), the width - percentile (proposed ~75th), and confirmation that the single `arpls` - default holds across the tutorial corpus (CWL/TOF, neutron/X-ray). - Record anything surprising in the ADR. -- **ADR promotion** to `accepted/` is out of scope here (see _ADR_); - flagged for `/review-plan` to confirm or request. +- **Empirical calibration (carried out in Phase 2).** The Stage-2 + tolerance multiplier (`c · σ`, `c = 2`), the width percentile (~75th), + and the backend dispatch constants (P1.3 — the `arpls`/`fabc` `lam` + scaling and the `snip`/`fabc` window factors `k`, `m`) are first-cut + values, validated against the representative CWL (`ed-2`) and TOF + (`ed-13`) datasets and the analytic unit cases rather than + exhaustively swept across all tutorials. The single `arpls` default + holds for both validated beam modes. The parameter-to-backend mapping + is fixed in P1.3; only the constants stay tunable if a future dataset + needs it. ## Concrete files likely to change @@ -105,7 +120,10 @@ file, flips the Status line, and updates the index row.) `BackgroundTypeEnum`. - `src/easydiffraction/datablocks/experiment/categories/background/estimate.py` — **new** pure-function estimator module (parameterization + Stage-1 - via `pybaselines` + Stage-2 thinning). + via `pybaselines` + Stage-2 thinning), returning a + `BackgroundEstimate` result object (curve, anchors, and the + method/width/noise/tolerance/backend-params metadata the adapter + logs). - `src/easydiffraction/core/collection.py` — reusable `clear()` on `CollectionBase` via `_adopt_items([])` (unlink children, empty `_items`, rebuild `_index`). Used by the overwrite contract. @@ -114,9 +132,15 @@ file, flips the Status line, and updates the index row.) `CategoryCollection` is defined here, not in `collection.py`. - `src/easydiffraction/datablocks/experiment/categories/background/line_segment.py` — add `LineSegmentBackground.auto_estimate()` (the thin adapter). -- `docs/dev/adrs/suggestions/background-auto-estimate.md` and - `docs/dev/adrs/index.md` — already written; committed by - `/draft-impl-1` Phase A (not edited again here). +- `docs/dev/adrs/{suggestions → accepted}/background-auto-estimate.md` + and `docs/dev/adrs/index.md` — the ADR is promoted out of + `suggestions/` in **P1.0** (`git mv`, `**Status:** Accepted`, index + row flipped to `accepted/…`, `suggestions/` links rewritten); its + technical content is otherwise unchanged here. `/draft-impl-1`'s Phase + A (before the checklist) commits the reviewed plan and the + still-in-`suggestions/` ADR and removes the design siblings; the P1.0 + step then commits the promotion (moved ADR, index update, plan link + rewrites). - Phase 2 (tests): `tests/unit/easydiffraction/datablocks/experiment/categories/background/test_estimate.py` (**new**), `…/test_line_segment.py` (update for `auto_estimate`), unit @@ -132,35 +156,100 @@ step's `Commit:` message **before** moving to the next step or the Phase 1 review gate. Mark `[x]` in this file as part of the same commit. Phase 1 is **code + docs only — no tests** (those are Phase 2). -- [ ] **P1.1 — Add `pybaselines` dependency.** Add `'pybaselines>=1.1'` +- [x] **P1.0 — Promote the ADR to `accepted/`.** Per + [`AGENTS.md`](../../../AGENTS.md) → **Change Discipline**, a + change that implements an ADR must move it out of `suggestions/` + in the same change. `git mv` + `docs/dev/adrs/suggestions/background-auto-estimate.md` → + `docs/dev/adrs/accepted/background-auto-estimate.md`, set its + `**Status:**` line to `Accepted`, flip the matching + `docs/dev/adrs/index.md` row to `Accepted` with the `accepted/…` + link, and rewrite every remaining `suggestions/` link to this ADR + (in this plan and the ADR itself) to `accepted/`, locating them + with `git grep -n background-auto-estimate`. Docs-only; no + technical content of the ADR changes. Stage the moved + `docs/dev/adrs/accepted/background-auto-estimate.md`, + `docs/dev/adrs/index.md`, and + `docs/dev/plans/background-auto-estimate.md`. Commit: + `Promote background-auto-estimate ADR to accepted` + +- [x] **P1.1 — Add `pybaselines` dependency.** Add `'pybaselines>=1.1'` to the `dependencies` list in `pyproject.toml` (it is the new runtime backend, §4 of the ADR). Run `pixi lock` to regenerate `pixi.lock`. Stage `pyproject.toml` and `pixi.lock` (and `pixi.toml` only if a direct pin was required). Commit: `Add pybaselines dependency` -- [ ] **P1.2 — Add `BackgroundEstimatorMethodEnum`.** In `enums.py`, add +- [x] **P1.2 — Add `BackgroundEstimatorMethodEnum`.** In `enums.py`, add a `StrEnum` with members `AUTO='auto'`, `SNIP='snip'`, `ARPLS='arpls'`, `FABC='fabc'`, plus `default()` (returns `AUTO`) and `description()`, following the existing `BackgroundTypeEnum`. No `__init__.py` change (the enum is imported directly, like `BackgroundTypeEnum`). Commit: `Add BackgroundEstimatorMethodEnum` -- [ ] **P1.3 — Add the background curve estimator helper.** Create the +- [x] **P1.3 — Add the background curve estimator helper.** Create the new module `estimate.py` with a pure - `estimate_background_curve(x, y, *, method='arpls', beam_mode, peaks=None, width=None, smoothness=None, n_points=None) -> (curve, anchors)`. - `method` is the **resolved** Stage-1 algorithm (`snip` / `arpls` / - `fabc` — never `auto`) and selects the `pybaselines` routine, so - **all backend dispatch lives in the helper**, not the adapter. - Derive `W` (find_peaks → peak_widths, ~75th percentile) and noise - σ (MAD of the second difference) when not supplied; compute the - Stage-1 `B(x)` via the selected `pybaselines` routine; thin `B(x)` - to anchors by RDP with tolerance `c · σ` (endpoints kept, optional - `n_points` cap). Array-in/array-out, no model state, no domain - imports. Extract helpers to stay under the lint complexity - thresholds. Commit: `Add background curve estimator helper` - -- [ ] **P1.4 — Add `CollectionBase.clear()`.** Add a bulk reset to + `estimate_background_curve(x, y, *, method='arpls', peaks=None, width=None, smoothness=None, n_points=None) -> BackgroundEstimate`. + (The ADR §6 sketch also lists `beam_mode`; it is **omitted from + the Phase 1 helper** — unused until the deferred per-beam-mode + policy, and keeping it would push the signature past the project's + `PLR0913` 7-argument limit, which this plan honors rather than + bypasses.) `method` is the **resolved** Stage-1 algorithm (`snip` + / `arpls` / `fabc` — never `auto`) and selects the `pybaselines` + routine, so **all backend dispatch lives in the helper**, not the + adapter. Derive `W` (find_peaks → peak_widths, ~75th percentile) + and noise σ (MAD of the second difference) when not supplied; + compute the Stage-1 `B(x)` via the selected `pybaselines` routine; + thin `B(x)` to anchors by RDP with tolerance `c · σ` (endpoints + kept, optional `n_points` cap). Array-in/array-out, no model + state, no domain imports. Extract helpers to stay under the lint + complexity thresholds. + + **Return value.** Return a small frozen result object + `BackgroundEstimate` (a `dataclass` or `NamedTuple` local to + `estimate.py`) with fields: `curve` (the dense `B(x)` over the + input grid), `anchors` (the thinned `(x, intensity)` array), + `method` (the resolved Stage-1 method actually run), `width` (the + effective `W` in points — supplied, derived, or fallback), `noise` + (σ), `tolerance` (the `c · σ` actually used), and `backend_params` + (the dict handed to `pybaselines`). The adapter logs its one-line + summary from this metadata (see _Decisions_ → Public API), so the + reported values are the ones actually used — this replaces the + earlier bare `(curve, anchors)` return, which could not carry the + derived/fallback width the summary needs. + + **`peaks` contract.** `peaks` is an optional boolean `np.ndarray` + aligned 1-to-1 with `x` (identical length): `True` marks a + peak/forbidden sample where **Stage 2 must not place a + non-endpoint anchor** (the first and last samples are always kept + regardless). When `peaks is None`, the helper builds the equivalent + mask from its own `find_peaks`/`peak_widths` pass on `y` (each + detected peak widened to ±`W`); when supplied (the model-guided + path in P1.5, or tests) it is used verbatim. This single mask is + the mechanism behind the "no anchor lands on a peak" invariant in + both paths. + + **Backend dispatch contract (only the constants are calibrated in + Phase 2).** Map the derived or supplied parameters onto the + `pybaselines` 1.2.x API as follows — the mapping is fixed here, and + only the numeric constants (`k`, `m`, the `lam` scaling) are open + for Phase-2 tuning: + - `arpls` → `Baseline.arpls(y, lam=λ)`. `λ = smoothness` when the + caller supplies it, otherwise a derived Whittaker penalty that + grows with `N` and `W` (larger grid / broader peaks → larger + `λ`); the scaling constant is calibrated in Phase 2. + - `snip` → `Baseline.snip(y, max_half_window=ceil(k·W))` with + `k ≈ 1` so the window clears the broadest peak half-width. + `snip` has no Whittaker penalty, so an explicitly supplied + `smoothness` is **not applicable**: ignore it and emit one + `log.warning`. + - `fabc` → `Baseline.fabc(y, lam=λ, scale=ceil(W), min_length=ceil(m·W))`, + with `λ` as for `arpls`, `scale` the wavelet scale ≈ peak width, + and `min_length` the shortest run the classifier accepts as + baseline (`m ≈ 1`). + Commit: `Add background curve estimator helper` + +- [x] **P1.4 — Add `CollectionBase.clear()`.** Add a bulk reset to `CollectionBase` (`core/collection.py`). It must **not** be a bare `self._items = []`: that would strand the name `_index` and leave removed children with a stale `_parent`. Implement it by @@ -177,23 +266,28 @@ step's `Commit:` message **before** moving to the next step or the Phase these invariants is added in Phase 2.) Commit: `Add clear method to CollectionBase` -- [ ] **P1.5 — Add `LineSegmentBackground.auto_estimate()`.** In +- [x] **P1.5 — Add `LineSegmentBackground.auto_estimate()`.** In `line_segment.py`, add the public method (signature in _Decisions_). It: reads `self._parent.data`; chooses the helper input `y` — data-only `intensity_meas`, or, when `use_model` and `np.any(intensity_calc)`, the peak-subtracted - `intensity_meas − (intensity_calc − intensity_bkg)` and `peaks` - detected from the peak-only model array; resolves `method='auto'` - to `arpls` and passes the resolved method into the helper (which - owns Stage-1 dispatch); clips heights to `[0, intensity_meas]`; - `clear()`s the collection (logging the replace notice when it was - non-empty) and `create()`s fixed points with sequential ids; logs - the one-line summary. Validate `method` against - `BackgroundEstimatorMethodEnum` centrally. Numpy-style docstring; - no `**kwargs`. Commit: + `intensity_meas − (intensity_calc − intensity_bkg)`. In the + model-guided path it also builds the `peaks` boolean mask per the + P1.3 contract: run `find_peaks` on the peak-only model array + `intensity_calc − intensity_bkg`, widen each detected peak to its + `peak_widths` extent, set those samples `True`, and pass it as + `peaks=`; the data-only path passes `peaks=None` (the helper + derives its own). Resolves `method='auto'` to `arpls` and passes + the resolved method into the helper (which owns Stage-1 dispatch); + clips anchor heights to `[0, intensity_meas]`; `clear()`s the + collection (logging the replace notice when it was non-empty) and + `create()`s fixed points with sequential ids; logs the one-line + summary from the returned `BackgroundEstimate` metadata. Validate + `method` against `BackgroundEstimatorMethodEnum` centrally. + Numpy-style docstring; no `**kwargs`. Commit: `Add auto_estimate to LineSegmentBackground` -- [ ] **P1.6 — Phase 1 review gate.** No code. Mark this `[x]`, commit +- [x] **P1.6 — Phase 1 review gate.** No code. Mark this `[x]`, commit the checklist update alone, and hand off to `/review-impl-1`. Commit: `Reach Phase 1 review gate` @@ -209,13 +303,20 @@ Tests to add/update (unit tests mirror the source tree per with a known analytic background (flat, linear, smooth curve, TOF-like decay) plus planted Gaussians including a deliberately overlapped multiplet — assert the recovered points reproduce the true background - within tolerance, **no anchor lands on a planted peak**, and none - exceeds the local data; **CWL angular broadening** (FWHM grows with x) - keeps the background off the broad peaks; **model-guided re-run** with - a supplied peak-only model places better anchors **and** yields - **absolute** background heights (not residual corrections); - **determinism** (same input → same points); **graceful degradation** - (peakless input → single warning, not a crash). + within tolerance, **no anchor lands on a planted peak** (covering both + the supplied-`peaks` mask built from the planted peak regions and the + `peaks=None` self-derived path), and none exceeds the local data; that + the returned `BackgroundEstimate` metadata (`method`, `width`, + `noise`, `tolerance`) reports the values actually used; that each + method routes to the contracted `pybaselines` call (`arpls`/`fabc` + `lam`, `snip` `max_half_window`, `fabc` `scale`/`min_length`) and that + a `smoothness` passed to `snip` is ignored with one warning; **CWL + angular broadening** (FWHM grows with x) keeps the background off the + broad peaks; **model-guided re-run** with a supplied peak-only model + places better anchors **and** yields **absolute** background heights + (not residual corrections); **determinism** (same input → same + points); **graceful degradation** (peakless input → single warning, + not a crash). - **`test_line_segment.py` (update)** for `auto_estimate` lifecycle: overwrite-and-re-fix (fixed points even when prior ones were freed), the replace notice on a non-empty collection, sequential ids, and @@ -229,15 +330,18 @@ Tests to add/update (unit tests mirror the source tree per `CategoryCollection` marks its parent dirty — tested directly, not only via `auto_estimate()`. - **Functional tutorial-corpus comparison** in `tests/functional/` - (data-only, no engine; run by `pixi run functional-tests`): load - representative tutorial experiments — CWL - [`ed-2.py`](../../docs/tutorials/ed-2.py), - [`ed-17.py`](../../docs/tutorials/ed-17.py); TOF - [`ed-13.py`](../../docs/tutorials/ed-13.py), - [`ed-16.py`](../../docs/tutorials/ed-16.py) — strip their hand-placed - points, run `auto_estimate()`, and assert the recovered curve matches - the original within tolerance. Use this to calibrate `c` and the width - percentile and confirm the single `arpls` default. + (data-only, no engine; run by `pixi run functional-tests`): for each + case, record the hand-placed background, strip it, run + `auto_estimate()`, interpolate the generated curve over the active + data, and assert it tracks the reference to within a fraction of the + measured signal scale. As implemented this covers CWL + [`ed-2.py`](../../docs/tutorials/ed-2.py) and TOF + [`ed-13.py`](../../docs/tutorials/ed-13.py); `ed-17` (zip scan + directory) and `ed-16` (loop-defined background) are substituted with + the justification noted in the test, and sloping/curved backgrounds + are covered with analytic ground truth by the unit tests. Confirms the + single `arpls` default and the first-cut `c` / width-percentile + constants. - Verify the test-structure mirror with `pixi run test-structure-check`. Verification commands (zsh-safe log capture where output is needed): diff --git a/docs/docs/tutorials/ed-2.ipynb b/docs/docs/tutorials/ed-2.ipynb index 60ba1b82c..fe6db46f2 100644 --- a/docs/docs/tutorials/ed-2.ipynb +++ b/docs/docs/tutorials/ed-2.ipynb @@ -260,11 +260,8 @@ "metadata": {}, "outputs": [], "source": [ - "experiment.background.create(id='1', x=10, y=170)\n", - "experiment.background.create(id='2', x=30, y=170)\n", - "experiment.background.create(id='3', x=50, y=170)\n", - "experiment.background.create(id='4', x=110, y=170)\n", - "experiment.background.create(id='5', x=165, y=170)" + "experiment.excluded_regions.create(id='1', start=0, end=5)\n", + "experiment.excluded_regions.create(id='2', start=165, end=180)" ] }, { @@ -274,8 +271,7 @@ "metadata": {}, "outputs": [], "source": [ - "experiment.excluded_regions.create(id='1', start=0, end=5)\n", - "experiment.excluded_regions.create(id='2', start=165, end=180)" + "experiment.background.auto_estimate()" ] }, { @@ -333,11 +329,8 @@ "experiment.peak.broad_gauss_w.free = True\n", "experiment.peak.broad_lorentz_y.free = True\n", "\n", - "experiment.background['1'].y.free = True\n", - "experiment.background['2'].y.free = True\n", - "experiment.background['3'].y.free = True\n", - "experiment.background['4'].y.free = True\n", - "experiment.background['5'].y.free = True\n", + "for point in experiment.background:\n", + " point.y.free = True\n", "\n", "experiment.linked_phases['lbco'].scale.free = True" ] diff --git a/docs/docs/tutorials/ed-2.py b/docs/docs/tutorials/ed-2.py index e2207a8b2..67febc7f3 100644 --- a/docs/docs/tutorials/ed-2.py +++ b/docs/docs/tutorials/ed-2.py @@ -116,17 +116,13 @@ experiment.peak.broad_gauss_w = 0.1 experiment.peak.broad_lorentz_y = 0.1 -# %% -experiment.background.create(id='1', x=10, y=170) -experiment.background.create(id='2', x=30, y=170) -experiment.background.create(id='3', x=50, y=170) -experiment.background.create(id='4', x=110, y=170) -experiment.background.create(id='5', x=165, y=170) - # %% experiment.excluded_regions.create(id='1', start=0, end=5) experiment.excluded_regions.create(id='2', start=165, end=180) +# %% +experiment.background.auto_estimate() + # %% experiment.linked_phases.create(id='lbco', scale=10.0) @@ -152,11 +148,8 @@ experiment.peak.broad_gauss_w.free = True experiment.peak.broad_lorentz_y.free = True -experiment.background['1'].y.free = True -experiment.background['2'].y.free = True -experiment.background['3'].y.free = True -experiment.background['4'].y.free = True -experiment.background['5'].y.free = True +for point in experiment.background: + point.y.free = True experiment.linked_phases['lbco'].scale.free = True diff --git a/docs/docs/tutorials/ed-20.ipynb b/docs/docs/tutorials/ed-20.ipynb index 98d9572a7..7fd3b3725 100644 --- a/docs/docs/tutorials/ed-20.ipynb +++ b/docs/docs/tutorials/ed-20.ipynb @@ -316,7 +316,7 @@ "metadata": {}, "outputs": [], "source": [ - "# expt_s2.background.type = 'line-segment'" + "expt_s2.background.auto_estimate()" ] }, { @@ -326,43 +326,7 @@ "metadata": {}, "outputs": [], "source": [ - "for idx, (x, y) in enumerate(\n", - " [\n", - " (40111.8789, 0.0170),\n", - " (41193.5664, 0.1484),\n", - " (42041.3750, 0.1848),\n", - " (42713.7734, 0.1975),\n", - " (44409.3945, 0.1891),\n", - " (45198.7344, 0.2147),\n", - " (46251.1875, 0.1887),\n", - " (49350.0742, 0.2194),\n", - " (51289.6836, 0.1991),\n", - " (55245.1992, 0.1981),\n", - " (55679.7070, 0.2276),\n", - " (56383.9102, 0.2439),\n", - " (58956.1797, 0.2907),\n", - " (61536.4570, 0.3067),\n", - " (63768.0469, 0.3242),\n", - " (65581.2109, 0.2973),\n", - " (70183.8516, 0.2575),\n", - " (71787.8203, 0.2321),\n", - " (78343.1094, 0.2158),\n", - " (80016.8047, 0.1694),\n", - " (98141.8516, 0.2400),\n", - " (99262.2344, 0.4335),\n", - " (100985.8516, 0.4375),\n", - " (101933.8516, 0.3427),\n", - " (108656.0312, 0.5339),\n", - " (110896.7500, 0.9537),\n", - " (113137.4844, 1.1668),\n", - " (114430.2031, 1.1164),\n", - " (116929.4844, 0.9161),\n", - " (119428.7422, 0.6885),\n", - " (134506.3438, 0.0692),\n", - " ],\n", - " start=1,\n", - "):\n", - " expt_s2.background.create(id=str(idx), x=x, y=y)" + "expt_s2.background.show()" ] }, { diff --git a/docs/docs/tutorials/ed-20.py b/docs/docs/tutorials/ed-20.py index bc5b1e644..ef8e77eb0 100644 --- a/docs/docs/tutorials/ed-20.py +++ b/docs/docs/tutorials/ed-20.py @@ -146,46 +146,10 @@ expt_s2.background.show_supported() # %% -# expt_s2.background.type = 'line-segment' - -# %% -for idx, (x, y) in enumerate( - [ - (40111.8789, 0.0170), - (41193.5664, 0.1484), - (42041.3750, 0.1848), - (42713.7734, 0.1975), - (44409.3945, 0.1891), - (45198.7344, 0.2147), - (46251.1875, 0.1887), - (49350.0742, 0.2194), - (51289.6836, 0.1991), - (55245.1992, 0.1981), - (55679.7070, 0.2276), - (56383.9102, 0.2439), - (58956.1797, 0.2907), - (61536.4570, 0.3067), - (63768.0469, 0.3242), - (65581.2109, 0.2973), - (70183.8516, 0.2575), - (71787.8203, 0.2321), - (78343.1094, 0.2158), - (80016.8047, 0.1694), - (98141.8516, 0.2400), - (99262.2344, 0.4335), - (100985.8516, 0.4375), - (101933.8516, 0.3427), - (108656.0312, 0.5339), - (110896.7500, 0.9537), - (113137.4844, 1.1668), - (114430.2031, 1.1164), - (116929.4844, 0.9161), - (119428.7422, 0.6885), - (134506.3438, 0.0692), - ], - start=1, -): - expt_s2.background.create(id=str(idx), x=x, y=y) +expt_s2.background.auto_estimate() + +# %% +expt_s2.background.show() # %% for point in expt_s2.background: diff --git a/docs/docs/tutorials/ed-6.ipynb b/docs/docs/tutorials/ed-6.ipynb index e38f74ef4..d3d24bc99 100644 --- a/docs/docs/tutorials/ed-6.ipynb +++ b/docs/docs/tutorials/ed-6.ipynb @@ -271,15 +271,7 @@ "metadata": {}, "outputs": [], "source": [ - "expt.background.create(id='1', x=4.4196, y=500)\n", - "expt.background.create(id='2', x=6.6207, y=500)\n", - "expt.background.create(id='3', x=10.4918, y=500)\n", - "expt.background.create(id='4', x=15.4634, y=500)\n", - "expt.background.create(id='5', x=45.6041, y=500)\n", - "expt.background.create(id='6', x=74.6844, y=500)\n", - "expt.background.create(id='7', x=103.4187, y=500)\n", - "expt.background.create(id='8', x=121.6311, y=500)\n", - "expt.background.create(id='9', x=159.4116, y=500)" + "expt.background.auto_estimate()" ] }, { diff --git a/docs/docs/tutorials/ed-6.py b/docs/docs/tutorials/ed-6.py index 50a3ab363..995ff0a50 100644 --- a/docs/docs/tutorials/ed-6.py +++ b/docs/docs/tutorials/ed-6.py @@ -125,15 +125,7 @@ # ### Set Background # %% -expt.background.create(id='1', x=4.4196, y=500) -expt.background.create(id='2', x=6.6207, y=500) -expt.background.create(id='3', x=10.4918, y=500) -expt.background.create(id='4', x=15.4634, y=500) -expt.background.create(id='5', x=45.6041, y=500) -expt.background.create(id='6', x=74.6844, y=500) -expt.background.create(id='7', x=103.4187, y=500) -expt.background.create(id='8', x=121.6311, y=500) -expt.background.create(id='9', x=159.4116, y=500) +expt.background.auto_estimate() # %% [markdown] # ### Set Linked Phases diff --git a/docs/docs/user-guide/analysis-workflow/analysis.md b/docs/docs/user-guide/analysis-workflow/analysis.md index c7035a985..3819765b7 100644 --- a/docs/docs/user-guide/analysis-workflow/analysis.md +++ b/docs/docs/user-guide/analysis-workflow/analysis.md @@ -271,6 +271,27 @@ To plot the measured and calculated data after the fit, you can use the project.display.pattern(expt_name='hrpt') ``` +### Re-estimating the Background + +If you seeded the background automatically (see the +[Background Category](experiment.md#background-category) section), you +can improve it once a fit has produced a model. Calling `auto_estimate` +again after a fit automatically uses the fitted peaks (the default +`use_model=True`) to place better points, especially across crowded +regions where peaks overlap: + +```python +# Re-estimate the background using the fitted model +project.experiments['hrpt'].background.auto_estimate() + +# Optionally free some of the new (fixed) points and fit again +project.experiments['hrpt'].background['1'].y.free = True +project.analysis.fit() +``` + +This estimate → refine → re-estimate loop is safe to repeat: each call +overwrites the previous points with a fresh, fixed background. + ## Bayesian Analysis Bayesian minimizers sample a posterior distribution rather than only diff --git a/docs/docs/user-guide/analysis-workflow/experiment.md b/docs/docs/user-guide/analysis-workflow/experiment.md index 043d4c58e..61dddd0d9 100644 --- a/docs/docs/user-guide/analysis-workflow/experiment.md +++ b/docs/docs/user-guide/analysis-workflow/experiment.md @@ -227,6 +227,36 @@ project.experiments['hrpt'].background.create(x=110, y=170) project.experiments['hrpt'].background.create(x=165, y=170) ``` +Instead of placing every point by hand, you can let EasyDiffraction +detect a sensible set of background points directly from the measured +pattern with the `auto_estimate` method. Called with no arguments, it +builds a peak-insensitive background curve, places points between the +peaks, and reads their heights from that curve so they do not eat into +peak intensities: + +```python +# Automatically estimate background points from the measured pattern +project.experiments['hrpt'].background.auto_estimate() +``` + +The generated points are ordinary, editable control points. They are +created **fixed** (not refined); you can review them, keep them, or free +any of them for refinement (see [Analysis](analysis.md)). Each call +**overwrites** the existing points (when there are active data to +estimate from), so you always start from a clean, reproducible +background; if no active data remain — for example every point is +excluded, or data are not yet loaded — it warns and leaves your existing +points unchanged. It works for both constant-wavelength and +time-of-flight data, neutron and X-ray. + +You can also guide the estimate with optional arguments, for example to +cap the number of points or choose a specific method: + +```python +# Estimate with at most 10 background points +project.experiments['hrpt'].background.auto_estimate(n_points=10) +``` + ### 5. Linked Phases Category { #linked-phases-category } ```python diff --git a/docs/docs/user-guide/parameters/background.md b/docs/docs/user-guide/parameters/background.md index e307c981b..45e890b0e 100644 --- a/docs/docs/user-guide/parameters/background.md +++ b/docs/docs/user-guide/parameters/background.md @@ -7,6 +7,13 @@ when calculating diffractograms. Please see the [IUCr page](https://www.iucr.org/resources/cif/dictionaries/browse/cif_pd) for further details. +!!! tip "Automatic background estimation" + + Line-segment background points can be detected automatically from the + measured pattern with + [`background.auto_estimate()`](../analysis-workflow/experiment.md#background-category), + instead of entering them by hand. + ## [\_pd_background.line_segment_X](https://www.iucr.org/resources/cif/dictionaries/browse/cif_pd) List of X-coordinates used to create many straight-line segments diff --git a/pixi.lock b/pixi.lock index d4ab13b04..59f1ad7ed 100644 --- a/pixi.lock +++ b/pixi.lock @@ -241,6 +241,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl @@ -560,6 +561,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl @@ -875,6 +877,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/14/615a450205e1b56d16c6783f5ccd116cde05550faad70ae077c955654a75/h5py-3.16.0-cp314-cp314-win_amd64.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl @@ -1218,6 +1221,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl @@ -1547,6 +1551,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl @@ -1859,6 +1864,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl @@ -2200,6 +2206,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl @@ -2519,6 +2526,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl - pypi: https://files.pythonhosted.org/packages/3f/d0/7b958df957e4827837b590944008f0b28078f552b451f7407b4b3d54f574/asciichartpy-1.5.25-py2.py3-none-any.whl @@ -2834,6 +2842,7 @@ environments: - pypi: https://files.pythonhosted.org/packages/38/8b/7ec325b4e9e78beefc2d025b01ee8a2fde771ef7c957c3bff99b9e1fbffa/xraydb-4.5.8-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3a/eb/fea4d1d51c49832120f7f285d07306db3960f423a2612c6057caf3e8196f/pip-26.1.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3c/26/1062c7ec1b053db9e499b4d2d5bc231743201b74051c973dadeac80a8f43/questionary-2.1.1-py3-none-any.whl + - pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/14/615a450205e1b56d16c6783f5ccd116cde05550faad70ae077c955654a75/h5py-3.16.0-cp314-cp314-win_amd64.whl - pypi: https://files.pythonhosted.org/packages/3e/17/1f31d8562e6f970d64911f1abc330d233bc0c0601411cf7e19c1292be6da/spdx_headers-1.5.1-py3-none-any.whl - pypi: https://files.pythonhosted.org/packages/3e/5c/fb93d3092640a24dfb7bd7727a24016d7c01774ca013e60efd3f683c8002/backrefs-7.0-py314-none-any.whl @@ -8757,6 +8766,7 @@ packages: - pillow - plotly - pooch + - pybaselines>=1.1 - rich - scipy - sympy @@ -9921,6 +9931,40 @@ packages: requires_dist: - prompt-toolkit>=2.0,<4.0 requires_python: '>=3.9' +- pypi: https://files.pythonhosted.org/packages/3d/66/f044d53935b142d47ce2a65b8c4f51fdb5ca85ee1035fb2b7857971b122e/pybaselines-1.2.1-py3-none-any.whl + name: pybaselines + version: 1.2.1 + sha256: d8f224a0b5ac4cdcef861bc60533131c37255b4d1193f18a410bc37fe5217c73 + requires_dist: + - numpy>=1.20 + - scipy>=1.6 + - build ; extra == 'dev' + - bump-my-version ; extra == 'dev' + - matplotlib ; extra == 'dev' + - numba>=0.53 ; extra == 'dev' + - numpydoc ; extra == 'dev' + - pentapy>=1.1 ; extra == 'dev' + - pytest>=6.0 ; extra == 'dev' + - ruff ; extra == 'dev' + - sphinx ; extra == 'dev' + - sphinx-copybutton ; extra == 'dev' + - sphinx-gallery>=0.16 ; extra == 'dev' + - sphinx-rtd-theme ; extra == 'dev' + - twine ; extra == 'dev' + - matplotlib ; extra == 'docs' + - numpydoc ; extra == 'docs' + - sphinx ; extra == 'docs' + - sphinx-copybutton ; extra == 'docs' + - sphinx-gallery>=0.16 ; extra == 'docs' + - sphinx-rtd-theme ; extra == 'docs' + - numba>=0.53 ; extra == 'full' + - pentapy>=1.1 ; extra == 'full' + - build ; extra == 'release' + - bump-my-version ; extra == 'release' + - twine ; extra == 'release' + - pytest>=6.0 ; extra == 'test' + - ruff ; extra == 'test' + requires_python: '>=3.9' - pypi: https://files.pythonhosted.org/packages/3e/14/615a450205e1b56d16c6783f5ccd116cde05550faad70ae077c955654a75/h5py-3.16.0-cp314-cp314-win_amd64.whl name: h5py version: 3.16.0 diff --git a/pyproject.toml b/pyproject.toml index 6e5b27250..89e23d7bd 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -23,31 +23,32 @@ classifiers = [ ] requires-python = '>=3.12' dependencies = [ - 'numpy', # Numerical computing library - 'asciichartpy', # ASCII charts for terminal output - 'pooch', # Data downloader - 'typer', # Command-line interface creation - 'rich', # Rich text and beautiful formatting in the terminal - 'varname', # Variable name introspection - 'asteval', # An expression evaluator for Python - 'scipy', # Scientific computing library - 'sympy', # Symbolic mathematics library - 'lmfit', # Non-linear optimization and curve fitting - 'bumps', # Non-linear optimization and curve fitting - 'emcee', # Affine-invariant MCMC sampler - 'dfo-ls', # Non-linear optimization and curve fitting - 'gemmi', # Crystallography library - 'cryspy', # Calculations of diffraction patterns - 'crysfml', # Calculations of diffraction patterns - 'diffpy.pdffit2', # Calculations of Pair Distribution Function (PDF) - 'diffpy.utils', # Utilities for PDF calculations - 'uncertainties', # Propagation of uncertainties - 'h5py', # HDF5 file handling - 'typeguard', # Runtime type checking - 'darkdetect', # Detecting dark mode (system-level) - 'pandas', # Displaying tables in Jupyter notebooks - 'plotly', # Interactive plots - 'pillow', # Rendering structure figures (labels, legend) for reports + 'numpy', # Numerical computing library + 'asciichartpy', # ASCII charts for terminal output + 'pooch', # Data downloader + 'typer', # Command-line interface creation + 'rich', # Rich text and beautiful formatting in the terminal + 'varname', # Variable name introspection + 'asteval', # An expression evaluator for Python + 'scipy', # Scientific computing library + 'sympy', # Symbolic mathematics library + 'lmfit', # Non-linear optimization and curve fitting + 'bumps', # Non-linear optimization and curve fitting + 'emcee', # Affine-invariant MCMC sampler + 'dfo-ls', # Non-linear optimization and curve fitting + 'gemmi', # Crystallography library + 'cryspy', # Calculations of diffraction patterns + 'crysfml', # Calculations of diffraction patterns + 'diffpy.pdffit2', # Calculations of Pair Distribution Function (PDF) + 'diffpy.utils', # Utilities for PDF calculations + 'uncertainties', # Propagation of uncertainties + 'h5py', # HDF5 file handling + 'typeguard', # Runtime type checking + 'darkdetect', # Detecting dark mode (system-level) + 'pandas', # Displaying tables in Jupyter notebooks + 'plotly', # Interactive plots + 'pillow', # Rendering structure figures (labels, legend) for reports + 'pybaselines>=1.1', # Background curve estimation backend (SNIP, arPLS, fabc) ] [project.optional-dependencies] diff --git a/src/easydiffraction/core/category.py b/src/easydiffraction/core/category.py index 93afa631f..53b51d9b9 100644 --- a/src/easydiffraction/core/category.py +++ b/src/easydiffraction/core/category.py @@ -269,3 +269,13 @@ def create(self, **kwargs: object) -> None: setattr(child_obj, attr, val) self.add(child_obj) + + def clear(self) -> None: + """ + Remove every item, then mark the parent datablock dirty. + + Layers dirty-marking on :meth:`CollectionBase.clear`, mirroring + how :meth:`add` layers it on the base insert. + """ + super().clear() + self._mark_parent_dirty() diff --git a/src/easydiffraction/core/collection.py b/src/easydiffraction/core/collection.py index 625ff6610..677af2d39 100644 --- a/src/easydiffraction/core/collection.py +++ b/src/easydiffraction/core/collection.py @@ -126,6 +126,16 @@ def remove(self, name: str) -> None: """ del self[name] + def clear(self) -> None: + """ + Remove every item, unlinking each from this collection. + + Delegates to :meth:`_adopt_items` with an empty list: every + child has ``_parent`` cleared, ``_items`` is emptied, and the + index is rebuilt, matching the invariants ``__delitem__`` keeps. + """ + self._adopt_items([]) + def _key_for(self, item: GuardedBase) -> str | None: # noqa: PLR6301 """ Return the identity key for *item*. diff --git a/src/easydiffraction/datablocks/experiment/categories/background/enums.py b/src/easydiffraction/datablocks/experiment/categories/background/enums.py index c0e325a73..de3e0e13d 100644 --- a/src/easydiffraction/datablocks/experiment/categories/background/enums.py +++ b/src/easydiffraction/datablocks/experiment/categories/background/enums.py @@ -1,6 +1,6 @@ # SPDX-FileCopyrightText: 2026 EasyScience contributors # SPDX-License-Identifier: BSD-3-Clause -"""Enumerations for background model types.""" +"""Enumerations for background model types and estimation methods.""" from __future__ import annotations @@ -26,3 +26,29 @@ def description(self) -> str: if self is BackgroundTypeEnum.CHEBYSHEV: return 'Chebyshev polynomial background' return None + + +class BackgroundEstimatorMethodEnum(StrEnum): + """Supported automatic background-estimation methods.""" + + AUTO = 'auto' + SNIP = 'snip' + ARPLS = 'arpls' + FABC = 'fabc' + + @classmethod + def default(cls) -> BackgroundEstimatorMethodEnum: + """Return the default estimation method.""" + return cls.AUTO + + def description(self) -> str: + """Human-friendly description for the enum value.""" + if self is BackgroundEstimatorMethodEnum.AUTO: + return 'Let the library choose (currently arPLS)' + if self is BackgroundEstimatorMethodEnum.SNIP: + return 'SNIP iterative peak-clipping baseline' + if self is BackgroundEstimatorMethodEnum.ARPLS: + return 'Asymmetrically reweighted penalized least squares' + if self is BackgroundEstimatorMethodEnum.FABC: + return 'Fully automatic baseline correction (classification)' + return None diff --git a/src/easydiffraction/datablocks/experiment/categories/background/estimate.py b/src/easydiffraction/datablocks/experiment/categories/background/estimate.py new file mode 100644 index 000000000..bf20b271e --- /dev/null +++ b/src/easydiffraction/datablocks/experiment/categories/background/estimate.py @@ -0,0 +1,548 @@ +# SPDX-FileCopyrightText: 2026 EasyScience contributors +# SPDX-License-Identifier: BSD-3-Clause +""" +Automatic background-curve estimation for powder patterns. + +Pure, array-in / array-out helpers with no experiment-model state. The +estimator runs in two stages (see the ``background-auto-estimate`` ADR): + +* Stage 1 builds a peak-insensitive background curve ``B(x)`` over the + whole grid using :mod:`pybaselines`. +* Stage 2 thins ``B(x)`` to a sparse set of ``(x, intensity)`` anchors + with a vertical Ramer-Douglas-Peucker simplification, keeping the + endpoints and never placing a non-endpoint anchor on a peak. + +All per-dataset parameters (peak width, noise, smoothing penalty) are +derived from the data so a bare call needs no tuning. The numeric +constants below are first cuts; they are calibrated against the tutorial +corpus in Phase 2. +""" + +from __future__ import annotations + +from dataclasses import dataclass + +import numpy as np +from pybaselines import Baseline +from scipy.signal import find_peaks +from scipy.signal import peak_widths + +from easydiffraction.utils.logging import log + +# Stage-2 RDP tolerance as a multiple of the noise sigma (c in c*sigma). +_NOISE_TOLERANCE_FACTOR = 2.0 +# Robust upper percentile of measured peak widths used as the window. +_WIDTH_PERCENTILE = 75.0 +# Peak prominence threshold for find_peaks, in units of noise sigma. +_PEAK_PROMINENCE_FACTOR = 3.0 +# Relative height at which peak widths are measured (FWHM). +_PEAK_WIDTH_REL_HEIGHT = 0.5 +# snip max_half_window as a multiple of the peak width W. +_SNIP_WINDOW_FACTOR = 1.0 +# fabc min_length as a multiple of the peak width W. +_FABC_MIN_LENGTH_FACTOR = 1.0 +# Second-difference noise inflation: var(diff2) = 6 * var(noise). +_SECOND_DIFF_SCALE = 6.0**0.5 +# MAD-to-sigma scaling for a normal distribution. +_MAD_TO_SIGMA = 1.4826 +# Floor for the derived Whittaker penalty. +_LAM_FLOOR = 1.0e2 +# Smallest pattern (in points) the estimator can work on. +_MIN_POINTS = 5 +# Fallback peak width (in points) when no peaks can be detected. +_FALLBACK_WIDTH = 10.0 +# Geometric growth of the RDP tolerance when capping the anchor count. +_TOLERANCE_GROWTH = 1.3 +# Maximum tolerance-growth iterations when enforcing ``n_points``. +_MAX_CAP_ITERATIONS = 20 + + +@dataclass(frozen=True) +class BackgroundEstimate: + """ + Result of a background-curve estimation. + + Attributes + ---------- + curve : np.ndarray + Dense peak-insensitive background ``B(x)`` over the input grid. + anchors : np.ndarray + Thinned control points with shape ``(n_anchors, 2)`` whose rows + are ``(x, intensity)``; heights are read from ``curve``. + method : str + Resolved Stage-1 method actually run (``snip``/``arpls``/ + ``fabc``). + width : float + Effective peak width ``W`` in points (supplied, derived, or the + degenerate-input fallback). + noise : float + Robust noise estimate ``sigma`` from the second difference. + tolerance : float + Stage-2 RDP tolerance actually used (``c * sigma``). + backend_params : dict[str, float] + Parameters handed to the :mod:`pybaselines` routine. + """ + + curve: np.ndarray + anchors: np.ndarray + method: str + width: float + noise: float + tolerance: float + backend_params: dict[str, float] + + +def _robust_noise(y: np.ndarray) -> float: + """ + Estimate the noise standard deviation, insensitive to peaks. + + Uses the median absolute deviation (MAD) of the second difference of + the intensities; the second difference suppresses the smooth + background and most peak signal, leaving noise. + + Parameters + ---------- + y : np.ndarray + Intensities over the grid. + + Returns + ------- + float + Estimated noise sigma; ``0.0`` for flat input. + """ + diff2 = np.diff(y, n=2) + mad = float(np.median(np.abs(diff2 - np.median(diff2)))) + return _MAD_TO_SIGMA * mad / _SECOND_DIFF_SCALE + + +def _measure_width(y: np.ndarray, sigma: float) -> tuple[float, np.ndarray]: + """ + Measure a robust peak width and the prominent peak positions. + + Peaks are found with a prominence threshold relative to the noise + and their full-width-at-half-maximum is summarised by a high + percentile, so the window clears the broadest (e.g. high-angle CWL) + peaks. + + Parameters + ---------- + y : np.ndarray + Intensities over the grid. + sigma : float + Noise estimate used for the prominence threshold. + + Returns + ------- + width : float + Robust peak width in points; the fallback when no peaks are + found. + peaks : np.ndarray + Indices of the detected peaks (possibly empty). + """ + prominence = _PEAK_PROMINENCE_FACTOR * sigma if sigma > 0 else None + peaks, _ = find_peaks(y, prominence=prominence) + if not peaks.size: + return _FALLBACK_WIDTH, peaks + widths = peak_widths(y, peaks, rel_height=_PEAK_WIDTH_REL_HEIGHT)[0] + width = float(np.percentile(widths, _WIDTH_PERCENTILE)) + return max(width, 1.0), peaks + + +def _forbidden_from_peaks(n: int, peaks: np.ndarray, width: float) -> np.ndarray: + """ + Build a peak-region mask from detected peak positions. + + Each peak is widened by ``+/- width`` points; Stage 2 must not place + a non-endpoint anchor on a masked sample. + + Parameters + ---------- + n : int + Number of grid points. + peaks : np.ndarray + Indices of detected peaks. + width : float + Half-width (in points) masked around each peak. + + Returns + ------- + np.ndarray + Boolean mask of length ``n``; ``True`` marks a peak region. + """ + mask = np.zeros(n, dtype=bool) + half = int(np.ceil(width)) + for peak in peaks: + lo = max(0, int(peak) - half) + hi = min(n, int(peak) + half + 1) + mask[lo:hi] = True + return mask + + +def _derive_lam(n: int, width: float) -> float: + """ + Derive a Whittaker smoothing penalty for arPLS/fabc. + + The penalty grows with the grid size and the peak width so the + baseline stays smooth under broad features. The scaling is a + monotonic first cut; the constant is calibrated in Phase 2. + + Parameters + ---------- + n : int + Number of grid points. + width : float + Peak width in points. + + Returns + ------- + float + The ``lam`` penalty passed to the backend. + """ + return float(max(_LAM_FLOOR, n * max(width, 1.0))) + + +def _stage1_baseline( + x: np.ndarray, + y: np.ndarray, + method: str, + width: float, + smoothness: float | None, +) -> tuple[np.ndarray, dict[str, float]]: + """ + Compute the Stage-1 background curve via pybaselines. + + Dispatches to the resolved ``method`` and maps the derived width and + optional smoothness onto the backend parameters (the plan's backend + dispatch contract). + + Parameters + ---------- + x : np.ndarray + Grid coordinates. + y : np.ndarray + Intensities (data-only or peak-subtracted) to baseline. + method : str + Resolved method: ``snip``, ``arpls`` or ``fabc``. + width : float + Peak width in points. + smoothness : float | None + Optional Whittaker penalty override. + + Returns + ------- + curve : np.ndarray + Estimated background over the grid. + backend_params : dict[str, float] + Parameters passed to the backend. + + Raises + ------ + ValueError + If ``method`` is not a supported Stage-1 routine. + """ + fitter = Baseline(x_data=x) + if method == 'arpls': + lam = smoothness if smoothness is not None else _derive_lam(y.size, width) + curve, _ = fitter.arpls(y, lam=lam) + return curve, {'lam': float(lam)} + if method == 'snip': + if smoothness is not None: + log.warning("Method 'snip' ignores the 'smoothness' parameter.") + max_half_window = int(np.ceil(_SNIP_WINDOW_FACTOR * width)) + curve, _ = fitter.snip(y, max_half_window=max_half_window) + return curve, {'max_half_window': float(max_half_window)} + if method == 'fabc': + lam = smoothness if smoothness is not None else _derive_lam(y.size, width) + scale = int(np.ceil(width)) + min_length = int(np.ceil(_FABC_MIN_LENGTH_FACTOR * width)) + curve, _ = fitter.fabc(y, lam=lam, scale=scale, min_length=min_length) + return curve, {'lam': float(lam), 'scale': float(scale), 'min_length': float(min_length)} + msg = f'Unsupported Stage-1 background method: {method!r}' + raise ValueError(msg) + + +def _rdp_indices(x: np.ndarray, curve: np.ndarray, epsilon: float) -> np.ndarray: + """ + Vertical Ramer-Douglas-Peucker simplification of a curve. + + Returns the indices of the points to keep so that every dropped + point lies within ``epsilon`` (in intensity units) of the + piecewise-linear interpolation through the kept points. The + endpoints are always kept. + + Parameters + ---------- + x : np.ndarray + Monotonic grid coordinates. + curve : np.ndarray + Curve values to simplify. + epsilon : float + Maximum allowed vertical deviation. + + Returns + ------- + np.ndarray + Sorted indices of the retained points. + """ + n = x.size + keep = np.zeros(n, dtype=bool) + keep[0] = True + keep[-1] = True + stack = [(0, n - 1)] + while stack: + start, end = stack.pop() + if end <= start + 1: + continue + span = x[end] - x[start] + if span <= 0: + continue + segment = slice(start, end + 1) + line = curve[start] + (curve[end] - curve[start]) * (x[segment] - x[start]) / span + deviation = np.abs(curve[segment] - line) + deviation[0] = 0.0 + deviation[-1] = 0.0 + local = int(np.argmax(deviation)) + if deviation[local] > epsilon: + index = start + local + keep[index] = True + stack.extend(((start, index), (index, end))) + return np.flatnonzero(keep) + + +def _drop_forbidden(indices: np.ndarray, forbidden: np.ndarray, n: int) -> np.ndarray: + """ + Drop non-endpoint anchors that fall on a forbidden (peak) sample. + + Parameters + ---------- + indices : np.ndarray + Candidate anchor indices (sorted, includes the endpoints). + forbidden : np.ndarray + Boolean peak-region mask. + n : int + Number of grid points, used to identify the endpoints. + + Returns + ------- + np.ndarray + Filtered indices, always retaining ``0`` and ``n - 1``. + """ + endpoints = {0, n - 1} + kept = [int(i) for i in indices if int(i) in endpoints or not forbidden[i]] + return np.array(sorted(set(kept)), dtype=int) + + +def _cap_by_deviation( + x: np.ndarray, + curve: np.ndarray, + indices: np.ndarray, + n_points: int, +) -> np.ndarray: + """ + Reduce anchors to ``n_points``, keeping the endpoints. + + The two endpoints are always retained; the remaining slots go to the + interior anchors that deviate most from the straight chord between + them. Guarantees the cap even when the RDP tolerance cannot reduce + the count (e.g. zero-noise data, where the tolerance stays zero). + + Parameters + ---------- + x : np.ndarray + Grid coordinates. + curve : np.ndarray + Background curve. + indices : np.ndarray + Candidate anchor indices (sorted, includes the endpoints). + n_points : int + Target maximum number of anchors (``>= 2``). + + Returns + ------- + np.ndarray + ``min(indices.size, n_points)`` sorted indices. + """ + if indices.size <= n_points: + return indices + first = indices[0] + last = indices[-1] + interior = indices[1:-1] + keep_count = max(n_points - 2, 0) + span = x[last] - x[first] + if span <= 0 or keep_count == 0: + chosen = interior[:keep_count] + else: + line = curve[first] + (curve[last] - curve[first]) * (x[interior] - x[first]) / span + deviation = np.abs(curve[interior] - line) + start = interior.size - keep_count + chosen = interior[np.sort(np.argsort(deviation)[start:])] + return np.concatenate(([first], chosen, [last])) + + +def _thin_to_anchors( + x: np.ndarray, + curve: np.ndarray, + epsilon: float, + forbidden: np.ndarray, + n_points: int | None, +) -> np.ndarray: + """ + Select anchor indices: RDP, drop peak-region anchors, cap the count. + + When more than ``n_points`` anchors survive, the RDP tolerance is + grown geometrically and re-run until the count fits; if that cannot + reduce it (e.g. zero noise), a deviation-based cap guarantees the + bound. The endpoints are always retained. + + Parameters + ---------- + x : np.ndarray + Grid coordinates. + curve : np.ndarray + Background curve to thin. + epsilon : float + RDP tolerance (intensity units). + forbidden : np.ndarray + Boolean peak-region mask; non-endpoint anchors here are dropped. + n_points : int | None + Optional maximum number of anchors (endpoints included). + + Returns + ------- + np.ndarray + Sorted anchor indices, always including the two endpoints. + """ + indices = _drop_forbidden(_rdp_indices(x, curve, epsilon), forbidden, x.size) + if n_points is None or indices.size <= n_points: + return indices + tolerance = epsilon + for _ in range(_MAX_CAP_ITERATIONS): + if tolerance <= 0 or indices.size <= n_points: + break + tolerance *= _TOLERANCE_GROWTH + indices = _drop_forbidden(_rdp_indices(x, curve, tolerance), forbidden, x.size) + if indices.size > n_points: + indices = _cap_by_deviation(x, curve, indices, n_points) + return indices + + +def _flat_estimate( + x: np.ndarray, + y: np.ndarray, + method: str, + width: float | None, + noise: float, +) -> BackgroundEstimate: + """ + Build a trivial flat-background estimate for degenerate input. + + Parameters + ---------- + x : np.ndarray + Grid coordinates. + y : np.ndarray + Intensities. + method : str + Resolved method (recorded for the summary). + width : float | None + Supplied width, if any. + noise : float + Noise estimate. + + Returns + ------- + BackgroundEstimate + A flat curve at the data minimum with two endpoint anchors. + """ + level = float(np.min(y)) if y.size else 0.0 + curve = np.full(x.size, level) + anchors = np.array([[x[0], level], [x[-1], level]]) if x.size else np.empty((0, 2)) + return BackgroundEstimate( + curve=curve, + anchors=anchors, + method=method, + width=float(width) if width is not None else _FALLBACK_WIDTH, + noise=noise, + tolerance=_NOISE_TOLERANCE_FACTOR * noise, + backend_params={}, + ) + + +def estimate_background_curve( + x: np.ndarray, + y: np.ndarray, + *, + method: str = 'arpls', + peaks: np.ndarray | None = None, + width: float | None = None, + smoothness: float | None = None, + n_points: int | None = None, +) -> BackgroundEstimate: + """ + Estimate background control points from a measured pattern. + + Stage 1 builds a peak-insensitive curve ``B(x)`` with the resolved + ``method``; Stage 2 thins it to sparse anchors. Every per-dataset + parameter defaults to a data-derived value, so a bare call works. + + Parameters + ---------- + x : np.ndarray + Grid coordinates (e.g. 2theta or time-of-flight), monotonic. + y : np.ndarray + Intensities to baseline: the measured pattern (data-only) or the + peak-subtracted measured pattern (model-guided). + method : str, default='arpls' + Resolved Stage-1 routine: ``arpls`` (default), ``snip`` or + ``fabc``. ``auto`` is resolved by the caller, never here. + peaks : np.ndarray | None, default=None + Boolean mask aligned with ``x``; ``True`` forbids a non-endpoint + anchor. When ``None`` the mask is derived from ``y`` itself. + width : float | None, default=None + Peak width in points; derived from ``y`` when ``None``. + smoothness : float | None, default=None + Whittaker penalty override for ``arpls``/``fabc``; ignored by + ``snip``. + n_points : int | None, default=None + Maximum number of anchors (endpoints included); uncapped when + ``None``. + + Returns + ------- + BackgroundEstimate + The curve, anchors, and metadata describing the run. + """ + x = np.asarray(x, dtype=float) + y = np.asarray(y, dtype=float) + noise = _robust_noise(y) + + if y.size < _MIN_POINTS: + log.warning('Pattern too short to estimate a background; returning a flat one.') + return _flat_estimate(x, y, method, width, noise) + + detected: np.ndarray = np.array([], dtype=int) + if width is None or peaks is None: + measured_width, detected = _measure_width(y, noise) + if width is None: + width = measured_width + + if peaks is None: + forbidden = _forbidden_from_peaks(y.size, detected, width) + if not detected.size: + log.warning('No peaks detected; background anchors may be unreliable.') + else: + forbidden = np.asarray(peaks, dtype=bool) + + curve, backend_params = _stage1_baseline(x, y, method, width, smoothness) + tolerance = _NOISE_TOLERANCE_FACTOR * noise + indices = _thin_to_anchors(x, curve, tolerance, forbidden, n_points) + anchors = np.column_stack((x[indices], curve[indices])) + return BackgroundEstimate( + curve=curve, + anchors=anchors, + method=method, + width=float(width), + noise=noise, + tolerance=float(tolerance), + backend_params=backend_params, + ) diff --git a/src/easydiffraction/datablocks/experiment/categories/background/line_segment.py b/src/easydiffraction/datablocks/experiment/categories/background/line_segment.py index 0e48ddb91..79540795d 100644 --- a/src/easydiffraction/datablocks/experiment/categories/background/line_segment.py +++ b/src/easydiffraction/datablocks/experiment/categories/background/line_segment.py @@ -10,6 +10,8 @@ import numpy as np from scipy.interpolate import interp1d +from scipy.signal import find_peaks +from scipy.signal import peak_widths from easydiffraction.core.category import CategoryItem from easydiffraction.core.display_handler import DisplayHandler @@ -22,6 +24,8 @@ from easydiffraction.core.variable import NumericDescriptor from easydiffraction.core.variable import Parameter from easydiffraction.core.variable import StringDescriptor +from easydiffraction.datablocks.experiment.categories.background import enums +from easydiffraction.datablocks.experiment.categories.background import estimate from easydiffraction.datablocks.experiment.categories.background.base import BackgroundBase from easydiffraction.datablocks.experiment.categories.background.factory import BackgroundFactory from easydiffraction.datablocks.experiment.item.enums import BeamModeEnum @@ -31,6 +35,8 @@ from easydiffraction.utils.logging import log from easydiffraction.utils.utils import render_table +_MIN_ANCHOR_POINTS = 2 # Minimum line-segment anchors (the two endpoints) + class LineSegment(CategoryItem): """Single background control point for interpolation.""" @@ -143,6 +149,102 @@ def y(self, value: float) -> None: self._y.value = value +def _resolve_method(method: str) -> str: + """ + Validate a method name and resolve ``auto``. + + Parameters + ---------- + method : str + Requested method; one of the ``BackgroundEstimatorMethodEnum`` + values. + + Returns + ------- + str + The resolved Stage-1 method (``auto`` becomes ``arpls``). + + Raises + ------ + ValueError + If ``method`` is not a known estimator method. + """ + try: + chosen = enums.BackgroundEstimatorMethodEnum(method) + except ValueError as exc: + valid = ', '.join(member.value for member in enums.BackgroundEstimatorMethodEnum) + msg = f'Unknown background method {method!r}. Choose one of: {valid}.' + raise ValueError(msg) from exc + if chosen is enums.BackgroundEstimatorMethodEnum.AUTO: + return enums.BackgroundEstimatorMethodEnum.ARPLS.value + return chosen.value + + +def _validate_overrides( + width: float | None, + smoothness: float | None, + n_points: int | None, +) -> None: + """ + Validate the public numeric overrides of ``auto_estimate``. + + Parameters + ---------- + width : float | None + Peak width override; must be positive when supplied. + smoothness : float | None + Smoothing override; must be positive when supplied. + n_points : int | None + Anchor cap; must be an integer ``>= 2`` when supplied. + + Raises + ------ + ValueError + If any supplied override is out of range. + """ + if width is not None and width <= 0: + msg = f'width must be positive, got {width!r}.' + raise ValueError(msg) + if smoothness is not None and smoothness <= 0: + msg = f'smoothness must be positive, got {smoothness!r}.' + raise ValueError(msg) + if n_points is not None and (not isinstance(n_points, int) or n_points < _MIN_ANCHOR_POINTS): + msg = f'n_points must be an integer >= 2, got {n_points!r}.' + raise ValueError(msg) + + +def _model_peak_mask(peak_only: np.ndarray) -> np.ndarray: + """ + Build a forbidden-anchor mask from a peak-only model array. + + Peaks in the peak-only model are detected and widened by their own + full-width-at-half-maximum; Stage 2 must not place a non-endpoint + anchor on any masked sample. + + Parameters + ---------- + peak_only : np.ndarray + Peak-only model intensities (``intensity_calc - + intensity_bkg``). + + Returns + ------- + np.ndarray + Boolean mask aligned with ``peak_only``. + """ + mask = np.zeros(peak_only.size, dtype=bool) + peaks, _ = find_peaks(peak_only) + if not peaks.size: + return mask + widths = peak_widths(peak_only, peaks, rel_height=0.5)[0] + for index, peak_width in zip(peaks, widths, strict=True): + half = int(np.ceil(peak_width)) + lo = max(0, int(index) - half) + hi = int(index) + half + 1 + mask[lo:hi] = True + return mask + + @BackgroundFactory.register class LineSegmentBackground(BackgroundBase): """Linear-interpolation background between user-defined points.""" @@ -190,6 +292,85 @@ def _update( y = interp_func(x) data._set_intensity_bkg(y) + def auto_estimate( + self, + *, + method: str = 'auto', + width: float | None = None, + smoothness: float | None = None, + n_points: int | None = None, + use_model: bool = True, + ) -> None: + """ + Detect background control points from the measured pattern. + + Builds a peak-insensitive background curve and thins it to a + sparse set of fixed line-segment points, overwriting any + existing ones. Heights come from the de-peaked curve, clipped to + the measured intensities so they never eat into peaks. After at + least one calculation, ``use_model`` lets the fitted model place + better points across overlapped regions. + + Parameters + ---------- + method : str, default='auto' + Estimation method: ``auto`` (default, resolves to + ``arpls``), ``snip``, ``arpls`` or ``fabc``. + width : float | None, default=None + Peak width in points; measured from the data when ``None``. + smoothness : float | None, default=None + Backend smoothing override; derived when ``None``. + n_points : int | None, default=None + Maximum number of points; uncapped when ``None``. + use_model : bool, default=True + When a calculation has run, subtract the fitted peaks before + estimating so anchors land in true inter-peak gaps. + """ + resolved = _resolve_method(method) + _validate_overrides(width, smoothness, n_points) + data = self._parent.data + x = np.asarray(data.x, dtype=float) + if x.size == 0: + log.warning('No active data points; cannot estimate a background.') + return + intensity_meas = np.asarray(data.intensity_meas, dtype=float) + intensity_calc = np.asarray(data.intensity_calc, dtype=float) + + if use_model and np.any(intensity_calc): + peak_only = intensity_calc - np.asarray(data.intensity_bkg, dtype=float) + y = intensity_meas - peak_only + peaks = _model_peak_mask(peak_only) + else: + y = intensity_meas + peaks = None + + result = estimate.estimate_background_curve( + x, + y, + method=resolved, + peaks=peaks, + width=width, + smoothness=smoothness, + n_points=n_points, + ) + + anchor_x = result.anchors[:, 0] + measured = np.interp(anchor_x, x, intensity_meas) + heights = np.clip(result.anchors[:, 1], 0.0, measured) + + if len(self): + log.info('Replacing existing background points with a new estimate.') + self.clear() + for index, (point_x, height) in enumerate(zip(anchor_x, heights, strict=True), start=1): + self.create(id=str(index), x=float(point_x), y=float(height)) + for point in self._items: + point.y.free = False + + count = len(self) + width_pts = result.width + summary = f'Background estimate: {resolved}, {count} points, width {width_pts:.0f} pts' + log.info(summary) + def show(self) -> None: """Print a table of control points (x, intensity).""" columns_headers: list[str] = ['X', 'Intensity'] diff --git a/src/easydiffraction/display/tablers/pandas.py b/src/easydiffraction/display/tablers/pandas.py index 38ff2590e..5327cb830 100644 --- a/src/easydiffraction/display/tablers/pandas.py +++ b/src/easydiffraction/display/tablers/pandas.py @@ -36,9 +36,13 @@ # ``min-width: 0`` neutralise MkDocs Material's ``table:not([class])`` # rules, which otherwise inject a per-row ``border-top`` (stray rules # between rows) and ``th { min-width: 5rem }`` (over-wide columns) onto -# class-less embedded tables. Inline values win over the theme -# stylesheet, so no CSS class or ``