Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
7c84fbd
initial commit with examples
rozyczko May 4, 2026
dbe8376
ruff format
rozyczko May 4, 2026
2800ca3
refactor core modifications to live in reflectometry lib only
rozyczko May 4, 2026
f534a97
modified the notebook a bit
rozyczko May 4, 2026
3dd2855
Merge branch 'develop' into bayesian
rozyczko May 6, 2026
e884477
refactored MCMC sampling to easyscience module
rozyczko May 6, 2026
55dfc30
added pass-through for cancellation and total number of calculations
rozyczko May 9, 2026
52a5c56
Merge branch 'develop' into bayesian
rozyczko May 12, 2026
cf26ab7
silly typo
rozyczko May 12, 2026
0f59bd2
yet another linting
rozyczko May 12, 2026
e580956
moved the notebook to the correct location and fixed the formatting
rozyczko May 12, 2026
d112d8d
added more initial arguments to sample. Added unit tests
rozyczko May 13, 2026
d3f1536
prettify charts. Updated notebook
rozyczko May 26, 2026
a33ad4f
fixed issue with the notebook
rozyczko May 26, 2026
90e522f
move file fetch to reflectometry/data
rozyczko May 26, 2026
c67b52e
PR review fixes
rozyczko May 26, 2026
5389d3a
be careful with small chains
rozyczko May 26, 2026
08ad494
ruff
rozyczko May 26, 2026
2d6f90a
try to use explicit dtype for Ubuntu
rozyczko May 26, 2026
79f32a2
another attempt at fixing failing test
rozyczko May 27, 2026
a854285
Merge branch 'develop' into bayesian
rozyczko May 28, 2026
360351f
better corner and distribution plots
rozyczko May 29, 2026
6600287
updates to the Bayesian API after changes to core
rozyczko Jun 4, 2026
a2cb11e
use bayesian_extend
rozyczko Jun 4, 2026
31ddc05
minor fixes
rozyczko Jun 5, 2026
b608fd5
fix failing test
rozyczko Jun 5, 2026
cae1d18
test against develop, so we can merge the PR
rozyczko Jun 5, 2026
7f1fafc
use mcmc_sample
rozyczko Jun 5, 2026
2922f08
fixed tests
rozyczko Jun 5, 2026
2024b19
minor notebook fix
rozyczko Jun 5, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
361 changes: 361 additions & 0 deletions BAYESIAN_IN_ERL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,361 @@
# Bayesian Analysis in EasyReflectometryLib — Implementation Plan

## Current State

- `BAYESIAN_BUMPS.md` (in `MD/`) describes a 3-phase plan. Phase 1
(docs/notebook) is complete.
- `bayesian_bumps.py` (attached notebook) works but users must directly
import from `bumps.fitters`, `bumps.names`, `bumps.parameter` — a
leaky abstraction.
- The `Bumps` minimizer in `easyscience`
(`core/src/easyscience/fitting/minimizers/minimizer_bumps.py`) only
exposes **classical optimization** methods (`amoeba`, `newton`, `lm`),
NOT the DREAM/MCMC sampler.
- The `AvailableMinimizers` enum only has `Bumps`, `Bumps_simplex`,
`Bumps_newton`, `Bumps_lm`. This should remain optimizer-focused;
DREAM should be exposed as a sampling workflow, not as another
minimizer choice.
- `MultiFitter` in reflectometry-lib currently only has `fit()` and
`fit_single_data_set_1d()`.

## Goal

Users should be able to run Bayesian MCMC sampling with a **clean
high-level API** like:

```python
fitter = MultiFitter(model)
fitter.switch_minimizer(AvailableMinimizers.Bumps)

# Classical fit first
analysed = fitter.fit(data)

# Bayesian sampling
posterior = fitter.sample(data, samples=5000, burn=1000, thin=10)

# Analyze
from easyreflectometry.analysis.bayesian import plot_corner, posterior_summary
plot_corner(posterior)
print(posterior_summary(posterior))
```

Important API boundary: `fit()` remains classical optimization only.
Bayesian DREAM sampling is exposed through `sample()` so users do not
receive sampler-shaped results from an optimizer-shaped API.

## Implementation Plan

### Step 1 — Keep DREAM separate from `AvailableMinimizers`

**File**: `core/src/easyscience/fitting/available_minimizers.py`

Do **not** add `Bumps_dream` as a normal `AvailableMinimizers` member.
The enum is currently used to instantiate minimizer backends and to
route calls through `Fitter.fit()`, which expects optimizer-style
`FitResults`. DREAM is an MCMC sampler and returns a sampler state/chain
rather than a best-fit result.

Use `AvailableMinimizers.Bumps` to select the BUMPS backend, then expose
DREAM through a dedicated `sample()` method. This avoids making
`project.minimizer = AvailableMinimizers.Bumps_dream` look like a valid
classical fitting mode.

### Step 2 — Add dedicated DREAM sampling support to the Bumps minimizer (core repo)

**File**: `core/src/easyscience/fitting/minimizers/minimizer_bumps.py`

**2a.** Keep `supported_methods()` optimizer-only (`amoeba`, `newton`,
`lm`). Do not add `'dream'` there unless the EasyScience fitting
abstraction is later split into optimizer and sampler concepts.

**2b.** Add a new method `sample()` to the `Bumps` class:

```python
def sample(
self,
x: np.ndarray,
y: np.ndarray,
weights: np.ndarray,
samples: int = 10000,
burn: int = 2000,
thin: int = 10,
chains: int | None = None,
population: int | None = None,
model: Callable | None = None,
parameters: list | None = None,
progress_callback: Callable | None = None,
seed: int | None = None,
**kwargs,
) -> dict:
"""Run Bayesian MCMC sampling using BUMPS DREAM sampler.

Returns a dict with:
- 'draws': np.ndarray, shape (n_samples, n_params) — posterior samples
- 'param_names': list[str] — parameter names
- 'state': DreamState — raw BUMPS state for save/restore
- 'logp': np.ndarray — log-posterior values
"""
```

The implementation would:

1. Build the `Curve` model + `FitProblem` (reuse `_make_model()` logic)
2. Translate user-friendly aliases to BUMPS DREAM settings and call
`bumps_fit(problem, method='dream', samples=samples, burn=burn, thin=thin, pop=population, ...)`
3. Extract `result.state.draw().points` and return structured dict
4. Preserve and restore the EasyScience global object stack state,
mirroring the existing `fit()` implementation
5. Handle multi-dataset via the same `MultiFitter._precompute_reshaping`
pattern

**2c.** Do **not** modify `fit()` to handle `method='dream'`, and do
**not** delegate `fit(method='dream')` to `sample()`. `fit()` returns
`FitResults`; `sample()` returns posterior samples and sampler metadata.
Keeping the methods separate prevents incompatible return types from
leaking into `Fitter.fit()` and reflectometry-lib `MultiFitter.fit()`.

Recommended alias mapping:

| Public argument | BUMPS DREAM setting | Rationale |
| ---------------- | --------------------------- | ----------------------------------------------------------------------- |
| `samples` | `samples` | Clear user-facing chain length; prefer over optimizer-oriented `steps`. |
| `burn` | `burn` | Matches BUMPS and common MCMC terminology. |
| `thin` | `thin` | Matches BUMPS and common MCMC terminology. |
| `chains` | `pop` | User-friendly MCMC wording; maps to BUMPS population count. |
| `population` | `pop` | BUMPS-aware alias for advanced users. |
| `initialization` | `init` | More readable than `init`, but pass through to BUMPS. |
| `seed` | RNG seeding before sampling | Expose reproducibility without requiring users to know BUMPS internals. |

If both `chains` and `population` are provided, raise `ValueError`
unless they match. Accept `steps` only as a deprecated alias for
`samples`, with a warning, because `steps` already means optimizer
budget in EasyScience `Bumps.fit()`.

### Step 3 — Add `sample()` to reflectometry-lib `MultiFitter`

**File**: `reflectometry-lib/src/easyreflectometry/fitting.py`

Add a `sample()` method to `MultiFitter`:

```python
def sample(
self,
data: sc.DataGroup,
samples: int = 10000,
burn: int = 2000,
thin: int = 10,
chains: int | None = None,
population: int | None = None,
seed: int | None = None,
objective: str | None = None,
) -> dict:
"""Run Bayesian MCMC sampling on reflectometry data.

:param data: DataGroup with reflectivity data.
:param samples: Number of retained DREAM samples requested from BUMPS.
:param burn: Burn-in steps.
:param thin: Thinning interval.
:param chains: User-friendly alias for BUMPS DREAM population count.
:param population: BUMPS DREAM population count (`pop`) for advanced users.
:param seed: Random seed for reproducibility.
:param objective: Zero-variance handling strategy.
:return: Dict with posterior samples, parameter names, and sampler state.
"""
```

Internally:

1. Reuse `_prepare_fit_arrays` for data preparation
2. Mirror the EasyScience `Fitter.fit()` lifecycle for reshaping, fit
function wrapping, and restoration
3. Delegate to `self.easy_science_multi_fitter.minimizer.sample(...)`
for the MCMC
4. Handle multi-model / multi-contrast aggregation as one joint
posterior
5. Return structured posterior dict or `PosteriorResults`

The `MultiFitter` currently stores `self.easy_science_multi_fitter`
which has `.minimizer` — we'll call `sample()` on it when the minimizer
is a `Bumps` instance.

### Step 4 — Create Bayesian analysis module in reflectometry-lib

**New file**:
`reflectometry-lib/src/easyreflectometry/analysis/__init__.py` **New
file**: `reflectometry-lib/src/easyreflectometry/analysis/bayesian.py`

The `bayesian.py` module provides:

```python
class PosteriorResults:
"""Container for Bayesian posterior samples with analysis methods."""

draws: np.ndarray # (n_samples, n_params)
param_names: list[str]
logp: np.ndarray | None
sampler_state: Any | None

def summary(self) -> str:
"""Return formatted summary table with mean, sd, HDI for each parameter."""

def corner(self, **kwargs) -> None:
"""Plot parameter correlation corner plot using the `corner` library."""

def credible_interval(self, alpha: float = 0.95) -> dict:
"""Return {param_name: (lower, upper)} credible intervals."""

def gelman_rubin(self) -> dict:
"""Compute R-hat convergence diagnostic."""

def posterior_summary(draws, param_names) -> str: ...
def plot_corner(draws, param_names, **kwargs) -> None: ...
def plot_trace(draws, param_names, **kwargs) -> None: ...
def credible_intervals(draws, param_names, alpha=0.95) -> dict: ...
```

**Posterior predictive functions** (reflectivity & SLD):

```python
def posterior_predictive_reflectivity(
draws, param_names, model, q_values, n_samples=200
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Return (median, lower_95, upper_95) reflectivity arrays."""

def posterior_sld_profile(
draws, param_names, model, n_samples=200
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
"""Return (z, median, lower_95, upper_95) SLD profile arrays."""
```

Posterior predictive helpers must save original parameter values and
errors before applying any posterior draw, and restore them in a
`finally` block after prediction. Parameter lookup should use
EasyScience parameter `unique_name` values, matching the BUMPS names
after removing the minimizer prefix, rather than display names. This
avoids leaving the model mutated after plotting and avoids collisions
when repeated models or multi-contrast fits contain similarly named
parameters.

### Step 5 — Dependencies

Add to `pyproject.toml` in reflectometry-lib:

```toml
[project.optional-dependencies]
bayesian = ["corner>=2.2", "arviz>=0.18"]
```

Make `corner` and `arviz` optional imports — the analysis module should
work with graceful fallbacks when they're not installed.

### Step 6 — Update exports

**File**: `reflectometry-lib/src/easyreflectometry/__init__.py`

Add `PosteriorResults` and analysis functions to the public API if
desired.

### Step 7 — Update the example notebook

**File**:
`reflectometry-lib/docs/src/tutorials/advancedfitting/bayesian_bumps.py`

Replace low-level BUMPS calls with the new high-level API:

- `fitter.sample(data, samples=500, burn=100, thin=10)` instead of
manual `FitProblem` + `bumps_fit`
- `plot_corner(posterior['draws'], posterior['param_names'])` instead of
manual `corner.corner()`
- `posterior_summary(...)` instead of manual numpy statistics

### Step 8 — Tests

**File**: `reflectometry-lib/tests/test_bayesian.py` (new)

```python
def test_sample_basic(): ...
def test_posterior_summary_format(): ...
def test_corner_plot_does_not_crash(): ...
def test_credible_intervals(): ...
def test_sample_seed_reproducibility(): ...
```

Also add a test in `core/tests/` for the `Bumps.sample()` method.

## Architecture Diagram

```
User Code
├─ fitter.fit(data) ──► classical chi² minimization
├─ fitter.sample(data, ...) ──► Bayesian DREAM MCMC
reflectometry-lib MultiFitter
│ ._prepare_fit_arrays() ← reused from fit()
│ delegates to ↓
easyscience Bumps minimizer
│ .fit(x, y, weights) ──► amoeba/newton/lm
│ .sample(x, y, weights, ...) ──► DREAM MCMC (NEW)
bumps.fitters.fit / FitProblem / Curve
└──► reflectivity model evaluation via fit_func

Post-hoc analysis:
reflectometry-lib analysis.bayesian
├── PosteriorResults (container)
├── plot_corner() → corner.corner()
├── posterior_summary() → numpy stats
└── posterior_predictive_reflectivity() → model.interface.fit_func()
```

## Risk Assessment

| Risk | Mitigation |
| -------------------------------------------- | -------------------------------------------------------------------------------- |
| DREAM may not converge with default settings | Expose `samples`, `burn`, `thin`, `chains`/`population`; document best practices |
| MCMC is 10-100× slower than least-squares | Document that classical fit first is recommended; add progress callback |
| `corner` / `arviz` may not be installed | Make optional dependencies with graceful fallbacks |
| Multi-dataset sampling aggregation | Follow existing `MultiFitter._precompute_reshaping` pattern |
| BUMPS DREAM API changes | Pin bumps version; wrap in our API |
| Reproducibility | Expose `seed` parameter; document how to save/load DreamState |
| Model mutation during posterior prediction | Save/restore original parameter values and map draws by `unique_name` |

## Files to Create / Modify

### Create:

1. `reflectometry-lib/src/easyreflectometry/analysis/__init__.py`
2. `reflectometry-lib/src/easyreflectometry/analysis/bayesian.py`
3. `reflectometry-lib/tests/test_bayesian.py`

### Modify:

4. `core/src/easyscience/fitting/available_minimizers.py` — no
`Bumps_dream`; optionally document that samplers are exposed
separately
5. `core/src/easyscience/fitting/minimizers/minimizer_bumps.py` — add
dedicated `sample()` method without adding `'dream'` to optimizer
methods
6. `reflectometry-lib/src/easyreflectometry/fitting.py` — add `sample()`
to `MultiFitter`
7. `reflectometry-lib/src/easyreflectometry/__init__.py` — optional:
export new classes
8. `reflectometry-lib/pyproject.toml` — add optional `bayesian`
dependencies
9. `reflectometry-lib/docs/src/tutorials/advancedfitting/bayesian_bumps.py`
— update to use new API

## Implementation Order

1. **core changes** (Steps 1-2): Add `Bumps.sample()` method while
keeping DREAM out of optimizer enum/method dispatch
2. **reflectometry-lib fitting** (Step 3): Add `MultiFitter.sample()`
3. **reflectometry-lib analysis** (Step 4): Create
`analysis/bayesian.py` with corner plot & stats
4. **Dependencies** (Step 5): Add optional deps to pyproject.toml
5. **Exports** (Step 6): Update `__init__.py`
6. **Example update** (Step 7): Update notebook to use new API
7. **Tests** (Step 8): Add test coverage
Loading
Loading