diff --git a/.codecov.yml b/.codecov.yml index d001c41da..af816b1a2 100644 --- a/.codecov.yml +++ b/.codecov.yml @@ -4,6 +4,11 @@ github_checks: annotations: true +ignore: + - 'src/easydiffraction/report/templates/html/vendor/**' + - 'src/easydiffraction/report/templates/tex/styles/**' + - 'src/easydiffraction/utils/_vendored/jupyter_dark_detect/**' + comment: layout: 'reach, diff, flags, files' behavior: default diff --git a/.codefactorignore b/.codefactorignore index 48d64e0bf..23d7fb1a1 100644 --- a/.codefactorignore +++ b/.codefactorignore @@ -1,11 +1,13 @@ # CodeFactor exclude patterns configuration must be done online: # https://www.codefactor.io/repository/github/easyscience/diffraction-lib/ignore # -# Last updated: 2026-01-05 +# Last updated: 2026-05-29 # # Exclude patterns: # deps/** # docs/** +# src/easydiffraction/report/templates/html/vendor/** +# src/easydiffraction/report/templates/tex/styles/** # src/easydiffraction/utils/_vendored/jupyter_dark_detect/** # tests/** # tmp/** diff --git a/.copier-answers.yml b/.copier-answers.yml index 009e6acc4..32c8c49a5 100644 --- a/.copier-answers.yml +++ b/.copier-answers.yml @@ -1,6 +1,6 @@ # WARNING: Do not edit this file manually. # Any changes will be overwritten by Copier. -_commit: v0.11.4 +_commit: v0.12.0 _src_path: gh:easyscience/templates app_docs_url: https://easyscience.github.io/diffraction-app app_doi: 10.5281/zenodo.18163581 diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md deleted file mode 100644 index 46a066d2b..000000000 --- a/.github/copilot-instructions.md +++ /dev/null @@ -1,224 +0,0 @@ -# Copilot Instructions for EasyDiffraction - -## Project Context - -- Python library for crystallographic diffraction analysis (refining - structural models against experimental data). -- Domain axes: `sample_form` (powder, single crystal), `beam_mode` - (time-of-flight, constant wavelength), `radiation_probe` (neutron, - x-ray), `scattering_type` (bragg, total). 
-- Calculation backends: `cryspy` and `crysfml` (Bragg), `pdffit2` (total - scattering). -- CIF maps to `DatablockItem`/`DatablockCollection` and - `CategoryItem`/`CategoryCollection` (loops). Follow CIF naming; - deviate only for a clearly better API. -- Metadata via frozen dataclasses: `TypeInfo`, `Compatibility`, - `CalculatorSupport`. -- Audience is scientists, often non-programmers: prioritize - discoverability, clear errors, and safe defaults over developer - ergonomics. -- Critical-software rigor: every code path tested, edge cases handled - explicitly, no silent failures. - -## Code Style - -- snake_case (functions/vars), PascalCase (classes), UPPER_SNAKE_CASE - (constants). -- `from __future__ import annotations` in every module. Type-annotate - all public signatures. -- Numpy-style docstrings on all public classes/methods (Parameters / - Returns / Raises where applicable). Summary is one line ≤72 chars - (`max-doc-length`); shorten wording rather than wrap. -- Flat over nested, explicit over clever, composition over deep - inheritance. No defensive checks for unlikely edge cases. -- One class per file when substantial; group small related classes. -- No `**kwargs` — use explicit keyword arguments. -- No string-based dispatch (e.g. `getattr(self, f'_{name}')`); write - named methods (`_set_sample_form`, `_set_beam_mode`). Narrow framework - metadata lookups are allowed when the attribute name is a class-level - declaration, is not user input, and is validated in one central place; - for example, `CategoryItem._category_entry_name`. -- Public attrs are either editable (getter+setter property) or read-only - (getter only). For internal mutation of read-only props, use a private - `_set_` method, not a public setter. -- Lint complexity thresholds in `pyproject.toml` (`max-args`, - `max-branches`, `max-statements`, `max-locals`, `max-nested-blocks`, - …) are guardrails. 
A violation means refactor (extract helpers, - parameter objects, flatten) — do not raise thresholds, add `# noqa`, - or otherwise silence them. For complex refactors touching many lines - or public API, propose a plan and wait for approval. - -## Architecture - -- Eager top-of-module imports by default. Lazy imports only to break - circular deps or to keep `core/` free of heavy imports on rarely- - called paths (e.g. `help()`). -- No `pkgutil`/`importlib` auto-discovery, no background threads, no - monkey-patching or runtime class mutation. -- No `__all__`; control public API via explicit `__init__.py` imports. - No redundant `import X as X` aliases. -- Concrete classes use `@Factory.register`. Each package's `__init__.py` - must explicitly import every concrete class to trigger registration — - always update it when adding a class. -- Switchable categories (factory-swappable at runtime) follow this fixed - API on the owner (experiment / structure / analysis): `` - (read-only), `_type` (getter+setter), - `show_supported__types()`, `show_current__type()`. - The owner owns the type setter and show methods; show methods delegate - to `Factory.show_supported(...)`. Required even if only one - implementation exists. -- Categories are flat siblings within their owner. Never nest a category - as a child of another category of a different type; cross-reference - via IDs instead. -- Every finite, closed set of values (factory tags, axes, enumerated - descriptors) is a `(str, Enum)`; compare against members, not raw - strings. -- Keep `core/` free of domain logic (base classes and utilities only). -- Don't introduce abstractions before a concrete second use case. Don't - add dependencies without asking. - -## Testing - -- Every new module, class, or bug fix ships with tests. See - `docs/dev/adrs/accepted/test-strategy.md` for the full strategy. -- Unit tests mirror the source tree: - `src/easydiffraction//.py` → - `tests/unit/easydiffraction//test_.py`. 
Verify with - `pixi run test-structure-check`. Supplementary tests: - `test__coverage.py`. Category packages with only - `default.py`/`factory.py` may use one parent-level - `test_.py`. -- Tests expecting `log.error()` to raise must `monkeypatch` Logger to - RAISE mode (another test may have leaked WARN mode). -- `@typechecked` setters raise `typeguard.TypeCheckError`, not - `TypeError`. -- No test-ordering dependence, no network, no sleeping, no real - calculation engines in unit tests. - -## Tutorials - -- Notebooks in `docs/docs/tutorials/*.ipynb` are generated artifacts. - Edit only the corresponding `*.py`, then run - `pixi run notebook-prepare`. - -## Change Discipline - -- Before any structural/design change (new categories, factories, - switchable-category wiring, datablocks, CIF serialisation), read - `docs/dev/adrs/index.md` and the relevant accepted ADRs. Localised bug - fixes or test updates need only this file. -- Development documentation lives under `docs/dev/`. Use - `docs/dev/adrs/index.md` as the architecture and decision navigation - surface; there is no separate `architecture.md` source of truth. -- Project is in beta: no legacy shims, no deprecation warnings — update - tests and tutorials to the current API. -- Minimal diffs; don't reformat working code. Fix only what's asked; - flag adjacent issues as comments. Don't add features or refactor - unless asked. Don't remove TODOs or comments unless the change fully - resolves them. -- Never remove or replace existing functionality without explicit - confirmation — highlight every removal and wait for approval. -- When renaming or auditing usages, search the entire project (code, - tests, tutorials, docs). Use `git grep -n` because all contributors - have Git; do not assume `rg` is installed. If `git grep` is - unavailable, fall back to `find ... -type f` plus `grep -n`. 
-- Each change is atomic and single-commit-sized: make one change, - suggest the commit message, then stop and wait for confirmation. -- When in doubt, ask. - -## Commits - -- Suggest a commit message after each change: code block, ≤72 chars, - imperative mood, no type prefix, no `Co-authored-by: Copilot`. - Examples: - - Add ChebyshevPolynomialBackground class - - Implement background_type setter on Experiment - - Standardize switchable-category naming convention -- Stage only the files modified for the step, using explicit paths where - practical. Do not include data, project, CIF, or other generated - artifacts produced by integration/script/notebook tests unless the - user explicitly asked to update them. -- Before each commit, inspect the worktree and avoid staging unrelated - user changes. If unrelated dirty files exist, leave them untouched and - mention them only when relevant. - -## Workflow - -Non-trivial changes use a two-phase workflow: - -- **Phase 1 — Implementation.** Code and docs updates only. Update ADRs - when the change affects architecture or documented decisions. Do not - create or run tests unless the user explicitly asks. When done, - present for review and iterate until approved. -- **Phase 2 — Verification.** Add/update tests, then run `pixi run fix`, - `pixi run check`, `pixi run unit-tests`, `pixi run integration-tests`, - `pixi run script-tests`. - -Notes: - -- `pixi run fix` regenerates `docs/dev/package-structure/full.md` and - `docs/dev/package-structure/short.md` automatically — never edit those - by hand. Don't review auto-fixes; accept and move on. Then - `pixi run check` until clean. -- When a check command needs saved output for analysis, capture the log - and preserve the command exit code with a zsh-safe variable name: - `pixi run check > /tmp/easydiffraction-check.log 2>&1; check_exit_code=$?; tail -n 200 /tmp/easydiffraction-check.log; exit $check_exit_code`. - Never assign to `status` in zsh; it is readonly. 
Use task-specific - names such as `check_exit_code`, `unit_tests_exit_code`, or - `script_tests_exit_code`. -- Open issues / design questions / planned improvements live in - `docs/dev/issues/open.md` (priority-ordered). On resolution, move to - `docs/dev/issues/closed.md` and update the relevant ADR or - `docs/dev/adrs/index.md` if affected. - -### Planning - -When asked to create a plan: - -- Start the plan by referencing this file: - `.github/copilot-instructions.md`. State any deliberate exception to - these instructions in the plan itself. -- First gather enough repository context to make the plan concrete. Ask - all ambiguous or unclear questions in one concise batch; record - unresolved questions in the plan if the user wants it saved before - answering them. -- Save plans as `docs/dev/plans/.md` (lowercase, - dash-separated, e.g. `docs/dev/plans/background-refactor.md`). When a - plan implements one ADR, use the same slug as the ADR file; for - example, `docs/dev/adrs/suggestions/foo.md` maps to - `docs/dev/plans/foo.md`. If a plan has no corresponding ADR or spans - multiple ADRs, choose a concise feature slug and list all related ADRs - in the plan. Use the same `` for the implementation - branch (`feature/`). Do not push the branch unless - asked. -- Include a status checklist with `[ ]` items; mark `[x]` as completed - during implementation. -- Apply the two-phase workflow (Phase 1 implementation, Phase 2 - verification) to non-trivial plans. Stop after Phase 1 and ask the - user to review before starting Phase 2. -- The plan must explicitly state that, when an AI agent follows it, - every completed Phase 1 implementation step must be staged with - explicit paths and committed locally before moving to the next - implementation step or the Phase 1 review gate. Follow the rules in - **Commits**. Keep commits atomic, single-purpose, and aligned with - plan steps. 
-- If implementation uncovers a serious requirement, risk, design issue, - or scope change not covered by the plan, stop and ask the user for - clarification or approval before proceeding. Record the unresolved - issue in the plan when useful. -- The plan should be easy to maintain while working: include concrete - files likely to change, decisions already made, open questions, - verification commands for Phase 2, and a short suggested commit - message or branch name when useful. -- Verification commands in plans must include the zsh-safe log-capture - pattern from **Workflow** whenever saved output is needed for later - analysis. -- Before saving a plan, verify that referenced files, directories, - scripts, and task names exist locally when that is practical. If a - referenced tool is optional or missing, include an available fallback. -- End every plan with a "Suggested Pull Request" section containing a - short PR title and a brief end-user-oriented description. Keep this - section non-technical enough for scientists and other users to - understand the benefit. Update it during implementation if extra - approved changes become important enough to mention in the PR title or - description. 
diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml index d96e5b87d..5208a3333 100644 --- a/.github/workflows/coverage.yml +++ b/.github/workflows/coverage.yml @@ -63,32 +63,8 @@ jobs: files: ./coverage-unit.xml token: ${{ secrets.CODECOV_TOKEN }} - # Job 2: Run integration tests with coverage and upload to Codecov - integration-tests-coverage: - runs-on: ubuntu-latest - - steps: - - name: Check-out repository - uses: actions/checkout@v6 - - - name: Set up pixi - uses: ./.github/actions/setup-pixi - - - name: Run integration tests with coverage - run: - pixi run integration-tests-coverage --cov-report=xml:coverage-integration.xml - - - name: Upload integration tests coverage to Codecov - if: ${{ !cancelled() }} - uses: ./.github/actions/upload-codecov - with: - name: integration-tests-job - flags: integration - files: ./coverage-integration.xml - token: ${{ secrets.CODECOV_TOKEN }} - # Job 4: Build and publish dashboard (reusable workflow) run-reusable-workflows: - needs: [docstring-coverage, unit-tests-coverage, integration-tests-coverage] # depend on the previous jobs + needs: [docstring-coverage, unit-tests-coverage] # depend on the previous jobs uses: ./.github/workflows/dashboard.yml secrets: inherit diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index ea6bdad11..27a9f9543 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -121,6 +121,12 @@ jobs: # if: false # Temporarily disabled to speed up the docs build run: pixi run notebook-exec-ci + # Sync the canonical Three.js snapshot into the docs assets so + # MkDocs can serve it (belt-and-braces; docs-build also depends on + # this task). 
+ - name: Sync vendored JS into docs assets + run: pixi run docs-sync-vendored-js + # Build the static files for the documentation site for local inspection # Input: docs/ directory containing the Markdown files # Output: site/ directory containing the generated HTML files diff --git a/.github/workflows/tutorial-tests.yml b/.github/workflows/tutorial-tests.yml index f924a31d3..4e93a6383 100644 --- a/.github/workflows/tutorial-tests.yml +++ b/.github/workflows/tutorial-tests.yml @@ -47,7 +47,7 @@ jobs: - name: Test tutorials as python scripts shell: bash - run: pixi run script-tests + run: pixi run script-tests-checked - name: Prepare notebooks shell: bash @@ -55,7 +55,7 @@ jobs: - name: Test tutorials as notebooks shell: bash - run: pixi run notebook-tests + run: pixi run notebook-tests-checked # Job 2: Build and publish dashboard (reusable workflow) run-reusable-workflows: diff --git a/.gitignore b/.gitignore index ec2e0d270..22e2be679 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,10 @@ build/ # MkDocs docs/site/ +# Generated docs-serving copy of the canonical Three.js (synced from +# src/ by `pixi run docs-sync-vendored-js`; the source of truth is src/) +docs/docs/assets/javascripts/vendor/threejs/ + # Jupyter Notebooks .ipynb_checkpoints @@ -45,6 +49,10 @@ CMakeLists.txt.user* *.log *.zip +# Agents +AGENTS.md +CLAUDE.md + # ED # Used to fetch tutorials data during their runtime. Need to have '/' at # the beginning to avoid ignoring 'data' module in the src/. 
diff --git a/.prettierignore b/.prettierignore index a08c3c48b..32b5970dc 100644 --- a/.prettierignore +++ b/.prettierignore @@ -25,6 +25,15 @@ docs/docs/assets/ # Node node_modules +# Vendored snapshots +src/easydiffraction/display/structure/renderers/vendor/ +src/easydiffraction/report/templates/html/vendor/ +src/easydiffraction/report/templates/tex/styles/ +src/easydiffraction/utils/_vendored/jupyter_dark_detect/ + +# Tox +.tox + # Misc .benchmarks .cache diff --git a/THIRD_PARTY_LICENSES.md b/THIRD_PARTY_LICENSES.md new file mode 100644 index 000000000..96e7c70ec --- /dev/null +++ b/THIRD_PARTY_LICENSES.md @@ -0,0 +1,25 @@ +# Third-Party Licenses + +This file indexes third-party assets vendored in this repository. + +## Report LaTeX Styles + +The vendored report LaTeX style files are documented in +`src/easydiffraction/report/templates/tex/styles/LICENSES.md`. + +## Report HTML Assets + +The vendored report HTML assets are documented in +`src/easydiffraction/report/templates/html/vendor/LICENSES.md`. + +## Structure-View Three.js + +The vendored Three.js assets for the crysview structure view (MIT) are +documented in +`src/easydiffraction/display/structure/renderers/vendor/threejs/LICENSES.md`. + +## Structure-View Element Data + +The bundled per-element radii and colour palettes (with per-source +provenance) are documented in +`src/easydiffraction/display/structure/assets/LICENSES.md`. diff --git a/docs/dev/adrs/accepted/analysis-cif-fit-state.md b/docs/dev/adrs/accepted/analysis-cif-fit-state.md index a38f0e7c7..9051c0ff4 100644 --- a/docs/dev/adrs/accepted/analysis-cif-fit-state.md +++ b/docs/dev/adrs/accepted/analysis-cif-fit-state.md @@ -15,8 +15,8 @@ Analysis and fitting. ## Context `analysis/analysis.cif` already persists analysis configuration such as -`_fitting.minimizer_type`, `_fitting.mode_type`, aliases, constraints, -and active fit-mode settings. 
That configuration alone is not enough to +`_minimizer.type`, `_fitting_mode.type`, aliases, constraints, and +active fit-mode settings. That configuration alone is not enough to reopen a saved project and continue the same fit-result, plotting, and command-line workflow. @@ -25,10 +25,11 @@ Analysis-owned fit state needs to persist: - fit bounds and bound provenance - pre-fit scalar snapshots for recovery workflows - compact status metadata for the latest saved fit projection +- software-provenance snapshot for the latest successful fit - deterministic correlation summaries -- Bayesian summary metadata and manifests for bulk array sidecars -- plot-ready Bayesian caches so restored posterior displays do not need - to recompute on first use +- minimizer-specific fit outputs on the paired `_fit_result.*` category +- per-parameter posterior summaries on `_fit_parameter` +- large posterior arrays and plot caches in `analysis/results.h5` Committed model parameter values and uncertainties already persist in structure and experiment CIF files through the accepted free-flag CIF @@ -41,13 +42,13 @@ projection. This ADR defines that narrower saved projection. ## Decision -Persist analysis-owned fit state as explicit sibling categories in -`analysis/analysis.cif`, with large Bayesian arrays stored in +Persist analysis-owned fit state as explicit analysis categories in +`analysis/analysis.cif`, with large posterior arrays stored in `analysis/results.h5`. Do not add a dedicated `_fit_state` category or `_fit_state.schema_version`. Persisted fit state is detected from -`_fit_result` and the related fit-state categories. +`_fit_result`, `_fit_parameter`, and `_fit_parameter_correlation`. 
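
As a minimal sketch of the detection rule stated in this Decision, the check could look like the following. The category names come from the ADR text; the function name `has_persisted_fit_state`, the flat list-of-tags input, and the reading that any one of the three categories is enough are assumptions made for illustration, not the project's actual loader.

```python
from __future__ import annotations

# Category names quoted from the ADR. Everything else here is a
# hypothetical illustration, not EasyDiffraction API.
FIT_STATE_CATEGORIES = frozenset({
    '_fit_result',
    '_fit_parameter',
    '_fit_parameter_correlation',
})


def has_persisted_fit_state(cif_tags: list[str]) -> bool:
    """Return True if any tag belongs to a fit-state category.

    Assumption: one matching category is sufficient. There is no
    dedicated `_fit_state` marker or schema version to look for.
    """
    for tag in cif_tags:
        # '_fit_result.success' -> '_fit_result'
        category = tag.split('.', 1)[0]
        if category in FIT_STATE_CATEGORIES:
            return True
    return False
```

With this shape, a file holding only `_minimizer.type` and `_fitting_mode.type` is treated as configuration without saved fit state.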
### Common fit-state categories @@ -63,11 +64,35 @@ pre-fit scalar snapshots: - `param_unique_name` - `fit_min` - `fit_max` -- `fit_bounds_uncertainty_multiplier` - `start_value` - `start_uncertainty` -`_fit_result` stores the latest saved fit header: +When any row has uncertainty-derived bounds, `_fit_parameter` also +stores the provenance field: + +- `fit_bounds_uncertainty_multiplier` + +For Bayesian fit projections, `_fit_parameter` also stores per-parameter +posterior summaries: + +- `posterior_best_sample_value` +- `posterior_median` +- `posterior_uncertainty` +- `posterior_interval_68_low` +- `posterior_interval_68_high` +- `posterior_interval_95_low` +- `posterior_interval_95_high` +- `posterior_gelman_rubin` +- `posterior_effective_sample_size_bulk` + +`_fit_result` stores the latest saved fit header and scalar +family-specific fit outputs. In the default project save this category +is topology-neutral: single-crystal and powder fits both persist under +`_fit_result.*`. The IUCr submission export may remap these same values +to topology-specific dictionary categories (`_refine_ls.*`, +`_pd_proc_ls.*`, `_reflns.*`) as described by +[`iucr-cif-tag-alignment.md`](iucr-cif-tag-alignment.md), but the +round-trip project schema remains common. - `result_kind` - `success` @@ -80,15 +105,40 @@ pre-fit scalar snapshots: correlation summaries keyed by a persisted `id`. Only unique parameter pairs are stored. -### Deterministic fit projection +### Software provenance + +`_software` stores the runtime software snapshot recorded after a +successful fit. It is part of the project save / load contract and feeds +report rendering plus the IUCr export software labels. It is not a user +configuration category. 
+ +The category stores name, version, and URL triples for: + +- `framework` +- `calculator` +- `minimizer` + +Each role is persisted as scalar items on the same category: + +- `_name` +- `_version` +- `_url` + +The category also stores: + +- `timestamp` + +`timestamp` is an ISO-8601 UTC string for the fit that produced the +snapshot. Projects saved before this category existed load with all +software fields unset and `timestamp` set to `None`; rerunning a fit +populates the snapshot. -Deterministic fits persist `_deterministic_result` in addition to the -common categories above. +### Minimizer fit projection -`_deterministic_result` stores compact optimizer metadata and counts: +The active `_minimizer.*` category stores user-selected solver inputs +only. Scalar outputs are written to the paired `_fit_result.*` category. +Deterministic fit-result classes add compact fit output counts: -- `optimizer_name` -- `method_name` - `objective_name` - `objective_value` - `n_data_points` @@ -98,59 +148,100 @@ common categories above. - `covariance_available` - `correlation_available` +These deterministic fields are always written once a deterministic +fit-result projection exists. 
+ +Reflection-result fields are written only when a fitted experiment has +persisted reflection rows: + +- `R_factor_all` +- `wR_factor_all` +- `R_factor_gt` +- `wR_factor_gt` +- `threshold_expression` +- `number_reflns_total` +- `number_reflns_gt` + +Powder-profile fields are written only when the result contains powder +profile diagnostics: + +- `prof_R_factor` +- `prof_wR_factor` +- `prof_wR_expected` +- `profile_function` +- `background_function` + +Restraint and constraint counts are written only when positive: + +- `number_restraints` +- `number_constraints` + +The deterministic R-factor, profile, restraint / constraint, and +reflection-aggregate fields use dictionary-canonical item names where +those exist, including uppercase `R` / `wR`, while retaining the +project-side `_fit_result` category prefix in the default save. Live +deterministic fit results may also carry transient diagnostics such as +`shift_over_su_max` and `shift_over_su_mean`; those are not written to +`analysis/analysis.cif` until a topology-specific persistence contract +needs them. + +When the LSQ backend provides a termination reason that differs from the +common `_fit_result.message`, deterministic fit results also store: + +- `exit_reason` + Do not persist a `_deterministic_parameter_result` category. Final deterministic parameter values and uncertainties already persist in the model CIF files, and restored deterministic ordering comes from `_fit_parameter`. 
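
The conditional-write rules for the optional deterministic `_fit_result.*` fields (reflection, powder-profile, restraint / constraint counts, and `exit_reason`) can be summarised in a short sketch. The field names are taken from this ADR; the function `optional_fit_result_fields` and its keyword arguments are hypothetical and only illustrate the rules, under the assumption that each condition is evaluated independently.

```python
from __future__ import annotations

# Field names quoted from the ADR. The grouping constants and the
# function below are hypothetical illustrations.
REFLECTION_FIELDS = [
    'R_factor_all',
    'wR_factor_all',
    'R_factor_gt',
    'wR_factor_gt',
    'threshold_expression',
    'number_reflns_total',
    'number_reflns_gt',
]
POWDER_PROFILE_FIELDS = [
    'prof_R_factor',
    'prof_wR_factor',
    'prof_wR_expected',
    'profile_function',
    'background_function',
]


def optional_fit_result_fields(
    *,
    has_reflection_rows: bool,
    has_powder_profile: bool,
    number_restraints: int,
    number_constraints: int,
    message: str,
    exit_reason: str | None,
) -> list[str]:
    """List the optional field names that would be written."""
    fields: list[str] = []
    if has_reflection_rows:
        fields += REFLECTION_FIELDS
    if has_powder_profile:
        fields += POWDER_PROFILE_FIELDS
    # Counts are written only when positive.
    if number_restraints > 0:
        fields.append('number_restraints')
    if number_constraints > 0:
        fields.append('number_constraints')
    # exit_reason is written only when it adds information beyond
    # the common `_fit_result.message`.
    if exit_reason is not None and exit_reason != message:
        fields.append('exit_reason')
    return fields
```

The always-written deterministic fields (`objective_name`, `n_data_points`, and so on) are deliberately left out, since they do not depend on any condition.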
-### Bayesian fit projection - -Bayesian fits persist these additional categories: +Bayesian minimizer classes store sampler inputs under `_minimizer.*`: -- `_bayesian_result` -- `_bayesian_sampler` -- `_bayesian_convergence` -- `_bayesian_parameter_posterior` -- `_bayesian_distribution_cache` -- `_bayesian_pair_cache` -- `_bayesian_predictive_dataset` +- `sampling_steps` +- `burn_in_steps` +- `thinning_interval` +- `population_size` +- `parallel_workers` +- `initialization_method` +- `random_seed` -`_bayesian_result` stores the saved Bayesian header and sidecar flags, -including `sidecar_file`, `has_posterior_samples`, -`has_distribution_cache`, `has_pair_cache`, and -`has_posterior_predictive`. +Bayesian fit-result classes store scalar outputs under `_fit_result.*`: -`_bayesian_sampler` stores the resolved sampler settings used for the -run. `parallel` persists the resolved non-negative worker count as an -integer. +- `point_estimate_name` +- `sampler_completed` +- `credible_interval_inner` +- `credible_interval_outer` +- `resolved_random_seed` +- `gelman_rubin_max` +- `effective_sample_size_min` +- `best_log_posterior` -`_bayesian_convergence` stores convergence metadata and posterior array -shape counts. +When the backend reports an acceptance rate, Bayesian fit results also +store: -`_bayesian_parameter_posterior` stores one summary row per sampled -parameter, including credible intervals, uncertainty, ESS, and R-hat. -Its row order defines the saved posterior parameter order. +- `acceptance_rate_mean` -`_bayesian_distribution_cache`, `_bayesian_pair_cache`, and -`_bayesian_predictive_dataset` store manifest rows for plot-ready -posterior caches. Distribution and predictive caches are persisted for -any Bayesian fit with posterior samples, including single-parameter -fits. Pair caches and posterior correlation summaries are only persisted -when more than one parameter was sampled. 
+Bayesian per-parameter posterior summaries are stored on the +corresponding `_fit_parameter` rows. Their row order defines the saved +posterior parameter order. -`parameter.posterior` is not part of this accepted design. This ADR -persists analysis-level posterior summaries and caches only. Any future -parameter-level posterior API remains a separate decision. +`FitResults.optimizer_name` and `FitResults.method_name` are restored +from the active minimizer category class instead of being persisted as +independent CIF fields. Each concrete minimizer category declares a +class-level `_engine_metadata: ClassVar[dict[str, str]]` containing +those two display values. This keeps the persisted projection to the +user-selected `_minimizer.type` and removes duplicated deterministic +metadata from `_minimizer.*`. -### Bayesian sidecar +### Posterior sidecar -Persist large Bayesian arrays in `analysis/results.h5` using `h5py`. -This includes canonical posterior arrays and any saved distribution, -pair, and predictive cache arrays referenced by the CIF manifests. +Persist large posterior arrays in `analysis/results.h5` using `h5py`. +This includes canonical posterior arrays and saved distribution, pair, +and predictive cache arrays. The HDF5 file is self-describing; no CIF +manifest rows or sidecar filename tags are persisted. -The persisted `sidecar_file` value is a local file name only. It must -resolve to a basename inside the project `analysis/` directory. Absolute -paths and traversal paths are rejected and fall back to `results.h5`. +The sidecar filename is fixed to `results.h5` inside the project +`analysis/` directory. If the sidecar is missing on load, summary rows in `analysis/analysis.cif` still restore fit tables and metadata. Features @@ -167,10 +258,11 @@ posterior displays. Load order is: 1. standard analysis configuration -2. common fit-state categories -3. deterministic or Bayesian fit-specific categories according to - `_fit_result.result_kind` -4. 
Bayesian sidecar arrays when a Bayesian sidecar is expected +2. `_minimizer.*` settings according to the active `_minimizer.type` +3. `_software.*` provenance fields when present +4. common and family-specific `_fit_result.*` fields on the paired class +5. `_fit_parameter` and `_fit_parameter_correlation` +6. posterior sidecar arrays when a Bayesian result is expected Persist backend runtime objects, optimizer instances, and raw driver payloads nowhere in this design. @@ -186,7 +278,8 @@ values remain in the model CIF files instead of being duplicated in a second deterministic per-parameter result loop. Bayesian persistence spans CIF metadata and an HDF5 sidecar, so save and -load must validate consistency between manifest rows and bulk datasets. +load must validate consistency between `_fit_parameter` rows and bulk +datasets. The accepted runtime fit-results ADR should now be read as runtime-only except where this narrower projection explicitly persists fit-state diff --git a/docs/dev/adrs/accepted/category-owner-sections.md b/docs/dev/adrs/accepted/category-owner-sections.md index 46f9b3db0..0091b4c59 100644 --- a/docs/dev/adrs/accepted/category-owner-sections.md +++ b/docs/dev/adrs/accepted/category-owner-sections.md @@ -79,17 +79,19 @@ Project-level configuration follows the same pattern via a private Its current children are: - `ProjectInfo` -- `Rendering` +- `Chart` +- `Table` The public API stays flat and user-facing: - `project.info` -- `project.rendering` +- `project.rendering_plot` +- `project.rendering_table` Saved `project.cif` remains a section file without a `data_` header. It -serializes the `_project.*` metadata category and the `_rendering.*` -configuration category without pretending that the project config is a -real datablock. +serializes the `_project.*` metadata category plus the +`_rendering_plot.*` and `_rendering_table.*` configuration categories +without pretending that the project config is a real datablock. ### 4. 
CIF serialization is split by responsibility diff --git a/docs/dev/adrs/accepted/crysview-structure-visualization.md b/docs/dev/adrs/accepted/crysview-structure-visualization.md new file mode 100644 index 000000000..0852726f3 --- /dev/null +++ b/docs/dev/adrs/accepted/crysview-structure-visualization.md @@ -0,0 +1,742 @@ +# ADR: crysview Structure Visualization + +## Status + +Accepted. + +**Date:** 2026-05-31 + +**Implementation note:** A later implementation pass split +structure-view configuration into `rendering_structure`, +`structure_view`, and `structure_style`. This ADR reflects that final +surface; no separate `structure-view-settings` ADR exists. + +## Context + +EasyDiffraction refines crystal structures but offers no interactive 3D +view of them. The accepted [Display UX Facade](display-ux.md) ADR +defines `project.display` for 1D pattern charts and for parameter, fit, +and posterior tables and plots, but nothing spatial: there is no way to +look at the atoms, the unit cell, or the anisotropic displacement +parameters a refinement is adjusting. + +A working prototype establishes the target experience and the data it +needs. 
It lives at +[`crysview-threejs-demo.html`](crysview-structure-visualization/crysview-threejs-demo.html) +and demonstrates, against a non-orthogonal unit cell: + +- atoms as spheres with element radius and colour; +- anisotropic ADP ellipsoids (semi-axis lengths plus orientation); +- mixed-occupancy atoms drawn as occupancy wedges (a sphere split by + site occupancy); +- two-colour bonds split at their midpoint; +- magnetic-moment arrows; +- an a/b/c axis triad drawn longer than the cell edges; +- a Plotly-style modebar: perspective/parallel projection toggle, + view-along-a/b/c buttons, a home/reset button, and per-feature + visibility toggles for cell, axes, atoms, bonds, moments, and labels; +- a shrink-wrapped legend, hover tooltips, and persistent atom labels; +- orbit / zoom / pan controls and both perspective and orthographic + cameras, with parallel projection as the default. + +The prototype's input comment already separates the concerns: a +crystallography layer performs symmetry expansion, fractional → +Cartesian conversion, ADP eigendecomposition, and element-radius lookup; +a visualization layer chooses sizes, colours, bonds, and occupancy +splitting; and the renderer only consumes prepared geometry. + +Relevant facts about the current codebase: + +- The structure model lives under + `src/easydiffraction/datablocks/structure/categories/`: `cell`, + `atom_sites` (fractional coordinates, occupancy, isotropic ADP), + `atom_site_aniso` (anisotropic ADP), and `space_group`. +- The 1D charting subsystem already uses a switchable-engine pattern. + `project.rendering_plot.type` selects a plotter engine implemented + under `src/easydiffraction/display/plotters/` (`ascii.py`, + `plotly.py`), and `project.rendering_table.type` selects a tabler. + These follow the switchable-category ADRs, with CIF tags + `_rendering_plot.type` and `_rendering_table.type`. +- `easycrystallography` is **not** a dependency today and is not + imported anywhere in `src/`. 
Any layering that places a separate + visualization package between `easycrystallography` and + `easydiffraction` is therefore a future direction, not the current + state. + +The audience is scientists, often non-programmers, working mostly in +Jupyter notebooks and the planned GUI. Discoverability, clear names, and +safe defaults take priority over developer ergonomics. + +## Decision + +This ADR records the accepted first version of the structure viewer. +Points that earlier reviews left open are settled in the sections below; +the historical open questions are retained only to document the final +choices. + +### 1. Build a renderer-neutral structure scene + +crysview converts a crystal structure into a prepared, renderer-neutral +**structure scene**: a flat collection of typed primitives expressed in +Cartesian space and carrying no rendering-library types. The primitive +set matches the prototype: + +- atom spheres (centre, radius, colour); +- occupancy wedges for mixed-occupancy sites; +- ADP ellipsoids (semi-axes scaled to the configured probability, plus + orientation); +- bonds (two endpoints, split colour); +- magnetic-moment arrows; +- unit-cell edges; +- the a/b/c axis triad; +- text labels. + +All crystallographic computation — symmetry expansion over the +configured cell range (section 3), fractional → Cartesian conversion, +ADP eigendecomposition, radii and colours from the selected model and +colour scheme, bond detection, and occupancy splitting — happens while +building the scene, upstream of any renderer. This is the contract the +prototype already assumes. + +### 2. Draw the scene with thin, pluggable renderers + +Renderers consume the scene and draw it; they hold no crystallographic +logic. 
Renderer choice mirrors `project.rendering_plot.type`: + +- an ASCII renderer for terminal, CLI, and headless contexts; +- a Three.js renderer for notebooks (embedded HTML/JS) and standalone + HTML; +- a raster renderer that emits a static, z-buffered PNG image for the + TeX/PDF report — a trimetric projection of the same scene with a + per-pixel depth buffer, so hidden-surface removal is exact (atoms, + bonds, cell edges, and axes all occlude correctly). It is **not** a + user-selectable engine (it is invoked by the report, like `pgfplots` + is for the fit plot). The z-buffer rasterisation is plain numpy; it + uses `Pillow` to draw the a/b/c axis labels and the element legend and + to encode the PNG; +- a Qt Quick 3D renderer for the GUI is planned. + +ASCII and Three.js are the initial interactive engines, shipping +together exactly as the `ascii` and `plotly` chart engines do; the +raster renderer serves the TeX/PDF report, and Qt Quick 3D follows for +the GUI. + +A switchable engine selector is added on the project owner, parallel to +`project.rendering_plot` / `project.rendering_table`. It is named +`rendering_structure`: + +```python +project.rendering_structure.type = 'auto' # default: 'threejs' in Jupyter, 'ascii' in a terminal +project.rendering_structure.show_supported() +``` + +with CIF tag `_rendering_structure.type`. The name parallels +`rendering_plot` / `rendering_table`, and follows the category-owned +selector contract: `project.rendering_structure` is a read-only +attribute on the owner; `project.rendering_structure.type` is the +writable selector; `project.rendering_structure.show_supported()` lists +engines. Switching `type` calls the owner's private +`_swap_rendering_structure` hook, which rebinds the active renderer — +the same Family B rebinding the plot engine selector uses — so no public +`rendering_structure_type` setter or +`show_supported_rendering_structure_types()` is added. 
The default is +`auto`, which resolves at draw time to `threejs` in a Jupyter notebook +and `ascii` in a terminal — exactly as `_rendering_plot.type` / +`_rendering_table.type` resolve their environment defaults. + +### 3. Add a `structure()` entry point on the display facade + +Add `project.display.structure(struct_name=...)`, parallel to the +existing `project.display.pattern(expt_name=...)`. It renders one +structure with the active `rendering_structure` engine — interactive 3D in a notebook, +a schematic projection in the terminal. In the Three.js engine, feature +visibility, projection, and view-along presets are interactive through +the modebar with sensible defaults (parallel projection; cell, axes, +atoms, and bonds visible, plus moments where the data exists; labels +off). In a notebook it embeds an interactive view (an IPython HTML +representation); like the HTML report it can also write a standalone +HTML file to a path. The exact return and save signature is left to the +implementation plan. + +Content selection uses an `include=` argument: + +```python +project.display.structure(struct_name='lbco') +project.display.structure(struct_name='lbco', include='auto') +project.display.structure( + struct_name='lbco', + include=('atoms', 'bonds', 'cell', 'axes', 'moments', 'labels'), +) +``` + +`include='auto'` shows what the structure state supports (cell, axes, +atoms, bonds, and moments where moment data exists; labels off by +default). The option vocabulary is `auto`, `atoms`, `bonds`, `cell`, +`axes`, `moments`, and `labels`. ADP ellipsoids and mixed-occupancy +splits are not separate keywords: they are drawn automatically as part +of `atoms` (anisotropic ADP gives an ellipsoid, isotropic a sphere; a +mixed site is split), so the data decides. The interactive modebar +toggles the same features after the initial view is drawn, so `include` +sets the starting state and the modebar refines it. 
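The `include=` behaviour described above can be illustrated with a minimal sketch. The helper name `resolve_include` and the `has_moment_data` flag are hypothetical and not part of the ADR's API; the sketch also ignores the persisted `structure_view` flags and per-engine support, and the `ValueError` for unknown names is an assumption about validation, not a documented behaviour.

```python
from __future__ import annotations

# Option vocabulary from this ADR (excluding the 'auto' keyword itself).
FEATURES = ('atoms', 'bonds', 'cell', 'axes', 'moments', 'labels')

# What 'auto' asks for before data availability is checked; labels are off.
AUTO_DEFAULTS = ('atoms', 'bonds', 'cell', 'axes', 'moments')


def resolve_include(
    include: str | tuple[str, ...] = 'auto',
    *,
    has_moment_data: bool = False,
) -> tuple[str, ...]:
    """Return the features shown when the view first opens (sketch only)."""
    requested = AUTO_DEFAULTS if include == 'auto' else tuple(include)

    # Assumption: names outside the vocabulary are rejected up front.
    unknown = [name for name in requested if name not in FEATURES]
    if unknown:
        raise ValueError(f'Unknown include option(s): {unknown}')

    # Moments are skipped, not errored, when the structure has no moment data.
    return tuple(
        name for name in requested if name != 'moments' or has_moment_data
    )
```

Under these assumptions, `resolve_include()` yields the four always-available features, and `moments` joins them only when `has_moment_data=True`.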
+ +A companion `project.display.show_structure_options(struct_name=...)` +lists each `include=` option with whether the active engine and the +current structure state support it, and the reason when they do not — +for example `moments` is unavailable until the structure model carries +moment fields, and the `ascii` engine reports the features only the 3D +engines draw. This gives the structure view per-option discoverability. + +The view also has a spatial extent: which symmetry-equivalent atoms the +scene contains. The scene builder takes the unique (asymmetric-unit) +atoms, applies the space-group symmetry, and keeps every generated copy +whose fractional coordinates fall within a per-axis range, **borders +included**. The default range is `[0, 1]` on each of a, b, and c, so a +full unit cell is drawn with the atoms on the 0 and 1 faces, edges, and +corners all present (a corner site therefore appears at all eight +corners). The range is user-settable per axis, validated so each minimum +is below its maximum, and need not be integer — `[0, 2]` along a draws +two cells, `[-0.2, 1.2]` adds a margin. Like the other settings it is +persisted and overridable per call: + +```python +# Persisted per-axis bounds — six scalar settings, like the cell +# parameters (defaults 0 and 1 on each axis = the full cell, borders +# included): +project.structure_view.range_a_max = 2 # two cells along a +project.structure_view.range_c_min, project.structure_view.range_c_max = -0.2, 1.2 # margin on c + +# A convenience tuple overrides the persisted range for one call only: +project.display.structure( + struct_name='lbco', + range=((0, 2), (0, 1), (0, 1)), +) +``` + +Symmetry expansion can map several operations onto one point — a site on +a special position, or the shared 0-and-1 faces the default range keeps +— so the scene builder applies a **scene-atom identity rule** as it +collects copies. 
Two generated atoms are the same scene atom when they +come from the same atom-site row _and_ their fractional coordinates +coincide within a small tolerance (`1e-4` in fractional units); the +builder keeps one and drops the rest. The tolerance is far below any +cell fraction, so a copy at 0 and its border-included copy at 1 are +distinct positions and both survive — special-position overlaps collapse +without discarding the intentional boundary translations. The atom-site +row participates in the key, so two different rows that happen to share +a position are not merged here; that case is occupancy grouping, handled +next. + +When two or more atom-site rows resolve to the same position (within +that same tolerance), the scene builder groups them into one +**occupancy-wedge sphere** rather than overdrawing coincident spheres: +each row contributes a wedge whose angular share is proportional to its +occupancy. Coincident position is the only grouping signal the model +offers — atom sites carry an occupancy but no disorder-group or +occupancy-group field — so it is the documented version-1 criterion. +When the grouped occupancies sum below one, the remainder is drawn as a +vacancy wedge so the empty fraction is visible (a lone site with +occupancy below one is the one-row case); when they meet or exceed one, +the shares are normalized to their sum and no vacancy wedge is drawn. +The builder invents no occupancies — it shows exactly what the rows +carry. + +Because expansion happens in the scene builder (section 1), the 3D +engines draw this expanded set in full. The `ascii` engine is the +reduced-fidelity sibling (section 7): it always renders the single +default cell and reports a wider view range as a 3D-only capability +through `show_structure_options()`, the same way it announces the other +features only the 3D engines draw. + +### 4. 
Start internal, design for later extraction + +Implement crysview first as an internal subpackage that mirrors +`display/plotters/` (for example +`src/easydiffraction/display/structure/` with renderers under a +`renderers/` subpackage). Keep the scene model free of +easydiffraction-domain imports so it can later be extracted into a +standalone `crysview` package and, eventually, consume +`easycrystallography`. Do **not** add `easycrystallography` as a +dependency now. + +### 5. Pin and deliver Three.js deliberately + +The prototype loads a pinned Three.js (`three@0.160.0`) plus +`OrbitControls` and `CSS2DRenderer` through a CDN importmap. Production +ships the pinned Three.js bundled with the package so the notebook and +standalone-HTML views are autonomous — they render with no network, +which is what a CDN-blocked or sandboxed context (where the demo renders +blank) needs. This mirrors how the existing report path already embeds +its JavaScript when asked: `report/html_renderer.py` exposes an +`offline` flag that sets `include_plotlyjs=True` (embed) versus `'cdn'`, +gated in the template by `html_offline`. + +The HTML report's structure view follows the same rule: it honours the +report's `html_offline` flag, embedding the Three.js assets when offline +and otherwise linking them, so a structure figure behaves like the +existing Plotly figures in a report. + +### 6. Source styling from standard models and colour schemes + +Atom radii and colours are not typed in per element. 
They follow from +**standard, user-selected models** that every structure viewer +recognises, looked up automatically from each atom's element (and its +charge where the model needs it): + +- a **radius model** turns an element into a sphere radius — van der + Waals, ionic (Shannon; the site charge where a model carries one, + otherwise a documented per-element default, see below), or covalent; +- a **colour scheme** is a named element-colour palette — the Jmol/CPK + scheme, the VESTA scheme, and similar well-known sets. + +A scientist picks one model and one scheme instead of editing dozens of +per-element rows, which keeps the view consistent and reproducible: + +```python +project.structure_style.atom_view = 'covalent' # vdw | covalent | ionic | adp +project.structure_style.color_scheme = 'jmol' # jmol | vesta +project.structure_style.atom_view.show_supported() +project.structure_style.color_scheme.show_supported() +``` + +How an atom is sized and shaped is a single **display-style switch**, +`atom_view`, because the standard radius models and the ADP probability +surface are alternative depictions and a view shows one of them at a +time: + +- `'vdw'`, `'covalent'`, `'ionic'` draw every atom as a **radius-model + sphere** for the named standard radius table; displacement parameters + do not affect size. This is the familiar ball-and-stick depiction and + works for any structure, with or without ADP. +- `'adp'` draws each atom as its **ADP probability surface** — a sphere + for an atom with only isotropic ADP, an ellipsoid (semi-axes and + orientation from the ADP tensor) for an anisotropic one. Atoms that + carry no ADP fall back to a covalent-radius sphere. This is the + thermal-ellipsoid (ORTEP) depiction crystallographers use to inspect + the displacement parameters a refinement adjusts. + +The default is `'covalent'`, because it gives every structure a stable +charge-free ball view. 
Users can switch to `'adp'` when they want to +inspect displacement surfaces. + +> **Amendment — `atom_view` merge.** An earlier design split this into +> two settings: `atom_shape` (`ball`/`ortep`) and `radius_model` +> (`vdw`/`covalent`/`ionic`/`atomic`). They were merged into the single +> `atom_view` selector because `radius_model` was meaningful only in +> ball mode, so the two-field form carried four degenerate +> `ortep`×radius-model combinations. The flat list removes the dead +> states and matches how VESTA/Mercury present the choice. The +> `atomic`/empirical option was then dropped, leaving +> `{vdw, covalent, ionic, adp}`: its radii are within a few percent of +> `covalent` for most elements (and identical for some), so after +> ball-size compression it was visually indistinguishable and added a +> redundant choice. The atomic radii remain in the element database, +> unused by the public selector. The `adp` view still uses covalent +> radii for the ball fallback and for mixed-occupancy sites. CIF field: +> `_structure_style.atom_view`. + +In `'adp'` the surfaces are drawn at one **probability level**, +`adp_probability`, a fraction in the open interval (0, 1) — not a +percentage — validated on assignment. It defaults to `0.5` (the ORTEP +and journal 50% convention) and is freely changeable (for example +`0.95`). It has no effect in the radius-model views. + +Which bonds the view draws is **not** a styling choice — it is a +geometric property of the structure, and it follows the **standard +cif_core `_geom` auto-bonding model**, not the display `atom_view`. A +bond is drawn between two sites when their distance `d` satisfies +`_geom.min_bond_distance_cutoff ≤ d ≤ r_bond(i) + r_bond(j) + _geom.bond_distance_incr`, +where the per-type bonding radius `r_bond` is `_atom_type.radius_bond` +when the structure carries it, otherwise the element's covalent radius +from the bundled database. 
Matches are then pruned to the first +coordination shell — a contact is kept only when it is within `1.3×` the +nearer atom's nearest-neighbour distance — so the large covalent radii +of ionic A-site cations do not bond to every surrounding anion (a +heuristic stop-gap; see open issue #108 for the full near-neighbour +approach). These two cutoffs live on the **structure** and persist in +the structure's own CIF (see section 8), not in +`project.structure_style`. The `atom_view` radius models (vdw / covalent +/ ionic) change only the rendered sphere _size_ — they never decide +which bonds appear; bond detection is governed solely by the `_geom` +cutoffs and the per-type bonding radius. Version 1 draws bonds computed +on the fly from this rule while the scene is built and persists no bond +table. The full computed bond and angle geometry — the standard +`_geom_bond` and `_geom_angle` loops, with distances, angles, symmetry +codes, and standard uncertainties — is a separate, related feature that +reuses the same symmetry-expansion and distance math (see Deferred +Work). + +`atom_view` and `color_scheme` are finite, closed value sets, so each is +a `(str, Enum)` validated on assignment per the +[Enum-Backed Closed Value Sets](enum-backed-closed-values.md) ADR, and +each selector lists its accepted values through descriptor-level +`show_supported()` — for example +`project.structure_style.atom_view.show_supported()`. `structure_style` +is a plain category, not a switchable one: it has no factory-swapped +`type`, only these validated value settings. + +The defaults are the **`covalent`** atom view and the **Jmol/CPK** +colour scheme, so the view looks right with no configuration. 
Covalent +radii are preferred because they are backed by complete, well-documented +per-element data and need no oxidation state: today's atom-site model +carries only an element symbol — no charge, oxidation-state, or +coordination field — so a model that depends on charge cannot be +resolved per site yet. + +The radii and colours come from a **bundled element database** — a +package asset, like the colour palettes, not a per-project value, so it +is not CIF-serialized; the project CIF records only which model and +scheme are selected. The database carries, per element, the van der +Waals, covalent, ionic (a representative Shannon radius at a documented +default oxidation state and coordination), and atomic/empirical radii, +plus the Jmol/CPK and VESTA colour palettes, each value carrying a +documented provenance. The ionic entries let `atom_view = 'ionic'` work +today against the documented default oxidation state; when a future +atom-site charge field exists the ionic model will prefer the site's +charge. An element with no entry for the selected radius model falls +back to its covalent radius, and `show_structure_options()` reports the +substitution instead of failing. Version 1 adds no per-element overrides +on top of the chosen model and scheme. + +All of this is CIF-persisted, so a reopened project renders identically. +The decision is that styling is **an atom-shape mode plus model, scheme, +and probability-level selection**, not a per-element table; the exact +CIF tag names and serialization shape are pinned in the implementation +plan (Open Questions, resolved). + +The view also adapts to the host's **colour theme**. 
Like the Plotly +chart engine — which selects the `plotly_dark` or `plotly_white` +template from the detected theme — the structure view reuses the +project's existing dark/light detection (`is_dark()` in +`utils/_vendored`) and switches the scene background and the label, +axis, and edge colours to match, so a notebook in dark mode gets a dark +canvas. Element colours still come from the selected colour scheme +regardless of theme; only the surrounding canvas and annotations follow +it. The theme is auto-detected, not a persisted styling value. + +### 7. Terminal view (ASCII engine) + +The `ascii` engine renders in the terminal, mirroring the existing +`ascii` chart plotter: it builds a character grid and prints it, with no +GUI or JavaScript. Like that chart engine — which openly announces the +features only Plotly can draw — it is a deliberately reduced-fidelity +sibling of the 3D engines: one schematic projection, one unit cell, and +no bonds, labels, ADP ellipsoids, or moment arrows. When an `include=` +request asks for one of those features, the engine announces it is +available with the 3D engines and skips it, just as the ascii chart +engine does for Plotly-only features. A view range wider than the +default single cell is treated the same way: the terminal view always +draws one cell and announces that multi-cell and margin ranges are +honored only by the 3D engines, so its schematic stays uncluttered and +the single parallelogram never disagrees with the atoms it frames. + +Like the other engines it consumes the same renderer-neutral scene +(section 1): it projects the scene's Cartesian atom centres and +unit-cell edges onto a plane and draws a schematic 2D view. The longest +in-plane cell axis runs horizontally, the shortest vertically, and the +remaining (middle-length) axis is the viewing direction. + +The cell is drawn as a schematic parallelogram. 
Its two side edges are +rasterized with the asciichartpy glyph set (`│ ╭ ╮ ╯ ╰ ─`), and the +staircase slope encodes the in-plane angle: near 90° gives long `│` runs +with few corners (a rectangle at exactly 90°), while a larger deviation +from 90° introduces more `╭╯` steps (mirrored to `╰╮` for the opposite +lean). The view is schematic — lengths and angles are approximate, just +enough to convey the cell — so non-orthogonal cells render the same way +as orthogonal ones, with the slant shown rather than dropped. + +Atoms are drawn as coloured Unicode circles: colour by element from the +selected colour scheme (the scene colour from section 6, mapped to the +nearest terminal colour) and size by a small radius-bucketed glyph ramp +(for example `· • ● ⬤`). Each axis arrow points to its letter: the +vertical axis is the letter stacked over an up-arrow above the cell (`c` +then `↑`), and the horizontal axis is a right-arrow pointing to the +letter at the end of the bottom-border line, after a short gap (`→ a`). +Each axis arrow and its letter are tinted with that axis's colour — the +same a/b/c colours the scene gives the 3D engines, mapped to the nearest +terminal colour and reset afterwards, just as the existing ASCII chart +legend colours its entries. A legend maps each glyph to its element +name, and both the legend glyph and its element label are tinted with +that element's colour-scheme colour (mapped to the nearest terminal +colour and reset afterwards), so the atoms in the cell, the legend, and +the axis letters all share the one selected colour scheme. The mocks +below are monochrome; a real terminal shows these colours. 
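As a rough illustration of the radius-bucketed glyph ramp, the sketch below maps a radius onto the four glyphs by equal-width buckets between the smallest and largest radius in the scene. The function name `glyph_for_radius`, the equal-width bucketing, and the choice of `●` when all radii are equal are assumptions made for this example; the ADR fixes only the ramp itself.

```python
from __future__ import annotations

# Four-step ramp from this ADR, smallest to largest.
GLYPH_RAMP = ('·', '•', '●', '⬤')


def glyph_for_radius(radius: float, r_min: float, r_max: float) -> str:
    """Pick a glyph for one atom from its radius (hypothetical bucketing)."""
    if r_max <= r_min:
        # Assumption: with a single radius in the scene, use a mid-size glyph.
        return GLYPH_RAMP[2]

    # Normalize into [0, 1], clamping radii outside the scene's range.
    fraction = (radius - r_min) / (r_max - r_min)
    fraction = min(max(fraction, 0.0), 1.0)

    # Equal-width buckets; the top edge falls into the last bucket.
    index = min(int(fraction * len(GLYPH_RAMP)), len(GLYPH_RAMP) - 1)
    return GLYPH_RAMP[index]
```

Bucketing relative to the scene's own radius range keeps the size contrast visible whichever radius model is selected.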
+ +An orthorhombic cell viewed down b, with vertical side edges: + +``` + c + ↑ + ╭─────────────────────────╮ + │ ● ● │ + │ ⬤ │ + │ • ● │ + ╰─────────────────────────╯ → a + + Legend: ● La ● Ba ⬤ Co • O +``` + +A monoclinic cell viewed down b, with slanted side edges: + +``` + c + ↑ + ╭─────────────────────────╮ + ╭╯ ● ● ╭╯ + ╭╯ ⬤ ╭╯ + ╭╯ • ● ╭╯ + ╰─────────────────────────╯ → a + + Legend: ● La ● Ba ⬤ Co • O +``` + +A small gap-free line helper provides the edge rasterization: it +generalizes the asciichartpy connector (fill vertical runs with `│`, cap +bends with corner glyphs) so it can be walked row-major for the +near-vertical edges that the column-major chart code cannot express. + +### 8. Configuring what is shown and how + +The view has three configuration axes — _which engine_ draws it, _what_ +is shown, and _how_ it is styled. They are three flat project +categories, all persisted to CIF: + +- `project.rendering_structure` selects the renderer engine only. +- `project.structure_view` stores durable content and region settings. +- `project.structure_style` stores durable appearance settings. + +```python +# How: renderer engine +project.rendering_structure.type = 'auto' # default: 'threejs' in Jupyter, 'ascii' in a terminal +project.rendering_structure.show_supported() + +# How: standard styling models, not per-element values (visual only) +project.structure_style.atom_view = 'covalent' # vdw | covalent | ionic | adp +project.structure_style.color_scheme = 'jmol' # jmol | vesta +project.structure_style.adp_probability = 0.5 # ADP probability level (0, 1) +project.structure_style.atom_scale = 0.3 # overall atom scale (0, 1] +project.structure_style.atom_view.show_supported() +project.structure_style.color_scheme.show_supported() + +# Which bonds exist: a per-structure geometric property, not styling. +# Standard cif_core _geom auto-bonding (r_bond defaults to covalent radius): +# bond iff min_cutoff <= d <= r_bond(i) + r_bond(j) + incr. 
+structure = project.structures['lbco'] +structure.geom.min_bond_distance_cutoff = 0.0 # default 0.0 Å +structure.geom.bond_distance_incr = 0.25 # default 0.25 Å (documented, tunable) + +# What (per call): content for one view, overriding the initial defaults +project.display.structure(struct_name='lbco') # 'auto' +project.display.structure( + struct_name='lbco', + include=('atoms', 'bonds', 'cell', 'axes'), +) + +# Initial view state (persisted): what is shown when the view opens. The +# Three.js modebar stays active, so the user can still toggle each +# feature live afterwards. show_moments stays inert until the structure +# model carries moment fields (see Deferred Work). +project.structure_view.show_labels = False +project.structure_view.show_moments = True + +# What region (persisted): six per-axis fractional bounds (defaults 0 and +# 1 = full cell, borders included), mirroring the six scalar cell +# parameters. +project.structure_view.range_a_min = 0 +project.structure_view.range_a_max = 1 # range_b_min/max and range_c_min/max likewise +``` + +The persisted equivalent in the project CIF: + +``` +# In the project CIF (project-level view + style): +_rendering_structure.type auto + +_structure_view.show_labels false +_structure_view.show_moments true +_structure_view.range_a_min 0 +_structure_view.range_a_max 1 +_structure_view.range_b_min 0 +_structure_view.range_b_max 1 +_structure_view.range_c_min 0 +_structure_view.range_c_max 1 + +_structure_style.atom_view covalent +_structure_style.color_scheme jmol +_structure_style.adp_probability 0.5 +_structure_style.atom_scale 0.3 + +# In the structure (sample) CIF, beside _cell / _atom_site (per-structure): +_geom.min_bond_distance_cutoff 0.0 +_geom.bond_distance_incr 0.25 +``` + +The `_rendering_structure.type` tag follows `_rendering_plot.type` / +`_rendering_table.type` from the Display UX Facade ADR, including their +`auto` environment-default convention (resolved to `threejs` in Jupyter, +`ascii` in a 
terminal); `_geom.min_bond_distance_cutoff` and +`_geom.bond_distance_incr` are the **standard cif_core** bond-cutoff +tags (`_atom_type.radius_bond` is the standard per-type bonding radius, +used when present). The `_structure_view.*`, `_structure_style.*`, and +`_rendering_structure.type` tags are project-internal app settings. + +Initial visibility resolves in a fixed order, so a reopened project and +a per-call request behave predictably: + +1. **An explicit `include=(...)` tuple wins outright.** The view opens + showing exactly those features; persisted `_structure_view.show_*` + flags are ignored for that call. So `include=('atoms',)` shows only + atoms even when `show_labels=True` is persisted. +2. **`include='auto'`** — the default, and what a bare `structure()` + call uses — resolves each feature in turn from: data availability + first (a feature with no data is off, such as moments without moment + fields), then the persisted `_structure_view.show_*` flag where one + exists, then the built-in default otherwise. Version 1 persists flags + only for the two features whose default a scientist most often flips + — `show_labels` (off) and `show_moments` (on where data exists); + atoms, bonds, cell, and axes follow their built-in 'auto' defaults + and are set per call through an explicit `include=` tuple. So + `show_labels=True` with `include='auto'` opens with labels on. +3. **Unsupported options are skipped and announced, never errored.** + Whether it arrived through an explicit tuple or 'auto', a feature the + engine cannot draw (any 3D-only feature under `ascii`) or the data + does not support (moments without fields) is reported by + `show_structure_options()` and at draw time. +4. 
**Live modebar changes apply on top of that initial state and are + runtime-only.** Toggling a feature in the Three.js modebar never + rewrites the persisted `_structure_view.show_*` flags or the + `include=` set, so reopening the project restores the resolved + initial state rather than the last live toggle. + +## Consequences + +- `project.display` gains a spatial view (`structure()`) that + complements the 1D `pattern()` view with an `include=` feature + selector. +- `project.display` also gains `show_structure_options()`, so the + supported content for a given structure and engine is discoverable + with reasons. +- Keeping crystallography in the scene builder and out of renderers lets + several front-ends (Three.js now, Qt Quick 3D later) share one model. +- A switchable `rendering_structure` category + (`project.rendering_structure.type`, CIF `_rendering_structure.type`) + selects only the engine, per the switchable-category and + category-owner ADRs. Plain `structure_view` and `structure_style` + sibling categories hold content/region and appearance settings. +- The `ascii` and `threejs` engines ship together, mirroring the chart + engines: `ascii` needs no JavaScript and renders a schematic view in + the terminal, CLI, and headless contexts, while `threejs` covers + notebooks and HTML. +- Content selection (`include=`) and a small set of visibility flags + become persisted _initial-view_ settings, so a project reopens looking + the same; the interactive engines still let the user toggle features + live. +- The scene's spatial extent is configurable: a per-axis fractional + range (default `[0, 1]`, borders included) decides which + symmetry-equivalent atoms are generated, so a single cell, an added + margin, or several cells need no new primitives. The 3D engines draw + the expanded set; the `ascii` engine draws the single default cell and + reports wider ranges as a 3D-only capability. 
+- The styling category lets scientists choose a standard atom view + (`vdw`, `covalent`, `ionic`, or `adp`), a colour scheme, an ADP + probability level, and an overall atom scale — not hand-edit + per-element rows — all CIF-persisted, with defaults that work + unconfigured. The radii and colours come from a bundled element + database (covalent, vdW, ionic, and atomic radii; Jmol/CPK and VESTA + palettes) shipped as a package asset. +- Bond generation is a per-structure geometric property, not styling: it + uses the standard cif_core `_geom` auto-bonding cutoffs + (`_geom.min_bond_distance_cutoff`, `_geom.bond_distance_incr`) plus a + per-type bonding radius (`_atom_type.radius_bond`, defaulting to the + covalent radius), all on the structure and persisted in the structure + CIF — not in `project.structure_style`, and independent of the display + `atom_view`. Version 1 draws bonds on the fly and persists no bond + table; the full computed `_geom_bond` / `_geom_angle` tables are + deferred to a separate feature. +- The structure view auto-detects the host's dark/light theme (reusing + the project's existing `is_dark()` detection) and adapts its + background and annotation colours, mirroring how the Plotly chart + engine switches templates; element colours still come from the + selected colour scheme. +- The scene builder must expose occupancy splitting, anisotropic ADP, + and magnetic moments. Where the current structure model lacks a field + (magnetic moments are not in `atom_sites`/`atom_site_aniso` today), + that feature stays gated until the model provides the data. +- A pinned Three.js version becomes a bundled package asset to keep up + to date, and the HTML report embeds it under `html_offline`. +- Tutorials and public API docs gain a structure-view example. 
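The bond rule restated in the consequences above can be sketched in a few lines. This is a minimal illustration of the distance criterion only: it assumes atoms are already expanded and converted to Cartesian coordinates, it omits the first-coordination-shell pruning, and the function name `detect_bonds`, the tuple layout, and the radii used in the usage note are hypothetical rather than taken from the bundled element database.

```python
from __future__ import annotations

import math
from itertools import combinations

# One scene atom: (label, Cartesian position in Å, bonding radius in Å).
Atom = tuple[str, tuple[float, float, float], float]


def detect_bonds(
    atoms: list[Atom],
    *,
    min_bond_distance_cutoff: float = 0.0,
    bond_distance_incr: float = 0.25,
) -> list[tuple[int, int, float]]:
    """Return (i, j, distance) for every pair passing the _geom criterion."""
    bonds = []
    for (i, atom_i), (j, atom_j) in combinations(enumerate(atoms), 2):
        distance = math.dist(atom_i[1], atom_j[1])
        # Upper bound: sum of the two bonding radii plus the increment.
        max_distance = atom_i[2] + atom_j[2] + bond_distance_incr
        if min_bond_distance_cutoff <= distance <= max_distance:
            bonds.append((i, j, distance))
    return bonds
```

For example, with illustrative radii of 1.26 Å for Co and 0.66 Å for O, a Co-O pair 1.9 Å apart passes the 2.17 Å upper bound, while a pair 3.0 Å apart does not.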
+ +## Alternatives Considered + +- **Reuse the 1D chart engines (Plotly) for 3D.** Rejected: Plotly's 3D + primitives do not express ADP ellipsoids, occupancy wedges, or + crystallographic camera/axis controls cleanly. +- **Put rendering directly in easydiffraction with no scene + abstraction.** Rejected: it couples crystallography to one rendering + library and blocks the planned GUI renderer. +- **Start as a standalone `crysview` package and adopt + `easycrystallography` now.** Rejected for the first step as a + premature dependency and repo split before the design is proven; + retained as the strategic direction. +- **Server-rendered static images instead of an interactive scene.** + Rejected: it loses the interactivity (rotate, toggle, view-along) + scientists expect when inspecting a structure. + +## Open Questions + +All items below are now **resolved** so the implementation plan can be +executed autonomously; the plan records the verified data sources and +the final names. + +- **CIF tag spelling — resolved (see the §8 _Updated_ note for the final + split).** Project CIF: `_structure_style.atom_view` / + `_structure_style.color_scheme` / `_structure_style.adp_probability` / + `_structure_style.atom_scale`; `_structure_view.show_labels` / + `_structure_view.show_moments` / + `_structure_view.range_{a,b,c}_{min,max}`; and + `_rendering_structure.type` (engine only). These are project-internal + app/settings tags (`_rendering_structure.type` follows the Display-UX + `_rendering_plot.type` / `_rendering_table.type` precedent); the radii + and colours are a bundled element-database asset, not CIF-serialized. +- **Per-structure bond-cutoff category — resolved (standard + `_geom.*`).** A single-record `structure.geom` category holding the + cif_core cutoffs `_geom.min_bond_distance_cutoff` (default `0.0` Å) + and `_geom.bond_distance_incr` (default `0.25` Å, documented and + tunable), in the structure datablock. 
A bond is drawn when + `min_bond_distance_cutoff ≤ d ≤ r_bond(i) + r_bond(j) + bond_distance_incr`, + with `r_bond` = `_atom_type.radius_bond` when present, else the + covalent radius. These are the **standard** cif_core tags (review-4 + finding 1): `_geom.min_bond_distance_cutoff` (dic 13084), + `_geom.bond_distance_incr` (dic 13044), `_atom_type.radius_bond` (dic + 25419); the earlier project-internal `_bonds.*` proposal was dropped. + The computed `_geom_bond.*` / `_geom_angle.*` loops remain reserved + for the deferred geometry tables. +- **ASCII rendering details — resolved.** A 4-bucket radius glyph ramp + (`· • ● ⬤`) and the 8/16-colour ANSI mapping the existing ascii chart + legend already uses. +- **Per-axis range boundary completion — resolved.** Version 1 draws + only atoms inside the range (borders included) and bonds only between + in-scene atoms — no out-of-range partner atoms or edge-coordination + completion. The range is persisted as six scalar tags + `_structure_view.range_{a,b,c}_{min,max}` (one number each, defaults 0 + and 1), mirroring the six scalar cell parameters; a per-call `range=` + tuple on `structure()` overrides them for one call. + +## Deferred Work + +- The computed bond and angle geometry tables — the standard + `_geom_bond` and `_geom_angle` loops (atom-pair/triplet labels, + distances, angles, site-symmetry codes, standard uncertainties, + `publ_flag`) — as a separate, related feature. It reuses crysview's + symmetry-expansion and distance math and the same per-structure + `_geom` cutoffs (extended with the angle/contact increments cif_core + already defines). Version 1 draws bonds on the fly from the `_geom` + bond cutoffs and persists no geometry table. +- The Qt Quick 3D renderer for the GUI. +- Magnetic-moment fields on the structure model (a separate + magnetic-structure effort); the scene's moment-arrow primitive stays + gated until they exist. 
+- Extraction of a standalone `crysview` package and the + `easycrystallography` layering. +- Advanced depictions beyond atoms, bonds, and ADP surfaces, such as + coordination polyhedra. Symmetry expansion and multiple-cell views are + in scope through the per-axis range (section 3). diff --git a/docs/dev/adrs/accepted/crysview-structure-visualization/crysview-threejs-demo.html b/docs/dev/adrs/accepted/crysview-structure-visualization/crysview-threejs-demo.html new file mode 100644 index 000000000..82148ff35 --- /dev/null +++ b/docs/dev/adrs/accepted/crysview-structure-visualization/crysview-threejs-demo.html @@ -0,0 +1,855 @@
[HTML prototype: markup not preserved. Page title: "crysview — Three.js structure prototype". Legend entries: A/B (50/50), C, D. Control hints: drag = rotate, wheel = zoom, right-drag = pan.]
+ + + + diff --git a/docs/dev/adrs/accepted/display-ux.md b/docs/dev/adrs/accepted/display-ux.md index 23a780812..edd55f3b3 100644 --- a/docs/dev/adrs/accepted/display-ux.md +++ b/docs/dev/adrs/accepted/display-ux.md @@ -42,23 +42,23 @@ defaults. ## Decision Use `project.display` as the user-facing facade for display actions. -Move serialized renderer settings out of that facade and into a separate -project category named `project.rendering`. +Move serialized renderer settings out of that facade and into separate +project categories named `project.rendering_plot` and +`project.rendering_table`. Renderer settings: ```python -project.rendering.chart_engine = 'plotly' -project.rendering.table_engine = 'pandas' -project.rendering.show_chart_engines() -project.rendering.show_table_engines() -project.rendering.show_config() +project.rendering_plot.type = 'plotly' +project.rendering_table.type = 'pandas' +project.rendering_plot.show_supported() +project.rendering_table.show_supported() ``` CIF names: -- `_rendering.chart_engine` -- `_rendering.table_engine` +- `_rendering_plot.type` +- `_rendering_table.type` No legacy loader is required for `_display.plotter_type` or `_display.tabler_type`. The project is in beta, so this cleanup may @@ -82,8 +82,6 @@ project.display.fit.series(param, versus='diffrn.ambient_temperature') project.display.posterior.pairs() project.display.posterior.distribution(param) project.display.posterior.predictive(expt_name='hrpt') - -project.display.show_pattern_options(expt_name='hrpt') ``` `project.analysis.display` is removed from the primary public API. 
Its @@ -116,77 +114,14 @@ project.display.pattern(expt_name='hrpt') project.display.pattern(expt_name='hrpt', x_min=40, x_max=55) ``` -By default, `pattern()` uses `include='auto'` and displays as much -useful information as the project state supports: - -- measured data if present -- calculated data if linked structure state and calculated intensities - are available -- background if powder Bragg measured and calculated data plus defined - background points are available -- Bragg ticks if powder Bragg measured and calculated data plus - reflection rows are available -- residual if both measured and calculated data are available and the - experiment type supports a residual panel -- excluded regions if available on the experiment -- uncertainty bands where posterior predictive data exists and the chart - engine supports them - -Specific subsets are selected with `include`: - -```python -project.display.pattern(expt_name='hrpt', include='auto') -project.display.pattern(expt_name='hrpt', include='measured') -project.display.pattern(expt_name='hrpt', include='calculated') -project.display.pattern( - expt_name='hrpt', - include=('measured', 'calculated', 'background', 'residual', 'bragg'), -) -``` - -`include` was chosen over alternatives: - -| Name | Reason not selected | -| ------------- | ----------------------------------------------- | -| `layers` | Sounds graphical rather than user intent. | -| `components` | Precise, but longer. | -| `content` | Too broad. | -| `view` | Better for presets than arbitrary combinations. | -| `series` | Does not fit residual rows or Bragg ticks well. | -| boolean flags | Explicit, but scales poorly. | - -Add discovery for supported pattern content: - -```python -project.display.show_pattern_options(expt_name='hrpt') -``` - -The table shows option name, description, availability for the -experiment, whether `include='auto'` includes it, and the reason an -option is unavailable. 
- -Pattern option names: - -- `auto` -- `measured` -- `calculated` -- `background` -- `residual` -- `bragg` -- `excluded` -- `uncertainty` - -`uncertainty` is available where posterior predictive data exists for a -supported experiment and the active chart engine can render bands. It is -unavailable, with a clear reason, when no posterior predictive data is -present. - -Explicit combinations are validated against the same project state used -by `include='auto'`. `background`, `bragg`, and `residual` require both -measured and calculated data in the same view. `excluded` requires -measured, calculated, or uncertainty content in the same view, and -excluded-region overlays currently require the experiment's default -x-axis. +`pattern()` renders every kind of data the project state supports — +measured, calculated, residual, Bragg ticks, background, excluded +regions, and posterior predictive uncertainty, each shown when +available. It takes no view-selection argument. The content rules, the +removed `include` / `show_pattern_options` design, and the shared +single- and multi-panel figure sizing are recorded in the +[Unified Pattern View](pattern-display-unification.md) ADR, which +supersedes the `include`-based pattern design once described here. ## Deterministic And Bayesian Consistency @@ -201,6 +136,12 @@ Use these naming rules: path for `versus`. - `posterior.*` names are used only when posterior samples are required. +`project.display.fit.results()` also prints a "Settings used" block +above the result tables. The block is sourced from +`analysis.minimizer.*` so the minimizer inputs and paired +`analysis.fit_result.*` outputs are visible from the accepted display +facade without adding a new `Analysis`-level display method. + ## Rejected Alternatives Flat display facade: @@ -224,8 +165,9 @@ users should not need to decide the output type before asking for information. Some outputs may render as a chart or a table depending on backend and state. 
-Separate `measured()` and `calculated()` methods were rejected because -they duplicate `pattern(..., include=...)`. +Separate `measured()` and `calculated()` methods are unnecessary: +`pattern()` shows every available kind of data directly, so there is no +subset for them to select. ## Consequences diff --git a/docs/dev/adrs/accepted/fit-mode-categories.md b/docs/dev/adrs/accepted/fit-mode-categories.md index 278d3752a..1ced6aa58 100644 --- a/docs/dev/adrs/accepted/fit-mode-categories.md +++ b/docs/dev/adrs/accepted/fit-mode-categories.md @@ -101,50 +101,65 @@ to keep legacy runtime aliases. ## Decision +This ADR is amended by +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md). +The active-sibling design remains, but the selector surface is now the +`FittingMode` category: + +```python +project.analysis.fitting_mode.type = 'sequential' +project.analysis.fitting_mode.show_supported() +project.analysis.fit() +``` + +The selector persists as `_fitting_mode.type`. The old +`analysis.fitting_mode_type`, `show_supported_fitting_mode_types()`, +`show_current_fitting_mode_type()`, and `_fitting.mode_type` surfaces +are superseded. + ### 1. Split fitting configuration from fit execution `Analysis.fit()` becomes the public operation that executes the current fit mode. -Common fitting configuration moves to a dedicated category: +Common fitting configuration lives directly on `Analysis`: ```python -project.analysis.fitting.minimizer_type = 'lmfit (leastsq)' +project.analysis.minimizer.type = 'lmfit (leastsq)' project.analysis.fit() ``` `project.analysis.fit` is no longer a category. It is an action method. -The common `fitting` category owns configuration shared by all fit +The owner-level analysis surface owns configuration shared by all fit modes. Initially this includes: - `minimizer_type` Additional settings that apply to all fit modes can be added here later. 
Verbosity remains a call-level or project-level concern and does not -need to be persisted in this category. +need a fitting category. -**Single source of truth.** `Analysis.fitting_mode_type` is the only +**Single source of truth.** `Analysis.fitting_mode.type` is the only writable surface for the active mode, and the only place the mode is -stored at runtime. The CIF field `_fitting.mode_type` (§8) is -synthesized directly from `analysis.fitting_mode_type` at serialization -time and applied back to the selector on load. There is no mirror -descriptor on the `fitting` category. This keeps the runtime model free -of duplicated state. +stored at runtime. The CIF field `_fitting_mode.type` (§8) is emitted +from the `FittingMode` category and applied back to that category on +load. There is no mirror descriptor on a `fitting` category. This keeps +the runtime model free of duplicated state. -### 2. Add an owner-level fitting-mode selector +### 2. Add a `FittingMode` selector category -`Analysis` owns the fitting-mode selector, following the existing -switchable-category style used by experiment categories. +`Analysis` owns a `fitting_mode` category whose `.type` selector follows +the common category-owned switchable selector style. -The selector name must start with the public category name. This mirrors -`peak_profile_type` and `show_peak_profile_types()`: the category is -`peak`, and the selected aspect is the peak profile. For fitting, the -category is `fitting`, and the selected aspect is the fitting mode. +The category name is the public noun. The selected value is always +exposed through `.type`, just like `analysis.minimizer.type` and +`experiment.peak.type`. 
```python -project.analysis.show_fitting_mode_types() -project.analysis.fitting_mode_type = 'sequential' +project.analysis.fitting_mode.show_supported() +project.analysis.fitting_mode.type = 'sequential' +print(project.analysis.fitting_mode.type) ``` The selector is backed by `FitModeEnum` and accepts: @@ -153,12 +168,14 @@ The selector is backed by `FitModeEnum` and accepts: - `joint` - `sequential` -`show_fitting_mode_types()` should show all fitting modes, mark the -current mode, and describe the execution requirements for each mode. It -should not hide `sequential` simply because the project currently has -only one experiment. Sequential fitting uses one template experiment -plus files from `sequential_fit.data_dir`, so filtering it out based on -experiment count is misleading. +`fitting_mode.show_supported()` should show all fitting modes and +describe the execution requirements for each mode. The active mode is +marked in the table; a separate show-current method is intentionally not +part of the public API. The supported list should not hide `sequential` +simply because the project currently has only one experiment. Sequential +fitting uses one template experiment plus files from +`sequential_fit.data_dir`, so filtering it out based on experiment count +is misleading. The selector changes the active fit mode and controls which mode-specific public categories are visible and serialized. @@ -166,12 +183,11 @@ mode-specific public categories are visible and serialized. Note that this is **not** the same mechanism as `peak_profile_type`. `peak_profile_type` swaps the concrete class behind a single category (`peak`); `fitting_mode_type` swaps which _sibling_ category -(`joint_fit` / `sequential_fit`) is active and visible. The `fitting` -category itself does not change shape. This is a new pattern — call it -the **active-sibling selector** — and it is documented here as a -first-class convention for owners that gate sibling categories on a -run-time choice. 
Future categories with the same shape should follow the -same naming and lifecycle rules. +(`joint_fit` / `sequential_fit`) is active and visible. This is the +**active-sibling selector** pattern, documented here as a first-class +convention for owners that gate sibling categories on a run-time choice. +Future categories with the same shape should follow the same naming and +lifecycle rules. ### 3. Keep mode-specific categories as flat Analysis siblings @@ -181,14 +197,14 @@ These categories are not nested under `fitting`. Public API: ```python -project.analysis.fitting_mode_type = 'joint' +project.analysis.fitting_mode.type = 'joint' project.analysis.joint_fit.create(experiment_id='sepd', weight=0.7) project.analysis.joint_fit.create(experiment_id='nomad', weight=0.3) project.analysis.fit() ``` ```python -project.analysis.fitting_mode_type = 'sequential' +project.analysis.fitting_mode.type = 'sequential' project.analysis.sequential_fit.data_dir = 'data/d20_scan' project.analysis.sequential_fit.file_pattern = '*.xye' project.analysis.sequential_fit.max_workers = 'auto' @@ -228,7 +244,7 @@ specified deterministically: - A `joint_fit` row whose `experiment_id` does not match any project experiment raises an error before fitting starts. It is not silently pruned, because that would mask user typos. -- Switching `fitting_mode_type` to `joint` does **not** auto-populate. +- Switching `fitting_mode.type` to `joint` does **not** auto-populate. Auto-population happens only at execution time so that intermediate configuration states are never silently mutated. @@ -419,17 +435,16 @@ categories are conditional workflow surfaces. The help output should show common analysis properties and only the category relevant to the active fit mode. 
-For `single` mode, help should show fitting configuration and the -`fit()` operation, but no joint or sequential category: +For `single` mode, help should show common analysis configuration and +the `fit()` operation, but no joint or sequential category: ```text Properties -fitting +minimizer display Methods fit() -show_fitting_mode_types() ``` For `joint` mode, help should additionally show: @@ -452,12 +467,12 @@ surface should only show categories relevant to the selected mode. ### 8. Serialize common and active mode-specific categories -Persist common fitting configuration in `analysis/analysis.cif` using a -category name that matches the new Python category: +Persist selector categories in `analysis/analysis.cif` using one +`_.type` tag per selector: ```cif -_fitting.minimizer_type "lmfit (leastsq)" -_fitting.mode_type sequential +_minimizer.type "lmfit (leastsq)" +_fitting_mode.type sequential ``` Persist only the active mode-specific category. @@ -465,8 +480,8 @@ Persist only the active mode-specific category. Sequential example: ```cif -_fitting.minimizer_type "lmfit (leastsq)" -_fitting.mode_type sequential +_minimizer.type "lmfit (leastsq)" +_fitting_mode.type sequential _sequential_fit.data_dir "data/d20_scan" _sequential_fit.file_pattern "*.xye" @@ -485,8 +500,8 @@ temperature diffrn.ambient_temperature "^TEMP\s+([0-9.]+)" false Joint example: ```cif -_fitting.minimizer_type "lmfit (leastsq)" -_fitting.mode_type joint +_minimizer.type "lmfit (leastsq)" +_fitting_mode.type joint loop_ _joint_fit.experiment_id @@ -498,8 +513,8 @@ nomad 0.3 Single example: ```cif -_fitting.minimizer_type "lmfit (leastsq)" -_fitting.mode_type single +_minimizer.type "lmfit (leastsq)" +_fitting_mode.type single ``` Inactive mode-specific categories should not be serialized. This avoids @@ -512,12 +527,13 @@ workflow, it is serialized only when the active fitting mode is Deserialization order must be: -1. restore the common `fitting` category -2. read `_fitting.mode_type` -3. 
set `analysis.fitting_mode_type` -4. restore the active mode-specific category, if present -5. restore active child collections such as `sequential_fit_extract` -6. restore other analysis categories such as aliases and constraints +1. read `_minimizer.type` +2. instantiate and restore `analysis.minimizer` +3. read `_fitting_mode.type` +4. set `analysis.fitting_mode.type` +5. restore the active mode-specific category, if present +6. restore active child collections such as `sequential_fit_extract` +7. restore other analysis categories such as aliases and constraints This mirrors the switchable-category restoration pattern used by experiment categories: the active mode is known before mode-specific @@ -547,8 +563,8 @@ new settings requires an explicit save step. ### Positive - `fit()` has one meaning: execute fitting. -- `fitting` has one meaning: common fitting configuration. -- Fit modes follow the same owner-level selection style as existing +- `minimizer.type` and `fitting_mode.type` live on their categories. +- Fit modes follow the same category-owned selector style as existing switchable categories. - `joint_fit` and `sequential_fit` are visible only when relevant. - Sequential fitting becomes runnable from CLI without a special Python @@ -559,7 +575,7 @@ new settings requires an explicit save step. public surfaces. - CIF structure is flat, explicit, and aligned with public API names. - Mode-specific configuration can grow independently without polluting - the common fitting category. + the common analysis surface. 
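The deserialization order in section 8 (minimizer first, then fitting mode, then only the active mode-specific category) can be sketched as below. This is a simplified illustration over a flat dictionary of CIF tags, not the library's loader: `AnalysisState`, `restore_analysis`, and `MODE_CATEGORY` are hypothetical names, the fallback defaults are assumptions, and loop categories such as `joint_fit` rows or `sequential_fit_extract` are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from fit mode to its sibling category name.
# 'single' has no mode-specific category in the accepted design.
MODE_CATEGORY = {
    'single': None,
    'joint': 'joint_fit',
    'sequential': 'sequential_fit',
}


@dataclass
class AnalysisState:
    # Default values here are assumptions made for the sketch.
    minimizer_type: str = 'lmfit (leastsq)'
    fitting_mode_type: str = 'single'
    active_category: dict[str, str] = field(default_factory=dict)


def restore_analysis(tags: dict[str, str]) -> AnalysisState:
    """Restore state in the order the ADR prescribes."""
    state = AnalysisState()

    # Steps 1-2: read `_minimizer.type` and restore the minimizer.
    state.minimizer_type = tags.get('_minimizer.type', state.minimizer_type)

    # Steps 3-4: read `_fitting_mode.type` and set the active mode
    # before touching any mode-specific category.
    mode = tags.get('_fitting_mode.type', state.fitting_mode_type)
    if mode not in MODE_CATEGORY:
        raise ValueError(f'Unknown fitting mode: {mode!r}')
    state.fitting_mode_type = mode

    # Step 5: restore only the active mode-specific category, if any.
    category = MODE_CATEGORY[mode]
    if category is not None:
        prefix = f'_{category}.'
        state.active_category = {
            tag[len(prefix):]: value
            for tag, value in tags.items()
            if tag.startswith(prefix)
        }
    return state
```

Knowing the mode before reading mode-specific tags is what lets the loader decide which sibling category to populate; in this sketch, tags that belong to an inactive category are simply ignored, which is an assumption rather than documented behaviour.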
### Trade-offs @@ -586,11 +602,14 @@ The following public API shapes are replaced by the new design: - `project.analysis.fit.mode` - `project.analysis.fit_sequential(...)` - `project.analysis.joint_fit_experiments` +- `project.analysis.fitting.minimizer_type` +- `project.analysis.minimizer_type` +- `project.analysis.fitting_mode_type` The replacement API is: -- `project.analysis.fitting.minimizer_type` -- `project.analysis.fitting_mode_type` +- `project.analysis.minimizer.type` +- `project.analysis.fitting_mode.type` - `project.analysis.joint_fit` - `project.analysis.sequential_fit` - `project.analysis.sequential_fit_extract` @@ -658,19 +677,18 @@ mode. It weakens help output and makes CIF harder to read. Rejected for the public API. -Although `_fitting.mode_type` is the CIF spelling, the public selector -should follow the existing switchable-category owner style: +Although `_fitting.mode_type` was the original CIF spelling, the public +selector should follow the category-owned switchable selector style: ```python -project.analysis.fitting_mode_type = 'sequential' +project.analysis.fitting_mode.type = 'sequential' ``` -A separate `fitting.mode` descriptor on the runtime `fitting` category -is also rejected: it would duplicate state already held by -`fitting_mode_type`. `_fitting.mode_type` is synthesized at -serialization time instead of being mirrored on a runtime object. +A separate `fitting.mode` descriptor on a runtime category is also +rejected: the accepted category is `fitting_mode`, not a resurrected +`fitting` intermediate. -### Replace the `fitting` category object per fit mode +### Replace a fitting category object per fit mode Rejected. 
@@ -679,12 +697,12 @@ directly, but switching by assigning a property on the object being replaced creates stale-reference hazards: ```python -fitting = project.analysis.fitting -project.analysis.fitting_mode_type = 'sequential' -# fitting may now point to the old object +mode_config = project.analysis.single_fit +project.analysis.fitting_mode.type = 'sequential' +# mode_config may now point to an inactive object ``` -Keeping `fitting` stable and adding active sibling mode categories gives +Keeping mode-specific categories as active siblings on `Analysis` gives better long-term API stability. ### Persist inactive mode-specific categories @@ -740,10 +758,10 @@ follow-up design topics that may need future ADRs if behaviour changes. token `auto`. Open: when CLI overrides resolve `auto` to a concrete integer for one run, is that integer ever written back, or is the token always preserved on disk regardless of runtime resolution? -- **Serialization order for `_fitting.*`.** §9 specifies - deserialization order. Open: pin serialization order too (mode first, - then `minimizer_type`, then mode-specific siblings) so generated files - are stable for diffing? +- **Serialization order for selector categories.** §9 specifies + deserialization order. Open: pin serialization order too (minimizer + type first, then fitting-mode type, then mode-specific siblings) so + generated files are stable for diffing? - **Failure mid-sequential-run.** Open: if `fit()` fails partway through a sequential scan, what is the state of `analysis/results.csv` and the persisted `sequential_fit` — resumable, discarded, or left as-is @@ -776,6 +794,10 @@ follow-up design topics that may need future ADRs if behaviour changes. - Optional `single_fit` category if single-mode-specific settings are introduced. -- A separate ADR for changing switchable category selectors globally - from owner-level names such as `peak_profile_type` toward - category-owned selectors such as `peak.profile_type`.
+ +## Resolved Follow-Up Work + +- [`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md) + changes switchable selectors globally from owner-level names such as + `peak_profile_type` toward category-owned selectors such as + `peak.type`. diff --git a/docs/dev/adrs/accepted/fit-results-display-naming.md b/docs/dev/adrs/accepted/fit-results-display-naming.md new file mode 100644 index 000000000..afcf508a5 --- /dev/null +++ b/docs/dev/adrs/accepted/fit-results-display-naming.md @@ -0,0 +1,336 @@ +# ADR: Fit Results Display Naming Convention + +## Status + +Accepted. + +## Date + +2026-05-25 + +## Group + +User-facing API. + +## Context + +`project.display.fit.results()` and `project.display.posterior.*` (see +[`display-ux.md`](display-ux.md)) currently emit fit-result tables with +inconsistent and sometimes long column headers across the two fitting +modes: + +- **LSQ:** `📈 Fitted parameters:` with columns + `start | fitted | uncertainty | change`. +- **Bayesian:** two tables. + - `📈 Committed parameters:` with columns + `start | best posterior sample | uncertainty | change`. + - `📊 Posterior parameter summaries:` with columns + `median | 95% interval | r-hat | ess bulk`. + +Three problems: + +1. `best posterior sample` (21 chars) is too wide for HTML / markdown + layouts and forces the other columns into narrow space. +2. `uncertainty` is the column header in both LSQ and Bayesian committed + tables but the underlying quantities differ (covariance-derived σ vs + posterior SD). The display layer does not annotate the difference. +3. LSQ's `fitted` and Bayesian's `best posterior sample` are + conceptually parallel (the value committed back to the project) but + the headers do not signal that parallelism, complicating side-by-side + reading. + +Two conventions guide the cross-method naming choice: + +- **IUCr CIF** prefers the `_su` suffix (standard uncertainty); `_esd` + (estimated standard deviation) is deprecated. 
+- **GUM** (Guide to the Expression of Uncertainty in Measurement) treats + Bayesian posterior SD and frequentist standard uncertainty as the same + physical quantity — 1σ of the inferred distribution of the measurand. + +Both converge on `s.u.` as the appropriate cross-method label. + +[`display-ux.md`](display-ux.md) defines facade method names but not +column headers or footnotes; +[`iucr-cif-tag-alignment.md`](iucr-cif-tag-alignment.md) defines +persisted CIF tag names but not display labels; +[`analysis-cif-fit-state.md`](analysis-cif-fit-state.md) defines Python +and CIF attribute names but not user-visible labels. Display naming for +fit-results tables is a real gap. + +## Decision + +### 1. Short headers paired with a footnote glossary + +Every fit-results table emits a glossary block immediately below the +table that expands the short column headers into one-line descriptions. +The footnote disambiguates per fitting mode so the column header itself +can stay short. + +### 2. Cross-method consistency where the physical quantity is the same + +Same column header where the underlying physical quantity matches: + +- `start` — initial parameter value, both modes. +- `value` — refined / committed value, both modes. +- `s.u.` — 1σ standard uncertainty, both modes (covariance for LSQ, + posterior SD for Bayesian; same physical meaning per GUM). +- `change` — `value − start`, both modes. + +Different headers only for Bayesian-only quantities (no LSQ analogue): +`median`, `95% CI`, `r-hat`, `ess bulk`. + +### 3. Canonical column layouts and titles + +**LSQ — `📈 Refined parameters:`** + +``` +| datablock | category | entry | parameter | units | start | value | s.u. | change | +``` + +Footnote: + +``` +start = parameter value before refinement +value = refined value from least-squares minimization +s.u. 
= standard uncertainty (1σ), from the covariance matrix +change = relative change from start, in %; ↑ = increase, ↓ = decrease +``` + +**Bayesian — `📈 Committed parameters:`** (title unchanged) + +``` +| datablock | category | entry | parameter | units | start | value | s.u. | change | +``` + +Footnote: + +``` +start = parameter value before sampling +value = estimate written back to the project (best posterior sample) +s.u. = standard uncertainty (1σ), the posterior standard deviation +change = relative change from start, in %; ↑ = increase, ↓ = decrease +``` + +**Bayesian — `📊 Posterior distribution:`** + +``` +| datablock | category | entry | parameter | units | median | 95% CI | r-hat | ess bulk | +``` + +Footnote: + +``` +median = 50th percentile of the marginal posterior +95% CI = 95% credible interval (2.5%–97.5%, asymmetric) +r-hat = Gelman–Rubin diagnostic (good convergence: r-hat ≤ 1.01) +ess bulk = bulk effective sample size (typically ≥ 400) +``` + +### 4. Title changes from the current implementation + +- `📈 Fitted parameters:` → `📈 Refined parameters:` (IUCr-style + "refinement" wording, also matches the cross-method `value` column). +- `📈 Committed parameters:` stays unchanged — the duality of "committed + values" vs "posterior distribution" is meaningful and worth preserving + on the Bayesian side. +- `📊 Posterior parameter summaries:` → `📊 Posterior distribution:` + (shorter and explicit about what the second table shows). + +### 5. Chart legend convention + +Chart legends use the full footnote-form name where the chart has +horizontal space. Where the plot title already signals context (e.g. +"Posterior distribution of "), legends may shorten to the +table-header form: + +- Posterior distribution plots: `estimate`, `median`, + `95% credible interval`. +- Measured-vs-calculated plots: `measured`, `calculated`. + +Existing chart legends that describe plot **type** (e.g. 
+`Marginal density`, `Posterior contours`, `Posterior samples`) are not +parameter-value labels and are out of scope for this ADR. + +### 6. Internal attribute names unchanged + +`Parameter.value`, `Parameter.uncertainty`, +`Parameter.posterior_uncertainty`, and every persisted CIF tag stay as +they are. This ADR governs **display strings only**, not the Python or +CIF API. + +## Addendum (2026-05-25): Fit-results table replaces emoji-line summary + +The original ADR specified two parameter-level tables for Bayesian fits +(`Committed parameters`, `Posterior distribution`) and one for LSQ +(`Refined parameters`), each below an emoji-line summary block +(`✅ Success: True`, `📏 Goodness-of-fit (reduced χ²): 1.29`, …). In +practice the emoji-line block grew long, mixed multi-value lines +(`📊 Convergence: status=passed, max_r_hat=1.004, …`) with single-value +lines (`📏 R-factor (Rf): 5.65%`), and split related information across +visually-different formats. + +The block is now rendered as **one additional 2-column table** per fit +method, sitting directly above the parameter tables: + +- LSQ: `📋 Least-squares fit results:` — title. +- Bayesian: `📋 Bayesian fit results:` — title. + +Column layout: `Metric | Value`, left/right alignment. Each row carries +one emoji-prefixed metric name in the first column and one scalar value +in the second. The previous `console.paragraph('Fit results')` / +`console.paragraph('Bayesian fit results')` section header is dropped — +the table title now signals the section. + +Canonical row order (top-to-bottom): + +1. `🧪 Minimizer` / `🧪 Sampler` — the minimizer.type string (e.g. + `lmfit (leastsq)`, `bumps (dream)`). +2. `✅ Overall status` — single shared value vocabulary: `success` / + `failed`. For LSQ this mirrors `FitResults.success`. For Bayesian + this is `success` only when the sampler completed _and_ convergence + passed, else `failed`. Per-metric convergence detail goes in rows + 12–16 below. +3. 
`💬 Engine message` _(Bayesian, optional)_ — the engine's free-form + status message, e.g. `DREAM sampling completed`. +4. `⏱️ Fitting time (seconds)` — `fitting_time`. +5. `🔁 Iterations` _(LSQ, optional)_ — shown only when + `FitResults.iterations > 0`. +6. `📏 Goodness-of-fit (reduced χ²)` — `reduced_chi_square`. +7–10. `📏 R-factor (Rf, %)`, `📏 R-factor squared (Rf², %)`, + `📏 Weighted R-factor (wR, %)`, `📏 Bragg R-factor (BR, %)` — each + row when the corresponding inputs are available. Units appear in the + metric name, so the value cell holds a bare number. (R-factors come + immediately after goodness-of-fit and before `Best log-posterior` — + both methods agree on this order.) +11. `📉 Best log-posterior` _(Bayesian, optional)_ — shown when + `best_log_posterior is not None`. +12–16. _(Bayesian only)_ Convergence rows derived from + `convergence_diagnostics`: + - `📊 Convergence status` — `passed` / `failed`. + - `📊 Max r-hat` — formatted to 3 decimals. + - `📊 Min ess bulk` — formatted to 1 decimal. + - `📊 Draws per chain`. + - `📊 Chains`. + +The shared-vocabulary `success` / `failed` for `Overall status` is +intentional cross-method consistency: a reader scanning LSQ and Bayesian +outputs side-by-side sees the same status word in the same row position +regardless of method. Bayesian-specific nuance (sampler completed but +convergence flagged, etc.) is exposed in the convergence rows below. + +**Rows dropped relative to the previous emoji-line summary:** + +- `🎯 Committed point estimate: Best posterior sample` — already + documented by the `Committed parameters` table footnote + (`value = estimate written back to the project (best posterior sample)`). +- `🔁 Sampler completed: yes` — redundant with `Overall status`. +- `⚙️ Sampler settings: steps=…, burn=…, …` — already in the + `Settings used` table above the fit-results table. +- The derived `samples = n_draws × n_chains` count — derived from the + `Draws per chain` and `Chains` rows immediately below.
+ +**Table-title icons.** The four fit-output tables now carry a +distinguishing icon in their title so the four blocks are visually +separable when scrolling: + +| Table | Title prefix | +| --------------------------------- | ------------------------------------------------------------ | +| Minimizer settings | `⚙️ Settings used:` | +| Fit-method summary | `📋 Least-squares fit results:` / `📋 Bayesian fit results:` | +| Committed values | `📈 Refined parameters:` / `📈 Committed parameters:` | +| Posterior summary (Bayesian only) | `📊 Posterior distribution:` | + +The icons are also the same emoji used inside the rows of the +corresponding fit-results-summary table (📏 for goodness-of-fit metrics, +📊 for convergence diagnostics), so the visual language is internally +consistent. + +**Internal-implementation note.** Helper `print_metrics_table(rows)` in +`easydiffraction.utils.utils` renders the new 2-column table from a list +of `[label, value]` rows. Both `reporting.FitResults.display_results()` +and `bayesian.BayesianFitResults.display_results()` build their rows via +a `_build_fit_results_rows()` instance method and feed +`print_metrics_table()`. The shared signature keeps the two methods +structurally parallel. + +## Consequences + +### Positive + +- Tables fit standard HTML / markdown width without truncating the + formerly 21-character `best posterior sample` column. +- Users can compare LSQ and Bayesian results column-by-column (`start`, + `value`, `s.u.`, `change` line up identically). +- IUCr / GUM-aligned terminology. +- The inline footnote glossary gives non-programmer users a + discoverability path without having to leave the table to read + external docs. +- Setting the convention in an ADR keeps future fit-result tables (a new + sampler, an alternative refinement strategy) on the same naming. 
+ +### Trade-offs + +- Existing tutorials, tests, and integration outputs that pin the + literal strings `Fitted parameters`, `Posterior parameter summaries`, + `fitted`, `best posterior sample`, `uncertainty`, `95% interval` need + updating in the implementation PR. +- `s.u.` is unfamiliar to readers who do not know GUM or IUCr CIF. The + footnote covers this; the compactness win at the column header is the + main argument. + +### ADRs related to this ADR + +None directly amended. This ADR complements: + +- [`display-ux.md`](display-ux.md) — defines facade method names; this + ADR fills in the column-header layer underneath. +- [`iucr-cif-tag-alignment.md`](iucr-cif-tag-alignment.md) — defines + persisted CIF tag names; this ADR is the matching display-time label + layer. +- [`analysis-cif-fit-state.md`](analysis-cif-fit-state.md) — defines + Python / CIF attribute names (e.g. `Parameter.uncertainty`, + `posterior_uncertainty`); display headers map to those without + renaming them. + +## Alternatives Considered + +### A. Keep `uncertainty` as the column header for both modes + +Pros: zero changes. Cons: ambiguous in Bayesian context (users may +confuse it with the 95% credible interval below); inconsistent with the +IUCr CIF `_su` convention; reinforces the wider `best posterior sample` +problem because it does not solve the layout issue. + +### B. `posterior SD` for Bayesian, `uncertainty` for LSQ + +Pros: explicit on the Bayesian side. Cons: different column headers for +the same physical quantity (1σ width), breaking the column-by-column +comparison; longer (12 chars for `posterior SD` vs 4 for `s.u.`). + +### C. Different headers for the committed-value column (`refined` vs `estimate` vs `value`) + +Three different headers for "the value committed to the project". Pros: +each header is accurate for its method.
Cons: breaks the cross-method consistency goal; +readers seeing `refined` next to `estimate` in side-by-side tables +wonder what the semantic difference is even though the underlying +quantity is the same. Decision: use neutral `value` everywhere and let the +footnote disambiguate. + +### D. Single Bayesian table covering both committed values and posterior summary + +Pros: one table to read. Cons: nine value columns plus identity columns +exceed standard HTML width and truncate. The two-table split is forced +by layout and meaningfully preserves the "what did I commit" vs "what +does the posterior look like" duality. + +## Deferred Work + +- The `acceptance rate` column in the posterior distribution table. Not + displayed by default today; a future ADR can decide whether it joins + the canonical layout or stays in a verbose mode. +- Inline footnote text vs Markdown link to a docs-site glossary. Inline + is the initial form; promotion to a glossary page is a future ADR if + footnote lengths grow. +- Localisation. All display strings are English; non-English UIs are out + of scope. diff --git a/docs/dev/adrs/accepted/help-discoverability.md b/docs/dev/adrs/accepted/help-discoverability.md index 8fd019f39..a3a1abace 100644 --- a/docs/dev/adrs/accepted/help-discoverability.md +++ b/docs/dev/adrs/accepted/help-discoverability.md @@ -9,14 +9,14 @@ Accepted and implemented. EasyDiffraction is used by scientists who often explore the API in notebooks. The main object graph already exposes many focused objects: projects, project metadata, structures, experiments, categories, -parameters, analysis helpers, summaries, and display facades. Users need -a consistent way to discover the next useful operation from any of these +parameters, analysis helpers, reports, and display facades. Users need a +consistent way to discover the next useful operation from any of these objects without reading source code.
Most model objects inherit `GuardedBase`, `CategoryItem`, `CategoryCollection`, `DatablockItem`, or `DatablockCollection`, which already provide `help()` output. Plain facade classes such as display -namespaces and summaries do not inherit those base classes, so they need +namespaces and reports do not inherit those base classes, so they need the same discovery behavior explicitly. ## Decision @@ -28,7 +28,7 @@ includes: - category items and category collections - datablock items and datablock collections - project-level objects such as `Project`, `ProjectInfo`, `Analysis`, - `Summary`, and `Rendering` + `Report`, and `Rendering` - display facades such as `project.display`, `project.display.parameters`, `project.display.fit`, `project.display.posterior`, and `analysis.display` @@ -52,7 +52,7 @@ project.help() project.display.help() project.display.parameters.help() project.analysis.display.help() -project.summary.help() +project.report.help() project.experiments.help() project.experiments['hrpt'].help() project.experiments['hrpt'].background.help() diff --git a/docs/dev/adrs/accepted/iucr-cif-tag-alignment.md b/docs/dev/adrs/accepted/iucr-cif-tag-alignment.md new file mode 100644 index 000000000..d36193346 --- /dev/null +++ b/docs/dev/adrs/accepted/iucr-cif-tag-alignment.md @@ -0,0 +1,1164 @@ +# ADR: IUCr CIF Tag Alignment + +**Status:** Accepted +**Date:** 2026-05-26 + +Reframes the earlier "IUCr CIF Tag Alignment for Fit Outputs" suggestion +(2026-05-24, PR #181) into a tiered policy. The default saved CIFs stay +optimised for day-to-day UX; a separate IUCr export path produces clean +report CIFs on demand. Amends parts of +[`analysis-cif-fit-state.md`](analysis-cif-fit-state.md) and +[`minimizer-input-output-split.md`](minimizer-input-output-split.md); +runs alongside the +[`python-cif-category-correspondence.md`](python-cif-category-correspondence.md) +suggestion (Python-side correspondence). 
+ +Grounded in: + +- COMCIFS + [`cif_core.dic`](https://raw.githubusercontent.com/COMCIFS/cif_core/main/cif_core.dic) + v3.4.0 (2026-05-05; 787 `_alias.definition_id` entries). +- COMCIFS + [`cif_pow.dic`](https://raw.githubusercontent.com/COMCIFS/Powder_Dictionary/master/cif_pow.dic) + v2.5.0 (2026-05-19; 180 `_alias.definition_id` entries). + +Both reference dictionaries are DDLm CIF_2.0 files. Their canonical +identifiers are dotted (`_definition.id '_pd_instr.geometry'`); the +legacy DDL1 underscore form is recorded as `_alias.definition_id` on +every item. The project emits the **dotted DDLm form universally** +(default save and IUCr export) and accepts both forms on read via the +dictionaries' alias tables. + +A corpus of 10 published IUCr submission CIFs in `tmp/iucr-cifs/` +covering single-crystal (X-ray, neutron) and powder (lab X-ray, neutron, +synchrotron) refinements informs the **structural** decisions in §2 — +multi-datablock layout, `data_global` content patterns, reflection and +profile loop column sets, GSAS-II Rietveld block split. The corpus is +**not** authoritative for tag form, casing, or item names when it +disagrees with the reference dictionaries; example CIFs in the wild are +commonly produced by tooling that has not yet caught up with the current +DDLm spec. The dictionaries are the source of truth. + +The submission-specific publication dictionary (`cif_publ.dic`) is not +consulted directly. The v1 report CIF deliberately avoids empty journal +and author template fields; the retained global metadata comes from +`cif_core.dic` items such as `_audit.*`, `_computing.*`, and +`_chemical_formula.*`. Deferred journal/publication tags are listed in +[`project-summary-rendering.md`](project-summary-rendering.md) §5.1. + +## Context + +EasyDiffraction saves project state into per-domain CIF files +(`project.cif`, `structures/.cif`, `experiments/.cif`, +`analysis/analysis.cif`). 
Two pressures act on the choice of category +and item names: + +- **External interop.** Some current names diverge from the published + IUCr dictionaries. External tooling (publCIF, checkCIF, pdCIFplotter, + journal submission pipelines) cannot consume the diverging fields + without a custom mapping layer. +- **Day-to-day UX.** Users switch between Python and direct CIF editing + in a CLI. Some IUCr-canonical structures are awkward for hand editing + — submission templates often include multi-datablock layouts with + `data_global` metadata and TOF calibration as a coefficient loop + indexed by integer `power`. The v1 report keeps the structural pieces + and omits empty `_publ_*` / `_journal_*` placeholders. Parametric + profile shape (Caglioti, FCJ, TOF sigma/gamma) has no IUCr counterpart + at all. + +A blanket "align with IUCr everywhere" policy pays a UX cost the project +does not need to absorb for files that are not submission targets. A +blanket "keep current names everywhere" policy gives up external interop +entirely. The chosen design splits along these two pressures. + +Two earlier ADRs already touch this surface: + +- [`loop-category-key-identity.md`](loop-category-key-identity.md) pins + loop-key naming on COMCIFS conventions. +- [`python-cif-category-correspondence.md`](python-cif-category-correspondence.md) + catalogues Python-vs-CIF category mismatches and chooses which side + should bend. + +This ADR closes the remaining gap on the **CIF side**. + +## Scope + +In scope: + +- A tiered category-and-item-name policy for the default save, split by + domain (structure / analysis / experiment). +- A new IUCr export path that produces a single clean report CIF on + demand, separate from the default save. +- ADP write-side single-tag emission and casing alignment in the + structure tier. +- Loop-tag style policy: dotted DDLm form universally on write, both + dotted and underscore forms accepted on read. 
+- The per-descriptor mechanism that wires both write paths (`iucr_name` + on `CifHandler`, category-level `IucrCategoryTransformer` for + structural reshapings). +- Multi-datablock layout in the IUCr export, including the `data_global` + audit, software, and chemistry metadata block. + +Out of scope: + +- Python attribute renames. This ADR changes CIF emission only. + Cross-reference + [`python-cif-category-correspondence.md`](python-cif-category-correspondence.md) + for Python-side decisions. +- Adding new CIF categories the project does not currently track + (`_chemical.*`, `_publ.*`, `_journal.*`) **for the default save**. The + IUCr export derives `_chemical_formula.*` for the report only and does + not emit `_publ_*` / `_journal_*` placeholders in v1. +- imgCIF (`cif_img.dic`); no raw image persistence path exists. +- Project-level singleton categories `_info.*`, `_rendering_plot.*`, + `_rendering_table.*`, `_verbosity.*` — out of scope here; see + `python-cif-category-correspondence`. + +## Design Philosophy: Tiered Default Save + Separate IUCr Export + +Each saved file lives in a directory whose name already scopes its +contents. `structures/.cif` is unambiguously structural; +`experiments/.cif` is unambiguously experimental; +`analysis/analysis.cif` is unambiguously analytic. The file path does +the disambiguation that a category prefix would otherwise carry. That +observation drives the policy: + +- **Structure tier** — align category and item names with IUCr verbatim + (with casing fixes). Crystallographic CIF names have decades of + literature backing; hand-editors recognise them. +- **Analysis tier** — keep all fit-output statistics under + topology-neutral `_fit_result.*` in `analysis/analysis.cif`, with + **item** names matching dictionary casing (uppercase R / wR / DOI, + etc.). The per-topology category split into `_refine_ls.*` / + `_pd_proc_ls.*` / `_reflns.*` happens only in the IUCr export, where + the experiment family is known per block. 
This sidesteps the + schema-choice problem for joint and sequential fits described in the + analysis-cif-fit-state ADR. Project-specific + minimizer/sampler/Bayesian scaffolding stays under the current + category names — file-scoped to `analysis/analysis.cif`, no namespace + prefix. +- **Experiment tier** — keep current UX-friendly names (`_instr.*`, + `_peak.*`, `_background.*`, `_expt_type.*`). The pdCIF + instrument/calibration model is awkward (radiation as a loop, TOF as a + coefficient loop indexed by integer `power`), and pdCIF has no + parametric peak-shape items at all. File path scopes them; no prefix + needed. +- **Reports** — a separate `project.report` facade that pulls live + Python state and emits journal report artifacts under `reports/`. The + IUCr CIF one-off method is `project.report.save_cif()`; the regular + `project.save()` call emits configured reports from the + `project.report.{cif,html,tex,pdf}` booleans. This path applies all + IUCr renames, structural reshapings, multi-datablock layout, and + project-extension namespacing (`_easydiffraction_*`). It replaces the + unimplemented `project.summary` placeholder. **Export only — no + round-trip.** + +## Current State + +Project CIF categories audited against `cif_core.dic` v3.4.0 and +`cif_pow.dic` v2.5.0. The "Default-save tier" column shows whether the +category changes in the default save; the "IUCr export" column shows the +dotted DDLm tag emitted by the IUCr CIF report writer. 
+ +| Category (current) | IUCr dictionary | Default-save tier | IUCr export (dotted DDLm) | +| ----------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `_cell.*` | core | Structure — unchanged | `_cell.length_a`, `_cell.angle_alpha`, etc. | +| `_atom_site.*` (most fields) | core | Structure — unchanged | `_atom_site.label`, `_atom_site.fract_x`, … | +| `_atom_site.adp_type` | core (`_atom_site.ADP_type`) | Structure — casing fix | `_atom_site.ADP_type` (uppercase ADP per dictionary). | +| `_atom_site.wyckoff_letter` | core (`_atom_site.Wyckoff_symbol`) | Structure — rename | `_atom_site.Wyckoff_symbol` (uppercase W, "symbol" not "letter"). | +| `_atom_site.B_iso_or_equiv` / `U_iso_or_equiv` | core | Structure — single-tag emit | `_atom_site.B_iso_or_equiv` xor `_atom_site.U_iso_or_equiv` per row, based on `_atom_site.ADP_type`. | +| `_atom_site_aniso.B_*` / `U_*` | core | Structure — single-tag emit | `_atom_site_aniso.B_*` xor `_atom_site_aniso.U_*` per row. | +| `_space_group.name_h_m` | core (`_space_group.name_H-M_alt`) | Structure — casing fix | `_space_group.name_H-M_alt`. | +| `_space_group.it_coordinate_system_code` | core (`_space_group.IT_coordinate_system_code`) | Structure — casing fix | `_space_group.IT_coordinate_system_code`. | +| symmetry operations | core (`_space_group_symop.*`) | (not emitted today) | `_space_group_symop.id` + `_space_group_symop.operation_xyz` loop alongside the H-M name. | +| `_diffrn.ambient_temperature`, `ambient_pressure` | core | Experiment — unchanged | `_diffrn.ambient_temperature`, `_diffrn.ambient_pressure`. 
| +| `_diffrn.ambient_magnetic_field`, `ambient_electric_field` | none | Experiment — unchanged | `_easydiffraction_diffrn.ambient_magnetic_field`, `…electric_field` (project extension). | +| `_refln.*` | core | (no default save under refln) | `_refln.*` reflections loop (column set differs by domain — see §2.3). | +| `_pd_meas.*`, `_pd_proc.*`, `_pd_calc.*`, `_pd_data.*` | pdCIF | Experiment — unchanged | `_pd_meas.*`, `_pd_proc.*`, `_pd_calc.*` profile-data loop (see §2.3). | +| `_pd_background.*` | pdCIF | Experiment — unchanged | `_pd_background.*`. | +| `_pd_phase_block.*` | pdCIF | Experiment — unchanged | `_pd_phase_block.*`. | +| `_sc_crystal_block.*` | community (no IUCr counterpart) | Experiment — unchanged | `_easydiffraction_sc_crystal_block.*` in IUCr export. | +| `_instr.wavelength` | core (`_diffrn_radiation_wavelength.value`) | Experiment — unchanged | `_diffrn_radiation_wavelength.{id, value, wt}` — single-row category for monochromatic; loop only for multi-λ. | +| `_instr.2theta_offset` | pdCIF (`_pd_calib.2theta_offset`) | Experiment — unchanged | `_pd_calib.2theta_offset`. | +| `_instr.2theta_bank`, `d_to_tof_*` | pdCIF (`_pd_calib_d_to_tof.*` loop) | Experiment — unchanged | Four-row loop `_pd_calib_d_to_tof.{id, coeff, power, coeff_su, diffractogram_id}`. | +| `_peak.*` (parametric profile shape) | none (pdCIF has no shape parameters) | Experiment — unchanged | `_easydiffraction_peak.*` + `_pd_proc_ls.profile_function` free-text descriptor. | +| `_extinction.*` | core (`_refine_ls.extinction_*` items) | Experiment — unchanged | `_easydiffraction_extinction.*` + dual emit `_refine_ls.extinction_{method,coef,expression}`. | +| `_excluded_region.*` | pdCIF (`_pd_proc.info_excluded_regions` free-text) | Experiment — unchanged | `_easydiffraction_excluded_region.*` + `_pd_proc.info_excluded_regions` free-text rendering. | +| `_expt_type.*` | none | Experiment — unchanged | `_easydiffraction_experiment_type.*`. 
| +| `_calculator.type`, `_minimizer.type` | none | Analysis — unchanged | Selection fields remain settings only; identity is read from `analysis.software` for `_easydiffraction_software.{framework, calculator, minimizer}` and `_computing.structure_refinement`. | +| `_software.*` | none | Analysis — new provenance category | Source for `_easydiffraction_software.{framework, calculator, minimizer}`, `_easydiffraction_software.fit_datetime`, and `_computing.structure_refinement` in `data_global`. | +| `_minimizer.*` settings (tolerances, max_iter, …) | none | Analysis — unchanged | `_easydiffraction_minimizer.*` (settings only, separate from the identification triple). | +| `_fitting_mode.type`, `_background.type` | none | Analysis / Experiment — unchanged | `_easydiffraction_fitting_mode.type`, `_easydiffraction_background.type` selectors. | +| `_fit_result.reduced_chi_square`, `n_data_points`, `n_parameters` | core (`_refine_ls.*`) and pdCIF (`_pd_proc_ls.*`) | Analysis — unchanged (topology-neutral) | Shape-shifting per topology: see §1.2 and §3 transformers. | +| `_fit_result.*` (R-factors, counts, profile/background function) | core / pdCIF | Analysis — new fields under `_fit_result.*` | IUCr export remaps to per-topology `_refine_ls.*` / `_pd_proc_ls.*`; item names already match dictionary casing (§1.2). | +| `_fit_result.*` (Bayesian diagnostics, success, message, fitting_time, iterations, result_kind) | none | Analysis — unchanged | `_easydiffraction_fit_result.*`. | +| `_fit_parameter`, `_fit_parameter_correlation` | none / partial | Analysis — unchanged | `_easydiffraction_fit_parameter*` (no IUCr counterpart for per-parameter posterior). | +| `_alias`, `_constraint` | none | Analysis — unchanged | `_easydiffraction_alias*`, `_easydiffraction_constraint*`. | +| `_joint_fit`, `_sequential_fit*` | none | Analysis — unchanged | `_easydiffraction_joint_fit*`, `_easydiffraction_sequential_fit*`. 
| +| reflection-set aggregates | core (`_reflns.*`) | Analysis — new fields | `_reflns.number_total`, `_reflns.number_gt`, `_reflns.threshold_expression` (e.g. `'I>3\s(I)'`). | +| report metadata | core (`_audit.*`, `_computing.*`, `_chemical_formula.*`) and project extension (`_easydiffraction_software.*`) | Analysis / derived report state | Emitted in `data_global` block per §2.3a. Empty `_journal.*`, `_publ_*`, and `_pd_meas.info_author_*` placeholders are excluded by the clean-report policy in `project-summary-rendering.md` §5. | +| analysis-stack identification | core (`_computing.structure_refinement`) | Analysis — `_software.*` persisted | `_easydiffraction_software.{framework, calculator, minimizer}` triple + `_easydiffraction_software.fit_datetime` + `_computing.structure_refinement` derived from `analysis.software`. | + +## Decision + +### 1. Three-tier default save + +#### 1.1 Structure tier — IUCr alignment + casing fixes + +In `structures/.cif`: + +- Rename `_atom_site.adp_type` → `_atom_site.ADP_type` (uppercase ADP). +- Rename `_atom_site.wyckoff_letter` → `_atom_site.Wyckoff_symbol` + (uppercase W, "symbol" not "letter"). The dictionary item + `_atom_site.Wyckoff_letter` does not exist; the "letter" form lives in + a different category (`_space_group_Wyckoff.letter`). +- Rename `_space_group.name_h_m` → `_space_group.name_H-M_alt` + (uppercase hyphenated H-M, with `_alt` suffix per dictionary). +- Rename `_space_group.it_coordinate_system_code` → + `_space_group.IT_coordinate_system_code` (uppercase IT). +- ADP single-tag emission per row (see §4). +- All other `_cell.*`, `_atom_site.*`, `_atom_site_aniso.*`, + `_space_group.*` items already match IUCr verbatim — unchanged. + +Python attribute names stay lowercase (`atom_site.adp_type`, +`atom_site.wyckoff_letter`, `space_group.name_h_m`, +`space_group.it_coordinate_system_code`). Only emitted CIF tags change. 
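
The four structure-tier renames in §1.1 can be sketched as a lookup from the current emitted tag to the IUCr-aligned tag. This is only an illustration of the rule: the ADR wires the real behaviour through `iucr_name` on `CifHandler`, whereas the `STRUCTURE_TAG_RENAMES` dict and `emitted_tag` helper below are hypothetical names introduced for this example.

```python
# Hypothetical illustration of the §1.1 renames; not the project's API.
# Keys are the current lowercase tags, values the dictionary-cased tags.
STRUCTURE_TAG_RENAMES = {
    '_atom_site.adp_type': '_atom_site.ADP_type',
    '_atom_site.wyckoff_letter': '_atom_site.Wyckoff_symbol',
    '_space_group.name_h_m': '_space_group.name_H-M_alt',
    '_space_group.it_coordinate_system_code': (
        '_space_group.IT_coordinate_system_code'
    ),
}


def emitted_tag(current_tag: str) -> str:
    """Return the tag to write; tags without a rename pass through."""
    return STRUCTURE_TAG_RENAMES.get(current_tag, current_tag)
```

Because unlisted tags pass through unchanged, items such as `_cell.length_a` that already match IUCr verbatim need no entry, and the Python attribute names are never touched.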
+ +#### 1.2 Analysis tier — topology-neutral `_fit_result.*`, IUCr renaming on export only + +In `analysis/analysis.cif`: + +- **All fit-output statistics stay under topology-neutral + `_fit_result.*` in the default save.** The IUCr-side category split + into coreCIF `_refine_ls.*` (single-crystal) versus pdCIF + `_pd_proc_ls.*` (powder) happens **only** in the IUCr export (§2 / §3 + transformers). This avoids the deterministic-schema problem + that reviewers flagged for joint and sequential fits: one + `analysis/analysis.cif` can describe a refinement that spans multiple + experiments with different sample forms, so a per-experiment-driven + schema choice cannot be made at the project level. Topology-neutral + `_fit_result.*` round-trips cleanly under + [`analysis-cif-fit-state.md`](analysis-cif-fit-state.md)'s single + common projection. +- Existing items keep their names verbatim + (`_fit_result.reduced_chi_square`, `_fit_result.n_data_points`, + `_fit_result.n_parameters`, …). The IUCr export remaps them to the + dictionary-canonical tags per topology (see §2 and §3). +- **New items added under `_fit_result.*`** using dictionary-canonical + _item names_ (with the project-side category prefix preserved): + - `_fit_result.R_factor_all`, `_fit_result.wR_factor_all` (uppercase R + / wR matching the coreCIF item names). + - `_fit_result.R_factor_gt`, `_fit_result.wR_factor_gt` + (observed-reflection subsets). + - `_fit_result.prof_R_factor`, `_fit_result.prof_wR_factor`, + `_fit_result.prof_wR_expected` (powder-only, derived from profile + residuals). + - `_fit_result.number_restraints`, `_fit_result.number_constraints` + (written only when positive). + - `_fit_result.profile_function`, `_fit_result.background_function` + (powder; free-text descriptors of the active peak and background + categories). + - `_fit_result.threshold_expression`, + `_fit_result.number_reflns_total`, `_fit_result.number_reflns_gt` — + required to make the `_gt` R-factor pair interpretable.
Fields that + are not meaningful for a given experiment family (e.g., + `prof_R_factor` for a single-crystal-only refinement) are omitted + from `_fit_result.*`; the IUCr export omits them per block. Live + deterministic fit results may still carry runtime-only convergence + diagnostics such as `shift_over_su_max` and `shift_over_su_mean`, + but those are not part of the default `analysis/analysis.cif` + fit-result projection. +- Bayesian diagnostics, success/message/iterations/fitting_time, + `result_kind`, `point_estimate_name`, fit-parameter posterior + summaries, and the `_alias` / `_constraint` / `_joint_fit` / + `_sequential_fit*` registries — **stay under their current category + names**. File-scoping to `analysis/analysis.cif` carries the + disambiguation; no `_easydiffraction_*` prefix is added in the + default save. +- The `_minimizer.*`, `_fitting_mode.*`, `_calculator.*` selectors stay + under their current names for the same reason. + +The dictionary-canonical tag _form_ (uppercase R / wR / DOI, etc.) is +preserved in the _item_ names under `_fit_result.*`. Only the _category_ +prefix changes between default save (`_fit_result.*`) and IUCr export +(`_refine_ls.*` / `_pd_proc_ls.*` / `_reflns.*` per block). + +#### 1.3 Experiment tier — no changes in the default save + +In `experiments/.cif`: keep every current category and item name. +Specifically: + +- `_instr.*`, `_peak.*`, `_background.*` / `_pd_background.*`, + `_pd_phase_block.*`, `_sc_crystal_block.*`, `_extinction.*`, + `_excluded_region.*`, `_expt_type.*`, `_diffrn.*`, + `_pd_meas/proc/calc/data.*`, `_refln.*` — all unchanged. +- Wavelength stays the single scalar `_instr.wavelength`. TOF + calibration stays the four scalar items + `_instr.d_to_tof_{offset, linear, quad, recip}`. The structural + reshapings happen only in the IUCr export (§2). +- Parametric profile shape (`_peak.*` Caglioti / Lorentzian / FCJ / TOF + coefficients) stays under `_peak.*`. + +### 2.
Reports — IUCr-aligned report CIF + +#### 2.1 API + +```python +project.save() # project files + configured reports +project.report.save_cif() # one-off reports/.cif +project.report.save() # write configured reports only +``` + +`project.summary` (currently an unimplemented placeholder) is removed +and replaced by `project.report` — a facade slot that owns journal +report generation. The `project.report.{cif,html,tex,pdf}` booleans +control which reports `project.save()` emits. Per-format methods +(`save_cif()`, `save_html()`, `save_tex()`, `save_pdf()`) write one-off +artifacts without changing that configuration. The no-arg +`project.report.save()` uses those booleans and raises `ValueError` when +no formats are enabled. + +#### 2.2 Output location + +A **single CIF file** at `reports/.cif` inside the project +root. One file per project, regardless of how many structures or +experiments live in it; the published IUCr submission convention is "one +CIF per article, multiple data blocks inside" (corroborated by 10/10 +example files in the corpus). + +```text +/ + project.cif + structures/ + phase1.cif + phase2.cif + experiments/ + pd_neutron.cif + pd_xray.cif + analysis/ + analysis.cif + reports/ # written by report config or save_cif() + .cif # single multi-block IUCr CIF +``` + +##### Worked layout examples + +Concrete project layouts for the four topology types covered in §2.3. +Default-save files are one-per-object as today; the IUCr export +collapses to a single file with topology-driven data blocks. Inline +comments name the categories inside each block. 
+ +**Example A — Single-crystal, single structure (single experiment).** + +```text +quartz_sc/ + project.cif + structures/ + quartz.cif # _cell.*, _atom_site.* (ADP_type, Wyckoff_symbol) + experiments/ + xray_sc.cif # _instr.*, _peak.*, _diffrn.* + analysis/ + analysis.cif # _fit_result.* (topology-neutral: reduced_chi_square, + # R_factor_*, wR_factor_*, n_data_points, n_parameters, + # plus Bayesian / non-IUCr fields) + # _easydiffraction_minimizer.* (settings) + reports/ + quartz_sc.cif # data_global — _audit.*, + # _easydiffraction_software.{framework, + # calculator, minimizer}, + # _computing.structure_refinement, + # _chemical_formula.* + # data_quartz — _cell.*, _atom_site.*, _atom_site_aniso.*, + # _space_group.*, _space_group_symop.* loop, + # _diffrn.*, _diffrn_radiation_wavelength.*, + # _refine_ls.*, _reflns.*, + # _refln.* loop (F², include_status), + # _easydiffraction_extinction.* + + # _refine_ls.extinction_* dual emit +``` + +**Example B — Powder Rietveld, single phase, single experiment.** + +```text +mgo_rietveld/ + project.cif + structures/ + mgo.cif + experiments/ + npd.cif # neutron powder, CWL + analysis/ + analysis.cif + reports/ + mgo_rietveld.cif # data_global — audit, software, _chemical_formula + # data_overall — _pd_proc_ls.prof_R_factor, + # .prof_wR_factor, + # .prof_wR_expected, + # .profile_function, + # .background_function, + # _refine_ls.number_parameters, + # _pd_block_id cross-refs + # data_mgo — MgO structure + # data_npd — _pd_meas.* profile loop + # (_2theta_scan, intensity_total, + # _pd_calc.intensity_total, + # _pd_proc.intensity_bkg_calc, + # _pd_proc_ls.weight), + # _refln.* powder reflections loop +``` + +**Example C — Joint Rietveld, multi-experiment (neutron + X-ray).** + +```text +co2sio4/ + project.cif + structures/ + co2sio4.cif + experiments/ + npd_300K.cif + xrd_300K.cif + analysis/ + analysis.cif # _joint_fit weights + reports/ + co2sio4.cif # data_global — audit, software, chemistry + # data_overall — 
combined refinement stats + # data_co2sio4 — Co2SiO4 structure + # data_npd_300K — NPD pattern, + # _pd_block_diffractogram_id='npd_300K' + # data_xrd_300K — XRD pattern, + # _pd_block_diffractogram_id='xrd_300K' +``` + +**Example D — Sequential fit, multi-temperature TOF Rietveld.** + +```text +co2sio4_t_series/ + project.cif + structures/ + co2sio4.cif + experiments/ + tof_5K.cif + tof_100K.cif + tof_165K.cif + tof_200K.cif + analysis/ + analysis.cif # _sequential_fit configuration + reports/ + co2sio4_t_series.cif # data_global — audit, software, chemistry + # data_overall + # data_co2sio4 + # data_tof_5K — TOF 5K, + # _pd_meas.time_of_flight, + # _pd_calib_d_to_tof loop + # data_tof_100K — TOF 100K + # data_tof_165K — TOF 165K + # data_tof_200K — TOF 200K +``` + +#### 2.3 Multi-datablock layout inside the export file + +**Every export file starts with a `data_global` block carrying audit, +software, and chemistry metadata** (§2.3a). Subsequent blocks depend on +analysis topology. Block content uses dotted DDLm form throughout. The +single-block-name rule is uniform across topologies; topology-specific +GSAS-II-style suffix conventions seen in some example files (e.g. +`data__publ`, `data__overall`) are folded into +`data_global` for global metadata and `data_overall` for refinement +metadata, leaving no ambiguity about block roles. + +- **Single-crystal, single structure (single experiment).** + `data_global` + `data_` (or `data_I` if no name is set). + Pattern in `bal5004.cif`, `bp5083.cif`, `ks5497.cif`, `ra5167.cif`: + `data_global + data_I`. + +- **Single-crystal, multiple structures or temperatures.** + `data_global` + one block per structure or per temperature. Pattern in + `bp5014.cif`: `data_global + data_300K + data_55K + data_2point5K`. 
+ +- **Powder Rietveld (single or multi-experiment, single or + multi-phase).** GSAS-II-style block split, with the global metadata + block named `data_global` per the invariant above: + - `data_global` (audit, software, and chemistry metadata per §2.3a), + - `data_overall` (refinement-level metadata — Rietveld R-factors, + profile/background function descriptors, parameter counts), + - `data_` (one per phase — structural data per + `_pd_phase_block.id`), + - `data_` (one per diffraction pattern — measurement + metadata, profile data loop, reflections loop). + + This deviates from the `data__publ` GSAS-II convention seen + in `hb8206.cif`; the deviation buys a uniform rule across + single-crystal and powder exports and matches the single-crystal + corpus (`bal5004`, etc.) which uses `data_global` universally. + `data_overall` is deliberately unprefixed because the generated report + CIF is already scoped to one project. `global` means file/report + metadata; `overall` means combined refinement summary. Phase and + diffractogram blocks use the existing structure and experiment names + after CIF block-code normalization; if two normalized names collide, + append a short numeric suffix to preserve uniqueness. + +- **Multi-experiment joint Rietveld.** Same shape as the + single-experiment Rietveld block split above, with one + `data_` block per pattern, all cross-referenced via + `_pd_block_diffractogram_id` and `_pd_block_id`. + + `_pd_block_id` identifies phase/model blocks; in this report that is + the `data_` block ID. `_pd_block_diffractogram_id` + identifies diffractogram/pattern blocks; in this report that is the + `data_` block ID. For single references, emit the + scalar block ID directly (for example `_pd_block_id lbco`, not + `_pd_block_id |lbco|`). 
User preference recorded for future + multi-block expansion: "If multiple phases/patterns are needed, I'd + rather use a proper loop or move toward the newer pdCIF replacement + fields, not encode lists in one scalar with pipes." + +- **Sequential fit.** One file per step is **not** the IUCr convention; + sequential refinements emit one `data_` block per + step inside the same `reports/.cif`. Natural sequential + ordering matches the multi-pattern Rietveld pattern above. + +#### 2.3a `data_global` block content + +Items below are all defined in `cif_core.dic` v3.4.0; emit values where +the project has source data, otherwise `?`. + +- `_audit.creation_method 'EasyDiffraction '`, + `_audit.creation_date `. +- `_computing.structure_refinement` (single string derived from + `analysis.software`; when calculator or minimizer provenance is unset + it falls back to the framework label only, e.g. + `'EasyDiffraction 0.17.0 with lmfit 1.0.0 minimizer and cryspy 1.2.3 calculator'`). + coreCIF standard channel for advertising the analysis-software stack + to IUCr-aware tooling. +- `_easydiffraction_software.*` triple holding the same three roles in + structured form, plus `_easydiffraction_software.fit_datetime` when a + fit timestamp is available (see §2.3a-i below). +- No `_journal.*`, `_journal_date.*`, `_journal_coeditor.*`, + `_publ_contact_author.*`, `_publ_author.*`, `_publ_body.*`, or + `_pd_meas.info_author_*` placeholders are emitted in v1. The clean + report policy and deferred tag list live in + [`project-summary-rendering.md`](project-summary-rendering.md) §5. +- `_chemical_formula.*` chemistry summary derived from atom-site data + where possible: `_chemical_formula.sum`, `_chemical_formula.moiety`, + `_chemical_formula.weight`, `_chemical_formula.IUPAC` (uppercase IUPAC + per dictionary). 
+ +User-supplied publication metadata (`publ_info.json`, `publ_info.toml`, +or a Python `project.publication` owner) is deferred — see Deferred Work +and the clean report policy in `project-summary-rendering.md` §5. + +#### 2.3a-i `_easydiffraction_software` framework + +The IUCr-aligned report needs to identify the analysis stack. The +project emits one structured category in `data_global` from +`analysis.software`, carrying three role-keyed strings and an optional +fit timestamp: + +``` +_easydiffraction_software.framework 'EasyDiffraction 0.17.0' +_easydiffraction_software.calculator 'cryspy 1.2.3' +_easydiffraction_software.minimizer 'lmfit 1.0.0' +_easydiffraction_software.fit_datetime 2026-05-26T13:45:00+00:00 +``` + +- `_easydiffraction_software.framework` — EasyDiffraction itself, the + orchestrating analysis software, with version. +- `_easydiffraction_software.calculator` — the active calculation + backend (cryspy, crysfml, pdffit2) with version. +- `_easydiffraction_software.minimizer` — the active minimizer (lmfit, + scipy-lstsq, dfo-ls, emcee, …) with version. Bayesian sampler runs use + the sampler name and version here. +- `_easydiffraction_software.fit_datetime` — ISO-8601 UTC timestamp of + the successful fit that populated `analysis.software`. Omitted when no + timestamp is recorded. + +The same three values are concatenated into the +`_computing.structure_refinement` free-text string for IUCr-tooling +compatibility (publCIF / checkCIF key on `_computing.*`, not on the +project extension). + +The existing `_easydiffraction_minimizer.*` category in +`analysis/analysis.cif` (default save) keeps its role as the +**settings** container — convergence tolerances, max iteration counts, +sampler chain lengths, etc. — and is also emitted as +`_easydiffraction_minimizer.*` in the IUCr export, separate from the +identification triple above. 
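The concatenation rule for `_computing.structure_refinement` can be sketched as follows. This is a minimal illustration, not the writer's actual code: `SoftwareProvenance` and `structure_refinement_label` are hypothetical names standing in for whatever `analysis.software` exposes, and the wording of the joined string is taken from the example in §2.3a.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SoftwareProvenance:
    """Hypothetical stand-in for the values held by `analysis.software`."""

    framework: str
    calculator: Optional[str] = None
    minimizer: Optional[str] = None


def structure_refinement_label(software: SoftwareProvenance) -> str:
    """Build the free-text `_computing.structure_refinement` value."""
    # Fall back to the framework label alone when calculator or
    # minimizer provenance is unset (the fallback described in §2.3a).
    if software.calculator is None or software.minimizer is None:
        return software.framework
    return (
        f'{software.framework} with {software.minimizer} minimizer '
        f'and {software.calculator} calculator'
    )


full = SoftwareProvenance(
    framework='EasyDiffraction 0.17.0',
    calculator='cryspy 1.2.3',
    minimizer='lmfit 1.0.0',
)
label = structure_refinement_label(full)
```

Keeping the structured triple as the source and deriving the free-text string from it means the two representations cannot disagree in the emitted file.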
+ +#### 2.3b Structure-block content (per-block) + +For each `data_` (single-crystal) or `data_` +(powder Rietveld phase) block: + +- `_chemical_formula.{moiety, sum, weight, IUPAC}` summary. +- `_cell.*` (`length_a`, `angle_alpha`, `volume`, + `measurement_temperature`, etc.). +- `_space_group.name_H-M_alt`, `_space_group.IT_coordinate_system_code`, + `_space_group.crystal_system`, plus the explicit + `_space_group_symop.id` + `_space_group_symop.operation_xyz` loop + alongside the H-M name. +- `_diffrn.*` (instrument, radiation, measurement conditions). + Wavelength as the `_diffrn_radiation_wavelength` category (single-row + category form for monochromatic, loop form for multi-λ — see §3 + transformer). +- `_exptl_crystal.*` if the project tracks crystal-specimen metadata + (currently it does not — deferred work). +- `_atom_site.*` loop with `_atom_site.label`, `_atom_site.type_symbol`, + `_atom_site.fract_x/y/z`, `_atom_site.occupancy`, + `_atom_site.ADP_type`, `_atom_site.B_iso_or_equiv` xor + `_atom_site.U_iso_or_equiv` per row, `_atom_site.Wyckoff_symbol`. +- `_atom_site_aniso.*` loop (when anisotropic ADPs present), emitting + `B_*` xor `U_*` family per row. +- `_refine_ls.*` (single-crystal) or `_pd_proc_ls.*` (powder) refinement + statistics. +- `_reflns.number_total`, `_reflns.number_gt`, + `_reflns.threshold_expression`. + +#### 2.3c Single-crystal reflections loop + +Column set (DDLm dotted form): + +``` +loop_ +_refln.index_h +_refln.index_k +_refln.index_l +_refln.F_squared_meas +_refln.F_squared_calc +_refln.F_squared_meas_su +_refln.include_status +``` + +Column set chosen from `cif_core.dic` (the dictionary defines +`_refln.include_status` for marking observed reflections; the corpus +form `_refln.observed_status` is **not** in the current dictionary and +is treated as outdated). 
The `_su` suffix follows DDLm convention; the +parenthesised CIF uncertainty syntax remains the preferred numeric +encoding per `free-flag-cif-encoding.md`, so `_refln.F_squared_meas_su` +is emitted only when a paired-value column is needed. + +#### 2.3d Powder reflections loop + +``` +loop_ +_refln.index_h +_refln.index_k +_refln.index_l +_refln.F_squared_meas +_refln.F_squared_calc +_pd_refln.phase_id +_refln.d_spacing +``` + +Column set adapted from the corpus content (`bal5001.cif`, `hb8206.cif`) +with tag form taken from `cif_core.dic` and `cif_pow.dic`. The phase +identifier uses the powder dictionary's `_pd_refln.phase_id`; it is not +the calculated structure-factor phase angle `_refln.phase_calc`. + +#### 2.3e Powder profile-data loop + +``` +loop_ +_pd_meas.2theta_scan +_pd_meas.intensity_total +_pd_calc.intensity_total +_pd_proc.intensity_bkg_calc +_pd_proc_ls.weight +``` + +For TOF experiments, the `_pd_meas.2theta_scan` column is replaced by +`_pd_meas.time_of_flight`. Verified against `bal5001.cif` (content set; +tag form follows `cif_pow.dic`). + +#### 2.3f `data_overall` block (Rietveld only) + +For powder Rietveld files, a `data_overall` block carries refinement-level +metadata that applies across all phases and patterns: + +- `_pd_calc.method 'Rietveld Refinement'`. +- `_pd_proc_ls.prof_R_factor`, `_pd_proc_ls.prof_wR_factor`, + `_pd_proc_ls.prof_wR_expected`. +- `_pd_proc_ls.profile_function`, `_pd_proc_ls.background_function` + (free-text descriptors). +- `_pd_proc_ls.pref_orient_corr` (when preferred-orientation correction + is applied). +- `_refine_ls.number_parameters`, `_refine_ls.number_restraints`, + `_refine_ls.number_constraints`. +- `_pd_block_id` references to phase/model blocks and + `_pd_block_diffractogram_id` references to diffractogram/pattern + blocks. Single references are plain scalar IDs. Multiple + phases/patterns should use a proper loop or newer pdCIF replacement + fields, not pipe-delimited scalar lists. 
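The scalar-versus-loop rule for block references can be sketched as a small formatter. This is an assumption-laden illustration: `format_block_refs` is a hypothetical helper, and the single-column loop is only one possible reading of "a proper loop"; the ADR leaves the multi-reference encoding open.

```python
def format_block_refs(tag: str, block_ids: list[str]) -> list[str]:
    """Return CIF lines referencing one or more blocks (hypothetical helper)."""
    if not block_ids:
        # Nothing to reference: emit no lines rather than an empty value.
        return []
    if len(block_ids) == 1:
        # Single reference: plain scalar ID, no pipe delimiters.
        return [f'{tag} {block_ids[0]}']
    # Multiple references: one loop row per block ID, indented two spaces
    # as in the §2.4 loop-body convention; never a pipe-joined scalar.
    return ['loop_', tag, *(f'  {block_id}' for block_id in block_ids)]


single = format_block_refs('_pd_block_id', ['lbco'])
multiple = format_block_refs(
    '_pd_block_diffractogram_id', ['npd_300K', 'xrd_300K']
)
```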
+ +#### 2.3g `data_` block (Rietveld only — constant wavelength) + +For each constant-wavelength (CWL) diffraction pattern: + +- `_pd_meas.*` measurement metadata (`_pd_meas.scan_method`, + `_pd_meas.2theta_range_min/max/inc`, `_pd_meas.number_of_points`, + `_pd_meas.datetime_initiated`). The + `_pd_meas.info_author_{name, email, phone}` placeholders are omitted + by the clean report policy. +- `_diffrn.*` and `_diffrn_radiation_wavelength.*` (radiation type, + probe, wavelength). +- `_pd_proc.2theta_range_min/max/inc`, `_pd_proc.info_data_reduction`, + `_pd_proc.info_datetime`, `_pd_proc.info_excluded_regions`. +- `_pd_proc_ls.*` profile-fit R-factors for this pattern. +- The `_pd_meas.*` profile-data loop (§2.3e). +- The `_refln.*` reflections loop (§2.3d). + +#### 2.3h `data_` block (Rietveld only — TOF) + +For time-of-flight (TOF) diffraction patterns the block has the same +shape as §2.3g, with three TOF-specific substitutions — **all defined in +`cif_pow.dic` v2.5.0**, no project extensions needed for the standard +powder TOF surface: + +- Measurement x-axis: `_pd_meas.time_of_flight` (with + `_pd_meas.time_of_flight_su` companion when paired-value emission is + needed). Replaces the `_pd_meas.2theta_scan` column in the + profile-data loop. +- d-spacing → TOF calibration: the four-row + `_pd_calib_d_to_tof.{id, coeff, coeff_su, power, diffractogram_id}` + loop materialised by the §3 transformer. The dictionary defines the + equation as `TOF = Σ c_i · d^(p_i)` (`_pd_calib_d_to_tof.coeff` and + `_pd_calib_d_to_tof.power`, summed over rows; cif_pow.dic lines 2429 + ff.). 
The `_pd_calib_d_to_tof.id` column accepts arbitrary codes per + the dictionary (its own example uses `0`, `DIFC`, `t2`); the project + uses the EasyDiffraction attribute names verbatim: + + ``` + loop_ + _pd_calib_d_to_tof.id + _pd_calib_d_to_tof.power + _pd_calib_d_to_tof.coeff + _pd_calib_d_to_tof.coeff_su + _pd_calib_d_to_tof.diffractogram_id + offset 0 + linear 1 + quad 2 + recip -1 + ``` + + Rows with a zero coefficient may be omitted. Units are determined + per-row by `power` (μs at power 0, μs/Å at power 1, μs/Ų at power 2, + μs·Å at power −1) per the dictionary's `_method.expression` block on + `_pd_calib_d_to_tof.coeff`. + +- Profile-data loop for TOF: + + ``` + loop_ + _pd_meas.time_of_flight + _pd_meas.intensity_total + _pd_calc.intensity_total + _pd_proc.intensity_bkg_calc + _pd_proc_ls.weight + ``` + + Same columns as §2.3e except the x-axis. The `_diffrn.*` and + `_pd_meas.scan_method` items advertise the TOF nature for external + readers that do not key off the column name alone. + +The richer +`_pd_calib_xcoord.{actual_time_of_flight, nominal_time_of_flight, …}` +calibration pair (`cif_pow.dic` lines 3881 ff., 4167 ff.) is **not** +emitted in the first pass — the project does not currently track +actual-vs-nominal TOF calibration distinct from the polynomial +coefficients. Flagged as deferred work. + +#### 2.4 Formatting (separate IUCr writer) + +The IUCr writer pass differs from the default writer: + +- Dotted DDLm item form (`_atom_site.label`) — same as the default save. + The reference dictionaries declare every item in dotted form; the + corpus' DDL1 underscore usage is treated as outdated tooling output, + not a target convention. +- Blank line between every category, and between a category and a + following loop. +- `# ----
----` header before each logical group within a + block (chemical metadata, cell, space group, symmetry operations, + diffraction, atoms, ADP, refinement, reflections / profile data, + project extensions). +- Block separator + `#=====================================================` between + `data_*` blocks. +- Loop columns left-aligned to per-column widths; loop body lines + indented two spaces. +- 80-char wrap on long string values per CIF spec. +- Numeric `_su` always written via the parenthesised CIF uncertainty + syntax (e.g. `5.4307(2)`); `_su` companion items are not emitted as + separate fields. Matches the existing project encoding from + `free-flag-cif-encoding.md`. +- Project-extension `_easydiffraction_*` categories grouped at the end + of each block under a `# ---- EasyDiffraction project extensions ----` + header. + +#### 2.5 Submission-side validation + +**Superseded (2026-05-30): the runtime writer self-check described below +was removed.** The IUCr CIF writer no longer validates its own output +against `cif_core.dic` / `cif_pow.dic`; `reports/.cif` is +written directly. Rationale: + +- The report CIF is our own deterministic output. Checking it at write + time and raising `EasyDiffractionWriterError` ("…file a bug") turns a + developer-side test concern into a user-facing failure that blocks a + scientist's report over a defect only we can fix. +- The check resolved dictionaries from `tmp/iucr-dicts/` under the + repository root. That path never resolves for a pip-installed user, so + the self-check was a silent no-op for everyone except a developer who + had manually placed the dictionaries — where it only produced noise, + because the current COMCIFS DDLm/CIF2 dictionaries do not parse under + the helper's gemmi + regex approach. 
+- Spec compliance of the emitted tag set is maintained by authoring the + writer against the COMCIFS reference dictionaries (the dotted-tag set + is fixed in `iucr_writer.py`); a separate IUCr-server upload remains + the authoritative compliance check before submission. No part of the + library reads `tmp/iucr-dicts/` at runtime. + +The original decision (retained for history): the writer ran generated +content through `gemmi` before writing, with public +`project.report.check()` / `check=True` entry points removed so that +dictionary compliance was an internal writer self-check rather than a +user choice. The intended gemmi checks were tag existence in +`cif_core.dic` / `cif_pow.dic` (unknown non-`_easydiffraction_*` tags +raising `EasyDiffractionWriterError`), value-type matching against +`_type.contents`, required category keys per loop row, single-category +loop columns, and well-formed DDLm dotted form. It never covered +crystallographic sanity checks (bond lengths, void volumes, density +plausibility, missed-symmetry detection, ADP positive-definiteness) or +whether future journal-submission metadata is complete — that remains a +separate IUCr-server concern. The v1 clean report CIF does not emit the +empty `_journal.*` / `_publ_*` placeholders. + +### 3. Handler mechanism — `iucr_name` + `IucrCategoryTransformer` + +Both write paths read the same in-memory `Parameter` / +`StringDescriptor` / `NumericDescriptor` objects. Drift between default +save and IUCr export is prevented by two complementary mechanisms. + +**Per-field — `iucr_name: str | None` on `CifHandler`.** Singular, +chosen for consistency with the existing `names: list[str]`. The +exporter resolves the IUCr-side tag as `iucr_name` when set, otherwise +falls back to `names[0]`. Both forms are dotted DDLm. 
+ +```python +# Structure — casing differs from default save +self._adp_type = StringDescriptor( + name='adp_type', + cif_handler=CifHandler( + names=['_atom_site.adp_type'], + iucr_name='_atom_site.ADP_type', + ), +) + +# Analysis — default already matches IUCr; iucr_name omitted +self._goodness_of_fit = Parameter( + name='reduced_chi_square', + cif_handler=CifHandler( + names=['_refine_ls.goodness_of_fit_all'], + # exporter falls back to names[0] + ), +) + +# Project extension — IUCr export uses _easydiffraction_* prefix +self._fitting_time = Parameter( + name='fitting_time', + cif_handler=CifHandler( + names=['_fit_result.fitting_time'], + iucr_name='_easydiffraction_fit_result.fitting_time', + ), +) +``` + +The mechanism scales: future export targets (mmCIF, journal dialects) +get sibling fields (`mmcif_name`, etc.) without rewriting existing +handlers. There is no clever prefix-substitution rule — explicit beats +clever. + +Per-experiment-family dual mapping (e.g., `fit_result.n_data_points` +mapping to `_refine_ls.number_reflns` for single-crystal but +`_pd_proc.number_of_points` for powder) is handled at the +category-transformer level (below), not by promoting `iucr_name` to a +list. The per-field handler stays simple. + +**Category-level — `IucrCategoryTransformer` subclasses for structural +reshaping.** A small number of items don't rename, they restructure: + +- **Wavelength** — single-row + `_diffrn_radiation_wavelength.{id, value, wt}` for monochromatic + radiation (the common case); full loop form when multiple wavelengths + are tracked. +- **TOF calibration** — four scalar Python parameters + (`d_to_tof_offset`, `d_to_tof_linear`, `d_to_tof_quad`, + `d_to_tof_recip`) materialise as a four-row + `_pd_calib_d_to_tof.{id, coeff, power, coeff_su, diffractogram_id}` + loop. 
Per cif_pow.dic the equation is `TOF = Σ c_i · d^(p_i)`; the + rows use the EasyDiffraction attribute names as `id` codes (`offset`, + `linear`, `quad`, `recip`) with corresponding `power = 0, 1, 2, -1`. + Full row layout in §2.3h. +- **Range-form excluded regions** — free-text + `_pd_proc.info_excluded_regions` rendering of the range list. +- **Symmetry operations** — `_space_group_symop.*` loop derived from the + active space group (no Python-side persistence of symop strings + today). +- **Extinction (single-crystal)** — the project's + `_easydiffraction_extinction.{type, model, mosaicity, radius}` + category is **also** emitted as the coreCIF `_refine_ls.extinction_*` + triple in single-crystal IUCr export blocks. The mapping is direct + because the dictionary text for `_refine_ls.extinction_method` defines + it as a free-text descriptor that already enumerates the + Becker-Coppens type 1 / type 2 / mixed, Gaussian / Lorentzian, + isotropic / anisotropic taxonomy — exactly what the project's `type` + + `model` selectors represent. 
Concrete mapping: + + | Project field | IUCr emit | + | ------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | + | `extinction.type = 'becker_coppens'`, `extinction.model = 'gaussian_isotropic_type1'` | `_refine_ls.extinction_method 'Becker-Coppens type 1 Gaussian isotropic'` | + | `extinction.type = 'zachariasen'` | `_refine_ls.extinction_method 'Zachariasen'` | + | `extinction.mosaicity` (BC type 1 or Zachariasen) | `_refine_ls.extinction_coef ` | + | `extinction.radius` (BC type 2) | `_refine_ls.extinction_coef ` | + | BC mixed (both `mosaicity` and `radius` present) | `_refine.special_details ''` per dictionary text | + + The transformer reads the active `_easydiffraction_extinction.*` + values, picks the right coefficient channel based on `type` and + `model`, and falls back to `_refine.special_details` when the + Becker-Coppens "mixed" case is detected. The + `_easydiffraction_extinction.*` block is emitted alongside (not + instead of) the standard items, so the full project-side detail + survives round-trip through any tool that reads `_easydiffraction_*` + extensions while standard tools see the coreCIF triple. + +These cannot be expressed as `iucr_name`; the unit of transformation is +a category, not a field. They live in the IUCr exporter as +`IucrCategoryTransformer` subclasses, registered alongside the existing +`CategoryItem` subclasses. + +### 4. ADP tags — single-tag emission on write + +Both `_atom_site_aniso.B_ii` and `_atom_site_aniso.U_ii` exist in +coreCIF, as do `_atom_site.B_iso_or_equiv` and +`_atom_site.U_iso_or_equiv`. The dictionary expects exactly one family +per file, declared by `_atom_site.ADP_type`. Policy applies to **both** +default save and IUCr export: + +- **Read**: accept either tag family. Unchanged. +- **Write**: emit the tag matching `atom_site.ADP_type` for that row; + omit the other. 
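The per-row write rule can be sketched as a tag selector keyed on the row's ADP type. This is a minimal illustration under stated assumptions: `iso_adp_tag` is a hypothetical helper, the rows are plain dictionaries rather than project objects, and the family is inferred from the leading `B` / `U` of coreCIF-style ADP type codes such as `Biso` and `Uiso`.

```python
def iso_adp_tag(adp_type: str) -> str:
    """Return the isotropic ADP tag to emit for one atom-site row."""
    # The first letter of the ADP type code names the family (B or U).
    family = adp_type[:1].upper()
    if family == 'B':
        return '_atom_site.B_iso_or_equiv'
    if family == 'U':
        return '_atom_site.U_iso_or_equiv'
    raise ValueError(f'Unsupported ADP type: {adp_type!r}')


# Illustrative rows only; the values are made up for the example.
rows = [
    {'label': 'Co1', 'adp_type': 'Biso', 'value': 0.45},
    {'label': 'O1', 'adp_type': 'Uiso', 'value': 0.006},
]

# Each row gets exactly one tag; the other family is omitted for that row.
emitted = [
    (row['label'], iso_adp_tag(row['adp_type']), row['value'])
    for row in rows
]
```

Raising on an unknown code keeps an unexpected ADP type from being written silently under the wrong family.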
+ +The choice is per-row based on `ADP_type`, not a project-wide default. +The [`type-neutral-adp-parameters.md`](type-neutral-adp-parameters.md) +Python contract is unchanged. + +The writer no longer propagates one file-wide B/U convention across all +atom sites before serialisation. If a structure contains both +B-convention and U-convention atoms, the emitted CIF contains one +`_atom_site_aniso.B_*` loop and one `_atom_site_aniso.U_*` loop, each +containing only the rows whose `ADP_type` matches that family. + +### 5. Loop-tag style — dotted DDLm on write, dual-name on read + +Both reference dictionaries declare every item in dotted DDLm form and +record the legacy DDL1 underscore form as `_alias.definition_id` (787 in +coreCIF, 180 in pdCIF). The dictionaries are the spec; corpus example +files often lag the spec because they are produced by tooling (GSAS-II, +Jana2006, SHELX, etc.) that has not yet caught up with the DDLm +conversion. + +Policy: + +- **Write — dotted DDLm form universally** for both the default save and + the IUCr export. Matches the dictionaries' canonical identifiers and + the project's current write behaviour. +- **Read — accept dotted and underscore form** for every IUCr-aligned + category, using the dictionaries' `_alias.definition_id` table as the + source of truth. The project already does this for + `_pd_background.line_segment_X` / `_pd_background_line_segment_X`; + extend the same policy to every IUCr-aligned category. + +## Consequences + +### Positive + +- Day-to-day saved files keep current UX (no Caglioti coefficients + hidden inside loops, no awkward `_diffrn_radiation_wavelength` loop + for what's morally a scalar, no `_pd_calib_d_to_tof.power` integer + rows for users to figure out). +- Structure CIFs (default save) become directly recognisable to + crystallographers reading or hand-editing them — names match the + literature. 
+- Analysis CIFs (default save) use dictionary-canonical _item_ names for + fit statistics (uppercase R / wR, etc.) under the topology-neutral + `_fit_result.*` category, so per-field identifiers are immediately + recognisable to scientists familiar with `_refine_ls.*` / + `_pd_proc_ls.*` from Rietveld publications; the IUCr export carries + the matching dictionary-canonical category prefixes per topology. +- The report CIF becomes a single explicit report command, with no + manual editing required for the refinement data: + `project.report.save_cif()` produces a clean file at + `reports/.cif` matching the multi-datablock publication + convention for structural and fit content. Users who want CIF reports + on every project save can set `project.report.cif = True`. +- Journal and author metadata placeholders are omitted from + `data_global`; a future submission-specific surface can add them when + a concrete journal workflow requires them. +- External IUCr tooling (publCIF, checkCIF, pdCIFplotter) can consume + the report file cleanly; the day-to-day saved files are not a tooling + target. +- `_easydiffraction_*` prefix appears only in the IUCr export, where the + explicit namespacing aids journal reviewers. It does not bloat + day-to-day CIFs. +- Drift between default and IUCr write paths is structurally prevented: + both paths read the same `Parameter` objects through the same + `CifHandler` and emit the same DDLm dotted form. + +### Trade-offs + +- Two write paths to implement and test. Single source of truth (the + in-memory `Parameter` objects) keeps drift bounded; the per-field + `iucr_name` plus per-category `IucrCategoryTransformer` mechanism is + the testable seam. +- Powder Rietveld IUCr CIFs are large because measured and calculated + profile data is embedded. Acceptable for journal submission; the + format is what reviewers expect. Largest inspected example: + `hb8169.cif` at 50K lines (DDL1 form; DDLm form would be of comparable + size). 
+- IUCr export is one-way. A user who hand-edits a file in `reports/` + loses those edits on the next configured report save. Documented as + such; treat `reports/` as generated output. +- Some external tooling chains (publCIF, journal in-house scripts) may + still expect DDL1 underscore form. The dotted DDLm form is the + dictionary spec; if real submissions surface a problem, a downstream + conversion option can be added on request. Not pre-emptively built in. + +### ADRs amended by this ADR + +- [`analysis-cif-fit-state.md`](analysis-cif-fit-state.md) — new + IUCr-named fields added under `_fit_result.*` in the default save + (R-factors, positive restraint/constraint counts, profile/background + function descriptors, reflns aggregates). `_fit_result.*` stays + topology-neutral in `analysis/analysis.cif`; per-topology renaming to + `_refine_ls.*` / `_pd_proc_ls.*` happens only in the IUCr export + (§1.2, §3 transformers). A later project-report amendment adds + `_software.*` as the persisted source for report software provenance. +- [`minimizer-input-output-split.md`](minimizer-input-output-split.md) — + `_fit_result.*` examples updated for the new fields. +- [`project-facade-and-persistence.md`](project-facade-and-persistence.md) + — `project.summary` facade slot is removed and replaced by + `project.report`. The accepted `project.save(report=True)` flag is + superseded by report booleans for configured reports and + `project.report.save_cif()` for the IUCr CIF one-off path. + `summary.cif` is no longer written by default `Project.save()`; the + slot is repurposed for IUCr / journal report generation in + `reports/.cif` (see §2). The unimplemented `summary_to_cif()` + placeholder code path + ([`project.py:464`](../../../../src/easydiffraction/project/project.py)) + is removed as part of the implementation plan; no summary content + survives the transition because nothing was being written there in the + first place. 
+- [`help-discoverability.md`](help-discoverability.md) — + `project.summary.help()` is removed from the documented help surface + and replaced by `project.report.help()` (same responsibilities, new + slot name). All other entries in the help-surface table are + unaffected. +- [`project-summary-rendering.md`](project-summary-rendering.md) — + amends this ADR's report API: public `check()` / `check=True` are + removed, the `_easydiffraction_software.*` triple is read from + `analysis.software`, and `_easydiffraction_software.fit_datetime` is + added when fit provenance has a timestamp. (The write-path gemmi + validation this ADR introduced was later removed — see the §2.5 + amendment.) + +## Open Questions + +(None blocking. Dictionary-side ambiguities have all been resolved +against `cif_core.dic` v3.4.0 / `cif_pow.dic` v2.5.0 while authoring the +writer. The runtime gemmi self-check originally described in §2.5 was +removed (see the §2.5 amendment); spec compliance now rests on authoring +discipline plus a final IUCr-server upload before submission.) + +## Alternatives Considered + +### A. Keep all current tags as-is + +Smallest diff. Saved CIFs stay self-contained but cannot be consumed by +external IUCr tooling, and journal submission requires manual +conversion. Defensible only if external CIF interop is never a goal. + +### B. Align everything by default (no separate IUCr export) + +The previous broad-rewrite extension. Maximises external interop but +pays the UX cost on every saved file — TOF coefficient loops, wavelength +category form for what's morally a scalar, `_easydiffraction_*` prefixes +in `analysis/analysis.cif`. Replaced by the tiered design above. + +### C. Adopt IUCr fit-output names only (the original fit-output-only ADR) + +Fixes the most visible gap (`_fit_result.*`) but leaves the instrument, +calibration, casing, loop-style, ADP write-side, and journal-submission +decisions unstated. Preserved here as §1.2. + +### D. 
Two write paths, no shared handler mechanism + +Implement the IUCr export as a fully separate writer that re-implements +every tag mapping. Doubled maintenance, guaranteed drift. Rejected. + +### E. Round-trip-capable IUCr files + +The IUCr export could be the source of truth and the default saved files +could be derived from it. Requires retaining `_easydiffraction_*` +extension data through the IUCr writer and parsing it back on load. Adds +round-trip surface area for no day-to-day benefit. Rejected explicitly: +**IUCr export is one-way**. + +### F. Multiple IUCr files (one per refined dataset) + +The earlier version of §2 proposed `reports/.cif` files — one +per refinement unit. The IUCr submission convention is one file per +article with multiple data blocks inside (consistent with the corpus). +Rejected. + +### G. Emit DDL1 underscore form in the IUCr export + +An earlier revision proposed switching the IUCr export to DDL1 +underscore form because every inspected corpus file used it. Rejected: +the COMCIFS reference dictionaries are the authoritative spec, and they +declare every item in dotted DDLm form. Corpus files frequently lag the +spec because the tooling that produced them (GSAS-II, Jana2006, SHELX, +etc.) has not yet caught up with the DDLm conversion; their tag style is +**not** a target convention. If a specific journal portal turns out to +reject DDLm input, the downstream conversion option noted under +"Trade-offs" covers it. + +## Deferred Work + +- **Journal-submission metadata surface.** A future ADR may introduce a + user-supplied `reports/publ_info.json` / `publ_info.toml` file or a + Python `project.publication` owner for `_journal.*`, `_publ_*`, and + `_publ_author.*` entries. V1 deliberately omits those placeholders so + generated report CIFs stay clean; revisit only with concrete user or + journal-portal requirements. +- **Crystallographic sanity validation.** The §2.5 writer self-check (since removed) covered spec + compliance only. 
A future pass could integrate IUCr's web checkCIF + (HTTP POST to the checkCIF endpoint) or bundle a local subset of its + sanity checks (bond-length plausibility, void detection, + missed-symmetry, anisotropic-ADP positive-definiteness). Treated as a + separate concern from dictionary validation. +- **Richer TOF calibration.** `_pd_calib_xcoord.actual_time_of_flight` / + `nominal_time_of_flight` paired calibration (cif_pow.dic lines 3881 + ff., 4167 ff.) for instruments that distinguish actual vs nominal TOF. + EasyDiffraction tracks only the polynomial coefficients today. +- **`_atom_type_scat_*` Cromer-Mann and neutron scattering-length + tables.** Required by GSAS-II-style files (`hb8206.cif`) for + self-contained reflection calculation, but EasyDiffraction does not + track these today. +- **`_exptl_crystal.*` single-crystal-specimen metadata** (size, shape, + density, etc.). The project has no source data for these fields; emit + as `?` placeholders or skip entirely. +- **`_audit.*` extended audit trail** (`_audit.update_record`, + `_audit.block_DOI`). +- **mmCIF / other macromolecular-targeting export.** Same handler + mechanism (`mmcif_name` sibling field) but a different exporter. Not + on the roadmap. +- **Default-save `_chemical_formula.*` derivation** from `_atom_site` + rows. No Python field exists today; the IUCr export already derives + them for `data_global` per §2.3a. +- **imgCIF alignment.** Not on the roadmap; explicitly deferred. diff --git a/docs/dev/adrs/accepted/minimizer-category-consolidation.md b/docs/dev/adrs/accepted/minimizer-category-consolidation.md new file mode 100644 index 000000000..b0f374f19 --- /dev/null +++ b/docs/dev/adrs/accepted/minimizer-category-consolidation.md @@ -0,0 +1,496 @@ +# ADR: Minimizer Category Consolidation + +## Status + +Accepted. + +## Date + +2026-05-23 + +## Group + +Analysis and fitting. 
+ +## Context + +Recent Bayesian (DREAM) work introduced seven analysis-level categories +to persist Bayesian fit settings, results, diagnostics, per-parameter +summaries, and plot caches: + +- `_bayesian_sampler` (resolved sampler inputs) +- `_bayesian_result` (Bayesian header) +- `_bayesian_convergence` (diagnostics) +- `_bayesian_parameter_posterior` (per-parameter summaries) +- `_bayesian_distribution_cache`, `_bayesian_pair_cache`, + `_bayesian_predictive_dataset` (plot-ready cache manifests) + +This layout is internally consistent but breaks the convention used +everywhere else in the codebase: + +1. **One category per concept, plain descriptive name.** Structure and + experiment categories (`cell`, `peak`, `background`, `instrument`) + are single-concept and unsuffixed. The Bayesian work introduces a + parallel naming with prefixes (`bayesian_*`) and a settings/result + mirror that has no precedent. +2. **Refinement annotates the object in place.** A `Parameter` carries + both its user-set initial value and its fit-refined value plus + uncertainty on the same object. The Bayesian work stores per- + parameter posterior data in a separate loop category instead of on + the parameter. +3. **Selectors live on the owner.** `experiment.background_type` and + `experiment.peak_profile_type` are owner-level. The minimizer + selector lives one level deep at `analysis.fitting.minimizer_type`, + which has no analogous depth elsewhere. +4. **One user-input surface per concept.** Users today configure the + sampler at `analysis.fitting.minimizer.` (live solver instance + attributes), while the persisted `_bayesian_sampler.*` category is a + post-run snapshot with no public setters. The same fact lives in two + places, only one is writable, only the other appears in `help()` and + CIF. + +Adding emcee on top of this layout would entrench the divergence. This +ADR consolidates the design before introducing additional Bayesian +samplers. + +## Decision + +### 1. 
Unified `minimizer` category replaces sampler-input categories + +Introduce a single switchable category `minimizer` on `Analysis`. Its +concrete class is determined by `Analysis.minimizer_type`. The category +now holds user-writable minimizer inputs only. The later +[`minimizer-input-output-split.md`](minimizer-input-output-split.md) ADR +reverses the fit-output half of this rule: scalar fit outputs live on +the paired `fit_result` category instead of on `minimizer`. + +The following categories are removed: + +- `bayesian_sampler` — fields move into the Bayesian concrete classes of + `minimizer`. +- `bayesian_result`, `bayesian_convergence` — fields move into the + Bayesian concrete classes of `fit_result` (`fitting_time`, + `acceptance_rate_mean`, `gelman_rubin_max`, + `effective_sample_size_min`, `best_log_posterior`, …). +- `deterministic_result` — fields move into the deterministic concrete + classes of `fit_result` (`fitting_time`, `iterations`, + `objective_value`, `exit_reason`, …). +- `bayesian_parameter_posterior` — replaced by `Parameter.posterior` + (see §3). +- `bayesian_distribution_cache`, `bayesian_pair_cache`, + `bayesian_predictive_dataset` — replaced by HDF5 sidecar (see §4). + +`fit_parameter` (analysis-owned bounds) remains a fit-state category. +`fit_result` remains the common fit header category and is extended by +the input/output split ADR with family-specific scalar outputs. + +### 2. Selectors move to the `Analysis` owner + +The Python `fitting` category intermediate is dropped. 
`Analysis` +exposes: + +- `analysis.minimizer_type` (was `analysis.fitting.minimizer_type`) +- `analysis.fitting_mode_type` (was + `analysis.fitting.fitting_mode_type`) +- `analysis.minimizer` (the swappable category) +- `analysis.show_supported_minimizer_types()` +- `analysis.show_current_minimizer_type()` +- `analysis.show_supported_fitting_mode_types()` +- `analysis.show_current_fitting_mode_type()` +- `analysis.joint_fit`, `analysis.sequential_fit` (unchanged + active-sibling categories per + [`fit-mode-categories.md`](../accepted/fit-mode-categories.md)) + +CIF prefixes are unchanged: + +- `_fitting.minimizer_type`, `_fitting.mode_type` stay where they are. +- `_bayesian_sampler.*` is removed; the equivalent fields live under + `_minimizer.*`. + +Rationale: matches the +[Switchable Category API](../accepted/switchable-category-api.md) +convention used by `experiment.background_type` etc. — selector on the +owner, category as a read-only attribute that gets swapped. + +### 3. Per-parameter posterior data lives on `Parameter.posterior` + +Adopt the proposal from +[`parameter-posterior-summary.md`](../suggestions/parameter-posterior-summary.md): +`GenericParameter.posterior` is `None` for deterministic fits and a +`PosteriorParameterSummary` for Bayesian fits. 
The +`_bayesian_parameter_posterior` CIF loop is removed; posterior summary +columns are added to the existing `_fit_parameter` loop (one row per +refined parameter, mostly-empty columns when the fit was deterministic): + +- `_fit_parameter.posterior_best_sample_value` +- `_fit_parameter.posterior_median` +- `_fit_parameter.posterior_uncertainty` +- `_fit_parameter.posterior_interval_68_low`, + `posterior_interval_68_high` +- `_fit_parameter.posterior_interval_95_low`, + `posterior_interval_95_high` +- `_fit_parameter.posterior_gelman_rubin` +- `_fit_parameter.posterior_effective_sample_size_bulk` + +This mirrors `Parameter.uncertainty`: the same column structure is +populated by deterministic or Bayesian fits as appropriate. Per- +parameter posterior order is the order of the `_fit_parameter` rows +themselves; no separate parallel loop is needed. + +### 4. Heavy posterior arrays live in `analysis/results.h5`, not in CIF + +Posterior chains, KDE / distribution caches, pair-plot caches, and +predictive datasets are large arrays unsuited to CIF. The existing +`analysis/results.h5` sidecar absorbs all of them. The corresponding +manifest categories (`_bayesian_distribution_cache`, +`_bayesian_pair_cache`, `_bayesian_predictive_dataset`) are removed from +CIF entirely — the HDF5 file is self-describing. + +There is exactly **one** sidecar file per fit, regardless of minimizer: +`analysis/results.h5`. No CIF tag stores the sidecar path. 
The file uses +namespaced top-level groups: + +``` +analysis/results.h5 +├── /posterior/ # canonical posterior chains, log-prob (all Bayesian samplers) +├── /distribution_cache/ # KDE / 1-D distribution plots +├── /pair_cache/ # pair-plot grids +├── /predictive/ # posterior-predictive datasets +└── /emcee_chain/ # emcee HDFBackend live state (emcee runs only) +``` + +**Lifecycle rule: a new fit overwrites the file.** Mixing partial +results from different minimizers — or from the same minimizer with +different settings or a different free-parameter set — is the most +common source of "stale plot" confusion. To prevent this, calling +`analysis.fit()` truncates `analysis/results.h5` (recreating it with the +new run's groups). The user is shown a `log.warn(...)` message the first +time a fit is started while a populated sidecar exists, naming the file +and stating that previous results will be overwritten. + +Resume is the only exception: `analysis.fit(resume=True, extra_steps=N)` +opens the existing file in append mode and extends the chain. Resume is +rejected with a clear error if the active minimizer does not support it, +if `results.h5` is missing, or if the stored chain's parameter set does +not match the current one. + +For deterministic runs the Bayesian groups are absent and the sidecar +file may not exist at all. For non-emcee Bayesian runs the +`/emcee_chain` group is absent. + +### 5. Unified, verbose attribute names with internal mapping + +Each concrete `minimizer` class declares its descriptors in its class +body with verbose, dictionary-style names. Internally, each class maps +these names to its native backend keys. 
+ +Stable inputs across Bayesian samplers (shared by DREAM and emcee): + +| Tag | Native (DREAM) | Native (emcee) | Description | +| ----------------------- | ---------------- | -------------- | -------------------------------------------------------- | +| `sampling_steps` | `steps` | `nsteps` | Total MCMC iterations per chain/walker | +| `burn_in_steps` | `burn` | `nburn` | Iterations discarded as warm-up | +| `thinning_interval` | `thin` | `thin` | Keep every Nth sample | +| `population_size` | `pop` | `nwalkers` | Number of chains / walkers | +| `parallel_workers` | `parallel` (int) | `pool` | `0` = all CPUs; `1` = serial; `N>1` = N worker processes | +| `initialization_method` | `init` (enum) | (custom) | Single unified enum (see §6) | +| `random_seed` | `random_seed` | `random_seed` | Random seed; `None` = system-derived | + +Bayesian-sampler-specific inputs: + +| Tag | Concrete class | Description | +| ---------------- | -------------- | ------------------------------------------- | +| `proposal_moves` | emcee only | emcee proposal moves (e.g. `stretch`, `de`) | + +Deterministic-LSQ inputs: + +| Tag | Description | +| ---------------- | ------------------------- | +| `max_iterations` | Maximum solver iterations | + +`random_seed` remains a Bayesian sampler input because the current +deterministic engines reject non-`None` random seeds. +`convergence_tolerance` is not exposed until a concrete engine path +actually consumes it. 
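The verbose-to-native mapping in the shared-inputs table can be sketched as a pair of lookup dictionaries plus one translation helper. This is a minimal illustration only: `DREAM_NATIVE_KEYS`, `EMCEE_NATIVE_KEYS` and `to_native` are hypothetical names, not part of EasyDiffraction, and the sketch only renames keys. Any value conversion a backend needs (for example turning `parallel_workers` into a worker pool) is assumed to happen elsewhere.

```python
from __future__ import annotations

# Verbose tag -> native key, copied from the shared-inputs table.
# Hypothetical module-level names; the real classes may store this differently.
DREAM_NATIVE_KEYS = {
    'sampling_steps': 'steps',
    'burn_in_steps': 'burn',
    'thinning_interval': 'thin',
    'population_size': 'pop',
    'parallel_workers': 'parallel',
    'initialization_method': 'init',
    'random_seed': 'random_seed',
}

# The table lists emcee's `initialization_method` as "(custom)", so it has
# no native key and is deliberately left out of this map.
EMCEE_NATIVE_KEYS = {
    'sampling_steps': 'nsteps',
    'burn_in_steps': 'nburn',
    'thinning_interval': 'thin',
    'population_size': 'nwalkers',
    'parallel_workers': 'pool',
    'random_seed': 'random_seed',
}


def to_native(
    settings: dict[str, object],
    key_map: dict[str, str],
) -> dict[str, object]:
    """Rename verbose tags to backend keys and reject unknown tags."""
    unknown = sorted(set(settings) - set(key_map))
    if unknown:
        # Fail loudly instead of silently dropping a setting.
        raise KeyError(f'Unsupported minimizer settings: {unknown}')
    return {key_map[tag]: value for tag, value in settings.items()}
```

Under these assumptions, `to_native({'population_size': 4}, DREAM_NATIVE_KEYS)` yields `{'pop': 4}`, while passing an emcee-only tag such as `proposal_moves` with the DREAM map raises `KeyError`.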
+ +Fit-filled outputs (subset varies per class): + +| Tag | Class | Description | +| --------------------------- | -------- | ------------------------------------------- | +| `runtime_seconds` | all | Wall time of the fit | +| `reduced_chi2` | all | Reduced χ² | +| `iterations_performed` | LSQ | Iterations actually executed | +| `exit_reason` | LSQ | Free-form short string | +| `acceptance_rate_mean` | Bayesian | Mean acceptance rate across chains/walkers | +| `gelman_rubin_max` | Bayesian | Max R̂ across sampled parameters | +| `effective_sample_size_min` | Bayesian | Min effective sample size across parameters | +| `best_log_posterior` | Bayesian | Best log-posterior value found | + +Verbose CIF tags are user-facing. The canonical MCMC abbreviation +(`r_hat`, `n_eff`, `nllf`) is recorded in the descriptor's `description` +field so it appears in `help()` output but does not become a Python +attribute or a CIF tag. + +**Implementation note (2026-05-25).** The per-parameter R̂ and bulk ESS +values feeding `gelman_rubin_max` and `effective_sample_size_min` are +computed by an in-tree helper at +[`analysis/fit_helpers/_diagnostics.py`](../../../../src/easydiffraction/analysis/fit_helpers/_diagnostics.py) +— pure NumPy + SciPy implementations of split R̂ and rank-normalized bulk +ESS (Vehtari, Gelman, Simpson, Carpenter and Bürkner 2019; Geyer 1992). +The earlier `arviz` dependency, which the library only used to call +`az.rhat()` and `az.ess(method='bulk')`, has been removed; the +diagnostics' public surface (`gelman_rubin_max`, +`effective_sample_size_min`, `r_hat_by_parameter`, +`ess_bulk_by_parameter`) and the convergence thresholds (R̂ ≤ 1.01, ESS +≥ 400) are unchanged. + +### 6. 
Unified `initialization_method` enum + +A single `(str, Enum)` `InitializationMethodEnum` with members: + +- `latin_hypercube` +- `ball` +- `uniform` +- `prior` + +Each concrete class accepts only the subset it supports and maps to its +native init mode (DREAM `lhs` ↔ `latin_hypercube`, emcee starting-state +generators ↔ `ball` / `uniform` / `prior`). Invalid combinations raise +at set time, not at fit time. + +### 7. CIF `?` is the universal "use default" marker + +Descriptors declare static defaults via `AttributeSpec(default=...)` +when each minimizer category instance is constructed. CIF behavior: + +- **Load.** A missing tag, or a tag with value `?`, resolves to the + descriptor's static default at load time. The category instance + carries the concrete default value from that moment on. +- **Save.** Always emit the actual value. Do not emit `?` for fields + that happen to equal the default. Round-trip is exact for any value + the user set; for an unset field, round-trip resolves `?` → default on + first load and emits the default on next save. +- **No callable defaults.** No "auto-resolve at fit time" (today's + `burn = steps // 5` is replaced by a fixed default + `burn_in_steps = 600`). If a default depends on other settings, the + dependency is documented; the user sets it explicitly. + +This rule applies to every descriptor, not just `minimizer`. For +descriptors that have no sensible default (e.g. `cell.length_a`), the +descriptor declaration omits `default=...` and CIF `?` continues to mean +"unknown" — a load-time error is raised when the field is read. + +### 8. Minimizer families carry defaults; warn-and-reset on swap + +Each concrete `minimizer` class has a complete, discoverable descriptor +surface. Descriptor instances are constructed from family helpers in +`__init__` so shared LSQ fields are declared once and sampler-specific +fields stay on the Bayesian concrete classes. 
Concrete subclasses may +override class-level defaults only when their backend behavior really +differs. + +```python +class EmceeMinimizer(BayesianMinimizerBase): + _default_sampling_steps = 5000 + _default_population_size = 32 + _default_proposal_moves = 'de' +``` + +When `analysis.minimizer_type` changes, the underlying instance is +replaced by a fresh instance of the new class with that class's +defaults. A `log.warn(...)` lists fields whose default values differ +between old and new classes, matching the precedent of `background_type` +swap warnings. + +### 9. Example CIF layouts + +The fit-result outputs in these examples live under `_fit_result.*` per +[`minimizer-input-output-split.md`](minimizer-input-output-split.md); +`_minimizer.*` carries only user-writable settings. + +`bumps (lm)`: + +``` +data_analysis + +_fitting.mode_type joint +_fitting.minimizer_type 'bumps (lm)' + +_minimizer.max_iterations 200 + +_fit_result.result_kind deterministic +_fit_result.fitting_time 12.34 +_fit_result.iterations 87 +_fit_result.exit_reason converged +_fit_result.reduced_chi_square 1.42 +``` + +`bumps (dream)`: + +``` +data_analysis + +_fitting.mode_type joint +_fitting.minimizer_type 'bumps (dream)' + +_minimizer.sampling_steps 3000 +_minimizer.burn_in_steps 600 +_minimizer.thinning_interval 1 +_minimizer.population_size 4 +_minimizer.parallel_workers 0 +_minimizer.initialization_method latin_hypercube +_minimizer.random_seed ? 
+ +_fit_result.result_kind bayesian +_fit_result.fitting_time 124.7 +_fit_result.reduced_chi_square 1.18 +_fit_result.acceptance_rate_mean 0.27 +_fit_result.gelman_rubin_max 1.03 +_fit_result.effective_sample_size_min 482 +_fit_result.best_log_posterior -1234.56 +``` + +`emcee` (added by the follow-up plan): + +``` +data_analysis + +_fitting.mode_type joint +_fitting.minimizer_type emcee + +_minimizer.sampling_steps 5000 +_minimizer.burn_in_steps 1000 +_minimizer.thinning_interval 1 +_minimizer.population_size 32 +_minimizer.proposal_moves de +_minimizer.parallel_workers 0 +_minimizer.initialization_method ball +_minimizer.random_seed 42 + +_fit_result.result_kind bayesian +_fit_result.fitting_time 87.3 +_fit_result.reduced_chi_square 1.22 +_fit_result.acceptance_rate_mean 0.31 +_fit_result.gelman_rubin_max 1.02 +_fit_result.effective_sample_size_min 612 +_fit_result.best_log_posterior -1237.89 +``` + +emcee's resumable chain state lives in the `/emcee_chain` group of the +same `analysis/results.h5` file (see §4). No sidecar path appears in +CIF. + +## Superseded Selector Layout + +This ADR's original selector layout was superseded by +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md). +The minimizer selector no longer persists as `_fitting.minimizer_type` +and is no longer assigned through `analysis.minimizer_type`. The current +surface is: + +```python +analysis.minimizer.type = 'bumps (lm)' +analysis.minimizer.show_supported() +``` + +The active minimizer persists as `_minimizer.type`. The earlier +`_minimizer.optimizer_name` and `_minimizer.method_name` fields are also +dropped; restored `FitResults.optimizer_name` and +`FitResults.method_name` are derived from the concrete minimizer +category's class-level `_engine_metadata` dict. + +## Consequences + +### Architecture wins + +- The analysis layout matches the rest of the codebase: one descriptive + category per concept, selectors on owners, refinement-in-place. 
+- The Bayesian / deterministic split stops requiring parallel category + trees. One swappable `minimizer` covers both worlds. +- Adding new minimizers (emcee, future samplers, future LSQ variants) is + a one-class change: declare descriptors, register with the factory. +- CIF projects shrink: large arrays move to HDF5; redundant manifest + categories disappear. + +### Trade-offs + +- `minimizer` no longer mixes writable user inputs and fit-filled + outputs in the same scope. That stricter boundary is recorded by + [`minimizer-input-output-split.md`](minimizer-input-output-split.md); + `Parameter` remains the refinement-in-place precedent for model values + rather than minimizer diagnostics. +- The set of `_minimizer.*` tags present in CIF depends on the active + `_fitting.minimizer_type`. Loading a CIF whose tags don't match the + minimizer's allowed set raises (clear validation, not silent + ignoring). +- Hand-editing CIF to switch minimizer types requires touching both + `_fitting.minimizer_type` and the relevant `_minimizer.*` tags. +- Existing projects saved under the seven-category layout cannot load + unchanged. The project is in beta; per `AGENTS.md` "no legacy shims" + applies. Saved fixtures under `tmp/tutorials/projects/` are + regenerated by the implementation plan. + +### ADRs amended by this ADR + +- [`runtime-fit-results.md`](../accepted/runtime-fit-results.md) — amend + the closing line to point at this ADR as the canonical + saved-projection definition (alongside `analysis-cif-fit-state.md`). +- [`analysis-cif-fit-state.md`](../accepted/analysis-cif-fit-state.md) — + replace §"Bayesian fit projection" entirely. Remove the seven + `_bayesian_*` categories; describe `_minimizer.*` and the extended + `_fit_parameter` posterior columns. Remove the sidecar-path CIF field; + the sidecar name is implicit. 
+- [`fit-mode-categories.md`](../accepted/fit-mode-categories.md) — + update §1 and §2 to reflect that `minimizer_type` and + `fitting_mode_type` live on `Analysis` directly, not on a `fitting` + Python intermediate. The active-sibling design for `joint_fit` / + `sequential_fit` is unchanged. +- [`selector-families.md`](../accepted/selector-families.md) — + reclassify `analysis.minimizer_type` as a switchable-category selector + (on owner `Analysis`, swaps the `minimizer` category instance), no + longer a Backend selector. +- [`switchable-category-api.md`](../accepted/switchable-category-api.md) + — append `minimizer` to the examples list. No mechanical change. +- [`parameter-correlation-persistence.md`](../accepted/parameter-correlation-persistence.md) + — verify wording still applies (categories + `_fit_parameter_correlation` are kept by this ADR; should be a no-op). + +### Suggestions superseded or absorbed + +- [`parameter-posterior-summary.md`](../suggestions/parameter-posterior-summary.md) + — absorbed by §3 of this ADR. When this ADR is accepted, that + suggestion can be closed and a pointer added. + +## Alternatives Considered + +### A. Keep `bayesian_settings` and `least_squares_settings` as separate categories + +Two stable input categories, each switchable internally. Rejected +because it (i) introduces the `_settings` suffix convention that has no +precedent in the codebase, (ii) duplicates the input/output mirror +pattern, and (iii) gains nothing over a single owner-level category +whose shape adapts to the active minimizer. + +### B. Single flat `fit_settings._` category + +One namespace, attributes prefixed by family. Rejected because (i) it +forces long attribute names (`fit_settings.bayesian_population_size`), +(ii) breaks the "one category, one focused concept" convention, and +(iii) loses the natural shape-shifting that `background` and +`peak_profile` already exemplify. + +### C. 
Keep the seven-category Bayesian layout and add emcee siblings + +Add `_emcee_sampler`, `_emcee_convergence`, …, mirroring the existing +`_bayesian_*` layout per backend. Rejected because it doubles the +category count for each new sampler and entrenches the convention break. + +### D. Strict input-only `minimizer` plus a separate `fit_result` + +Originally rejected in favour of the one-category-mixes-both shape (§1, +§"Trade-offs"). Reversed by +[`minimizer-input-output-split.md`](minimizer-input-output-split.md) +after implementation showed the `Parameter` analogy does not hold for +minimizer settings versus fit diagnostics. The current design keeps +`minimizer` input-only and moves scalar fit outputs to the paired +`fit_result` category. diff --git a/docs/dev/adrs/accepted/minimizer-input-output-split.md b/docs/dev/adrs/accepted/minimizer-input-output-split.md new file mode 100644 index 000000000..a36bf1355 --- /dev/null +++ b/docs/dev/adrs/accepted/minimizer-input-output-split.md @@ -0,0 +1,433 @@ +# ADR: Minimizer Input/Output Split + +**Status:** Accepted **Date:** 2026-05-24 + +## Status Note + +This proposal revisits and supersedes Alternative D in +[`minimizer-category-consolidation.md`](minimizer-category-consolidation.md) +("Strict input-only `minimizer` plus a separate `fit_result`"), which +was rejected during the consolidation work on the assumption that the +input/output mix on `analysis.minimizer` was symmetric with the +input/output mix on `Parameter`. Implementation experience shows that +analogy does not hold and the mix has produced measurable UX and +duplication problems documented below. + +## Context + +After +[`minimizer-category-consolidation.md`](minimizer-category-consolidation.md) +landed, `analysis.minimizer` holds both writable user inputs and +fit-filled outputs in a single namespace. 
A user typing +`analysis.minimizer.help()` before any fit sees roughly twenty +properties, the majority of which are `None`, `0`, or empty strings, +with no signal which they may write and which the fit fills in. + +The current shape on the live category surfaces: + +- **Writable inputs:** `max_iterations` (LSQ); `sampling_steps`, + `burn_in_steps`, `thinning_interval`, `population_size`, + `parallel_workers`, `initialization_method`, `random_seed` (Bayesian). +- **Fit-filled outputs (no public setter, only `_set_*` internals):** + `objective_name`, `objective_value`, `n_data_points`, `n_parameters`, + `n_free_parameters`, `degrees_of_freedom`, `covariance_available`, + `correlation_available`, `runtime_seconds`, `iterations_performed`, + `exit_reason` (LSQ); `runtime_seconds`, `point_estimate_name`, + `sampler_completed`, `credible_interval_inner`, + `credible_interval_outer`, `acceptance_rate_mean`, `gelman_rubin_max`, + `effective_sample_size_min`, `best_log_posterior` (Bayesian). + +Three current output fields straddle `analysis.minimizer` and +`analysis.fit_result`: + +| Output concept | Field on `analysis.minimizer` | Field on `analysis.fit_result` | Relationship | +| ----------------------- | ----------------------------- | ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | +| Wall time | `runtime_seconds` | `fitting_time` | Real duplication — same scalar in two places. | +| Iteration count | `iterations_performed` (LSQ) | `iterations` | Real duplication — same scalar in two places. | +| Objective vs reduced χ² | `objective_value` (raw χ²) | `reduced_chi_square` (χ² / dof) | Cross-category misplacement — two related but distinct scalars where the raw value sits on `minimizer` instead of with the rest of the fit outputs. | + +So a reader who wants "how long did the fit take" must already pick +between two places. 
The current layout has both **input/output mixed +inside `minimizer`** and **fit-output content split across `minimizer` +and `fit_result`** (whether the two scalars per row are the same value +or not). §2 resolves each row above explicitly. + +The consolidation ADR's two-line argument for keeping inputs and outputs +together was: + +1. **Symmetry with `Parameter`.** A `Parameter` holds both its user-set + initial value and its refined value plus uncertainty. +2. **One-place discoverability** > strict purity. + +The symmetry argument does not actually transfer. `Parameter.value` and +`Parameter.uncertainty` describe the _same scalar quantity_ before and +after refinement; they share a name, semantics, and lifecycle. +`minimizer.sampling_steps` (a user request) and +`minimizer.gelman_rubin_max` (a diagnostic the sampler reports) are +about completely different things and only share a namespace because the +consolidation ADR put them there. "One-place" is also already broken: +scalar fit outputs are split across `minimizer`, `fit_result`, +`fit_parameters`, and `fit_parameter_correlations` today. + +## Decision + +### 1. Split `analysis.minimizer` into inputs and outputs + +`analysis.minimizer` keeps only **writable user settings**. The +fit-filled output fields move to `analysis.fit_result`, which gets a +class hierarchy parallel to `minimizer` so each minimizer family can +declare its own output schema. 
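The input/output boundary can be illustrated with two small classes: one whose public attribute has a setter, and one whose public attribute is getter-only and filled through a private `_set_*` method. This is a simplified sketch. `MinimizerSettingsSketch` and `FitResultSketch` are hypothetical stand-ins, and plain Python properties are used here in place of the project's descriptor objects.

```python
from __future__ import annotations


class MinimizerSettingsSketch:
    """Input side: the public attribute is writable."""

    def __init__(self) -> None:
        self._max_iterations = 1000

    @property
    def max_iterations(self) -> int:
        return self._max_iterations

    @max_iterations.setter
    def max_iterations(self, value: int) -> None:
        if value <= 0:
            raise ValueError('max_iterations must be positive')
        self._max_iterations = value


class FitResultSketch:
    """Output side: getter-only, filled by the fit via `_set_*`."""

    def __init__(self) -> None:
        self._fitting_time = None  # unknown until a fit has run

    @property
    def fitting_time(self) -> float | None:
        return self._fitting_time

    def _set_fitting_time(self, value: float) -> None:
        # Internal mutation path; there is no public setter.
        self._fitting_time = value
```

With this shape, assigning `result.fitting_time = 1.0` raises `AttributeError`, so the question of which fields a user may write is answered by the class a field lives on.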
+ +After this ADR: + +| Category | Role | Writable | +| ------------------------------------- | -------------------------------------------------- | --------------------------- | +| `analysis.minimizer` | user-supplied settings | yes | +| `analysis.fit_result` | scalar fit outputs | no (internal `_set_*` only) | +| `analysis.fit_parameters` | per-parameter snapshots and posterior summary rows | no | +| `analysis.fit_parameter_correlations` | upper-triangle correlation rows | no | + +**`fit_result` is not a user-facing switchable category.** It is an +internal projection paired with the active `minimizer`. It does not +expose `fit_result.type` or `fit_result.show_supported()`; the only way +the user changes the active `fit_result` class is by setting +`analysis.minimizer.type`, which the owner's `_swap_minimizer` hook uses +to instantiate both `self._minimizer` and `self._fit_result` atomically. +This is an explicit, documented exception to the global selector +contract from +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md) +§1 because there is no user choice involved at the `fit_result` level — +the minimizer family fully determines the result schema. See the new +exception text added to that ADR (listed under §"ADRs amended"). 
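The atomic pairing can be sketched as follows, using the `_fit_result_class` attribute that this ADR places on the minimizer base classes. The class bodies are empty placeholders, `AnalysisSketch` is a hypothetical stand-in for `Analysis`, and the hook takes a class directly rather than resolving a type tag through `analysis.minimizer.type`, which is a simplification.

```python
from __future__ import annotations


class LeastSquaresFitResult:
    """Placeholder for the LSQ output schema."""


class BayesianFitResult:
    """Placeholder for the Bayesian output schema."""


class LeastSquaresMinimizerBase:
    _fit_result_class = LeastSquaresFitResult


class BayesianMinimizerBase:
    _fit_result_class = BayesianFitResult


class AnalysisSketch:
    """Hypothetical owner that keeps minimizer and fit_result paired."""

    def __init__(self, minimizer_cls: type) -> None:
        self._swap_minimizer(minimizer_cls)

    def _swap_minimizer(self, minimizer_cls: type) -> None:
        # Build both objects before assigning either, so a failing
        # constructor leaves the previous pair untouched.
        new_minimizer = minimizer_cls()
        new_fit_result = new_minimizer._fit_result_class()
        self._minimizer = new_minimizer
        self._fit_result = new_fit_result

    @property
    def minimizer(self) -> object:
        return self._minimizer

    @property
    def fit_result(self) -> object:
        return self._fit_result
```

Because the result class is read off the new minimizer instance, the hook needs no per-tag dispatch, and there is no path by which the result class can change independently of the minimizer.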
+ +**Family mapping is one-to-one between minimizer family and result +class.** Every minimizer registered under `MinimizerTypeEnum` maps to +exactly one `FitResult` concrete class according to its family: + +| `MinimizerTypeEnum` member | Minimizer family | Paired `FitResult` class | +| -------------------------- | ---------------- | ------------------------ | +| `LMFIT` | LSQ | `LeastSquaresFitResult` | +| `LMFIT_LEASTSQ` | LSQ | `LeastSquaresFitResult` | +| `LMFIT_LEAST_SQUARES` | LSQ | `LeastSquaresFitResult` | +| `DFOLS` | LSQ | `LeastSquaresFitResult` | +| `BUMPS` | LSQ | `LeastSquaresFitResult` | +| `BUMPS_LM` | LSQ | `LeastSquaresFitResult` | +| `BUMPS_AMOEBA` | LSQ | `LeastSquaresFitResult` | +| `BUMPS_DE` | LSQ | `LeastSquaresFitResult` | +| `BUMPS_DREAM` | Bayesian | `BayesianFitResult` | +| `EMCEE` _(when added)_ | Bayesian | `BayesianFitResult` | + +The pairing rule is encoded once on the minimizer base classes +(`LeastSquaresMinimizerBase._fit_result_class = LeastSquaresFitResult`, +`BayesianMinimizerBase._fit_result_class = BayesianFitResult`) so +`_swap_minimizer` reads the paired class off the new minimizer instance +and does not need a per-tag dispatch. + +This preserves the consolidation ADR's "no `_bayesian_*` mirror" +guarantee — there is exactly one output category, not seven — while +making the input/output boundary unambiguous. + +### 2. Field assignments + +**`analysis.minimizer` after the split** (writable settings only): + +- LSQ: `max_iterations`. +- Bayesian: `sampling_steps`, `burn_in_steps`, `thinning_interval`, + `population_size`, `parallel_workers`, `initialization_method`, + `random_seed`. + +`credible_interval_inner` and `credible_interval_outer` **stay on the +output side** in this ADR, attached to `BayesianFitResult` (see below). +They are persisted with the fixed values `0.68` and `0.95`, matching the +per-parameter interval columns (`posterior_interval_68_low/high`, +`posterior_interval_95_low/high`). 
Promoting the levels to user-writable +settings would let the user choose a 50% interval that then gets +persisted under a column named `posterior_interval_68_low` — a +data-integrity problem rather than a UX problem. A future suggestion ADR +can promote them to settings and generalise the column naming (e.g. +`posterior_interval_low_`) in one combined change; doing one +without the other is unsafe and out of scope here. + +**`analysis.fit_result` after the split** (outputs only). Common fields +live on `FitResultBase`; family-specific fields on the concrete classes: + +- `FitResultBase`: `success`, `message`, `iterations`, `fitting_time`, + `reduced_chi_square`, `result_kind`. +- `LeastSquaresFitResult` adds: `objective_name`, `objective_value`, + `n_data_points`, `n_parameters`, `n_free_parameters`, + `degrees_of_freedom`, `covariance_available`, `correlation_available`, + `exit_reason`, `r_factor_all`, `wr_factor_all`, `r_factor_gt`, + `wr_factor_gt`, `prof_r_factor`, `prof_wr_factor`, `prof_wr_expected`, + `number_restraints`, `number_constraints`, `shift_over_su_max`, + `shift_over_su_mean`, `profile_function`, `background_function`, + `threshold_expression`, `number_reflns_total`, `number_reflns_gt`. +- `BayesianFitResult` adds: `point_estimate_name`, `sampler_completed`, + `credible_interval_inner`, `credible_interval_outer`, + `resolved_random_seed`, `acceptance_rate_mean`, `gelman_rubin_max`, + `effective_sample_size_min`, `best_log_posterior`. + +The live deterministic result class may expose more descriptors than are +written for a specific saved result. `analysis/analysis.cif` serializes +the active subset: common LSQ descriptors, reflection descriptors only +when reflection rows exist, powder-profile descriptors only for +powder-profile results, and restraint / constraint counts only when +positive. 
Transient convergence diagnostics such as `shift_over_su_max` +and `shift_over_su_mean` remain live-result/report concerns and are not +part of the default fit-result CIF projection. + +The three overlapping pairs from §"Context" are resolved by **dropping +the `minimizer` copy** and keeping the `fit_result` copy: + +- `minimizer.runtime_seconds` removed; `fit_result.fitting_time` is the + single source. +- `minimizer.iterations_performed` removed; `fit_result.iterations` is + the single source. +- `minimizer.objective_value` removed. `LeastSquaresFitResult` keeps + **two distinct fields**: `objective_value` (raw χ² returned by the + minimizer's objective function) and `reduced_chi_square` (= χ² / + `degrees_of_freedom`). They are not duplicates; the unreduced value is + what the solver actually optimises and is useful for diagnostics on + small-dof fits, while the reduced value is what every user-facing + table and plot displays. `BayesianFitResult` does not carry + `objective_value` because the Bayesian engine optimises the log + posterior rather than χ² directly. + +### 3. CIF layout follows the Python split + +The `_minimizer.*` block becomes settings-only. A new `_fit_result.*` +block (already present today for the common header fields) absorbs every +fit output. The set of `_fit_result.*` tags depends on the active +`_minimizer.type`, matching the same shape-shifting convention that +`_minimizer.*` itself already uses per +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md). 
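One way to picture the two-prefix layout is a small serializer that writes settings under `_minimizer.*` and outputs under `_fit_result.*`. This is a rough sketch, not the project's CIF writer: `format_cif_value` and `emit_block` are hypothetical helpers, the quoting rule is deliberately simplified (values containing quote characters or empty strings are not handled), and column alignment is ignored.

```python
from __future__ import annotations


def format_cif_value(value: object) -> str:
    """Render one scalar as a CIF value (simplified quoting)."""
    if value is None:
        return '?'  # placeholder for an unset value
    if isinstance(value, bool):
        # Checked before the generic branch so True is not written as 'True'.
        return 'true' if value else 'false'
    text = str(value)
    # Values containing whitespace must be quoted to stay one token.
    return f"'{text}'" if any(ch.isspace() for ch in text) else text


def emit_block(prefix: str, fields: dict[str, object]) -> list[str]:
    """Return one `_prefix.name value` line per field, in dict order."""
    return [
        f'_{prefix}.{name} {format_cif_value(value)}'
        for name, value in fields.items()
    ]
```

Calling `emit_block('minimizer', settings)` followed by `emit_block('fit_result', outputs)` keeps the settings/outputs boundary visible in the tag prefix.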
+ +Example deterministic fit: + +``` +_minimizer.type 'lmfit (leastsq)' +_minimizer.max_iterations 1000 + +_fit_result.result_kind deterministic +_fit_result.success true +_fit_result.message converged +_fit_result.iterations 87 +_fit_result.fitting_time 12.34 +_fit_result.reduced_chi_square 1.42 +_fit_result.objective_name chi_square +_fit_result.objective_value 1442.7 +_fit_result.n_data_points 1024 +_fit_result.n_parameters 12 +_fit_result.n_free_parameters 8 +_fit_result.degrees_of_freedom 1016 +_fit_result.covariance_available true +_fit_result.correlation_available true +_fit_result.exit_reason converged +_fit_result.R_factor_all 0.041 +_fit_result.wR_factor_all 0.052 +_fit_result.R_factor_gt 0.038 +_fit_result.wR_factor_gt 0.049 +_fit_result.prof_R_factor 0.041 +_fit_result.prof_wR_factor 0.052 +_fit_result.prof_wR_expected 0.031 +_fit_result.number_constraints 2 +_fit_result.profile_function pseudo_voigt +_fit_result.background_function chebyshev +_fit_result.threshold_expression I>3\s(I) +_fit_result.number_reflns_total 128 +_fit_result.number_reflns_gt 121 +``` + +Example Bayesian fit: + +``` +_minimizer.type 'bumps (dream)' +_minimizer.sampling_steps 3000 +_minimizer.burn_in_steps 600 +_minimizer.thinning_interval 1 +_minimizer.population_size 4 +_minimizer.parallel_workers 0 +_minimizer.initialization_method latin_hypercube +_minimizer.random_seed ? + +_fit_result.result_kind bayesian +_fit_result.success true +_fit_result.message 'sampler converged' +_fit_result.iterations 3000 +_fit_result.fitting_time 124.7 +_fit_result.reduced_chi_square 1.18 +_fit_result.point_estimate_name best_sample +_fit_result.sampler_completed true +_fit_result.credible_interval_inner 0.68 +_fit_result.credible_interval_outer 0.95 +_fit_result.acceptance_rate_mean 0.27 +_fit_result.gelman_rubin_max 1.03 +_fit_result.effective_sample_size_min 482 +_fit_result.best_log_posterior -1234.56 +``` + +### 4.
The runtime `analysis.fit_results` object is the same data shaped differently + +`analysis.fit_results` (plural, runtime) is the rich `FitResults` / +`BayesianFitResults` Python object that holds posterior samples, +predictive summaries, raw engine results, and reporting helpers. This +ADR does **not** rename it. After the split it remains the +"give-me-everything" accessor; `analysis.fit_result.*` (singular, CIF +category) holds the persisted scalar projection of the same fit. The +naming pair stays as today. + +A small UX win is added under the accepted display facade +([`display-ux.md`](display-ux.md)): the existing +`project.display.fit.results()` entry point gains a "Settings used" +table above the existing results tables, populated from +`analysis.minimizer.*`. No new `Analysis`-level display method is added; +the user-facing surface stays exactly where the display ADR put it. +Internally the helper reads `self.minimizer.*` and `self.fit_result.*` +and renders one combined view. + +### 5. Help and discoverability + +`analysis.minimizer.help()` after the split lists ~7 properties for +Bayesian and 1 for LSQ — every one writable. The "is this writable" +question disappears. + +`analysis.fit_result.help()` lists 6 common output properties plus the +family-specific ones, all clearly read-only. + +### 6. No new selector wiring + +`analysis.minimizer.type` remains the single user-facing selector. The +swap hook updates **both** `analysis.minimizer` and +`analysis.fit_result` instances atomically (via +`Analysis._swap_minimizer`, which reads the paired `_fit_result_class` +off the new minimizer base). The paired `fit_result` is not a +user-facing switchable: there is no `fit_result.type` and no +`fit_result.show_supported()`, per the documented exception added to +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md) +(see §"ADRs amended"). 
This keeps "one minimizer concept, one +user-facing type" intact and removes the temptation to swap the result +class independently of the minimizer. + +## Consequences + +### Positive + +- **Clear writable surface.** `analysis.minimizer.help()` shows only + settings. Inputs and outputs no longer mix in one namespace. +- **Single source for every output field.** The two current real + duplications (`runtime_seconds`/`fitting_time` and + `iterations_performed`/`iterations`) collapse to one location each. + The `objective_value`/`reduced_chi_square` pair is **not** a + duplication and both stay (see §2 for the distinction). +- **Family-specific outputs have a natural home.** Currently + `minimizer.gelman_rubin_max` lives on the Bayesian minimizer class; + after the split it lives on the paired `BayesianFitResult` class. The + two categories pair symmetrically and emcee inherits the pattern for + free. +- **CIF stays compact.** No new CIF blocks beyond `_fit_result.*` which + is already present. The settings/outputs split is reflected in the CIF + tag prefix. +- **The "are we done with a fit?" check becomes simple.** + `bool(analysis.fit_result.success.value)` answers it directly without + scanning a mixed input/output namespace. + +### Trade-offs + +- **Settings and matching outputs are two-place reads.** Mitigation: + `project.display.fit.results()` presents both. The current layout + already requires multi-place reads; this just makes the rule + consistent. +- **`fit_result` becomes an internally-paired category.** The pairing + cost is small (one paired-instance assignment in the existing + `_swap_minimizer` hook) and is invisible to the user — there is no + second `fit_result.type` selector. The exception to the global + selector contract is documented explicitly in §"ADRs amended" + alongside the selector ADR. +- **Saved CIF files from the post-consolidation layout cannot load + unchanged.** Beta posture (no legacy shims) applies. 
Tutorial fixtures + regenerate via `pixi run script-tests`. Tutorial `ed-24` already + carries a narrow archive normaliser; the new layout would extend it + once. +- **Reopens a decision from a recently accepted ADR.** Documented + explicitly above in §"Status Note". + +### ADRs amended by this ADR + +- [`minimizer-category-consolidation.md`](minimizer-category-consolidation.md) + — §1 ("Unified `minimizer` category replaces all sampler-input and + fit-result categories") becomes a partial rule: the unified + `minimizer` holds inputs; outputs move to the paired `fit_result`. + §"Alternatives Considered → D" updated to record the reversal and the + implementation evidence that prompted it. +- [`analysis-cif-fit-state.md`](analysis-cif-fit-state.md) — §"Minimizer + fit projection" rewritten to describe the split (`_minimizer.*` + settings-only, `_fit_result.*` outputs including family-specific + fields). +- [`runtime-fit-results.md`](runtime-fit-results.md) — closing paragraph + references this ADR alongside the existing two. +- [`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md) + — §1 ("The category owns its selector") gains a paragraph carving out + one documented exception: a category that is fully determined by + another category's `type` (today only `fit_result`, derived from + `minimizer.type`) is allowed to omit `category.type` and + `category.show_supported()`. The mechanism is described in §1 of this + ADR. The user-facing selector convention is otherwise unchanged. +- [`display-ux.md`](display-ux.md) — §"Fit results display" expanded to + mention that `project.display.fit.results()` now prints a "Settings + used" block above the result tables, sourced from + `analysis.minimizer.*`. No new public entry point is added. 
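
The selector exception amended above (a `fit_result` fully determined by `minimizer.type`) could be wired roughly as follows. This is a minimal sketch under stated assumptions: only `_swap_minimizer`, `_fit_result_class`, `LeastSquaresFitResult`, and `BayesianFitResult` are names from this ADR; the minimizer classes, the base classes, and the method bodies are hypothetical stand-ins, not the library's implementation.

```python
from __future__ import annotations


class FitResultBase:
    """Hypothetical base for the paired, read-only output category."""


class LeastSquaresFitResult(FitResultBase):
    pass


class BayesianFitResult(FitResultBase):
    pass


class MinimizerBase:
    # Each minimizer family declares its paired result class once.
    _fit_result_class: type[FitResultBase] = FitResultBase


class LeastSquaresMinimizer(MinimizerBase):  # hypothetical name
    _fit_result_class = LeastSquaresFitResult


class BayesianMinimizer(MinimizerBase):  # hypothetical name
    _fit_result_class = BayesianFitResult


class Analysis:
    def __init__(self) -> None:
        self._minimizer: MinimizerBase = LeastSquaresMinimizer()
        self._fit_result: FitResultBase = LeastSquaresFitResult()

    @property
    def minimizer(self) -> MinimizerBase:
        return self._minimizer

    @property
    def fit_result(self) -> FitResultBase:
        # Read-only: no `fit_result.type`, no public setter.
        return self._fit_result

    def _swap_minimizer(self, new_minimizer: MinimizerBase) -> None:
        # Build the paired result first, then assign both together, so a
        # failure cannot leave the two categories out of step.
        new_fit_result = new_minimizer._fit_result_class()
        self._minimizer, self._fit_result = new_minimizer, new_fit_result


analysis = Analysis()
analysis._swap_minimizer(BayesianMinimizer())
```

After the swap, `analysis.fit_result` is a `BayesianFitResult` without the user ever selecting a result type, which is the point of omitting `fit_result.type` and `fit_result.show_supported()`.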
+ +## Deferred Work + +- **Renaming `analysis.fit_results` (plural runtime object).** The + plural/singular pair is mildly confusing but the rename has wide blast + radius (tests, tutorials, every BayesianFitResults reference). Track + separately if the confusion remains after the combined display lands. +- **Paired internal categories beyond `minimizer` / `fit_result`.** This + ADR introduces the paired pattern for the minimizer only. If future + categories grow the same input/output asymmetry (e.g. extinction, + peak), apply the same pattern then; do not generalise pre-emptively. +- **User-configurable credible-interval levels.** The two interval + levels currently stay at the hardcoded `0.68` / `0.95`, matching the + fixed per-parameter column names (`posterior_interval_68_low` etc.). + Promoting the levels to user settings requires generalising the column + naming at the same time to avoid the data-integrity hole where a + `0.50` level lands in a column called `posterior_interval_68_low`. + Both pieces belong in a follow-on ADR so this proposal stays focused + on the input/output split. +- **CIF compatibility helper for ID 35 archive.** The + `_normalize_id35_archive_for_tutorial` helper in `ed-24.py` already + has a roadmap to deletion; the new CIF layout extends the rename map + one more line. No new architecture decision needed. + +## Alternatives Considered + +### A. Keep current mixed-category layout, fix only the duplications + +Drop `minimizer.runtime_seconds`, `.iterations_performed`, +`.objective_value` and route every reader to `fit_result.*`. Rejected +because it leaves the input/output mix on `minimizer` intact and +therefore does not fix the `minimizer.help()` discoverability problem. + +### B. Mark fields with metadata, keep one category + +Add an `is_input: bool` marker to each descriptor and have +`minimizer.help()` group inputs vs outputs in display. 
Rejected because +it ships the structural problem unchanged — the CIF still mixes both +under `_minimizer.*`, the duplications with `fit_result` remain, and the +`_set_*` vs writable-setter split is still ad-hoc. + +### C. Move outputs into the runtime `fit_results` object, not a CIF category + +Persist only settings in CIF; outputs live in `analysis.fit_results` at +runtime and `analysis/results.h5` on disk. Rejected because the small +scalar outputs (success, χ², runtime, R̂) are exactly what users want to +read from CIF without unpacking HDF5, and the consolidation ADR +explicitly puts them in CIF (`_minimizer.*` today). + +### D. Rename `fit_result` to mirror minimizer (`minimizer_result`) + +Make the pairing rule explicit in the name (`` and `_result`). +Rejected because the recently-accepted +[`switchable-category-owned-selectors.md`](switchable-category-owned-selectors.md) +ADR deliberately drops `_type` and other suffixes from category names; +adding `_result` walks the convention back. diff --git a/docs/dev/adrs/accepted/pattern-display-unification.md b/docs/dev/adrs/accepted/pattern-display-unification.md new file mode 100644 index 000000000..e4f057a00 --- /dev/null +++ b/docs/dev/adrs/accepted/pattern-display-unification.md @@ -0,0 +1,92 @@ +# ADR: Unified Pattern View + +## Status + +Accepted and implemented. + +## Date + +2026-06-04 + +## Context + +`project.display.pattern()` was originally specified by the +[Display UX Facade](display-ux.md) ADR with an `include=` argument that +assembled a view from named layers (`measured`, `calculated`, +`background`, `residual`, `bragg`, `excluded`, `uncertainty`), plus a +`show_pattern_options()` discovery table. `include='auto'` already chose +the most informative combination from project state. 
+ +In practice this surfaced several problems: + +- The single-panel path (`plot_meas`/`plot_calc` → `plot_powder`) and + the composite path (`build_powder_meas_vs_calc_figure`) computed + figure height and x-range independently, so they drifted. A + measured-only view rendered at the full three-panel height in the lazy + docs runtime — the + [Plotting & Docs Performance](plotting-docs-performance.md) skeleton + fell back to `DEFAULT_HEIGHT * PLOTLY_HEIGHT_PER_UNIT` — and gained + stray left/right autoscale padding the composite did not have. +- `'excluded'` was an opt-in overlay, but excluded regions are a + property of the experiment, not a viewing choice; `include='measured'` + and `include=('measured', 'excluded')` produced different plots of the + same data, which read as redundant. +- For a scientist audience, choosing layers is friction. The project + state already determines what is meaningful to show, which is exactly + what `include='auto'` computed. + +## Decision + +`pattern(expt_name, x_min=None, x_max=None, *, x=None)` always renders +every kind of data the project state supports — the former +`include='auto'` behaviour is now the only behaviour. The `include` +parameter, the `show_pattern_options()` method, and the option-status +discovery table are removed. Strict-subset views (for example +measured-only once a calculation exists) are intentionally no longer +offered; zooming (`x_min`/`x_max`) and the x-axis variable (`x`) remain. + +Excluded regions are always shaded when defined on the experiment, +skipped only when a custom `x` axis variable is selected, because the +overlay cannot be mapped onto an arbitrary axis. + +Single-panel and composite charts share one figure-sizing and x-range +core. `plot_powder` builds its layout with the same +`_single_main_panel_height_pixels(...)` height and tight +`_composite_x_range(...)` as the composite main row, and `_get_layout` +already applies the composite margins. 
A one-row chart is therefore the +top row of the multi-row chart pixel for pixel, by construction, so the +two paths cannot diverge again. + +This supersedes the `include`-based pattern design in the +[Display UX Facade](display-ux.md) ADR. The remainder of that ADR — the +facade grouping, renderer categories, and naming rules — still stands. + +## Consequences + +- `pattern(expt_name=...)` is the whole pattern API surface; tutorials, + docs, and tests no longer pass `include=`. +- The project is in beta, so this replaces the previous API with no + compatibility shim; tutorials and tests are updated to the current + API. +- Sizing and range differences between one- and three-panel views are + prevented structurally, not patched per call. +- `structure(include=...)` and `show_structure_options()` are + unaffected: choosing which 3D features to draw remains a genuine + viewing choice (see + [Crystal Structure 3D Visualization](crysview-structure-visualization.md)). + +## Alternatives Considered + +Keeping `include` as the view-selection vocabulary (the original +design). `include` had been chosen over `layers`, `components`, +`content`, `view`, `series`, and boolean flags because it read as user +intent and fit residual rows and Bragg ticks. It is removed now because +the only combination users reached for in practice was the automatic +"show everything available" view; the subset combinations added API +surface and a discovery table without a matching workflow, and the +parallel single-panel rendering path was the source of the sizing and +range divergence. + +Keeping `'excluded'` as an opt-in overlay, or as a redundant no-op +token, was rejected: excluded regions belong to the experiment, so +shading them is automatic whenever they are present. 
diff --git a/docs/dev/adrs/accepted/plotting-docs-performance.md b/docs/dev/adrs/accepted/plotting-docs-performance.md new file mode 100644 index 000000000..3d68101a4 --- /dev/null +++ b/docs/dev/adrs/accepted/plotting-docs-performance.md @@ -0,0 +1,466 @@ +# ADR: Plotting & Docs Performance for Interactive Figures + +**Status:** Accepted **Date:** 2026-06-02 + +## Group + +Documentation. + +> This ADR follows [`AGENTS.md`](../../../../AGENTS.md). It spans the +> documentation build (MkDocs) and the display serialization contract, +> so it also relates to the User-facing API ADRs +> [`display-ux.md`](display-ux.md) and +> [`crysview-structure-visualization.md`](crysview-structure-visualization.md). +> No public Python API change is intended; the change is in how figure +> HTML and its JavaScript runtime are delivered. + +## Context + +### Symptom + +Generated tutorial pages that contain many interactive figures (mostly +Plotly, plus the occasional Three.js crystal-structure view) can take +from several to a few dozen seconds before the page becomes responsive. +The plots are valuable and should stay interactive; the goal is to keep +interactivity while making the page usable immediately and letting plots +appear progressively. + +### How figures reach a docs page today + +1. Tutorial sources are `docs/docs/tutorials/ed-*.py`; notebooks are + generated artifacts (per + [`notebook-generation.md`](notebook-generation.md)) and are committed + with **outputs stripped** (`notebook-strip`). +2. The docs CI + ([`.github/workflows/docs.yml`](../../../../.github/workflows/docs.yml)) + runs `notebook-exec-ci` to **execute** every notebook, baking the + rendered cell outputs into the `.ipynb`, then `mkdocs build` with + `mkdocs-jupyter` configured `execute: false` simply embeds those + pre-rendered outputs into the HTML. +3. 
Each Plotly figure is emitted by `PlotlyPlotter._show_figure` + ([`src/easydiffraction/display/plotters/plotly.py`](../../../../src/easydiffraction/display/plotters/plotly.py)) + as a `text/html` output via + `serialize_html(fig, include_plotlyjs='cdn')` wrapped in + `IPython.display.HTML`. The resulting HTML, **per figure**, carries: + - a `
` plus an inline ` + {% endblock %} + ``` + + In `SHARED` mode the Three.js renderer then emits **only** the module + bootstrap (bare `three` / `three/addons/...` specifiers) and **no** + per-scene importmap, so every scene on a page resolves against this + single head-level map. `STANDALONE` reports are unaffected — they + keep their self-contained inline importmap (a standalone file has no + theme override). Injecting the map on every page is harmless where no + scene consumes it (the tiny JSON is inert), keeping the override + simple. + +7. **SHARED figures downcast bulk float64 arrays to float32 (a bounded, + display-only precision decision).** In `SHARED` mode only, the + serializer transcodes the figure spec's float64 typed arrays to + float32 (~7 significant figures) before embedding, roughly halving + the payload (measured: ed-6 5.5 MB → 3.4 MB; 2.2 → 1.4 MB gzipped). + This is an explicit **display** decision, not a change to stored + data, and it is bounded: + - It operates on a **copy** of the serialized figure + (`fig.to_plotly_json()`), never the source parameters, CIF, or any + computation. + - It applies **only** to docs `SHARED` figures. Live notebooks + (`INLINE`), reports (`STANDALONE`), and every CIF/state file keep + full float64. + - It is **visually lossless**: screens resolve ~3 significant + figures, and the tutorials' hover templates format to a few + decimals (e.g. `:,.2f`), so float32 changes no displayed or + hover-visible value at the precision actually shown. + + Storage-side numeric precision is a separate, deliberate decision, + proposed in a `cif-numeric-precision` ADR suggestion (out of scope + for this change, not committed on this branch). Phase 2 adds coverage + for the `f8`→`f4` transcode (shape preserved, round-trips through + Plotly) and a representative hover/range-sensitive figure whose + formatted values are unchanged. 
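
The float32 trade-off in item 7 can be checked with a small standard-library experiment. This is only an illustration, not the serializer's code: the real transcode works on the typed arrays inside a copy of `fig.to_plotly_json()`, which is not reproduced here, and the sample values below are made up.

```python
from __future__ import annotations

from array import array

# Hypothetical trace data standing in for one figure's y-values.
values = [0.5 * i + 0.123 for i in range(1000)]

as_f8 = array('d', values)  # float64, as stored and computed
as_f4 = array('f', values)  # display copy, each value rounded to float32

# Payload size of the raw buffers before any base64 or gzip step.
payload_f8 = len(as_f8.tobytes())
payload_f4 = len(as_f4.tobytes())

# Hover text as a tutorial template such as `:,.2f` would format it.
hover_f8 = [f'{v:,.2f}' for v in as_f8]
hover_f4 = [f'{v:,.2f}' for v in as_f4]

shape_preserved = len(as_f4) == len(as_f8)
hover_unchanged = hover_f4 == hover_f8
```

For this sample the buffer halves and every two-decimal hover string is identical. A value sitting within float32 rounding error of a formatting boundary could in principle differ, which is why a representative hover-sensitive figure is worth covering in tests, as the Phase 2 note above proposes.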
+ +This pays the network bill once per page from the same origin, removes +the per-figure JS duplication, and turns first paint from "render every +figure" into "render nothing until seen" — addressing both bottlenecks +while keeping every plot fully interactive. + +## Options considered + +### Option A — Tactical: lazy activation only + +Keep each figure's self-contained, CDN-loaded HTML exactly as today, but +wrap the existing per-figure post-script so `Plotly.newPlot` fires from +an `IntersectionObserver` behind a "Loading…" placeholder. + +- **Pros:** smallest change; isolated to the post-script; delivers the + "plots appear one by one" UX the request asked for. +- **Cons:** does **not** fix the network bottleneck (still CDN, still + RequireJS, Three.js still inlined per scene, importmap bug remains); + keeps ~15 KB × N duplicated post-scripts; leaves the long-term CDN + fragility for versioned docs. Robustness: low. + +### Option B — Shared self-hosted runtime + lazy activation _(recommended)_ + +As in **Decision** above: self-host pinned runtimes loaded once per +page, an explicit embedding mode, and a shared lazy loader. + +- **Pros:** fixes **both** bottlenecks; firewall-proof and archival + (versioned docs stay self-consistent); de-duplicates and centralizes + figure JS (maintainability); fixes the importmap bug; generalizes the + pattern reports already use; keeps reports self-contained. +- **Cons:** the most work now — touches `serialize_html`, the Three.js + renderer, `mkdocs.yml`, a vendoring/build step, and a new shared JS + asset; requires careful handling of the three delivery targets and of + the live-notebook experience. Robustness: high. 
**Matches the stated + preference to accept more work now for long-term robustness.** + +### Option C — MkDocs post-processing plugin + +Leave the Python serialization mostly as-is and add a custom MkDocs +plugin (or adopt `mkdocs-plotly-plugin`, already eyed in a `docs.yml` +comment) that post-processes built pages to strip duplicate runtimes, +inject one shared runtime, and add the lazy loader globally. + +- **Pros:** centralizes behavior in the build; minimal Python display + changes. +- **Cons:** adds a bespoke build dependency to maintain against MkDocs + and Plotly upgrades; "spooky action" in a post-build pass that is + harder to test than deterministic serialization; + `mkdocs-plotly-plugin` targets `.plotly` JSON files in Markdown, not + executed-notebook outputs, so it is not a drop-in. Robustness: medium, + but with ongoing maintenance cost and weaker testability than B. + +### Comparison + +| Concern | A — tactical | B — shared+lazy | C — plugin | +| ------------------------------------------- | ------------ | --------------- | ---------------------- | +| Plots appear progressively | ✅ | ✅ | ✅ | +| Removes runtime-CDN dependency | ❌ | ✅ | ✅ | +| Smaller runtime (partial bundle) | ❌ | ✅ | possible | +| De-duplicates per-figure JS | ❌ | ✅ | ✅ | +| Fixes Three.js importmap bug | ❌ | ✅ | maybe | +| Archival / version-frozen docs | ❌ | ✅ | ✅ | +| Reports keep `offline` contract (unchanged) | ✅ | ✅ | ✅ | +| Implementation cost now | low | high | medium | +| Long-term maintenance cost | low | low | higher (custom plugin) | +| Testable in unit tests | partial | ✅ | weak | + +## Consequences + +### Positive + +- Page is responsive immediately; figures render on demand, one by one. +- One same-origin runtime fetch per page, cached across the site; + partial bundle roughly halves the Plotly download. +- Per-figure HTML shrinks substantially (no embedded runtime, no + duplicated post-scripts), so executed `.ipynb` artifacts and built + pages are smaller. 
+- Versioned docs become self-consistent and archival; no runtime CDN. +- Theme-sync / resize / legend logic lives in one auditable place. +- The multiple-importmap Three.js bug is fixed. + +### Negative / cost + +- Larger change across display, report (verification only), docs build, + and a new vendored asset + build step. +- Vendored runtimes must be kept current, but the bump script + pixi + task (Decision 5) reduce this to editing a pinned version + hash and + running one task; licenses regenerate and an optional `--check` mode + guards against drift. +- The shared loader is now load-bearing for docs rendering; it needs its + own tests and a no-/failed-JS fallback story. + +### Neutral + +- No intended change to public Python API or to how authors write + tutorials; the figures look and behave the same, only faster. + +## Risks and mitigations + +- **Live-notebook rendering.** `SHARED` placeholders need the docs + loader, so they must never reach a live Jupyter session. Settled by + the env-var routing (Decision 2): only the docs notebook-execution + tasks request `SHARED`; an unset variable resolves to `INLINE`. Cover + the resolver with a unit test asserting both the default and the + docs-build override. +- **Report `offline` contract.** Keep + [`project-summary-rendering.md`](project-summary-rendering.md) + authoritative (Decision 4); the existing `offline=True` / + `offline=False` report tests must stay green and gain no `SHARED` + behavior. +- **Partial bundle missing a trace type.** Audit every trace/type used + across tutorials and reports before pinning `plotly-cartesian`; fall + back to the full bundle if any `scattergl`/3D/map usage exists. +- **`IntersectionObserver` / no-JS / print.** Provide eager fallback + when the observer is unavailable and when `matchMedia('print')` + matches, plus a `