Restructuring: top-level layout + workflow/papers/scratch division #197

Draft: cailmdaley wants to merge 64 commits into develop from cleanup/restructuring

cailmdaley (Collaborator) commented Jun 5, 2026

Restructuring sp_validation

Supersedes #188 (closed as a side-effect of a branch rename — all work is intact on this branch; only the PR wrapper was lost).

Status — WIP draft, not yet for review.

  • Foundation folded in. Sacha's sachaguer:develop (#192 head, 120 files: paper plots,
    harmonic configs, library changes) merged; the cosmology.py KeyError: 'mnu' blocker fixed
    (one line) — test_cosmology.py 26/26 green. Sacha's broad .gitignore bans
    (*.png *.sh *.fits) were not adopted.
  • Back-pressure guard suite all green — characterization guards that must stay green as
    files move: ① imports + standalone-scripts/ resolution, ② snakemake -n passes,
    ③ config-path existence (Candide-local), ⑤ symlink integrity, ⑥ dangling-reference grep.
    Full suite 86 passed, 0 skipped; the move-map guard is active with five registered moves.
  • Phase 2 — the moves — COMPLETE. The tree now has the target top-level shape:
    • workflow/ + papers/{bmodes,catalog,harmonic} — the bmodes split (generic compute base
      composed via Snakemake module; paper layers on top). pure_eb run dir repointed;
      all_tapestry dry-runs cleanly from both locations.
    • cosmo_val/ promoted from notebooks/ (code + config home beside cosmo_inference/);
      every tracked reference swept notebooks/cosmo_val → cosmo_val, on-disk outputs moved
      along so candide-absolute paths stay live.
    • scratch/ added (tracked, per-person; conventions in its README), one top-level
      results/ (contents gitignored, dir kept), root output/ ignored, the dead hand-listed
      notebook block dropped from .gitignore.
  • Cleanup begun: defunct/ deleted; nbstripout + 2 MB large-file pre-commit hooks land
    the bloat discipline (activation: pre-commit install, see CONTRIBUTING).
  • Remaining: fold glass_mock core into src/ (code-level refactor, own pass); curate
    notebooks/ to official demos (which reduction notebooks become scripts/ — review with
    Martin); branch/milestone tidy-up (#188/#189 closure) with Cail.
  • develop is untouched. Live state is tracked in the sp-validation-restructuring fiber.
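
For readers who have not seen the guard suite, here is a minimal sketch of what a symlink-integrity guard (⑤ above) could look like. The function name and layout are assumptions for illustration, not the code on this branch.

```python
import os
from pathlib import Path


def broken_symlinks(root):
    """Return symlinks under `root` whose targets do not resolve."""
    broken = []
    # os.walk does not follow symlinked directories by default, so links
    # show up as entries in dirnames/filenames rather than being descended.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # is_symlink() inspects the link itself; exists() follows it,
            # so a dangling link is a symlink that does not "exist".
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
```

A guard test would then assert that `broken_symlinks(repo_root)` is empty after every move.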

— Claude on behalf of Cail


One organizing principle (the things you run live at the top), a clean three-way
split between analysis, papers, and scratch, and a modular workflow built for more than one
person.


The shape

Today cosmo_val is buried inside notebooks/ while cosmo_inference/ is top-level, so
you constantly hunt for where each one lives. The fix: the things a person actually runs
sit side by side at the top, sharing library code in src/ underneath.

sp_validation/
├── src/sp_validation/   library code (+ glass_mock core)
├── cosmo_val/           validation: code + config        (promoted from notebooks/)
├── cosmo_inference/     inference: code + config         (cosmosis / cosmocov)
├── workflow/            ALL analysis — modular Snakemake, multi-person → results/
├── papers/              final-figure assembly only (PDF, colour, layout)
├── scripts/             real reduction scripts (catalog builders, masking)
├── scratch/             per-person: ad hoc work + personal workflows (tracked)
├── notebooks/           curated to official demos / tutorials
├── results/             analysis products + diagnostic plots (contents gitignored)
└── docs/  tests/  config/

Division of labor

The boundary is the inputs to a paper figure: everything up to that point is analysis;
the figure itself is presentation.

  • workflow/ — all analysis. Generic, reusable, modular, organized for multiple people.
    Produces analysis products and diagnostic plots (sp_validation makes many — they go to
    results/). The bulk of the work lives here.
  • papers/<paper>/ — final-figure assembly only. The figure PDF, colours, layout,
    recombining data for presentation. Tied to one paper, and may never touch Snakemake.
  • scratch/<person>/ — personal and ad hoc. Experiments and one-off custom workflows.
    Tracked, because seeing each other's scratch is useful.

How the workflow scales — modular, not monolithic

Nothing in this analysis is computed once: the catalog changed ~20× in the first release
suite, and every paper varies the data vector, covariance, and inference. So the workflow
is parameterized — the rules are shared, the config changes each time. Snakemake's
module directive imports the rules under your own config and an output prefix, and lets
you override any single rule:

module analysis:
    snakefile: "../../workflow/Snakefile"
    config:    config              # this run's catalog, cuts, blind
    prefix:    "results/bmodes"    # products land here — no clobbering

use rule * from analysis
# swap is per-rule: redefine just the data-vector rule to override it

One top-level results/; each run namespaces under results/<name>/ via the prefix, so
people don't clobber each other. A --dry-run on each composition is the safety net that
lets the structure grow without silent breakage.
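
The dry-run safety net can be sketched as a small Python helper that builds and runs `snakemake --dry-run` for each composition. The two Snakefile paths below are hypothetical placeholders, and the helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical composition Snakefiles; the real locations may differ.
COMPOSITIONS = [Path("workflow/Snakefile"), Path("papers/bmodes/Snakefile")]


def dry_run_command(snakefile):
    """Build the command that resolves the DAG without running any job."""
    return ["snakemake", "--snakefile", str(snakefile), "--dry-run"]


def dry_run_ok(snakefile):
    """Return True when the dry run exits cleanly for this composition."""
    result = subprocess.run(
        dry_run_command(snakefile), capture_output=True, text=True
    )
    return result.returncode == 0
```

Looping `dry_run_ok` over `COMPOSITIONS` in a test catches a broken module import or a missing input before any compute is spent. It assumes `snakemake` is on the PATH and that the inputs named in the config are reachable from the machine running the check.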


Cleanup

  • Delete defunct/ (quarantined since 2024) and the exploratory 2021–22 notebooks — it
    all stays in git history.
  • Curate notebooks/ to official demos and tutorials; personal scratchy ones move to
    scratch/.
  • Discipline via tooling, not bans: nbstripout strips notebook outputs on commit (the
    repo's weight today is committed notebook outputs), plus a pre-commit size hook.
  • Path translation — collecting the paper dirs breaks ~35 hardcoded absolute paths; a
    mechanical sweep rewrites them (scripts included) to the single repo-relative results/.
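
A minimal sketch of such a mechanical sweep, assuming the old absolute roots are known up front. The root used in the usage note is a made-up placeholder, and the suffix list is an assumption.

```python
from pathlib import Path


def translate_paths(text, old_roots, new_root="results"):
    """Replace each hardcoded absolute root with the repo-relative one."""
    # Longest roots first, so a longer root is matched before any
    # shorter root it contains.
    for old in sorted(old_roots, key=len, reverse=True):
        text = text.replace(old.rstrip("/"), new_root)
    return text


def sweep(repo_root, old_roots, suffixes=(".py", ".yaml", ".smk", ".sh")):
    """Rewrite matching files in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        original = path.read_text(encoding="utf-8")
        updated = translate_paths(original, old_roots)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `sweep(".", ["/abs/old/paper_a/results"])` would turn `/abs/old/paper_a/results/bmodes/cl.fits` into `results/bmodes/cl.fits`. A literal sweep like this only catches paths written out as a single string; a path assembled from components slips through, so the guard suite stays the backstop.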

The milestone

A suite of PRs, in sequence:

  1. Foundation — merge pending local code into develop. (Sacha — folded into this branch)
  2. Restructuring — this PR: proposal + implementation behind the guard suite.
  3. Glass mocks → tomography.
  4. Input pipeline → tomography.

— Claude on behalf of Cail

sachaguer and others added 30 commits March 3, 2026 15:55
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fold Sacha's pending foundation (PR #192 head, sachaguer:develop @ c22f075)
onto current develop so the restructuring builds on his foundation without
racing his merge gesture (Cail's direction, 2026-06-05).

.gitignore conflict resolved in favour of develop: kept the .felt tracking
block, rejected sacha's broad cluster bans (*.png *.sh *.fits *.out *.err) —
those get narrowed during the restructuring gitignore pass, not adopted
wholesale. cosmo_val.py / cat_config.yaml auto-merged cleanly (origin's
docstring-RST polish + sacha's functional changes did not collide).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cosmology.py get_cosmo read planck_defaults["mnu"] but the dict never
defined the key, so every bare get_cosmo() call (no ccl_params, no mnu
arg) raised KeyError: 'mnu'. Add "mnu": PLANCK18["m_nu"] (0.06 eV).

Verified: test_cosmology.py 26/26 pass (was immediate KeyError before).
This is the one blocker that kept Sacha's foundation from running clean.
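
A stripped-down reproduction of the fix, for illustration only. The names `get_cosmo`, `planck_defaults`, and `PLANCK18["m_nu"]` come from the message above; everything else here (the dict contents, the return value) is a simplified stand-in for the real cosmology.py.

```python
# Stand-in for the real constants; only m_nu = 0.06 eV is taken from the commit.
PLANCK18 = {"m_nu": 0.06}

planck_defaults = {
    # The one added line: without it, planck_defaults["mnu"] raises KeyError.
    "mnu": PLANCK18["m_nu"],
}


def get_cosmo(mnu=None):
    """Simplified: return the resolved parameters instead of a cosmology object."""
    if mnu is None:
        # This lookup is what failed on every bare get_cosmo() call.
        mnu = planck_defaults["mnu"]
    return {"mnu": mnu}
```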

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs/source/sp_validation.*.rst are regenerated on every docs build by
sphinx-apidoc (deploy-docs.yml: `sphinx-apidoc -feTMo docs/source
src/sp_validation`), matching the already-ignored fortuna.*/scripts.*
stubs — they should never be committed.

uv.lock: the container is the canonical runtime (CLAUDE.md), the lockfile
has never been tracked, so ignore it rather than make an unowned
pinned-dep commitment. One-line flip to track if we decide to pin.

Establishes a clean base for the restructuring branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Sacha's branch removed the cosmosis_pipeline_glass_mock_0*.ini and
_v0*.ini ignore patterns, which un-ignored ~700 generated glass-mock
pipeline configs in cosmo_inference/cosmosis_config/. Restore the two
specific patterns (not broad bans) so the tree returns to develop's
clean state. These are generated artifacts, never tracked.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cailmdaley and others added 23 commits June 11, 2026 01:44
The validation code+config home now sits beside cosmo_inference/, per the
restructuring plan. All tracked references swept notebooks/cosmo_val ->
cosmo_val (on-disk output/ moved along, so candide-absolute paths through
the pure_eb symlink stay live); move registered in the dangling-reference
guard. Guards: 50 passed.
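
A minimal sketch of how a dangling-reference guard with a registered move map might work. Only the `notebooks/cosmo_val` entry comes from this commit; the registry name `MOVES` and the function shape are assumptions.

```python
from pathlib import Path

# Registered moves: old location -> new location.
MOVES = {"notebooks/cosmo_val": "cosmo_val"}


def dangling_references(files, moves=MOVES):
    """Return (file, line number, old path) for every stale reference."""
    hits = []
    for path in files:
        try:
            lines = Path(path).read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(lines, start=1):
            for old in moves:
                if old in line:
                    hits.append((str(path), lineno, old))
    return hits
```

The guard passes when the list is empty; each hit names the file and line that still points at the pre-move location.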
…block

Ignore /results/ contents (dir kept via .gitkeep) and root /output/; the
~35-line hand-listed notebook block is gone (its entries no longer exist
on disk — superseded by curation + nbstripout discipline). Kept the live
output-shaped ignores (cosmo_val/output*, gaussian-sims dirs, generated
cosmosis configs).

scratch/<person>/ holds ad hoc experimentation and personal custom
workflows, tracked so scratch can be shared; README states the
conventions (promote to workflow/ when generalizable, outputs stay out).

Notebook outputs strip on commit; files over 2 MB are rejected.
Per-clone activation (pre-commit install) documented in CONTRIBUTING.

The path was built from components, so the mechanical string sweep
missed it; full suite back to green.

…absent

The Docker image is built from the Git build context — tracked content
only, no .git — so 'git ls-files' exits 128 in-image and the dangling-
reference and tracked-symlink guards errored. Walking the image tree
scans exactly the tracked set, so the guards keep their teeth in CI.
Also exclude .pytest_cache/.venv furniture from the reference scan.
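
The fallback can be sketched as follows, assuming the guards only need a list of repo-relative file paths. The function names and the `SKIP_DIRS` set are illustrative, not the actual test helpers.

```python
import os
import subprocess

# Directories that are never part of the tracked set.
SKIP_DIRS = {".git", ".pytest_cache", ".venv"}


def walk_files(root):
    """List files under `root`, relative to it, pruning SKIP_DIRS."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def tracked_files(root):
    """Prefer `git ls-files`; fall back to walking when git cannot answer."""
    try:
        out = subprocess.run(
            ["git", "ls-files"], cwd=root, check=True,
            capture_output=True, text=True,
        ).stdout
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Exit 128 (no repository) or no git binary: the image tree holds
        # only tracked content, so walking it scans the same set.
        return walk_files(root)
    return sorted(out.splitlines())
```

The fallback is only equivalent to the tracked set in an environment built from tracked content, such as the image described above; in a working clone with untracked files the `git ls-files` branch is the one that applies.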

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ERED itself

The workflow composes a candide-absolute catalog configfile and terminal
inputs on /n17data, so the dry run can only be constructed on the
cluster — same skip pattern as the test_cosmo_val data guards. The test
now also satisfies the Snakefile's envvars: declaration itself instead
of inheriting it from the invoking shell.

Fiber outcome notes the CI-hermeticity fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The 6.0.3 release on PyPI ships dist-info only — no python module — so
'import sphinxawesome_theme' fails after a clean-looking install and the
docs build dies with an ExtensionError. Pin !=6.0.3 so resolution falls
back to a working release (and heals itself when 6.0.4 lands).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…el, conflicting-PR silent docs skip)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2 participants