Ngmix v2.0 (CI mirror of #740) #741
…to ngmix_update
- bin/ scripts were untracked, causing Docker build to fail
- Fix license field to use SPDX string format (MIT) to resolve SetuptoolsDeprecationWarning
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…port + get_guess removal); part 2 (methodology) deferred Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…11 inline + summary); review delivered, Martin to merge Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-reviewed against current #741 head: no code changed since the part-2 review, so all 11 findings stand. Martin closed fork PR #740 and consolidated onto #741 (canonical, green, mergeable); engaged only to ack the RNG fix. Triaged 11 findings (5 cut-and-dry / 5 decisions / 1 resume); weight-norm (949) + *_psfo (1045) flagged as the only two merge-gates. report.html rewritten as next-steps triage; summary comment posted to #741 (issuecomment-4626968551); prs-in-flight indexed with #740-closed disposition. Report-only round; fiber closed for Cail's review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ecision-ready reports) Workflow (6 investigators + 2 synth) on the two hard problems:
- WEIGHTS (#604+949): two coupled regressions in prepare_ngmix_weights (dead get_noise -> whole-stamp sigma_mad; lost binarization -> double-counts real ivar). Empirically confirmed (truth ivar 1e6 -> recovered 8.8e11). Rec: split into a minimal v1-restore in #741 + SExtractor BACKGROUND_RMS baseline as separate PR (closes #604).
- SIZE (r50/T): galaxy r50=T(area), PSF r50psf=sigma, neither=1.1774sigma; UNIONS-3500 paper reports r_h as primary. Rec: transform-at-source + cs_util converters; bonus: sp_validation T_to_fwhm dimensionally wrong.
Adds weights-report.md, size-report.md, deep-dive-report.html.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
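The size relations quoted in the SIZE item can be checked with a short sketch. It assumes a circular Gaussian profile and the convention T = 2*sigma^2; the function names are hypothetical illustrations, not the cs_util converters mentioned in the commit.

```python
import math

# For a circular 2D Gaussian the half-light radius is sqrt(2 ln 2) * sigma.
R50_PER_SIGMA = math.sqrt(2.0 * math.log(2.0))  # ~1.1774


def sigma_from_T(T):
    """Per-axis sigma from T, assuming T = 2 * sigma**2 (an area, not a length)."""
    return math.sqrt(T / 2.0)


def r50_from_sigma(sigma):
    """Half-light radius of a circular Gaussian with per-axis width sigma."""
    return R50_PER_SIGMA * sigma


def r50_from_T(T):
    """Half-light radius from T under the same Gaussian assumption."""
    return r50_from_sigma(sigma_from_T(T))


def fwhm_from_T(T):
    """FWHM from T; for a Gaussian the FWHM is twice the half-light radius."""
    return 2.0 * r50_from_T(T)
```

Under these assumptions, a column holding T or sigma directly differs from a true half-light radius both in units (T is an area) and in the 1.1774 factor, which is the mismatch the report describes.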
- shapepipe/ngmix-weights-ivar (Codex): fix the two prepare_ngmix_weights regressions (minimal v1-restore + red->green test in #741) and the SExtractor BACKGROUND_RMS inverse-variance baseline as a separate PR (closes #604). Points at weights-report.md.
- shapepipe/ngmix-size-columns: honest r50 at the ngmix source + cs_util converter web + fix the sp_validation T_to_fwhm leakage bug. Points at size-report.md.
Both installed as drafts pending Cail's dispatch decision.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…741
- Pushed cleanup bd60dc8 to origin/ngmix_v2.0 (dd4f656..bd60dc8).
- Enabled both oneshot shuttles: ngmix-weights-ivar (codex), ngmix-size-columns (claude-opus). Workers prepare branches/PRs+reports; merge stays Cail's. *_psfo gate + runner decorators tracked for the eventual #741 reply.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sigma_mad(gal) == 0 on a constant stamp made the scalar fallback compute mask * inf, which is NaN wherever the mask is 0 (a fully-masked constant stamp emitted an all-NaN weight map). Guard on sig_noise > 0 and return all-zero weights instead; the downstream wsum == 0 epoch cut already handles the zero-weight case. Pre-existing v1/v2 edge, not introduced by the #604 work.
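A minimal sketch of the guard described above. The helper names are hypothetical stand-ins for the scalar fallback inside prepare_ngmix_weights, and the 1.4826 MAD-to-sigma factor is the usual Gaussian normalisation, assumed here rather than taken from the source.

```python
import numpy as np


def sigma_mad(values):
    """Robust noise estimate: scaled median absolute deviation."""
    values = np.asarray(values, dtype=float)
    return 1.4826 * np.median(np.abs(values - np.median(values)))


def scalar_weight_map(mask, sig_noise):
    """Uniform inverse-variance weights, zeroed where mask == 0.

    Without the guard, sig_noise == 0 gives mask * inf, which is NaN
    wherever the mask is 0.
    """
    mask = np.asarray(mask, dtype=float)
    if not sig_noise > 0:
        # Degenerate (constant) stamp: emit all-zero weights and let the
        # downstream wsum == 0 epoch cut handle it.
        return np.zeros_like(mask)
    return mask / sig_noise**2
```

Writing the condition as `not sig_noise > 0` also routes a NaN noise estimate to the all-zero branch, a design choice of this sketch rather than something the commit states.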
Close the #604 coverage gap flagged in review: the per-pixel RMS branch was pinned only by a 3x3 hand-computed matrix and the rescale unit test. make_data already accepts a per-pixel noise map (document it), so inject heteroscedastic truth and assert the Observation weight equals 1/(Fscale*rms)^2 exactly through rescale_epoch_fluxes -> prepare_ngmix_weights -> make_ngmix_observation, with Megapipe-masked and flagged pixels zeroed. Also add the degenerate constant-stamp guard test (np.errstate raise: no divide/invalid warnings, all-zero weights).
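The per-pixel relation that test pins can be sketched as follows. The function name and the boolean `good` map are hypothetical; in the pipeline the zeroing comes from the Megapipe mask and the flag image.

```python
import numpy as np


def ivar_weights(rms, fscale, good):
    """Per-pixel inverse variance 1 / (fscale * rms)**2.

    Pixels that are not `good` (masked or flagged) get weight zero, as do
    pixels with a non-positive rescaled RMS.
    """
    sigma = fscale * np.asarray(rms, dtype=float)
    good = np.asarray(good, dtype=bool)
    weight = np.zeros_like(sigma)
    usable = good & (sigma > 0)
    weight[usable] = 1.0 / sigma[usable] ** 2
    return weight
```

Filling only the usable pixels, instead of dividing everywhere and masking afterwards, keeps the computation free of divide and invalid warnings, which is the property the np.errstate assertion checks.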
scripts/python/fitting.py was removed by the v2.0 dead-code cleanup (bd60dc8), leaving a silently-skipping stale entry. Drop it, and turn the missing-file skip into a hard failure so the list keeps reflecting reality.
When the option is set, the RMS sqlite must exist for every tile (fail-fast FileNotFoundError, no per-tile fallback); the scalar sigma_mad fallback engages only when the option is absent from the config entirely.
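One way to express that contract, as a sketch only: the section name, option name, and `{tile}` path pattern below are hypothetical, and the real runner may resolve the file differently.

```python
import configparser
import os


def resolve_rms_sqlite(config, section, option, tile_id):
    """Return the per-tile RMS sqlite path, or None for the scalar fallback."""
    if not config.has_option(section, option):
        # Option absent from the config entirely: scalar sigma_mad fallback.
        return None
    path = config.get(section, option).format(tile=tile_id)
    if not os.path.isfile(path):
        # Option set but file missing: fail fast, no per-tile fallback.
        raise FileNotFoundError(
            f"RMS sqlite not found for tile {tile_id}: {path}"
        )
    return path
```

Keying the fallback on the option's presence rather than on the file's existence means a misconfigured or incomplete run stops immediately instead of silently mixing the two weighting schemes across tiles.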
…columns # Conflicts: # src/shapepipe/tests/test_ngmix.py
fix(ngmix): emit true half-light radii in r50 columns; dedupe PSF size columns
The ngmix resume path was deleted in bd60dc8 (Martin: 'a hack to resume interrupted runs ... can be removed now'); this template entry was the last reference wired to ngmix_runner. The mask/get_images CHECK_EXISTING_DIR entries elsewhere are live features and stay. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The WCS moved from the named LOG_WCS config option into the positional input list (input_file_list[-1] when MAKE_POST_PROCESS is True), but the @module_runner metadata and the package docs still described the old contract. Declare log_exp_headers/.sqlite from merge_headers_runner in the decorator (matching the ngmix_runner convention) and replace the stale LOG_WCS docs with the positional contract. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ngmix_runner takes the merged WCS header log positionally (input_file_list[6]) and never reads LOG_WCS. Document the positional contract, add merge_headers_runner to the parent modules, and document the real SAVE_BATCH option in its place. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review-round summary. Every open thread above now has a reply; here's the shape of what landed on On this branch:
Companion PRs (the r50/T thread):
Awaiting your call (methodology, no code pushed): the …
— Claude on behalf of Cail
merge_headers writes TILE_ID as the first key of the tile-level log_exp_headers<tile>.sqlite, and make_post_process derived n_hdu from the first key's value — len(tile_id_string) instead of the CCD count — so every epoch on the unscanned CCDs was silently dropped from N_EPOCH/EPOCH_* (and hence from ME vignets and shape measurement). Regression test builds the tile-mode sqlite via merge_headers and asserts an object on the last CCD keeps all its epochs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
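The failure mode can be reproduced with a plain dict standing in for the sqlite log. The layout (a TILE_ID string first, then one list of per-CCD headers per exposure) is an assumption for illustration, and `count_ccds` is a hypothetical helper, not the fix as committed.

```python
def count_ccds_buggy(header_log):
    """Derive the CCD count from the first key's value (the regression)."""
    first_value = next(iter(header_log.values()))
    return len(first_value)


def count_ccds(header_log):
    """Derive the CCD count from the first exposure entry, skipping TILE_ID."""
    for key, value in header_log.items():
        if key == "TILE_ID":
            continue
        return len(value)
    raise ValueError("header log contains no exposure entries")


# Stand-in log: 40 CCDs per exposure, tile ID written as the first key.
example_log = {
    "TILE_ID": "123.456",
    "2079614": [None] * 40,
    "2079615": [None] * 40,
}
```

With this stand-in, the buggy version returns the length of the tile ID string, so only the first few CCDs would be scanned and epochs on the rest would be dropped.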
ngmix 2.x run_fitter returns flags != 0 on failure instead of raising, and the failed result carries none of the measurement keys (g, g_cov, T, T_err, flux, flux_err, s2n); compile_results indexed them directly, so a single failed object crashed the whole tile with a KeyError at save time. Failed types are now recorded as NaN with their flags preserved.
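A sketch of the NaN-recording idea, using the measurement keys listed above. The shapes and the `safe_fit_record` helper are assumptions for illustration; the real compile_results has its own layout.

```python
import numpy as np

# Assumed shapes for the measurement keys named in the commit message.
MEASUREMENT_SHAPES = {
    "g": (2,),
    "g_cov": (2, 2),
    "T": (),
    "T_err": (),
    "flux": (),
    "flux_err": (),
    "s2n": (),
}


def safe_fit_record(result):
    """Copy a fit result, recording NaN for measurements a failed fit lacks."""
    record = {"flags": int(result["flags"])}
    failed = record["flags"] != 0
    for key, shape in MEASUREMENT_SHAPES.items():
        if failed or key not in result:
            # Failed fit: keep the flags, fill the measurement with NaN.
            record[key] = np.full(shape, np.nan)
        else:
            record[key] = np.asarray(result[key], dtype=float)
    return record
```

Keeping the flags alongside the NaN values lets a downstream quality cut distinguish a failed fit from a genuinely missing object.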
With ignore_failed_psf=True a failed PSF epoch stays in obsdict carrying only flags/pars, so reading result['T'] KeyError-dropped the whole object even when the shear fit succeeded on the surviving epochs. The average now skips flags != 0 epochs (all-failed still hits the wsum == 0 guard), and n_epoch_model counts surviving epochs instead of submitted ones.
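The skip logic can be sketched as a weighted mean over surviving epochs. The `(result, weight)` pair layout and the NaN return for the all-failed case are assumptions; the commit only says that case still reaches the wsum == 0 guard.

```python
import math


def average_psf_T(epochs):
    """Weighted mean of PSF T over epochs whose PSF fit succeeded.

    `epochs` is a list of (result, weight) pairs. Failed results carry only
    flags/pars, so their "T" key must never be read.
    """
    wsum = 0.0
    t_sum = 0.0
    n_epoch_model = 0
    for result, weight in epochs:
        if result["flags"] != 0:
            continue  # failed PSF epoch: skip instead of raising KeyError
        t_sum += weight * result["T"]
        wsum += weight
        n_epoch_model += 1
    if wsum == 0:
        # All epochs failed (or carried zero weight): nothing to average.
        return math.nan, 0
    return t_sum / wsum, n_epoch_model
```

Counting inside the loop is what makes n_epoch_model reflect surviving epochs rather than submitted ones.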
The rewrite hard-coded res['mcal_flags'] = 0, so the NGMIX_MCAL_FLAGS column written by make_cat was constant-zero and any mcal_flags == 0 quality cut passed every object, failed fits included. Restores the v1 contract: mcal_flags = bitwise OR of all per-type fit flags, so failed objects (now NaN-recorded rather than crashing) carry nonzero flags.
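The restored contract amounts to a bitwise OR over the per-type fit flags. A minimal sketch, assuming the results are held in a dict keyed by metacal type; the helper name is hypothetical.

```python
from functools import reduce
from operator import or_


def combine_mcal_flags(results_by_type):
    """Bitwise OR of the fit flags across all metacal types."""
    return reduce(
        or_,
        (int(result["flags"]) for result in results_by_type.values()),
        0,
    )
```

For example, flags of 2 on one type and 8 on another combine to 10, so an mcal_flags == 0 cut rejects the object if any type failed.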
The cfis_simu configs still used the removed LOG_WCS/ME_LOG_WCS options, so the runners' positional reads (ngmix input_file_list[6], mccd_interp/vignetmaker [1], sextractor [-1] with MAKE_POST_PROCESS) would IndexError or grab the wrong file. Migrated them to the merge_headers_runner input pattern used by example/cfis; sextractor exposure runs gain an explicit FILE_EXT so the 3-entry pattern override no longer mismatches the runner's 4-entry default. Also renamed stale run_sp_exp_Mh references in the cfis templates to run_sp_tile_Mh_exp, the name config_tile_Mh_exp.ini actually produces.
The column now stores ngmix 2.x's nfev (solver function-evaluation count, ~tens-hundreds, -1 on some failures), not the v1 1-5 retry count; the old name misrepresented the value. No downstream consumer reads it: make_cat's _save_ngmix_data never touches it, and the only ntry matches in sp_validation are base64 image blobs in notebook outputs.
copyfile was orphaned by the resume-path removal; Tile_cat's size/e/theta attributes were read from the catalog but never consumed anywhere in src/ or scripts/. get_noise stays: scripts/jupyter/test_centroid_shift.py imports and calls it.
Review: part 3 (fresh pass)
Provenance: same convention as parts 1–2. This is a fresh full-diff pass by Claude working on candide, against head b2dcd79. Every finding below was empirically demonstrated before being fixed, each fix carries a regression test confirmed red on the unfixed code, and the chain is now pushed to this branch (…).
Blockers (2)
Both live in the v2.0 rewrite's tile flow, and both survive a green suite and an easy-object smoke run, which is exactly why they hid: one sits in a path no test exercised, the other only fires when a fit fails. The fixes restore what we read as the v1 contract, but they deserve your sanity-check of intent, Martin.
Should-fixes (pushed; please confirm intent)
Noted, not changed
Branch state, empirically
With the chain applied the container suite is 270 passed / 1 failed; the one failure is the known …
— Claude on behalf of Cail
What this is
A same-repo mirror of #740 (@martinkilbinger's "Ngmix v2.0"), pushed to a branch on CosmoStat/shapepipe so that CI actually runs. All 57 commits are authored by Martin (and carry Lucy Baumont's and Axel Guinot's work); pushing the branch preserves that authorship unchanged. The only thing this PR adds is a same-repo head so GitHub Actions fires the pull_request workflow without the fork-PR approval gate. #740 received no CI runs at all for this reason.
Substance is identical to #740; see that PR for the full description. In short: upgrade ngmix to 2.4.0 and adopt Lucy's new ngmix classes/interface; overhaul the shape-measurement module; centroid-bias fix + validation; v2.0 production-run plumbing.
Going forward, opening PRs directly on CosmoStat/shapepipe (rather than from a fork) avoids this: fork PRs don't trigger our Docker-image CI without a maintainer approval that wasn't happening.
Closes/supersedes #740 once CI is green (leaving that call to Martin).
Review
A detailed review is on its way (read against Martin's checklist plus a science-quality pass). Headline from exercising the new fitter against real ngmix 2.4.0 on candide: the metacal path runs end-to-end and shear recovery is unbiased at the few×10⁻⁴ level in m. Full notes to follow.
— Claude on behalf of Cail