Share the global-parameters core across executables; generate MPI broadcasts; build hygiene#1552
Share the global-parameters core across executables; generate MPI broadcasts; build hygiene#1552sbryngelson wants to merge 26 commits into
Conversation
Sweeps bubble_model, avg_state, wave_speeds, recon_type, muscl_order, muscl_lim, int_comp, format, and precision (99 sites). Includes renaming legacy WENO_TYPE/MUSCL_TYPE comparisons to recon_type_weno/recon_type_muscl (same values) and two select-case labels in m_qbmm; WENO_TYPE/MUSCL_TYPE retirement is a follow-up.
Fypp resolves #:include at parse time, so generated_case_opt_decls.fpp is now emitted for every target (header-only stub outside simulation) and included unconditionally. Restores the dropped shear GPU_DECLARE (consumed in device kernels), restores the original nmom guard conditions in pre/post, and drops unused imports. The post-process beta_idx ordering change from the hoist is retained deliberately: it aligns post with simulation's writer layout (the parent ordering was inconsistent for bubbles_lagrange combined with mhd/elasticity/etc.).
Simulation declares nb in its own case-optimization block (excluded from generated decls), so the hoisted routine cannot reference it directly; pass it as an argument like nmom.
Drive s_mpi_bcast_user_inputs in all three m_mpi_proxy.fpp files from generate_bcast_fpp(target) in fortran_gen.py. The generator emits case_dir, class-(a) scalars (INT/LOG/REAL), FORTRAN_ARRAY_DIMS arrays, and the fluid_pp / bub_pp / lag_params / chem_params struct-array loops. Manual residues (bc_x/y/z members, domain bounds, m/n/p_glb, patch loops, etc.) stay in each file. get_generated_files() grows from 12 to 15 entries. Latent bug fixed: chem_wrt_Y (post, FORTRAN_ARRAY_DIMS dim=num_species) was namelist-bound but never broadcast; consumed by s_save_data on all ranks. Registry-driven generation closes it by construction. Tuple-set equivalence: pre +n_start_old (dead param, harmless), post +chem_wrt_Y only, sim exact identity.
bc_x/y/z are per-target declarations (the documented multi-variable-line exclusion), so their default assignments cannot live in m_global_parameters_common; the hoist had moved them and broke compilation. Restores the three BC default blocks to each executable after the s_assign_common_defaults call.
Verbatim motion only — no renames, reflows, or logic changes. Section map (original line ranges → files): lines 129-331 compiler flags / GPU logic → cmake/GPU.cmake lines 107-461 FYPP_EXE discovery + HANDLE_SOURCES → cmake/Fypp.cmake lines 464-541 params codegen stamp + gen-file lists + custom command/target → cmake/ParamsCodegen.cmake lines 549-821 MFC_SETUP_TARGET function → cmake/MFCTargets.cmake include() order in root CMakeLists.txt: 1. cmake/GPU.cmake (sets FYPP_GCOV_OPTS, NVHPC_USE_TWO_PASS_IPO, MFC_CUDA_CC) 2. cmake/Fypp.cmake (finds FYPP_EXE, defines HANDLE_SOURCES — needs FYPP_GCOV_OPTS) 3. cmake/ParamsCodegen.cmake (sets _mfc_gen_files_* lists — must precede HANDLE_SOURCES calls) 4. HANDLE_SOURCES calls (in root — consume _mfc_gen_files_* and FYPP_EXE) 5. cmake/MFCTargets.cmake (defines MFC_SETUP_TARGET — needs NVHPC_USE_TWO_PASS_IPO) 6. MFC_SETUP_TARGET calls (in root — instantiate targets) 7. docs section (in root — unchanged) Equivalence gate: cmake configure of -DMFC_PRE_PROCESS=ON -DMFC_MPI=OFF -DCMAKE_BUILD_TYPE=Release BEFORE and AFTER, path-normalized diff of CMakeCache.txt, Makefile, CMakeFiles/pre_process.dir/flags.make, and CMakeFiles/pre_process.dir/build.make — all empty (identical). A clean 3-target build rides the next scheduled gate.
…ludes Delete the configure-time execute_process + stamp block from cmake/ParamsCodegen.cmake (29 lines removed). The build-time add_custom_command is now the sole mechanism that writes the 15 generated_*.fpp includes. Cold-start proof (scratch build dir, no pre-existing includes): - cmake configure succeeds with include/ absent - make -n shows cmake_gen.py scheduled before all fypp steps - make mfc_params_gen produces all 15 files across 3 target dirs Incremental proof: touch toolchain/mfc/params/definitions.py -> make -n schedules regeneration without reconfigure. cmake_gen.py already calls path.parent.mkdir(parents=True,exist_ok=True) so no file(MAKE_DIRECTORY) guards were added. find_package(Python3) is kept in ParamsCodegen.cmake (the CMakeLists.txt one is in a docs block that runs later). Full build+test rides the next gate.
PR #1552 ReviewHEAD SHA: 525e0bd SummaryA large structural refactor with clear benefits: CMakeLists.txt is split into four focused Critical — MPI Correctness RegressionDropped
|
| Member | Type | Purpose |
|---|---|---|
K |
real | Herschel-Bulkley consistency index |
nn |
real | power-law exponent |
tau0 |
real | yield stress |
hb_m |
real | Herschel-Bulkley smoothing exponent |
mu_min |
real | lower viscosity bound |
mu_max |
real | upper viscosity bound |
mu_bulk |
real | bulk viscosity |
non_newtonian |
logical | non-Newtonian feature flag |
These members are read directly from the namelist on rank 0 only; they must be broadcast to reach all other MPI ranks. Since fluid_pp is in TYPED_DECLS, _classify_scalar_vars skips it entirely — the only path for these members is through _emit_fluid_pp. None of the three targets emit a broadcast for them.
Effect: In any multi-rank run with non_newtonian = .true., non-zero ranks have non_newtonian = .false. and all Herschel-Bulkley parameters at their defaults. The simulation produces wrong answers with no error or warning.
Fix: Add the missing members to fp_real_members (and emit non_newtonian separately as MPI_LOGICAL):
fp_real_members = ["gamma", "pi_inf", "G", "cv", "qv", "qvp",
"K", "nn", "tau0", "hb_m", "mu_min", "mu_max", "mu_bulk"]Then emit the logical member after the real loop:
lines.append(f" call MPI_BCAST(fluid_pp(i)%non_newtonian, 1, MPI_LOGICAL, 0, MPI_COMM_WORLD, ierr)")The companion unit test in test_fortran_gen.py should be extended to assert these members appear for all three targets.
Minor Observations
Comment formatting in m_constants.fpp
Line 53 reads:
! Reconstruction Types Interface CompressionThis appears to be two section headings merged without a separator, dropping the blank line that formerly separated the "Reconstruction Types" constants from the "Interface Compression" constants. Not a correctness issue, but a blank comment line between them restores the intent.
What Looks Good
- CMakeLists.txt split into
cmake/GPU.cmake,cmake/Fypp.cmake,cmake/MFCTargets.cmake,cmake/ParamsCodegen.cmakeis clean. Build-time codegen via ninja custom commands replaces configure-timeexecute_process, which is the correct pattern. m_global_parameters_common.fppcorrectly guards all case-optimization declarations with#:if not MFC_CASE_OPTIMIZATION, and the stub generated for pre/post confirms no double-declaration risk.- Named constants (
recon_type_weno,recon_type_muscl, etc.) replacing magic literalsWENO_TYPE = 1/MUSCL_TYPE = 2are an unambiguous improvement, properly sourced fromgenerated_constants.fpp. s_assign_common_defaultsand the partial-initialization pattern (common then per-target) correctly replicate the original default assignment order.build.pyslug now includes a human-readable prefix; no correctness impact.- The manual MPI residue kept in the three
m_mpi_proxy.fppfiles (grid dimensions, BC structs,cfl_dt,bc_io, shear/body forces, lag/bub/chem guards) matches what is genuinely not auto-generatable.
Recommendation
Request changes. The dropped fluid_pp non-Newtonian broadcasts are a silent correctness regression for multi-rank non-Newtonian runs. Fix _emit_fluid_pp, extend the unit test to cover the missing members for all three targets, then this is ready to merge.
The hardcoded fluid_pp emitter predated the MFlowCode#1545 merge and silently dropped K/nn/tau0/hb_m/mu_min/mu_max/mu_bulk and non_newtonian from the generated lists - a multi-rank regression for non-Newtonian runs relative to the manual lists it replaced. Emitted sets now verified member-identical to master's manual lists per target. Caught by automated PR review.
|
Confirmed and fixed in e4bb56a: the hardcoded fluid_pp emitter predated the #1545 merge and dropped the Herschel-Bulkley members (K, nn, tau0, hb_m, mu_min, mu_max, mu_bulk, non_newtonian) from the generated broadcast lists. The emitted sets are now verified member-identical to master's manual lists for all three targets, and the test suite pins the HB members so this class of escape fails loudly. The stacked follow-up PR (#1553) additionally converts this emitter to a registry walk, making new fluid_pp members broadcast by construction. Good catch. |
The named-constant retirement deleted the WENO/MUSCL block between two section headings and merged them into one nonsensical comment. Flagged by automated PR review on MFlowCode#1552.
…h dedup docs NIB read num_ib_patches_max (2050000) but the patch_ib namelist array is dimensioned num_ib_patches_max_namelist (54000), so validation accepted indices that overflow the array. Also refreshes three doc passages made stale by the dedup. Both flagged by review.
Code reviewFound 1 issue (beyond the two from the earlier automated review, both already addressed):
MFC/toolchain/mfc/params/definitions.py Lines 33 to 35 in dac3d1f 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Summary
Third installment of the parameter/declaration single-source-of-truth work (#1550, #1551). Two thrusts:
Executable dedup. The three per-executable
m_global_parameters.fppshared 50–90% of their content by copy-paste. A newsrc/common/m_global_parameters_common.fppnow owns the generated declaration includes, the shared state (eqn_idx,sys_size,b_size,tensor_size, shear bookkeeping), the equation-index setup (s_initialize_eqn_idx), parallel-IO init, the finalize core, and the common default assignments. The per-target modules keep only what genuinely differs (QBMM moment encodings, GPU update blocks, post's output state — each leave-behind verified divergent-by-need). The threem_mpi_proxy.fppbroadcast lists are now generated from the parameter registry (generated_bcast.fppper target): a registered namelist parameter is broadcast by construction, with only computed variables (m_glb,cfl_dt,bc_io, …) remaining hand-listed. Netsrc/change: −477 LOC.Build hygiene.
CMakeLists.txt(1,135 lines) splits into a ~460-line root pluscmake/{GPU,Fypp,ParamsCodegen,MFCTargets}.cmake— pure motion, verified by byte-identical configure graphs. The generated includes (now 15: namelist, decls, constants, case-opt, bcast × 3 targets) regenerate via a single ninja-tracked custom command; the configure-time bootstrap and its mtime stamp are deleted (editingparams/*.py+ rebuild now always yields a correct binary, no reconfigure). Build variant directories get readable names:build/install/gpu-acc-chem-f93aa400b9instead ofde835ec46c(hash inputs unchanged — variant identity preserved).Bug fix (deliberate behavior change #1)
post_processnever broadcastchem_wrt_Ydespite consuming it on all ranks ins_save_data— silent wrong/missing species output for multi-rank chemistry post-processing. The registry-driven generator fixes this by construction. (Broadcast tuple-set deltas vs. before, verified per target: simulation identical; pre_process adds only the dead-but-registeredn_start_old; post_process adds onlychem_wrt_Y.)Deliberate behavior change #2
post_process'sbeta_idx(Lagrangian bubble void fraction) is now appended after the MHD/elasticity/surface-tension blocks, matchingsimulation's writer layout — the previous post-only ordering misread restart data forbubbles_lagrangecombined with those features. Covered by the full golden suite.Verification
.f90declaration-set and GPU-directive-set equivalence per target for every hoist step; broadcast tuple-set equivalence per target; configure-graph byte-equivalence for the CMake split; cold-start + incremental-regeneration proofs for the generation rework.Notes for reviewers
src/commoncontinues to compile once per executable withMFC_<TARGET>defines — deliberate and now documented incontributing.md("How the Build Fits Together"); it is what enables per-target generated includes and the shared module pattern../mfc.sh clean.