1454-projection-optimization by danieljvickers · Pull Request #1549 · MFlowCode/MFC

danieljvickers · 2026-06-09T20:41:47Z

Description

In many-particle cases, the limiting factor in the IBM compute is by far the time it takes to project the immersed boundaries onto the grid. This is fundamentally rooted in how we parallelize the work. The current code parallelizes of x, y, and z, but sequentially iterates through the IB patches. In cases where there are many IBs that are small, we are launching several (thousands) of GPU kernels each time step, but each kernel only works on hundreds of grid cells. Adding an option to parallelize over the thousands of particles should be significantly larger parallelism and thus optimize the projection.

In order to maintain the functionality of both parallelism methods, I need a separate set of geometry bounding checks. Since we perform a check in icpp patches and now 2 in IB patches, this is a significant amount of redundant code that must be maintained. To be somewhat forward-looking, I opted to merge all geometry checking into a single module that can be called from both forms of IB parallelism and the icpp pre_processing code. This should clean up the code nicely and significantly reduce code maintenance going forward. Since we can now change cylinder orientation with angles, I also deprecate the unneeded cylinder length checks.

The end result is the creation of a new module in common, the deletion of duplicate code, and a new parallelism path for IB patches when utilizing GPU compute.

Closes #1454, #1532, #1543

Type of change (delete unused ones)

New feature
Refactor

Testing

All tests pass on GNU compiler

Checklist

Check these like this [x] to indicate which of the below applies.

I added or updated tests for new behavior
I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)

GPU results match CPU results
Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not retriggered automatically. To request a review, comment on the PR:

@claude full review — Claude full review (also triggers on PR open/reopen/ready)
Or add label claude-full-review — Claude full review via label

…tch geometries

github-actions · 2026-06-09T20:44:30Z

Claude Code Review

Head SHA: 4f88e03

Files changed:

13
src/simulation/m_ib_patches.fpp
src/common/m_patch_geometries.fpp
src/pre_process/m_icpp_patches.fpp
src/simulation/m_ibm.fpp
src/simulation/m_checker.fpp
src/simulation/m_global_parameters.fpp
src/pre_process/m_global_parameters.fpp
src/post_process/m_global_parameters.fpp
toolchain/mfc/params/definitions.py
toolchain/mfc/params/descriptions.py

Findings

1. Uninitialized `k` writes in 2D section of `s_apply_ib_patches_grid_cell_parallelism` — silent wrong-index bug

In src/simulation/m_ib_patches.fpp, the 2D path (else if (num_dims == 2) then) only loops over j and i; k is never assigned. Three geometry handlers write to ib_markers%sf(i, j, k) with that uninitialized value instead of ib_markers%sf(i, j, 0):

Geometry 2 (circle): if (f_is_inside_cylinder(...)) ib_markers%sf(i, & j, k) = encoded_patch_id
Geometry 3 (rectangle): if (f_is_inside_cuboid(...)) ib_markers%sf(i, j, & k) = encoded_patch_id
Geometry 5 (STL model): ib_markers%sf(i, j, k) = encoded_patch_id

Geometries 4 and 6 in the same block correctly use ib_markers%sf(i, j, 0), confirming the inconsistency is a refactoring bug.

2. Uninitialized `k` writes in all 2D geometries of `s_apply_ib_patches_ib_parallelism` — silent wrong-index bug

All five geometry branches in the 2D section of s_apply_ib_patches_ib_parallelism write ib_markers%sf(i, j, k) where k is undeclared/uninitialized (the 2D loop only iterates over j and i). All five should use index 0 for the third dimension in 2D:

Geometry 2: ib_markers%sf(i, & j, k)
Geometry 3: ib_markers%sf(i, j, & k)
Geometry 4: ib_markers%sf(i, j, k)
Geometry 5: ib_markers%sf(i, j, k)
Geometry 6: ib_markers%sf(i, j, & k)

3. Nested `GPU_PARALLEL_LOOP` and mismatched `END_GPU_PARALLEL_LOOP` in 2D section of `s_apply_ib_patches_ib_parallelism` — GPU build failure

The 2D path emits two GPU_PARALLEL_LOOP macros but only one END_GPU_PARALLEL_LOOP, and the outer one wraps the inner one (nesting is forbidden per common-pitfalls.md):

$:GPU_PARALLEL_LOOP(private='[xp, yp, patch_id, center]', collapse=3)
do xp = xp_lower, xp_upper
    do yp = yp_lower, yp_upper
        $:GPU_PARALLEL_LOOP(private='[xp, yp, i, il, ir, ...]')   ! nested — invalid
        do patch_id = 1, num_ibs
            ...
        end do
        $:END_GPU_PARALLEL_LOOP()   ! closes inner only; outer is never closed
    end do
end do

The outer directive also specifies collapse=3 but only two explicit loops (xp, yp) appear before the nested directive begins. The 3D section of the same subroutine correctly places a single GPU_PARALLEL_LOOP/END_GPU_PARALLEL_LOOP around only the patch_id loop without wrapping the periodicity loops.

4. Undeclared `bounding_box_corner_distance` in `GPU_PARALLEL_LOOP` private clause — compilation error

In s_apply_ib_patches_ib_parallelism the inner GPU_PARALLEL_LOOP private list references bounding_box_corner_distance:

$:GPU_PARALLEL_LOOP(private='[xp, yp, i, il, ir, j, jl, jr, xyz_local, length, radius, &
                    & bounding_box_corner_distance, patch_id, airfoil_id, ...]')

This variable is not declared anywhere in the subroutine and will cause a compilation error.

5. `get_indices_from_bounds` lacks `GPU_ROUTINE` pragma — GPU build failure

get_bounding_indices is decorated with $:GPU_ROUTINE(parallelism='[seq]') and called from inside GPU_PARALLEL_LOOP regions. It calls get_indices_from_bounds:

call get_indices_from_bounds(bbox_min(1), bbox_max(1), x_cc, il, ir)
call get_indices_from_bounds(bbox_min(2), bbox_max(2), y_cc, jl, jr)
if (num_dims == 3) call get_indices_from_bounds(bbox_min(3), bbox_max(3), z_cc, kl, kr)

get_indices_from_bounds has no $:GPU_ROUTINE pragma. In OpenACC and OpenMP offload, every routine called from a device routine must itself be compiled as a device routine. The missing pragma will cause a GPU build failure.

…nolithic subroutine for grid-cell parallelism

…draft of the grid parallelism subroutine. Deprecated cylinder value to only favor length_x being used

…tion of length_y and length_z for cylinders. All tests pass on GNU compilers for full test suite.

danieljvickers · 2026-06-10T16:30:05Z

All AI review comments are now irrelevant because of significant changes that have occurred between now and that review.

I doubt that this will pass the test suite on the first try, but on the off chance that it does we should not yet merge it. A data product of computational optimization performance should be presented before this PR is merged. Otherwise, it is unnecessary refactoring of the code.

…-bounds read

codecov · 2026-06-10T22:04:10Z

Codecov Report

❌ Patch coverage is 62.24900% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.65%. Comparing base (c7a1f05) to head (f168b40).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
src/simulation/m_ib_patches.fpp	64.21%	56 Missing and 17 partials ⚠️
src/common/m_patch_geometries.fpp	52.77%	9 Missing and 8 partials ⚠️
src/pre_process/m_icpp_patches.fpp	50.00%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1549      +/-   ##
==========================================
- Coverage   60.91%   60.65%   -0.26%     
==========================================
  Files          73       74       +1     
  Lines       20187    20108      -79     
  Branches     2938     2954      +16     
==========================================
- Hits        12296    12197      -99     
- Misses       5896     5903       +7     
- Partials     1995     2008      +13

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Initial modifications to allow us to use a unified boudns check on pa…

f85f983

…tch geometries

danieljvickers added 9 commits June 9, 2026 21:44

Removed all of the old outdated subroutines and merged them into a mo…

4c94454

…nolithic subroutine for grid-cell parallelism

part of the way through adding a secondary loop

6f76374

Unified some of the patch geometry functions. Finished out the first …

ec363df

…draft of the grid parallelism subroutine. Deprecated cylinder value to only favor length_x being used

Passes circle tests

72529c9

All tests pass, except for STL mdoels

ce800e4

All tests pass on GNU compilers

f5dc0d7

Added patch geometries to docs and fixed spelling

3cd4661

Missed some spelling in my git add

9b0b59d

Integrated with icpp patches and update documents to note the depreca…

aeaa21e

…tion of length_y and length_z for cylinders. All tests pass on GNU compilers for full test suite.

danieljvickers marked this pull request as ready for review June 10, 2026 16:19

danieljvickers requested a review from sbryngelson as a code owner June 10, 2026 16:19

Merged with master

4f88e03

danieljvickers added 3 commits June 10, 2026 14:28

Need to hard-code index 0 in ib markers array in 2D to prevent out-of…

8c2ffb2

…-bounds read

Updated a GPU paralellism macro that contained a removed variable

a2a3e8e

Formatting

f168b40

Passes test suite on NVHPC GPU OpenACC Parallelism

dac7c47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1454-projection-optimization #1549

1454-projection-optimization #1549
danieljvickers wants to merge 15 commits into
MFlowCode:masterfrom
danieljvickers:1454-projection-optimization

danieljvickers commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

danieljvickers commented Jun 10, 2026

Uh oh!

codecov Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

danieljvickers commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change (delete unused ones)

Testing

Checklist

AI code reviews

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Findings

1. Uninitialized k writes in 2D section of s_apply_ib_patches_grid_cell_parallelism — silent wrong-index bug

2. Uninitialized k writes in all 2D geometries of s_apply_ib_patches_ib_parallelism — silent wrong-index bug

3. Nested GPU_PARALLEL_LOOP and mismatched END_GPU_PARALLEL_LOOP in 2D section of s_apply_ib_patches_ib_parallelism — GPU build failure

4. Undeclared bounding_box_corner_distance in GPU_PARALLEL_LOOP private clause — compilation error

5. get_indices_from_bounds lacks GPU_ROUTINE pragma — GPU build failure

Uh oh!

danieljvickers commented Jun 10, 2026

Uh oh!

codecov Bot commented Jun 10, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

danieljvickers commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

1. Uninitialized `k` writes in 2D section of `s_apply_ib_patches_grid_cell_parallelism` — silent wrong-index bug

2. Uninitialized `k` writes in all 2D geometries of `s_apply_ib_patches_ib_parallelism` — silent wrong-index bug

3. Nested `GPU_PARALLEL_LOOP` and mismatched `END_GPU_PARALLEL_LOOP` in 2D section of `s_apply_ib_patches_ib_parallelism` — GPU build failure

4. Undeclared `bounding_box_corner_distance` in `GPU_PARALLEL_LOOP` private clause — compilation error

5. `get_indices_from_bounds` lacks `GPU_ROUTINE` pragma — GPU build failure