
Architecture overhaul: FFI unification, thread safety, fixed-timestep physics, golden tests, LOD, 256 lights#65

Merged: proggeramlug merged 30 commits into main from audit/architecture-fixes on Jun 12, 2026
@proggeramlug proggeramlug commented Jun 12, 2026

Complete pass over the architecture audit's findings. 19 commits, net −4,000 lines. Every commit is independently green (cargo tests, golden images, FFI validator, line-limit ratchet — all gating in CI).

Structural fixes

  • FFI unification — `define_core_ffi!` generates the ~250 non-physics functions for all platforms (−8,955 duplicated lines). `tools/validate-ffi.js` gates names+arities for all 8 targets incl. web (0 failures, 0 warnings).
  • Thread-safe audio — control/render split over a lock-free SPSC ring; platform callbacks own the renderer exclusively. Kills a real data race + UAF.
  • Fixed-timestep physics — accumulator (clamp 0.25s, 4-step cap), opt-in render interpolation, alpha-returning `step()`; engine dt clamped.
  • Generational handles — stale handles fail lookups instead of aliasing reused slots; plus the 3 real GPU leak paths fixed (unload_texture kept VRAM forever; model cache never evicted and aliased reused handles; card-atlas slots never recycled).
  • StringHeader hardening — compile-time layout assertions, runtime invariant checks (loud diagnostics on Perry ABI drift), UB-free UTF-8 handling.
  • Headless rendering + golden-image harness — surface is now optional; 4 golden scenes gate every renderer change in CI.
  • 2000-line file limit — CI-enforced with a ratcheting grandfather baseline; shaders.rs (5,861) split per-cluster, big chips off renderer/mod.rs and web lib.rs.

Capability / graphics

  • Light caps 4+16 → 8+256 (the audit's top blocker), UBO-based so it lands on every backend incl. WebGL2. Golden: 40-light ring.
  • Hi-Z occlusion culling — 64×64 max-depth grid, async readback, one-frame latency, conservative everywhere; `setOcclusionCulling` kill switch.
  • Depth-sorted transparency — stable back-to-front by view depth (free at submit: `mvp[3][3]`).
  • LOD system — per-node variants by projected screen coverage with hysteresis; raw-geometry + model-based APIs. Golden-verified.
  • API consistency (breaking, docs/migration-0.5.md) — colors 0-255 everywhere, degrees everywhere, `Texture.handle`, coordinate-system docs, examples migrated.
  • Web FFI gap closed — 24 renderer setters were silent no-ops on web.

Live bugs found & fixed along the way

`drawPlane` invisible from above + `drawCube` all faces wound inward (golden harness, day one) · Windows silently no-op'd 24 manifest functions · iOS/tvOS gamepad arity read garbage · tvOS missing 54 functions · `bloom_create_mesh`/spline ribbon misread Perry's f64 arrays · watchOS color-shifting phantom params · manifest/vehicle_create arity bug · Android/iOS `read_file` used the stale 12-byte string header.

Audit corrections (things the audit got wrong)

TAA was already complete (variance-clipped, default-on) · mip generation already existed (normal-map aware) · DoF + motion-blur shaders exist · scene serialization (versioned world format) exists · SceneNode GPU resources were already RAII-freed.

Remaining (tracked, with continuation plans in the task list)

Render-graph migration of `end_frame_with_scene` (plan on task; graph + 2 passes already wired) · offline asset cooking · froxel clustering · streaming music decode · renderer/mod.rs continued splitting.

…s FFI surface

The ~250 non-physics bloom_* functions were hand-copied into six
platform crates (~9k duplicated lines) and drifted constantly: Android
shipped 60 functions behind, then patched the gap with silent no-op
stubs, Windows stubbed the whole scene-graph/lighting/picking/post-FX
surface, iOS/tvOS gamepad functions took an extra leading param the
manifest doesn't declare (axis/button reads were off by one register),
and bloom_create_mesh / bloom_gen_mesh_spline_ribbon read *const f32
where Perry passes f64 arrays.

This adds the same cure the physics surface already had
(define_physics_ffi!): one macro in bloom-shared generating all 247
functions, expanded per platform. Platform crates keep only genuinely
platform-specific code (window/event loop, audio backend, clipboard,
dialogs, cursor, locale).

- new bloom_shared::ffi: panic guard at every FFI entry point (log-once
  + safe default instead of unwinding into Perry code), platform-aware
  error logging (logcat on Android), feature_off_warn_once moved here
- new bloom_shared::ffi_core: define_core_ffi! with models3d /
  image-extras gate pairs (warn-once stubs when off — symbols never
  silently vanish or no-op)
- audio::decode_audio: unified extension-dispatch + format-sniff decode
  (was two divergent per-platform behaviors)
- Renderer::set_env_clear_from_hdr_file: HDR decode hoisted from
  per-platform wrappers
- macro expansion compile-checked in shared unit tests via mock hooks
- macos migrated as reference platform: lib.rs 3567→1194 lines, release
  staticlib exports verified identical pre/post (382 symbols, 0 diff)
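
The panic guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in bloom_shared::ffi: the names ffi_guard, LOOKUP_LOGGED and example_lookup are hypothetical, and the real macro presumably wires the guard in automatically for each generated function.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical guard: run `body`; if it panics, log once and return `default`
/// so the unwind never crosses the FFI boundary into the caller.
pub fn ffi_guard<T>(name: &str, logged: &AtomicBool, default: T, body: impl FnOnce() -> T) -> T {
    match catch_unwind(AssertUnwindSafe(body)) {
        Ok(value) => value,
        Err(_) => {
            // swap returns the previous flag, so only the first panic is logged.
            if !logged.swap(true, Ordering::Relaxed) {
                eprintln!("[ffi] panic in {name}; returning safe default");
            }
            default
        }
    }
}

static LOOKUP_LOGGED: AtomicBool = AtomicBool::new(false);

/// Hypothetical FFI entry point: an out-of-range index panics inside the
/// closure and the guard turns that into the safe default 0.0.
pub extern "C" fn example_lookup(index: f64) -> f64 {
    ffi_guard("example_lookup", &LOOKUP_LOGGED, 0.0, || {
        let table = [10.0, 20.0, 30.0];
        table[index as usize]
    })
}
```

The guard relies on the default unwinding panic strategy; with panic=abort there is nothing to catch.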
…dator to CI

Platform migration (linux, windows, android, ios, tvos — macos landed
with the macro):
- each crate's hand-written shared wrappers replaced by one macro
  invocation + a bloom_resolve_asset_path hook (identity on desktop,
  asset-dir resolution on android/ios/tvos)
- Windows gains real implementations for the 12 manifest functions it
  silently stubbed (scene bounds/user-data/water, picking, postfx
  outline, frame callbacks, lighting) and the 12 it didn't export
- iOS gains the 11 missing picking/projection functions; iOS/tvOS
  gamepad functions now match the manifest arity (the extra leading
  gamepad param made every axis/button read garbage)
- tvOS gains its 54 missing scene-graph/lighting/postfx functions
- Android's no-op stubs for post-FX/DRS/screenshot/physical-size become
  real implementations
- bloom_create_mesh / bloom_gen_mesh_spline_ribbon now read f64 arrays
  on every platform (Perry's array ABI; the f32/u32 readers were
  misreading memory)
- pick/projection read-back state moved from per-crate static muts into
  EngineState

watchOS fixes found by the new validator:
- bloom_draw_circle_lines had a phantom 4th thickness param shifting
  every color one slot; bloom_draw_cylinder had cylinder_ex's slices
  param — both now match the manifest
- package.json: bloom_physics_vehicle_create declared 38 params, every
  implementation and the TS declaration have 37
- ffi_stubs.rs regenerated; postfx.rs module docs updated to describe
  the shipped SCNTechnique path (the surviving piece of #49)

tools/validate-ffi.js: cross-checks manifest names + arities against
the macros and every platform crate (including duplicate-symbol
detection); wired into CI as a gating ffi-parity job.
- tools/check-file-lines.js: new files must stay <= 2000 lines;
  pre-existing offenders are grandfathered in file-lines-baseline.json
  and may only shrink. Generated/vendored code excluded (web/pkg,
  third_party, metal-patched fork, generated watchOS stubs).
- ffi_core.rs (2946 lines) split into ffi_core/<subsystem>.rs section
  macros composed by define_core_ffi! — nested $crate macro invocations
  keep call-site hygiene so the platform hooks still resolve. Largest
  section is 646 lines. Shared tests green, validator green, macOS
  release staticlib exports still byte-identical to the pre-macro
  baseline.
- remaining giants tracked for splitting: renderer/mod.rs (13956),
  renderer/shaders.rs (5861), web/src/lib.rs (2747),
  material_system.rs (2118), bloom-reference main.rs (2330)
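
The ratchet rule itself is small. The real check lives in tools/check-file-lines.js; the Rust restatement below is only a sketch of the rule as described (new files capped at 2000 lines, grandfathered files may not grow past their baseline), with a hypothetical check_file function and an in-memory baseline map standing in for file-lines-baseline.json.

```rust
use std::collections::HashMap;

/// Limit for files that are not on the grandfather baseline.
pub const LINE_LIMIT: usize = 2000;

/// Hypothetical check: `baseline` maps a grandfathered path to its recorded
/// line count. A grandfathered file may shrink or stay put, never grow.
pub fn check_file(path: &str, lines: usize, baseline: &HashMap<String, usize>) -> Result<(), String> {
    match baseline.get(path) {
        Some(&allowed) if lines > allowed => Err(format!(
            "{path}: {lines} lines exceeds grandfathered baseline of {allowed}"
        )),
        Some(_) => Ok(()),
        None if lines > LINE_LIMIT => Err(format!(
            "{path}: {lines} lines exceeds the {LINE_LIMIT}-line limit"
        )),
        None => Ok(()),
    }
}
```

Lowering the stored baseline after a file shrinks is what makes this a ratchet; that bookkeeping step is left out of the sketch.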
The 0.5.18 layout change corrupted every FFI string silently (4-byte
garbage prefix + truncation) because the engine mirrors Perry's header
by hand with raw offset arithmetic and had no way to notice drift.
There's still no Perry-exported ABI version to handshake against, so:

- compile-time size + per-field offset assertions pin the local mirror
  to the documented layout
- every incoming header is invariant-checked (byte_len <= capacity,
  utf16_len <= byte_len, no unknown flags) — a future Perry layout
  shift now produces a loud log-once diagnostic + empty strings on the
  first string received, instead of silent memory garbage
- from_utf8_unchecked replaced with checked conversion: a wrong
  byte_len can no longer cause UB
- alloc_perry_string writes through the typed struct instead of raw
  ptr.add(N) offsets
- round-trip + rejection unit tests
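
A minimal sketch of how the layout assertions and invariant checks fit together. The field order, the 16-byte size and the KNOWN_FLAGS mask are assumptions made for illustration; Perry's actual header layout is not reproduced here, and read_str is a hypothetical name.

```rust
/// Hypothetical mirror of the string header (field order and size assumed).
#[repr(C)]
pub struct StringHeader {
    pub utf16_len: u32,
    pub byte_len: u32,
    pub capacity: u32,
    pub flags: u32,
}

// Compile-time pins: the build fails if the mirror drifts from the layout above.
const _: () = assert!(std::mem::size_of::<StringHeader>() == 16);
const _: () = assert!(std::mem::offset_of!(StringHeader, byte_len) == 4);
const _: () = assert!(std::mem::offset_of!(StringHeader, flags) == 12);

/// Assumed set of flag bits the engine understands.
pub const KNOWN_FLAGS: u32 = 0b1;

/// Runtime invariant check followed by a checked UTF-8 conversion, so a wrong
/// byte_len yields an error instead of undefined behavior.
pub fn read_str<'a>(header: &StringHeader, payload: &'a [u8]) -> Result<&'a str, &'static str> {
    if header.flags & !KNOWN_FLAGS != 0 {
        return Err("unknown flags");
    }
    if header.byte_len > header.capacity {
        return Err("byte_len exceeds capacity");
    }
    if header.utf16_len > header.byte_len {
        return Err("utf16_len exceeds byte_len");
    }
    let bytes = payload
        .get(..header.byte_len as usize)
        .ok_or("payload shorter than byte_len")?;
    std::str::from_utf8(bytes).map_err(|_| "invalid UTF-8")
}
```

On any Err the caller would log once and hand back an empty string, which is the behavior the list above describes.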

(The Android/iOS bloom_read_file copies that still hand-rolled the
12-byte 0.4.x header were already replaced by the shared macro's
alloc_perry_string path in the FFI unification.)
…ne delta time

Variable-dt physics was the audit's top correctness finding: engine.rs
passed unclamped wall-clock deltas straight into Jolt, so one hitch
frame (or a backgrounded browser tab) fed dt=0.1+ into the solver —
tunneling, constraint explosions, non-determinism.

Native (physics_jolt.rs):
- per-world WorldStepState: accumulator over a fixed dt (default 1/60,
  setFixedTimestep), frame contribution clamped to 0.25s, catch-up
  capped at 4 steps/frame with backlog dropped — slows down instead of
  spiraling
- step_fixed returns alpha (remainder/fixed_dt); contact drain happens
  once per batch (the shim queue accumulates across sub-steps)
- opt-in render interpolation (set_interpolation): snapshot before the
  LAST step of a batch, position getter lerps, rotation getter nlerps
  with hemisphere correction; physics queries always see raw state
- state cleaned up on destroy_world
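
The accumulator semantics above, reduced to a sketch. The struct and method names follow the commit text, but the body is an illustration under assumptions: in particular, "backlog dropped" is modeled here as keeping only the fractional remainder once the step cap is hit, which may differ from what physics_jolt.rs does.

```rust
/// Largest frame contribution accepted into the accumulator, in seconds.
pub const MAX_FRAME_DT: f64 = 0.25;
/// Catch-up cap: at most this many fixed steps per frame.
pub const MAX_STEPS_PER_FRAME: u32 = 4;

pub struct WorldStepState {
    pub fixed_dt: f64,
    pub accumulator: f64,
}

impl WorldStepState {
    pub fn new(fixed_dt: f64) -> Self {
        Self { fixed_dt, accumulator: 0.0 }
    }

    /// Feeds one frame's wall-clock delta in, calls `step(fixed_dt)` for each
    /// fixed step taken, and returns (steps taken, alpha), where alpha is
    /// remainder / fixed_dt for render interpolation.
    pub fn step_fixed(&mut self, frame_dt: f64, mut step: impl FnMut(f64)) -> (u32, f64) {
        self.accumulator += frame_dt.clamp(0.0, MAX_FRAME_DT);
        let mut steps = 0;
        while self.accumulator >= self.fixed_dt && steps < MAX_STEPS_PER_FRAME {
            step(self.fixed_dt);
            self.accumulator -= self.fixed_dt;
            steps += 1;
        }
        if self.accumulator >= self.fixed_dt {
            // Cap reached: drop whole-step backlog so the simulation slows
            // down instead of spiraling (assumed policy, see lead-in).
            self.accumulator %= self.fixed_dt;
        }
        (steps, self.accumulator / self.fixed_dt)
    }
}
```

With fixed_dt = 1/64, a frame of 1.5/64 s takes one step and returns alpha 0.5; a 10 s hitch is clamped to 0.25 s, takes the four capped steps, and discards the rest.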

Web (jolt_bridge.js + new physics_ffi.rs): same semantics implemented
against JoltPhysics.js; web lib.rs physics surface split into
physics_ffi.rs (first slice of the >2000-line file ratchet).

Engine: delta_time clamped to MAX_DELTA_TIME=0.25 in begin_frame.

TS API: physics.step(world, dt) is now fixed-timestep by default and
returns alpha; stepVariable() is the explicit exact-dt opt-out;
setFixedTimestep / setInterpolation / getStepAlpha added. docs/physics.md
gains a Stepping section.

Tests: accumulator-vs-manual-stepping trajectory equivalence, hitch
clamp + backlog drop, interpolation blending (82 shared tests green).
…the audio data race

The platform audio callbacks (CoreAudio render thread on Apple,
dedicated ALSA/WASAPI/AAudio threads elsewhere) called
engine().audio.mix_output() through the engine static while the main
thread mutated the same mixer via play/stop/volume FFI calls: a data
race on the voice list, and a use-after-free whenever load_sound
reallocated the registry mid-mix.

audio.rs becomes audio/ (mod, render, spsc, decode):
- AudioMixer (control half, main thread): registries hold
  Arc<SoundData>, every mutation becomes a Cmd on a fixed-capacity
  lock-free SPSC ring (hand-rolled, ~100 lines, no new deps; producer
  drops + logs on full ring rather than blocking)
- AudioRenderer (render half): exclusively owned by the platform audio
  thread; drains commands at the top of mix(), never locks, never
  allocates on the hot path; the mixing DSP is ported verbatim
- render→control feedback (is_music_playing, position) via per-track
  atomics; unload during playback is graceful (voices keep their Arc)
- web stays single-threaded: ScriptProcessorNode fires on the JS main
  thread, so the mixer mixes inline through the same command path
- the music-handle scan-0..100 hack in the old mix loop is gone
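
For readers unfamiliar with the pattern, a fixed-capacity lock-free SPSC ring can be sketched as below. This is not the engine's audio/spsc.rs; the spsc_ring, Producer and Consumer names are hypothetical, and slots hold Option<T> to keep the unsafe surface small at the cost of a little space.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Inner<T> {
    slots: Box<[UnsafeCell<Option<T>>]>,
    head: AtomicUsize, // items popped so far; written only by the consumer
    tail: AtomicUsize, // items pushed so far; written only by the producer
}

// SAFETY: a slot is touched by exactly one side at a time. The Release store
// on tail/head and the matching Acquire load hand the slot across threads.
unsafe impl<T: Send> Sync for Inner<T> {}

pub struct Producer<T>(Arc<Inner<T>>);
pub struct Consumer<T>(Arc<Inner<T>>);

pub fn spsc_ring<T>(capacity: usize) -> (Producer<T>, Consumer<T>) {
    assert!(capacity > 0);
    let slots = (0..capacity).map(|_| UnsafeCell::new(None)).collect();
    let inner = Arc::new(Inner { slots, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) });
    (Producer(inner.clone()), Consumer(inner))
}

impl<T> Producer<T> {
    /// Never blocks: a full ring hands the value back to the caller.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        let ring = &*self.0;
        let tail = ring.tail.load(Ordering::Relaxed);
        let head = ring.head.load(Ordering::Acquire);
        if tail - head == ring.slots.len() {
            return Err(value);
        }
        let slot = &ring.slots[tail % ring.slots.len()];
        unsafe { *slot.get() = Some(value) };
        ring.tail.store(tail + 1, Ordering::Release);
        Ok(())
    }
}

impl<T> Consumer<T> {
    /// Never blocks: returns None when the ring is empty.
    pub fn pop(&mut self) -> Option<T> {
        let ring = &*self.0;
        let head = ring.head.load(Ordering::Relaxed);
        let tail = ring.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let slot = &ring.slots[head % ring.slots.len()];
        let value = unsafe { (*slot.get()).take() };
        ring.head.store(head + 1, Ordering::Release);
        value
    }
}
```

In the control/render split, the main thread would own the Producer and turn a full ring into a logged drop, while the audio callback owns the Consumer and drains it at the top of each mix.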

Platforms: macOS/iOS/tvOS hand the renderer to the CoreAudio callback
at init; Linux/Windows/Android move it into the audio thread/oboe
callback at spawn. tvOS CoreAudio glue split into audio_backend.rs
(file hit the 2000-line ratchet).

Tests: SPSC ordering + cross-thread stream + drop-drain; inline mix;
music flag round-trip; renderer handoff across threads; graceful
unload-mid-playback. 89 shared tests green.
The spawn-site edit in the audio-split commit silently failed to match
(caught by CI build-windows; local cross-checks can't compile the C
deps for this target).
handles.rs — handles now encode (generation << 32 | slot+1) in the f64:
freeing a slot bumps its generation, so a stale handle held by game
code fails lookup instead of silently resolving to whatever object
reused the slot. Generation-0 handles are bit-identical to the old
plain integers, so nothing changes for live handles or the FFI. All
registries (textures, models, sounds, music, scene nodes, physics
worlds/bodies/shapes/constraints) inherit the protection.
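
A sketch of the encoding, under two assumptions that the commit text does not spell out: the handle is the numeric value of (generation << 32 | slot+1) converted to f64 (which is what makes generation-0 handles equal to the old plain integers), and generations wrap at 21 bits so the value stays below 2^53 and remains exact. The Registry type here is illustrative, not handles.rs.

```rust
/// Assumed wrap point: 21 generation bits + 32 slot bits fit f64's 53-bit mantissa.
const GEN_MASK: u32 = (1 << 21) - 1;

fn encode(generation: u32, slot: u32) -> f64 {
    (((generation as u64) << 32) | (slot as u64 + 1)) as f64
}

fn decode(handle: f64) -> Option<(u32, u32)> {
    const LIMIT: f64 = 9_007_199_254_740_992.0; // 2^53
    if !(handle >= 1.0 && handle < LIMIT) || handle.fract() != 0.0 {
        return None;
    }
    let bits = handle as u64;
    let low = (bits & 0xFFFF_FFFF) as u32;
    if low == 0 {
        return None;
    }
    Some(((bits >> 32) as u32, low - 1))
}

pub struct Registry<T> {
    slots: Vec<(u32, Option<T>)>, // (current generation, value)
    free: Vec<u32>,
}

impl<T> Registry<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> f64 {
        let slot = match self.free.pop() {
            Some(slot) => slot,
            None => {
                self.slots.push((0, None));
                (self.slots.len() - 1) as u32
            }
        };
        let entry = &mut self.slots[slot as usize];
        entry.1 = Some(value);
        encode(entry.0, slot)
    }

    fn live_slot(&self, handle: f64) -> Option<usize> {
        let (generation, slot) = decode(handle)?;
        let entry = self.slots.get(slot as usize)?;
        (entry.0 == generation && entry.1.is_some()).then_some(slot as usize)
    }

    pub fn get(&self, handle: f64) -> Option<&T> {
        self.slots[self.live_slot(handle)?].1.as_ref()
    }

    /// Freeing bumps the slot's generation, so the old handle stops resolving.
    pub fn remove(&mut self, handle: f64) -> Option<T> {
        let slot = self.live_slot(handle)?;
        let entry = &mut self.slots[slot];
        entry.0 = (entry.0 + 1) & GEN_MASK;
        self.free.push(slot as u32);
        entry.1.take()
    }
}
```

The first handle issued is 1.0, exactly what a plain slot+1 integer would have been. After a free and reuse of the same slot, the new handle is 4294967297.0 and the stale 1.0 resolves to nothing.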

Renderer::unload_texture — previously only zeroed the size entry and
kept the wgpu texture + bind group alive forever ('bind group remains
but won't be referenced'). Now swaps in a 1×1 white placeholder so the
real texture drops and VRAM is released; retired slots are never
reused, so stale material indices render white rather than aliasing a
later texture.

bloom_unload_model — now evicts renderer.model_gpu_cache (keyed by
handle bits). Without eviction the cached GPU meshes leaked, and a
model whose handle reused the slot rendered the previous model's
geometry.

SceneGraph — destroyed nodes return their 6-slot mesh-card block to a
free list; create/destroy cycles no longer exhaust the fixed-size card
atlas. (Node-owned GPU buffers/BLAS/SDF were already released by RAII
when the node dropped.)

94 shared tests green incl. new stale-handle/no-alias/double-free
registry tests.
The unload_texture fix grew renderer/mod.rs past its ratcheted
baseline; the correct response is shrinking, not baseline-bumping.
register_texture* / update / unload / filter / evict_model_cache move
to renderer/texture_store.rs (341 lines); mod.rs ratchets down
13956 -> 13660.
Translucent draws rendered in submission order — alpha compositing was
only correct if the game happened to submit far-to-near. Each
MaterialDrawCommand now carries its pivot's clip-space w (free at
submit time: mvp[3][3]) and the bucket is stable-sorted back-to-front
before the translucent pass, so equal-depth draws keep submission
order and hand-ordered games render as before. Instanced draws sort as
a group by their fallback-transform pivot.
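
The sort described above is small enough to sketch. DrawCmd is a stand-in for MaterialDrawCommand with only the fields needed here (hypothetical layout). The sketch leans on two facts: slice::sort_by is stable, and mvp[3][3] is the clip-space w of the model origin whether the matrix is indexed [column][row] or [row][column].

```rust
/// Stand-in for MaterialDrawCommand (hypothetical, reduced to what the sort needs).
#[derive(Debug, Clone, PartialEq)]
pub struct DrawCmd {
    pub id: u32,
    pub clip_w: f32,
}

/// Clip-space w of the pivot (0, 0, 0, 1): with the origin as input, only the
/// translation part contributes, so w is simply the [3][3] element.
pub fn pivot_clip_w(mvp: &[[f32; 4]; 4]) -> f32 {
    mvp[3][3]
}

/// Stable back-to-front sort: larger w (farther) first, ties keep submission order.
pub fn sort_back_to_front(bucket: &mut [DrawCmd]) {
    bucket.sort_by(|a, b| b.clip_w.total_cmp(&a.clip_w));
}
```

Because ties preserve their input order, a game that already submits far-to-near at equal depth sees no change.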

File policy fallout, resolved by shrinking: material_system's GPU test
module moved to material_system_tests.rs (file now under 2000 —
removed from the grandfather baseline); create_render_texture /
get_texture_ref joined the rest of texture storage in texture_store.rs
(mod.rs 13660 -> 13639).
Audit follow-up: contrary to the audit's 'no mipmaps' finding, every
game-texture path already routes through register_texture_kind's CPU
mip generation (normal-map-aware vector averaging with variance baked
to alpha) and the default sampler is trilinear. The one real gap is
the Android hard-disable, which needs on-device re-verification
against current wgpu before it can be lifted.
The Hi-Z pyramid existed for SSAO/SSR but was never used for culling —
every frustum-surviving node rendered even when fully hidden. The
existing chain is min-reduced (what ray marching wants); occlusion
needs the opposite bound, so a new 64x64 max-depth grid is reduced
from Hi-Z mip 0 each frame and read back asynchronously (one-frame
latency, zero GPU stalls; 16KB/frame, row-aligned exactly to 256B).

scene.prepare tests each frustum-surviving node's world AABB against
last frame's grid using last frame's view-projection. Every uncertain
case resolves to visible: no grid yet, corner behind the captured near
plane, footprint off the captured screen, depth within a 2%+0.1
margin, in-flight readback. The flag only gates the main camera pass —
shadows, picking, and TLAS never read it. Screenshot capture passes
None (a one-shot capture must render everything).
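
The "every uncertain case resolves to visible" rule can be sketched as a single predicate. This is an illustration only: it assumes linear view depth where larger means farther, a footprint already projected to [0,1] screen coordinates with last frame's view-projection, and hypothetical names (OcclusionGrid, is_occluded). The in-flight-readback case is folded into the grid being None.

```rust
pub const GRID: usize = 64;

/// Last frame's grid: farthest depth seen in each of the GRID x GRID cells.
pub struct OcclusionGrid {
    pub max_depth: Vec<f32>,
}

/// Returns true only when the node is certainly hidden.
pub fn is_occluded(
    grid: Option<&OcclusionGrid>,
    min_uv: [f32; 2],
    max_uv: [f32; 2],
    nearest_depth: f32,
) -> bool {
    // No grid yet (or readback still in flight): visible.
    let Some(grid) = grid else { return false };
    // A corner at or behind the near plane: visible.
    if nearest_depth <= 0.0 {
        return false;
    }
    // Footprint leaves the captured screen: visible.
    if min_uv[0] < 0.0 || min_uv[1] < 0.0 || max_uv[0] > 1.0 || max_uv[1] > 1.0 {
        return false;
    }
    let cell = |v: f32| ((v * GRID as f32) as usize).min(GRID - 1);
    let (x0, x1) = (cell(min_uv[0]), cell(max_uv[0]));
    let (y0, y1) = (cell(min_uv[1]), cell(max_uv[1]));
    // The farthest occluder over the footprint is the conservative bound.
    let mut farthest = 0.0f32;
    for y in y0..=y1 {
        for x in x0..=x1 {
            farthest = farthest.max(grid.max_depth[y * GRID + x]);
        }
    }
    // 2% + 0.1 margin: anything inside it counts as visible.
    let margin = 0.02 * farthest + 0.1;
    nearest_depth > farthest + margin
}
```

A single far cell inside the footprint (a gap in the occluders) is enough to keep the node visible, which is why the grid stores max depth rather than the min-reduced Hi-Z chain.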

bloom_set_occlusion_culling(0/1) is the kill switch (manifest + macro
+ TS setOcclusionCulling + watchOS stubs); default on.

File policy: Hi-Z chain recording moved to renderer/hiz.rs; mod.rs
ratchets 13633 -> 13598. 96 shared tests green incl. a GPU-backed
occluded/visible/near-plane/disabled test matrix.
…tches two live winding bugs

Renderer.surface is now Option<Surface>: headless mode renders into a
persistent offscreen texture behind the new FrameTarget abstraction
(swapchain frame | headless target), with Renderer::new_headless as
the entry point. The screenshot path works unchanged on top, which is
what the harness captures.

tests/golden_render.rs renders reference scenes through the real
pipeline (2D shape batch; lit 3D primitives with directional + point
light, plane, shadows) and compares against checked-in PNGs: mean
per-channel diff <= 2.0/255 and <=1% outlier pixels, BGRA-aware,
failure writes an .actual.png next to the golden, regenerate with
BLOOM_UPDATE_GOLDEN=1. Skips without a GPU adapter; CI's macos-14
shared-tests job has Metal, so this now gates renderer changes —
built deliberately before the clustered-lighting and render-graph
work it exists to protect.
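
The two-part tolerance can be sketched like this. The 2.0/255 mean and 1% outlier budget come from the description above; the per-pixel outlier threshold (OUTLIER_THRESHOLD) is not stated there and is an assumed value, and the BGRA handling of the real harness is left out.

```rust
/// Assumed: a pixel is an outlier when any channel differs by more than this.
pub const OUTLIER_THRESHOLD: u8 = 16;

/// Compares two same-sized 4-channel 8-bit images. Passes when the mean
/// per-channel difference is <= 2.0 (in 0..=255 units) and at most 1% of
/// pixels are outliers.
pub fn images_match(actual: &[u8], golden: &[u8]) -> bool {
    if actual.is_empty() || actual.len() != golden.len() || actual.len() % 4 != 0 {
        return false;
    }
    let mut sum: u64 = 0;
    let mut outliers: usize = 0;
    for (a, g) in actual.chunks_exact(4).zip(golden.chunks_exact(4)) {
        let mut worst = 0u8;
        for c in 0..4 {
            let diff = a[c].abs_diff(g[c]);
            sum += diff as u64;
            worst = worst.max(diff);
        }
        if worst > OUTLIER_THRESHOLD {
            outliers += 1;
        }
    }
    let pixels = actual.len() / 4;
    let mean = sum as f64 / actual.len() as f64;
    mean <= 2.0 && outliers * 100 <= pixels
}
```

The mean catches broad shifts such as a tone-mapping change, while the outlier budget tolerates a few edge pixels that rasterize differently across drivers.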

Bugs the harness caught on its first day:
- draw_plane wound its quad inward: back-face culled from every
  camera above it — ground planes have been invisible since the
  primitive shipped (only visible from underneath)
- draw_cube wound all six faces inward: with back-face culling you
  saw each cube's interior, so silhouettes looked right but normals
  lit the wrong side and faces vanished at grazing angles

File policy: 2D shape drawing + FrameTarget moved to renderer/
draw2d.rs; mod.rs 13598→13602 net after adding headless support
(ratcheted).
@proggeramlug proggeramlug force-pushed the audit/architecture-fixes branch from 5c785ec to 8c35cf9 Compare June 12, 2026 09:30
…cs blocker)

Every layer that mirrored the fixed arrays moves in lockstep:
LightingUniforms (Rust consts, now the single source), both legacy 3D
WGSL shaders, the material ABI's PerView block (ABI-VERSION 2 → 3 per
the documented protocol, EXPECTED_ABI_VERSION bumped), and the
material-system Rust mirror (population uses array::from_fn, so the
new capacity flows through without copy-length edits).

Deliberately a uniform buffer, not storage: 256×32B + 8×32B < 9KB fits
WebGL2's 16KB minimum UBO, so the cap raise lands on every backend with
zero shader permutations. Shaders loop only over the live count —
small scenes pay nothing. Froxel clustering (per-pixel cost for
genuinely large counts) is the follow-up optimization, now safe to
build against the golden harness.
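
The size argument can be pinned at compile time. A sketch, assuming a 32-byte light record made of two vec4s (the field names are hypothetical, as is which of the two caps belongs to which light type); only the 8 + 256 counts, the 32-byte stride and the 16KB floor come from the text above.

```rust
/// Hypothetical 32-byte light record: two vec4s.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct GpuLight {
    pub position_range: [f32; 4],
    pub color_intensity: [f32; 4],
}

pub const SMALL_LIGHT_CAP: usize = 8;
pub const LARGE_LIGHT_CAP: usize = 256;

/// WebGL2 guarantees at least 16KB per uniform block.
pub const MIN_UBO_BYTES: usize = 16 * 1024;

pub const LIGHT_ARRAY_BYTES: usize =
    (SMALL_LIGHT_CAP + LARGE_LIGHT_CAP) * std::mem::size_of::<GpuLight>();

// Fails the build if a future cap raise no longer fits the smallest UBO.
const _: () = assert!(std::mem::size_of::<GpuLight>() == 32);
const _: () = assert!(LIGHT_ARRAY_BYTES <= MIN_UBO_BYTES);

/// Number of lights a shader loop visits: the live count, never the capacity.
pub fn lights_to_shade(live_count: usize, cap: usize) -> usize {
    live_count.min(cap)
}
```

LIGHT_ARRAY_BYTES works out to 8448 bytes, which leaves room in the block for counts and other per-view data.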

New golden: 40 point lights in a hue ring over a floor — under the old
cap 60% of the ring goes dark, far past tolerance. Existing goldens
pass pixel-identical (capacity-only change).
…ywhere, Texture.handle

Three silent inconsistencies that forced users to read engine source
(audit Tier-3), fixed as one coordinated breaking pass with all
engine examples migrated and docs/migration-0.5.md as the guide:

- setSceneNodeColor / setOutlineColor / setSceneNodeWaterMaterial took
  0-1 floats — the only color params in the API that did; passing a
  Colors preset silently rendered white. Now 0-255 like every draw*
  call (wrappers divide before the FFI; native stays 0-1). Light
  colors deliberately stay 0-1 float + intensity (radiometric, like
  Unity/Unreal) and are documented as such. World-format tints stay
  0-1 (versioned serialized data); the loader converts.
- drawModelRotated took radians while Camera2D.rotation was degrees;
  now degrees everywhere user-facing (raylib convention).
- Texture.id -> Texture.handle (was the lone outlier among resource
  types).

Plus documentation that existed nowhere user-facing: coordinate system
+ SI units at the top of the physics and scene modules, immediate-vs-
retained-mode guidance, loadTexture failure semantics, light-color
convention, @internal markers on the *Raw compiler-workaround variants.

All examples + src/index.ts pass perry check.
… coverage

SceneNodes can now carry reduced-detail geometry variants (LodLevel:
own vertex/index buffers, screen-coverage threshold). prepare() picks
the level each frame from the projected NDC extent of the node's world
AABB — with 10% hysteresis against boundary flicker, falling back to
the base geometry for near-plane cases — and render() binds the active
variant. Shadows, picking, BLAS, and SDF deliberately keep the base
geometry (LODs only affect the camera pass).
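
One way to get the 10% hysteresis is sketched below. It assumes levels are ordered by descending max_coverage and that a level applies when coverage is at or below its threshold; the real prepare() may structure this differently, and the near-plane fallback to base geometry is omitted. LodLevel carries only the threshold here.

```rust
pub const HYSTERESIS: f32 = 0.10;

/// Reduced to the selection input; the real level also owns vertex/index buffers.
pub struct LodLevel {
    pub max_coverage: f32,
}

/// None means base geometry. Picks the most reduced level whose threshold
/// still admits this coverage (levels sorted by descending max_coverage).
fn ideal(levels: &[LodLevel], coverage: f32) -> Option<usize> {
    levels.iter().rposition(|l| coverage <= l.max_coverage)
}

/// Keeps the current level while coverage is within 10% of a value at which
/// the current level would still be chosen; otherwise switches.
pub fn select_lod(levels: &[LodLevel], coverage: f32, current: Option<usize>) -> Option<usize> {
    let target = ideal(levels, coverage);
    if target == current {
        return target;
    }
    let nearby = [coverage * (1.0 + HYSTERESIS), coverage / (1.0 + HYSTERESIS)];
    if nearby.iter().any(|&c| ideal(levels, c) == current) {
        current
    } else {
        target
    }
}
```

With thresholds 0.5 and 0.1, a node on base geometry at coverage 0.48 stays on base, and only switches once coverage has clearly dropped (0.4 in the sketch), which prevents flicker when a node hovers at a boundary.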

API: setSceneNodeLod(node, lodIndex, vertices, indices, maxCoverage)
for raw geometry and attachModelLodToNode(...) for model meshes;
bloom_scene_set_lod + bloom_scene_attach_model_lod in the manifest and
the shared FFI macro (watchOS stubs regenerated; validator green).

New golden: two cubes with a green LOD variant under red base
geometry — the near one renders red (base), the far one green (LOD),
locking the selection logic into CI.
…eb coverage

The web crate silently lacked 24 manifest functions — every post-FX /
DRS / exposure / SSGI / TAA / DoF setter plus the new occlusion-culling
toggle was a no-op for web games (the JS glue mapped nothing for them).
All are now real wasm_bindgen exports over the same shared engine
methods natives use; attachModelLodToNode works on web too.

tools/validate-ffi.js: the web check graduates from a count-only
warning to per-function FAILures, with a 4-entry documented allowlist
for the structurally different cases (pointer-taking geometry needs
the Perry-WASM linear-memory bridge; filesystem captures need _bytes
designs). Validator output is now 0 failures / 0 warnings across all
8 targets.
…only dead code

- web lib.rs ratchets to 2151 (the settings block from the FFI-gap fix
  moves to its own module; fixes the ffi-parity CI failure on the
  previous commit, which tripped the line ratchet)
- encode_png_simple cfg'd to native (feeds the file-writing screenshot
  path; was a permanent warning on every wasm build)
…dules

shaders/ now holds core (2D/3D/scene), env (prefilter/sky/aerial/LUT),
ao (Hi-Z + GTAO), gi (cards/SDF/WSRC), ssgi (probes + SSR), post
(bloom/DoF/motion-blur/SSS/TAA/exposure/composite/upscale/RCAS) — all
re-exported through shaders/mod.rs so call sites are untouched. Every
file is under the 2000-line limit; shaders.rs leaves the grandfather
baseline.

(Audit-record correction along the way: DoF and motion-blur shaders
exist and are wired — two more 'missing post-effects' findings from
the original audit that were wrong.)
@proggeramlug proggeramlug changed the title Architecture: FFI unification + audit fixes (in progress) Architecture overhaul: FFI unification, thread safety, fixed-timestep physics, golden tests, LOD, 256 lights Jun 12, 2026
…into RAM

A 5-minute 44.1kHz stereo track was held as ~57MB of f32 PCM; it's now
~5MB of compressed bytes (shared Arc) plus ~1.5s of decoded ring.

audio/stream.rs: per-playback decode worker (lewton/minimp3 are both
incremental) feeding 32k-sample chunks over the existing lock-free SPSC
ring with sleep-based backpressure; looping restarts the decoder inside
the worker so loop seams can't underrun; a stop flag (set by
StreamConsumer::drop — i.e. stop/replace/teardown) kills the worker.

The render-side MusicVoice grows a Stream source alongside Full:
chunk-pulling mix loop, silence on underrun (resumes next callback),
shared playing/position atomics maintained as before. WAV stays
full-decode (already PCM); wasm32 falls back to full decode (no
threads) — both via the same load_music_bytes entry the FFI loader now
calls.

Non-goal kept explicit: no resampling yet — non-44.1kHz tracks play at
the same wrong pitch they did before (tracked).

97 shared tests green incl. a stop-flag/backpressure test.
…ding

tools/bloom-cook (new crate): PNG/JPEG/BMP/TGA -> BC7 DDS with a full
precomputed mip chain (image_dds encode; --normal/--linear for
non-color data; texture-dir batch mode). The encoder dependency lives
only in the tool — the engine runtime stays decode-only.

Runtime: loadTexture() magic-sniffs DDS and takes the cooked path —
direct compressed mip-chain upload on adapters with BC support (4x
less VRAM than RGBA8, no decode or runtime mip generation), CPU decode
to RGBA elsewhere, so one cooked artifact ships everywhere. All seven
platform device creations + web conditionally request
TEXTURE_COMPRESSION_BC. BC7 binds non-sRGB to match the engine's
Rgba8Unorm texture convention (the first test run caught a double-sRGB
wash; the unload placeholder had the same latent bug — both fixed).
DDS rides the image-extras feature (a beyond-PNG codec, exactly that
feature's charter).

Tests: cooked-vs-raw A/B render of a quadrant fixture must agree
within BC7 loss (max channel diff <=16). Golden tests now skip
software adapters (WARP crashed with ACCESS_VIOLATION in the
surface-less path on Windows CI; real-GPU coverage is macos-14).

File policy: web input FFI split to input_ffi.rs — web lib.rs is under
2000 and leaves the grandfather baseline (now only renderer/mod.rs and
the bloom-reference tool remain on it).
main.rs (2330) was one of the last two files on the 2000-line
grandfather baseline; the ray/BVH/camera/RNG/BRDF/environment/lights/
integrator core moves to tracer.rs (1252) leaving main.rs at 1088
(spec/scene/glTF loading, tone mapping, render loop, CLI). Tool builds
clean; baseline now holds only renderer/mod.rs.
…sion capture

First increment of the Phase 2b port (the plan lives on the task):
the Hi-Z linearize/downsample chain and the occlusion-grid capture now
run as graph PassNodes with declared reads/writes (SceneDepth →
Transient(HIZ_PYRAMID) → occlusion), scheduled by the topological
sort instead of hand-ordered.
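
Scheduling from declared reads and writes can be illustrated with Kahn's algorithm. This is a generic sketch, not the engine's render graph: PassNode here holds only names and resource labels, and the pass names in the usage note are hypothetical.

```rust
use std::collections::VecDeque;

pub struct PassNode {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Orders passes so every writer of a resource runs before its readers.
/// Returns None when the declarations contain a cycle.
pub fn schedule(passes: &[PassNode]) -> Option<Vec<&'static str>> {
    let n = passes.len();
    let mut indegree = vec![0usize; n];
    let mut edges: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (w, writer) in passes.iter().enumerate() {
        for (r, reader) in passes.iter().enumerate() {
            if w != r && writer.writes.iter().any(|res| reader.reads.contains(res)) {
                edges[w].push(r);
                indegree[r] += 1;
            }
        }
    }
    // Start with passes that depend on nothing, in declaration order.
    let mut ready: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = ready.pop_front() {
        order.push(passes[i].name);
        for &j in &edges[i] {
            indegree[j] -= 1;
            if indegree[j] == 0 {
                ready.push_back(j);
            }
        }
    }
    (order.len() == n).then_some(order)
}
```

Declaring an occlusion pass that reads HIZ_PYRAMID before the Hi-Z pass that writes it still yields Hi-Z first, which is the property that replaces hand ordering.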

This cluster establishes the ctx-owns-renderer pattern the rest of the
migration uses: the pass context carries &mut Renderer + encoder +
profiler, so node closures borrow nothing at build time — unlike the
older material-pass nodes that capture individual field refs and can't
call &mut self methods. Goldens pass pixel-identical, proving the
scheduler reproduces the existing order.

File policy: SSAO bilateral blur extracted to record_ssao_blur in
hiz.rs; mod.rs ratchets 13602 → 13651−... net to current size after
both changes.
…ss.rs

Render-graph migration cluster 2 prep: both SSR passes become
self-contained Renderer methods (the march keeps its internal
profiler spans; the denoiser needs only the encoder), trimmed to the
parameters their bodies actually use. Goldens pixel-identical.
mod.rs ratchets 13591 -> 13463.
Render-graph migration cluster 3 prep: the 360-line Lumen-style
screen-probe block (place / trace SW-HiZ-HW-SDF / temporal EMA /
octahedral resolve, plus the disabled-clear) becomes
record_ssgi_passes. One borrow reorder: ssr_composite_view now binds
after the SSGI call (compose reads it later either way). Goldens
pixel-identical. mod.rs ratchets 13464 -> 13110.
Render-graph migration cluster 4 prep: cascade fitting, the
ticket-004 cache-hit skip, and the per-cascade depth renders become
record_shadow_pass (199 lines out of end_frame_with_scene). Goldens —
including the shadowed lit_primitives_3d scene — pixel-identical.
mod.rs ratchets 13108 -> 12914.
Render-graph migration cluster 5 prep: the Karis-thresholded
downsample + additive-upsample chain becomes record_bloom_chain; the
SSR/SSGI composite-input view bindings move below the last &mut-self
pass recording (compose reads them later either way). Goldens
pixel-identical. mod.rs ratchets 12913 -> 12787.
Cluster 6 prep: the compose pass (HDR + SSR + SSGI*albedo + bloom +
fog + sun shafts -> composed_rt) becomes record_scene_compose, with
its composite-input view selection internal — the cross-pass local
bindings in end_frame_with_scene are gone. Goldens pixel-identical.
mod.rs ratchets 12791 -> 12686.
Cluster 7 prep: upscale/TAA/DoF/motion-blur/SSS/CAS move into
record_postfx_tail — the pre_*_view cascade locals become method
internals — and a new composite_source_view(&self) helper encodes the
last-enabled-stage-wins chain once for the composite pass (it must
mirror the cascade; the doc says so explicitly).

Verification note: the goldens run TAA-off, so they pin the
hdr/upscale branch of the cascade; the other branches are compile-
verified and logic-mirrored. A TAA-on golden is a worthwhile future
addition. mod.rs ratchets 12681 -> 12356.
Cluster 8 (the big one): sky-view LUT refresh, sky + immediate-mode 3D
batch + scene-graph render into the HDR MRT set, planar reflections,
and the opaque material pass on the inner render graph all move into
record_hdr_scene_pass (266 lines). has_3d rebinds inside the method
(same predicate as the upload step just before the call; vertices_3d
untouched between); encoder reborrows feed the inner FrameCtx. Goldens
pixel-identical. mod.rs ratchets 12359 -> 12100.
@proggeramlug proggeramlug marked this pull request as ready for review June 12, 2026 11:11
@proggeramlug proggeramlug merged commit 96e09ff into main Jun 12, 2026
9 checks passed
@proggeramlug proggeramlug deleted the audit/architecture-fixes branch June 12, 2026 11:11