Context
fbuild's end-to-end wall clock on a real project has never been measured systematically in a controlled, reproducible environment. We have anecdotal slow spots (toolchain download/extract, library resolution, sequential install steps) but no profile data separating on-CPU work from off-CPU waiting (network, disk, subprocess, lock contention).
The NightDriverStrip benchmark fixture is referenced by the ignored test build_nightdriverstrip_demo in crates/fbuild-build/tests/esp32_build.rs (expects tests/NightDriverStrip/); the checkout is not committed, so the Docker harness must clone it.
zackees/soldr has the Docker + script reference infrastructure (docker/cook-shared-cache/Dockerfile, perf/, PERF.md) to model the fast-rebuild container on.
Proposal
Stand up a reproducible Linux Docker profiling harness, measure cold-cache and hot-cache builds of NightDriverStrip, and use the data to drive a burndown of optimization sub-issues until fbuild's non-compiler overhead is as close to fully overlapped/concurrent as possible.
Phase 0 — Docker profiling harness
Dockerfile optimized for fast image rebuilds (layer-cached toolchain/deps, source COPY last, or bind-mount + named volumes per the soldr pattern), based on zackees/soldr's docker + script infrastructure.
fbuild's own cache (~/.fbuild/cache) is not persisted between container runs — cold cache means genuinely cold (fresh downloads).
Clones NightDriverStrip as the benchmark workload (ESP32 demo env; optionally demo_c6 for a RISC-V data point).
Orchestration script runs: (a) cold-cache build, (b) hot-cache rebuild (same container, cache intact), each N≥3 times, and archives all profiles/logs as artifacts.
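The cold/hot run loop described above could be sketched roughly as follows. This is a minimal sketch, not the harness itself: `run_benchmark`, `build`, and `wipe_cache` are hypothetical names, and the two callables stand in for whatever the real script does to invoke fbuild in the container and to discard ~/.fbuild/cache.

```python
import time
from typing import Callable, Dict, List


def run_benchmark(
    build: Callable[[], None],
    wipe_cache: Callable[[], None],
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, List[float]]:
    """Time `runs` cold/hot build pairs; return wall-clock seconds per mode."""
    timings: Dict[str, List[float]] = {"cold": [], "hot": []}
    for _ in range(runs):
        # Hypothetical hook: drop the fbuild cache so the next build is cold.
        wipe_cache()
        # The hot build reuses the cache that the cold build just populated.
        for mode in ("cold", "hot"):
            start = clock()
            build()
            timings[mode].append(clock() - start)
    return timings
```

Passing the build and cache-wipe steps in as callables keeps the timing loop testable without Docker; the real script would presumably wrap a subprocess call and archive the profiles and logs after each run.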
Phase 1 — Instrumentation + profiling
Event logging on: FBUILD_PERF_LOG=1 plus extending perf_log.rs/tracing spans wherever coverage is missing (download start/end per package, extract, install, per-TU compile dispatch, archive, link, daemon RPC round-trips).
On-CPU profiling: perf record -g (or samply/cargo flamegraph) over daemon + CLI → flamegraphs.
Off-CPU (async) profiling: off-CPU flamegraphs (perf sched / eBPF offcputime) and/or tokio-console-style async task instrumentation to expose where the pipeline waits — network, disk, subprocess, serialized stages, lock contention.
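Flamegraph tooling typically consumes "folded" stacks, one line per unique stack in the form `frame;frame;frame count`. As a rough aid for reading those files without rendering an SVG, the sketch below sums the counts by leaf frame. The function name and the sample frames used to exercise it are illustrative assumptions; whether a count means samples or blocked time depends on the tool that produced the file.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_leaf_frames(folded_lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Sum counts by leaf frame from folded-stack lines such as "a;b;c 42"."""
    totals: Counter = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip malformed lines rather than guess
        leaf = stack.rsplit(";", 1)[-1]
        totals[leaf] += int(count)
    return totals.most_common(limit)
```

For an off-CPU profile, a leaf such as a socket read or a child-process wait dominating the totals would point at network or subprocess waits respectively.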
Phase 2 — Analysis → sub-issues
Produce a cold-vs-hot phase breakdown table (wall clock per stage, % of total).
For every stage that is (a) not cached when it should be, (b) serialized when it could overlap with download/install/compile/link, or (c) hot on-CPU in fbuild's own code: file a child sub-issue with the profile evidence attached.
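One possible shape for the cold-vs-hot breakdown table is sketched below. It assumes per-stage durations have already been extracted from the event log into plain dictionaries; the function name, the stage names, and the numbers in any example call are placeholders, not measurements.

```python
from typing import Dict, List


def phase_breakdown(cold: Dict[str, float], hot: Dict[str, float]) -> List[str]:
    """Render per-stage seconds and share of total for cold and hot builds."""

    def share(value: float, total: float) -> float:
        return 100.0 * value / total if total else 0.0

    cold_total, hot_total = sum(cold.values()), sum(hot.values())
    # Keep cold-run stage order, then append stages seen only in the hot run.
    stages = list(cold) + [s for s in hot if s not in cold]
    rows = [f"{'stage':<10}{'cold s':>8}{'cold %':>8}{'hot s':>8}{'hot %':>8}"]
    for stage in stages:
        c, h = cold.get(stage, 0.0), hot.get(stage, 0.0)
        rows.append(
            f"{stage:<10}{c:>8.1f}{share(c, cold_total):>8.1f}"
            f"{h:>8.1f}{share(h, hot_total):>8.1f}"
        )
    rows.append(
        f"{'total':<10}{cold_total:>8.1f}{100.0:>8.1f}{hot_total:>8.1f}{100.0:>8.1f}"
    )
    return rows
```

Note that the percentages are shares of summed stage time, which equals wall clock only while stages run strictly one after another; once stages overlap, the total row should be read as work done, not elapsed time.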
Phase 3 — Optimization burndown
Expected themes (to be confirmed by data, not assumed): overlap download ⇄ extract ⇄ install ⇄ first compiles; start compiling TUs whose deps are ready before the full install finishes; overlap archive/link prep with trailing compiles; cache anything recomputed on hot builds (config parse, library selection, header scan); remove sync-in-async stalls (continuing #817, "audit: sync code that could be async in fbuild-cli + fbuild-python (sub-issue of #813)").
Out of scope: compile settings. No changes to compiler/linker flags, optimization levels, or codegen — stock settings stay stock. Everything around the compiler is fair game.
Each optimization ships as its own PR against its sub-issue, with before/after numbers from the Phase 0 harness in the PR description.
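To put a number on "serialized when it could overlap", one option is to compare the summed duration of all stage intervals with the wall-clock time during which at least one stage was running. The sketch below assumes each stage has been reduced to a (start, end) pair in seconds; the function names are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def busy_time(intervals: Iterable[Interval]) -> float:
    """Length of the union of intervals: time during which any stage ran."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def overlap_ratio(intervals: Iterable[Interval]) -> float:
    """Summed stage time divided by busy time; 1.0 means fully serialized."""
    spans = list(intervals)
    busy = busy_time(spans)
    return sum(end - start for start, end in spans) / busy if busy else 0.0
```

A ratio of 1.0 means no two stages ever ran at the same time; higher values mean more concurrent work, so a before/after pair of ratios gives a compact way to show that an optimization actually increased overlap.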
Phase 4 — Verification + close-out
Re-run the harness after each merged PR; post a final report on this issue with cold and hot wall-clock deltas vs the Phase 1 baseline.
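The baseline comparison could be computed along these lines, using the median of the repeated runs for each mode. This is a sketch under the assumption that timings are kept as lists of seconds keyed by "cold" and "hot"; the function and key names are illustrative.

```python
from statistics import median
from typing import Dict, Sequence


def compare_to_baseline(
    baseline: Dict[str, Sequence[float]],
    current: Dict[str, Sequence[float]],
) -> Dict[str, Dict[str, float]]:
    """Median wall clock per mode, with absolute and percent change vs baseline."""
    report: Dict[str, Dict[str, float]] = {}
    for mode in baseline:
        before, after = median(baseline[mode]), median(current[mode])
        report[mode] = {
            "before_s": before,
            "after_s": after,
            "delta_s": after - before,  # negative means faster
            "delta_pct": 100.0 * (after - before) / before,
        }
    return report
```

The median is used instead of the mean so that a single slow run, for example one hit by a network hiccup during a cold download, does not skew the reported delta.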
Acceptance criteria
Docker harness merged (Dockerfile + run script) that produces cold-cache and hot-cache builds of NightDriverStrip with fbuild cache not persisted between docker runs.
On-CPU and off-CPU profiles + FBUILD_PERF_LOG event timelines captured for both cold and hot runs and attached to this issue.
Baseline numbers posted: cold and hot wall clock with per-phase breakdown (median of ≥3 runs).
Every identified slow/uncached/serialized path has a child sub-issue linked from a task list on this issue.
All child sub-issues resolved via merged PRs, each PR showing before/after harness numbers.
Final cold + hot wall-clock comparison vs baseline posted; all sub-issues closed; this issue closed.
Decisions
Benchmark workload: NightDriverStrip demo env (ESP32/Xtensa) — it's the fixture the existing ignored integration test already targets and the heaviest real-world project we've built.
Fixture provisioning: Docker harness clones NightDriverStrip at a pinned commit rather than committing the tree to this repo — keeps the repo lean, keeps runs reproducible.
Profiler choice: perf + flamegraphs for on-CPU, off-CPU flamegraphs (perf sched/eBPF) for waits. This is standard Linux tooling that works in a container; swapping the exact tool is fine if the harness PR finds something better.
Harness location: ci/docker-profile/ alongside the existing ci/docker-* dirs; scripts in Python via uv run per the language policy (CI scripting is the sanctioned Python use).
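For the pinned-commit fixture decision, the harness script might build its git invocations roughly as below. This is only a sketch: the function name is hypothetical, and the repository URL and commit are parameters because the issue does not state them.

```python
from typing import List


def pinned_clone_commands(repo_url: str, commit: str, dest: str) -> List[List[str]]:
    """Build git argv lists that check out exactly `commit` into `dest`."""
    return [
        # Fetch history without populating the working tree yet.
        ["git", "clone", "--no-checkout", repo_url, dest],
        # Detached checkout of the pinned commit, so branch movement is irrelevant.
        ["git", "-C", dest, "checkout", "--detach", commit],
    ]
```

Each argv list can be handed to `subprocess.run(argv, check=True)`; returning the commands as data keeps the pinning logic testable without network access.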
Existing pieces this effort builds on:
FBUILD_PERF_LOG=1 env-gated phase timing in crates/fbuild-build/src/perf_log.rs (from #91, "perf(build): investigate warm-pass compilation stall - 30s where cache says <1s"): coarse per-phase wall clock, emitted via tracing + stderr.
fbuild-packages already has a parallel download pipeline; it is unknown how well it overlaps download → extract → install → compile in practice.
Related issues
FBUILD_PERF_LOG): this effort extends that instrumentation.
Burndown (Phase 2 task list)
Correctness blockers found by the harness (done):
BoardConfig: Default + rustdoc link from #941, "fix(release): restore macOS + arm64-windows lanes on soldr 0.7.98 + setup-soldr v0.9.64" (merged: #948, "fix(config): derive Default for BoardConfig (unbreaks workspace check)")
Harness:
ci/docker-profile/)
Optimization sub-issues (from the Phase-1 baseline):