meta: profiling-driven build performance burndown (Docker Linux, NightDriverStrip, cold vs hot cache) #942

## Context

fbuild's end-to-end wall clock on a real project has never been measured systematically in a controlled, reproducible environment. We have anecdotal slow spots (toolchain download/extract, library resolution, sequential install steps) but no profile data separating on-CPU work from off-CPU waiting (network, disk, subprocess, lock contention).

Existing pieces this effort builds on:

- `FBUILD_PERF_LOG=1` env-gated phase timing in `crates/fbuild-build/src/perf_log.rs` (from #91) — coarse per-phase wall clock, emitted via `tracing` + stderr.
- `fbuild-packages` already has a parallel download pipeline; it is unknown how well it overlaps download → extract → install → compile in practice.
- Embedded zccache service (#789) covers compiler-invocation caching; this effort targets everything *around* the compiler.
- The NightDriverStrip benchmark fixture is referenced by the ignored test `build_nightdriverstrip_demo` in `crates/fbuild-build/tests/esp32_build.rs` (expects `tests/NightDriverStrip/`); the checkout is not committed, so the Docker harness must clone it.
- zackees/soldr has the Docker + script reference infrastructure (`docker/cook-shared-cache/Dockerfile`, `perf/`, `PERF.md`) to model the fast-rebuild container on.

## Proposal

Stand up a reproducible Linux Docker profiling harness, measure cold-cache and hot-cache builds of NightDriverStrip, and use the data to drive a burndown of optimization sub-issues until fbuild's non-compiler overhead is as close to fully overlapped/concurrent as possible.

### Phase 0 — Docker profiling harness
- Dockerfile optimized for fast *image* rebuilds (layer-cached toolchain/deps, source `COPY` last, or bind-mount + named volumes per the soldr pattern), based on zackees/soldr's docker + script infrastructure.
- fbuild's own cache (`~/.fbuild/cache`) is **not** persisted between container runs — cold cache means genuinely cold (fresh downloads).
- Clones NightDriverStrip as the benchmark workload (ESP32 `demo` env; optionally `demo_c6` for a RISC-V data point).
- Orchestration script runs: (a) cold-cache build, (b) hot-cache rebuild (same container, cache intact), each N≥3 times, and archives all profiles/logs as artifacts.
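
The trial loop of the orchestration script could look roughly like the sketch below. It assumes two hypothetical callables that are not part of fbuild or soldr: `build`, which runs one fbuild build of the workload, and `reset_cache`, which guarantees a cold start (in the real harness this would more likely be a fresh container start than an in-place wipe of `~/.fbuild/cache`).

```python
import statistics
import time
from typing import Callable, Dict, List


def run_trials(
    build: Callable[[], None],
    reset_cache: Callable[[], None],
    n: int = 3,
) -> Dict[str, List[float]]:
    """Time n cold builds and n hot rebuilds; returns seconds per trial."""
    if n < 3:
        raise ValueError("the harness calls for at least 3 runs per mode")

    timings: Dict[str, List[float]] = {"cold": [], "hot": []}

    # Cold: reset before every build so each one starts from scratch.
    for _ in range(n):
        reset_cache()
        start = time.perf_counter()
        build()
        timings["cold"].append(time.perf_counter() - start)

    # Hot: the cache left by the last cold build stays intact for all rebuilds.
    for _ in range(n):
        start = time.perf_counter()
        build()
        timings["hot"].append(time.perf_counter() - start)

    return timings


def summarize(timings: Dict[str, List[float]]) -> Dict[str, float]:
    """Median wall clock per mode (hypothetical helper for the baseline post)."""
    return {mode: statistics.median(runs) for mode, runs in timings.items()}
```

Passing the two operations in as callables keeps the loop independent of how the container is driven, so the same code can be exercised with stubs before the Docker pieces exist.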

### Phase 1 — Instrumentation + profiling
- Event logging on: `FBUILD_PERF_LOG=1` plus extending `perf_log.rs`/tracing spans wherever coverage is missing (download start/end per package, extract, install, per-TU compile dispatch, archive, link, daemon RPC round-trips).
- **On-CPU profiling**: `perf record -g` (or `samply`/`cargo flamegraph`) over daemon + CLI → flamegraphs.
- **Off-CPU (async) profiling**: off-CPU flamegraphs (`perf sched` / eBPF offcputime) and/or `tokio-console`-style async task instrumentation to expose where the pipeline *waits* — network, disk, subprocess, serialized stages, lock contention.

### Phase 2 — Analysis → sub-issues
- Produce a cold-vs-hot phase breakdown table (wall clock per stage, % of total).
- For every stage that is (a) not cached when it should be, (b) serialized when it could overlap with download/install/compile/link, or (c) hot on-CPU in fbuild's own code: file a child sub-issue with the profile evidence attached.
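
As an illustration of the breakdown table, here is a minimal sketch that turns per-run stage timings into median-and-percentage rows. The input shape (one dict of stage name to seconds per run) and the function names are assumptions; the actual `FBUILD_PERF_LOG` output would need to be parsed into this form first.

```python
import statistics
from typing import Dict, List, Tuple


def phase_breakdown(runs: List[Dict[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (stage, median_seconds, percent_of_total) rows, slowest first.

    Each element of `runs` maps stage name -> wall-clock seconds for one run.
    The total is the sum of stage medians, so it exceeds the real wall clock
    whenever stages overlap.
    """
    stages = sorted({stage for run in runs for stage in run})
    medians = {
        stage: statistics.median(run.get(stage, 0.0) for run in runs)
        for stage in stages
    }
    total = sum(medians.values())
    rows = [
        (stage, secs, 100.0 * secs / total if total else 0.0)
        for stage, secs in medians.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def render_markdown(rows: List[Tuple[str, float, float]]) -> str:
    """Format the rows as a Markdown table suitable for an issue comment."""
    lines = ["| Stage | Median (s) | % of total |", "|---|---:|---:|"]
    for stage, secs, pct in rows:
        lines.append(f"| {stage} | {secs:.1f} | {pct:.1f} |")
    return "\n".join(lines)
```

Running it once over the cold runs and once over the hot runs gives the two columns to compare; a stage whose hot median is close to its cold median is a candidate for case (a).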

### Phase 3 — Optimization burndown
- Expected themes (to be confirmed by data, not assumed): overlap download ⇄ extract ⇄ install ⇄ first compiles; start compiling TUs whose deps are ready before the full install finishes; overlap archive/link prep with trailing compiles; cache anything recomputed on hot builds (config parse, library selection, header scan); remove sync-in-async stalls (continuing #817).
- **Out of scope: compile settings.** No changes to compiler/linker flags, optimization levels, or codegen — stock settings stay stock. Everything around the compiler is fair game.
- Each optimization ships as its own PR against its sub-issue, with before/after numbers from the Phase 0 harness in the PR description.
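
To make the overlap theme concrete, the sketch below contrasts a serialized provisioning loop with an overlapped one. It is a toy model in Python, not fbuild's Rust implementation: `asyncio.sleep` stands in for network and disk waits, and the stage costs and package names are placeholders.

```python
import asyncio
import time
from typing import Dict, List


async def stage(name: str, pkg: str, seconds: float, log: List[str]) -> None:
    """Stand-in for one pipeline stage; asyncio.sleep models I/O wait."""
    log.append(f"{name}:start:{pkg}")
    await asyncio.sleep(seconds)
    log.append(f"{name}:end:{pkg}")


async def provision(pkg: str, cost: Dict[str, float], log: List[str]) -> None:
    # Stages for a single package stay ordered: download, extract, install.
    for name in ("download", "extract", "install"):
        await stage(name, pkg, cost[name], log)


async def serialized(pkgs: List[str], cost: Dict[str, float], log: List[str]) -> None:
    # One package at a time: total time is the sum over all packages.
    for pkg in pkgs:
        await provision(pkg, cost, log)


async def overlapped(pkgs: List[str], cost: Dict[str, float], log: List[str]) -> None:
    # Each package advances independently, so one package's extract can run
    # while another is still downloading.
    await asyncio.gather(*(provision(pkg, cost, log) for pkg in pkgs))


def timed(coro) -> float:
    """Run a coroutine to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start
```

In this model the overlapped run finishes in roughly the time of the slowest single package rather than the sum of all packages. Real extract and install steps are partly CPU- and disk-bound, so the achievable overlap has to be read off the profiles rather than assumed.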

### Phase 4 — Verification + close-out
- Re-run the harness after each merged PR; post a final report on this issue with cold and hot wall-clock deltas vs the Phase 1 baseline.
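
A before/after line for the final report could be computed as in this small sketch. The `Delta` class is hypothetical; the example figures are the 108 s and 2.1 s steady-state numbers recorded for #951 in the burndown list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Before/after wall clock for one harness measurement, in seconds."""

    before_s: float
    after_s: float

    @property
    def saved_s(self) -> float:
        return self.before_s - self.after_s

    @property
    def percent_change(self) -> float:
        # Negative means the build got faster.
        return 100.0 * (self.after_s - self.before_s) / self.before_s

    @property
    def speedup(self) -> float:
        return self.before_s / self.after_s

    def line(self, label: str) -> str:
        return (
            f"{label}: {self.before_s:.1f} s -> {self.after_s:.1f} s "
            f"({self.percent_change:+.1f}%, {self.speedup:.1f}x)"
        )


print(Delta(108.0, 2.1).line("no-change rebuild"))
# no-change rebuild: 108.0 s -> 2.1 s (-98.1%, 51.4x)
```

Reporting both the percentage and the ratio helps when small absolute numbers make one of them misleading on its own.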

## Acceptance criteria

- [ ] Docker harness merged (Dockerfile + run script) that produces cold-cache and hot-cache builds of NightDriverStrip, with fbuild's cache not persisted between Docker runs.
- [ ] On-CPU and off-CPU profiles + `FBUILD_PERF_LOG` event timelines captured for both cold and hot runs and attached to this issue.
- [ ] Baseline numbers posted: cold and hot wall clock with per-phase breakdown (median of ≥3 runs).
- [ ] Every identified slow/uncached/serialized path has a child sub-issue linked from a task list on this issue.
- [ ] All child sub-issues resolved via merged PRs, each PR showing before/after harness numbers.
- [ ] Final cold + hot wall-clock comparison vs baseline posted; all sub-issues closed; this issue closed.

## Decisions

- *Benchmark workload:* NightDriverStrip `demo` env (ESP32/Xtensa) — it's the fixture the existing ignored integration test already targets and the heaviest real-world project we've built.
- *Fixture provisioning:* Docker harness clones NightDriverStrip at a pinned commit rather than committing the tree to this repo — keeps the repo lean, keeps runs reproducible.
- *Profiler choice:* `perf` + flamegraphs for on-CPU, off-CPU flamegraphs (perf sched/eBPF) for waits — standard Linux tooling that works in a container; exact tool swap is fine if the harness PR finds something better.
- *Harness location:* `ci/docker-profile/` alongside the existing `ci/docker-*` dirs; scripts in Python via `uv run` per the language policy (CI scripting is the sanctioned Python use).
- *Sub-issue granularity:* one issue per independently-mergeable optimization, all linked from a task list here — matches the meta-issue pattern used by #603.
- *Priority:* P2 — significant DX win, nothing shipping is blocked on it.
- *Compile settings frozen:* stock flags only; performance must come from caching, concurrency, and pipeline overlap.

## Related issues

- #91 — warm-build phase timing (`FBUILD_PERF_LOG`) — this effort extends that instrumentation.
- #817 — sync-code-that-could-be-async audit — Phase 3 continues that thread with profile evidence.
- #789 — embedded zccache service — covers compiler caching; explicitly out of scope here.

## Burndown (Phase 2 task list)

Correctness blockers found by the harness (done):
- [x] #945 — extra_scripts stdout corrupts lite-SCons JSON protocol (merged: #946)
- [x] #947 — escaped-quote defines mangled before gcc (merged: #949)
- [x] CI unbreak: `BoardConfig: Default` + rustdoc link from #941 (merged: #948)

Harness:
- [x] #950 — Docker profiling harness (`ci/docker-profile/`)

Optimization sub-issues (from the Phase-1 baseline):
- [x] #951 — no-change rebuild recompiles sketch + core variant (fixed: 108 s → 2.1 s steady-state, PR #956)
- [ ] #952 — fw-libs recompiled per project (~150 s); core cache never hydrates
- [ ] #953 — overlap download/extract/install in cold resolve (~185–237 s serialized)
- [ ] #954 — manage esptool as a provisioned package (P3)
- [ ] #955 — project-relative board_build.partitions CSV ignored (correctness + blocks #951 fingerprint persistence)
- [ ] #957 — first rebuild after cold recompiles once (signature drift; follow-up to #951)





