feat(alloc): support wait-free allocator by ytakano · Pull Request #696 · tier4/awkernel

ytakano · 2026-06-14T19:13:45Z

This PR introduces wf_alloc as a feature-selectable heap allocator backend for awkernel_lib, while keeping TLSF as the fallback path on architectures where wf_alloc is not enabled/supported. It also updates heap allocator documentation to match current behavior.

Description

Make heap backend selection explicit by compile-time feature:
- heap-wf-alloc selects wf_alloc.
- otherwise, TLSF is used.
Keep allocator behavior under Talloc policy (primary/backup) unchanged:
- user-space path still uses primary-first behavior.
- kernel path uses primary-then-backup behavior.
- deallocation routing remains range-based.
Add wf_alloc plumbing for supported targets (x86_64, aarch64):
- add optional dependency and feature gating in awkernel_lib.
- add compile-time guard that rejects heap-wf-alloc on unsupported architectures.
- implement backend switch via type Allocator = ... alias.
Update architecture heap initialization to pass active CPU count into allocator setup:
- x86_64 and aarch64 now initialize primary/backup heaps via init_*_with_num_cpu(...).
Default feature wiring for kernel is aligned with this backend selection (heap-wf-alloc on x86 by default, TLSF remains for non-CAS2 paths).
Update awkernel_lib/src/heap.rs doc comments:
- remove stale claim that TLSF is default.
- document wf_alloc initialization flow, CPU-token mapping, interrupt-guarded alloc/dealloc, and fallback to existing OOM path when allocator init fails.
Initialize AcpiTable before heap memory initialization for x86_64.

How was this PR tested?

Tested on Qemu and physical machines

Notes for reviewers

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Copilot

Pull request overview

This PR adds a feature-selectable heap allocator backend to awkernel_lib, enabling wf_alloc on supported architectures while preserving the existing TLSF-based path as the fallback.

Changes:

Introduces heap-wf-alloc / heap-tlsf feature wiring to explicitly select the heap backend (kernel defaults updated per-arch).
Plumbs CPU-count-aware heap initialization (init_*_with_num_cpu) into x86_64 and aarch64 kernel bring-up.
Refactors awkernel_lib::heap to select backend via a type Allocator = ... alias, adds wf_alloc initialization logic, and updates heap module documentation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
kernel/src/arch/x86_64/kernel_main.rs	Detects CPU count earlier and passes `num_cpu` into backup/primary heap initialization.
kernel/src/arch/aarch64/kernel_main.rs	Passes the detected CPU count into primary/backup heap initialization.
kernel/Cargo.toml	Adds kernel-level feature flags to select allocator backend and sets per-target defaults.
awkernel_lib/src/heap.rs	Implements backend selection, adds `wf_alloc` backend, adds `*_with_num_cpu` init APIs, and updates docs.
awkernel_lib/Cargo.toml	Adds optional `wf_alloc` dependency (target-gated) and exposes `heap-wf-alloc` / `heap-tlsf` features.

Comments suppressed due to low confidence (1)

awkernel_lib/src/heap.rs:42

The documentation states that only 32/64 CPUs are supported, but the implementation uses NUM_MAX_CPU (currently 512) for the CPU flag bitmap and for wf_alloc's active_threads upper bound. This limitation note looks stale and can mislead callers configuring num_cpu.

//! # Limitation
//!
//! Only 32 or 64 CPUs are supported for 32 or 64 bits CPU architectures.
//!

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

atsushi421

The mdbook chapter at mdbook/src/internal/memory_allocator.md still describes the old TLSF-only design (Allocator/BackUpAllocator structs, init_primary/init_backup only). It should be updated to cover the new HeapBackend trait, backend selection by feature, and the *_with_num_cpu APIs (and explain why an explicit num_cpu is needed for wf_alloc).

The new ~200 lines of layout math in WfAllocBackend::init (alignment, overflow guards, metadata sizing, active_threads validation) have no unit tests. align_up alone is a pure helper that is trivially testable. Consider adding #[cfg(test)] mod tests that at least covers align_up overflow/normal cases and the rejection paths of init (too small heap, active_threads == 0 or > NUM_MAX_CPU, misaligned heap_start).

is_primary_mem (around line 384 in awkernel_lib/src/heap.rs) reads primary_start and primary_size with Relaxed, and the matching stores in init_*_with_num_cpu are also Relaxed. Another CPU can observe "new primary_start, old primary_size = 0" between the two stores; in that window end == start, the range check is always false, and primary-owned deallocations are routed to the backup, which would treat them as foreign and corrupt its free list. Use Release for the size store (last) and Acquire for the load, or document that init must happen-before any allocation.

Address review comments on heap.rs: - Reword the module doc so the wf_alloc init-failure path is accurate: alloc returns null and Talloc panics (primary) or halts via delay::wait_forever() (non-primary); there is no OOM-handler fallback. - Document that init_primary/init_backup read cpu::num_cpu() internally, and the hazard that a 0 count makes WfAllocBackend::init bail out. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When cpu_id >= active_threads(), dealloc previously returned silently, permanently leaking the block. Print cpu_id and active_threads via the allocation-free unsafe_puts/unsafe_print_hex_u64 and halt in delay::wait_forever(). We avoid panic! here because the panic handler allocates, which would re-enter this same out-of-range guard on this CPU. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Address review comment: init is `unsafe fn(&self, ...)` on a Sync type, so the initialized fast-path is not mutual exclusion and concurrent init is UB. Document the # Safety contract that callers must serialize init (kernel boot does so on a single thread before APs start). Also move the unsafe_print_hex_u64 import into the wf_alloc_backend module to avoid an unused-import warning on backends that exclude it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ntly HeapBackend::init now returns Result<(), &'static str>. Each WfAllocBackend init failure path (alignment/overflow guards, metadata sizing, active_threads bounds, from_metadata_region) returns a descriptive reason instead of a silent return, so a half-initialized heap can no longer go unnoticed until the first allocation fails. Talloc::init_{primary,backup}_with_num_cpu now print the reason via unsafe_puts and halt in delay::wait_forever() on failure. We print and halt rather than panic! because the panic handler allocates, which cannot work while the heap is not yet up. TlsfBackend::init returns Ok(()). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The cfg gating silently selects WfAllocBackend when both features are on, ignoring heap-tlsf. Add a compile_error! so this feature-flag mistake (e.g. from an unrelated downstream crate) fails loudly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The offset was read twice (early in kernel_main to call create_acpi, and again in kernel_main2 as "step 7"), and the early read was unlabeled, making the boot step list misleading. Fetch it once in kernel_main (now labeled step 4, before ACPI which needs it) and pass it into kernel_main2, removing the duplicate read. Renumber the step list accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The backup heap (early) and primary heap (late) were sized from two different CPU populations: detect_num_cpus counted MADT entries by the enabled bit, while non_primary_cpus additionally filters by WaitingForSipi and the xAPIC < 255 limit. The early count also degraded to 1 when the MADT was absent, masking the real ACPI fault. - Rename detect_num_cpus to count_usable_cpus and document that the enabled-bit MADT count is a deliberate, allocation-free upper bound of non_primary_cpus + 1 (non_primary only ever applies additional restrictive filters). It must stay allocation-free because it runs before any heap is initialized -- going through acpi.platform_info() allocates and would OOM here. - Make a missing MADT fatal instead of unwrap_or(1). - Add a boot-time check in kernel_main2 that non_primary_cpus + 1 does not exceed the backup heap's count, so any divergence fails loudly at boot rather than as an opaque allocation failure on a high cpu_id later. Verified: boots to the shell under QEMU with -smp 16 (15 APs + primary = 16, matching the backup heap count); no inconsistency or OOM. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The review noted the wf_alloc layout math had no unit tests, and that align_up is a pure, trivially testable helper. It lived inside the wf_alloc_backend module, which (like the whole heap module) is gated to no_std builds with the heap-wf-alloc feature, so the host `--features std` test build never compiled it. Extract align_up into a new always-declared `heap_util` module (its body gated to the wf backend or cfg(test)) so the std test harness compiles and runs it, mirroring how local_heap's tests run on the host. Add tests covering already-aligned, round-up, alignment-of-one, large power-of-two, and overflow (None) cases. The init rejection paths are not unit-tested: they depend on wf_alloc types and would require enabling heap-wf-alloc (and its global allocator) in the host test build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ytakano · 2026-06-18T02:53:41Z

The new ~200 lines of layout math in WfAllocBackend::init (...) have no unit tests. align_up alone is a pure helper that is trivially testable. Consider adding #[cfg(test)] mod tests that at least covers align_up overflow/normal cases and the rejection paths of init ...

Added in da52eab. While doing this I hit a structural snag worth noting: the heap module is #[cfg(not(feature = "std"))] (it installs the #[global_allocator]), and the host test build runs with --features std, so a #[cfg(test)] mod tests placed inside heap / wf_alloc_backend is never compiled by make test.

So I extracted align_up into a new always-declared heap_util module (its body gated to the wf backend or cfg(test)), which the std test harness does compile and run — mirroring how local_heap's tests run on the host. The tests cover align_up's already-aligned, round-up, alignment-of-one, large power-of-two, and overflow (None) cases (make test: 4 new tests, all passing).

I did not unit-test the init rejection paths: they depend on wf_alloc types (WfAlloc::metadata_region_align/size, from_metadata_region), so testing them on the host would require enabling heap-wf-alloc for the whole test binary, which also swaps in an uninitialized WfAllocBackend as the global allocator and breaks every other test. If you'd like that coverage, I think the right move is a dedicated wf-enabled test target rather than changing the default one — happy to do that as a follow-up if you want.

…ange Document on init_{primary,backup}_with_num_cpu and is_primary_mem that the Relaxed stores/loads of primary_start/primary_size are sound because heap init must happen-before the allocator is shared with any other CPU (the same single-init contract as HeapBackend::init). No other CPU can observe the window between the two stores, so a primary-owned deallocation is never misrouted to the backup allocator. The kernel establishes this happens-before edge at boot: - x86_64: the BSP's BSP_READY release store, paired with each AP's acquire fence before its first allocation. - aarch64: the BSP's PRIMARY_INITIALIZED SeqCst store, which each AP loads (SeqCst) before proceeding. Two separate atomics cannot be updated atomically by stronger orderings anyway, so it is the init-before-sharing contract, not per-atomic ordering, that provides safety; hence documenting rather than switching to Release/Acquire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ytakano · 2026-06-18T03:00:28Z

is_primary_mem (around line 384 in awkernel_lib/src/heap.rs) reads primary_start and primary_size with Relaxed (...) Another CPU can observe "new primary_start, old primary_size = 0" between the two stores (...) Use Release for the size store (last) and Acquire for the load, or document that init must happen-before any allocation.

Took the "document that init must happen-before any allocation" option, in 471c5cd — and it turns out that's the technically accurate fix here rather than just the convenient one.

The window you describe ("new primary_start, old primary_size == 0") only exists while the BSP runs heap init single-threaded; the APs never read primary_start / primary_size until after init has fully completed, because the boot sequence puts a happens-before edge between init and any AP allocation:

x86_64: the BSP does all heap init while the APs are parked, then publishes BSP_READY with a release store; each AP observes it and runs an fence(Acquire) before its first allocation (use_primary_then_backup etc.).
aarch64: the BSP publishes PRIMARY_INITIALIZED with a SeqCst store after heap init, and each AP spins on a SeqCst load of it before proceeding.

So no other CPU ever observes the intermediate state, and a primary-owned deallocation is never misrouted to the backup. I also realized that bumping the two stores/loads to Release/Acquire wouldn't actually close the hole on its own: primary_start and primary_size are two separate atomics and can't be read as one atomic pair regardless of ordering — it's the init-before-sharing contract, not the per-atomic ordering, that provides safety. I documented this on init_primary_with_num_cpu / init_backup_with_num_cpu and is_primary_mem, tying it to the single-init # Safety contract on HeapBackend::init.

Happy to additionally bump these to Release/Acquire as defense-in-depth if you'd prefer the local atomics to be self-describing, but it isn't required for correctness given the boot handshake.

The chapter described the old TLSF-only design (Allocator/BackUpAllocator structs, init_primary/init_backup only). Update it to cover: - compile-time backend selection (heap-wf-alloc -> wf_alloc on x86_64/aarch64, else TLSF; mutually exclusive), - the HeapBackend trait and the `type Allocator = ...` alias, - the *_with_num_cpu init APIs and why wf_alloc needs an explicit CPU count (per-CPU tokens / active_threads sizing) while TLSF ignores it, - the x86_64 / aarch64 boot snippets now passing num_cpu. Also fix the "For x86_64" copy-paste in the AArch64 section and the Tallock typo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ytakano · 2026-06-18T03:05:40Z

The mdbook chapter at mdbook/src/internal/memory_allocator.md still describes the old TLSF-only design (Allocator/BackUpAllocator structs, init_primary/init_backup only). It should be updated to cover the new HeapBackend trait, backend selection by feature, and the *_with_num_cpu APIs (and explain why an explicit num_cpu is needed for wf_alloc).

Updated in 94bc5e2. The chapter now covers:

compile-time backend selection (heap-wf-alloc → wf_alloc on x86_64/aarch64, otherwise TLSF; mutually exclusive via compile_error!);
the HeapBackend trait and the type Allocator = ... alias, plus the real Talloc { primary: Allocator, backup: Allocator, ... } (replacing the old Allocator / BackUpAllocator structs);
both the bare init_primary / init_backup and the init_*_with_num_cpu APIs, with a paragraph on why wf_alloc needs the explicit CPU count (it maps each CPU id to a per-CPU token and sizes its metadata from active_threads, bailing out on 0), why the bare variants read cpu::num_cpu() and so the boot code uses the explicit-count variants on x86_64, and that TLSF ignores the count;
the x86_64 / aarch64 boot snippets now passing num_cpu.

Also fixed the "For x86_64" copy-paste in the AArch64 section and the Tallock typo. mdbook build renders cleanly.

This was the last open item from the review — thanks for the thorough pass.

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ytakano added 4 commits June 14, 2026 10:39

Add optional wf_alloc heap backend

f10cee1

use wf_alloc for heap memory allocation

2f39140

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Merge branch 'main' into wf_alloc

2ec6121

docs: update heap backend selection docs

2f6748c

ytakano requested a review from Copilot June 15, 2026 00:15

Copilot started reviewing on behalf of ytakano June 15, 2026 00:15 View session

Copilot AI reviewed Jun 15, 2026

View reviewed changes

Comment thread awkernel_lib/src/heap.rs Outdated

Comment thread kernel/src/arch/x86_64/kernel_main.rs Outdated

initialize AcpiTable before initializing heap memory

bde218d

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

ytakano marked this pull request as ready for review June 15, 2026 08:19

ytakano requested a review from atsushi421 June 15, 2026 08:34

atsushi421 reviewed Jun 16, 2026

View reviewed changes

ytakano and others added 9 commits June 18, 2026 10:47

fmt

c2afb50

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

update docs

67930c8

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

atsushi421 requested a review from Copilot June 18, 2026 09:37

Copilot started reviewing on behalf of atsushi421 June 18, 2026 09:38 View session

Copilot AI reviewed Jun 18, 2026

View reviewed changes

Comment thread awkernel_lib/src/heap.rs Outdated

Comment thread mdbook/src/internal/memory_allocator.md Outdated

Potential fix for pull request finding

e65c46c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

docs(alloc): correct wf_alloc init-failure behavior in module docs

6326042

atsushi421 self-requested a review June 19, 2026 11:51

atsushi421 approved these changes Jun 19, 2026

View reviewed changes

ytakano merged commit 5d52bf4 into tier4:main Jun 19, 2026
1 check passed

Conversation

ytakano commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related links

How was this PR tested?

Notes for reviewers

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

atsushi421 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ytakano commented Jun 18, 2026

Uh oh!

ytakano commented Jun 18, 2026

Uh oh!

ytakano commented Jun 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ytakano commented Jun 14, 2026 •

edited

Loading