
feat(alloc): support wait-free allocator#696

Merged: ytakano merged 19 commits into tier4:main from ytakano:wf_alloc on Jun 19, 2026

ytakano (Collaborator) commented Jun 14, 2026

This PR introduces wf_alloc as a feature-selectable heap allocator backend for awkernel_lib, while keeping TLSF as the fallback path on architectures where wf_alloc is not enabled/supported. It also updates heap allocator documentation to match current behavior.

Description

  • Make heap backend selection explicit by compile-time feature:
    • heap-wf-alloc selects wf_alloc.
    • otherwise, TLSF is used.
  • Keep allocator behavior under Talloc policy (primary/backup) unchanged:
    • user-space path still uses primary-first behavior.
    • kernel path uses primary-then-backup behavior.
    • deallocation routing remains range-based.
  • Add wf_alloc plumbing for supported targets (x86_64, aarch64):
    • add optional dependency and feature gating in awkernel_lib.
    • add compile-time guard that rejects heap-wf-alloc on unsupported architectures.
    • implement backend switch via type Allocator = ... alias.
  • Update architecture heap initialization to pass active CPU count into allocator setup:
    • x86_64 and aarch64 now initialize primary/backup heaps via init_*_with_num_cpu(...).
  • Default feature wiring for the kernel is aligned with this backend selection (heap-wf-alloc is enabled by default on x86; TLSF remains for non-CAS2 paths).
  • Update awkernel_lib/src/heap.rs doc comments:
    • remove stale claim that TLSF is default.
    • document wf_alloc initialization flow, CPU-token mapping, interrupt-guarded alloc/dealloc, and fallback to existing OOM path when allocator init fails.
  • Initialize AcpiTable before heap memory initialization for x86_64.
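A minimal sketch of the compile-time backend switch described above. The names (HeapBackend, TlsfBackend, WfAllocBackend, Allocator, Talloc, init_*_with_num_cpu) follow the PR text, and it also includes the mutual-exclusion compile_error! added later in the PR. The bodies are simplified stand-ins with no real TLSF or wf_alloc heap behind them, and init returns its error instead of printing it and halting as the PR does:

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch; names mirror the PR, bodies are simplified stand-ins.

// Enabling both backends is a configuration mistake, so fail the build loudly.
#[cfg(all(feature = "heap-wf-alloc", feature = "heap-tlsf"))]
compile_error!("heap-wf-alloc and heap-tlsf are mutually exclusive");

// Reject wf_alloc on targets this PR does not support.
#[cfg(all(
    feature = "heap-wf-alloc",
    not(any(target_arch = "x86_64", target_arch = "aarch64"))
))]
compile_error!("heap-wf-alloc is only supported on x86_64 and aarch64");

pub trait HeapBackend {
    /// # Safety
    /// Callers must serialize `init`: it runs once, on one CPU, before the
    /// allocator is shared with any other CPU.
    unsafe fn init(&self, start: usize, size: usize, num_cpu: usize) -> Result<(), &'static str>;
}

pub struct TlsfBackend;

pub struct WfAllocBackend {
    active_threads: AtomicUsize,
}

impl TlsfBackend {
    pub const fn new() -> Self {
        TlsfBackend
    }
}

impl WfAllocBackend {
    pub const fn new() -> Self {
        WfAllocBackend { active_threads: AtomicUsize::new(0) }
    }
}

impl HeapBackend for TlsfBackend {
    unsafe fn init(&self, _start: usize, _size: usize, _num_cpu: usize) -> Result<(), &'static str> {
        // TLSF keeps no per-CPU state, so the CPU count is ignored.
        Ok(())
    }
}

impl HeapBackend for WfAllocBackend {
    unsafe fn init(&self, _start: usize, _size: usize, num_cpu: usize) -> Result<(), &'static str> {
        // wf_alloc sizes its per-CPU state from the CPU count, so 0 is rejected.
        if num_cpu == 0 {
            return Err("active_threads must be non-zero");
        }
        self.active_threads.store(num_cpu, Ordering::Relaxed);
        Ok(())
    }
}

// Backend switch: exactly one alias is compiled in.
#[cfg(feature = "heap-wf-alloc")]
pub type Allocator = WfAllocBackend;
#[cfg(not(feature = "heap-wf-alloc"))]
pub type Allocator = TlsfBackend;

pub struct Talloc {
    primary: Allocator,
    backup: Allocator,
}

impl Talloc {
    pub const fn new() -> Self {
        Talloc { primary: Allocator::new(), backup: Allocator::new() }
    }

    /// # Safety
    /// Same single-init contract as `HeapBackend::init`.
    pub unsafe fn init_primary_with_num_cpu(&self, start: usize, size: usize, num_cpu: usize) -> Result<(), &'static str> {
        unsafe { self.primary.init(start, size, num_cpu) }
    }

    /// # Safety
    /// Same single-init contract as `HeapBackend::init`.
    pub unsafe fn init_backup_with_num_cpu(&self, start: usize, size: usize, num_cpu: usize) -> Result<(), &'static str> {
        unsafe { self.backup.init(start, size, num_cpu) }
    }
}
```

Because Allocator is a type alias rather than a trait object, Talloc's calls are resolved statically and the unused backend is never referenced by the kernel.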

Related links

Benchmarking:

How was this PR tested?

Tested on QEMU and physical machines.

Notes for reviewers

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a feature-selectable heap allocator backend to awkernel_lib, enabling wf_alloc on supported architectures while preserving the existing TLSF-based path as the fallback.

Changes:

  • Introduces heap-wf-alloc / heap-tlsf feature wiring to explicitly select the heap backend (kernel defaults updated per-arch).
  • Plumbs CPU-count-aware heap initialization (init_*_with_num_cpu) into x86_64 and aarch64 kernel bring-up.
  • Refactors awkernel_lib::heap to select backend via a type Allocator = ... alias, adds wf_alloc initialization logic, and updates heap module documentation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:
  • kernel/src/arch/x86_64/kernel_main.rs: Detects CPU count earlier and passes num_cpu into backup/primary heap initialization.
  • kernel/src/arch/aarch64/kernel_main.rs: Passes the detected CPU count into primary/backup heap initialization.
  • kernel/Cargo.toml: Adds kernel-level feature flags to select allocator backend and sets per-target defaults.
  • awkernel_lib/src/heap.rs: Implements backend selection, adds wf_alloc backend, adds *_with_num_cpu init APIs, and updates docs.
  • awkernel_lib/Cargo.toml: Adds optional wf_alloc dependency (target-gated) and exposes heap-wf-alloc / heap-tlsf features.
Comments suppressed due to low confidence (1)

awkernel_lib/src/heap.rs:42

  • The documentation states that only 32/64 CPUs are supported, but the implementation uses NUM_MAX_CPU (currently 512) for the CPU flag bitmap and for wf_alloc's active_threads upper bound. This limitation note looks stale and can mislead callers configuring num_cpu.
//! # Limitation
//!
//! Only 32 or 64 CPUs are supported for 32 or 64 bits CPU architectures.
//!
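To illustrate why the CPU limit no longer follows from the word size, here is a sketch of a CPU flag bitmap spread over several 64-bit words. NUM_MAX_CPU = 512 is taken from the comment above; CpuFlags and its methods are hypothetical names, not the heap.rs implementation:

```rust
use core::sync::atomic::{AtomicU64, Ordering};

pub const NUM_MAX_CPU: usize = 512;
const BITS: usize = u64::BITS as usize;
const WORDS: usize = (NUM_MAX_CPU + BITS - 1) / BITS;

/// Hypothetical per-CPU flag bitmap covering NUM_MAX_CPU CPUs.
pub struct CpuFlags {
    words: [AtomicU64; WORDS],
}

impl CpuFlags {
    pub const fn new() -> Self {
        // AtomicU64 is not Copy, so repeat it through a const item.
        const ZERO: AtomicU64 = AtomicU64::new(0);
        CpuFlags { words: [ZERO; WORDS] }
    }

    /// Set the flag for `cpu_id`; returns false if the id is out of range.
    pub fn set(&self, cpu_id: usize) -> bool {
        if cpu_id >= NUM_MAX_CPU {
            return false;
        }
        self.words[cpu_id / BITS].fetch_or(1u64 << (cpu_id % BITS), Ordering::Relaxed);
        true
    }

    /// Clear the flag for `cpu_id` (out-of-range ids are ignored).
    pub fn clear(&self, cpu_id: usize) {
        if cpu_id < NUM_MAX_CPU {
            self.words[cpu_id / BITS].fetch_and(!(1u64 << (cpu_id % BITS)), Ordering::Relaxed);
        }
    }

    pub fn is_set(&self, cpu_id: usize) -> bool {
        cpu_id < NUM_MAX_CPU
            && (self.words[cpu_id / BITS].load(Ordering::Relaxed) & (1u64 << (cpu_id % BITS))) != 0
    }
}
```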


Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
@ytakano ytakano marked this pull request as ready for review June 15, 2026 08:19
@ytakano ytakano requested a review from atsushi421 June 15, 2026 08:34

atsushi421 (Contributor) left a comment

The mdbook chapter at mdbook/src/internal/memory_allocator.md still describes the old TLSF-only design (Allocator/BackUpAllocator structs, init_primary/init_backup only). It should be updated to cover the new HeapBackend trait, backend selection by feature, and the *_with_num_cpu APIs (and explain why an explicit num_cpu is needed for wf_alloc).


The new ~200 lines of layout math in WfAllocBackend::init (alignment, overflow guards, metadata sizing, active_threads validation) have no unit tests. align_up alone is a pure helper that is trivially testable. Consider adding a #[cfg(test)] mod tests that covers at least align_up's overflow and normal cases and the rejection paths of init (heap too small, active_threads == 0 or > NUM_MAX_CPU, misaligned heap_start).


is_primary_mem (around line 384 in awkernel_lib/src/heap.rs) reads primary_start and primary_size with Relaxed, and the matching stores in init_*_with_num_cpu are also Relaxed. Another CPU can observe "new primary_start, old primary_size = 0" between the two stores; in that window end == start, the range check is always false, and primary-owned deallocations are routed to the backup, which would treat them as foreign and corrupt its free list. Use Release for the size store (last) and Acquire for the load, or document that init must happen-before any allocation.

ytakano and others added 9 commits June 18, 2026 10:47
Address review comments on heap.rs:
- Reword the module doc so the wf_alloc init-failure path is accurate:
  alloc returns null and Talloc panics (primary) or halts via
  delay::wait_forever() (non-primary); there is no OOM-handler fallback.
- Document that init_primary/init_backup read cpu::num_cpu() internally,
  and the hazard that a 0 count makes WfAllocBackend::init bail out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
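The hazard documented in the second bullet can be sketched as follows, assuming a hypothetical global NUM_CPU that is still 0 before CPU detection runs. num_cpu, set_num_cpu, init_primary, and init_primary_with_num_cpu here are simplified stand-ins that return an error instead of halting:

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global CPU count; 0 until CPU detection has run.
static NUM_CPU: AtomicUsize = AtomicUsize::new(0);

pub fn num_cpu() -> usize {
    NUM_CPU.load(Ordering::Relaxed)
}

pub fn set_num_cpu(n: usize) {
    NUM_CPU.store(n, Ordering::Relaxed);
}

/// Explicit-count variant: the caller supplies the CPU count it detected.
pub fn init_primary_with_num_cpu(num_cpu: usize) -> Result<usize, &'static str> {
    if num_cpu == 0 {
        // Mirrors WfAllocBackend::init bailing out on a zero count.
        Err("wf_alloc: active_threads must be non-zero")
    } else {
        Ok(num_cpu)
    }
}

/// Bare variant: convenient, but only correct once the global count is set.
pub fn init_primary() -> Result<usize, &'static str> {
    init_primary_with_num_cpu(num_cpu())
}
```

This is why boot code that runs before the global count is populated should call the explicit-count variant.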
When cpu_id >= active_threads(), dealloc previously returned silently,
permanently leaking the block. Print cpu_id and active_threads via the
allocation-free unsafe_puts/unsafe_print_hex_u64 and halt in
delay::wait_forever(). We avoid panic! here because the panic handler
allocates, which would re-enter this same out-of-range guard on this CPU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>
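The print-and-halt path relies on formatting without touching the heap. A minimal sketch of that idea, assuming a hypothetical hex_u64 helper that fills a caller-provided stack buffer (the real unsafe_print_hex_u64 may format its output differently):

```rust
/// Format `value` as "0x" followed by 16 lowercase hex digits into a stack
/// buffer, without allocating. Hypothetical helper for illustration.
pub fn hex_u64(value: u64, buf: &mut [u8; 18]) -> &str {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    buf[0] = b'0';
    buf[1] = b'x';
    for i in 0..16 {
        // Emit nibbles from most significant to least significant.
        let shift = (15 - i) * 4;
        buf[2 + i] = DIGITS[((value >> shift) & 0xf) as usize];
    }
    // Every byte written above is ASCII, so this conversion cannot fail.
    core::str::from_utf8(&buf[..]).unwrap()
}
```

A dealloc guard can then write cpu_id and active_threads through such a helper into a fixed buffer and halt, with no path that re-enters the allocator.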
Address review comment: init is `unsafe fn(&self, ...)` on a Sync type, so
the initialized fast-path is not mutual exclusion and concurrent init is UB.
Document the # Safety contract that callers must serialize init (kernel boot
does so on a single thread before APs start).

Also move the unsafe_print_hex_u64 import into the wf_alloc_backend module to
avoid an unused-import warning on backends that exclude it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntly

HeapBackend::init now returns Result<(), &'static str>. Each WfAllocBackend
init failure path (alignment/overflow guards, metadata sizing, active_threads
bounds, from_metadata_region) returns a descriptive reason instead of a silent
return, so a half-initialized heap can no longer go unnoticed until the first
allocation fails.

Talloc::init_{primary,backup}_with_num_cpu now print the reason via unsafe_puts
and halt in delay::wait_forever() on failure. We print and halt rather than
panic! because the panic handler allocates, which cannot work while the heap is
not yet up. TlsfBackend::init returns Ok(()).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
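A sketch of the kind of layout checks listed above, where every failure returns a descriptive &'static str instead of a silent return. HEAP_ALIGN, PER_CPU_METADATA, the metadata-size formula, and check_layout itself are assumptions for illustration; the real code derives alignment and size from wf_alloc (WfAlloc::metadata_region_align/size):

```rust
pub const NUM_MAX_CPU: usize = 512;
const HEAP_ALIGN: usize = 4096; // assumed alignment requirement (power of two)
const PER_CPU_METADATA: usize = 256; // assumed metadata bytes per CPU

/// Validate the heap layout and return the aligned start of the data area
/// that follows the per-CPU metadata. Hypothetical helper.
pub fn check_layout(heap_start: usize, heap_size: usize, active_threads: usize) -> Result<usize, &'static str> {
    if active_threads == 0 {
        return Err("active_threads must be non-zero");
    }
    if active_threads > NUM_MAX_CPU {
        return Err("active_threads exceeds NUM_MAX_CPU");
    }
    if heap_start % HEAP_ALIGN != 0 {
        return Err("heap_start is misaligned");
    }
    let heap_end = heap_start.checked_add(heap_size).ok_or("heap range overflows usize")?;
    let meta_size = PER_CPU_METADATA
        .checked_mul(active_threads)
        .ok_or("metadata size overflows usize")?;
    let meta_end = heap_start.checked_add(meta_size).ok_or("metadata end overflows usize")?;
    // Round the data area up to the next HEAP_ALIGN boundary.
    let data_start = meta_end
        .checked_add(HEAP_ALIGN - 1)
        .ok_or("data start overflows usize")?
        & !(HEAP_ALIGN - 1);
    if data_start >= heap_end {
        return Err("heap is too small for its metadata");
    }
    Ok(data_start)
}
```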
The cfg gating silently selects WfAllocBackend when both features are on,
ignoring heap-tlsf. Add a compile_error! so this feature-flag mistake (e.g.
from an unrelated downstream crate) fails loudly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The offset was read twice (early in kernel_main to call create_acpi, and
again in kernel_main2 as "step 7"), and the early read was unlabeled, making
the boot step list misleading. Fetch it once in kernel_main (now labeled
step 4, before ACPI which needs it) and pass it into kernel_main2, removing
the duplicate read. Renumber the step list accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The backup heap (early) and primary heap (late) were sized from two different
CPU populations: detect_num_cpus counted MADT entries by the enabled bit, while
non_primary_cpus additionally filters by WaitingForSipi and the xAPIC < 255
limit. The early count also degraded to 1 when the MADT was absent, masking the
real ACPI fault.

- Rename detect_num_cpus to count_usable_cpus and document that the enabled-bit
  MADT count is a deliberate, allocation-free upper bound of non_primary_cpus +
  1 (non_primary only ever applies additional restrictive filters). It must stay
  allocation-free because it runs before any heap is initialized -- going through
  acpi.platform_info() allocates and would OOM here.
- Make a missing MADT fatal instead of unwrap_or(1).
- Add a boot-time check in kernel_main2 that non_primary_cpus + 1 does not exceed
  the backup heap's count, so any divergence fails loudly at boot rather than as
  an opaque allocation failure on a high cpu_id later.

Verified: boots to the shell under QEMU with -smp 16 (15 APs + primary = 16,
matching the backup heap count); no inconsistency or OOM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
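The upper-bound invariant can be sketched with a simplified MADT model. LocalApicEntry, its fields, non_primary_cpus, and check_backup_sizing are hypothetical stand-ins for the ACPI data the kernel actually walks; only the filtering rules (enabled bit, WaitingForSipi, xAPIC id < 255) come from the commit message:

```rust
/// Hypothetical, simplified MADT local-APIC entry.
pub struct LocalApicEntry {
    pub apic_id: u32,
    pub enabled: bool,
    pub is_bsp: bool,
    pub waiting_for_sipi: bool,
}

/// Allocation-free upper bound: every enabled entry, including the BSP.
/// A missing MADT is an error rather than a silent fallback to 1.
pub fn count_usable_cpus(madt: Option<&[LocalApicEntry]>) -> Result<usize, &'static str> {
    let entries = madt.ok_or("MADT not found")?;
    Ok(entries.iter().filter(|e| e.enabled).count())
}

/// APs that will actually be started: strictly more filters than above.
pub fn non_primary_cpus(entries: &[LocalApicEntry]) -> usize {
    entries
        .iter()
        .filter(|e| e.enabled && !e.is_bsp && e.waiting_for_sipi && e.apic_id < 255)
        .count()
}

/// Boot-time check: started APs plus the BSP must fit the backup heap's count.
pub fn check_backup_sizing(entries: &[LocalApicEntry], backup_num_cpu: usize) -> Result<(), &'static str> {
    if non_primary_cpus(entries) + 1 > backup_num_cpu {
        Err("non_primary_cpus + 1 exceeds the backup heap CPU count")
    } else {
        Ok(())
    }
}
```

Since non_primary_cpus only adds filters on top of the enabled bit, the check can fail only if the two code paths drift apart.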
The review noted the wf_alloc layout math had no unit tests, and that align_up
is a pure, trivially testable helper. It lived inside the wf_alloc_backend
module, which (like the whole heap module) is gated to no_std builds with the
heap-wf-alloc feature, so the host `--features std` test build never compiled
it.

Extract align_up into a new always-declared `heap_util` module (its body gated
to the wf backend or cfg(test)) so the std test harness compiles and runs it,
mirroring how local_heap's tests run on the host. Add tests covering
already-aligned, round-up, alignment-of-one, large power-of-two, and overflow
(None) cases.

The init rejection paths are not unit-tested: they depend on wf_alloc types and
would require enabling heap-wf-alloc (and its global allocator) in the host test
build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ytakano (Collaborator, Author) commented Jun 18, 2026

Quoting the review: The new ~200 lines of layout math in WfAllocBackend::init (...) have no unit tests. align_up alone is a pure helper that is trivially testable. Consider adding #[cfg(test)] mod tests that at least covers align_up overflow/normal cases and the rejection paths of init ...

Added in da52eab. While doing this I hit a structural snag worth noting: the heap module is #[cfg(not(feature = "std"))] (it installs the #[global_allocator]), and the host test build runs with --features std, so a #[cfg(test)] mod tests placed inside heap / wf_alloc_backend is never compiled by make test.

So I extracted align_up into a new always-declared heap_util module (its body gated to the wf backend or cfg(test)), which the std test harness does compile and run — mirroring how local_heap's tests run on the host. The tests cover align_up's already-aligned, round-up, alignment-of-one, large power-of-two, and overflow (None) cases (make test: 4 new tests, all passing).

I did not unit-test the init rejection paths: they depend on wf_alloc types (WfAlloc::metadata_region_align/size, from_metadata_region), so testing them on the host would require enabling heap-wf-alloc for the whole test binary, which also swaps in an uninitialized WfAllocBackend as the global allocator and breaks every other test. If you'd like that coverage, I think the right move is a dedicated wf-enabled test target rather than changing the default one — happy to do that as a follow-up if you want.
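A sketch of a pure align_up helper with tests for the five cases listed above. The signature fn align_up(value: usize, align: usize) -> Option<usize> with a power-of-two align is an assumption consistent with the "overflow (None)" case; in heap_util the body would additionally be gated to the wf backend or cfg(test), which is omitted here so the function is always compiled:

```rust
/// Round `value` up to the next multiple of `align` (a power of two),
/// returning None if the result would overflow usize.
pub fn align_up(value: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    let mask = align - 1;
    value.checked_add(mask).map(|v| v & !mask)
}

#[cfg(test)]
mod tests {
    use super::align_up;

    #[test]
    fn already_aligned() {
        assert_eq!(align_up(4096, 4096), Some(4096));
    }

    #[test]
    fn rounds_up() {
        assert_eq!(align_up(4097, 4096), Some(8192));
    }

    #[test]
    fn alignment_of_one() {
        assert_eq!(align_up(12345, 1), Some(12345));
    }

    #[test]
    fn large_power_of_two() {
        let a = 1usize << (usize::BITS - 1);
        assert_eq!(align_up(1, a), Some(a));
    }

    #[test]
    fn overflow_is_none() {
        assert_eq!(align_up(usize::MAX, 4096), None);
    }
}
```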

…ange

Document on init_{primary,backup}_with_num_cpu and is_primary_mem that the
Relaxed stores/loads of primary_start/primary_size are sound because heap init
must happen-before the allocator is shared with any other CPU (the same
single-init contract as HeapBackend::init). No other CPU can observe the window
between the two stores, so a primary-owned deallocation is never misrouted to
the backup allocator.

The kernel establishes this happens-before edge at boot:
- x86_64: the BSP's BSP_READY release store, paired with each AP's acquire fence
  before its first allocation.
- aarch64: the BSP's PRIMARY_INITIALIZED SeqCst store, which each AP loads
  (SeqCst) before proceeding.

Two separate atomics cannot be updated atomically by stronger orderings anyway,
so it is the init-before-sharing contract, not per-atomic ordering, that
provides safety; hence documenting rather than switching to Release/Acquire.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ytakano (Collaborator, Author) commented Jun 18, 2026

Quoting the review: is_primary_mem (around line 384 in awkernel_lib/src/heap.rs) reads primary_start and primary_size with Relaxed (...) Another CPU can observe "new primary_start, old primary_size = 0" between the two stores (...) Use Release for the size store (last) and Acquire for the load, or document that init must happen-before any allocation.

Took the "document that init must happen-before any allocation" option, in 471c5cd — and it turns out that's the technically accurate fix here rather than just the convenient one.

The window you describe ("new primary_start, old primary_size == 0") only exists while the BSP runs heap init single-threaded; the APs never read primary_start / primary_size until after init has fully completed, because the boot sequence puts a happens-before edge between init and any AP allocation:

  • x86_64: the BSP does all heap init while the APs are parked, then publishes BSP_READY with a release store; each AP observes it and runs a fence(Acquire) before its first allocation (use_primary_then_backup etc.).
  • aarch64: the BSP publishes PRIMARY_INITIALIZED with a SeqCst store after heap init, and each AP spins on a SeqCst load of it before proceeding.

So no other CPU ever observes the intermediate state, and a primary-owned deallocation is never misrouted to the backup. I also realized that bumping the two stores/loads to Release/Acquire wouldn't actually close the hole on its own: primary_start and primary_size are two separate atomics and can't be read as one atomic pair regardless of ordering — it's the init-before-sharing contract, not the per-atomic ordering, that provides safety. I documented this on init_primary_with_num_cpu / init_backup_with_num_cpu and is_primary_mem, tying it to the single-init # Safety contract on HeapBackend::init.

Happy to additionally bump these to Release/Acquire as defense-in-depth if you'd prefer the local atomics to be self-describing, but it isn't required for correctness given the boot handshake.
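A host-side sketch of the x86_64 handshake described above, using a std thread to stand in for an AP. The statics and function names are hypothetical stand-ins for the Talloc fields and BSP_READY; the point is that a Release store paired with an Acquire fence after a Relaxed load orders both Relaxed range stores before the AP's reads, so the intermediate state is never observed. On aarch64 the PR uses a SeqCst store and SeqCst loads of PRIMARY_INITIALIZED instead.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};

// Hypothetical stand-ins for Talloc's primary range and the boot flag.
static PRIMARY_START: AtomicUsize = AtomicUsize::new(0);
static PRIMARY_SIZE: AtomicUsize = AtomicUsize::new(0);
static BSP_READY: AtomicBool = AtomicBool::new(false);

/// BSP: single-threaded heap init, then publish with a Release store.
pub fn bsp_init_and_publish(start: usize, size: usize) {
    // Relaxed suffices: no other CPU reads these before it sees BSP_READY.
    PRIMARY_START.store(start, Ordering::Relaxed);
    PRIMARY_SIZE.store(size, Ordering::Relaxed);
    BSP_READY.store(true, Ordering::Release);
}

/// AP: spin until the BSP is ready, then synchronize before allocating.
pub fn ap_wait_for_bsp() {
    while !BSP_READY.load(Ordering::Relaxed) {
        std::hint::spin_loop();
    }
    // Pairs with the Release store above (fence-to-atomic synchronization).
    fence(Ordering::Acquire);
}

/// Range check used to route deallocations; Relaxed loads are sound after
/// the handshake because both stores happen-before this call.
pub fn is_primary_mem(ptr: usize) -> bool {
    let start = PRIMARY_START.load(Ordering::Relaxed);
    let size = PRIMARY_SIZE.load(Ordering::Relaxed);
    ptr.checked_sub(start).map_or(false, |offset| offset < size)
}
```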

The chapter described the old TLSF-only design (Allocator/BackUpAllocator
structs, init_primary/init_backup only). Update it to cover:
- compile-time backend selection (heap-wf-alloc -> wf_alloc on x86_64/aarch64,
  else TLSF; mutually exclusive),
- the HeapBackend trait and the `type Allocator = ...` alias,
- the *_with_num_cpu init APIs and why wf_alloc needs an explicit CPU count
  (per-CPU tokens / active_threads sizing) while TLSF ignores it,
- the x86_64 / aarch64 boot snippets now passing num_cpu.

Also fix the "For x86_64" copy-paste in the AArch64 section and the Tallock typo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ytakano (Collaborator, Author) commented Jun 18, 2026

Quoting the review: The mdbook chapter at mdbook/src/internal/memory_allocator.md still describes the old TLSF-only design (Allocator/BackUpAllocator structs, init_primary/init_backup only). It should be updated to cover the new HeapBackend trait, backend selection by feature, and the *_with_num_cpu APIs (and explain why an explicit num_cpu is needed for wf_alloc).

Updated in 94bc5e2. The chapter now covers:

  • compile-time backend selection (heap-wf-alloc → wf_alloc on x86_64/aarch64, otherwise TLSF; mutually exclusive via compile_error!);
  • the HeapBackend trait and the type Allocator = ... alias, plus the real Talloc { primary: Allocator, backup: Allocator, ... } (replacing the old Allocator / BackUpAllocator structs);
  • both the bare init_primary / init_backup and the init_*_with_num_cpu APIs, with a paragraph on why wf_alloc needs the explicit CPU count (it maps each CPU id to a per-CPU token and sizes its metadata from active_threads, bailing out on 0), why the bare variants read cpu::num_cpu() and so the boot code uses the explicit-count variants on x86_64, and that TLSF ignores the count;
  • the x86_64 / aarch64 boot snippets now passing num_cpu.

Also fixed the "For x86_64" copy-paste in the AArch64 section and the Tallock typo. mdbook build renders cleanly.

This was the last open item from the review — thanks for the thorough pass.

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@atsushi421 atsushi421 self-requested a review June 19, 2026 11:51
@ytakano ytakano merged commit 5d52bf4 into tier4:main Jun 19, 2026
1 check passed