feat(async): add task::kill() for force-terminating async tasks by ytakano · Pull Request #698 · tier4/awkernel

ytakano · 2026-06-16T08:51:37Z

Description

Adds task::kill(task_id) to awkernel_async_lib, which marks a task as Terminated and removes it from the global TASKS registry. Also fixes a race in run_main() where Poll::Pending could overwrite a Terminated state set by a concurrent kill() call.

Also adds files for AI agents.

How was this PR tested?

QEMU Test

Tested on a QEMU VM using applications/tests/test_task_kill.

Welcome to Awkernel!

You can use BLisp language as follows.
https://ytakano.github.io/blisp/

> (factorial 20)
2432902008176640000
> (+ 10 20)
30

Enjoy!

> [         5112 INFO] TASK_KILL_TEST start
[         5134 INFO] TASK_KILL_TEST kill_sleeping_task: PASS (kill returned true)
[         5135 INFO] TASK_KILL_TEST task_removed_from_registry: PASS
[         5135 INFO] TASK_KILL_TEST kill_sleeping_task: PASS (resources freed)
[         5135 INFO] TASK_KILL_TEST kill_unknown_id: PASS
[         5186 INFO] TASK_KILL_TEST kill_idempotent: PASS (second kill returned false)
[         5237 INFO] TASK_KILL_TEST resources_freed_after_kill: PASS
[         5237 INFO] TASK_KILL_TEST kill_preempted_task: step 1 start
[         5239 INFO] TASK_KILL_TEST kill_preempted_task: step 2 spawned id=3
[         5290 INFO] TASK_KILL_TEST kill_preempted_task: step 3 after sleep 50ms
[         5290 INFO] TASK_KILL_TEST kill_preempted_task: step 4 kill=true
[         5591 INFO] TASK_KILL_TEST kill_preempted_task: step 5 after sleep 300ms
[         5591 INFO] TASK_KILL_TEST kill_preempted_task: PASS
[         5592 INFO] TASK_KILL_TEST kill_panicked_task: step 1 start
[         5592 ERROR] kernel/src/nostd.rs:162: panic: panicked at applications/tests/test_task_kill/src/lib.rs:171:13:
intentional panic for kill semantics test
[         5598 INFO] 00:        0x000000005c4b03 - UNKNOWN
[         5600 INFO] 01:        0x000000004313e8 - UNKNOWN
[         5602 INFO] 02:        0x0000000061097b - UNKNOWN
[         5604 INFO] 03:        0x00000000430ed1 - UNKNOWN
[         5607 INFO] 04:        0x0000000059869d - UNKNOWN
[         5608 INFO] 05:        0x0000000059661b - UNKNOWN
[         5610 INFO] 06:        0x000000004261ae - UNKNOWN
[         5610 INFO] 07:        0x0000000042ce1f - UNKNOWN
[         5610 INFO] 08:        0x000000000000fb - UNKNOWN
[         5692 INFO] TASK_KILL_TEST kill_panicked_task: PASS (panicked task removed from registry)
[         5693 INFO] TASK_KILL_TEST kill_panicked_task: PASS (kill returned false)
[         5694 INFO] TASK_KILL_TEST done

SPIN Model Checker

Updated specification/awkernel_async_lib/src/task/preemptive_spin to handle kill() and verified that it works correctly.

Notes for reviewers

Adds `task::kill(task_id)` to `awkernel_async_lib`, which marks a task as `Terminated` and removes it from the global TASKS registry. Also fixes a race in `run_main()` where `Poll::Pending` could overwrite a `Terminated` state set by a concurrent `kill()` call. Phase 0 of the PCIe detach roadmap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a kernel test application that verifies task::kill() semantics at runtime: killing a sleeping task removes it from the registry, killing an unknown ID returns false, and a second kill on a dead task returns false. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds resources_freed_after_kill() to the test_task_kill application. A DropTracker struct increments a global counter on drop; the test kills the owner task while it is sleeping and confirms the counter reaches 1 after the sleep timer fires and releases the last Arc<Task>. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
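The drop-counting idea can be sketched on the host with plain `std`. This is only an illustration under assumptions: `TaskSketch`, `DROP_COUNT`, and `drops_seen_by_each_release` are hypothetical names, and the two `Arc` clones stand in for the registry entry and the sleep timer's waker. The real test lives in `applications/tests/test_task_kill`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Global counter bumped once per dropped tracker (hypothetical name).
static DROP_COUNT: AtomicUsize = AtomicUsize::new(0);

struct DropTracker;

impl Drop for DropTracker {
    fn drop(&mut self) {
        DROP_COUNT.fetch_add(1, Ordering::SeqCst);
    }
}

// Stand-in for a task whose future owns a DropTracker.
struct TaskSketch {
    _resource: DropTracker,
}

/// Returns how many drops were observed after each holder released its Arc.
fn drops_seen_by_each_release() -> (usize, usize) {
    let before = DROP_COUNT.load(Ordering::SeqCst);

    // Two holders: one models the registry entry, one models the timer's waker.
    let registry_ref = Arc::new(TaskSketch { _resource: DropTracker });
    let timer_ref = Arc::clone(&registry_ref);

    // Models kill(): the registry lets go, but the timer still pins the task.
    drop(registry_ref);
    let after_kill = DROP_COUNT.load(Ordering::SeqCst) - before;

    // Models the timer firing and releasing the last reference.
    drop(timer_ref);
    let after_timer = DROP_COUNT.load(Ordering::SeqCst) - before;

    (after_kill, after_timer)
}
```

The counter only moves once the last holder goes away, which is why the test has to wait for the sleep timer before checking it.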

In nographic mode QEMU provides no GOP framebuffer, so draw() returns Err(NoFrameBuffer) immediately. Instead of ignoring the error and looping every 5 seconds forever, probe on startup and return if there is nothing to draw — eliminating needless scheduler churn and making the service self-cleaning in headless environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When task::kill() removes a task from TASKS, the timer closure for that task's ongoing sleep() becomes the sole Arc<Task> holder. If the timer fires before the Arc refcount reaches zero, the closure calls waker.wake() while still holding the Sleep state mutex. For a Terminated task, wake() returns immediately and drops the Arc inline; that triggers Task::drop() → Future::drop() → Sleep::drop(), which tries to re-acquire the same state mutex — instant spinlock deadlock, hanging the timer thread and blocking all subsequent sleeps. Fix: capture the wake decision under the state lock, release the lock, then call waker.wake() only if the state was Wait. Sleep::drop() now sees State::Finished and exits without trying to lock. Reproducer: test_task_kill::resources_freed_after_kill() kills a task mid-sleep(200ms); the 200ms timer then fires and previously deadlocked the scheduler, leaving the test task's own sleep(300ms) never waking up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
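A minimal sketch of the "decide under the lock, wake outside it" pattern described in this commit message, assuming a `std::sync::Mutex` in place of the kernel's spinlock. `on_timer_fired` and the `wake` closure are hypothetical stand-ins for the timer closure and `waker.wake()`; this is not the code in `sleep_task.rs`.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Wait,
    Finished,
}

/// Hypothetical stand-in for the body of the sleep timer callback.
/// `wake` models `waker.wake()`, which may drop the last task reference
/// inline and therefore run drop code that touches `state` again.
fn on_timer_fired(state: &Mutex<State>, wake: impl FnOnce()) -> bool {
    // Capture the decision while holding the lock, and mark the sleep as
    // finished so later drop code has nothing left to do.
    let should_wake = {
        let mut guard = state.lock().unwrap();
        let was_waiting = *guard == State::Wait;
        *guard = State::Finished;
        was_waiting
    }; // the guard is released here, before any wake-up runs

    if should_wake {
        wake();
    }
    should_wake
}
```

Because the guard is gone before `wake()` runs, a drop chain that inspects the sleep state again does not contend with the callback itself.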

This reverts commit 4769720.

Copilot

Pull request overview

This PR extends awkernel_async_lib with a task::kill(task_id) API to force-terminate async tasks and hardens the executor against a race where Poll::Pending could overwrite a concurrently-set Terminated state. It also adds an integration-style test application wired through userland/kernel features.

Changes:

Add task::kill(u32) -> bool that sets task state to Terminated and removes it from the global TASKS registry.
Fix run_main() to avoid overwriting Terminated/Panicked back to Waiting on Poll::Pending.
Avoid a potential deadlock in Sleep timer callback by releasing its internal state lock before calling waker.wake(), and add a test_task_kill app to validate behavior.
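The `Poll::Pending` guard from the second change can be pictured as a small state transition function. This is a sketch under assumptions: the `State` variants and `state_after_pending` are illustrative names, and the real `run_main()` performs this check under the task's info lock.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Running,
    Waiting,
    Terminated,
    Panicked,
}

/// State to store after a poll returned Pending, given the state that is
/// observed at that moment.
fn state_after_pending(current: State) -> State {
    match current {
        // A concurrent kill() or a panic already made the task terminal:
        // keep that state instead of resurrecting the task.
        State::Terminated | State::Panicked => current,
        // Otherwise the task parks until one of its wakers fires.
        State::Running | State::Waiting => State::Waiting,
    }
}
```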

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| userland/src/lib.rs | Wires `test_task_kill::run()` into the userland entrypoint under a feature flag. |
| userland/Cargo.toml | Adds optional dependency + feature flag for the new test application. |
| kernel/Cargo.toml | Propagates the `test_task_kill` feature to userland. |
| awkernel_async_lib/src/task.rs | Implements `task::kill()` and fixes executor state handling under concurrent termination. |
| awkernel_async_lib/src/sleep_task.rs | Drops `Sleep`’s state lock before waking to prevent deadlock during kill-driven drops. |
| applications/tests/test_task_kill/src/lib.rs | Adds a test app covering kill success/unknown/idempotence and drop/resource release. |
| applications/tests/test_task_kill/Cargo.toml | Defines the new no-std test crate and its dependencies. |

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ytakano · 2026-06-16T21:43:24Z

+///
+/// Returns `true` if the task was found and killed, `false` if it was not found or was
+/// already in a terminal state.
+pub fn kill(task_id: u32) -> bool {


MAJOR — kill() defers resource reclamation, possibly forever

kill() removes the task from TASKS and sets Terminated, but it never drops the task's future. The Arc<Task> (and therefore the future and all resources it owns) is freed only when the last Arc<Task> clone is dropped — i.e. when every Waker clone held outside TASKS is dropped.

For a task parked in Sleep, that Waker lives in the timer delta-list and is dropped only when the timer fires — so reclamation is deferred for up to the full remaining sleep duration. In kill_sleeping_task the target sleeps for 3600s, so after kill() its memory stays pinned for ~an hour even though the registry check passes.

Worse: a task parked on a Waker that never fires (e.g. a channel recv with no remaining sender, or any custom future that stashed its waker elsewhere) is leaked permanently — invisible, since it is already out of TASKS.

The doc says "the caller is responsible for deregistering them", but no API is exposed to deregister internal wakers, so callers generally cannot. Consider having kill() proactively reclaim: try_lock the future mutex and drop / fuse-terminate the inner future to release resources immediately. Note the in-flight-poll case (future lock held by a polling CPU): if try_lock fails, the Poll::Pending arm in run_main (around task.rs:884-895) currently also does not drop the future, so resources still are not freed — that arm should drop / terminate the future when it observes Terminated.

Addressed. kill() now tries to drop the task future when it terminates a non-preempted task, and run_main also drops the future when it observes Terminated/Panicked after an in-flight poll returns Pending. For Preempted, kill_pending defers the termination until the next poll boundary, then removes the task from TASKS and drops the future there. I also updated the kill() docs to describe the in-flight poll and terminal-state behavior.
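For readers following the thread, the "try to drop the future unless a poll holds it" idea might look roughly like the sketch below. It assumes the future sits in a `Mutex<Option<...>>`; `TaskSketch`, `try_reclaim`, and `DropFlag` are hypothetical names and `std::sync::Mutex` replaces the kernel lock, so treat it as an illustration rather than the code in `task.rs`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

type BoxedFuture = Pin<Box<dyn Future<Output = ()> + Send>>;

// Hypothetical task layout: the future lives behind its own lock.
struct TaskSketch {
    future: Mutex<Option<BoxedFuture>>,
}

impl TaskSketch {
    fn new(fut: impl Future<Output = ()> + Send + 'static) -> Self {
        let boxed: BoxedFuture = Box::pin(fut);
        TaskSketch { future: Mutex::new(Some(boxed)) }
    }
}

// Test helper: records that it was dropped.
struct DropFlag(Arc<AtomicBool>);

impl Drop for DropFlag {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Returns true if the future was dropped by this call. Returns false if a
/// poll currently holds the future lock or the future is already gone.
fn try_reclaim(task: &TaskSketch) -> bool {
    let taken = match task.future.try_lock() {
        Ok(mut slot) => slot.take(),
        // An in-flight poll owns the lock; the poller has to clean up.
        Err(_) => return false,
    };
    // The lock is released here, so the future's drop code runs outside it.
    let had_future = taken.is_some();
    drop(taken);
    had_future
}
```

When `try_reclaim` returns `false` because of an in-flight poll, the polling side is the one that has to drop the future after it observes the terminal state, which matches the split described in the reply above.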

ytakano · 2026-06-16T21:43:30Z

+                            }
+                        };
+                        if should_wake {
                            waker.wake();


MAJOR — killed-task Drop chain now runs under the global SLEEPING lock

The timer handler is invoked from SleepingTasks::wake_task() (scheduler.rs:327, handler()), which runs while the global SLEEPING mutex is held (scheduler.rs:383-384). With kill(), it is now common for waker.wake() here to drop the last Arc<Task> inline (state is Terminated → Task::wake returns and drops self). That triggers the full Task → future → nested-future Drop chain synchronously, under SLEEPING.

Releasing the sleep state lock before wake() (good fix!) only covers Sleep::drop. The broader invariant is now: arbitrary user Drop code runs while SLEEPING is held. Any Drop impl in a task's future that schedules a timer / sleeps / otherwise locks SLEEPING will self-deadlock — MCS locks are non-reentrant. Sleep::drop is safe today (locks only its own state), but this is a fragile, easy-to-violate invariant.

Consider deferring the final Arc<Task> drop until after SLEEPING is released (e.g. pop the fired handlers, drop the SLEEPING guard, then invoke them), and/or documenting the invariant.

Addressed. I kept the Sleep-local state lock release, and also changed scheduler wakeup handling so expired sleep handlers are taken while SLEEPING is locked but invoked only after the SLEEPING guard is dropped. That avoids running waker.wake() and the possible final Arc/future Drop chain under the global SLEEPING lock.
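The "take under the lock, invoke after the guard is dropped" shape can be sketched as follows. The names `SleepingSketch`, `schedule`, and `wake_expired` are assumptions for illustration, a `Vec` replaces the delta-list, and `std::sync::Mutex` replaces the MCS lock; the actual change is in the scheduler's wakeup path.

```rust
use std::sync::Mutex;

type Handler = Box<dyn FnOnce() + Send>;

// Hypothetical stand-in for the global sleeping-task list.
#[derive(Default)]
struct SleepingSketch {
    entries: Vec<(u64, Handler)>, // (deadline, handler)
}

impl SleepingSketch {
    fn schedule(&mut self, deadline: u64, handler: impl FnOnce() + Send + 'static) {
        let boxed: Handler = Box::new(handler);
        self.entries.push((deadline, boxed));
    }
}

/// Runs every handler whose deadline has passed and returns how many ran.
fn wake_expired(sleeping: &Mutex<SleepingSketch>, now: u64) -> usize {
    // Phase 1: split off the expired handlers while holding the lock.
    let fired: Vec<Handler> = {
        let mut guard = sleeping.lock().unwrap();
        let entries = std::mem::take(&mut guard.entries);
        let (expired, pending): (Vec<_>, Vec<_>) = entries
            .into_iter()
            .partition(|(deadline, _)| *deadline <= now);
        guard.entries = pending;
        expired.into_iter().map(|(_, handler)| handler).collect()
    }; // the guard is released here

    // Phase 2: invoke the handlers with no lock held, so a handler (or any
    // drop code it triggers) may schedule a new timer without deadlocking.
    let count = fired.len();
    for handler in fired {
        handler();
    }
    count
}
```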

ytakano · 2026-06-16T21:43:33Z

+        }
+    };
+
+    // Step 2: Under the info lock, transition state to Terminated.


MINOR — kill() is asynchronous w.r.t. an in-flight poll

kill() returns true as soon as it sets Terminated, but if the task is being polled on another CPU the future keeps running to the end of that poll (cooperative — expected). Worth documenting that kill() does not interrupt an in-flight poll, and (per the resource-reclamation comment) that the future is not necessarily dropped even after that poll completes.

ytakano · 2026-06-16T21:43:36Z

+    kill_idempotent().await;
+    resources_freed_after_kill().await;
+
+    log::info!("TASK_KILL_TEST done");


MINOR — test gap & timing fragility

kill_sleeping_task spawns a task that sleeps 3600s and only asserts registry removal — it cannot observe whether the task/future was actually reclaimed (the timer won't fire for an hour), so it gives false confidence that kill() "frees" the task. Only resources_freed_after_kill (200ms sleep) exercises reclamation, and it depends on the exact deferred-drop timing.

All cases also use fixed wall-clock sleeps (50/300ms), which are timing-fragile under load or a different timer resolution. Consider a shorter sleep + an explicit reclamation assertion for the "sleeping task" case.

ytakano · 2026-06-16T21:43:40Z

+/// Sets the task state to `Terminated` and removes it from the global task registry.
+/// Any subsequent `wake()` call for this task will be a no-op. If the task was in
+/// `Waiting` state, wakers may still hold `Arc<Task>` references; the `Task` is freed
+/// only after those wakers are dropped — the caller is responsible for deregistering them.


NIT — doc comment precision

"the caller is responsible for deregistering them" — clarify how (which API), or soften the wording, since no such API is exposed. Also worth noting: kill() returns false when the task became terminal between Step 1 and Step 2 (natural completion racing the kill), which is correct behavior but currently undocumented.

ytakano · 2026-06-16T21:43:41Z

Review summary (concurrency focus)

The kill() lock protocol (TASKS → info → TASKS, never nested), the run_main Poll::Pending guard against resurrecting a killed task, and the sleep state-lock-release deadlock fix are all correct and carefully reasoned. I also checked that NUM_TASK_IN_QUEUE stays balanced (a killed-but-queued task is decremented when popped in get_next_task) and that the registry protocol is safe against id reuse — both look fine.
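As a reading aid, the three-step lock protocol (TASKS, then info, then TASKS again, never nested) might be sketched like this. It is an assumption-laden illustration: `TaskSketch`, `Registry`, and the use of `std::sync::Mutex` with a `BTreeMap` are hypothetical, and the sketch omits the deferred handling of preempted tasks.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Waiting,
    Terminated,
    Panicked,
}

// Hypothetical task record: only the state matters for this sketch.
struct TaskSketch {
    state: Mutex<State>,
}

type Registry = Mutex<BTreeMap<u32, Arc<TaskSketch>>>;

/// Returns true if the task was found and killed, false if it was unknown
/// or already terminal.
fn kill(tasks: &Registry, task_id: u32) -> bool {
    // Step 1: look the task up under the registry lock. The guard is a
    // temporary, so it is released at the end of this statement.
    let task = tasks.lock().unwrap().get(&task_id).cloned();
    let Some(task) = task else {
        return false;
    };

    // Step 2: under the task's own lock, move it to Terminated unless it
    // became terminal in the meantime (for example by finishing naturally).
    {
        let mut state = task.state.lock().unwrap();
        if matches!(*state, State::Terminated | State::Panicked) {
            return false;
        }
        *state = State::Terminated;
    }

    // Step 3: re-acquire the registry lock and remove the entry.
    tasks.lock().unwrap().remove(&task_id);
    true
}
```

No step holds two locks at once, which is the property the summary above calls "never nested".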

The two substantive concerns are about lifecycle rather than the state machine:

Resource reclamation — kill() never drops the future, so reclamation is deferred until the last Waker drops (up to the full sleep duration, or never for a waker that never fires → permanent leak).
Drop-under-SLEEPING — with kill() it is now common for the last Arc<Task> to be dropped inside the timer handler while the global SLEEPING lock is held, so arbitrary user Drop code runs under that lock.

Inline comments have details and suggestions.

…ination

- Introduce kill_pending flag in TaskInfo (cfg not no_preempt): when kill() is called on a State::Preempted task, set the flag instead of overwriting state. yield_preempted_and_wake_task() unconditionally writes State::Preempted which would clobber a direct State::Terminated write, so deferral is required.
- Poll::Pending handler in run_main() checks take_kill_pending() and finalizes termination (state=Terminated + TASKS.remove) at the task's next await point.
- Drop the Arc<Task> in run_main() before calling yield_and_pool(ctx) so the thread pool does not hold a stale reference. Without this drop, the pooled thread's task Arc would only be released on the next preemption, which may never happen after kill(), leaving the future (and its DropTracker) alive.
- Fix do_preemption() TOCTOU: keep the Arc<Task> from the first get_task() lookup alive through the entire function, removing the second unwrap() lookup and handling the None case (task already removed by kill()) gracefully.
- Add kill_preempted_task() test using PrioritizedRR to exercise the deferred kill path; verify kill() returns true, task leaves TASKS, and DropTracker fires.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
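The deferred-kill flag described in this commit message can be illustrated with a small state machine. This is a sketch under assumptions: `TaskInfoSketch`, `request_kill`, and `on_poll_pending` are hypothetical names, registry removal is left out, and the behavior of a repeated kill while the flag is already set is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Waiting,
    Preempted,
    Terminated,
}

// Hypothetical subset of the per-task info guarded by the info lock.
struct TaskInfoSketch {
    state: State,
    kill_pending: bool,
}

impl TaskInfoSketch {
    /// kill() side: returns true if the kill request was accepted.
    fn request_kill(&mut self) -> bool {
        match self.state {
            State::Terminated => false,
            // The preemption path would overwrite a direct Terminated
            // write, so only record the request here.
            State::Preempted => {
                self.kill_pending = true;
                true
            }
            State::Waiting => {
                self.state = State::Terminated;
                true
            }
        }
    }

    /// Reads and clears the deferred-kill flag.
    fn take_kill_pending(&mut self) -> bool {
        std::mem::take(&mut self.kill_pending)
    }

    /// Executor side: called when a poll returned Pending.
    fn on_poll_pending(&mut self) {
        if self.take_kill_pending() {
            // Finalize the deferred kill at this await point.
            self.state = State::Terminated;
        } else if self.state != State::Terminated {
            self.state = State::Waiting;
        }
    }
}
```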

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

ytakano and others added 6 commits June 16, 2026 17:43

Revert "fix(display): exit early when no framebuffer is available"

330623d

This reverts commit 4769720.

ytakano requested a review from Copilot June 16, 2026 10:18

Copilot started reviewing on behalf of ytakano June 16, 2026 10:18

Copilot AI reviewed Jun 16, 2026

Comment thread awkernel_async_lib/src/task.rs Outdated

Potential fix for pull request finding

6130429

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ytakano commented Jun 16, 2026

ytakano and others added 14 commits June 17, 2026 07:43

Update task kill semantics tests and add repository guidelines

17b04f0

Improve task kill resource cleanup and scheduling safety

80d5ba8

Handle deferred kill cleanup for preempted tasks

f9ba428

fix: detach service tasks in awkernel_services

62c108a

Fix task kill races

d8d363c

Add SPIN kill test variant

ed4722a

Add async scheduler testing skill

dc7dc66

add CLAUDE.md

0699b69

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Fix clippy warnings without logic changes

fa5cb14

Await service task spawns before dropping handles

06bfe79

Handle missing preemption targets in schedulers

9894193

fmt

8550c73

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Cover preempted kill race in SPIN

799dad6

fmt

e99f32e

Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

ytakano marked this pull request as ready for review June 18, 2026 01:14

ytakano requested a review from mkakh June 18, 2026 01:34

ytakano wants to merge 22 commits into tier4:main from ytakano:feat/task-kill

2 participants
