Atelet runs only on nodes hosting ateoms (#9) by eliranw · Pull Request #134 · agent-substrate/substrate

eliranw · 2026-05-31T12:59:47Z

Summary

Closes #9. Atelet currently runs on every node as a DaemonSet. This PR restricts it to nodes that currently host ateom pods.

A new AteletNodeReconciler (Pod-keyed) watches ateom pods and maintains a substrate-owned ate.dev/atelet=true label on each Node currently hosting an ateom. The atelet DaemonSet now carries nodeSelector: ate.dev/atelet=true, so its footprint follows ateom placement.

Mechanism

Reactive labeling. On each ateom-pod event, the reconciler lists the pods on the affected node (via a cached spec.nodeName field index) and applies the desired node state with server-side apply (SSA).
Per-pool refcounting via annotations. Each node carries one ate.dev/claim.<workerpool-uid> annotation per WorkerPool occupying it. The label is present iff at least one claim exists; SSA's field-ownership prunes claims/label as pods leave. Multiple WorkerPools can safely share a node.
Init container. Every ateom pod gets a wait-for-atelet init container that TCP-probes $HOST_IP:8085 until the local atelet is serving. This absorbs the scheduling→atelet-ready gap and makes ateoms robust to atelet restarts/upgrades mid-life.
Finalizer. An ate.dev/release-node-claims finalizer on WorkerPool holds deletion until its claims are released, so claims can't leak.
RBAC. Adds nodes: get;list;watch;patch and pods: get;list;watch to the controller ClusterRole (regenerated from kubebuilder markers).
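
The labeling and refcounting rules above can be sketched as a pure function from "ateom pods on this node" to "desired label and claim annotations". This is only an illustration under stated assumptions, not the PR's code: Pod is a simplified stand-in for corev1.Pod, the pod label key poolUIDLabel and the annotation value "true" are hypothetical, and the body of claimAnnotationKey is a guess at the helper the cleanup commit names. Only the ate.dev/atelet label and the ate.dev/claim.<workerpool-uid> annotation format come from the description.

```go
package main

import "fmt"

// Pod is a simplified stand-in for corev1.Pod (hypothetical type for this sketch).
type Pod struct {
	Name   string
	Labels map[string]string
}

const (
	ateletLabelKey = "ate.dev/atelet" // node label named in the PR description
	claimPrefix    = "ate.dev/claim." // claim annotation prefix from the PR description
	// poolUIDLabel is an assumed pod label key; the real key is not shown in the PR text.
	poolUIDLabel = "ate.dev/workerpool-uid"
)

// claimAnnotationKey builds the per-pool claim key (assumed implementation).
func claimAnnotationKey(poolUID string) string {
	return claimPrefix + poolUID
}

// desiredNodeState returns the labels and annotations this controller wants
// to own on a node, given the pods currently bound to that node.
// Pods without a pool UID label are ignored. The label is set only when at
// least one claim exists.
func desiredNodeState(podsOnNode []Pod) (labels, claims map[string]string) {
	labels = map[string]string{}
	claims = map[string]string{}
	for _, p := range podsOnNode {
		uid := p.Labels[poolUIDLabel]
		if uid == "" {
			continue // not an ateom pod, or not fully labeled yet
		}
		// Several pods of one pool collapse into a single claim.
		claims[claimAnnotationKey(uid)] = "true" // value is a placeholder
	}
	if len(claims) > 0 {
		labels[ateletLabelKey] = "true"
	}
	return labels, claims
}

func main() {
	pods := []Pod{
		{Name: "a-1", Labels: map[string]string{poolUIDLabel: "uid-a"}},
		{Name: "a-2", Labels: map[string]string{poolUIDLabel: "uid-a"}},
		{Name: "b-1", Labels: map[string]string{poolUIDLabel: "uid-b"}},
		{Name: "unrelated"},
	}
	labels, claims := desiredNodeState(pods)
	fmt.Println("labels:", labels)
	fmt.Println("claims:", claims)
}
```

Because server-side apply removes fields that a field manager previously owned but omits from its next apply, sending exactly this desired state on every reconcile is enough to drop a claim (and eventually the label) once the matching pods are gone.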

Why reactive (not a placement policy)

Substrate doesn't decide which nodes ateoms run on — the kube-scheduler does, freely. The reconciler just records that choice as a label. This keeps the change small and avoids inventing a node-selection policy; richer schemes (worker classes, capacity reservations) can layer on later.

Test plan

envtest: single pod → node gets label + claim
envtest: two pools on one node → two claims, one label
envtest: delete one pool's pod with another present → only its claim removed, label stays
envtest: delete last pod → claim and label both removed (SSA prune)
envtest: WorkerPool deletion held by finalizer until claim released, then completes
envtest: ateom Deployment carries the wait-for-atelet init container
make verify (tests, gofmt, lint, codegen, licenses, go-mod-tidy, boilerplate)

Migration / rollout

This replaces the atelet DS's "run on all nodes" behavior with "run on labeled nodes only." Recommended two-phase rollout for existing clusters:

Deploy the new controller image + RBAC first. It observes existing ateom pods and labels their nodes.
Apply the atelet DS manifest change. The rolling update drops atelet from unlabeled nodes (no ateoms there) and keeps it on labeled ones (no disruption).

On a fresh cluster the atelet DaemonSet runs zero pods until the first WorkerPool's ateoms are scheduled — expected, not a broken install (kubectl rollout status on a 0-desired DaemonSet returns success immediately). The init container means even a single-shot kubectl apply -k is safe: ateoms wait rather than crash.

google-cla · 2026-05-31T12:59:57Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

The new AteletNodeReconciler will refcount node claims per WorkerPool UID. Embedding the UID directly on the pod template means the reconciler can read it from pod labels without a separate WorkerPool lookup — which matters because the WorkerPool may be mid-deletion when its pods are being reconciled. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…nt-substrate#9) Add a busybox:1.36 init container to every ateom pod that probes $HOST_IP:8085 until atelet's gRPC port is reachable. $HOST_IP comes from the downward API (status.hostIP). This makes ateom robust to atelet upgrades, restarts, and node-cold-start races: the pod waits in Init:0/1 instead of crashlooping when atelet isn't yet serving. Independent of any node-labeling change — useful on its own. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…t-substrate#9) Atelet DaemonSet now requires the ate.dev/atelet=true label to schedule a pod. Substrate's new AteletNodeReconciler (next commit) will populate this label on nodes that host ateom workloads, and remove it when they don't. The init container added in the prior commit absorbs the atelet startup gap, so this change does not introduce a race for ateom pods. NOTE: at first apply on an existing cluster, the DS rolling update will terminate atelet pods on nodes that don't have the label yet. Deploy the new atecontroller image first; it will label nodes that currently host ateoms before the DS update has fully rolled out. See release notes. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

New Pod-keyed reconciler that will maintain the ate.dev/atelet=true node label and ate.dev/claim.<workerpool-uid> annotations. This commit lands the file skeleton plus the predicate and pod-to-node mapping with unit tests. The reconcile logic itself is stubbed (returns no-op); subsequent commits add it. Field indexer on spec.nodeName is registered in SetupWithManager so future List(client.MatchingFields) calls work against the cached client. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…trate#9) reconcileNode lists ateom pods on the node (via cached spec.nodeName field selector), computes the set of distinct WorkerPool UIDs present, and SSA-applies the desired label + per-pool claim annotations. SSA's granular-map semantics let us add and remove individual claim keys without disturbing other field owners. Also registers AteletNodeReconciler in the envtest TestMain so integration tests against the reconciler can run. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…gent-substrate#9) Three new envtest cases:
- Two pools on the same node → two claim annotations, one label
- Deleting one pool's pod with another pool still present → only that pool's annotation removed, label sticks
- Deleting the last ateom pod → both the claim and the label are removed via SSA's granular-map removal
Pod deletions use client.GracePeriodSeconds(0) so that envtest removes the pod object immediately rather than waiting for a kubelet that doesn't exist in the test environment. Validates the per-pool refcounting design.
Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…bstrate#9) Wire the new reconciler into the atecontroller binary so it runs alongside WorkerPool and ActorTemplate reconcilers in production. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…ubstrate#9) The finalizer holds WorkerPool deletion until every Node has released the per-pool claim annotation (ate.dev/claim.<wp.UID>). Claim release happens naturally as the Deployment cascade deletes the ateom pods and AteletNodeReconciler observes the deletions. handleDeletion lists Nodes and requeues with backoff until no claim remains, then removes the finalizer. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>
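
The finalizer's decision reduces to one question: does any Node still carry this pool's claim annotation? A minimal sketch of that check follows, assuming a simplified Node type in place of corev1.Node and a hypothetical function name nodesHoldingClaim. The real handleDeletion lists Nodes through the controller client and requeues with backoff, neither of which is modeled here.

```go
package main

import "fmt"

// Node is a simplified stand-in for corev1.Node (hypothetical type for this sketch).
type Node struct {
	Name        string
	Annotations map[string]string
}

// claimPrefix follows the ate.dev/claim.<workerpool-uid> format from the PR.
const claimPrefix = "ate.dev/claim."

// nodesHoldingClaim returns the names of nodes that still carry the claim
// annotation for the given WorkerPool UID. An empty result means the
// finalizer can be removed.
func nodesHoldingClaim(nodes []Node, poolUID string) []string {
	key := claimPrefix + poolUID
	var holders []string
	for _, n := range nodes {
		if _, ok := n.Annotations[key]; ok {
			holders = append(holders, n.Name)
		}
	}
	return holders
}

func main() {
	nodes := []Node{
		{Name: "node-1", Annotations: map[string]string{claimPrefix + "uid-a": "true"}},
		{Name: "node-2", Annotations: map[string]string{claimPrefix + "uid-b": "true"}},
	}
	for _, uid := range []string{"uid-a", "uid-c"} {
		if holders := nodesHoldingClaim(nodes, uid); len(holders) > 0 {
			fmt.Printf("pool %s: requeue, claims remain on %v\n", uid, holders)
		} else {
			fmt.Printf("pool %s: no claims left, finalizer can be removed\n", uid)
		}
	}
}
```

The check looks only at key presence, so it does not depend on whatever value the reconciler writes into the annotation.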

…-substrate#9) End-to-end envtest path: WorkerPool created -> finalizer present -> pod bound to node -> claim annotation appears -> pod deleted -> WorkerPool deleted -> finalizer holds until claim is gone -> both WorkerPool and claim annotation are absent. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

Adds nodes/get;list;watch;patch and pods/get;list;watch to the ate-controller ClusterRole, picked up from the +kubebuilder:rbac markers on the new reconciler. Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

…rate#9) Cleanup pass over the feature (no behavior change):
- AteletNodeReconciler.reconcileNode: the node fetched for the existence check was bound to an unused variable. Discard the value and document that the Get exists only to avoid resurrecting a deleted Node via the SSA upsert.
- ateomPodPredicate now gates on WorkerPoolUIDLabelKey (the key reconcileNode actually consumes) instead of WorkerPoolLabelKey, so the watch filter matches the work the reconciler does and a half-labeled pod can't trigger a no-op reconcile.
- Centralize the claim-annotation format in a claimAnnotationKey helper, shared by the reconciler (writer) and the WorkerPool finalizer (reader), replacing the string concatenation duplicated across both files.
- Tests: replace hand-rolled finalizer-scan loops with controllerutil.ContainsFinalizer, and reuse the makeNode helper instead of inlining Node construction.
Signed-off-by: Eliran Wolff <eliranw@nvidia.com>

This was referenced May 31, 2026

Atelet runs only on nodes hosting ateoms (#9) #133

Closed

Atelet should only run on nodes where ateoms are running. #9

Open

eliranw added 11 commits May 31, 2026 17:16

atecontroller: register AteletNodeReconciler in the manager (agent-su…

df4b81c

manifests: regenerate RBAC for AteletNodeReconciler (agent-substrate#9)

32e07af

eliranw force-pushed the eliranw/atelet-node-placement branch from a39a478 to b046670 Compare May 31, 2026 14:17

eliranw wants to merge 11 commits into agent-substrate:main from eliranw:eliranw/atelet-node-placement
