Add fork mailbox payload API#267
Conversation
✱ Stainless preview builds for hypemanThis PR will update the Edit this comment to update it. It will appear in the SDK's changelogs. ✅ hypeman-openapi studio · code · diff
✅ hypeman-go studio · code · diff
✅ hypeman-typescript studio · code · diff
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push. |
dffc792 to
05bc363
Compare
…-payloads # Conflicts: # cmd/api/api/snapshots.go # cmd/api/api/snapshots_test.go # lib/forkvm/README.md # lib/guest/client.go # lib/instances/fork.go # lib/instances/guest_resume_network.go # lib/instances/manager.go # lib/instances/restore.go # lib/instances/restore_egress_test.go # lib/system/guest_agent/resume_network.go
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR repository cannot be determined from provided information; please confirm this is in the To monitor this PR anyway, reply with |
rgarcia
left a comment
There was a problem hiding this comment.
blocking on a few maintainability/design issues:
-
lib/instances/fork.go:50andlib/instances/snapshot.go:390only gate mailbox usage on standby + running target, butlib/instances/fork_mailbox.go:225assumes a Firecracker-style rawmemorysnapshot file and fixed mailbox offsets. Non-Firecracker standby forks can accept mailbox requests and fail later inside restore. Please make this an explicit capability/hypervisor check before fork work begins, so the API rejects unsupported mailbox forks at the boundary. -
lib/instances/fork_mailbox.go:209duplicates the open/stat/mmap/find/write/cache logic already inlib/instances/guest_resume_network.go:186. This is the same mailbox-frame operation with a different layout. Please push the common primitive intolib/mailboxwith layout config, instead of maintaining two snapshot-memory patchers that can drift. Ideally the fork path should also preflight all requested markers before writing any payloads, so multi-mailbox patching is structurally all-or-nothing rather than relying on outer fork cleanup. -
lib/instances/guest_resume_network.go:143adds mailbox ACK matching as raw substring checks.mailbox=foocan matchmailbox=foobar, and any free-form UDP text containingstage=appliedis accepted. Since this ACK now gates successful fork restore, please parse the key/value payload and require exact fields forstageandmailbox.
rgarcia
left a comment
There was a problem hiding this comment.
- Suggestion, maybe too early: Unify the two handoffs into one interface + slice (main ask).
After this PR, restoreInstance prepares two handoffs back-to-back (restore.go:263 and :271) that now implement the identical trio — prepare… / Close() / AfterResume(ctx) error — and the two post-resume teardown blocks (restore.go:332-341 and :343-349) are line-for-line the same recovery (hv.Shutdown → rollbackAdmissionAllocationActive → releaseNetwork → wrapped return). That's the setup for:
type resumeHandoff interface {
AfterResume(ctx context.Context) error
Close()
}
handoffs := []resumeHandoff{resumeNetworkHandoff, forkMailboxHandoff}
for _, h := range handoffs { defer h.Close() }
// after resume, in order:
for _, h := range handoffs {
if err := h.AfterResume(ctx); err != nil { /* one shared teardown */ }
}
Order is preserved (network up before the guest can UDP-ack), so the slice is safe. Collapses the duplicated teardown and makes a third handoff one line. No guest changes needed.
-
Light suggestion: Collapse the duplicated UDP-wait machinery.
WaitApplied and WaitMailboxApplied (guest_resume_network.go:119 and :140) are methods on the same waiter, reading the same channel, with identical select loops — they differ only in the match predicate. Suggest one wait(ctx, match func(fields map[string]string) bool) and route both through parseUDPAckFields. That also removes the case-handling mismatch between them (one lowercases the whole string, the other only the keys). -
WriteForkMailboxPayloadAt is a thin wrapper used only by its own test.
mailbox.go:110 just calls WritePayloadAt(w, ForkLayout, …); production calls WritePayloadAt directly (fork_mailbox.go:272), and the only caller of the wrapper is TestWriteForkMailboxPayloadAt. Suggest deleting it and pointing the test at WritePayloadAt. -
AckTimeout validation is asymmetric + message is off.
fork_mailbox.go:99-104: the < 0 check is gated on WaitForAck but the > 30s check isn't, and "must be positive" is misleading since 0 is valid (maps to the 2s default). Suggest aligning the gating and changing the message to "must not be negative." -
Question on wait_for_ack failure semantics.
On a missing ack, a fork that resumed successfully gets torn down (restore.go:343). Two asymmetries vs the resume-network path worth confirming are intentional: that path falls back to host-initiated reconfigure (resume_network_handoff.go:81) rather than aborting, and its default timeout is 5s vs 2s here. Since the guest-side consumer of a fork mailbox is caller-implemented, wait_for_ack=true against a guest that doesn't ack will reliably destroy the fork after 2s — fine if that's the intended contract, but worth a note on the API field and a deliberate choice on the 2s vs 5s default.
| type: string | ||
| description: Per-template mailbox token used to identify the guest memory marker. | ||
| minLength: 1 | ||
| maxLength: 128 |
There was a problem hiding this comment.
Discoverability gap — there's no way for a fork caller to enumerate which mailbox names/tokens exist in a given snapshot. A wrong/typo'd token just surfaces as a generic "marker not found"-style failure at fork time. Feels strange for the token to be an out-of-band contract that assumes knowledge of how the guest sets this token?
| Optional JSON mailbox payloads to patch into a standby snapshot before resuming the fork. | ||
| Each mailbox must correspond to a guest-side mailbox marker that was present when the | ||
| source snapshot was captured. Mailboxes are only supported for forks that restore from a | ||
| standby snapshot into Running state. |
There was a problem hiding this comment.
a reader could reasonably expect a snapshot-side API given the phrasing here. Consider adding a sentence like "The marker is written by guest software before the standby snapshot is captured; Hypeman does not create or return these tokens" if we're doing out-of-band contract
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1778dc2. Configure here.
| forkMailboxHandoff, err := m.prepareForkMailboxHandoff(ctx, stored, snapshotDir, opts.Mailboxes) | ||
| if err != nil { | ||
| releaseNetwork() | ||
| return nil, fmt.Errorf("prepare fork mailbox handoff: %w", err) |
There was a problem hiding this comment.
Resume network handoff leaks UDP socket on error
Medium Severity
If prepareForkMailboxHandoff fails, the already-created resumeNetworkHandoff (which may hold an open UDP socket via its ackWaiter) is never closed. The defer closeRestoreHandoffs(handoffs) is registered only after both handoffs succeed, so the early return on line 281 bypasses cleanup of the first handoff.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 1778dc2. Configure here.


Summary
Tests
Note
High Risk
Changes fork/restore orchestration and direct snapshot-memory patching with new failure modes (missing markers, ack timeouts); limited to Firecracker standby forks targeting Running but still security- and correctness-sensitive.
Overview
Adds optional named JSON mailbox payloads to instance and snapshot fork APIs so callers can inject per-template data into guest memory before a forked standby VM resumes.
Fork requests accept a
mailboxesarray (name, token, JSON payload, optionalwait_for_ack/ack_timeout_ms). The API layer maps these into domain types; fork/snapshot restore validates them (standby snapshot only,target_stateRunning, Firecracker only, up to 16 unique mailboxes). Before resume, the host locates guest memory markers in the snapshot, writes payloads (preflight then apply), and can injectack_portand block on UDPstage=appliedfor that mailbox name.lib/mailbox gains a shared fork layout plus
FindMarker,EnsurePayloadFits, andWritePayloadAt, reused for resume-network patching and stricter UDP ack parsing. Restore now runs multiple post-resume handoffs (resume network + fork mailboxes) instead of inlining guest network reconfigure only.Reviewed by Cursor Bugbot for commit 1778dc2. Bugbot is set up for automated code reviews on this repo. Configure here.