feat: snap UX improvements, and work towards strict confinement #1571

Open
alexclewontin wants to merge 5 commits into NVIDIA:main from alexclewontin:snap-support

alexclewontin commented May 26, 2026

Summary

This PR makes the snap package usable under strict confinement and gives it a working local gateway by default. It also fixes the VM image-preparation path that blocked openshell sandbox create --from ... against the snap-managed VM gateway.

This branch aims to enable the following workflows:

  • Install-and-go local sandboxing: snap install openshell, then run openshell sandbox create ... without manually registering a gateway.
  • Strict-confinement interactive use: openshell sandbox connect and interactive sandbox create flows work without trying to exec host binaries outside the snap sandbox.
  • Custom image VM workflows: openshell sandbox create --from <image> ... works against the default snap-managed VM gateway instead of failing during prepared rootfs setup.
  • Driver switching for local validation: the same snap-managed local gateway can be switched from VM to Docker while the CLI continues to resolve the installer-seeded gateway.

Related Issue

N/A.

Changes

  • Added system gateway metadata support in openshell-bootstrap so package-managed installs can seed read-only gateway registrations outside per-user config.
  • Updated the CLI to honor OPENSHELL_SYSTEM_GATEWAY_DIR, letting package-provided local gateways appear immediately in openshell status and normal gateway selection flows.
  • Moved the snap manifest to snap/snapcraft.yaml and refreshed the snap packaging/tests around that layout.
  • Added snap install/configure hooks and wrapper behavior so the snap seeds a local-vm gateway pointing at http://127.0.0.1:17670, marks it active, and bootstraps gateway runtime state on first install.
  • Bundled the VM runtime payload required by the snap-managed gateway and documented the snap install/build flow in deploy/snap/README.md, docs/about/installation.mdx, docs/sandboxes/manage-gateways.mdx, and architecture/gateway.md.
  • Fixed VM prepared-image handling by:
    • switching from umoci raw unpack to umoci unpack so the guest prep step always gets a bundle rootfs/ directory,
    • mounting prepared ext4 images with ro,noload, and
    • bumping the VM cache version so stale broken prepared images are rebuilt.
  • Improved VM failure reporting so ProcessExited includes relevant VM console output instead of only the host-side exit status.
  • Bundled openssh-client in the snap so strict-confinement SSH-based sandbox workflows use the snap-managed client instead of failing on AppArmor denials for /usr/bin/ssh.
  • Updated packaging tests to cover the system gateway directory, seeded localhost gateway metadata, and bundled snap dependencies.
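
The cache-version bump in the VM prepared-image fix can be illustrated with a small sketch. This is not the project's code: the path layout, the `CACHE_LAYOUT_VERSION` constant, and both function names are hypothetical, chosen only to show why changing a version number forces stale prepared images to be rebuilt.

```python
import hashlib
from pathlib import Path

# Hypothetical layout version; bumping it changes every cache path below.
CACHE_LAYOUT_VERSION = 2


def prepared_image_path(cache_root: Path, image_ref: str,
                        version: int = CACHE_LAYOUT_VERSION) -> Path:
    """Return the cache path for a prepared image under a given layout version."""
    digest = hashlib.sha256(image_ref.encode("utf-8")).hexdigest()[:16]
    return cache_root / f"v{version}" / f"{digest}.ext4"


def needs_rebuild(cache_root: Path, image_ref: str,
                  version: int = CACHE_LAYOUT_VERSION) -> bool:
    """A prepared image is reused only if it exists under the requested version."""
    return not prepared_image_path(cache_root, image_ref, version).exists()
```

Because the version is part of the path, images prepared under an older layout are never looked up again, so a broken image cannot be reused after the bump.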

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Additional verification:

  • Current branch: mise run pre-commit passed, including cargo check, cargo clippy, workspace Rust tests, Python tests, markdown lint, Helm lint, install.sh tests, and Python type/lint/format checks.
  • Current branch: pytest python/ passed (15 passed).
  • Current branch: workspace Rust tests passed via pre-commit.
  • Prior branch verification for the same snap behavior covered the intended end-to-end workflows:
    • built the snap,
    • installed it under strict confinement,
    • connected the required interfaces,
    • verified the default local-vm gateway path,
    • launched a VM sandbox successfully,
    • switched the gateway to Docker,
    • launched the sandbox workflow again successfully.
  • Earlier IMAGE_TAG=dev mise run e2e runs on this work and on main both hit the same pre-existing Docker supervisor TLS cert issue (failed to read CA cert from /etc/openshell/tls/client/ca.crt). This PR does not introduce that failure.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Adds OPENSHELL_SYSTEM_GATEWAY_DIR, a read-only gateway registry that
installers (snap, deb, systemd units) can seed with deployment-provided
gateways. load_active_gateway and load_gateway_metadata fall back to the
system dir when no per-user entry exists; list_gateways merges both,
with per-user entries shadowing system entries on name collision.

Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
Originally-authored-by: Mark Shuttleworth <mark@ubuntu.com>
Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
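
The fallback and shadowing rules described in the commit message above can be sketched as follows. This is an illustrative Python model, not the Rust implementation in openshell-bootstrap: the one-JSON-file-per-gateway layout, the `endpoint` field, and the function signatures are assumptions made for the example. Only the `OPENSHELL_SYSTEM_GATEWAY_DIR` variable and the function names `list_gateways` and `load_gateway_metadata` come from the PR.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def _read_dir(directory: Optional[Path]) -> Dict[str, dict]:
    """Read every *.json file in a directory as one gateway entry (assumed layout)."""
    if directory is None or not directory.is_dir():
        return {}
    return {p.stem: json.loads(p.read_text()) for p in sorted(directory.glob("*.json"))}


def system_gateway_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the system gateway directory if the variable is set and non-empty."""
    value = env.get("OPENSHELL_SYSTEM_GATEWAY_DIR")
    return Path(value) if value else None


def list_gateways(user_dir: Optional[Path], system_dir: Optional[Path]) -> Dict[str, dict]:
    """Merge both registries; per-user entries shadow system entries by name."""
    merged = _read_dir(system_dir)
    merged.update(_read_dir(user_dir))
    return merged


def load_gateway_metadata(name: str, user_dir: Optional[Path],
                          system_dir: Optional[Path]) -> Optional[dict]:
    """Prefer the per-user entry; fall back to the system entry if none exists."""
    user_entries = _read_dir(user_dir)
    if name in user_entries:
        return user_entries[name]
    return _read_dir(system_dir).get(name)
```

Starting the merge from the system entries and updating with the per-user entries gives the shadowing behavior in one step, and the system directory is only ever read, which matches its read-only role.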

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


github-actions Bot commented May 26, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@alexclewontin (Author)

I have read the DCO document and I hereby sign the DCO.

@alexclewontin (Author)

recheck

Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
Custom `--from` VM images were failing in the guest-prep path with stale or incompatible prepared rootfs handling.

Observed failures:
- `EXT4-fs (vdc): write access unavailable, cannot proceed (try mounting with noload)`
- `mount: /image-cache: cannot mount /dev/vdc read-only`
- `FATAL: umoci unpack did not produce rootfs directory`
- `ProcessExited: VM process exited with status 0` hid the guest-side cause

Mount prepared ext4 disks with `ro,noload`, accept both umoci unpack layouts, bump the rootfs cache layout versions so old prepared disks are rebuilt, and include the tail of rootfs-console.log in ProcessExited errors.

Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
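
As a rough illustration of the last part of that fix, the sketch below appends the tail of a console log to an exit message. It is a Python model rather than the project's code: the function names, the 20-line default, and the separator text are assumptions. Only the rootfs-console.log file name and the "VM process exited with status" wording come from the commit message.

```python
from collections import deque
from pathlib import Path


def tail_lines(path: Path, max_lines: int = 20) -> str:
    """Return the last max_lines lines of a log file, or "" if it cannot be read."""
    try:
        with path.open("r", errors="replace") as handle:
            # A bounded deque keeps only the final lines without loading the whole file.
            return "".join(deque(handle, maxlen=max_lines)).rstrip("\n")
    except OSError:
        return ""


def process_exited_message(status: int, console_log: Path, max_lines: int = 20) -> str:
    """Build an exit message that includes guest console output when available."""
    message = f"VM process exited with status {status}"
    tail = tail_lines(console_log, max_lines)
    if tail:
        message += f"\n--- last lines of {console_log.name} ---\n{tail}"
    return message
```

With this shape, a guest-side error such as the EXT4 mount failure listed above would appear next to the exit status instead of being hidden behind `status 0`.
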
Strict snap sandbox connect/create shells were still trying to exec the host OpenSSH binary.

Observed failure:
- `apparmor="DENIED" operation="exec" class="file" profile="snap.openshell.openshell" name="/usr/bin/ssh" requested_mask="x" denied_mask="x"`

Bundle `openssh-client` in the snap so the CLI uses the bundled binary under strict confinement.

Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
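
One way a CLI could prefer a bundled client under strict confinement is sketched below. This is an assumption-laden illustration, not how openshell resolves its SSH binary: the `resolve_ssh_binary` helper and the `usr/bin/ssh` location inside the snap are hypothetical. The `SNAP` environment variable is the one snapd sets to the snap's install directory.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_ssh_binary(env: Mapping[str, str] = os.environ) -> str:
    """Prefer an ssh binary shipped inside the snap; otherwise defer to PATH lookup."""
    snap_root = env.get("SNAP")
    if snap_root:
        # Assumed location of the staged openssh-client binary inside the snap.
        bundled = Path(snap_root) / "usr" / "bin" / "ssh"
        if bundled.is_file() and os.access(bundled, os.X_OK):
            return str(bundled)
    # Outside a snap (or if nothing is bundled), let the caller resolve "ssh" via PATH.
    return "ssh"
```

Resolving the path explicitly avoids an exec of the host `/usr/bin/ssh`, which is the operation the AppArmor profile denied in the log line above.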