demo: lakebox by pietern · Pull Request #5223 · databricks/cli

pietern · 2026-05-08T14:39:01Z

Continuation of #4930.

Lakebox provides SSH-accessible development environments backed by microVM isolation. This adds CLI commands for lifecycle management:

- `lakebox auth login`: authenticate to a Databricks workspace
- `lakebox create`: create a new lakebox (with optional SSH public key)
- `lakebox list`: list your lakeboxes (shows status, key hash, default)
- `lakebox ssh`: SSH to your default lakebox (or create one on first use)
- `lakebox status <id>`: show lakebox details
- `lakebox delete <id>`: delete a lakebox
- `lakebox set-default <id>`: change the default lakebox

Features:

- Default lakebox management stored at ~/.databricks/lakebox.json per profile
- Automatic SSH config management (~/.ssh/config)
- Public key auth only (password/keyboard-interactive disabled in SSH config)
- Creates and sets default on first `lakebox ssh` if none exists

- Remove PubkeyHashPrefix field from lakeboxEntry (no longer returned by API)
- Remove KEY column from list output
- Remove Key line from status output
- Add register-key subcommand for SSH public key registration

Co-authored-by: Isaac

…rites

- Add 'register' command: generates ~/.ssh/lakebox_rsa and registers with API
- Remove 'register-key' command (replaced by 'register')
- Remove 'login' command (use 'auth login' + 'register' separately)
- SSH command passes options directly as args instead of writing ~/.ssh/config
- Check for ssh-keygen availability with helpful install instructions

Co-authored-by: Isaac

- Hook into auth login PostRun to auto-generate ~/.ssh/lakebox_rsa and register it after OAuth completes
- Fix hook: match on sub.Name() not sub.Use (Use includes args)
- Export EnsureAndReadKey and RegisterKey for use by auth hook
- Update help text

Co-authored-by: Isaac

Everything after `--` is passed directly to the ssh process, enabling:

- `lakebox ssh -- echo hello` (run command and return)
- `lakebox ssh <id> -- cat /etc/os-release`
- `lakebox ssh -- -L 8080:localhost:8080` (port forwarding)

Co-authored-by: Isaac

After 'lakebox auth login --host <url>', the post-login hook now constructs the workspace client directly from the --host/--profile flags instead of using MustWorkspaceClient (which started with an empty config and fell back to the DEFAULT profile). All lakebox commands now use a mustWorkspaceClient wrapper that reads the last-login profile from ~/.databricks/lakebox.json, so 'lakebox ssh' uses the correct profile without requiring --profile on every invocation. Also adds install.sh and upload.sh scripts.

Fix workspace client init after login, persist last profile

Merge kelvich's workspace client fix. Add -- passthrough support so extra args (remote commands, port forwarding, ssh flags) are passed directly to the ssh process. Co-authored-by: Isaac

Single cyan accent color throughout. Bold for IDs, dim for metadata. Braille spinner with elapsed time during async operations.

- create: animated spinner during provisioning
- list: aligned columns with colored status, cyan bold for running
- status: clean field layout
- delete: spinner during removal
- ssh: spinner during connection
- register: spinner during key registration
- Shared ui.go with all primitives

Co-authored-by: Isaac

The lakebox manager moved its REST surface to a proto-defined service with JSON transcoding (databricks-eng/universe#1839855 + follow-ups). That changed three things this CLI was depending on:

1. JSON field name: each Lakebox message now serializes as `lakeboxId` (proto3 lowerCamelCase default), not `name`. List/status/create were parsing into ``Name string `json:"name"` `` and silently getting the empty string for every entry. The visible symptom was `lakebox list` showing rows with blank ID columns.
2. Status codes: proto-transcoded handlers return 200 OK uniformly. The CLI was checking 201 Created on POST /api/2.0/lakebox and 204 NoContent on DELETE, both of which now look like errors.
3. Key registration moved to its own top-level collection at /api/2.0/lakebox-keys (was /api/2.0/lakebox/register-key), to avoid a path collision with /api/2.0/lakebox/{lakebox_id}.

Drop the now-unused `extractLakeboxID` helper: the wire field is the customer-facing ID directly.

Verified against dev-aws-us-west-2: list, status, create, delete all work end-to-end. register hits a separate manager-side issue (stale UserKey records in TiDB that the new schema can't deserialize), which is not fixed here.

Co-authored-by: Isaac
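To illustrate why the field rename failed silently rather than loudly, here is a minimal Go sketch. The struct names `oldEntry` and `newEntry`, the helper `decodeBoth`, and the sample payload are hypothetical; only the `name` and `lakeboxId` field names come from the commit message. Go's `encoding/json` ignores unknown fields by default, so the old struct decodes without error and simply leaves the ID empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldEntry mirrors the pre-fix shape: the ID is read from "name".
type oldEntry struct {
	Name string `json:"name"`
}

// newEntry reads the ID from the proto3 lowerCamelCase field.
type newEntry struct {
	LakeboxID string `json:"lakeboxId"`
}

// decodeBoth parses the same payload with both struct shapes and
// returns the ID each one sees.
func decodeBoth(payload string) (oldID, newID string, err error) {
	var o oldEntry
	if err = json.Unmarshal([]byte(payload), &o); err != nil {
		return "", "", err
	}
	var n newEntry
	if err = json.Unmarshal([]byte(payload), &n); err != nil {
		return "", "", err
	}
	return o.Name, n.LakeboxID, nil
}

func main() {
	// Hypothetical response body; the ID value is made up for the demo.
	oldID, newID, err := decodeBoth(`{"lakeboxId":"lb-123","status":"Running"}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("old struct: %q, new struct: %q\n", oldID, newID)
}
```

The old struct yields an empty ID with no error, which matches the blank ID column symptom described above.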

Reynold's restructure (databricks-eng/universe#1874214) nested the two lakebox resources under the service namespace — moving sandboxes from /api/2.0/lakebox to /api/2.0/lakebox/sandboxes and SSH keys from /api/2.0/lakebox-keys to /api/2.0/lakebox/ssh-keys — and renamed the resource type from Lakebox to Sandbox, which surfaces on the wire as sandboxId / sandboxes (was lakeboxId / lakeboxes). CLI still pointed at the old paths and decoded the old field names, so list / status / create returned empty IDs and 404s. Fix both endpoint constants, rename the request/response types and fields to match the proto, and update the four call sites in create / list / ssh / status. User-facing copy ("Lakebox …") is unchanged — the product is still Lakebox; only the resource type renamed. Verified end-to-end against dev-aws-us-west-2: create / list / status / delete all work; ssh passthrough works. Co-authored-by: Isaac

Surfaces the new per-sandbox auto-stop knobs the manager added (databricks-eng/universe#1875183) so users can see at a glance how long their sandbox will live before the watchdog reaps it.

- `sandboxEntry` gains pointer fields `IdleTimeoutSecs` and `Persist` so we keep the proto3 explicit-presence semantics ("not in response" vs "explicitly set to 0 / false").
- `autoStopLabel()` collapses the policy to one short token:
  - `persist == true` → `never`
  - `idle_timeout_secs > 0` → compact duration (`90s`, `15m`, `2h`, `1h30m`)
  - otherwise → the manager's global default (10m), rendered explicitly so the column never says `default`
- `lakebox list` adds an AUTOSTOP column between STATUS and DEFAULT.
- `lakebox status` adds an `autostop` field after `fqdn`.

Verified end-to-end against dev-aws-us-west-2: list and status both render `10m` for sandboxes with no per-record override.

Co-authored-by: Isaac
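The label rules above can be sketched in Go roughly as follows. This is not the PR's code: the function signatures, the `defaultIdleSecs` constant, and the exact compaction rule in `formatDurationSecs` are assumptions chosen to reproduce the examples in the commit message (`90s`, `15m`, `2h`, `1h30m`, and `10m` for the default).

```go
package main

import "fmt"

// defaultIdleSecs is the manager's global default (10m) per the commit
// message; the constant name is hypothetical.
const defaultIdleSecs = 600

// formatDurationSecs renders a positive number of seconds compactly.
// Assumed rule: whole hours as "Nh", whole minutes above an hour as
// "NhMm", whole minutes as "Nm", anything else as raw seconds.
func formatDurationSecs(secs int64) string {
	switch {
	case secs%3600 == 0:
		return fmt.Sprintf("%dh", secs/3600)
	case secs%60 == 0 && secs > 3600:
		return fmt.Sprintf("%dh%dm", secs/3600, (secs%3600)/60)
	case secs%60 == 0:
		return fmt.Sprintf("%dm", secs/60)
	default:
		return fmt.Sprintf("%ds", secs)
	}
}

// autoStopLabel collapses the policy to one token. Nil pointers mean
// the field was absent from the response.
func autoStopLabel(idleTimeoutSecs *int64, persist *bool) string {
	if persist != nil && *persist {
		return "never"
	}
	if idleTimeoutSecs != nil && *idleTimeoutSecs > 0 {
		return formatDurationSecs(*idleTimeoutSecs)
	}
	return formatDurationSecs(defaultIdleSecs)
}

func main() {
	for _, secs := range []int64{90, 900, 7200, 5400} {
		s := secs
		fmt.Println(autoStopLabel(&s, nil))
	}
	fmt.Println(autoStopLabel(nil, nil))
}
```

Keeping the two inputs as pointers lets the caller distinguish an absent field from an explicit zero, which is the explicit-presence point the first bullet makes.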

Surfaces the per-sandbox auto-stop knobs the manager added in databricks-eng/universe#1875183 so users can flip them from the CLI instead of curl + JSON.

- `lakebox config <id> --idle-timeout 15m`: 15-minute timeout
- `lakebox config <id> --idle-timeout 1h30m`: any Go duration
- `lakebox config <id> --idle-timeout 0`: clear → manager default
- `lakebox config <id> --persist`: never auto-stop
- `lakebox config <id> --persist=false`: back to timeout path
- `lakebox config <id> --idle-timeout 30m --persist=false`: combined

Implementation notes:

- `updateBody` is the inner Sandbox sent in the PATCH body. The proto's `(google.api.http)` declares `body: "sandbox"`, so the HTTP body is the inner `Sandbox` message, NOT a `{"sandbox": {...}}` envelope. First wired-up version got this wrong and the manager rejected with "unknown field `sandbox`"; kept the type comment to flag the gotcha for the next reader.
- `IdleTimeoutSecs` carries `,string` JSON tag because proto3 JSON canonical form serializes int64 as a quoted string. The manager accepts both bare-number and quoted-string on input but always emits quoted on output, so without the tag we hit "cannot unmarshal string into Go struct field … int64" on the response read-back.
- Pointer fields (`*int64`, `*bool`) carry proto3 explicit-presence through to the wire: only the flags the user actually passed get emitted, so a `--persist`-only invocation does not clobber an existing idle_timeout (and vice-versa).
- Client-side range pre-flight (`[60s, 86400s]` plus the 0 clear sentinel) mirrors the manager's `MIN_IDLE_TIMEOUT_SECS` / `MAX_IDLE_TIMEOUT_SECS` constants so users get a clearer error than the server's `INVALID_ARGUMENT`.

Verified end-to-end against dev-aws-us-west-2:

- `config --idle-timeout 15m` → status shows `15m`
- `config --persist` → status shows `never`
- `config --idle-timeout 0 --persist=false` → status shows `10m`

Co-authored-by: Isaac
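The client-side range pre-flight described in the notes above might look something like this sketch. The function name `parseIdleTimeoutFlag`, the constant names, and the error wording are hypothetical; the bounds (`60s` to `86400s`) and the `0` clear sentinel are taken from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// Bounds mirror the manager's MIN_IDLE_TIMEOUT_SECS / MAX_IDLE_TIMEOUT_SECS
// values quoted in the commit message; the Go names are made up.
const (
	minIdleTimeoutSecs = 60
	maxIdleTimeoutSecs = 86400
)

// parseIdleTimeoutFlag converts a Go duration string into whole seconds.
// A zero duration is the "clear" sentinel and is returned as 0 without
// a range check.
func parseIdleTimeoutFlag(raw string) (int64, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid --idle-timeout %q: %w", raw, err)
	}
	if d == 0 {
		return 0, nil
	}
	secs := int64(d / time.Second)
	if secs < minIdleTimeoutSecs || secs > maxIdleTimeoutSecs {
		return 0, fmt.Errorf("--idle-timeout must be 0 or between %ds and %ds, got %ds",
			minIdleTimeoutSecs, maxIdleTimeoutSecs, secs)
	}
	return secs, nil
}

func main() {
	for _, raw := range []string{"15m", "1h30m", "0", "30s", "25h"} {
		secs, err := parseIdleTimeoutFlag(raw)
		fmt.Println(raw, secs, err)
	}
}
```

Rejecting out-of-range values before the request is sent means the user sees a message that names the flag, instead of a generic `INVALID_ARGUMENT` from the server.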

Tracks the matching rename in the lakebox manager (databricks-eng/universe#1875183 follow-up). The manager-side flag moved from `persist` to `no_autostop` because the original name conflicted with the storage-persistence concept already in this codebase.

CLI changes:

- `--persist` → `--no-autostop`
- `--persist=false` → `--no-autostop=false`

Plus a help-text note on the manager's new auto-clear behavior: setting `--idle-timeout` to a non-zero value in a follow-up call clears `--no-autostop` automatically, on the assumption that the caller wants timeout-based stopping back. The CLI itself does not need any extra logic for this: the manager handles it server-side based on field presence in the PATCH body, and the CLI's existing "omit unset flags from the wire payload" semantics (proto3 explicit-presence via *bool / *int64) feed straight into that.

Verified the marshal output matches what the new manager expects:

- `--no-autostop` → `{"sandbox_id":"x","no_autostop":true}`
- `--idle-timeout 15m` → `{"sandbox_id":"x","idle_timeout_secs":"900"}`
- no flags → `{"sandbox_id":"x"}` (rejected)

End-to-end against staging blocked until the manager PR rolls out.

Co-authored-by: Isaac
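The marshal outputs listed above can be reproduced with a small Go sketch that combines pointer fields, `omitempty`, and the `,string` tag. The type name `updateBody` and the JSON keys come from the commit messages; the Go field names and the helper `marshalUpdate` are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateBody is the inner Sandbox sent as the PATCH body. Pointer fields
// plus omitempty keep unset flags off the wire; ",string" quotes the
// int64, matching proto3 JSON canonical form.
type updateBody struct {
	SandboxID       string `json:"sandbox_id"`
	IdleTimeoutSecs *int64 `json:"idle_timeout_secs,omitempty,string"`
	NoAutostop      *bool  `json:"no_autostop,omitempty"`
}

// marshalUpdate (hypothetical helper) renders the wire payload for the
// flags the user actually passed; nil means "flag not given".
func marshalUpdate(id string, idleSecs *int64, noAutostop *bool) string {
	b, err := json.Marshal(updateBody{SandboxID: id, IdleTimeoutSecs: idleSecs, NoAutostop: noAutostop})
	if err != nil {
		panic(err) // cannot happen for this struct shape
	}
	return string(b)
}

// readBack decodes a response body, where the manager always emits the
// int64 as a quoted string.
func readBack(payload string) (updateBody, error) {
	var u updateBody
	err := json.Unmarshal([]byte(payload), &u)
	return u, err
}

func main() {
	on, secs := true, int64(900)
	fmt.Println(marshalUpdate("x", nil, &on))
	fmt.Println(marshalUpdate("x", &secs, nil))
	fmt.Println(marshalUpdate("x", nil, nil))
}
```

Because a nil pointer is omitted entirely, a `--no-autostop`-only call never sends `idle_timeout_secs`, so the server's presence-based logic sees exactly the fields the user set.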

Tracks the matching change in the lakebox manager (databricks-eng/universe#1875183) which moved the per-sandbox idle timeout off `optional int64 idle_timeout_secs = 7` and onto `optional google.protobuf.Duration idle_timeout = 7`. Drops the sentinel-overloaded int64 in favor of a duration-typed field.

Wire shape:

- Response field is now `idleTimeout` carrying a proto3-canonical Duration string (e.g. `"900s"`); parsed into seconds via `time.ParseDuration` for the autostop column.
- Request body sends `idle_timeout` as the same string format.

The CLI flag stays `--idle-timeout` (Go duration string in / Go duration string out); only the wire encoding changes.

`list` and `status` show the manager's global default for any sandbox whose per-record value isn't yet visible under the new field name. That's deliberate forward-compat behavior so an older manager + newer CLI combination just degrades to showing the default rather than crashing.

Co-authored-by: Isaac
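As a rough illustration of the response-side parsing and the degrade-to-default behavior described above, the following Go sketch decodes `idleTimeout` and falls back when the field is missing or unparsable. The struct name `sandboxResponse` and the helper `idleTimeoutSecs` are hypothetical; the use of `time.ParseDuration` on strings like `"900s"` is what the commit message describes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sandboxResponse holds only the field relevant to this sketch.
type sandboxResponse struct {
	IdleTimeout string `json:"idleTimeout"`
}

// idleTimeoutSecs returns the per-sandbox timeout in seconds. ok is
// false when the field is absent or unparsable, in which case the
// caller would show the manager's global default instead of failing.
func idleTimeoutSecs(payload string) (secs int64, ok bool) {
	var r sandboxResponse
	if err := json.Unmarshal([]byte(payload), &r); err != nil || r.IdleTimeout == "" {
		return 0, false
	}
	d, err := time.ParseDuration(r.IdleTimeout)
	if err != nil {
		return 0, false
	}
	return int64(d / time.Second), true
}

func main() {
	fmt.Println(idleTimeoutSecs(`{"idleTimeout":"900s"}`))
	// An older manager that still emits the previous field name:
	fmt.Println(idleTimeoutSecs(`{"idleTimeoutSecs":"900"}`))
}
```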

- ssh: auto-pick uw2.s.dbrx.dev when the workspace host has `.staging.` in it, otherwise keep using prod uw2.dbrx.dev. `--gateway` still overrides.
- api: when the workspace host carries a `?o=<id>` selector or the SDK config has a workspace_id, send `X-Databricks-Org-Id` so multi-workspace gateways (dogfood.staging.databricks.com) route the request to the right workspace. Without it the gateway rejects PATs with "Credential was not sent or was of an unsupported type for this API".

Co-authored-by: Isaac
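A minimal Go sketch of the two selection rules above, under stated assumptions: the function names `pickGateway` and `orgID` are made up, and the precedence of the `?o=<id>` selector over the config's workspace_id is a guess, since the commit message does not say which wins when both are present.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

const (
	prodGateway    = "uw2.dbrx.dev"
	stagingGateway = "uw2.s.dbrx.dev"
)

// pickGateway returns the explicit --gateway override if set, else the
// staging gateway for hosts containing ".staging.", else prod.
func pickGateway(workspaceHost, override string) string {
	if override != "" {
		return override
	}
	if strings.Contains(workspaceHost, ".staging.") {
		return stagingGateway
	}
	return prodGateway
}

// orgID extracts the ?o=<id> selector from the host URL, falling back
// to the workspace ID from config (assumed precedence).
func orgID(workspaceHost, configWorkspaceID string) string {
	if u, err := url.Parse(workspaceHost); err == nil {
		if id := u.Query().Get("o"); id != "" {
			return id
		}
	}
	return configWorkspaceID
}

func main() {
	host := "https://dogfood.staging.databricks.com/?o=12345" // ID is made up
	fmt.Println(pickGateway(host, ""))

	h := http.Header{}
	if id := orgID(host, ""); id != "" {
		h.Set("X-Databricks-Org-Id", id)
	}
	fmt.Println(h.Get("X-Databricks-Org-Id"))
}
```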

…onments Brings in the original cmd/lakebox/* sources from #4930 with full commit-history attribution. Subsequent commits adapt the standalone CLI into a 'databricks lakebox' subcommand, replace hand-rolled HTTP/spinner/color plumbing with libs primitives, and add unit tests.

Wire the cmd/lakebox tree from #4930 into the main CLI:

- cmd/cmd.go registers lakebox.New() under the 'development' command group alongside bundle and sync.
- cmd/fuzz_panic_test.go adds 'lakebox' to manualRoots so TestCountFuzz doesn't fuzz hand-written commands as if they were auto-generated.
- cmd/lakebox tree: the original PR's standalone-CLI scaffolding is adapted for subcommand use: drop the auth-login hijacking and its helper exports, drop the 'last_profile' state field that only mattered when lakebox owned the whole CLI, switch PreRunE to root.MustWorkspaceClient directly, and update help text from 'lakebox foo' to 'databricks lakebox foo' throughout.

Also conforms cmd/lakebox to project lint rules: env.UserHomeDir(ctx) in place of os.UserHomeDir, errors.Is(err, fs.ErrNotExist) instead of os.IsNotExist, atomic.Bool over sync.Once in the spinner gate, errors.New for static error strings.

Co-authored-by: Isaac

Replace the hand-rolled braille spinner, TTY detection, and stderr plumbing with the existing cmdio facilities:

- spin(ctx, msg) wraps cmdio.NewSpinner: capability-aware, runs through the same Bubble Tea program slot as other CLI spinners. ok/fail markers are logged via cmdio.LogString after Close.
- ok(ctx, ...) and warn(ctx, ...) are now ctx-based and route to stderr through cmdio rather than taking a writer. Call sites drop their cmd.ErrOrStderr() locals where they were only used for these helpers.
- field/blank still take an io.Writer because callers need to target stdout for structured output (list, status, config).

Drops the local isTTY, atomic.Bool spinner gate, and ticker goroutine.

Co-authored-by: Isaac

Drops the cyan/bold/dim/reset constants and the local accent/bold/dim wrappers in favor of cmdio.Cyan and cmdio.HiBlack, which respect the SupportsStdoutColor capability check. Bold-for-emphasis is folded into Cyan since cmdio does not expose a Go-level Bold helper today; visually this means lakebox IDs and emphasized command names render in cyan rather than uncolored bold, consistent with the rest of the CLI. field/status now take a context so they can call cmdio.HiBlack / cmdio.Cyan; their writer parameter stays for callers that target stdout. Co-authored-by: Isaac

`databricks lakebox delete <id>` was immediate-execute, which made it easy to nuke the wrong sandbox in scripts and surprised users in the recent bug bash. Adopt the same shape `cmd/bundle/destroy` uses:

- Default: prompt with the sandbox ID, name (if set), and current status, so the user sees exactly what's about to be destroyed.
- `--auto-approve`: skip the prompt. Required in non-interactive contexts; without it, the CLI fails fast, pointing at the flag rather than hanging on a closed/redirected stdin.

`delete` is the only destructive lakebox verb; stop/start/config are reversible and unchanged.

Co-authored-by: Isaac

…#5359)

The existing internal/bugbash/exec.sh drops you into an ephemeral subshell whose binary disappears when you exit. For an actual bugbash session, where the same person opens and closes terminals across an hour or two of testing, every shell pays the ~17s download cost again.

This companion script installs the latest demo-lakebox CI artifact to ~/.local/bin/databricks, so the binary survives shell restarts. Re-running picks up the latest successful release-build run, making it the one-liner to "refresh to the latest demo-lakebox snapshot."

Usage on the laptop: `curl -fsSL https://raw.githubusercontent.com/databricks/cli/demo-lakebox/internal/bugbash/install.sh | bash`

Co-authored-by: Isaac

Expose Go-callable wrappers for SGR styles already present as constants in `libs/cmdio/color.go`: `Bold` (1), `Faint` (2), `Italic` (3), `Underline` (4), and `Magenta` (35). Helpers are named after the SGR descriptor of their constant — so `ansiFaint` → `Faint`, matching the rest of the file. Doc comments use the SGR table phrasing so `Bold` and `Faint` read as paired intensities. Also adds `faint` and `underline` bindings to `RenderFuncMap`. Related to #5223. This pull request and its description were written by Isaac.

Wire `ValidArgsFunction` on every lakebox subcommand that takes a positional argument so that shells with cobra completion sourced suggest real IDs instead of falling back to filename completion:

- ssh / status / stop / start / delete / config / set-default: sandbox IDs
- ssh-key delete: registered key hashes

Surfaced through the existing `databricks completion bash|zsh|fish` subcommand. Completion is best-effort: it silently returns no suggestions on any error so the shell stays usable.

Stacked on top of #<PR-A-NUM> (start + ssh fail-fast) since `start` needs to participate in completion too.

Co-authored-by: Isaac

`databricks lakebox ssh my-project` and the same shape on status / stop / start / delete / config / default now accept a `--name` value in place of a sandbox ID, resolved purely from local state, with no extra API call on the slow path.

How it works:

- The state file (~/.databricks/lakebox.json) grows a per-profile `sandboxes` list of (id, name) pairs.
- `resolveLocalID` maps the user-typed arg to an ID:
  1. Exact ID match → return as-is (fast path; preserves the existing "I know the ID" workflow).
  2. Exact name match in cache → return the corresponding ID. Ambiguous names (server allows duplicate --name values) error out with the candidate IDs.
  3. No match → pass arg through unchanged. Either it's an ID the cache hasn't seen yet (fresh sandbox, user hasn't `list`ed) or a typo; either way the next API call surfaces a clean 404.

The cache is populated incrementally by the commands that already observe sandbox state:

- `create` upserts the new sandbox.
- `list` replaces the cache in full.
- `status` / `stop` / `start` / `config` / `default` upsert the one entry they touch.
- `delete` removes it.

A stale cache can only fail to find a name; it can never cause the CLI to operate on the wrong sandbox. Resolution is ID-first so a sandbox whose `--name` happens to collide with another sandbox's ID never gets mistakenly matched by name.

Co-authored-by: Isaac
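The three resolution steps can be sketched in Go as below. The function name `resolveLocalID` is from the commit message, but its signature, the `cachedSandbox` type, and the error wording are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cachedSandbox is one (id, name) pair from the local state file.
type cachedSandbox struct {
	ID   string
	Name string
}

// resolveLocalID maps a user-typed argument to a sandbox ID using only
// the local cache. IDs are checked before names so that a name which
// collides with another sandbox's ID never shadows that ID.
func resolveLocalID(cache []cachedSandbox, arg string) (string, error) {
	// Step 1: exact ID match wins.
	for _, sb := range cache {
		if sb.ID == arg {
			return arg, nil
		}
	}
	// Step 2: exact name match; collect every candidate.
	var matches []string
	for _, sb := range cache {
		if sb.Name == arg {
			matches = append(matches, sb.ID)
		}
	}
	switch len(matches) {
	case 0:
		// Step 3: unknown ID or typo; let the API report a 404.
		return arg, nil
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("name %q is ambiguous; candidates: %s",
			arg, strings.Join(matches, ", "))
	}
}

func main() {
	cache := []cachedSandbox{
		{ID: "sb-1", Name: "my-project"},
		{ID: "sb-2", Name: "dup"},
		{ID: "sb-3", Name: "dup"},
		{ID: "sb-4", Name: "sb-1"}, // name collides with another sandbox's ID
	}
	for _, arg := range []string{"sb-1", "my-project", "dup", "fresh-id"} {
		id, err := resolveLocalID(cache, arg)
		fmt.Printf("%q -> %q, err=%v\n", arg, id, err)
	}
}
```

Passing an unmatched argument through unchanged is what keeps a stale cache harmless: the worst case is a 404 from the API, never an operation on the wrong sandbox.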


## Summary

Two CLI fixes from Anwell's bug-bash form submissions (the third was the `-o json` issue, already fixed in #5372 and verified live; the fourth, `stop` printing `✓ Stopped` on already-stopped, skipped intentionally).

### `273d68a3e`: measure NAME column widths in terminal cells

`lakebox list` and `ssh-key list` padded NAME columns using `len()` (bytes). Emoji and CJK glyphs are 1 rune / multi-byte / **2 cells**, so rows with `🚀 rocket box` or `测试盒子` in NAME shifted STATUS / AUTOSTOP / DEFAULT out of alignment. Switched padding math to `runewidth.StringWidth` (East-Asian-Width-correct). Promoted `mattn/go-runewidth` from indirect to direct require + NOTICE entry.

### `d6e585163`: `--idle-timeout` errors echo in Go-duration units

`--idle-timeout 25h` was rejected with `between 60s and 86400s, got 90000s`: every number in a different unit than what the user typed, even though `--help` already documented bounds as `60s to 24h`. Routed through the existing `formatDurationSecs` helper so the error reads `between 1m and 24h, got 25h`. Also tweaked `--help` lower bound `60s` → `1m` so both strings agree.

## Test plan

- [x] `lakebox list` with emoji `--name` aligns cleanly
- [x] `lakebox list` with CJK `--name` aligns cleanly
- [x] `--idle-timeout 25h` error: `between 1m and 24h, got 25h`
- [x] `--idle-timeout 30s` error: `between 1m and 24h, got 30s`
- [x] `--idle-timeout 90m` (valid): no error
- [x] license_test (NOTICE allowlist) passes
- [x] all existing unit tests pass

## Summary

Four CLI fixes driven by the bug-bash form submissions. (The `-o json` issue Anwell reported was already fixed in #5372; verified live.)

| Commit | Source | What |
|---|---|---|
| `273d68a3e` | Anwell | `lakebox list` columns mis-align with emoji / CJK names. Measure widths in terminal cells via `runewidth.StringWidth`. |
| `d6e585163` | Anwell | `--idle-timeout` errors echo raw seconds (`86400s` / `90000s`) while `--help` uses durations (`24h`). Route bounds and value through `formatDurationSecs` so both match. |
| `9e315d71c` | Mitch | Deleting the last sandbox on a profile left an orphan `gatewayHosts.<profile>` entry in `~/.databricks/lakebox.json`. `removeSandbox` now drops the gateway when the sandbox list goes to zero. |
| `6d32203ce` | Mitch + tsanyu | `lakebox start` returned ✓ instantly while the box was still Creating; the next `ssh` would block on the same cold-start. `start` now polls `api.get` until Running (or 10 min), symmetric with `create`. |

## Skipped (per discussion)

- **#6** (`stop` on already-stopped lies): left as-is.

## Filed elsewhere

Server / network / sandbox-image: Codex bubblewrap missing, pip JSON truncation, universe IP-allowlist, in-sandbox Provisioning banner, Cursor deep-link port mis-parse, gateway publickey advertising, ESM token-minting logs/failure. Tracking under their respective epics, no CLI work.

## Test plan

- [x] `lakebox list` with emoji `--name` aligns
- [x] `lakebox list` with CJK `--name` aligns
- [x] `lakebox start <stopped-id>` blocks until Running (verified live)
- [x] `lakebox start <stuck-id>` times out cleanly at 10m
- [x] All existing tests pass
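The poll-until-Running behavior described for `start` can be approximated with the Go sketch below. Everything here is illustrative: `waitUntilRunning`, `statusFunc`, `fakeStatuses`, and `simulate` are hypothetical names, the status getter is a fake standing in for `api.get`, and the interval and timeout are shortened to milliseconds for the demo (the real timeout is 10 minutes per the summary above).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// statusFunc stands in for the API call that returns a sandbox status.
type statusFunc func(ctx context.Context) (string, error)

// waitUntilRunning polls get until it reports "Running", an error
// occurs, or the timeout elapses.
func waitUntilRunning(ctx context.Context, get statusFunc, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		status, err := get(ctx)
		if err != nil {
			return err
		}
		if status == "Running" {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("sandbox still %q after %s: %w", status, timeout, ctx.Err())
		case <-ticker.C:
		}
	}
}

// fakeStatuses returns each given status in turn, then repeats the last.
// It expects at least one status.
func fakeStatuses(statuses ...string) statusFunc {
	i := 0
	return func(context.Context) (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	}
}

// simulate runs the poll loop against a fake with demo-sized timings.
func simulate(statuses ...string) error {
	return waitUntilRunning(context.Background(), fakeStatuses(statuses...),
		time.Millisecond, 50*time.Millisecond)
}

func main() {
	fmt.Println(simulate("Creating", "Creating", "Running"))
	fmt.Println(simulate("Creating")) // a stuck sandbox times out
}
```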

Yunquan flagged on the bug-bash form: the table shape changed between calls. The NAME column appeared the moment any sandbox had a custom `--name` set, and vanished when none did. Scripts that parsed `list` output had to handle two column layouts; users had to mentally remap columns based on workspace state.

Always render the column. Sandboxes without a custom name (Name == "" or Name == SandboxID) display `-` (faint), same convention we already use elsewhere. The column width still scales to the longest *actual* name, so workspaces with only unnamed sandboxes render a narrow NAME column of dashes: visually quiet but structurally present.

Co-authored-by: Isaac

shuochen0311 and others added 16 commits April 10, 2026 18:20

Merge pull request #1 from kelvich/lakebox-cli

81e6f6f


Merge fork changes + add SSH passthrough args support

ebda5a0


pietern temporarily deployed to test-trigger-is May 8, 2026 14:39 — with GitHub Actions Inactive

pietern temporarily deployed to test-trigger-is May 8, 2026 14:40 — with GitHub Actions Inactive

pietern changed the title ~~demo: lakebox subcommand~~ demo: lakebox May 8, 2026

pietern temporarily deployed to test-trigger-is May 8, 2026 14:51 — with GitHub Actions Inactive

pietern force-pushed the demo-lakebox branch from 756920e to 3cef58d Compare May 8, 2026 14:52

pietern temporarily deployed to test-trigger-is May 8, 2026 14:53 — with GitHub Actions Inactive

pietern added 4 commits May 8, 2026 17:01

akshaysingla-db temporarily deployed to test-trigger-is May 28, 2026 04:06 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 28, 2026 05:24 — with GitHub Actions Inactive

Merge remote-tracking branch 'origin/main' into demo-lakebox

b14f692

pietern mentioned this pull request May 28, 2026

cmdio: add Bold, Faint, Italic, Underline, Magenta helpers #5360

Merged

Merge remote-tracking branch 'origin/main' into demo-lakebox

221bae3

pietern temporarily deployed to test-trigger-is May 28, 2026 13:37 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 06:56 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 07:01 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 07:54 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 17:42 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 20:12 — with GitHub Actions Inactive

akshaysingla-db temporarily deployed to test-trigger-is May 29, 2026 22:16 — with GitHub Actions Inactive

pietern wants to merge 62 commits into main from demo-lakebox


5 participants
