How we validate a build end-to-end before release, why that needs a dedicated server, what GitHub Actions does for free on every PR, and how to harden the server so it can't become a liability. This is the operational companion to Releasing (the version/promote pipeline) and Integration Testing (the harness it runs).
GitHub-hosted runners can't do the real-chain tier. On a public repo the hosted Ubuntu runners are generous and free (4 vCPU / 16 GiB RAM), but they are ephemeral — a fresh VM per job, ~14 GiB of free disk, and a 6-hour job ceiling. A Monero chain is ~95 GiB pruned / ~270 GiB full and takes days to sync; Tari adds ~50 GiB. There is nowhere to keep that synced state between runs, and no time to sync it inside a job. So the real-daemon, real merge-mining tier (tier 4) is simply not possible on hosted runners — which is the whole reason a dedicated, already-synced server exists (#54).
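The mismatch is easy to see as plain arithmetic. This is a minimal sketch that uses only the approximate figures quoted above; the variable names are illustrative.

```sh
# Approximate figures from the paragraph above, in GiB.
runner_free=14      # free disk on a hosted runner
monero_pruned=95    # pruned Monero chain
monero_full=270     # full Monero chain
tari=50             # Tari chain

# Smallest real-chain footprint is pruned Monero plus Tari.
need_min=$((monero_pruned + tari))
need_full=$((monero_full + tari))
shortfall=$((need_min - runner_free))

echo "pruned stack needs ${need_min} GiB; full stack needs ${need_full} GiB"
echo "hosted runner offers ${runner_free} GiB, so it is at least ${shortfall} GiB short"
```

Even the smallest configuration is roughly ten times larger than the disk a hosted runner provides, before counting the days of sync time.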
But GitHub already runs almost everything else, free, on every PR. Tiers 1–3 of the testing strategy need no real chain and run on the hosted runners in minutes:
- Tier 1: unit/component (dashboard pytest + coverage gate, frontend, the `pithead` shell suite, compose interpolation, and the #90 security/hardening invariants).
- Tier 2: contract (the real Monero/Tari clients vs. controllable fakes).
- Tier 3: the fake-daemon mini-stack (the real dashboard + docker-control proxy driven against fake daemons, with real Docker on the hosted runner). This proves the control plane end-to-end (sync hold/release, reject/readmit) on every PR.
So the split is clean:
| Tiers | Runs on | Cost | Triggered |
|---|---|---|---|
| Tiers 1–3 (logic, wiring, control plane, hardening) | GitHub-hosted runners | free (public repo) | every PR — the merge gate |
| Tier 4 (real synced Monero+Tari, real merge-mining, prune/full DB, TLS/Tor, the config matrix, the staging smoke test) | the dedicated server | your hardware | pre-release / on-demand — the release gate |
The hosted runners catch the vast majority of regressions before merge; the dedicated server proves the things only reality can — and it's the blocking pre-release gate.
You can register the server as a GitHub Actions self-hosted runner so Actions dispatches the tier-4 job to it (self-hosted minutes don't count against anything — also free). But there is a sharp edge, and it's the single most important thing on this page:
GitHub explicitly recommends against self-hosted runners on public repositories. Any user can open a pull request, and a malicious PR can run arbitrary code on the runner. Our server holds real wallet payout addresses, Tor onion private keys, and RPC credentials, so a compromised runner is a key-theft / persistent-backdoor event, not a flaky build.
The safe rule: the keyed server only ever runs code we trust. Concretely:
- Do NOT trigger tier-4 on `pull_request` (and never on a fork PR). "Require approval" only gates starting the run; once it starts, the PR's code still executes on the box.
- Trigger tier-4 only on trusted code: `workflow_dispatch` (a maintainer manually runs it on a ref they've reviewed) and/or `push` to `main` (post-merge). To E2E a specific fork PR, a maintainer reviews it first, then dispatches the workflow on that ref.
- Register the runner as ephemeral / just-in-time (one job, then auto-removed) in its own runner group, isolated from any private repos.
- Keep the runner least-privilege: a dedicated unprivileged user, the box runs nothing else sensitive, and ideally the runner can reach the stack only through `pithead`/`docker`, not the raw key files.
This is exactly how the workflow ships: `.github/workflows/release-gate.yml` runs only on `workflow_dispatch` (and `push` to `main`) on a `[self-hosted, pithead-release]` runner, never automatically on a PR.
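As defense in depth, the job itself can refuse to run on an untrusted trigger. The following is a minimal sketch, not part of the shipped workflow: `GITHUB_EVENT_NAME` and `GITHUB_REF` are standard GitHub Actions environment variables, while the function name `trusted_trigger` is hypothetical.

```sh
# Succeeds only for the triggers this page treats as trusted:
# a manual workflow_dispatch, or a push to main.
trusted_trigger() {
  event=$1
  ref=$2
  case "$event" in
    workflow_dispatch) return 0 ;;
    push) [ "$ref" = "refs/heads/main" ] ;;
    *) return 1 ;;   # pull_request, pull_request_target, schedule, ...
  esac
}

# Usage as the first step of the tier-4 job:
#   trusted_trigger "$GITHUB_EVENT_NAME" "$GITHUB_REF" || {
#     echo "refusing to run tier-4 on trigger: $GITHUB_EVENT_NAME" >&2
#     exit 1
#   }
```

The guard does not replace the workflow's `on:` filter; it only makes a later misconfiguration of that filter fail closed.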
Target an LTS Ubuntu (22.04 / 24.04). One-time:
- Install Pithead and let it fully sync (Getting Started): full Monero + full Tari, all containers healthy, a worker (ideally two) mining. The synced `monero.data_dir`/`tari.data_dir` are the asset the harness reuses.
- Keep the active chain on fast storage (SSD/NVMe). monerod is random-I/O heavy, so the chain it runs against must not sit on a spinning HDD; that alone makes every scenario crawl. A snapshot/reflink-capable filesystem (btrfs/zfs/xfs reflink) is a bonus: it lets the harness snapshot/restore a chain cheaply for the prune axis. But it's optional: on plain ext4-on-SSD the matrix only edits `config.json` and reuses one chain, with `--safety-backup` isolating destructive runs. See the recipe below for the prune-axis details.
- Disk headroom: enough for the chains plus a snapshot / second DB (budget ≥ ~150 GiB free beyond the live chains).
- Tools: `jq`, `curl`, `docker` (compose v2), `sha256sum`, `git`, `tar`.
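The tool and headroom items above can be checked before the first run. This is a rough sketch under stated assumptions, not the harness's own check: the helper names `missing_tools` and `free_gib` are hypothetical, and `/path/to/chains` stands in for wherever the chains live.

```sh
# Print each required tool that is not on PATH; succeed only if none is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Free space, in whole GiB, on the filesystem holding the given path.
# POSIX `df -P -k` prints one data line; column 4 is the available KiB.
free_gib() {
  df -P -k "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Usage on the box (150 GiB is the headroom budget quoted above):
#   missing_tools jq curl docker sha256sum git tar || echo "install the tools listed above"
#   docker compose version >/dev/null 2>&1 || echo "docker compose v2 is not available"
#   [ "$(free_gib /path/to/chains)" -ge 150 ] || echo "less than 150 GiB free"
```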
Check the box is fit at any time, non-destructively:

    tests/integration/run.sh --host you@server --dir pithead --readiness

It asserts: chains synced (reusable), the prune axis is exercisable (the live chain FS is snapshot-capable or a pre-built variant chain is supplied), disk headroom, `.env` is owner-only, the dashboard is bound to localhost, and the backup/rollback net is usable.
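To make the owner-only assertion concrete, here is one way such a check could be written. It is an illustrative sketch, not the harness's implementation; the function name `is_owner_only` is hypothetical, and it assumes GNU coreutils `stat` as found on Ubuntu.

```sh
# Succeeds if the file grants no permissions to group or others (e.g. 600 or 400).
# Returns 2 if the file cannot be inspected at all.
is_owner_only() {
  mode=$(stat -c '%a' "$1") || return 2
  case "$mode" in
    0|*00) return 0 ;;   # octal mode ends in 00: nothing for group or others
    *) return 1 ;;
  esac
}

# Usage on the box:
#   is_owner_only .env || chmod 600 .env
```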
Put the active chain on fast storage. The biggest factor is the disk, not the filesystem: monerod does heavy random LMDB I/O, so a chain on a 7200 rpm HDD makes every scenario crawl. Check what you have before placing chains:

    lsblk -d -o NAME,ROTA,SIZE,MODEL   # ROTA=0 is SSD/NVMe, ROTA=1 is a spinning HDD

Keep the chain monerod runs against on an SSD/NVMe. A spare HDD is fine for cold backups and `pithead backup` archives, but not for an active test chain.
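The same `ROTA` flag can be looked up for the directory a chain actually sits in. This is a sketch with hypothetical helper names: it resolves the backing device with `df` and asks `lsblk` about that device, which will not work for every layout (for example a ZFS dataset or an overlay mount), so it falls back to `unknown`.

```sh
# Map lsblk's ROTA flag to a verdict.
classify_rota() {
  case "$1" in
    0) echo "ssd-or-nvme" ;;
    1) echo "spinning-hdd" ;;
    *) echo "unknown" ;;
  esac
}

# Report the storage class behind a directory.
storage_class() {
  # First column of df's single data line is the backing device.
  dev=$(df -P "$1" | awk 'NR == 2 { print $1 }')
  # -n: no header, -d: this device only, -o ROTA: just the rotational flag.
  rota=$(lsblk -n -d -o ROTA "$dev" 2>/dev/null | tr -d ' ')
  classify_rota "$rota"
}

# Usage on the box:
#   storage_class /path/to/monero/data
```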
A CoW filesystem (btrfs/zfs/xfs-reflink) is a bonus, not a requirement. On a CoW volume the harness can snapshot/restore a chain cheaply for per-scenario isolation, but only if it's on fast storage. A loopback btrfs on a spare HDD gives you CoW semantics at HDD speed, which is the wrong trade for an active chain. If your root FS is ext4 on an SSD (the common case) you don't need CoW at all: the matrix only edits `config.json` and reuses one chain, and `--safety-backup` (a `pithead backup` + auto-rollback) isolates the destructive scenarios.
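Whether a given directory supports cheap clones can be probed directly. The sketch below is an assumption about how one might test this, not how the harness detects it: it relies on GNU `cp --reflink=always`, which fails instead of silently falling back to a full copy, and the function name `supports_reflink` is hypothetical. It covers reflink only; btrfs or ZFS snapshots need their own tooling.

```sh
# Succeeds if files in the given directory can be reflink-copied (CoW clone).
# Returns 1 if not, and 2 if the probe directory cannot be created.
supports_reflink() {
  probe_dir=$(mktemp -d "$1/.reflink-probe.XXXXXX") || return 2
  echo probe > "$probe_dir/src"
  if cp --reflink=always "$probe_dir/src" "$probe_dir/dst" 2>/dev/null; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$probe_dir"   # leave nothing behind next to the chain
  return "$rc"
}

# Usage on the box:
#   supports_reflink /path/to/chains && echo "reflink copies available"
```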
Covering both prune modes. The box mines one mode (its real config). The harness exercises that mode against the live chain and skips the other unless you supply a chain for it (`--full-data-dir` / `--pruned-data-dir`). You usually don't need to: the opposite mode is covered by the fake mini-stack (see Integration Testing) plus the compose/config tests, which need no real chain. Supply the opposite-mode chain only to exercise it end-to-end, and build it on fast storage:
- Pruned chain next to a full one? `build-pruned-chain.sh` copies the LMDB consistently (brief monerod stop, then immediate restart) and prunes the copy, leaving the canonical chain untouched. Fetch `monero-blockchain-prune` at the same version as the running monerod and verify it against the hash the image pins (`build/monero/Dockerfile` → `MONERO_VERSION`/`MONERO_HASH`).
- Full chain? Pruning is irreversible, so a full chain means a fresh full sync (`MONERO_PRUNE=0`, ~1–3 days), which is rarely worth it just for test coverage.
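Verifying a download against a pinned hash comes down to comparing SHA-256 digests. This is a generic sketch with a hypothetical function name; it does not read the Dockerfile, and whether the pinned hash covers the release archive or an individual binary depends on how the image build uses `MONERO_HASH`.

```sh
# Succeeds only if the file's SHA-256 digest equals the expected hex string.
verify_sha256() {
  file=$1
  # Normalize the expected digest to lowercase, as sha256sum prints lowercase.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (both arguments are placeholders you supply):
#   verify_sha256 "$downloaded_artifact" "$pinned_hash" || { echo "hash mismatch" >&2; exit 1; }
```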
gouda (the reference box) is a pruned node on NVMe: it validates pruned mode live with `--safety-backup`, and full mode comes from the fakes. `--readiness` reports exactly this:

    tests/integration/run.sh --host you@server --dir pithead --readiness

Gotcha: a pruned chain's file stays large. An in-place prune does not shrink the LMDB file: it stays at the full-chain high-water mark (~250 GiB) with the freed space sitting as internal free pages (Monero reuses them as the chain grows). To actually reclaim it you must rewrite the DB with `monero-blockchain-prune --copy-pruned-database` (see `compact-chain.sh`). That rewrite is slow (it copies every block over hours), though it reads through a snapshot so monerod keeps mining; you then swap the compact copy in during a ~2 min window. The generic `mdb_copy -c` does not work: Monero ships a patched LMDB and stock `mdb_copy` rejects the format (`MDB_VERSION_MISMATCH`). Often it's simplest to leave the free pages.
Treat the box as production-sensitive — it holds keys and it's the thing that signs off releases.
- Secrets. `.env` (RPC creds), `config.json` (wallet addresses), and the Tor data dir (onion private keys) must be owner-only (`chmod 600 .env`; the `--readiness` check verifies this). Never print secrets in logs; the harness hashes them on the box and redacts artifacts. If the box also publishes releases, the GHCR token lives in the environment / a secret store, never in the repo.
- Network. Firewall to least exposure: inbound SSH (key-only, no root login, fail2ban) and the stratum port scoped to the LAN (workers › firewall); the dashboard stays on localhost behind Caddy and the monerod RPC on localhost (both asserted by `--readiness`). Nothing else should be reachable from the internet.
- Untrusted code. The runner only runs trusted code (see above). Prefer ephemeral/JIT runners; don't share the runner with private repos.
- Least privilege. A dedicated unprivileged user; the stack already runs least-privilege containers (`no-new-privileges`, `cap_drop`, read-only roots, scoped Docker socket proxies, all regression-guarded in `tests/stack/test_compose.sh`).
- Reproducible, clean baseline. The matrix reuses the synced chains and never mutates the canonical copies (config-only changes, snapshot/restore for the prune axis), restores the original `config.json` at the end, and `--safety-backup` takes a `pithead backup` first and rolls the box back (down → restore → up) if anything fails.
- Build isolation & integrity. Build images in containers with pinned upstream versions and SHA256-verified binaries (the stack already does this); promote releases by digest so the published bundle is bit-for-bit what was validated (Releasing).
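For the network item, a quick way to see what is listening beyond loopback is to filter the output of `ss`. This is a sketch only, with a hypothetical function name; it lists bind addresses and says nothing about what the firewall then allows, so treat it as a first look, not an audit.

```sh
# Read `ss -ltnH` output on stdin and print the local address of every
# listening TCP socket that is NOT bound to loopback (127.x.x.x or ::1).
non_loopback_listeners() {
  awk '{
    addr = $4                                   # 4th column: Local Address:Port
    if (addr ~ /^127\./ || addr ~ /^\[::1\]:/) next
    print addr
  }'
}

# Usage on the box (-l listening, -t TCP, -n numeric, -H no header):
#   ss -ltnH | non_loopback_listeners
```

On a box set up as described, the expected output is limited to SSH and the stratum port; the dashboard and the monerod RPC should not appear.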
- Every PR → GitHub-hosted runners run tiers 1–3 (the merge gate). Cheap, free, fast.
- Pre-release (or on-demand for a reviewed PR) → a maintainer dispatches the release-gate workflow on the dedicated server: `make test` (tiers 1–2 on the trusted box) + the tier-4 live matrix against the real synced nodes (`run.sh --safety-backup`), then, per Releasing, the staging smoke test (pull the GHCR images on a clean host, real `setup → up → status → mine` check).
- Nothing is tagged or published until that's green, and promotion is by digest, so the version users get is the exact bundle the server validated.
What the live tier-4 gate actually exercises, and what it doesn't, so a release decision is made with eyes open. (The reference box gouda is a pruned Monero node on NVMe; its own snapshot and this table also live at `~/pithead-testbench/` on the box, for operators and AI agents.)

Validated live (real synced chains): the config matrix (remote/local node, dashboard secure/insecure, Tari required/optional, RPC LAN access, XvB on/off) applied + asserted; lifecycle (restart, secret-preserving apply, backup→restore round-trip); node-down failover → recovery; release readiness; pruned monerod (the real prod config).

Covered without a real chain (tiers 1–3): client↔daemon contract tests, the fake-daemon mini-stack (incl. full-prune behavior), compose hardening, config rendering, dashboard tests.
| Gap (not tested live) | Worth filling before release? |
|---|---|
| Full (unpruned) Monero live — a pruned box can't exercise it | Low — stack paths don't differ by prune mode; fakes/config cover it. A multi-day full sync isn't justified. |
| Privacy / Tor egress — no clearnet-leak assertions in the live harness (#160) | High — privacy is a core promise. Add egress checks (no clearnet to XvB stats, p2pool, Tari DNS). |
| Automated PR gate — the self-hosted runner is manual/opt-in | Medium-high, high-leverage — wire the live harness as a required check on workflow_dispatch/push-to-main only (never fork PRs). |
| Upgrade / migration across image versions with chain continuity | Medium — add a scenario: pull new images → apply → assert no re-sync + secrets intact. |
| XvB live routing end-to-end (the raffle optimization) | Medium — core value-prop but unit/sim-tested today; a periodic live smoke test would help. |
| Multi-worker scale — the harness assumes ~2 workers | Medium — add a load-gen worker + assert proxy routing/hashrate for perf confidence. |
| Real Tari merge-mined block acceptance | Low — probabilistic; rely on template/connectivity checks. |
| Fault injection over SSH (currently local-mode only) | Low-Medium — extend the SIGSTOP/remove cases to the --host path. |
Recommended before release: the privacy-egress checks and the automated PR gate; then the upgrade scenario and an XvB live smoke test. The remainder are nice-to-have.