[POC] feat: defragmented snapshots by cartermckinnon · Pull Request #21776 · etcd-io/etcd

cartermckinnon · 2026-05-20T21:33:33Z

This PR adds an option to the Snapshot RPC of the Maintenance service that allows clients to request a "defragmented" snapshot rather than a raw copy of the entire db file.

The implementation uses boltdb's Compact, which is comparable to the backend's internal defragDb routine. The (point-in-time) database is defragmented into a temp file, and the temp file is then streamed to the client before being deleted. Unlike the Defragment RPC, this does not require an exclusive lock that blocks writes.

This reduces the data sent to the client significantly in my environments (and I believe many Kubernetes environments). This saves storage space for archives and speeds up restores. While this could be achieved with client-side processing, I think it's preferable for the server to handle these implementation details, plus we save the bandwidth. The tradeoff is (temporary) disk space and IO on the server side.

AI Disclosure: I've used Claude to write tests and scaffolding for proof-of-concept; not the core snapshot function.

k8s-ci-robot · 2026-05-20T21:33:39Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cartermckinnon
Once this PR has been reviewed and has the lgtm label, please assign spzala for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-05-20T21:33:43Z

Hi @cartermckinnon. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Carter McKinnon <cmckinnon@coreweave.com>

cartermckinnon · 2026-05-20T21:38:30Z

 func (b *backend) Snapshot() Snapshot {
+	s, err := b.snapshot(false, 0)
+	if err != nil {
+		b.lg.Panic("failed to create snapshot", zap.Error(err))
+	}
+	return s
+}


I'd prefer to change the signature to return an error, but that touches many call sites, and an error is never actually returned. Thoughts appreciated 🙏
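
To make the two options concrete, here is a small self-contained sketch of the tradeoff: keeping the error-free signature and panicking, versus exposing an error-returning variant. Every type and name here (`backend`, `snapshot`, `snapshotInternal`, `SnapshotE`, `failNext`) is a hypothetical stand-in for illustration, not the etcd backend's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot is a hypothetical stand-in for the backend's snapshot type.
type snapshot struct{ defragmented bool }

// backend is a hypothetical, stripped-down stand-in for the real backend.
// failNext forces the internal helper to fail so both paths can be exercised.
type backend struct{ failNext bool }

// snapshotInternal mirrors the shape of an internal helper that can fail.
func (b *backend) snapshotInternal(defrag bool) (*snapshot, error) {
	if b.failNext {
		return nil, errors.New("snapshot failed")
	}
	return &snapshot{defragmented: defrag}, nil
}

// Snapshot keeps the existing error-free signature, so call sites are
// untouched, at the cost of panicking if the helper ever fails.
func (b *backend) Snapshot() *snapshot {
	s, err := b.snapshotInternal(false)
	if err != nil {
		panic(fmt.Sprintf("failed to create snapshot: %v", err))
	}
	return s
}

// SnapshotE is the alternative shape: the error is surfaced to the caller,
// which means every call site has to handle it.
func (b *backend) SnapshotE(defrag bool) (*snapshot, error) {
	return b.snapshotInternal(defrag)
}

func main() {
	b := &backend{}
	fmt.Println(b.Snapshot().defragmented)

	b.failNext = true
	_, err := b.SnapshotE(true)
	fmt.Println(err)
}
```

The wrapper keeps the diff small; the error-returning form is more honest about failure modes once the defragmented path, which can fail on temp-file IO, shares the same helper.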

hakuna-matatah · 2026-05-20T22:32:08Z

/cc

k8s-ci-robot · 2026-05-20T22:32:11Z

@hakuna-matatah: GitHub didn't allow me to request PR reviews from the following users: hakuna-matatah.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

hakuna-matatah · 2026-05-21T03:36:22Z

+1

This will be a good improvement. Wire time and bootstrap time will come down for etcd clusters with large DBs if and when they have to recover from snapshots.

hakuna-matatah · 2026-05-21T03:37:51Z

cc: @ahrtr @serathius

Please take a look when you guys get a chance.

ahrtr · 2026-05-21T08:36:14Z

I am not sure we need such an improvement. The client can just issue a defrag, followed by a snapshot request.

cartermckinnon · 2026-05-21T16:13:45Z

Thanks for taking a look @ahrtr!

In my experience, Defragment can be disruptive. In larger DBs, it can take tens of seconds, and it's a stop-the-world event on the member. That is, I want to take snapshots much more frequently than I want to defrag.

In a Kubernetes environment, with frequent compactions, there's often a meaningful delta between DB size and in-use size (50+% is not uncommon in my environments, for various reasons). This makes a "defragmented snapshot" beneficial; but the cost of a Defragment might not be justified (I've seen no performance benefit of doing this regularly and the DB is likely to just reach the same high watermark again).
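
As a rough illustration of the delta between DB size and in-use size, the share of a raw snapshot that consists of free pages can be estimated from those two numbers. The helper name `reclaimableFraction` and the sizes below are hypothetical, chosen only to show the arithmetic; real values would come from the member's reported DB size and in-use size.

```go
package main

import "fmt"

// reclaimableFraction returns the share of the db file that holds no live
// data, i.e. roughly what a defragmented snapshot would avoid sending.
// It returns 0 for inputs that make no sense (non-positive size, or
// in-use size outside the range [0, dbSize]).
func reclaimableFraction(dbSize, dbSizeInUse int64) float64 {
	if dbSize <= 0 || dbSizeInUse < 0 || dbSizeInUse > dbSize {
		return 0
	}
	return float64(dbSize-dbSizeInUse) / float64(dbSize)
}

func main() {
	const gib = int64(1) << 30
	// Hypothetical member: 8 GiB file on disk, 4 GiB of it in use.
	frac := reclaimableFraction(8*gib, 4*gib)
	fmt.Printf("%.0f%% of a raw snapshot would be free pages\n", 100*frac)
}
```

With these illustrative numbers, half of every raw snapshot is free space, on every transfer, without a Defragment ever being issued.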

silentred · 2026-05-22T03:35:16Z

If I read it correctly, the result on the wire is functionally equivalent to: client snapshot + etcdutl defrag on the downloaded db file. The main difference is where the defrag work happens and how many bytes cross the network.

| Dimension | Server-side defrag (PR) | Client-side defrag (alternative) |
| --- | --- | --- |
| Network transfer | Small (live pages only) | Large (includes freed pages) |
| Server temp disk | Required (~compacted size) | Not required |
| Server extra IO/CPU | Required (during `bolt.Compact`) | Not required |
| Client disk | Small | Large then small (must stage raw db first) |
| Client CPU | Not required | Required (runs one Defrag) |
| Write lock on live DB | Not required (uses `bolt.Compact`) | Not required |
| Final db content | Identical | Identical |

k8s-ci-robot added area/clientv3 area/documentation labels May 20, 2026

k8s-ci-robot added area/etcdctl area/testing needs-ok-to-test labels May 20, 2026

k8s-ci-robot added the size/XL label May 20, 2026

feat: defragmented snapshots

3d7e6b8

Signed-off-by: Carter McKinnon <cmckinnon@coreweave.com>

cartermckinnon force-pushed the defragmented-snapshot branch from e7896d0 to 3d7e6b8 Compare May 20, 2026 21:35

cartermckinnon wants to merge 1 commit into etcd-io:main from cartermckinnon:defragmented-snapshot

5 participants
