[POC] feat: defragmented snapshots#21776
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cartermckinnon The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @cartermckinnon. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Carter McKinnon <cmckinnon@coreweave.com>
e7896d0 to
3d7e6b8
Compare
| func (b *backend) Snapshot() Snapshot { | ||
| s, err := b.snapshot(false, 0) | ||
| if err != nil { | ||
| b.lg.Panic("failed to create snapshot", zap.Error(err)) | ||
| } | ||
| return s | ||
| } |
There was a problem hiding this comment.
I'd prefer to change the signature to return an error, but that touches many call sites and an error is never actually returned, thoughts appreciated 🙏
|
/cc |
|
@hakuna-matatah: GitHub didn't allow me to request PR reviews from the following users: hakuna-matatah. Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
+1 This will be a good improvement. The wire time, bootstrap times will come down for etcd cluster with large DBs, if/when they have to recover from snapshots. |
|
cc: @ahrtr @serathius Please take a look when you guys get a chance. |
|
I am not sure we need such improvement. The client can do just issue a defrag, and followed by a snapshot request. |
|
Thanks for taking a look @ahrtr! In my experience, In a Kubernetes environment, with frequent compactions, there's often a meaningful delta between DB size and in-use size (50+% is not uncommon in my environments, for various reasons). This makes a "defragmented snapshot" beneficial; but the cost of a |
|
If I read it correctly, the result on the wire is functionally equivalent to:
|
This PR adds an option to the
SnapshotRPC of theMaintenanceservice that allows clients to request a "defragmented" snapshot rather than a raw copy of the entire db file.The implementation uses boltdb's
Compact, which is comparable to the backend's internaldefragDbroutine. The (point-in-time) database is defragmented into a temp file, and the temp file is then streamed to the client before being deleted. This does not require an exclusive lock on writes like theDefragmentRPC.This reduces the data sent to the client significantly in my environments (and I believe many Kubernetes environments). This saves storage space for archives and speeds up restores. While this could be achieved with client-side processing, I think it's preferable for the server to handle these implementation details, plus we save the bandwidth. The tradeoff is (temporary) disk space and IO on the server side.
AI Disclosure: I've used Claude to write tests and scaffolding for proof-of-concept; not the core
snapshotfunction.