
PR Draft: membership: fix SHA-1 panic under GODEBUG=fips140=only#21804

Open
herosql wants to merge 1 commit into etcd-io:main from herosql:fix/resolve-issue-21673



herosql commented May 24, 2026

PR Draft: membership: fix SHA-1 panic under GODEBUG=fips140=only

Closes #21673


Title

membership: replace SHA-1 with runtime-selected hash for FIPS 140-3 compatibility

Description

Problem

etcd panics on startup when launched with GODEBUG=fips140=only on Go 1.24+. Go's strict
FIPS 140-3 mode prohibits any call to crypto/sha1, causing an immediate panic during cluster
membership initialization — before a single Raft log entry is written.

Two call sites are affected:

  • computeMemberID() in server/etcdserver/api/membership/member.go:74 hashes the peer URLs and
    cluster token to derive a deterministic uint64 member ID
  • genID() in server/etcdserver/api/membership/cluster.go:237 hashes the sorted set of
    member IDs to derive the cluster ID

Both functions use SHA-1 purely as a deterministic hash function — not for any cryptographic
or integrity purpose — but Go's FIPS runtime does not distinguish by intent.

Root Cause

panic: crypto/sha1: use of SHA-1 is not allowed in FIPS 140-only mode

goroutine 1 [running]:
crypto/sha1.Sum(...)
    crypto/sha1/sha1.go:278
go.etcd.io/etcd/server/v3/etcdserver/api/membership.computeMemberID(...)
    server/etcdserver/api/membership/member.go:74

The panic fires unconditionally at process startup for any cluster that does not yet have a
WAL — new bootstraps, member additions, and snapshot restores all reach computeMemberID
before any other initialization proceeds.

Why a naive SHA-256 replacement is wrong

Simply substituting sha256.Sum256 would change the ID values produced during bootstrap.
Member IDs and cluster IDs are written into the WAL metadata header and persisted in the
members BoltDB bucket on first boot. Two existing sentinel tests encode the exact expected
outputs:

  • TestMemberTime asserts specific uint64 member IDs derived from SHA-1 digests of fixed inputs
  • TestSnapshotStatus asserts a CRC32 over the full BoltDB content (0xe7a6e44b), which
    includes the members bucket keyed by SHA-1-derived member IDs

Both tests carry the explicit comment "must not be changed — if it changes, there must be
some backwards incompatible change introduced." A global algorithm swap would violate this
invariant for non-FIPS deployments, even though they have no FIPS requirement.
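The incompatibility is easy to see in isolation: truncating a SHA-1 digest and a SHA-256 digest of the same input to 8 bytes yields unrelated uint64 values, so every ID (and every checksum over stored IDs) would change. A minimal sketch; the helper names are hypothetical and the input string is an arbitrary example, not one of etcd's test fixtures.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// truncSHA1 returns the first 8 bytes of the SHA-1 digest as a big-endian uint64.
// Note: this panics if run under GODEBUG=fips140=only.
func truncSHA1(b []byte) uint64 {
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

// truncSHA256 does the same with SHA-256, as a naive global replacement would.
func truncSHA256(b []byte) uint64 {
	h := sha256.Sum256(b)
	return binary.BigEndian.Uint64(h[:8])
}

func main() {
	input := []byte("http://10.0.0.1:2380token-1") // arbitrary example input
	fmt.Printf("SHA-1   ID: %016x\n", truncSHA1(input))
	fmt.Printf("SHA-256 ID: %016x\n", truncSHA256(input))
}
```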

Solution

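Use fips140.Enabled() from the crypto/fips140 package (public API since Go 1.24) to select the
hash at runtime. Enabled() reports true under both GODEBUG=fips140=on and GODEBUG=fips140=only,
so the SHA-256 branch is taken in either FIPS mode: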

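// b is the byte slice being hashed (peer URLs + cluster token, or member IDs).
var b8 [8]byte
if fips140.Enabled() {
    // FIPS 140-3 mode: crypto/sha1 must not be called, use SHA-256.
    h := sha256.Sum256(b)
    copy(b8[:], h[:8])
} else {
    // Non-FIPS mode: unchanged SHA-1 path, identical IDs to previous releases.
    h := sha1.Sum(b)
    copy(b8[:], h[:8])
}
// Interpret the first 8 digest bytes as a big-endian uint64.
return types.ID(binary.BigEndian.Uint64(b8[:]))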

In non-FIPS mode the code path is identical to before — same algorithm, same output, same
truncation to 8 bytes. Neither sentinel test is affected. In FIPS mode SHA-256 is used and
the panic is eliminated.

The two sentinel tests themselves are updated to skip under fips140.Enabled(), since their
hardcoded expected values are inherently SHA-1-derived and have no meaning in a context where
SHA-1 is prohibited. The t.Skip preserves the tests' protective value for all non-FIPS
configurations.

Backward Compatibility

| Scenario | Impact |
|---|---|
| Existing cluster restart (WAL present) | None. bootstrap.go:658 reads NodeID/ClusterID directly from WAL metadata; computeMemberID and genID are never called |
| New cluster bootstrap, non-FIPS | None. SHA-1 path unchanged |
| New cluster bootstrap, FIPS | SHA-256-derived IDs; compatible within a homogeneous FIPS cluster |
| MemberAdd at runtime | None. The new member computes and advertises its own ID; the cluster accepts any uint64 |
| etcdutl snapshot restore | None. Creates a fresh data directory; no prior IDs to match |
| Mixed FIPS / non-FIPS bootstrap | Not supported. All nodes must use the same mode during initial cluster formation, because each node independently computes member IDs from the same inputs. A heterogeneous bootstrap will produce divergent IDs and fail to reach quorum. This constraint existed implicitly before; this change makes it observable. |

The storage format (each uint64 ID persisted in BoltDB as a hex-string key of at most 16
characters) is unchanged. No WAL format changes. No protocol changes.
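To illustrate why the storage format does not depend on the hash, here is a minimal sketch of a uint64 ID round-tripping through a hex-string key. It assumes the key is the lowercase hex form produced by strconv.FormatUint(id, 16); the helper names memberKey and parseMemberKey are hypothetical. Whichever hash produced the ID, the key is the same kind of string and decodes back to the same value.

```go
package main

import (
	"fmt"
	"strconv"
)

// memberKey is a hypothetical helper: it encodes an ID as a lowercase hex
// string, the kind of key assumed to be stored in the members bucket.
func memberKey(id uint64) string {
	return strconv.FormatUint(id, 16)
}

// parseMemberKey reverses memberKey.
func parseMemberKey(key string) (uint64, error) {
	return strconv.ParseUint(key, 16, 64)
}

func main() {
	// One value shaped like a SHA-1-derived ID, one like a SHA-256-derived ID:
	// the encoding is indifferent to which hash produced them.
	for _, id := range []uint64{0xa9993e364706816a, 0xba7816bf8f01cfea} {
		key := memberKey(id)
		back, err := parseMemberKey(key)
		fmt.Println(key, back == id, err)
	}
}
```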

Testing

Reproduction tests (server/etcdserver/api/membership/fips_repro_test.go):

# Normal mode — both pass, SHA-1 path exercised
go test -run TestFIPS ./server/etcdserver/api/membership/

# FIPS mode — both pass, SHA-256 path exercised, no panic
GODEBUG=fips140=only go test -run TestFIPS ./server/etcdserver/api/membership/
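The contents of fips_repro_test.go are not shown here. As a rough illustration of the kind of check such a test makes, the standalone sketch below runs a raw sha1.Sum call and the runtime-selected hash under a recover guard and reports whether each panicked; run it with and without GODEBUG=fips140=only. The names deriveID and panics are hypothetical, and this is not the PR's test code. It needs Go 1.24+.

```go
package main

import (
	"crypto/fips140"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// deriveID mirrors the fixed branch: SHA-256 in FIPS mode, SHA-1 otherwise.
func deriveID(b []byte) uint64 {
	if fips140.Enabled() {
		h := sha256.Sum256(b)
		return binary.BigEndian.Uint64(h[:8])
	}
	h := sha1.Sum(b)
	return binary.BigEndian.Uint64(h[:8])
}

// panics reports whether f panicked, recovering so the program keeps running.
func panics(f func()) (panicked bool, msg any) {
	defer func() {
		if r := recover(); r != nil {
			panicked, msg = true, r
		}
	}()
	f()
	return false, nil
}

func main() {
	// Unguarded SHA-1: expected to panic only under GODEBUG=fips140=only.
	p, msg := panics(func() { sha1.Sum([]byte("x")) })
	fmt.Printf("raw sha1.Sum panicked: %v %v\n", p, msg)

	// Runtime-selected hash: expected not to panic in either mode.
	p, _ = panics(func() { deriveID([]byte("x")) })
	fmt.Printf("deriveID panicked: %v\n", p)
}
```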

Full unit regression — 36 packages, both modes:

# Non-FIPS: 36/36 ok
go test -count=1 ./server/... ./etcdutl/... ./etcdctl/... ./client/... ./api/...

# FIPS:     36/36 ok  (TestMemberTime and TestSnapshotStatus correctly skip)
GODEBUG=fips140=only go test -count=1 ./server/... ./etcdutl/... ./etcdctl/... ./client/... ./api/...

Files Changed

| File | Change |
|---|---|
| server/etcdserver/api/membership/member.go | Import crypto/fips140 and crypto/sha256; runtime branch in computeMemberID |
| server/etcdserver/api/membership/cluster.go | Same pattern in genID |
| server/etcdserver/api/membership/member_test.go | Skip TestMemberTime under fips140.Enabled() |
| etcdutl/snapshot/v3_snapshot_test.go | Skip TestSnapshotStatus under fips140.Enabled() |
| server/etcdserver/api/membership/fips_repro_test.go | New: reproduction tests for both panic sites |


Fixes etcd-io#21673 - panic when etcd starts with GODEBUG=fips140=only

Use crypto/fips140.Enabled() to select hash at runtime:
- Non-FIPS mode: SHA-1 path unchanged (backward compatible)
- FIPS mode: SHA-256 only, no panic

Both computeMemberID() and genID() avoid calling sha1.Sum when FIPS is enabled.

Sentinel tests (TestMemberTime, TestSnapshotStatus) skip under FIPS mode
since their expected values are SHA-1-derived.

New reproduction tests in fips_repro_test.go for both panic sites.
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: herosql
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @herosql. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

herosql force-pushed the fix/resolve-issue-21673 branch from 66b4299 to 8c12112 on May 24, 2026 10:32
herosql force-pushed the fix/resolve-issue-21673 branch from 8c12112 to 66b4299 on May 24, 2026 10:36

Development

Successfully merging this pull request may close these issues.

membership: panic on startup with GODEBUG=fips140=only due to SHA-1 in cluster/member ID derivation

2 participants