Support RDMA handshake V3#3326
Open
chenBright wants to merge 1 commit into
Open
Conversation
Background
==========
The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
had correctness bugs that made the wire format effectively
unevolvable.
v2 bugs
=======
A. Client never drained the "unknown tail" bytes when a peer sent
msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
socket recv buffer and silently corrupted the next ReadFromFd
(the ACK).
B. Server had the symmetric version of A.
C. Server computed the body read length from its LOCAL
g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
same length as its own. A longer peer left bytes behind; a shorter
peer made the read block. Client used the correct compile-time
constant; the two sides were not symmetric.
Combined, A/B/C meant v2 could not safely append a single byte to the
hello -- even an "optional hint" appended by a newer sender would
mis-align an older receiver's next read.
v3 design
=========
Wire format, magic-namespace-isolated from v2:
[ "RDM3" 4B ][ pb_size 4B big-endian ][ RdmaHello protobuf bytes ]
with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
as v2 plus room to append future capabilities.
Why protobuf (and not "v2 plus length prefix")
----------------------------------------------
- Variable-length fields are coming. Future capability fields will
include strings (rdma_device_name, netdev_name, ...) and other
variable-length data. Supporting them on a fixed-binary protocol
forces us to invent and maintain a TLV layer (per-field type +
length + value framing, plus version-aware deserialization). That
is reimplementing protobuf badly. Using protobuf from day one
costs nothing and is the canonical answer.
- Fixes v2 bug A/B/C generically: pb_size makes the wire
self-describing, so the receiver never needs to guess the length
or know the peer's schema version to read the body cleanly.
- Append-only field evolution out of the box: new optional fields
cost old receivers nothing -- they're skipped as unknown protobuf
fields. v2 with a hand-rolled length prefix would still need
per-field opt-in code on every side.
- Built-in validation: ParseFromArray fails fast on malformed input;
required-field presence is enforced at the parse layer, not by
ad-hoc has_xxx() checks scattered through wire code.
- bRPC already depends on protobuf -- no new build dependency.
Why a NEW MAGIC rather than a version field inside protobuf
-----------------------------------------------------------
- Forces "breaking change" to be a deployment decision visible at
the wire level. You cannot accidentally ship a backwards-
incompatible patch via a field-semantics tweak.
- Server-side dispatch routes by magic to fully independent state
machines that can't entangle (no `if (version == X)` branches
anywhere -- this is the abstraction's red line).
- Any future breaking change bumps the magic ("RDM4", "RDM5", ...).
v3 fields, once shipped, never change semantics.
Rollout
=======
Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
magic routes to fully independent code paths. A single rolling upgrade
enables v3 fleet-wide.
Client-side picks the wire protocol via gflag with a safe default:
FLAGS_rdma_client_handshake_version (default 2)
2 = "RDMA" legacy (zero-regression default)
3 = "RDM3" protobuf (opt-in once target servers support v3)
Sub-second rollback is one flag flip away. v3 client to v2-only legacy
server is NOT guaranteed to transparently fall back on the same
connection -- the supported migration is "upgrade servers first, then
opt-in clients".
Contributor
There was a problem hiding this comment.
Pull request overview
Introduces a new RDMA handshake protocol ("v3") using a "RDM3" magic + length-prefixed protobuf RdmaHello, refactors the handshake state machine out of RdmaEndpoint into pluggable RdmaHandshake subclasses, and fixes long-standing v2 hello bugs (oversized/undersized msg_len, untrained trailing bytes). Server accepts both v2 and v3 by magic dispatch; clients pick the version via FLAGS_rdma_client_handshake_version (defaults to 2 for backward compatibility).
Changes:
- Add
RdmaHelloprotobuf +RdmaHandshake{Client,Server}V{2,3}classes, splitv2_wire/v3_wirehelpers, and wire-format-agnosticParsedHellotranslation; addRdmaEndpoint::ApplyRemoteHello, newReadFromFd(IOPortal*)/WriteToFd(IOBuf*)overloads, and template-based EAGAIN read/write loops. - Tighten v2 server validation (reject
msg_len > 4096, drain valid tails), centralize ACK constants, and improve fallback/log messages including ahandshake_versiondebug field. - Add v2/v3 byte-level baseline tests,
RdmaRpcTestparameterized fixture run under both versions, fixServer::StartInternalso a failed validation no longer leaks user-owned pointers into_options, and update CI/build configs to compile the new.proto.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/brpc/rdma/rdma_handshake.{h,cpp,proto} | New abstract handshake hierarchy + v2/v3 wire helpers and protobuf definition. |
| src/brpc/rdma/rdma_endpoint.{h,cpp} | Endpoint state machine now delegates wire handling, adds IOBuf/IOPortal IO helpers and ApplyRemoteHello. |
| src/brpc/socket.{h,cpp} | Friend declarations for new handshake classes; minor formatting. |
| src/brpc/server.cpp | Validate ServerOptions before copying to avoid taking ownership on failure paths. |
| src/brpc/rdma_transport.h | Adjust friend declarations for new handshake classes. |
| test/brpc_rdma_unittest.cpp | Migrate tests to new v2_wire::* names, add v3 baseline/error tests and parameterized RdmaRpcTest. |
| test/bvar_percentile_unittest.cpp | Gate add test under !WITH_BABYLON_COUNTER. |
| src/butil/thread_key.h | Add missing <algorithm> include. |
| CMakeLists.txt / BUILD.bazel / .bazelrc | Compile the new proto, add rdma bazel config. |
| .github/workflows/ci-linux.yml, .github/actions/install-all-dependencies/action.yml | Install libibverbs-dev, line-wrap commands, enable rdma config in babylon CI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
had correctness bugs that made the wire format effectively
unevolvable.
msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
socket recv buffer and silently corrupted the next ReadFromFd
(the ACK).
g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
same length as its own. A longer peer left bytes behind; a shorter
peer made the read block. Client used the correct compile-time
constant; the two sides were not symmetric.
Combined, 1/2/3 meant v2 could not safely append a single byte to the
hello -- even an "optional hint" appended by a newer sender would
mis-align an older receiver's next read.
What is changed and the side effects?
Changed:
v3 handshake is designed which is wire format, magic-namespace-isolated from v2:
with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
as v2 plus room to append future capabilities.
Why protobuf (and not "v2 plus length prefix")
include strings (rdma_device_name, netdev_name, ...) and other
variable-length data. Supporting them on a fixed-binary protocol
forces us to invent and maintain a TLV layer (per-field type +
length + value framing, plus version-aware deserialization). That
is reimplementing protobuf badly. Using protobuf from day one
costs nothing and is the canonical answer.
self-describing, so the receiver never needs to guess the length
or know the peer's schema version to read the body cleanly.
cost old receivers nothing -- they're skipped as unknown protobuf
fields. v2 with a hand-rolled length prefix would still need
per-field opt-in code on every side.
required-field presence is enforced at the parse layer, not by
ad-hoc has_xxx() checks scattered through wire code.
Why a NEW MAGIC rather than a version field inside protobuf
the wire level. You cannot accidentally ship a backwards-
incompatible patch via a field-semantics tweak.
machines that can't entangle (no
if (version == X)branchesanywhere -- this is the abstraction's red line).
v3 fields, once shipped, never change semantics.
Architecture
Handshake state machine is split out of RdmaEndpoint:
Endpoint-side ProcessHandshakeAt{Client,Server} is protocol-agnostic;
subclasses override only SendLocalHello() and ReceiveAndParseRemoteHello().
A ParsedHello struct decouples endpoint-side bring-up (window calculation,
BringUpQp) from any wire format.
Side effects:
Performance effects:
Breaking backward compatibility:
Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
magic routes to fully independent code paths. A single rolling upgrade
enables v3 fleet-wide.
Client-side picks the wire protocol via gflag with a safe default:
Sub-second rollback is one flag flip away. v3 client to v2-only legacy
server is NOT guaranteed to transparently fall back on the same
connection -- the supported migration is "upgrade servers first, then
opt-in clients".
Check List: