Fix client hang during UDP bootstrap that prevents joining servers by JacobWoodson · Pull Request #1141 · ryanhcode/sable

JacobWoodson · 2026-06-08T00:23:17Z

Problem

ConnectionMixin.sable$connect (and its local-server twin) opens a UDP channel alongside the Minecraft TCP one when the player connects. The bootstrap currently ends with:

final ChannelFuture channelFuture = new Bootstrap()...
        .connect(inetSocketAddress.getAddress(), inetSocketAddress.getPort());

channelFuture.syncUninterruptibly();

That syncUninterruptibly() runs on the Server Pinger (multiplayer list refresh) and Server Connector (actual join) threads. Those are the same threads that drive the TCP handshake / config phase of the Minecraft login. As long as the netty UDP connect() future hasn't resolved, no TCP login can make progress.

On most hosts the UDP connect() returns within milliseconds (UDP "connect" is just an OS-level associate-with-remote-address; no packets fly). But the netty call goes through NioDatagramChannel.doConnect() → javaChannel().connect(remoteAddress), which on Windows can stall when a third party is interposed on the socket layer: Windows Filtering Platform callouts from antivirus, "secure DNS" / enterprise proxies, some VPN clients, or layered service providers doing UDP inspection. On the affected client in this repro, that stall lasted 20-30+ seconds; the server's read-timeout fired first and closed the TCP socket. From the user's perspective, the login screen sat on "Connecting…" and then bounced with a generic "Disconnected".

This is the same shape of failure that has surfaced (with different proximate causes) in #852, #850 (workaround: attempt_udp_networking=false), #1035 (closed as a dup of #482), and #1080 (an NPE that this PR also fixes).

Repro

Sable NeoForge 1.21.1-1.2.2, Synthetic Horizons modpack, MC 1.21.1 + NeoForge 21.1.233.
One Windows 11 laptop on the LAN couldn't join a dedicated server (locally to 192.168.1.90:25565 or remotely to 47.38.244.85:25565). Other machines on the same LAN with the same pack joined fine.
The Pinger thread eventually brought up its UDP channel (the server appeared in the list), but the Server Connector's UDP bootstrap was the slow one, slow enough to push the server past its config-phase timeout.
With the patch in place on the same machine, UDP bootstrap completes in 1 ms on the same network path, and the player joins. The "drowning in the spawn ocean" verification was, uh, vivid.

Alternatives considered

I went through a few options before landing on this one. Recording them so reviewers can push back if they prefer a different shape.

Tell users to set disable_udp_pipeline=true and call it done. This is the documented workaround today and was suggested in #1080 (the client-side NPE in ClientboundSableUDPActivationPacket). It works, but (a) it silently downgrades every user to TCP-only sub-level data forever rather than only on machines where UDP genuinely can't be set up, (b) the flag only short-circuited the server-side ServerConnectionListenerMixin; ConnectionMixin.sable$connect ignored it, so users following the doc were still hitting the bootstrap on the client, and (c) it doesn't address the root cause (a blocking call on the login thread).
Move the UDP bootstrap off the login thread entirely (run the Bootstrap.connect(...) on the netty event loop and setUDPChannel from a listener). Cleaner long-term shape -- UDP bring-up becomes purely async, no timing coupling with TCP login at all. I didn't take this because it changes the lifecycle assumptions made by ClientboundSableUDPActivationPacket.handle (which today reads sable$getUDPChannel() synchronously, expecting connect() to be done by the time the server's activation packet arrives) and by anyone reading the field elsewhere. That's a bigger change with a wider blast radius for a backport-friendly fix, so I deferred it. Happy to do it as a follow-up if the maintainers prefer.
Bounded await with a timeout, fall back to TCP-only on failure. What this PR does. The synchronous shape is preserved (so existing assumptions about sable$getUDPChannel() after Connection.connect returns still hold on the success path), but a slow or stuck connect() no longer holds the login thread hostage. The 5 s timeout is generous enough that healthy networks should never hit it (the success path was observed at <1 ms on every machine I tested) and short enough to leave most of the server-side login window (~20-30 s before the read-timeout fires) intact.
Make the timeout configurable. Considered; rejected for now. A config knob is overkill when the practical range is "<1 s succeeds, otherwise something is broken and we want to fail fast". If a future deployment surfaces a legitimate need for a longer wait, promoting SABLE$UDP_CONNECT_TIMEOUT_MS to SableConfig is one line.
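For comparison, the deferred fully-async alternative might look roughly like the sketch that follows. It is a hypothetical, JDK-only illustration, not Sable code: CompletableFuture stands in for netty's ChannelFuture and its listeners, and AsyncUdpBringUp, FakeChannel, and onActivationPacket are made-up names. It shows why that option touches ClientboundSableUDPActivationPacket.handle: the server's activation packet can now arrive before the connect completes, so the reader has to tolerate a channel that isn't there yet.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a fully async UDP bring-up. CompletableFuture stands in for
// netty's ChannelFuture; every name here is illustrative, not Sable's or netty's API.
final class AsyncUdpBringUp {
    /** Stand-in for io.netty.channel.Channel. */
    static final class FakeChannel { }

    /** Stand-in for the connection's UDP channel field; null until the connect completes. */
    final AtomicReference<FakeChannel> udpChannel = new AtomicReference<>();

    /** Starts listening for the connect outcome and returns immediately; the login thread never waits. */
    void start(CompletableFuture<FakeChannel> connectFuture) {
        connectFuture.whenComplete((channel, error) -> {
            if (error == null) {
                udpChannel.set(channel); // analogous to calling setUDPChannel from a listener
            } else {
                System.err.println("WARN UDP connect failed, staying on TCP: " + error);
            }
        });
    }

    /**
     * With async bring-up the activation packet may arrive first, so the handler
     * must treat a missing channel as "not ready" instead of dereferencing it.
     */
    boolean onActivationPacket() {
        return udpChannel.get() != null;
    }
}
```

Nothing blocks here, but every reader of the channel now has to handle the "not yet connected" state, which is the wider blast radius described above.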

What this PR does

ConnectionMixin.sable$connect: replace syncUninterruptibly() with awaitUninterruptibly(SABLE$UDP_CONNECT_TIMEOUT_MS, MILLISECONDS) (5 s). On timeout: cancel the future, install a listener that logs the late outcome (and closes the channel if it eventually succeeds after we already gave up), and return, letting TCP login proceed. On !isSuccess(): log the cause and return. Wrap the bootstrap construction itself in try/catch so an unchecked exception during bootstrap can't break the connect call either.
ConnectionMixin.sable$connectToLocalServer: same treatment for LocalChannel. Less critical (local channels rarely stall) but the inconsistency was hard to justify keeping.
Honor SableConfig.DISABLE_UDP_PIPELINE on the client: short-circuit both connect injects when the flag is set. Previously only ServerConnectionListenerMixin checked it, which made the documented workaround a half-measure.
ClientboundSableUDPActivationPacket.handle: null-check connectionExtension.sable$getUDPChannel() before dereferencing. This fixes the NPE reported in #1080 (Cannot invoke 'io.netty.channel.Channel.eventLoop()' because 'channel' is null). When the channel is null, log a WARN and return; the client stays on TCP-only, and the server's auth state for that connection just stays in AWAITING_AUTH until the connection closes. Also pattern-match the connection.getRemoteAddress() cast instead of using an unchecked one.
Logging cleanup: the existing "Starting remote client UDP channel future" line now carries the remote address and transport class, so logs from affected users are immediately actionable. All failure paths log at WARN with the cause. Routine breadcrumbs (config-disabled short-circuit, late-success-after-cancel, success path with timing) are demoted to DEBUG so unaffected users don't see any new chatter.
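The connect-side changes listed above (client-side flag short-circuit, guarded bootstrap, bounded wait, late-success cleanup, null-guarded activation handler) can be modeled in plain Java. This is a hypothetical sketch, not the PR's code: CompletableFuture stands in for netty's ChannelFuture, get(timeout) stands in for awaitUninterruptibly(timeout), and BoundedUdpConnect, FakeChannel, UncancellableFuture, connectUdp, and handleActivation are made-up names. UncancellableFuture mimics a future whose cancel() is refused (for instance, because the connect is already in flight), which is the case the late-outcome listener exists for.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, JDK-only model of the connect-side changes. CompletableFuture stands in for
// netty's ChannelFuture and FakeChannel for io.netty.channel.Channel; names are illustrative.
final class BoundedUdpConnect {
    static final long UDP_CONNECT_TIMEOUT_MS = 5_000; // same budget as SABLE$UDP_CONNECT_TIMEOUT_MS

    /** Stand-in for a netty channel that can be closed. */
    static final class FakeChannel {
        volatile boolean open = true;
        void close() { open = false; }
    }

    /** Models a connect future that refuses cancellation. */
    static final class UncancellableFuture<T> extends CompletableFuture<T> {
        @Override public boolean cancel(boolean mayInterruptIfRunning) { return false; }
    }

    /** Returns the UDP channel, or empty so TCP login proceeds without UDP. Never waits past timeoutMs. */
    static Optional<FakeChannel> connectUdp(boolean udpDisabled,
                                            Callable<CompletableFuture<FakeChannel>> bootstrap,
                                            long timeoutMs) {
        if (udpDisabled) {
            return Optional.empty(); // honor the disable flag on the client: no bootstrap at all
        }
        final CompletableFuture<FakeChannel> future;
        try {
            future = bootstrap.call(); // bootstrap construction is guarded too
        } catch (Exception e) {
            System.err.println("WARN UDP bootstrap threw: " + e);
            return Optional.empty();
        }
        try {
            return Optional.of(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(false);
            // If cancel was refused, a late success would leak a channel: close it when it lands.
            future.whenComplete((late, error) -> { if (late != null) late.close(); });
            System.err.println("WARN UDP connect timed out after " + timeoutMs + " ms; TCP-only");
            return Optional.empty();
        } catch (ExecutionException e) {
            System.err.println("WARN UDP connect failed: " + e.getCause());
            return Optional.empty();
        } catch (InterruptedException e) {
            // Unlike awaitUninterruptibly, get() is interruptible: restore the flag and skip UDP.
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    /** Null-guarded activation handling; returns true only if an auth reply would be sent. */
    static boolean handleActivation(FakeChannel udpChannel, SocketAddress remote) {
        if (udpChannel == null) {
            System.err.println("WARN activation received but no UDP channel; staying on TCP-only");
            return false;
        }
        if (!(remote instanceof InetSocketAddress inet)) { // pattern match instead of an unchecked cast
            System.err.println("WARN unexpected remote address type: " + remote);
            return false;
        }
        // A real handler would dispatch the auth response to inet on the channel's event loop.
        return true;
    }
}
```

The design choice that matters is that every failure path returns Optional.empty(): the caller treats "no UDP channel" as a normal outcome and continues the TCP login, which is the TCP-only fallback analyzed under Implications.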

Implications

Success path

Byte-for-byte identical end-state -- the channel is set on the ConnectionExtension, SableUDPChannelHandlerClient.channelActive fires, ClientboundSableUDPActivationPacket.handle sees the channel and dispatches the auth response, and the server transitions SableUDPAuthenticationState to AUTHENTICATED. No timing changes when UDP comes up healthily (success observed at <1 ms across machines tested).

Failure / timeout path (where this PR is doing the actual work)

The interesting question is "if UDP setup fails, what's broken for the player?" I traced the gameplay-affecting UDP call sites to answer that:

sendUDPPacket has two production call sites in the codebase: SubLevelTrackingSystem.java:367-397 (sub-level snapshot replication) and SableCommand.java:58 (a debug echo command). The tracking system already has an explicit, designed-in TCP fallback branch keyed on isConnectedTo(player) â€” sub-level snapshots are sent as a ClientboundBundlePacket(ClientboundSableSnapshotInfoDualPacket, ClientboundSableSnapshotDualPacket) over TCP when UDP isn't authenticated. The packets are Dual (SableUDPPacket, SableTCPPacket) and the client decodes them identically off either transport; the only branch in ClientSableInterpolationState.receiveSnapshot on the receive mode is a debug-overlay string ("Networking through UDP/TCP"). The interpolation math, the snapshot buffer, and the rendered sub-level position are all identical.
isConnectedTo(player) returns false cleanly in our failure path (the udpAuthStates entry stays in AWAITING_AUTH because the client never replies with SableUDPAuthenticationPacket).
SableUDPServer.sendPings() skips players that aren't AUTHENTICATED, so a fallback-to-TCP player generates no UDP keepalive traffic and is never bounced by the missed-pings logic.
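The server-side fallback in this list can be modeled in a few lines. This is a hypothetical sketch, not Sable's SubLevelTrackingSystem or SableUDPServer: AuthState mirrors the two SableUDPAuthenticationState values mentioned here, while SnapshotRouting, Conn, routeSnapshot, and this sendPings body are illustrative stand-ins. The pruning step reflects the cleanup noted later under Other behavioral notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of transport selection keyed on UDP auth state. Not Sable's code.
final class SnapshotRouting {
    enum AuthState { AWAITING_AUTH, AUTHENTICATED }
    enum Transport { UDP, TCP_BUNDLE }

    /** Stand-in for a player's Connection (identity equality, like a map key on Connection). */
    static final class Conn {
        volatile boolean connected = true;
    }

    /** Weakly keyed, so a dropped connection can be collected even if it is never pruned. */
    final Map<Conn, AuthState> udpAuthStates = new WeakHashMap<>();

    /** False for any player whose client never completed the UDP auth handshake. */
    boolean isConnectedTo(Conn c) {
        return udpAuthStates.get(c) == AuthState.AUTHENTICATED;
    }

    /** The same Dual snapshot packets go out either way; only the transport differs. */
    Transport routeSnapshot(Conn c) {
        return isConnectedTo(c) ? Transport.UDP : Transport.TCP_BUNDLE;
    }

    /** Prunes disconnected entries, then pings only AUTHENTICATED connections. */
    List<Conn> sendPings() {
        udpAuthStates.keySet().removeIf(c -> !c.connected);
        List<Conn> pinged = new ArrayList<>();
        udpAuthStates.forEach((conn, state) -> {
            if (state == AuthState.AUTHENTICATED) {
                pinged.add(conn); // a TCP-fallback player never enters the ping / missed-ping path
            }
        });
        return pinged;
    }
}
```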

So the practical effect of our timeout firing is: the player joins, sub-levels work normally, the only observable difference is the F3 debug overlay reports "Networking through TCP" (or "UNKNOWN" until the first snapshot arrives). The one real degradation is that under a lossy network, sub-level snapshots over TCP can show jitter from head-of-line blocking that UDP wouldn't have â€” but the failure mode that triggers this PR is at the OS socket layer, not on-wire packet loss, so affected users typically have clean networks and see no visible difference.

Other behavioral notes

disable_udp_pipeline now means what it says on the client: no UDP bootstrap, no UDP channel, no log lines outside DEBUG. Previously the flag only short-circuited the server bootstrap, so a user setting it client-side (the workaround recommended in #1080) was still hitting the buggy bootstrap. This is a small behavior change for anyone relying on the side effect that the client kept opening a UDP channel; I don't think such users exist, but it's worth calling out.
Stale auth-state entries: in the timeout path, the server's udpAuthStates map keeps an AWAITING_AUTH entry for the connection whose client never replied. The map is a WeakHashMap keyed on Connection and entries are also pruned in SableUDPServer.sendPings() when !connection.isConnected(), so it cleans up on disconnect. No leak; called out for completeness.
No protocol change, no packet format change, no jarjar change. Drop-in compatible across mixed 1.2.2 / patched-1.2.2 deployments â€” server-patched + client-unpatched (or vice versa) negotiates UDP exactly the same way it does today.
No new dependencies. One @Unique field added: SABLE$UDP_CONNECT_TIMEOUT_MS (mixin-safe naming).

AI assistance

This change was developed with the assistance of an AI coding assistant (Claude). I read every line, built the jar locally against mc1.21.1-1.2.2-neoforge, dropped it into the affected modpack instance, reproduced the original join failure on the affected machine, watched it succeed with the fix, and read the resulting logs to confirm the WARN/DEBUG split is what I want a future bug report to contain. I'm submitting this as my own contribution under the CONTRIBUTING.md terms.

Related issues

#1080 -- Client-side NPE in ClientboundSableUDPActivationPacket when UDP channel is null (tunnel/proxy environments) (fixed directly by the channel null guard)
#852 -- Not able to join the server (consistent with this PR's repro for the subset of reports where the proximate cause is a slow OS-level UDP connect())
#850 -- Severe desync from server when UDP is enabled (different root cause; not addressed here)
#1035 / #482 -- Server crash when not disabling udp pipeline / Server fails to start when using Sable and Luna together (server-side; not addressed here)

Connection.connect is mixin-injected to bootstrap a UDP channel alongside the TCP one. The existing implementation calls Bootstrap.connect(...).syncUninterruptibly() on the Server Pinger / Server Connector thread, the same thread that drives TCP login. If the netty connect future is slow to resolve (e.g. antivirus or firewall delaying the OS-level UDP connect() syscall), the entire login is blocked until the server gives up and closes the TCP connection. The client sees a generic "Disconnected" and the server sees "Timed out".

Replace the indefinite blocking sync with a bounded awaitUninterruptibly(5s); on timeout / failure, cancel the future, log the cause, and return so the TCP login can proceed. The success path is unchanged.

Also:
* honor SableConfig.DISABLE_UDP_PIPELINE on the client side too. Previously it only short-circuited the server bootstrap, so users running the documented workaround were still bootstrapping a UDP channel on the client.
* null-guard the UDP channel in ClientboundSableUDPActivationPacket (fixes ryanhcode#1080 NPE on channel.eventLoop() when UDP setup did not produce a channel).
* pattern-match the remote address instead of an unchecked cast.
* include remote address / transport class on the existing "Starting remote client UDP channel future" line and surface failure causes on WARN so future UDP setup issues are actionable from latest.log. Routine breadcrumbs at DEBUG only.

CLAassistant · 2026-06-08T00:23:24Z

All committers have signed the CLA.

ryanhcode self-assigned this Jun 11, 2026

JacobWoodson wants to merge 1 commit into ryanhcode:main from JacobWoodson:fix/udp-join-hang-diagnostics
