feat(realtime): connection quality, preflight, and glass-to-glass latency #18
Merged
…s JS SDK #156, #158)

Pre-connect probe:
- checkConnectivity(): STUN-only WebRTC reachability + latency probe via a throwaway PeerConnection (no session); classifies udp/relay/failed + RTT bands into good/fair/poor/critical with reasons.
- Deep probe (opt-in, CheckConnectivityOptions(deep = true, model)): brief real session on a synthetic capturer measuring true glass-to-glass latency; hard-capped at durationMs + connect budget.

In-session quality:
- ConnectionQualityEvaluator: smoothed verdict from WebRTC stats (latency, loss, upstream BWE, fps/freezes) with warm-up + asymmetric hysteresis. Surfaced via connectionQuality StateFlow, onConnectionQuality callback, getConnectionQuality(), and DiagnosticEvent.ConnectionQualitySample (per stats tick; level debounced, metrics live).

Glass-to-glass (opt-in via ConnectOptions(debugQuality = true)):
- Pixel-marker protocol port (luma stamp/read, server pixel_latency mode): StampingVideoProcessor stamps outgoing I420 Y-planes, MarkerReaderSink reads rendered frames, SeqTracker derives ttff/g2g/drop-ratio; measured g2g drives the latency verdict instead of RTT.

Emulator-verified: STUN preflight (fair/udp/171ms), deep-probe failure path (invalid key -> critical, no retry hang), stamping pipeline on live camera frames. Marker placement vs server rotation still needs physical-device QA.

Sample app demos all flows; README documents the new API. 177 unit tests.
…ass marker

The pixel-marker protocol operates in display space, but camera buffers arrive in sensor orientation with rotation metadata. Stamping the raw I420 buffer put the marker where the server's pixel_latency reader never looks, so the camera path produced zero marker matches (verified live on an emulator: synthetic rotation-0 frames round-tripped, camera frames did not).

StampingVideoProcessor now rotates each frame upright (libyuv I420Rotate) before the mirror flip + stamp and emits rotation-0 frames; the marker reader uprights rotated remote frames the same way (server output is rotation-0 in practice).

Verified against the live server on an emulator:
- deep probe: transport=udp rtt=41ms ttff=11.4s g2g=515ms samples=8 (early exit)
- camera session with debugQuality: ttff 5.8s, g2g + drop ratio populating
…(ports JS SDK #161)

The loss bands were too lenient for a real-time v2v pipeline: up to 2% loss read as "good" and only >10% as "critical", under-reporting genuinely degraded sessions.

New bands (shared by packetLoss and g2gDrop, which measure delivery failures on the same scale):
- good: <0.1%
- fair: 0.1-1%
- poor: 1-5%
- critical: >5%

Also fixes the deep-probe reason strings to render fractional percentages (0.1%) instead of truncating to 0%.
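The new bands can be sketched as a small classifier. This is an illustration, not the SDK's code: `QualityLevel`, `classifyLoss`, and `formatLossPercent` are hypothetical names, and treating exactly 1% as "poor" is an assumption, since the commit message leaves that boundary open.

```kotlin
import java.util.Locale

/** Hypothetical level enum mirroring the good/fair/poor/critical verdicts. */
enum class QualityLevel { GOOD, FAIR, POOR, CRITICAL }

/**
 * Classifies a delivery-failure ratio (0.0..1.0) with the bands from the commit:
 * good <0.1%, fair 0.1-1%, poor 1-5%, critical >5%.
 * Assumption: exactly 1% is treated as POOR.
 */
fun classifyLoss(lossFraction: Double): QualityLevel = when {
    lossFraction < 0.001 -> QualityLevel.GOOD
    lossFraction < 0.01 -> QualityLevel.FAIR
    lossFraction <= 0.05 -> QualityLevel.POOR
    else -> QualityLevel.CRITICAL
}

/** Renders a ratio as a percentage with one decimal instead of truncating to an integer. */
fun formatLossPercent(lossFraction: Double): String =
    String.format(Locale.ROOT, "%.1f%%", lossFraction * 100)
```

Under these bands a 2% loss, which the old bands reported as "good", classifies as POOR, and a 0.1% ratio is rendered with its fractional digit rather than as 0%.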
- Freeze-delta overcount: lastFreezeCount is now null until the first inbound sample baselines it, so the first interval reports a 0 delta instead of the whole cumulative freezeCount (which wrongly pulled the stall dimension to FAIR right after connect / on stats-loop restart).
- Stale quality after failed connect: reset _connectionQuality to null when connect() throws, so getConnectionQuality() can't return a verdict from an aborted attempt until the next disconnect().
- Deslop: compress 7 verbose comments/KDocs to keep the non-obvious WHY (relay headroom, BWE-vs-encoder-target, median tie-break, marker block sizes, probe budget, display-space rotation) while cutting narration/marketing/anecdote.
- Reuse: extract the identical 11-param makeSignals() builder duplicated across ConnectionQualityScoringTest and ConnectionQualityEvaluatorTest into one shared QualitySignalsFixtures.kt.
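The freeze-delta fix above boils down to baselining a cumulative counter before taking differences. A minimal sketch follows, assuming a hypothetical `FreezeDeltaTracker` class; only the `lastFreezeCount` name and its null-until-baselined behaviour come from the commit message.

```kotlin
/**
 * Derives per-interval freeze deltas from a cumulative freezeCount.
 * lastFreezeCount stays null until the first sample baselines it,
 * so the first interval reports 0 rather than the whole cumulative count.
 */
class FreezeDeltaTracker {
    private var lastFreezeCount: Long? = null

    /** Returns the number of freezes since the previous sample (0 on the first sample). */
    fun onSample(cumulativeFreezeCount: Long): Long {
        val previous = lastFreezeCount
        lastFreezeCount = cumulativeFreezeCount
        if (previous == null) return 0L
        // Assumption: clamp at 0 in case the underlying counter restarts from zero.
        return (cumulativeFreezeCount - previous).coerceAtLeast(0L)
    }

    /** Drops the baseline, e.g. when the stats loop restarts. */
    fun reset() {
        lastFreezeCount = null
    }
}
```

With this shape, a first sample of 7 cumulative freezes yields a delta of 0, and a following sample of 9 yields 2.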
getConnectionQuality()/connectionQuality fell back to the cached _connectionQuality whenever the session manager returned null (e.g. mid-session auto-reconnect or connection loss, when the media channel is torn down), so they could keep returning the previous session's verdict until the next explicit disconnect(). Clear the cache when onConnectionStateChange moves to RECONNECTING or DISCONNECTED, mirroring the JS SDK's evaluator reset on reconnect. (Addresses Cursor Bugbot on PR #18.)
…ot, High)

getConnectionQuality() preferred sessionManager.getConnectionQuality() (the live media channel's evaluator) over the _connectionQuality StateFlow. During a reconnect/disconnect window, where the flow is cleared but the channel/manager still briefly exist, the getter and the connectionQuality flow could therefore disagree (getter returns the stale prior-session verdict while the flow already shows null).

Collapse to a single source of truth: getConnectionQuality() now returns _connectionQuality.value, which is updated on every quality sample and cleared on disconnect / failed connect / reconnecting. Removes the now-dead RealtimeSessionManager.getConnectionQuality() and LiveKitMediaChannel.currentConnectionQuality().
… cadence (PR #18 Bugbot)

The warmup/window/hysteresis thresholds were ported from the JS SDK as raw sample counts, but the JS evaluator samples at 1s while Android drives it every STATS_INTERVAL_MS (3s), so warm-up and debounced level changes took ~3x as long as the JS values imply (warm-up ~24s, downgrades ~15s).

Scale the counts to ~1/3 so they land on the same wall-clock: window 5->3 (~9s), warmup 8->3 (~9s), downgrade/upgrade 5->2 (~6s). Bands (rtt/g2g/ttff/loss/upstream/stall) are unchanged.
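The wall-clock figures quoted above follow from multiplying each sample count by the stats interval. The sketch below reproduces that arithmetic; `SampleCounts`, `wallClockMs`, and the two value names are hypothetical, while the counts and the 3s interval are taken from the commit message.

```kotlin
/** Hypothetical holder for the three sample-count thresholds named in the commit. */
data class SampleCounts(val window: Int, val warmup: Int, val levelChange: Int)

/** Wall-clock time covered by a number of evaluator samples at a given stats interval. */
fun wallClockMs(sampleCount: Int, intervalMs: Long): Long = sampleCount * intervalMs

// Android stats cadence per the commit message (3s).
val STATS_INTERVAL_MS = 3_000L

// Counts as ported unchanged from the JS SDK, and after scaling to ~1/3.
val portedCounts = SampleCounts(window = 5, warmup = 8, levelChange = 5)
val scaledCounts = SampleCounts(window = 3, warmup = 3, levelChange = 2)
```

At 3s per sample the ported counts give an 8 x 3s = 24s warm-up and 5 x 3s = 15s level changes; the scaled counts give 9s, 9s, and 6s for window, warm-up, and level change.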
…(PR #18 Bugbot)

On RoomEvent.Reconnecting, LiveKit reconnects in place (the media channel and its publish-stats loop survive), so unlike the SDK-level reconnect (which recreates the channel + evaluator) the ConnectionQualityEvaluator was never reset. The running stats loop then re-emitted the stale pre-reconnect verdict (repopulating the _connectionQuality that onConnectionStateChange had just cleared) and skipped a fresh warm-up after Reconnected.

Reset the evaluator + freeze baseline on Reconnecting, matching the JS SDK's reset-on-reconnect.
…Bugbot)

Follow-on to the evaluator reset: on RoomEvent.Reconnecting the opt-in SeqTracker was left intact, so pre-reconnect g2gMs/ttffMs and pending stamps kept feeding the stats loop and produced stale glass-to-glass verdicts after reconnect when debugQuality is on. Call markStart() on it alongside the evaluator/freeze reset (the same call the SDK-level reconnect makes), re-arming its TTFF clock and warm-up.
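The reset-on-reconnect behaviour from the last two commits can be pictured with a toy debouncer. This is a simplified sketch under stated assumptions, not the actual ConnectionQualityEvaluator: `Level`, `DebouncedLevel`, and its parameters are hypothetical, it uses one change threshold for both directions, and it skips the smoothing window.

```kotlin
/** Hypothetical verdict levels. */
enum class Level { GOOD, FAIR, POOR, CRITICAL }

/**
 * Toy debouncer: emits null during warm-up, then changes level only after
 * `changeSamples` consecutive samples agree on a new level.
 */
class DebouncedLevel(
    private val warmupSamples: Int = 3,
    private val changeSamples: Int = 2,
) {
    private var seen = 0
    private var candidate: Level? = null
    private var streak = 0
    private var current: Level? = null

    /** Feeds one raw per-tick level and returns the debounced verdict (null while warming up). */
    fun onSample(raw: Level): Level? {
        seen++
        if (seen < warmupSamples) return null
        val cur = current
        if (cur == null) {
            current = raw // first verdict once warm-up completes
            return raw
        }
        if (raw == cur) {
            candidate = null
            streak = 0
            return cur
        }
        if (raw == candidate) streak++ else { candidate = raw; streak = 1 }
        if (streak >= changeSamples) {
            current = raw
            candidate = null
            streak = 0
        }
        return current
    }

    /** Clears all state so the next sample starts a fresh warm-up, as on Reconnecting. */
    fun reset() {
        seen = 0
        candidate = null
        streak = 0
        current = null
    }
}
```

Without the `reset()` call, the first tick after a reconnect would keep returning the pre-reconnect verdict; with it, the verdict is null again until warm-up completes.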
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit e175f1d.
Summary
Android realtime sessions can now report whether the connection is healthy enough to feel responsive, and apps can probe the network before opening a session.
checkConnectivity() runs a fast STUN reachability check (or an opt-in deep probe that briefly opens a real session), while connectionQuality / onConnectionQuality emit a debounced in-session verdict with the limiting factor (bandwidth, latency, loss, stalls). Opt-in debugQuality measures true camera→display latency via a pixel marker: startup (ttffMs), steady-state (g2gMs), and end-to-end drops, so latency scoring reflects what users actually feel, not just network RTT.

Test plan
- debugQuality round-trip: marker visible, g2gMs/ttffMs populate