realtime: orient capture dims to device orientation (fixes Bug B black-screen on portrait Android) #16
Open
nagar-decart wants to merge 1 commit into
RealtimeModel definitions hard-code the model's natural landscape (W,H). On a portrait-locked Android app the LiveKit CameraX publisher therefore captures at landscape sensor orientation and relies on the WebRTC CVO rotation extension to deliver portrait frames, but the publication metadata reports the unrotated landscape dims.

Server-side, the realtime inference bridge eagerly publishes its output track at the dims the input track advertised in its publication metadata, so that the output SDP renegotiation overlaps the per-session time-to-first-output-frame on the inference side (saves ~500 ms TTFF). If the signaled dims don't match the orientation of the actual decoded input frames, the bridge has to unpublish that track and republish at the correct orientation when the first model output frame arrives.

That unpublish-old + publish-new sequence on the same participant triggers a transceiver-reuse race on the subscriber side: the existing stopped RtpReceiver remains in the PeerConnection, so libwebrtc never fires onAddTrack for the new track, addSubscribedMediaTrack never runs, TrackSubscribed for the new track never reaches RoomEvent, and the renderer is never attached. The app shows a black screen for the rest of the session. Empirically this affects ~37 % of portrait-locked sessions on Lucy 2.1 with this SDK in production.

Transposing the requested capture dims to match the device's current orientation makes the SDP carry the actually-displayed dims. The server predicts the right orientation on the first try, so no republish happens and no transceiver-reuse race can occur. Tablets in landscape, phones in portrait, and any future rotation-locked configurations are all handled by reading Configuration.orientation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
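The transposition described in the commit message can be sketched as a small pure function. This is a minimal illustration, not the SDK's actual code: the helper name orientCaptureDims is taken from the PR overview, but its signature, the int[] return type, and the fall-through for an undefined orientation are assumptions. The constants mirror the values of android.content.res.Configuration.ORIENTATION_* so the sketch compiles without the Android SDK.

```java
// Hypothetical sketch of the dims transposition; not the SDK's real implementation.
final class CaptureDims {
    // Same values as android.content.res.Configuration.ORIENTATION_*,
    // redeclared here so the example runs off-device.
    static final int ORIENTATION_UNDEFINED = 0;
    static final int ORIENTATION_PORTRAIT = 1;
    static final int ORIENTATION_LANDSCAPE = 2;

    private CaptureDims() {}

    /**
     * Returns {width, height} oriented to match the device orientation:
     * portrait -> short x long, landscape -> long x short.
     * Assumption: any other orientation leaves the request untouched.
     */
    static int[] orientCaptureDims(int width, int height, int orientation) {
        int longSide = Math.max(width, height);
        int shortSide = Math.min(width, height);
        switch (orientation) {
            case ORIENTATION_PORTRAIT:
                return new int[] {shortSide, longSide};
            case ORIENTATION_LANDSCAPE:
                return new int[] {longSide, shortSide};
            default:
                return new int[] {width, height};
        }
    }
}
```

Under these assumptions, a portrait device requesting the Lucy 2.1 dims (1088, 624) would capture at 624×1088, while a landscape device keeps 1088×624, so callers can pass the model's landscape dims unchanged.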
Summary
- Orient the requested capture dims in LocalStreamFactory.createCameraStream so the LiveKit publisher signals already-rotated dims in SDP.
- Fixes portrait Android sessions in which LiveKit never delivered TrackSubscribed for the server's republished output track, leaving the app rendering a black screen.

What was breaking, and why
RealtimeModel definitions specify the model's natural landscape (W, H), e.g. LUCY_2_1 = (1088, 624). A portrait-locked Android app therefore asks LiveKit's CameraX capturer for landscape dims, the camera captures sensor-natural landscape, and the WebRTC CVO RTP extension delivers the rotation flag so the receiver sees portrait. The publication metadata, though, carries the unrotated landscape dims (SDP does not carry resolution; LiveKit transports dims via the AddTrackRequest signaling protocol).

Server-side, the realtime inference bridge eagerly publishes its output video track at the dims the input track advertised in its publication metadata so the output SDP renegotiation overlaps the per-session time-to-first-output-frame on the inference side (saves ~500 ms TTFF). With the signaled dims being landscape but the actual decoded input frames being portrait, the bridge republishes its output track at the right orientation the moment the first model frame is produced.
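The server-side condition that forces the republish can be illustrated with a short sketch. The bridge's code is not part of this PR, so every name below (EagerPublishCheck, isPortrait, needsRepublish) is hypothetical, and treating "portrait" as height greater than width is an assumption made only for illustration.

```java
// Hypothetical model of the bridge's decision; the real bridge code is not shown in this PR.
final class EagerPublishCheck {
    private EagerPublishCheck() {}

    // Assumption: a frame counts as portrait when it is taller than it is wide.
    static boolean isPortrait(int width, int height) {
        return height > width;
    }

    /**
     * True when the eagerly published output track (sized from the signaled
     * input dims) has the wrong orientation for the frames actually decoded,
     * which is the case that forces unpublish + republish.
     */
    static boolean needsRepublish(int signaledW, int signaledH, int decodedW, int decodedH) {
        return isPortrait(signaledW, signaledH) != isPortrait(decodedW, decodedH);
    }
}
```

With signaled dims of 1088×624 and decoded portrait frames of 624×1088, needsRepublish returns true, which is the path that exposes the subscriber race. Once the client signals 624×1088 itself, the two orientations agree and the republish path is never taken.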
That unpublish-old + publish-new sequence on the same participant triggers an intermittent race in LiveKit Android: the existing stopped RtpReceiver from the eager subscription stays in the subscriber PeerConnection. When the new track is published, libwebrtc effectively reuses the existing transceiver slot and never fires onAddTrack again. RemoteParticipant.addSubscribedMediaTrack is never invoked, no retry loop runs, no TrackSubscriptionFailed is emitted, and TrackSubscribed for the new track never reaches RoomEvent, so the renderer is never attached. The app sits at "Generating" with a black OUTPUT (REMOTE) panel until the session times out.

We've reproduced this end-to-end with the modified sample app (Android okhttp client, Samsung S921B) and confirmed the failure pattern in production logs:

- lucy-vton-3-realtime and lucy-2-1-realtime in a 24 h prod sample (16 sessions that took the republish path; 6 ended early with no observable wire fps and 10 worked).

What this PR does
LocalStreamFactory.createCameraStream now reads Configuration.orientation and transposes the requested (width, height) to match. The capturer asks for capture in the device's actual orientation, LiveKit's publication metadata carries the rotated dims directly, and the server's eager-publish prediction lands at the correct orientation on the first try. No republish, no race, no black screen.

Tablets in landscape, phones in portrait, and any rotation-locked configuration are all handled by the same code path: no per-model or per-app config is needed, and callers can keep passing model.width/height as today.

Coverage
- is_portrait=True ✓
- is_portrait=False ✓
- is_portrait=False ✓
- is_portrait=True ✓

Test plan
- lucy-2.1: run 10 sessions in a row, confirm every session reaches remote_frame_counter count=1 quickly and OUTPUT (REMOTE) renders. Before this change the same flow had ~37 % black-screen sessions; after it, 100 % of sessions should render.
- Landscape: confirm capture stays at 1088×624 (no transpose) and the session still works.
- Confirm no "Republished LiveKit video track" log appears in the server-side Datadog trace for any session in either orientation.

🤖 Generated with Claude Code
Note
Medium Risk
Changes realtime video capture / publication metadata on Android for all camera sessions; wrong orientation logic could affect resolution or server inference, but scope is a single factory helper with logging.
Overview
LocalStreamFactory.createCameraStream now transposes requested model (width, height) via orientCaptureDims so LiveKit capture and SDP use dimensions aligned with Configuration.orientation (portrait → short×long, landscape → long×short). Callers can keep passing landscape model dims; only the capture path changes.

That makes the server’s eager output-track publish match the first real frames, avoiding an output republish that was triggering a LiveKit Android subscriber race (TrackSubscribed never fired → black remote video on portrait sessions). Transposes are logged when they differ from the request.

Reviewed by Cursor Bugbot for commit 1953dbb. Bugbot is set up for automated code reviews on this repo.