
realtime: orient capture dims to device orientation (fixes Bug B black-screen on portrait Android) #16

Open
nagar-decart wants to merge 1 commit into main from fix/bug-b-orientation-aware-capture



nagar-decart commented Jun 8, 2026


Summary

  • Match capture dimensions to the device's current orientation in LocalStreamFactory.createCameraStream so the LiveKit publisher signals already-rotated dims in its publication metadata.
  • Avoids a transceiver-reuse race in the LiveKit Android subscriber that intermittently swallowed TrackSubscribed for the server's republished output track, leaving the app rendering a black screen: with correctly oriented dims, the server no longer needs to republish.

What was breaking, and why

RealtimeModel definitions specify the model's natural landscape (W, H), e.g. LUCY_2_1 = (1088, 624). A portrait-locked Android app therefore asks LiveKit's CameraX capturer for landscape dims, the camera captures sensor-natural landscape, and the WebRTC CVO RTP extension delivers the rotation flag so the receiver sees portrait. The publication metadata, though, carries the unrotated landscape dims (SDP does not carry resolution; LiveKit transports dims via the AddTrackRequest signaling protocol).
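To make the mismatch concrete, here is a small Java sketch (runnable off-device) of how CVO changes what the receiver displays without changing the signaled size. `CvoSketch`, `Size` and `displayedSize` are hypothetical names for illustration, not LiveKit or WebRTC APIs; the only property of CVO assumed here is that it signals rotation in multiples of 90 degrees.

```java
// Hypothetical illustration: CvoSketch, Size and displayedSize are not SDK names.
final class CvoSketch {
    // A width x height pair, either signaled in metadata or carried by frames.
    static final class Size {
        final int width;
        final int height;

        Size(int width, int height) {
            this.width = width;
            this.height = height;
        }

        boolean isPortrait() {
            return height > width;
        }
    }

    // Size the receiver shows after applying the CVO rotation flag:
    // 90 or 270 degrees swaps width and height, 0 or 180 leaves them alone.
    static Size displayedSize(Size encoded, int rotationDegrees) {
        int r = ((rotationDegrees % 360) + 360) % 360; // normalize, e.g. -90 -> 270
        if (r % 90 != 0) {
            throw new IllegalArgumentException(
                    "CVO signals only multiples of 90 degrees: " + rotationDegrees);
        }
        return (r == 90 || r == 270) ? new Size(encoded.height, encoded.width) : encoded;
    }
}
```

With the LUCY_2_1 dims, `displayedSize(new Size(1088, 624), 90)` yields 624×1088: the receiver renders portrait frames while the publication metadata still reports landscape.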

Server-side, the realtime inference bridge eagerly publishes its output video track at the dims the input track advertised in its publication metadata, so that the output SDP renegotiation overlaps the per-session time-to-first-output-frame on the inference side (saves ~500 ms TTFF). Because the signaled dims are landscape but the decoded input frames are portrait, that eager prediction is wrong: the bridge unpublishes the eager track and republishes its output at the correct orientation as soon as the first model frame is produced.
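A minimal sketch of the eager-publish decision described above, assuming the bridge classifies a size as portrait when its height exceeds its width (consistent with the is_portrait column in the coverage table). `EagerPublishSketch` and its methods are hypothetical; the real bridge is server-side code not included in this PR.

```java
// Hypothetical sketch of the bridge's orientation check; not the actual server code.
final class EagerPublishSketch {
    // Prediction made from the input track's publication metadata,
    // before any input frame has been decoded.
    static boolean predictIsPortrait(int advertisedWidth, int advertisedHeight) {
        return advertisedHeight > advertisedWidth;
    }

    // Returns true when the eagerly published output track has the wrong
    // orientation for the real frames, i.e. the bridge would have to
    // unpublish it and republish at the correct orientation.
    static boolean needsRepublish(int advertisedWidth, int advertisedHeight,
                                  int decodedWidth, int decodedHeight) {
        boolean predictedPortrait = predictIsPortrait(advertisedWidth, advertisedHeight);
        boolean actualPortrait = decodedHeight > decodedWidth;
        return predictedPortrait != actualPortrait;
    }
}
```

Before this change, a portrait phone advertises 1088×624 but delivers 624×1088 frames, so `needsRepublish(1088, 624, 624, 1088)` is true; once the advertised dims are already rotated, the check is false and the eager track is kept.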

That unpublish-old + publish-new sequence on the same participant triggers an intermittent race in LiveKit Android: the existing stopped RtpReceiver from the eager subscription stays in the subscriber PeerConnection. When the new track is published, libwebrtc effectively reuses the existing transceiver slot and never fires onAddTrack again. RemoteParticipant.addSubscribedMediaTrack is never invoked, no retry loop runs, no TrackSubscriptionFailed is emitted, and TrackSubscribed for the new track never reaches RoomEvent — so the renderer is never attached. The app sits at "Generating" with a black OUTPUT (REMOTE) panel until the session times out.

We've reproduced this end-to-end with the modified sample app (Android okhttp client, Samsung S921B) and confirmed the failure pattern in production logs:

  • ~37 % (6 of 16, i.e. 37.5 %) of portrait sessions on lucy-vton-3-realtime and lucy-2-1-realtime in a 24 h prod sample: of the 16 sessions that took the republish path, 6 ended early with no observable wire fps and 10 worked.
  • Direct repro: contrasting two real Android sessions with identical server-side traces showed that the one where the client finished subscribing to the eager track failed, while the one where it didn't (the server's republish beat the client's subscribe) worked.
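The two outcomes above can be captured in a toy model. This is not libwebrtc's transceiver logic; it only encodes the behavior described in this PR: if a stopped receiver from a completed eager subscription is still present, the republished track reuses its slot and TrackSubscribed never fires. `TransceiverRaceToy` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the subscriber-side failure as described in this PR;
// NOT libwebrtc's real transceiver logic.
final class TransceiverRaceToy {
    static final class Receiver {
        final String trackId;
        boolean stopped;

        Receiver(String trackId) {
            this.trackId = trackId;
        }
    }

    final List<Receiver> receivers = new ArrayList<>();
    final List<String> trackSubscribedEvents = new ArrayList<>();

    // A remote track arrives. A leftover stopped receiver means the slot is
    // silently reused: no onAddTrack, so no addSubscribedMediaTrack and no
    // TrackSubscribed. Otherwise a new receiver is created and the event fires.
    void onRemoteTrack(String trackId) {
        for (Receiver r : receivers) {
            if (r.stopped) {
                return;
            }
        }
        receivers.add(new Receiver(trackId));
        trackSubscribedEvents.add(trackId);
    }

    // The server unpublishes a track; its receiver stays but is stopped.
    void onUnpublish(String trackId) {
        for (Receiver r : receivers) {
            if (r.trackId.equals(trackId)) {
                r.stopped = true;
            }
        }
    }

    // One session: the eager track (subscribed or not, depending on timing),
    // then the server's unpublish-old + publish-new. Returns true if the
    // republished track would reach the renderer.
    static boolean renders(boolean clientSubscribedToEagerTrackFirst) {
        TransceiverRaceToy subscriber = new TransceiverRaceToy();
        if (clientSubscribedToEagerTrackFirst) {
            subscriber.onRemoteTrack("eager");
        }
        subscriber.onUnpublish("eager");
        subscriber.onRemoteTrack("republished");
        return subscriber.trackSubscribedEvents.contains("republished");
    }
}
```

`renders(true)` is false (the black-screen session) and `renders(false)` is true (the session where the republish beat the subscribe), matching the two traces above. Removing the republish removes the only path to the failing branch.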

What this PR does

LocalStreamFactory.createCameraStream now reads Configuration.orientation and transposes the requested (width, height) to match. The capturer asks for capture in the device's actual orientation, LiveKit's publication metadata carries the rotated dims directly, and the server's eager-publish prediction lands at the correct orientation on the first try. No republish, no race, no black screen.

Tablets in landscape, phones in portrait, and any rotation-locked configuration are all handled by the same code path — no per-model or per-app config needed; callers can keep passing model.width/height as today.
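A sketch of the transpose, assuming a helper shaped like orientCaptureDims (the name appears in the Bugbot summary; its real signature is not shown here). The integer constants are local stand-ins for `android.content.res.Configuration`'s orientation values so the sketch runs off-device, `System.out` stands in for Android logging, and keeping the requested dims for any other orientation value is an assumption rather than confirmed behavior.

```java
// Hypothetical sketch; CaptureDimsSketch and the constants are stand-ins, not SDK code.
final class CaptureDimsSketch {
    static final int ORIENTATION_UNDEFINED = 0;
    static final int ORIENTATION_PORTRAIT = 1;
    static final int ORIENTATION_LANDSCAPE = 2;

    // Returns {width, height} for the capturer: portrait -> short x long,
    // landscape -> long x short, anything else -> the requested dims unchanged.
    static int[] orientCaptureDims(int width, int height, int orientation) {
        int shortSide = Math.min(width, height);
        int longSide = Math.max(width, height);
        int[] oriented;
        switch (orientation) {
            case ORIENTATION_PORTRAIT:
                oriented = new int[] {shortSide, longSide};
                break;
            case ORIENTATION_LANDSCAPE:
                oriented = new int[] {longSide, shortSide};
                break;
            default:
                oriented = new int[] {width, height};
        }
        // Log only when the result differs from the request.
        if (oriented[0] != width || oriented[1] != height) {
            System.out.println("capture dims transposed: " + width + "x" + height
                    + " -> " + oriented[0] + "x" + oriented[1]);
        }
        return oriented;
    }
}
```

On a device, the orientation would typically come from `context.getResources().getConfiguration().orientation`. Normalizing through min/max means the result does not depend on whether the caller passes landscape or portrait model dims, so `orientCaptureDims(1088, 624, ORIENTATION_PORTRAIT)` and `orientCaptureDims(624, 1088, ORIENTATION_PORTRAIT)` both return {624, 1088}.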

Coverage

| Device | Held | Capture dims | Publication metadata reports | Server prediction | Republish |
|---|---|---|---|---|---|
| Phone | Portrait | 624×1088 | 624×1088 | is_portrait=True | None |
| Phone | Landscape | 1088×624 | 1088×624 | is_portrait=False | None |
| Tablet | Landscape | 1088×624 | 1088×624 | is_portrait=False | None |
| Tablet | Portrait | 624×1088 | 624×1088 | is_portrait=True | None |

Test plan

  • On a portrait Android phone with lucy-2.1: run 10 sessions in a row and confirm every session reaches remote_frame_counter count=1 quickly and OUTPUT (REMOTE) renders. Before this change the same flow had ~37 % black-screen sessions; after it, every session should render.
  • Rotate phone to landscape (or run on a tablet locked to landscape): confirm capture dims log 1088×624 (no transpose) and the session still works.
  • Confirm that no "Republished LiveKit video track" log line appears in the server-side Datadog trace for any session in either orientation.
  • iOS equivalent is filed separately (decart-ios).

🤖 Generated with Claude Code


Note

Medium Risk
Changes realtime video capture / publication metadata on Android for all camera sessions; wrong orientation logic could affect resolution or server inference, but scope is a single factory helper with logging.

Overview
LocalStreamFactory.createCameraStream now transposes requested model (width, height) via orientCaptureDims so LiveKit capture and publication metadata use dimensions aligned with Configuration.orientation (portrait → short×long, landscape → long×short). Callers can keep passing landscape model dims; only the capture path changes.

That makes the server’s eager output-track publish match the first real frames, avoiding an output republish that was triggering a LiveKit Android subscriber race (TrackSubscribed never fired → black remote video on portrait sessions). Transposes are logged when they differ from the request.

Reviewed by Cursor Bugbot for commit 1953dbb. Bugbot is set up for automated code reviews on this repo.

nagar-decart force-pushed the fix/bug-b-orientation-aware-capture branch from 8177658 to 1953dbb on June 8, 2026 11:14
RealtimeModel definitions hard-code the model's natural landscape (W,H).
On a portrait-locked Android app the LiveKit CameraX publisher therefore
captures at landscape sensor orientation and relies on the WebRTC CVO
rotation extension to deliver portrait frames, but the publication
metadata reports the unrotated landscape dims.

Server-side, the realtime inference bridge eagerly publishes its output
track at the dims the input track advertised in its publication metadata,
so that the output SDP renegotiation overlaps the per-session
time-to-first-output-frame on the inference side (saves ~500 ms TTFF).
If the signaled dims don't match the orientation of the actual decoded
input frames, the bridge has to unpublish that track and republish at
the correct orientation when the first model output frame arrives.

That unpublish-old + publish-new sequence on the same participant
intermittently triggers a transceiver-reuse race on the subscriber side:
the existing stopped RtpReceiver remains in the PeerConnection, so
libwebrtc never fires onAddTrack for the new track,
addSubscribedMediaTrack never runs, TrackSubscribed for the new track
never reaches RoomEvent, and the renderer is never attached. The app
shows a black screen for the rest of the session. Empirically this hit
~37 % of portrait-locked sessions on Lucy 2.1 with this SDK in
production.

Transposing the requested capture dims to match the device's current
orientation makes the publication metadata carry the actually-displayed
dims. The server predicts the right orientation on the first try, no
republish happens, and the transceiver-reuse race is never triggered.
Tablets in landscape, phones in portrait, and any future rotation-locked
configurations are all handled by reading Configuration.orientation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nagar-decart force-pushed the fix/bug-b-orientation-aware-capture branch from 1953dbb to c4bfaf2 on June 8, 2026 15:57