Skip to content

Record replay audio#252

Open
rgarcia wants to merge 7 commits into
mainfrom
hypeship/replay-audio
Open

Record replay audio#252
rgarcia wants to merge 7 commits into
mainfrom
hypeship/replay-audio

Conversation

@rgarcia
Copy link
Copy Markdown
Contributor

@rgarcia rgarcia commented May 26, 2026

Summary

  • add optional audio capture to ffmpeg replay recordings using the PulseAudio output monitor
  • start a shared PulseAudio null sink in the browser images and keep the monitor active with a silent sink input
  • keep audio aligned through the full recording by avoiding forced audio PTS reset and using faster low-latency x264 encoding so ffmpeg does not fall behind real time
  • add e2e coverage that records browser audio, downloads the MP4, verifies an audio track, checks the audio stream duration reaches the end of the recording, and includes a Zombocom archive sample recording

Tests

  • go test ./lib/recorder -count=1
  • DOCKER_BUILDKIT=1 docker build -f images/chromium-headful/Dockerfile -t onkernel/chromium-headful-test:latest .
  • TESTCONTAINERS_RYUK_DISABLED=true TESTCONTAINERS_HOST_OVERRIDE=127.0.0.1 RECORDING_ZOMBO_OUTPUT_PATH=/home/agent/repos/kernel-images/server/e2e/testdata/replay-audio-zombocom.mp4 go test ./e2e -run 'TestReplayRecording(IncludesAudioTrack|ZombocomArchiveAudio)' -count=1 -v

Note

Medium Risk
Touches container boot order, ffmpeg encoding, and recording defaults (supervisor env defaults RECORD_AUDIO to true while the Go default remains false); misconfiguration could affect startup timing or produce silent/misaligned audio.

Overview
Adds optional audio to browser replay recordings by capturing from a PulseAudio null sink (KernelOutput / KernelOutput.monitor) and muxing it into MP4s via ffmpeg when RECORD_AUDIO is enabled.

Runtime & images: A shared start-pulseaudio.sh replaces the headful-only script; headless gains pulseaudio and supervisord wiring. Pulse starts before Chromium (with socket readiness), autorestarts, and passes PULSE_* / AUDIO_SOURCE into kernel-images-api and the chromium launcher. Headless default flags drop --mute-audio so page audio can reach the sink.

Recorder: RECORD_AUDIO and AUDIO_SOURCE config, validation, and Linux ffmpeg args add a Pulse input, AAC encode, explicit stream maps, veryfast/zerolatency video, and timestamp tweaks (no wallclock timestamps / no async aresample).

Tests & tooling: Docker e2e tests assert downloaded replays have a non-silent audio track and duration aligned with the container; a large verify_replay_pipeline.mjs script exercises full replay vs golden MP4 (optional SSIM/audio correlation).

Reviewed by Cursor Bugbot for commit ce8e4a2. Bugbot is set up for automated code reviews on this repo. Configure here.

@rgarcia rgarcia marked this pull request as ready for review May 26, 2026 21:26
@firetiger-agent
Copy link
Copy Markdown

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR modifies chromium launcher, wrapper, recorder, and config packages rather than the kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) specified in the filter.

To monitor this PR anyway, reply with @firetiger monitor this.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Recording cap shorter than script
    • I made the recording duration configurable per test and increased the Zombocom archive scenario cap to 70 seconds so recording no longer ends before the Playwright flow completes.

Create PR

Or push these changes by commenting:

@cursor push 2f6a9b26ef
Preview (2f6a9b26ef)
diff --git a/server/e2e/e2e_recording_audio_test.go b/server/e2e/e2e_recording_audio_test.go
--- a/server/e2e/e2e_recording_audio_test.go
+++ b/server/e2e/e2e_recording_audio_test.go
@@ -52,7 +52,7 @@
 		return await page.title();
 	`, audioSite.ContainerURL())
 
-	recordReplayAudio(t, ctx, c, playwrightCode, os.Getenv("RECORDING_AUDIO_OUTPUT_PATH"), 0.1)
+	recordReplayAudio(t, ctx, c, playwrightCode, os.Getenv("RECORDING_AUDIO_OUTPUT_PATH"), 35, 0.1)
 }
 
 func TestReplayRecordingZombocomArchiveAudio(t *testing.T) {
@@ -132,16 +132,15 @@
 		return playback;
 	`
 
-	recordReplayAudio(t, ctx, c, playwrightCode, outputPath, 0.01)
+	recordReplayAudio(t, ctx, c, playwrightCode, outputPath, 70, 0.01)
 }
 
-func recordReplayAudio(t *testing.T, ctx context.Context, c *TestContainer, playwrightCode string, outputPath string, minPeakLevel float64) {
+func recordReplayAudio(t *testing.T, ctx context.Context, c *TestContainer, playwrightCode string, outputPath string, maxDuration int, minPeakLevel float64) {
 	t.Helper()
 
 	client, err := c.APIClient()
 	require.NoError(t, err, "failed to create API client")
 
-	maxDuration := 35
 	maxFileSize := 100
 	startResp, err := client.StartRecordingWithResponse(ctx, instanceoapi.StartRecordingJSONRequestBody{
 		MaxDurationInSeconds: &maxDuration,

You can send follow-ups to the cloud agent here.

client, err := c.APIClient()
require.NoError(t, err, "failed to create API client")

maxDuration := 35
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recording cap shorter than script

Medium Severity

recordReplayAudio caps ffmpeg at 35 seconds while the Zombocom Playwright script can run ~50+ seconds (30s selector timeout plus 20s of fixed waits). Recording stops mid-script, so the MP4 may omit most playback and audio checks can fail or pass on noise despite a successful browser run.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ee7fb17. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Pulse socket before sink ready
    • I load module-null-sink before module-native-protocol-unix so the Pulse socket appears only after KernelOutput is created.
  • ✅ Fixed: Keepalive exit kills PulseAudio
    • I replaced wait -n with a keepalive retry loop and wait only on PulseAudio so a pacat exit no longer terminates the daemon.

Create PR

Or push these changes by commenting:

@cursor push 8197b93a61
Preview (8197b93a61)
diff --git a/shared/start-pulseaudio.sh b/shared/start-pulseaudio.sh
--- a/shared/start-pulseaudio.sh
+++ b/shared/start-pulseaudio.sh
@@ -22,16 +22,12 @@
     --daemonize=no \
     --log-target=stderr \
     --exit-idle-time=-1 \
-    --load="module-native-protocol-unix socket=/tmp/pulse/native auth-anonymous=1" \
-    --load="module-null-sink sink_name=KernelOutput rate=48000 channels=2 sink_properties=device.description=KernelOutput" &
+    --load="module-null-sink sink_name=KernelOutput rate=48000 channels=2 sink_properties=device.description=KernelOutput" \
+    --load="module-native-protocol-unix socket=/tmp/pulse/native auth-anonymous=1" &
 
   pulse_pid=$!
-  keepalive_pid=""
 
   cleanup() {
-    if [ -n "$keepalive_pid" ]; then
-      kill "$keepalive_pid" 2>/dev/null || true
-    fi
     kill "$pulse_pid" 2>/dev/null || true
     wait 2>/dev/null || true
   }
@@ -44,10 +40,12 @@
     sleep 0.1
   done
 
-  (
-    pacat --raw --rate=48000 --channels=2 --format=s16le --device=KernelOutput /dev/zero
-  ) &
-  keepalive_pid=$!
+  while kill -0 "$pulse_pid" 2>/dev/null; do
+    pacat --raw --rate=48000 --channels=2 --format=s16le --device=KernelOutput /dev/zero || true
+    if kill -0 "$pulse_pid" 2>/dev/null; then
+      sleep 0.1
+    fi
+  done
 
-  wait -n "$pulse_pid" "$keepalive_pid"
+  wait "$pulse_pid"
   '

You can send follow-ups to the cloud agent here.

) &
keepalive_pid=$!

wait -n "$pulse_pid" "$keepalive_pid"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulse socket before sink ready

Medium Severity

Backgrounding pulseaudio exposes the Unix socket as soon as the protocol module loads, while KernelOutput may still be registering. The container wrapper only waits for the socket file, then starts Chromium with PULSE_SINK=KernelOutput, so the browser can launch before that sink exists and miss early replay audio.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f416d9d. Configure here.

) &
keepalive_pid=$!

wait -n "$pulse_pid" "$keepalive_pid"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keepalive exit kills PulseAudio

Medium Severity

The script uses wait -n on both pulseaudio and the pacat keepalive, and the EXIT trap always kills pulse_pid. If pacat exits while pulseaudio is still healthy, cleanup terminates the daemon and the whole supervisord program exits, causing an unnecessary audio stack restart.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit f416d9d. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Darwin audio path uses incompatible PulseAudio source name
    • Darwin now validates audio sources as numeric AVFoundation device indices or "none" in both parameter validation and ffmpeg argument construction, preventing invalid PulseAudio names from reaching ffmpeg.

Create PR

Or push these changes by commenting:

@cursor push 159fdd82c8
Preview (159fdd82c8)
diff --git a/server/lib/recorder/ffmpeg.go b/server/lib/recorder/ffmpeg.go
--- a/server/lib/recorder/ffmpeg.go
+++ b/server/lib/recorder/ffmpeg.go
@@ -89,8 +89,16 @@
 	if p.MaxDurationInSeconds != nil && *p.MaxDurationInSeconds <= 0 {
 		return fmt.Errorf("max duration must be greater than 0 seconds")
 	}
-	if p.recordAudio() && strings.TrimSpace(p.audioSource()) == "" {
-		return fmt.Errorf("audio source is required when recording audio")
+	if p.recordAudio() {
+		audioSource := strings.TrimSpace(p.audioSource())
+		if audioSource == "" {
+			return fmt.Errorf("audio source is required when recording audio")
+		}
+		if runtime.GOOS == "darwin" && audioSource != "none" {
+			if _, err := strconv.Atoi(audioSource); err != nil {
+				return fmt.Errorf("audio source must be a numeric device index or \"none\" on darwin")
+			}
+		}
 	}
 
 	return nil
@@ -511,10 +519,15 @@
 	case "darwin":
 		audioDevice := "none"
 		if recordAudio {
-			audioDevice = params.audioSource()
-			if strings.TrimSpace(audioDevice) == "" {
+			audioDevice = strings.TrimSpace(params.audioSource())
+			if audioDevice == "" {
 				return nil, fmt.Errorf("audio source is required when recording audio")
 			}
+			if audioDevice != "none" {
+				if _, err := strconv.Atoi(audioDevice); err != nil {
+					return nil, fmt.Errorf("audio source must be a numeric device index or \"none\" on darwin")
+				}
+			}
 		}
 		args = []string{
 			// Input options for AVFoundation

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit ce8e4a2. Configure here.

"-pixel_format", "nv12",
// Input file
"-i", fmt.Sprintf("%d:none", *params.DisplayNum), // Screen capture, no audio
"-i", fmt.Sprintf("%d:%s", *params.DisplayNum, audioDevice),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin audio path uses incompatible PulseAudio source name

Low Severity

The Darwin/macOS path in ffmpegArgs passes params.audioSource() (which defaults to "KernelOutput.monitor") as the AVFoundation audio device identifier. AVFoundation expects a numeric device index (e.g., "0", "1") or "none", not a PulseAudio source name. When recordAudio is true on macOS, ffmpeg will fail at runtime with an invalid device error despite Validate() passing. The validation only checks for an empty string, not platform compatibility.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ce8e4a2. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant