ci(hub-client-e2e): fix Playwright install hang (Node pin) + smoke-all seed/sync flakiness #249

Merged: gordonwoodhull merged 2 commits into main from chore/playwright-install-hang on Jun 1, 2026

@gordonwoodhull gordonwoodhull commented Jun 1, 2026

Fixes two independent problems that were keeping the Hub-Client E2E workflow red. CI is now green (run 26772812401: 78 passed / 14 flaky / 0 failed functional, 6 visual passed, ~13 min with 2 workers).

1. Install hang (Node + Playwright)

The Install Playwright step hung until the 6 h job cap (every run since ~2026-05-27). Root cause: a yauzl extraction-hang regression in Node ≥ 24.16 — after the browser zip downloads to 100 %, extraction deadlocks. It affects Playwright < 1.60.0 (we pin @playwright/test at ^1.50.0 → 1.58.0), and our CI runs node-version: '24' (floats past 24.16). Host-independent; not the CDN. Refs microsoft/playwright#40724, fixed by #40747 in PW 1.60.0.

Fix: pin Node to 24.15.0 (last release before the regression). Keeps Playwright 1.58 / Chromium 145, so visual baselines are unchanged. This is a stopgap — the durable fix is upgrading to PW ≥ 1.60 and dropping the pin (tracked in bd-2njja; deferred because 1.60 bumps Chromium 145 → 148 and needs a visual-baseline regen).
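
The affected combination described above can be written as a small check. This is a minimal sketch that encodes only the range stated in this PR (Node ≥ 24.16.0 together with @playwright/test < 1.60.0). The names `parseVersion`, `compareVersions`, and `installMayHang` are hypothetical and are not part of the workflow or of Playwright.

```typescript
// Hypothetical helper: encodes only the condition stated in this PR
// (Node >= 24.16.0 combined with @playwright/test < 1.60.0).
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Accepts "24.15.0" or "v24.15.0"; rejects anything that is not major.minor.patch.
function parseVersion(text: string): Version {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) {
    throw new Error(`not a major.minor.patch version: ${text}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// True when the pair falls in the range this PR describes as hanging.
function installMayHang(nodeVersion: string, playwrightVersion: string): boolean {
  const nodeAffected =
    compareVersions(parseVersion(nodeVersion), { major: 24, minor: 16, patch: 0 }) >= 0;
  const playwrightUnfixed =
    compareVersions(parseVersion(playwrightVersion), { major: 1, minor: 60, patch: 0 }) < 0;
  return nodeAffected && playwrightUnfixed;
}
```

With the pin in place, `installMayHang("24.15.0", "1.58.0")` is false. Once Playwright reaches 1.60.0 the check is false for any Node version, which is the condition for dropping the pin.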

2. smoke-all flakiness (seed-before-synced race)

Unblocking the install surfaced pre-existing smoke-all flakiness (same signature on the 2026-05-27 run; not caused by the Node pin). Root cause is a test-harness race, not the render pipeline, not auth (the 401 on /auth/me is the benign anonymous response), and not parallelism (verified: workers: 1 did not help):

seedProjectInBrowser → addProject writes IDB only. A seeded project reaches the synced project set only via reconcileIntoConnectedProjectSet, which the app runs on the status → connected transition and which requires isConnected(). The test seeds after the set is already connected, so that effect never re-fires; the seed lands in the set only if a fortuitous WS reconnect re-triggers reconcile. When it doesn't, the project is absent from the set, navigation to /p/<id> drops to the landing page, and waitForPreviewRender times out at 75 s. Confirmed via the failing run's trace + final screenshot ("No projects yet", empty set).
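
The race can be reproduced in a toy model. The sketch below is not the app's code: `ToyProjectSet` and its members are hypothetical stand-ins, with one set playing the role of IndexedDB and another the connected project set. It assumes only what is described above, namely that reconcile runs on the transition into connected and that seeding writes local storage only.

```typescript
// Toy model of the seed-before-synced race. All names are hypothetical stand-ins.
type Status = "connecting" | "connected";

class ToyProjectSet {
  private status: Status = "connecting";
  readonly local = new Set<string>(); // stands in for IndexedDB
  readonly synced = new Set<string>(); // stands in for the connected project set

  // Mirrors "writes IDB only": nothing reaches the synced set here.
  addProject(id: string): void {
    this.local.add(id);
  }

  // Reconcile fires only on the transition INTO "connected",
  // not on every call that leaves the status at "connected".
  setStatus(next: Status): void {
    const transitioned = this.status !== "connected" && next === "connected";
    this.status = next;
    if (transitioned) {
      this.reconcile();
    }
  }

  // Idempotent: re-adding an id that is already synced changes nothing.
  reconcile(): void {
    if (this.status !== "connected") {
      return;
    }
    this.local.forEach((id) => {
      this.synced.add(id);
    });
  }
}
```

Seeding after `setStatus("connected")` leaves the project out of `synced` until something calls `reconcile()` again, either a reconnect (which toggles the status) or an explicit call from the test harness.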

Fix (keeps the full VFS → Automerge → WASM → preview path end-to-end — just stops racing it): expose the live projectSetService + the idempotent reconciler on window.__quartoTest, and have seedProjectInBrowser wait for a real peer connection, run the reconciler, and wait until the project is observably present in the connected set before returning. Bounded 30 s waits fail loudly if the sync server is truly unreachable. workers stays at 2.
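
The wait-reconcile-wait sequence can be sketched as follows. This is an illustration under assumptions, not the harness code: `waitUntil`, `ProjectSetLike`, and `awaitSeedSynced` are hypothetical names, and the real helper presumably runs inside the page against whatever `window.__quartoTest` exposes. The 30 s default mirrors the bounded wait described above.

```typescript
// Generic bounded poll. Name and signature are illustrative, not the harness's API.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  description: string,
  timeoutMs: number = 30000,
  intervalMs: number = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) {
      return;
    }
    if (Date.now() >= deadline) {
      // Fail loudly with the specific cause instead of a later, vaguer timeout.
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${description}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical shape of the hooks a test would need from the app.
interface ProjectSetLike {
  isConnected(): boolean;
  reconcile(): Promise<void>;
  hasProject(id: string): boolean;
}

// Wait for a connection, reconcile explicitly, then wait until the seed is visible.
async function awaitSeedSynced(
  set: ProjectSetLike,
  projectId: string,
  timeoutMs: number = 30000,
): Promise<void> {
  await waitUntil(() => set.isConnected(), "a peer connection", timeoutMs);
  await set.reconcile();
  await waitUntil(
    () => set.hasProject(projectId),
    `project ${projectId} in the connected set`,
    timeoutMs,
  );
}
```

Because each wait carries its own description, an unreachable sync server would surface as "Timed out ... waiting for a peer connection" rather than as a 75 s preview-render timeout.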

Rejected the alternative of seeding content locally — it would bypass the very integration smoke-all exists to exercise.

Follow-ups (tracked)

  • bd-2njja — upgrade @playwright/test ≥ 1.60.0, drop the Node pin, regen Chromium-148 visual baselines.
  • bd-3nzyd — residual smoke-all flakiness: 14 tests still pass-on-retry (likely project file-doc sync during render, a different doc than the project-set race fixed here); harden similarly.

Note on main

Two earlier direct-to-main commits (5f16330f, 78f563b7) tried CDN-based install fixes that didn't work; this PR restores the pristine workflow and supersedes them.

Node 24.16.0 introduced a yauzl stream-destruction regression that hangs
`for await` over `openReadStream`. With @playwright/test 1.58 (< 1.60),
`playwright install chromium` hangs forever right after the browser
download reaches 100% — extraction deadlocks and the job burns the full
6h cap. This is what has failed every Hub-Client E2E run since ~2026-05-27
(the runner's Node 24.x floated past 24.16).

Pin node-version to 24.15.0, the last release before the regression. It
keeps Playwright 1.58 / Chromium 145, so the visual-regression baselines
are unchanged. Remove the pin once @playwright/test is bumped to >= 1.60.0
(the upstream fix, microsoft/playwright#40747), regenerating the visual
baselines for the newer Chromium at that time.

Refs microsoft/playwright#40724.
gordonwoodhull force-pushed the chore/playwright-install-hang branch from 6ba3a81 to faa48e5 on June 1, 2026 15:34
gordonwoodhull marked this pull request as draft on June 1, 2026 15:35
gordonwoodhull changed the title from "Fix Hub-Client E2E Playwright install hang (diagnostic WIP)" to "ci(hub-client-e2e): pin Node 24.15.0 to unblock Playwright install hang (interim; real fix = PW 1.60)" on Jun 1, 2026
gordonwoodhull marked this pull request as ready for review on June 1, 2026 15:54
gordonwoodhull marked this pull request as draft on June 1, 2026 17:11
gordonwoodhull force-pushed the chore/playwright-install-hang branch from acddaaa to faa48e5 on June 1, 2026 17:46
…et (de-flake smoke-all)

smoke-all is flaky (predates the install hang; ~2/6 fail/flaky of ~92,
varying set, worker-count-independent). Root cause is a seed/sync race,
not the render pipeline and not parallelism:

  seedProjectInBrowser -> projectStorage.addProject writes IDB ONLY. A
  seeded project reaches the *synced* project set only via
  reconcileIntoConnectedProjectSet, which the app runs in a useProjectSet
  effect keyed on the status->connected TRANSITION and which requires
  isConnected(). Because the test seeds AFTER bootstrapProjectSet already
  reached connected, that effect never re-fires, so the seed lands in the
  set only if a fortuitous WS reconnect re-triggers reconcile. When it
  doesn't, the project is absent from the set, navigation to /p/<id> drops
  to the landing page, and waitForPreviewRender times out at 75s.
  (Confirmed via the failing run's trace + final screenshot: 'No projects
  yet', empty set; clean network, the 401 on /auth/me is benign.)

Fix (Tier 2 — keep the full Automerge path end-to-end, just stop racing
it): expose the live projectSetService + the idempotent reconciler on
window.__quartoTest, and have seedProjectInBrowser wait for a real peer
connection, run the reconciler, and wait until the project is observably
present in the connected set before returning. Bounded 30s waits fail
loudly if the sync server is truly unreachable instead of surfacing as a
75s preview-render timeout.

Rejected Tier 1 (seed content locally) — it would bypass the very VFS ->
Automerge -> WASM -> preview integration smoke-all exists to exercise.
workers stays at 2 (parallelism was never the cause; verified workers:1
did not help). Follow-up (product, separate): app could reconcile on IDB
change, not only on the status transition. Refs bd-3nzyd.
gordonwoodhull changed the title from "ci(hub-client-e2e): pin Node 24.15.0 to unblock Playwright install hang (interim; real fix = PW 1.60)" to "ci(hub-client-e2e): fix Playwright install hang (Node pin) + smoke-all seed/sync flakiness" on Jun 1, 2026
gordonwoodhull marked this pull request as ready for review on June 1, 2026 18:31
gordonwoodhull merged commit dfbc271 into main on Jun 1, 2026
5 checks passed
gordonwoodhull deleted the chore/playwright-install-hang branch on June 1, 2026 18:31