ci(hub-client-e2e): fix Playwright install hang (Node pin) + smoke-all seed/sync flakiness #249
Merged
Node 24.16.0 introduced a yauzl stream-destruction regression that hangs `for await` over `openReadStream`. With @playwright/test 1.58 (< 1.60), `playwright install chromium` hangs forever right after the browser download reaches 100%: extraction deadlocks and the job burns the full 6h cap. This is what has failed every Hub-Client E2E run since ~2026-05-27 (the runner's Node 24.x floated past 24.16).

Pin node-version to 24.15.0, the last release before the regression. It keeps Playwright 1.58 / Chromium 145, so the visual-regression baselines are unchanged.

Remove the pin once @playwright/test is bumped to >= 1.60.0 (the upstream fix, microsoft/playwright#40747), regenerating the visual baselines for the newer Chromium at that time.

Refs microsoft/playwright#40724.
…et (de-flake smoke-all)

smoke-all is flaky (predates the install hang; ~2/6 fail/flaky of ~92, varying set, worker-count-independent). Root cause is a seed/sync race, not the render pipeline and not parallelism:

seedProjectInBrowser -> projectStorage.addProject writes IDB ONLY. A seeded project reaches the *synced* project set only via reconcileIntoConnectedProjectSet, which the app runs in a useProjectSet effect keyed on the status->connected TRANSITION and which requires isConnected(). Because the test seeds AFTER bootstrapProjectSet already reached connected, that effect never re-fires, so the seed lands in the set only if a fortuitous WS reconnect re-triggers reconcile. When it doesn't, the project is absent from the set, navigation to /p/<id> drops to the landing page, and waitForPreviewRender times out at 75s. (Confirmed via the failing run's trace + final screenshot: 'No projects yet', empty set; clean network, the 401 on /auth/me is benign.)

Fix (Tier 2: keep the full Automerge path end-to-end, just stop racing it): expose the live projectSetService + the idempotent reconciler on window.__quartoTest, and have seedProjectInBrowser wait for a real peer connection, run the reconciler, and wait until the project is observably present in the connected set before returning. Bounded 30s waits fail loudly if the sync server is truly unreachable instead of surfacing as a 75s preview-render timeout.

Rejected Tier 1 (seed content locally): it would bypass the very VFS -> Automerge -> WASM -> preview integration smoke-all exists to exercise.

workers stays at 2 (parallelism was never the cause; verified workers:1 did not help).

Follow-up (product, separate): app could reconcile on IDB change, not only on the status transition.

Refs bd-3nzyd.
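The race and the fix can be modelled in a few lines. This is an illustrative in-memory sketch only: `FakeProjectSet`, `waitUntil`, `seedLocalOnly`, and `seedAndWait` are hypothetical names, not the app's real `projectSetService` or the actual `seedProjectInBrowser`. It assumes, as the commit message describes, that reconciliation runs automatically only on the transition to connected and that the reconciler is idempotent.

```typescript
// Illustrative model; every name here is hypothetical, not the app's real API.
type Status = "connecting" | "connected";

class FakeProjectSet {
  status: Status = "connecting";
  readonly local = new Set<string>();  // stands in for IDB
  readonly synced = new Set<string>(); // stands in for the connected project set

  // Mirrors the described effect: reconcile fires only on the -> connected transition.
  setStatus(next: Status): void {
    const transitioned = this.status !== "connected" && next === "connected";
    this.status = next;
    if (transitioned) this.reconcile();
  }

  // Idempotent: re-adding ids that are already synced changes nothing.
  reconcile(): void {
    if (this.status !== "connected") return;
    this.local.forEach((id) => this.synced.add(id));
  }
}

// Poll until check() is true, or throw once the bounded timeout elapses.
async function waitUntil(
  check: () => boolean,
  label: string,
  timeoutMs = 30_000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!check()) {
    if (Date.now() >= deadline) {
      throw new Error(`timed out after ${timeoutMs} ms waiting for ${label}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Racy seed: writes locally and returns; nothing triggers a reconcile.
function seedLocalOnly(set: FakeProjectSet, id: string): void {
  set.local.add(id);
}

// Fixed seed: wait for the connection, reconcile explicitly, then wait
// until the project is observably present before returning.
async function seedAndWait(
  set: FakeProjectSet,
  id: string,
  timeoutMs = 30_000,
): Promise<void> {
  set.local.add(id);
  await waitUntil(() => set.status === "connected", "peer connection", timeoutMs);
  set.reconcile();
  await waitUntil(() => set.synced.has(id), `project ${id} in connected set`, timeoutMs);
}
```

In this model, calling `seedLocalOnly` after `setStatus("connected")` leaves the project out of `synced`, whereas `seedAndWait` either returns with the project present or throws within the bounded timeout if the set never connects.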
Fixes two independent problems that were keeping the Hub-Client E2E workflow red. CI is now green (run 26772812401: 78 passed / 14 flaky / 0 failed functional, 6 visual passed, ~13 min with 2 workers).
1. Install hang (Node + Playwright)

The `Install Playwright` step hung until the 6 h job cap (every run since ~2026-05-27). Root cause: a yauzl extraction-hang regression in Node ≥ 24.16. After the browser zip downloads to 100 %, extraction deadlocks. It affects Playwright < 1.60.0 (we pin `@playwright/test` at `^1.50.0` → 1.58.0), and our CI runs `node-version: '24'` (floats past 24.16). Host-independent; not the CDN. Refs microsoft/playwright#40724, fixed by #40747 in PW 1.60.0.

Fix: pin Node to 24.15.0 (last release before the regression). Keeps Playwright 1.58 / Chromium 145, so visual baselines are unchanged. This is a stopgap; the durable fix is upgrading to PW ≥ 1.60 and dropping the pin (tracked in bd-2njja; deferred because 1.60 bumps Chromium 145 → 148 and needs a visual-baseline regen).
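As a rough illustration of the affected combination, the version ranges above can be written as a small check. This is a hypothetical helper, not part of the PR; `installMayHang`, `parseVersion`, and `compare` are made-up names, and the ranges are taken literally from the description (Node ≥ 24.16.0 together with Playwright < 1.60.0). Whether every later Node release line is affected is not something this sketch can confirm.

```typescript
// Hypothetical helper: encodes only the ranges stated in the description.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  // Accepts "v24.15.0" or "24.15.0"; any pre-release suffix is ignored.
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v);
  if (!m) throw new Error(`not a semver version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}

// True when the Node/Playwright pair falls in the range described as hanging.
function installMayHang(nodeVersion: string, playwrightVersion: string): boolean {
  const nodeAffected = compare(parseVersion(nodeVersion), [24, 16, 0]) >= 0;
  const playwrightUnfixed = compare(parseVersion(playwrightVersion), [1, 60, 0]) < 0;
  return nodeAffected && playwrightUnfixed;
}
```

With the pinned combination (Node 24.15.0, Playwright 1.58.0) the check returns `false`; it also returns `false` once Playwright reaches 1.60.0, which matches the condition for dropping the pin.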
2. `smoke-all` flakiness (seed-before-synced race)

Unblocking the install surfaced pre-existing `smoke-all` flakiness (same signature on the 2026-05-27 run; not caused by the Node pin). Root cause is a test-harness race, not the render pipeline, not auth (the 401 on `/auth/me` is the benign anonymous response), and not parallelism (verified: `workers: 1` did not help): `seedProjectInBrowser` writes to IDB only and runs after the project set has already reached connected, so the app's reconcile effect, keyed on the status → connected transition, never re-fires and the seeded project is missing from the synced set.

Fix (keeps the full VFS → Automerge → WASM → preview path end-to-end, just stops racing it): expose the live `projectSetService` + the idempotent reconciler on `window.__quartoTest`, and have `seedProjectInBrowser` wait for a real peer connection, run the reconciler, and wait until the project is observably present in the connected set before returning. Bounded 30 s waits fail loudly if the sync server is truly unreachable. `workers` stays at 2.

Rejected the alternative of seeding content locally: it would bypass the very integration `smoke-all` exists to exercise.

Follow-ups (tracked)
- `@playwright/test` ≥ 1.60.0, drop the Node pin, regen Chromium-148 visual baselines.
- `smoke-all` flakiness: 14 tests still pass-on-retry (likely project file-doc sync during render, a different doc than the project-set race fixed here); harden similarly.

Note on main

Two earlier direct-to-main commits (`5f16330f`, `78f563b7`) tried CDN-based install fixes that didn't work; this PR restores the pristine workflow and supersedes them.