Why
The COG-WSI writer's tile-ordering pass (and the transcode/downsample pipeline)
is nondeterministic: regenerating the same input yields a pixel-identical
but byte-different file. This bit us concretely this session:
Regenerating the cog-wsi/CMU-1-Small-Region_cog-wsi.tiff fixture produced a
different SHA on each run, so its wsi-fixtures manifest SHA can only ever pin one
arbitrary build; the fixture cannot be reproduced from source.
The nondeterminism forces all output-equivalence checks to use hash --mode pixel rather than
a cheap file SHA (a documented gotcha), and it blocks committing byte goldens.
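To illustrate why a file SHA and a pixel-level hash disagree here, the following is a minimal Go sketch. The `Tile` type and the `fileSHA` and `contentSHA` functions are hypothetical names made up for this example; they are not the writer's real types and not the actual `hash --mode pixel` implementation. The sketch assumes that the only difference between two runs is the order in which otherwise identical tiles are emitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Tile is a hypothetical stand-in for one encoded tile in the output file.
type Tile struct {
	Index int    // position of the tile in the image grid
	Data  []byte // encoded tile bytes
}

// fileSHA hashes tile bytes in the order they were emitted,
// the way a whole-file SHA sees them.
func fileSHA(tiles []Tile) string {
	h := sha256.New()
	for _, t := range tiles {
		h.Write(t.Data)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// contentSHA hashes tiles in grid order regardless of emission order,
// which approximates what a pixel-level comparison checks.
func contentSHA(tiles []Tile) string {
	sorted := append([]Tile(nil), tiles...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })
	return fileSHA(sorted)
}

func main() {
	// Two runs with the same tiles, emitted in different orders.
	runA := []Tile{{0, []byte("aa")}, {1, []byte("bb")}, {2, []byte("cc")}}
	runB := []Tile{{1, []byte("bb")}, {0, []byte("aa")}, {2, []byte("cc")}}

	fmt.Println("file SHA equal:   ", fileSHA(runA) == fileSHA(runB))       // false
	fmt.Println("content SHA equal:", contentSHA(runA) == contentSHA(runB)) // true
}
```

Under that assumption the byte-level hash changes with emission order while the order-independent hash does not, which matches the symptom described above.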
Proposal
A deterministic write mode — a fixed, reproducible tile-emission order (e.g.
a --deterministic flag, or make the default order deterministic when workers/
ordering would otherwise vary) such that convert --to cog-wsi (and the other
streaming writers) produce byte-identical output for identical input + flags.
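One way to pin down a "fixed, reproducible tile-emission order" is to define a total order over tile keys and always emit in that order. The Go sketch below assumes a tile is identified by pyramid level, row, and column; the `TileKey` type and the level-then-row-then-column ordering are illustrative assumptions, not the layout the COG-WSI writer actually uses.

```go
package main

import (
	"fmt"
	"sort"
)

// TileKey is a hypothetical tile identifier: pyramid level, then grid position.
type TileKey struct {
	Level, Row, Col int
}

// less defines one possible fixed emission order: level, then row, then column.
func less(a, b TileKey) bool {
	if a.Level != b.Level {
		return a.Level < b.Level
	}
	if a.Row != b.Row {
		return a.Row < b.Row
	}
	return a.Col < b.Col
}

// emissionOrder returns a sorted copy of keys, so the result depends only on
// the set of tiles and not on the order in which they arrived.
func emissionOrder(keys []TileKey) []TileKey {
	out := append([]TileKey(nil), keys...)
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	// Arrival order as it might come out of a parallel pipeline.
	arrived := []TileKey{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {0, 0, 0}}
	fmt.Println(emissionOrder(arrived)) // [{0 0 0} {0 0 1} {0 1 0} {1 0 0}]
}
```

Any total order works as long as it is a function of the tile keys alone; which order is best for a COG layout is a separate question this sketch does not try to answer.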
Payoff
Reproducible fixtures: wsi-fixtures can regenerate a fixture and get the same
SHA (today it can't — see wsi-fixtures#1, where generated fixtures need a "regen
won't reproduce bytes" provenance caveat).
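Removes the per-test "use pixel hash not file SHA" footgun (relevant to the round-trip harness issue, "CI regression harness: write→read round-trip + committed pixel-hash goldens", #2).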
Notes
Scope: the internal/tiff/cogwsiwriter finalize/tile-order pass and the internal/pipeline worker-pool ordering. Determinism must hold across worker
counts (reorder-on-write, or stable-sort the spool by tile index before emit).
Keep the nondeterministic/parallel fast path as default if it matters for
throughput; --deterministic opt-in is fine for fixtures/goldens.
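The reorder-on-write option mentioned in the notes could look roughly like the Go sketch below. It is a sketch under stated assumptions, not the existing internal/pipeline code: `result`, `writeInOrder`, and `encodeAll` are hypothetical names, tiles are assumed to carry a dense index starting at 0, and `bytes.ToUpper` stands in for the real transcode step. Workers finish in arbitrary order, and the writer buffers early arrivals until their predecessors have been written.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// result is a hypothetical unit of work coming out of the worker pool.
type result struct {
	index int
	data  []byte
}

// writeInOrder consumes results in arrival order but writes them in index
// order, holding back any tile that arrives before its predecessors.
func writeInOrder(w io.Writer, results <-chan result) error {
	pending := make(map[int][]byte)
	next := 0
	var firstErr error
	for r := range results {
		if firstErr != nil {
			continue // keep draining so blocked workers can exit
		}
		pending[r.index] = r.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			if _, err := w.Write(data); err != nil {
				firstErr = err
				break
			}
			delete(pending, next)
			next++
		}
	}
	if firstErr != nil {
		return firstErr
	}
	if len(pending) != 0 {
		return fmt.Errorf("tile %d never arrived; %d tiles left unwritten", next, len(pending))
	}
	return nil
}

// encodeAll fans tiles out to the given number of workers and returns the
// bytes written by the ordered writer.
func encodeAll(tiles [][]byte, workers int) ([]byte, error) {
	jobs := make(chan int)
	results := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for idx := range jobs {
				// Stand-in for the real transcode/downsample work.
				results <- result{index: idx, data: bytes.ToUpper(tiles[idx])}
			}
		}()
	}
	go func() {
		for idx := range tiles {
			jobs <- idx
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()
	var out bytes.Buffer
	err := writeInOrder(&out, results)
	return out.Bytes(), err
}

func main() {
	tiles := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	one, _ := encodeAll(tiles, 1)
	many, _ := encodeAll(tiles, 4)
	fmt.Println(string(one), string(many), bytes.Equal(one, many)) // ABCD ABCD true
}
```

The trade-off is memory: in the worst case the `pending` map holds every tile that finished before the slowest one. That cost is one reason an opt-in flag may be preferable to changing the default path.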