
Deterministic write mode for cogwsiwriter (reproducible output + byte-goldens) #3

Description

@cornish

Why

The COG-WSI writer's tile-ordering pass (and the transcode/downsample pipeline)
is nondeterministic: regenerating the same input yields a pixel-identical but
byte-different file. This bit us concretely this session:

  • Regenerating the cog-wsi/CMU-1-Small-Region_cog-wsi.tiff fixture produced a
    different SHA each run, so its wsi-fixtures manifest SHA can only ever pin one
    arbitrary build — you can't reproduce it from source.
  • It forces all output-equivalence checks to use hash --mode pixel rather than
    a cheap file-SHA (documented gotcha), and blocks committing byte-goldens.

Proposal

Add a deterministic write mode: a fixed, reproducible tile-emission order (e.g.
a --deterministic flag, or a default order that stays deterministic when worker
count or scheduling would otherwise vary) such that convert --to cog-wsi (and the
other streaming writers) produces byte-identical output for identical input + flags.
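As a rough illustration of what "fixed tile-emission order" could mean, here is a minimal Go sketch that puts spooled tiles into a canonical order before emit. The spooledTile type, its fields, and the (level, index) sort key are all hypothetical; the real cogwsiwriter spool and the order a COG layout actually requires may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// spooledTile is a hypothetical record for one encoded tile waiting in the
// spool. It is not the actual cogwsiwriter type.
type spooledTile struct {
	Level int    // pyramid level (assumed: 0 = full resolution)
	Index int    // row-major tile index within the level
	Data  []byte // encoded tile payload
}

// emitOrder returns a copy of the spool sorted by (Level, Index), so the
// result does not depend on the order in which workers finished.
func emitOrder(spool []spooledTile) []spooledTile {
	out := make([]spooledTile, len(spool))
	copy(out, spool)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Level != out[j].Level {
			return out[i].Level < out[j].Level
		}
		return out[i].Index < out[j].Index
	})
	return out
}

// layout concatenates tile payloads in emit order, standing in for the
// bytes the writer would put on disk.
func layout(spool []spooledTile) string {
	s := ""
	for _, t := range emitOrder(spool) {
		s += string(t.Data)
	}
	return s
}

func main() {
	// The same three tiles, arriving in two different completion orders.
	a := []spooledTile{{0, 1, []byte("B")}, {1, 0, []byte("C")}, {0, 0, []byte("A")}}
	b := []spooledTile{{1, 0, []byte("C")}, {0, 0, []byte("A")}, {0, 1, []byte("B")}}
	fmt.Println(layout(a) == layout(b))
}
```

Any total order over tiles would do here; what matters is that the key is derived from the tile's position and never from arrival time.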

Payoff

  • Reproducible fixtures: wsi-fixtures can regenerate a fixture and get the same
    SHA (today it can't — see wsi-fixtures#1, where generated fixtures need a "regen
    won't reproduce bytes" provenance caveat).
  • Enables byte-goldens in CI (stronger + cheaper than pixel-hash goldens; see
    the round-trip harness issue, #2: "CI regression harness: write→read
    round-trip + committed pixel-hash goldens").
  • Removes the per-test "use pixel hash not file SHA" footgun.
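For context, a byte-golden check boils down to comparing a file's digest against a committed value. The sketch below uses only the Go standard library; the function names and the command-line shape are made up for illustration and are not part of the existing tooling.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// hashReader streams r through SHA-256 and returns the hex digest.
// Streaming avoids loading a large slide file into memory.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(b []byte) string {
	sum, _ := hashReader(bytes.NewReader(b)) // reading from memory cannot fail
	return sum
}

// matchesGolden reports whether the file at path has the expected digest.
func matchesGolden(path, golden string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	got, err := hashReader(f)
	if err != nil {
		return false, err
	}
	return got == golden, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: goldencheck <file> <sha256-hex>")
		os.Exit(2)
	}
	ok, err := matchesGolden(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ok)
}
```

A check like this is only meaningful once the writer is deterministic; until then the digest changes on every regeneration and pixel-hash comparison remains the only option.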

Notes

  • Scope: the internal/tiff/cogwsiwriter finalize/tile-order pass and the
    internal/pipeline worker-pool ordering. Determinism must hold across worker
    counts (reorder-on-write, or stable-sort the spool by tile index before emit).
  • Keep the nondeterministic/parallel fast path as default if it matters for
    throughput; --deterministic opt-in is fine for fixtures/goldens.
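The reorder-on-write option from the scope note can be sketched as follows: workers finish tiles in any order, and a single writer holds early arrivals until every lower index has been written. This is a simplified stand-in, not the internal/pipeline code; encode, result, and writeOrdered are hypothetical names, and the output is an in-memory buffer rather than a TIFF.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// result pairs an encoded tile with its position in the emit order.
type result struct {
	index int
	data  []byte
}

// encode is a placeholder for the real transcode/downsample step.
func encode(i int) []byte {
	return []byte(fmt.Sprintf("tile-%03d;", i))
}

// writeOrdered encodes n tiles on the given number of workers (assumed >= 1)
// and writes them strictly in index order, whatever order they complete in.
func writeOrdered(n, workers int) []byte {
	jobs := make(chan int)
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results <- result{index: i, data: encode(i)}
			}
		}()
	}

	// Feed tile indices, then close results once all workers are done.
	go func() {
		for i := 0; i < n; i++ {
			jobs <- i
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	var out bytes.Buffer
	pending := make(map[int][]byte) // tiles that arrived ahead of their turn
	next := 0                       // next index allowed to be written
	for r := range results {
		pending[r.index] = r.data
		// Flush every consecutive tile that is now ready.
		for {
			data, ok := pending[next]
			if !ok {
				break
			}
			out.Write(data)
			delete(pending, next)
			next++
		}
	}
	return out.Bytes()
}

func main() {
	one := writeOrdered(64, 1)
	many := writeOrdered(64, 8)
	fmt.Println(bytes.Equal(one, many))
}
```

The pending map is unbounded in this sketch: if one early tile is slow, later tiles pile up behind it. A real implementation would likely cap it (for example by limiting how far ahead workers may run), which is the throughput cost the opt-in flag is meant to contain.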
