I just thought of an addition to the website that IMO would serve well as marketing material for zarr and also be fun.
I'd love to propose a new section of the website where users can present their zarr stores (they need to be public). We could keep track of which zarr store is the largest (either by space on disk or by number of chunks) and what technology was used (native zarr, virtual icechunk, etc.), with an optional link to docs that, e.g., go into the ingestion code.
A simple sortable/filterable table on the site, something like:
| Store Link | Total Data on Disk | Total Chunks | Tech Implementation | Submitted By | Last Verified | More Info / Docs |
| --- | --- | --- | --- | --- | --- | --- |
| s3://example-bucket/era5-reanalysis.zarr | 2.4 PB | 1.2B | native zarr v3 | alice | 2026-05-01 | Ingestion write-up |
| gs://example/cmip6-mirror.zarr | 850 TB | 410M | icechunk | bob | 2026-04-22 | Repo |
| s3://example/sentinel2-mosaic | 312 TB | 88M | virtual icechunk | carol | 2026-05-10 | Docs |
| https://example.org/goes16.zarr | 95 TB | 22M | native zarr v3 | dave | 2026-03-15 | - |
| s3://example/noaa-hrrr.zarr | 41 TB | 9.5M | native zarr v2 | erin | 2026-05-20 | Notebook |

- Store Link — must point to a publicly readable store (anonymous S3/GCS/HTTPS access).
- Total Data on Disk — uncompressed-on-disk size, reported by the submitter and ideally reproducible from store metadata.
- Total Chunks — total number of chunks across all arrays in the store.
- Tech Implementation — controlled vocabulary so the column stays sortable/filterable. Suggested values:
native zarr v2, native zarr v3, icechunk, virtual icechunk, virtualizarr. (Open to additions.)
- Submitted By — GitHub handle for attribution and credibility.
- Last Verified — date the stats were last confirmed; helps keep numbers honest as stores grow or change.
- More Info / Docs — optional link to a blog post, repo, notebook, or docs page explaining the ingestion / use case.
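
To make the "Total Chunks" column reproducible, here is a minimal sketch of how a submitter might derive it from array metadata. It assumes a regular chunk grid (no sharding, no variable chunking) and counts grid positions, not chunks actually written, so a sparse store would report fewer objects on disk. The function names and the example shapes are hypothetical.

```python
import math


def count_chunks(shape, chunk_shape):
    """Number of chunk-grid positions for one array with a regular chunk grid."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same number of dimensions")
    # Integer ceiling division per dimension; a partial edge chunk counts as one chunk.
    # A 0-dimensional array yields an empty product, i.e. exactly one chunk.
    return math.prod(-(-size // chunk) for size, chunk in zip(shape, chunk_shape))


def total_chunks(arrays):
    """Sum chunk counts over an iterable of (shape, chunk_shape) pairs."""
    return sum(count_chunks(shape, chunks) for shape, chunks in arrays)


# Hypothetical store with two arrays (made-up shapes, for illustration only).
example_arrays = [
    ((8760, 721, 1440), (24, 721, 1440)),  # 365 chunks along the first axis
    ((721, 1440), (721, 1440)),            # a single chunk
]
print(total_chunks(example_arrays))
```

In practice the `(shape, chunk_shape)` pairs would be read from each array's metadata in the store.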
Something like:
- store_link: s3://example-bucket/era5-reanalysis.zarr
  size_bytes: 2_400_000_000_000_000
  total_chunks: 1_200_000_000
  tech: native-zarr-v3
  submitted_by: alice
  last_verified: 2026-05-01
  docs_url: https://example.com/era5-blog
could probably serve as a user-supplied source for the page.
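
As a rough sketch of how the site build could turn such entries into the table, the snippet below validates each entry and renders a Markdown table sorted by size. It works on plain dicts (as they would come out of a YAML parser) to stay dependency-free. The `ALLOWED_TECH` slugs, the required-field list, and all function names are assumptions for illustration, not an agreed schema.

```python
# Assumed slugs, loosely following the suggested vocabulary; not an agreed list.
ALLOWED_TECH = {"native-zarr-v2", "native-zarr-v3", "icechunk", "virtual-icechunk", "virtualizarr"}
REQUIRED_FIELDS = ("store_link", "size_bytes", "total_chunks", "tech", "submitted_by", "last_verified")


def format_size(num_bytes):
    """Format a byte count with decimal (SI) units, e.g. 2.4 PB."""
    value = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB", "PB"):
        if value < 1000:
            break
        value /= 1000
    else:
        unit = "EB"  # anything beyond PB is reported in EB
    text = f"{value:.1f}".rstrip("0").rstrip(".")
    return f"{text} {unit}"


def validate(entry):
    """Return a list of problems with an entry; an empty list means it looks fine."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "tech" in entry and entry["tech"] not in ALLOWED_TECH:
        problems.append(f"unknown tech: {entry['tech']}")
    return problems


def render_table(entries):
    """Render valid entries as a Markdown table, largest store first."""
    valid = [e for e in entries if not validate(e)]
    rows = sorted(valid, key=lambda e: e["size_bytes"], reverse=True)
    lines = [
        "| Store Link | Total Data on Disk | Total Chunks | Tech | Submitted By | Last Verified | Docs |",
        "| --- | --- | --- | --- | --- | --- | --- |",
    ]
    for e in rows:
        lines.append(
            f"| {e['store_link']} | {format_size(e['size_bytes'])} | {e['total_chunks']:,} "
            f"| {e['tech']} | {e['submitted_by']} | {e['last_verified']} | {e.get('docs_url', '-')} |"
        )
    return "\n".join(lines)


# Example entries mirroring the placeholder rows in this proposal.
entries = [
    {"store_link": "s3://example/noaa-hrrr.zarr", "size_bytes": 41_000_000_000_000,
     "total_chunks": 9_500_000, "tech": "native-zarr-v2",
     "submitted_by": "erin", "last_verified": "2026-05-20"},
    {"store_link": "s3://example-bucket/era5-reanalysis.zarr", "size_bytes": 2_400_000_000_000_000,
     "total_chunks": 1_200_000_000, "tech": "native-zarr-v3",
     "submitted_by": "alice", "last_verified": "2026-05-01",
     "docs_url": "https://example.com/era5-blog"},
]
print(render_table(entries))
```

Storing `size_bytes` as a raw integer and formatting only at render time keeps the column sortable without parsing strings like "2.4 PB".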
Some open questions and my thoughts about them:
- Verification: We could verify the numbers for public stores either on submission or when rebuilding the page. I think this is overkill for now, given that this is meant as a fun addition.
- At some point we will probably want to show only the N largest stores, but I'd wait on this until many submissions have come in.
- Should we include datasets that are free to access but require some sort of sign-up (NASA EDL etc.)? Again, I'd punt this to later.
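
If the page ever does need to show only the N largest stores, the cut-off itself is cheap to add later. A minimal sketch, assuming entries are dicts with the `size_bytes` and `total_chunks` fields from the example above (the function name is hypothetical):

```python
import heapq


def largest_stores(entries, n, key="size_bytes"):
    """Return the n largest entries by the given numeric field, largest first."""
    return heapq.nlargest(n, entries, key=lambda entry: entry[key])


# Placeholder entries mirroring the example table in this proposal.
entries = [
    {"store_link": "s3://example/sentinel2-mosaic", "size_bytes": 312_000_000_000_000, "total_chunks": 88_000_000},
    {"store_link": "s3://example-bucket/era5-reanalysis.zarr", "size_bytes": 2_400_000_000_000_000, "total_chunks": 1_200_000_000},
    {"store_link": "gs://example/cmip6-mirror.zarr", "size_bytes": 850_000_000_000_000, "total_chunks": 410_000_000},
]

for entry in largest_stores(entries, 2):
    print(entry["store_link"])
```

Passing `key="total_chunks"` would rank by chunk count instead of size.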
Happy to implement this, just wanted to float the idea first.