feat: add SupportsSetRange protocol and store implementations by d-v-b · Pull Request #3907 · zarr-developers/zarr-python

d-v-b · 2026-04-15T08:44:20Z

Adds a protocol for stores that support synchronously and asynchronously writing bytes into a range of the target object. Only MemoryStore and LocalStore implement this.

This behavior is necessary to enable an in-place writing mode for shards, e.g. where a single subchunk is written without rewriting the entire shard.

Add SupportsSetRange protocol for stores that support writing to a byte range within an existing value (set_range/set_range_sync). Implement in MemoryStore and LocalStore, both explicitly subclassing the protocol. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
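
A minimal sketch of what such an opt-in protocol could look like. The method names `set_range` and `set_range_sync` come from the description above; the exact signatures, the plain `bytes` payload, and the toy `DictStore` class are assumptions made for illustration and are not the code in this PR. The PR's stores subclass the protocol explicitly, whereas this toy store satisfies it structurally.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetRange(Protocol):
    """Stores that can overwrite a byte range inside an existing value."""

    async def set_range(self, key: str, value: bytes, start: int) -> None: ...

    def set_range_sync(self, key: str, value: bytes, start: int) -> None: ...


class DictStore:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self) -> None:
        self._data: dict[str, bytearray] = {}

    def set_sync(self, key: str, value: bytes) -> None:
        self._data[key] = bytearray(value)

    def get_sync(self, key: str) -> bytes:
        return bytes(self._data[key])

    def set_range_sync(self, key: str, value: bytes, start: int) -> None:
        target = self._data[key]  # raises KeyError if the key does not exist
        end = start + len(value)
        if start < 0 or end > len(target):
            raise ValueError("range must lie within the existing value")
        target[start:end] = value  # same-length slice assignment, no resize

    async def set_range(self, key: str, value: bytes, start: int) -> None:
        self.set_range_sync(key, value, start)


store = DictStore()
store.set_sync("shard", b"\x00" * 8)
store.set_range_sync("shard", b"\xff\xff", 2)
asyncio.run(store.set_range("shard", b"\x01", 7))  # write ending at the end of the value
```

Because the protocol is `runtime_checkable`, `isinstance(store, SupportsSetRange)` only checks that the two methods exist; it says nothing about their behavior.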

codecov · 2026-04-15T08:54:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.57%. Comparing base (6ce787d) to head (6aa4e6d).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3907      +/-   ##
==========================================
+ Coverage   93.55%   93.57%   +0.01%     
==========================================
  Files          88       88              
  Lines       11896    11930      +34     
==========================================
+ Hits        11129    11163      +34     
  Misses        767      767

Files with missing lines	Coverage Δ
src/zarr/abc/store.py	`96.47% <100.00%> (+0.04%)`	⬆️
src/zarr/storage/_local.py	`97.42% <100.00%> (+0.17%)`	⬆️
src/zarr/storage/_memory.py	`96.95% <100.00%> (+0.21%)`	⬆️



maxrjones · 2026-05-15T21:23:53Z

Should this get the same design pivot that started in #3925 (comment), regarding protocols vs. ABC methods?

Do you remember where you were previously pointed to using protocols over methods despite the weight of our existing store API? It might be helpful to quickly jot down our decisions here (use methods for now, plan for a better protocol-based store API in the future) in either https://zarr.readthedocs.io/en/stable/contributing/ or a CLAUDE/AGENTS.md for future reference.

d-v-b · 2026-05-15T21:33:01Z

Byte-range writes are an optional behavior that only a handful of "niche" stores support (local and memory). There's not really a sensible fallback or default implementation (unlike get_ranges), so it makes sense for stores to opt in rather than opt out.

And if we made this a method on the Store ABC, callers would need to check for NotImplementedError to figure out if the store really supports it, and the method would clutter the signatures of most stores that will never support it (cloud storage).

I don't think we can categorically say "no" to adding functionality to stores or codecs via protocols. There's already a precedent for defining extra functionality with semi-structural mixins: see class ArrayBytesCodecPartialEncodeMixin (zarr-python/src/zarr/abc/codec.py, line 256 in 7e58df0). Arguably this should have been a protocol from the start.
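
A rough sketch of the opt-in check argued for above, as seen from a call site. Everything here is hypothetical: `WholeValueStore`, `RangeStore`, and `write_subchunk` are illustrative names, and the read-modify-write branch is just one possible thing a caller could do when the store has not opted in.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetRange(Protocol):
    def set_range_sync(self, key: str, value: bytes, start: int) -> None: ...


class WholeValueStore:
    """Hypothetical store that can only replace whole values (e.g. object storage)."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def get_sync(self, key: str) -> bytes:
        return self.data[key]

    def set_sync(self, key: str, value: bytes) -> None:
        self.data[key] = value


class RangeStore(WholeValueStore):
    """Hypothetical store that opts in by also providing set_range_sync."""

    def set_range_sync(self, key: str, value: bytes, start: int) -> None:
        buf = bytearray(self.data[key])
        buf[start : start + len(value)] = value
        self.data[key] = bytes(buf)


def write_subchunk(store: WholeValueStore, key: str, payload: bytes, start: int) -> str:
    # Capability check up front: no need to call and catch NotImplementedError.
    if isinstance(store, SupportsSetRange):
        store.set_range_sync(key, payload, start)
        return "in-place"
    # Store did not opt in: rewrite the whole value instead.
    whole = bytearray(store.get_sync(key))
    whole[start : start + len(payload)] = payload
    store.set_sync(key, bytes(whole))
    return "rewrite"


plain, ranged = WholeValueStore(), RangeStore()
for s in (plain, ranged):
    s.set_sync("shard", b"\x00" * 6)
modes = [write_subchunk(s, "shard", b"\x07\x07", 2) for s in (plain, ranged)]
```

Stores that never support range writes keep an unchanged method surface, and the caller decides what to do without relying on exceptions for control flow.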

maxrjones · 2026-05-15T22:08:19Z

I'd prefer someone whose work is more oriented towards local/HPC filesystems review this PR if they're available and willing (@LDeakin and @ilan-gold come to mind).

I'm not fully prepared to discuss tradeoffs, but lack of a concurrency/atomicity contract in the docstring raised a few questions for me:

Is parallel set_range to disjoint ranges of the same key supposed to be safe? The motivating sharded-write use case suggests yes, but LocalStore doesn't seem to have locking.
Is set_range racing against set defined?
Should crash-mid-write atomicity be a protocol requirement?

d-v-b · 2026-05-15T22:19:10Z

> Is parallel set_range to disjoint ranges of the same key supposed to be safe? The motivating sharded-write use case suggests yes, but LocalStore doesn't seem to have locking.

Yes, in the two target stores (local and memory), disjoint range writes should be safe. Overlapping range writes will have order-dependent behavior.
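
A small illustration of the disjoint case, assuming a local file and writers that each open their own file handle. The helper name `put_range`, the thread pool, and the chunk sizes are illustrative choices; this shows the expected outcome on a typical local filesystem and is not a proof of any guarantee.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def put_range(path: Path, value: bytes, start: int) -> None:
    """Overwrite len(value) bytes at offset `start` in an existing file."""
    with path.open("r+b") as f:
        f.seek(start)
        f.write(value)


chunk_nbytes = 4
n_chunks = 8

with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard"
    shard.write_bytes(b"\x00" * (chunk_nbytes * n_chunks))

    # Each task owns exactly one non-overlapping slot of the file.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(put_range, shard, bytes([i + 1]) * chunk_nbytes, i * chunk_nbytes)
            for i in range(n_chunks)
        ]
        for fut in futures:
            fut.result()  # re-raise any error from the worker

    result = shard.read_bytes()

expected = b"".join(bytes([i + 1]) * chunk_nbytes for i in range(n_chunks))
```

Since no two tasks touch the same bytes, the final contents do not depend on the order in which the tasks run. If two tasks targeted overlapping ranges, the last writer would win for the shared bytes.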

> Is set_range racing against set defined?

set + set_range is a race condition, but so are concurrent sets.

> Should crash-mid-write atomicity be a protocol requirement?

Probably. zarr-python 2.x used a write to a temporary file plus a rename for atomicity. We don't do that now, but we should!

It's worth keeping in mind that there is just 1 intended caller of this method, and only under very special circumstances: the sharding codec when the inner chunks have deterministic compressed sizes. I don't know when this method would be called outside that context.
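
For reference, the temporary-file-plus-rename technique mentioned above can be sketched as follows. This is a generic illustration and not zarr-python 2.x's actual code; the function name `atomic_write` and the `.partial` suffix are assumptions.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file."""
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # atomic swap on POSIX filesystems
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chunk"
    atomic_write(target, b"old")
    atomic_write(target, b"new contents")
    contents = target.read_bytes()
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "chunk"]
```

Note the tension with range writes: this approach rewrites the whole value, which is exactly the cost an in-place subchunk write is meant to avoid.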

mkitti

I think we may need to try some kind of file or thread locking. At the very least, we should ensure that the partial writes are atomic.

mkitti · 2026-05-29T11:46:01Z

+def _put_range(path: Path, value: Buffer, start: int) -> None:
+    """Write bytes at a specific offset within an existing file."""
+    with path.open("r+b") as f:
+        f.seek(start)
+        f.write(value.as_numpy_array().tobytes())


Do we need a lock here? Or maybe use pwrite when possible and this shim otherwise?

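import threading

file_lock = threading.Lock()

def pwrite_cross_platform(file_object, data, offset):
    # Serialize seek + write, then restore the caller's file position.
    with file_lock:
        current_pos = file_object.tell()
        try:
            file_object.seek(offset)
            file_object.write(data)
        finally:
            file_object.seek(current_pos)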
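
One way the "pwrite when possible, shim otherwise" idea could look at the file-descriptor level. `os.pwrite` is only available on Unix-like platforms, so the sketch checks for it at call time; the function name `write_at` and the module-level lock are assumptions, and the lock only serializes callers inside one Python process.

```python
import os
import tempfile
import threading
from pathlib import Path

_fallback_lock = threading.Lock()


def write_at(fd: int, data: bytes, offset: int) -> int:
    """Write all of `data` at `offset`; return the number of bytes written."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        if hasattr(os, "pwrite"):
            # Positional write: does not touch the descriptor's file offset.
            n = os.pwrite(fd, view[written:], offset + written)
        else:
            # Shim: seek + write must not interleave with other threads.
            with _fallback_lock:
                os.lseek(fd, offset + written, os.SEEK_SET)
                n = os.write(fd, view[written:])
        written += n
    return written


tmp_fd, tmp_name = tempfile.mkstemp()
os.close(tmp_fd)
path = Path(tmp_name)
path.write_bytes(b"\x00" * 8)

fd = os.open(path, os.O_RDWR | getattr(os, "O_BINARY", 0))
try:
    count = write_at(fd, b"\xaa\xbb", 3)
finally:
    os.close(fd)

result = path.read_bytes()
path.unlink()
```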

d-v-b · 2026-05-29

If we introduce locking, we need to be careful about the guarantees we make. Thread-based locks will only block races in the same Python process, which means Dask can still set up races. And file-based locks are only reliable on a subset of storage backends.

mkitti · 2026-05-29

The bare minimum is to document the need for some kind of concurrency control when doing this.

d-v-b · 2026-05-29

Sounds good, I can add that. The only anticipated consumer of this API will be the sharding codec when targeting uncompressed data in memory or local storage, so the burden of coordination will be on that call site.

mkitti · 2026-05-29T18:39:05Z

Thinking about this a little more, I'm not sure if we should expose all of this as public API.

For the public API, what we could do in lieu of file locking is use a context manager that does the following:

On entering the context, moves, copies, or creates a hard link of the current file being edited to a temporary "lock" file. This is similar to the atomic file write mechanisms that are currently implemented.
Allows for partial writes to the file of interest within the context
On exit, flushes the partial writes to disk and then "moves" the current file back to the original location.
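
The three steps above can be sketched as a context manager. This is one possible reading of the proposal (the "copy" variant, with partial writes going to the working copy), and the name `partial_writes` and the `.lock` suffix are hypothetical.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Iterator


@contextlib.contextmanager
def partial_writes(path: Path) -> Iterator[BinaryIO]:
    """Yield a handle for partial writes; commit them all at once on exit."""
    # Step 1: copy the current file to a temporary working file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".lock")
    os.close(fd)
    try:
        shutil.copyfile(path, tmp_name)
        with open(tmp_name, "r+b") as f:
            # Step 2: the caller seeks and writes as often as needed.
            yield f
            # Step 3a: flush the partial writes to disk.
            f.flush()
            os.fsync(f.fileno())
        # Step 3b: move the working file over the original location.
        os.replace(tmp_name, path)
    except BaseException:
        # On failure the original file is left untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard"
    shard.write_bytes(b"a" * 8)

    with partial_writes(shard) as f:
        f.seek(2)
        f.write(b"XX")
        f.seek(6)
        f.write(b"Y")
    committed = shard.read_bytes()

    try:
        with partial_writes(shard) as f:
            f.write(b"ZZZZ")
            raise RuntimeError("simulated crash")
    except RuntimeError:
        pass
    after_failure = shard.read_bytes()
```

The copy makes every session cost a full read and write of the file, so this variant trades the performance benefit of range writes for all-or-nothing visibility. A hard-link or move variant would have different costs and failure modes.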

What we have here now increasingly looks to me like low-level helper functions rather than something that should be exposed directly to external client software. Also, this encourages clients to aggregate partial chunk updates which would increase the potential overhead of opening and closing the file many times.

Under parallel execution, we should encourage a single-writer multiple-reader (SWMR) pattern, a concept that I'm borrowing from HDF5:

In this case, there should be a single writer that holds the context and lock for doing partial writes to the file. Other parallel units then should communicate with the single writer if they have data that needs to be written. This allows the single writer to coordinate concurrency and ensure consistent atomic partial writes. In some cases, the single writer might even be able to coalesce writes into a single operation. This might involve some buffering of the writes in memory before actually flushing these to disk. If we are smart about memory paging, we could write in units of "super-chunks" that provide more granularity than an entire shard file but that are coarser than individual chunks. At the moment, the I/O bottleneck is often not sheer write speed but rather the number of system calls and the required context switching.
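
A toy version of the single-writer pattern described above, using a thread-safe queue as the communication channel. All names (`coalesce`, `run_single_writer`, `_STOP`) are hypothetical, the producers are threads in one process, and the coalescing step assumes the queued writes do not overlap.

```python
import queue
import tempfile
import threading
from pathlib import Path

_STOP = object()  # sentinel telling the writer to finish


def coalesce(writes):
    """Merge writes whose byte ranges are directly adjacent into single runs."""
    merged = []
    for offset, data in sorted(writes, key=lambda w: w[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + data)
        else:
            merged.append((offset, data))
    return merged


def run_single_writer(path: Path, inbox: queue.Queue) -> None:
    """The only code that touches the file: drain, coalesce, write."""
    with path.open("r+b") as f:
        done = False
        while not done:
            batch = [inbox.get()]  # block until there is work
            while True:  # then take whatever else is already queued
                try:
                    batch.append(inbox.get_nowait())
                except queue.Empty:
                    break
            done = any(item is _STOP for item in batch)
            for offset, data in coalesce([w for w in batch if w is not _STOP]):
                f.seek(offset)
                f.write(data)
            f.flush()


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard"
    shard.write_bytes(b"\x00" * 16)
    inbox: queue.Queue = queue.Queue()

    writer = threading.Thread(target=run_single_writer, args=(shard, inbox))
    writer.start()

    # Producers never open the file; they only submit (offset, bytes) requests.
    producers = [
        threading.Thread(target=inbox.put, args=((i * 4, bytes([i + 1]) * 4),))
        for i in range(4)
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    inbox.put(_STOP)
    writer.join()
    result = shard.read_bytes()
```

How many requests end up in one batch depends on timing, so the number of actual `write` calls varies between runs while the final file contents do not.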

d-v-b · 2026-05-29T18:52:11Z

@mkitti thanks, that's insightful. I'm perfectly happy making this private API for now. But I would caution that what you propose sounds like a major (albeit helpful) change to the way stores work today, whereas just adding this method here as-is is closer to the minimal required functionality to support atomic subchunk writes inside a shard.

We should probably agree on scope. IMO, in the short term it would be useful to expose range writes via an opt-in mechanism. In the medium-to-long term, we should figure out an elegant and safe way to express this functionality.

mkitti · 2026-05-29T20:08:40Z

My suggestion is actually trying to make this operation look more like the prior store operations. I started to have to worry about locks and concurrency because with these partial writes it became unclear if the key-value pair had a single owner or multiple owners.

The prior store API methods were either pure functions (get_*) or operated on the entire value associated with a key. These partial write operations break that because they manipulate part of the value, allowing the possibility of multiple partial updates which may have to be deconflicted because it is not clear if the partial updates are coordinated.

My suggestion above is more consistent with the prior API because it forces a macro operation on the entire value. It just implements that operation as a series of micro operations within the macro operation. The important aspect here is that the ownership of the whole value must be clear.

Once we establish atomic ownership of the whole key-value, I think there is now a clear path to extend this operation to other stores. These partial updates could be implemented as whole value updates. The partial write operation is just an optimization available to certain stores.

d-v-b · 2026-05-29T20:18:50Z

> I started to have to worry about locks and concurrency because with these partial writes it became unclear if the key-value pair had a single owner or multiple owners.

I am not sure thinking about things in terms of ownership helps us much here. When Zarr writes to files / objects, there's always the possibility of a third party accessing the same file / object at the same time. In fact, that's kind of the whole point of the feature in this PR: we want multiple loosely coordinated writers to write subchunks to the same shard. Those writers might be separate threads, or even separate processes. This is only safe under special conditions, but under those conditions it would be a very useful feature.

So in general a single store instance doesn't own the objects it describes. That's unrealistic given the open nature of general storage backends, and also the specific goal of this PR. It might be more realistic to focus on failure modes: e.g., when can a store operation leave a file / object in an undefined state, and what can we do about that?

mkitti · 2026-05-29T21:11:06Z

Prior to this pull request, though, I don't think you would expect a value to have originated partly from one write and partly from another write, leaving the combined value in a perhaps inconsistent state. It would be one or the other. There indeed may still be a race condition, but at least the chunk would be internally consistent and decodable.

The owner could partition that ownership out to concurrent processes by issuing licenses to certain byte ranges. That could occur by perhaps providing a buffer or a dict-like store structure to write into.

The array provided by TensorStore's virtual chunked to user functions is one example of this:
https://google.github.io/tensorstore/python/api/tensorstore.virtual_chunked.html#

Another example is how ImarisWriter works.

d-v-b · 2026-05-30T07:02:23Z

> Prior to this pull request, though, I don't think you would expect a value to have originated partly from one write and partly from another write, leaving the combined value in a perhaps inconsistent state. It would be one or the other. There indeed may still be a race condition, but at least the chunk would be internally consistent and decodable.
>
> The owner could partition that ownership out to concurrent processes by issuing licenses to certain byte ranges. That could occur by perhaps providing a buffer or a dict-like store structure to write into.
>
> The array provided by TensorStore's virtual chunked to user functions is one example of this: https://google.github.io/tensorstore/python/api/tensorstore.virtual_chunked.html#
>
> Another example is how ImarisWriter works.

Agreed, I think our sharding write path is missing a key : value model for the contents of a shard. If the keys are strings / bytes and the values are buffers and / or arrays, we would need a transformation on that key : value model that resolves each key to a byte range (with space for the index). We have something roughly like this today, but IMO the abstraction is not complete. Partitioning these keys (byte ranges) among workers serves as the substrate for a coordination / allocation mechanism.
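
To make the key-to-byte-range idea concrete, here is a rough sketch under strong assumptions: every inner chunk has the same encoded size, chunks are laid out in C order, and a fixed number of bytes is reserved for the index at the start of the shard. The layout, the function names `byte_ranges` and `partition`, and the round-robin assignment are all illustrative and do not describe the sharding codec's actual format.

```python
from itertools import product


def byte_ranges(chunks_per_shard, chunk_nbytes, index_nbytes=0):
    """Map each inner-chunk coordinate to its (start, stop) byte range."""
    ranges = {}
    coords_iter = product(*(range(n) for n in chunks_per_shard))  # C order
    for i, coords in enumerate(coords_iter):
        start = index_nbytes + i * chunk_nbytes
        ranges[coords] = (start, start + chunk_nbytes)
    return ranges


def partition(keys, n_workers):
    """Assign keys to workers round-robin; each key has exactly one owner."""
    keys = list(keys)
    return [keys[w::n_workers] for w in range(n_workers)]


ranges = byte_ranges((2, 2), chunk_nbytes=16, index_nbytes=64)
assignments = partition(ranges, n_workers=2)
```

Because the ranges are disjoint by construction, handing each worker its own list of keys also hands it exclusive byte ranges, which is the coordination property the discussion is after.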

But I think this is out of scope for this PR. I think we need to get to that final state incrementally, and IMO the first step is simply defining a low-level routine stores can use for writing a byte range into an object, which is the goal of this PR.

If that makes sense, then:

what needs to change in this PR?
what should the follow-up PRs contain?

github-actions Bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Apr 15, 2026

Merge branch 'main' into feat/byte-range-setter

e57bd5a

d-v-b and others added 2 commits April 15, 2026 11:03

test: add tests for SupportsSetRange on MemoryStore and LocalStore

579ff16

Tests cover isinstance check, async set_range, sync set_range_sync, and edge case (writing at end of value). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: changelog

2b9d804

github-actions Bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Apr 15, 2026

d-v-b mentioned this pull request Apr 15, 2026

perf: phased codecpipeline #3885

Open

Merge branch 'main' into feat/byte-range-setter

25f05e6

d-v-b requested a review from maxrjones April 21, 2026 19:12

d-v-b added 8 commits April 23, 2026 12:26

Merge branch 'main' into feat/byte-range-setter

f04f594

test: add tests for open / not open

5c26a08

fixup

91590dd

Merge branch 'main' into feat/byte-range-setter

7757ecd

Merge branch 'feat/byte-range-setter' of https://github.com/d-v-b/zar…

e7745ac

…r-python into feat/byte-range-setter

chore: mypy

a9da33a

Merge branch 'main' into feat/byte-range-setter

1af7019

Merge branch 'main' into feat/byte-range-setter

ca60e90

d-v-b added 2 commits May 15, 2026 23:33

Merge branch 'main' into feat/byte-range-setter

25760bf

Merge branch 'main' into feat/byte-range-setter

f3b8afe

Merge branch 'main' into feat/byte-range-setter

b70f70d

d-v-b requested a review from mkitti May 29, 2026 11:38

Merge branch 'main' into feat/byte-range-setter

7121616

mkitti requested changes May 29, 2026


Merge branch 'main' into feat/byte-range-setter

6aa4e6d


