Add S3 prefix to reduce rate limiting by fulmicoton-dd · Pull Request #6564 · quickwit-oss/quickwit

fulmicoton-dd · 2026-06-30T07:46:52Z

Description

We observed rate limiting from s3 on high workloads. Limits are set per prefix so that the fact that we are using ULID is counterproductive and create hotspots.

This PR introduces a new env variable called QW_RANDOM_SPLIT_PREFIX. When enabled, we will name
splits by moving 4 chars from back of the random section to the front of the split id. We also separate this prefix from the rest of the string with a _.

This PR contains two commits. One tries to encapsulate further the notion of Ulid.
The indexing split store was relying on ULID to get a notion of creation date that influence eviction.

The split store is meant to be removed once we have compactors. As an interim solution, we added local logic to identify the type of the split id and correctly extract the timestamp.

Deploy

This change is backward compatible but not forward compatible.
To deploy, one must deploy on all nodes and then flag flip the environment variable.

Replace the `pub type SplitId = String` alias with an opaque `SplitId(Arc<str>)` newtype so that no code outside of split id generation depends on the internal structure of a split id. This is the first step towards adding a random prefix to split ids (to spread S3 read load across prefixes), keeping that future change localized to the generator. The newtype is serde-transparent, so on-disk and wire representations are unchanged (backward compatible). It is adopted by SplitMetadata, the searcher split cache (where Ulid was used as an opaque key) and the indexing split store registry. Wire/proto types, search-internal hit tracking, CLI args and id-list helpers feeding proto requests stay String. After this change ulid::Ulid is referenced for split ids in only two places: - new_split_id(), the generation site - a temporary shim in the indexing split cache that recovers a split's creation time from the trailing ULID of its id, tolerant of a future random prefix. This store is slated for removal with the compactor service.

This is only done in the split id generation phase and ruled by a QW_RANDOM_SPLIT_PREFIX env variable.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e55baf50f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57b4f22961

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-07-01T12:36:22Z

-                continue;
-            };
+            // Split ids are opaque: any non-empty string is accepted (no ULID validation).
+            let split_id = SplitId::from(report_split.split_id);


Reject path separators before caching reported splits

When a reported split ID contains path components such as ../x or foo/bar, this now accepts it instead of rejecting it via ULID parsing. The split cache later turns that ID into filesystem paths in download_split and delete_evicted_splits with root_path.join(split_file(split_id)), so those IDs can escape the cache directory or cause repeated download failures because nested parent directories are not created. Please keep opaque split IDs path-safe, or reject separators/.., before registering them.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c264342b1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-07-01T16:18:51Z

+            let prefix = &ulid_str[22..]; // chars 22-25: random section
+            let body = &ulid_str[..22]; // chars 0-21: timestamp + most of random
+            SplitId::from(format!("{prefix}_{body}"))


Keep generated split IDs time-ordered for default sorting

When QW_RANDOM_SPLIT_PREFIX is enabled, this makes the stored split_id start with random ULID characters, so lexicographic split-id order is no longer creation order. That affects default searches that omit sort_fields: sort_by_from_request falls back to document-id descending, GlobalDocAddress compares the split id first, and the REST API documents that this default shows recent events first (docs/reference/rest-api.md:70). With prefixed IDs, default match_all/unsorted searches and search_after tie-breaking will prefer arbitrary random prefixes instead of newer splits; consider separating the S3 object key prefix from the logical split/document id, or teaching the default sort to reconstruct/use the ULID timestamp.

Useful? React with 👍 / 👎.

fulmicoton-dd force-pushed the split-id-newtype branch 13 times, most recently from 68bd42d to 2a13d2f Compare June 30, 2026 13:00

fulmicoton-dd force-pushed the split-id-newtype branch from 2a13d2f to 2d30d2a Compare June 30, 2026 13:06

fulmicoton-dd added 2 commits June 30, 2026 17:09

Added support for s3 prefixing.

24e2d16

This is only done in the split id generation phase and ruled by a QW_RANDOM_SPLIT_PREFIX env variable.

Adapting the timestamp from split id code to work on both format.

4e55baf

guilload approved these changes Jun 30, 2026

View reviewed changes

Comment thread quickwit/quickwit-indexing/src/actors/indexer.rs

Comment thread quickwit/quickwit-indexing/src/models/indexed_split.rs Outdated

Comment thread quickwit/quickwit-proto/src/types/split_id.rs Outdated

fulmicoton changed the title ~~Introduce SplitId(Arc<str>) newtype, confine Ulid to split id generation~~ Add S3 prefix to reduce rate limiting Jul 1, 2026

fulmicoton marked this pull request as ready for review July 1, 2026 12:07

fulmicoton requested review from a team as code owners July 1, 2026 12:07

chatgpt-codex-connector Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread quickwit/quickwit-indexing/src/split_store/indexing_split_cache.rs Outdated

fulmicoton-dd force-pushed the split-id-newtype branch from 9f9d13d to 57b4f22 Compare July 1, 2026 12:26

chatgpt-codex-connector Bot reviewed Jul 1, 2026

View reviewed changes

fulmicoton-dd force-pushed the split-id-newtype branch 4 times, most recently from 2d81d25 to 0c26434 Compare July 1, 2026 16:10

chatgpt-codex-connector Bot reviewed Jul 1, 2026

View reviewed changes

Code review

fecf26e

fulmicoton-dd force-pushed the split-id-newtype branch from 0c26434 to fecf26e Compare July 1, 2026 16:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add S3 prefix to reduce rate limiting#6564

Add S3 prefix to reduce rate limiting#6564
fulmicoton-dd wants to merge 4 commits into
mainfrom
split-id-newtype

fulmicoton-dd commented Jun 30, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

fulmicoton-dd commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Deploy

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fulmicoton-dd commented Jun 30, 2026 •

edited

Loading