Skip to content

Add S3 prefix to reduce rate limiting#6564

Open
fulmicoton-dd wants to merge 4 commits into
mainfrom
split-id-newtype
Open

Add S3 prefix to reduce rate limiting#6564
fulmicoton-dd wants to merge 4 commits into
mainfrom
split-id-newtype

Conversation

@fulmicoton-dd

@fulmicoton-dd fulmicoton-dd commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Description

We observed rate limiting from s3 on high workloads. Limits are set per prefix so that the fact that we are using ULID is counterproductive and create hotspots.

This PR introduces a new env variable called QW_RANDOM_SPLIT_PREFIX. When enabled, we will name
splits by moving 4 chars from back of the random section to the front of the split id. We also separate this prefix from the rest of the string with a _.

This PR contains two commits. One tries to encapsulate further the notion of Ulid.
The indexing split store was relying on ULID to get a notion of creation date that influence eviction.

The split store is meant to be removed once we have compactors. As an interim solution, we added local logic to identify the type of the split id and correctly extract the timestamp.

Deploy

This change is backward compatible but not forward compatible.
To deploy, one must deploy on all nodes and then flag flip the environment variable.

@fulmicoton-dd fulmicoton-dd force-pushed the split-id-newtype branch 13 times, most recently from 68bd42d to 2a13d2f Compare June 30, 2026 13:00
Replace the `pub type SplitId = String` alias with an opaque
`SplitId(Arc<str>)` newtype so that no code outside of split id
generation depends on the internal structure of a split id. This is the
first step towards adding a random prefix to split ids (to spread S3
read load across prefixes), keeping that future change localized to the
generator.

The newtype is serde-transparent, so on-disk and wire representations
are unchanged (backward compatible). It is adopted by SplitMetadata, the
searcher split cache (where Ulid was used as an opaque key) and the
indexing split store registry. Wire/proto types, search-internal hit
tracking, CLI args and id-list helpers feeding proto requests stay
String.

After this change ulid::Ulid is referenced for split ids in only two
places:
- new_split_id(), the generation site
- a temporary shim in the indexing split cache that recovers a split's
  creation time from the trailing ULID of its id, tolerant of a future
  random prefix. This store is slated for removal with the compactor
  service.
This is only done in the split id generation phase and ruled by a QW_RANDOM_SPLIT_PREFIX env variable.
Comment thread quickwit/quickwit-indexing/src/actors/indexer.rs
Comment thread quickwit/quickwit-indexing/src/models/indexed_split.rs Outdated
Comment thread quickwit/quickwit-proto/src/types/split_id.rs Outdated
@fulmicoton fulmicoton changed the title Introduce SplitId(Arc<str>) newtype, confine Ulid to split id generation Add S3 prefix to reduce rate limiting Jul 1, 2026
@fulmicoton fulmicoton marked this pull request as ready for review July 1, 2026 12:07
@fulmicoton fulmicoton requested review from a team as code owners July 1, 2026 12:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e55baf50f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread quickwit/quickwit-indexing/src/split_store/indexing_split_cache.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57b4f22961

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

continue;
};
// Split ids are opaque: any non-empty string is accepted (no ULID validation).
let split_id = SplitId::from(report_split.split_id);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject path separators before caching reported splits

When a reported split ID contains path components such as ../x or foo/bar, this now accepts it instead of rejecting it via ULID parsing. The split cache later turns that ID into filesystem paths in download_split and delete_evicted_splits with root_path.join(split_file(split_id)), so those IDs can escape the cache directory or cause repeated download failures because nested parent directories are not created. Please keep opaque split IDs path-safe, or reject separators/.., before registering them.

Useful? React with 👍 / 👎.

@fulmicoton-dd fulmicoton-dd force-pushed the split-id-newtype branch 4 times, most recently from 2d81d25 to 0c26434 Compare July 1, 2026 16:10

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c264342b1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +69 to +71
let prefix = &ulid_str[22..]; // chars 22-25: random section
let body = &ulid_str[..22]; // chars 0-21: timestamp + most of random
SplitId::from(format!("{prefix}_{body}"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep generated split IDs time-ordered for default sorting

When QW_RANDOM_SPLIT_PREFIX is enabled, this makes the stored split_id start with random ULID characters, so lexicographic split-id order is no longer creation order. That affects default searches that omit sort_fields: sort_by_from_request falls back to document-id descending, GlobalDocAddress compares the split id first, and the REST API documents that this default shows recent events first (docs/reference/rest-api.md:70). With prefixed IDs, default match_all/unsorted searches and search_after tie-breaking will prefer arbitrary random prefixes instead of newer splits; consider separating the S3 object key prefix from the logical split/document id, or teaching the default sort to reconstruct/use the ULID timestamp.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants