From ebe0399dca761ac450e027e8f1d90155a176c7ed Mon Sep 17 00:00:00 2001
From: Florin Braghis
Date: Tue, 23 Jun 2026 17:54:57 +0200
Subject: [PATCH 01/15] Add sorted map (Issue 1): walking skeleton for string/keyword keys

Implements on-disk sorted maps wrapping the Java SortedMap (rank-augmented
B-tree), behaving like Clojure's sorted-map for string and keyword keys.

- xitdb.util.sorted-key: order-preserving, reversible key codec (1-byte type
  tag + UTF-8) for strings and keywords.
- xitdb.util.sorted-operations: bridges wrapper types to Read/WriteSortedMap.
- xitdb.sorted-map: XITDBSortedMap (read) and XITDBWriteSortedMap (write),
  modelled on xitdb.hash-map, with ordered seq and print-method.
- conversion/v->slot!: detect PersistentTreeMap before the generic map?
  branch, persist as SORTED_MAP; reject custom comparators.
- xitdb-types/read-from-cursor: SORTED_MAP read dispatch.

Sorted/Indexed/Reversible (subseq, nth, rseq) and numeric/temporal keys are
deferred to Issues 2 and 3.

Includes PRD and issue breakdown.
Co-Authored-By: Claude Opus 4.8
---
 doc/issues/01-walking-skeleton-sorted-map.md | 61 ++++
 doc/issues/02-sorted-protocol-map.md | 47 +++
 doc/issues/03-numeric-temporal-key-codec.md | 48 +++
 doc/issues/04-sorted-set.md | 51 +++
 doc/issues/05-rank-and-pagination.md | 40 +++
 doc/issues/README.md | 17 +
 doc/sorted-map-prd.md | 331 +++++++++++++++++++
 src/xitdb/sorted_map.clj | 205 ++++++++++++
 src/xitdb/util/conversion.clj | 32 +-
 src/xitdb/util/sorted_key.clj | 68 ++++
 src/xitdb/util/sorted_operations.clj | 78 +++++
 src/xitdb/xitdb_types.clj | 6 +
 test/xitdb/sorted_key_test.clj | 23 ++
 test/xitdb/sorted_map_test.clj | 119 +++++++
 14 files changed, 1125 insertions(+), 1 deletion(-)
 create mode 100644 doc/issues/01-walking-skeleton-sorted-map.md
 create mode 100644 doc/issues/02-sorted-protocol-map.md
 create mode 100644 doc/issues/03-numeric-temporal-key-codec.md
 create mode 100644 doc/issues/04-sorted-set.md
 create mode 100644 doc/issues/05-rank-and-pagination.md
 create mode 100644 doc/issues/README.md
 create mode 100644 doc/sorted-map-prd.md
 create mode 100644 src/xitdb/sorted_map.clj
 create mode 100644 src/xitdb/util/sorted_key.clj
 create mode 100644 src/xitdb/util/sorted_operations.clj
 create mode 100644 test/xitdb/sorted_key_test.clj
 create mode 100644 test/xitdb/sorted_map_test.clj

diff --git a/doc/issues/01-walking-skeleton-sorted-map.md b/doc/issues/01-walking-skeleton-sorted-map.md
new file mode 100644
index 0000000..eb771eb
--- /dev/null
+++ b/doc/issues/01-walking-skeleton-sorted-map.md
@@ -0,0 +1,61 @@
+# Issue 1: Walking skeleton — string/keyword-keyed sorted map (read + write)
+
+Type: AFK
+Status: ready-for-agent
+
+## Parent
+
+[Sorted Map & Sorted Set PRD](../sorted-map-prd.md)
+
+## What to build
+
+The end-to-end walking skeleton that makes a persisted `sorted-map` a working,
+ordered, on-disk Clojure collection — for **string and keyword keys only**.
+This slice threads every integration layer once so the remaining slices can
+extend it:
+
+- A small **order-preserving key codec** (`xitdb.util.sorted-key`) with a stable
+  1-byte type tag per key type. This slice implements the tag infrastructure plus
+  the **string** and **keyword** encodings (UTF-8 bytes, which already sort in
+  code-point order). Interface: `encode-key ^bytes [k]` and `decode-key [^bytes b]`.
+- **Construction detection**: `conversion/v->slot!` (and the nested writers)
+  recognise `clojure.lang.PersistentTreeMap` and persist it as a `SORTED_MAP`.
+  The tree-map branch must be checked **before** the generic `map?` branch, since
+  a tree map is also a `map?`. If the tree map carries a non-default comparator,
+  throw `IllegalArgumentException` (custom comparators are not supported).
+- **Read dispatch**: `xitdb-types/read-from-cursor` returns `XITDBSortedMap` (read)
+  or `XITDBWriteSortedMap` (write) for the `SORTED_MAP` tag, mirroring the
+  existing `HASH_MAP` cases.
+- **Wrapper types** (`xitdb.sorted-map`), modelled on `xitdb.hash-map`:
+  - `XITDBSortedMap` (read): `ILookup`, `Associative`, `IPersistentMap`,
+    `Counted`, `Seqable` (ascending ordered `seq`), `IFn`, `Iterable`,
+    `IKVReduce`, plus `common/ISlot`/`IUnwrap`/`IMaterialize`/
+    `IMaterializeShallow`. Read-only `assoc`/`dissoc`/`cons` materialise-shallow
+    and return a plain Clojure `sorted-map`.
+  - `XITDBWriteSortedMap` (write): mutating `assoc`/`without`/`cons`/`empty`
+    against the live `WriteSortedMap`, plus `IReadOnly`.
+- A **sorted-operations** namespace (`xitdb.util.sorted-operations`) bridging the
+  types to the Java `Read/WriteSortedMap` (`put`/`remove`/`getCursor`/`count`/
+  `iterator`, decoding keys on read).
+- `print-method` for both types (ordered output, `#XITDBSortedMap`).
+
+`Sorted`/`Indexed`/`Reversible` (subseq, nth, rseq) are intentionally deferred to
+Issue 2. Numeric/temporal keys are deferred to Issue 3.
+
+## Acceptance criteria
+
+- [ ] `(reset! db (sorted-map "b" 2 "a" 1))` then `@db` seqs as `(["a" 1] ["b" 2])` in key order.
+- [ ] `(get @db "a")`, `(@db "a")`, `(:k @db)`-style lookup for keyword keys, `(contains? @db "a")`, `(find @db "a")` all work.
+- [ ] `(count @db)` is correct and O(1) (delegates to `ReadSortedMap.count()`).
+- [ ] `(swap! db assoc "c" 3)` keeps order; `(swap! db dissoc "a")` removes and preserves order; re-assoc of an existing key replaces the value without changing count.
+- [ ] Keyword keys round-trip to keywords and sort correctly; string keys round-trip to strings.
+- [ ] `(sorted? @db)` is **not** required yet, but `(materialize @db)` returns a plain Clojure `sorted-map` with matching order.
+- [ ] Read-only `assoc`/`dissoc` (outside a transaction) returns a plain Clojure sorted collection, not an `XITDB*` type — consistent with `XITDBHashMap`.
+- [ ] Persisting a `sorted-map-by` with a custom comparator throws `IllegalArgumentException`.
+- [ ] A sorted map nests inside a hash map value and round-trips; values may be vectors/maps/sets.
+- [ ] `(tu/db-equal-to-atom? db)` style round-trip holds for a string/keyword-keyed sorted map.
+- [ ] `print-method` renders ordered, distinguishable output.
+
+## Blocked by
+
+None - can start immediately.
diff --git a/doc/issues/02-sorted-protocol-map.md b/doc/issues/02-sorted-protocol-map.md
new file mode 100644
index 0000000..7f77694
--- /dev/null
+++ b/doc/issues/02-sorted-protocol-map.md
@@ -0,0 +1,47 @@
+# Issue 2: `clojure.lang.Sorted` for the sorted map — subseq / rsubseq / rseq / nth / sorted?
+
+Type: AFK
+Status: ready-for-agent
+
+## Parent
+
+[Sorted Map & Sorted Set PRD](../sorted-map-prd.md)
+
+## What to build
+
+Make `XITDBSortedMap` a fully sorted Clojure collection by implementing the
+three interfaces that `clojure.core` builds its ordered operations on, so
+`sorted?`, `subseq`, `rsubseq`, `rseq`, and indexed `nth` all work against disk.
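A minimal sketch of why these three interfaces are enough, without touching xitdb at all: `VecSortedMap` below is a hypothetical in-memory stand-in for `XITDBSortedMap`, backed by an ascending vector of map entries. Its `seqFrom` uses linear scans rather than the engine's `iteratorFrom`/`rank` seeks, but `clojure.core`'s `subseq`, `rsubseq`, `rseq`, `nth`, and `sorted?` work on it unchanged.

```clojure
(ns sorted-interfaces-sketch)

;; Hypothetical in-memory stand-in for XITDBSortedMap. `entries` is a vector
;; of MapEntry values kept in ascending key order. Only the interfaces that
;; clojure.core consults for ordered operations are implemented.
(deftype VecSortedMap [entries]
  clojure.lang.Sorted
  ;; Must agree with the storage order; here the entries are sorted by `compare`.
  (comparator [_] compare)
  (entryKey [_ entry] (key entry))
  (seq [_ ascending?]
    (seq (if ascending? entries (rseq entries))))
  ;; Ascending: start at the first key >= k. Descending: start at the first
  ;; key <= k, walking down. (Linear scans here; the engine would seek.)
  (seqFrom [_ k ascending?]
    (if ascending?
      (seq (drop-while #(neg? (compare (key %) k)) entries))
      (seq (drop-while #(pos? (compare (key %) k)) (rseq entries)))))

  clojure.lang.Reversible
  (rseq [_] (rseq entries))

  clojure.lang.Indexed
  (nth [_ i] (nth entries i))
  (nth [_ i not-found] (nth entries i not-found))

  clojure.lang.Counted
  (count [_] (count entries))

  clojure.lang.Seqable
  (seq [_] (seq entries)))

(defn vec-sorted-map
  "Builds a VecSortedMap from key/value pairs, like `sorted-map`."
  [& kvs]
  (->VecSortedMap (vec (apply sorted-map kvs))))
```

For example, `(subseq (vec-sorted-map 3 :c 1 :a 2 :b) >= 2)` yields `([2 :b] [3 :c])`: `subseq` calls `seqFrom` and then filters the first entry through `comparator` and `entryKey`, which is why the comparator has to agree with the storage order.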
+
+- `clojure.lang.Sorted`:
+  - `comparator` → a comparator consistent with the codec's natural ordering, so
+    `subseq`'s own bound checks agree with the engine.
+  - `entryKey` → `key` of the MapEntry.
+  - `seq(ascending?)` → ascending uses `iterator()`; descending uses a
+    rank-based index walk (there is no native reverse iterator).
+  - `seqFrom(k, ascending?)` → ascending maps directly to
+    `ReadSortedMap.iteratorFrom(encode k)` (native O(log n) lower-bound seek);
+    descending uses `rank(encode k)` + a descending `getIndexKeyValuePair` walk,
+    starting at index `rank` when `k` is present and at `rank - 1` otherwise
+    (the greatest key below `k`).
+- `clojure.lang.Indexed`:
+  - `nth(i)` / `nth(i, not-found)` → `getIndexKeyValuePair(i)` returning a
+    MapEntry (decode key, read value). Support negative indices per Java
+    semantics (`-1` = last).
+- `clojure.lang.Reversible`:
+  - `rseq` → descending lazy seq (index walk from `count-1` down).
+
+Descending seqs must stay lazy and low-memory (step via `getIndexKeyValuePair`,
+do not materialise the whole map).
+
+## Acceptance criteria
+
+- [ ] `(sorted? @db)` returns `true` for a persisted sorted map.
+- [ ] `(subseq @db >= k)`, `> k`, `<= k`, `< k`, and the two-bound form all return the same entries (in order) as the equivalent plain-Clojure `sorted-map` oracle.
+- [ ] `(rsubseq @db ...)` mirrors the plain-Clojure oracle for all test/bound forms.
+- [ ] `(seq @db)` is ascending; `(rseq @db)` is descending; both lazy.
+- [ ] `(nth @db i)` returns the entry at rank `i` in O(log n); `(nth @db -1)` returns the last entry; out-of-range honours `not-found`/throws like a vector.
+- [ ] `subseq`/`rsubseq` on an empty (none-cursor) sorted map yield nothing.
+- [ ] `(.comparator @db)` is consistent with iteration order (subseq bound filtering agrees with engine order).
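The descending `seqFrom` rule can be sketched with plain functions. In this sketch, `rank` and the positional lookups are hypothetical in-memory stand-ins (a binary search and `nth` over an ascending vector of `[k v]` pairs) for `ReadSortedMap.rank` and `getIndexKeyValuePair`; what carries over is the choice of start index and the lazy, one-lookup-per-step walk.

```clojure
(ns descending-walk-sketch)

;; Hypothetical stand-ins over an ascending vector of [k v] pairs:
;; `rank` plays ReadSortedMap.rank and `nth` plays getIndexKeyValuePair.

(defn rank
  "Number of entries whose key is strictly less than k (binary search)."
  [entries k]
  (loop [lo 0, hi (count entries)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (first (nth entries mid)) k))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

(defn- walk-down
  "Lazy seq of the entries at indices i, i-1, ..., 0, one lookup per step."
  [entries i]
  (lazy-seq
    (when (>= i 0)
      (cons (nth entries i) (walk-down entries (dec i))))))

(defn descending-from
  "Entries with key <= k, largest key first: the seqFrom(k, false) contract."
  [entries k]
  (let [r (rank entries k)
        present? (and (< r (count entries))
                      (zero? (compare (first (nth entries r)) k)))]
    ;; k itself sits at index r when present; otherwise the greatest key
    ;; below k is at r - 1 (and there is none when r = 0).
    (walk-down entries (if present? r (dec r)))))

(defn descending-all
  "rseq / seq(false): walk from the last index down."
  [entries]
  (walk-down entries (dec (count entries))))
```

Checking `descending-from` against `(rsubseq oracle <= k)` on a plain `sorted-map` for keys that are present, absent, below the minimum, and above the maximum is a cheap way to pin down the present/absent off-by-one.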
+
+## Blocked by
+
+- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md)
diff --git a/doc/issues/03-numeric-temporal-key-codec.md b/doc/issues/03-numeric-temporal-key-codec.md
new file mode 100644
index 0000000..fbf9021
--- /dev/null
+++ b/doc/issues/03-numeric-temporal-key-codec.md
@@ -0,0 +1,48 @@
+# Issue 3: Numeric & temporal key codec — long, double, inst/date
+
+Type: AFK
+Status: ready-for-agent
+
+## Parent
+
+[Sorted Map & Sorted Set PRD](../sorted-map-prd.md)
+
+## What to build
+
+Extend the order-preserving key codec (`xitdb.util.sorted-key`) with tagged
+encodings for the remaining v1 key types, so that numeric and temporal keys sort
+in their natural order on disk. No wrapper-type changes are needed — the sorted
+map (and later the sorted set) call `encode-key`/`decode-key`, so they gain these
+key types automatically once the codec supports them.
+
+Encodings (each carries its own type tag; the tag also defines a stable
+cross-type order so heterogeneous keys never throw):
+
+- **Long / integer** → tag + 8-byte big-endian with the **sign bit flipped**
+  (XOR `0x80` on the top byte). Makes signed integers sort correctly as unsigned
+  bytes: negatives before positives, ascending within each. (Same technique the
+  Java library uses for its creation-time index example.)
+- **Double** → tag + IEEE-754 8-byte big-endian with the order-preserving bit
+  flip: if the sign bit is set, flip all bits; otherwise flip only the sign bit.
+- **Instant** → tag + big-endian epoch encoding (e.g. epoch-second + nanos) so
+  chronological order equals byte order; decodes back to `Instant`.
+- **Date** → tag + big-endian epoch encoding (distinct tag from `Instant`);
+  decodes back to `java.util.Date`. As with longs, the signed epoch value of
+  both `Instant` and `Date` needs its sign bit flipped, because times before
+  1970 are negative.
+
+This slice is the correctness-critical one and must ship with property-based
+ordering tests (see Testing Decisions in the PRD).
+
+## Acceptance criteria
+
+- [ ] `(reset! db (sorted-map 9 :a 10 :b 1 :c))` iterates numerically as `1, 9, 10` (not lexically).
+- [ ] Negative and positive long keys sort correctly together (e.g. `-5 < 0 < 3`), including `Long/MIN_VALUE` and `Long/MAX_VALUE`.
+- [ ] Double keys sort numerically, including negatives, zero, and large magnitudes.
+- [ ] `Instant` keys iterate in chronological order and round-trip to `Instant`; `Date` keys likewise round-trip to `Date`.
+- [ ] Round-trip property: `(= k (decode-key (encode-key k)))` for every supported type.
+- [ ] Order-preservation property (generative): for random same-type pairs `a`,`b`, `sign(compareUnsigned(encode a, encode b)) == sign(compare a b)`.
+- [ ] Cross-type ordering is total and never throws.
+- [ ] Unsupported key types throw a clear error.
+
+## Blocked by
+
+- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md)
diff --git a/doc/issues/04-sorted-set.md b/doc/issues/04-sorted-set.md
new file mode 100644
index 0000000..93668c3
--- /dev/null
+++ b/doc/issues/04-sorted-set.md
@@ -0,0 +1,51 @@
+# Issue 4: Sorted set end-to-end — `XITDBSortedSet` / `XITDBWriteSortedSet`
+
+Type: AFK
+Status: ready-for-agent
+
+## Parent
+
+[Sorted Map & Sorted Set PRD](../sorted-map-prd.md)
+
+## What to build
+
+The set counterpart to the sorted map: persist a `clojure.lang.PersistentTreeSet`
+as an on-disk `SORTED_SET` and expose it as a fully ordered Clojure set. Reuses
+the key codec (Issues 1 + 3) and the `Sorted`/`Indexed`/`Reversible` machinery
+established for the map (Issue 2).
+
+- **Construction detection**: `conversion/v->slot!` recognises
+  `clojure.lang.PersistentTreeSet` (checked before the generic `set?` branch) and
+  writes a `SORTED_SET`. Reject non-default comparators with `IllegalArgumentException`.
+- **Read dispatch**: `read-from-cursor` returns `XITDBSortedSet` /
+  `XITDBWriteSortedSet` for the `SORTED_SET` tag.
+- **Wrapper types** (`xitdb.sorted-set`), modelled on `xitdb.hash-set`:
+  - `XITDBSortedSet` (read): `IPersistentSet` (`contains?`/`get`/`disjoin`),
+    `Counted`, `Seqable` (ordered), `IFn`, `Iterable`, plus `ISlot`/`IUnwrap`/
+    `IMaterialize`/`IMaterializeShallow`, **and** `Sorted`/`Indexed`/`Reversible`
+    so `subseq`/`rsubseq`/`rseq`/`nth`/`sorted?` work over the set. Read-only
+    `conj`/`disj` return a plain Clojure `sorted-set`.
+  - `XITDBWriteSortedSet` (write): mutating `conj`/`disjoin`/`empty` against the
+    live `WriteSortedSet`, plus `IReadOnly`.
+- Set operations in `xitdb.util.sorted-operations` over the Java
+  `Read/WriteSortedSet` (`put`/`remove`/`contains`/`count`/`iterator`/
+  `getIndexKeyValuePair`), decoding members on read.
+- `print-method` (`#XITDBSortedSet`, ordered).
+- `materialize` returns a plain Clojure `sorted-set` with matching order.
+
+## Acceptance criteria
+
+- [ ] `(reset! db (sorted-set 3 1 2))` then `@db` seqs as `(1 2 3)`.
+- [ ] `(contains? @db 2)` works; `(get @db 2)` returns the member.
+- [ ] `(swap! db conj 5)` and `(swap! db disj 1)` keep order; adding a duplicate is a no-op and does not change count.
+- [ ] `(count @db)` is correct and O(1).
+- [ ] `(sorted? @db)` is `true`; `(subseq @db >= 2)`, `(rsubseq ...)`, `(nth @db 0)`, `(rseq @db)` all match the plain-Clojure `sorted-set` oracle (for `nth`, compare with `(nth (seq oracle) 0)`, since a plain `sorted-set` is not `Indexed`).
+- [ ] String, keyword, long, double, and inst/date members each iterate in correct natural order.
+- [ ] Read-only `conj`/`disj` (outside a transaction) returns a plain Clojure sorted set, not an `XITDB*` type.
+- [ ] `(materialize @db)` returns a plain `sorted-set` with matching order.
+- [ ] A sorted set nests inside other structures and round-trips.
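One way to express the oracle comparisons above as a reusable check is sketched below. `oracle-mismatches` is a hypothetical test helper, not part of xitdb-clj. It runs every single-bound `subseq`/`rsubseq` form at each probe key, plus `rseq`, against both a candidate sorted set and a plain `sorted-set` built from the same members, and returns any disagreements. Positional `nth` is omitted because a plain `sorted-set` does not support it.

```clojure
(ns sorted-set-oracle-sketch)

(defn oracle-mismatches
  "Hypothetical test helper. Compares `candidate` (any set implementing
  clojure.lang.Sorted and clojure.lang.Reversible) with a plain sorted-set of
  `members` for every single-bound subseq/rsubseq form at each probe key, plus
  rseq. Returns a seq of maps describing disagreements; empty means agreement."
  [candidate members probe-keys]
  (let [oracle (apply sorted-set members)]
    (concat
      (for [k probe-keys
            [op-name op] [[:subseq subseq] [:rsubseq rsubseq]]
            [test-name test-fn] [['< <] ['<= <=] ['> >] ['>= >=]]
            :let [expected (op oracle test-fn k)
                  actual   (op candidate test-fn k)]
            :when (not= expected actual)]
        {:op op-name :test test-name :key k :expected expected :actual actual})
      (when (not= (rseq oracle) (rseq candidate))
        [{:op :rseq :expected (rseq oracle) :actual (rseq candidate)}]))))
```

In the real tests, `candidate` would be `@db` after `(reset! db (sorted-set ...))`. A reverse-ordered `(sorted-set-by > ...)` is a convenient deliberately wrong candidate for checking that the helper really reports mismatches.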
+
+## Blocked by
+
+- [Issue 2: `clojure.lang.Sorted` for the sorted map](02-sorted-protocol-map.md)
+- [Issue 3: Numeric & temporal key codec](03-numeric-temporal-key-codec.md)
diff --git a/doc/issues/05-rank-and-pagination.md b/doc/issues/05-rank-and-pagination.md
new file mode 100644
index 0000000..bbeb7c2
--- /dev/null
+++ b/doc/issues/05-rank-and-pagination.md
@@ -0,0 +1,40 @@
+# Issue 5: `rank` + pagination public helpers
+
+Type: AFK
+Status: ready-for-agent
+
+## Parent
+
+[Sorted Map & Sorted Set PRD](../sorted-map-prd.md)
+
+## What to build
+
+Expose the rank-augmented B-tree "superpowers" that go beyond `clojure.core`'s
+in-memory sorted collections, as a small public surface usable on both
+`XITDBSortedMap` and `XITDBSortedSet`.
+
+- **`rank`** — given a key/member, return the number of entries strictly less
+  than it (its index), in O(log n). Backed by `ReadSortedMap.rank` /
+  `ReadSortedSet.rank`. It is the inverse of indexed `nth`.
+- **Pagination helper** — an offset/limit (or "from index N, take K") accessor
+  backed by `ReadSortedMap.iteratorFromIndex` / `iteratorFromIndex`, returning a
+  lazy ordered seq starting at a rank. This makes serving ordered, paged queries
+  from disk efficient (the motivating secondary-index use case in the PRD).
+
+Place these in a stable namespace (e.g. `xitdb.sorted` or extend `xitdb.db`)
+and document them as the recommended way to build/paginate on-disk secondary
+indexes.
+
+## Acceptance criteria
+
+- [ ] `(rank m k)` returns the correct index for present keys/members, and the would-be insertion index for absent ones, in O(log n).
+- [ ] `rank` and indexed `nth` are inverses: `(= i (rank m (key (nth m i))))` for all `i` on a map, and `(= i (rank s (nth s i)))` on a set.
+- [ ] The pagination helper returns the correct ordered page for a given offset/limit and stops at the end of the collection.
+- [ ] Both helpers work on `XITDBSortedMap` and `XITDBSortedSet`.
+- [ ] Pagination is lazy and does not materialise the whole collection.
+- [ ] A doc example shows building a timestamp→id secondary index and paging through it.
+
+## Blocked by
+
+- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md)
+- [Issue 4: Sorted set end-to-end](04-sorted-set.md)
diff --git a/doc/issues/README.md b/doc/issues/README.md
new file mode 100644
index 0000000..59af80c
--- /dev/null
+++ b/doc/issues/README.md
@@ -0,0 +1,17 @@
+# Sorted Map & Sorted Set — implementation issues
+
+Tracer-bullet slices for [the PRD](../sorted-map-prd.md). Each is a thin vertical
+slice through every layer (codec → construction detection → read dispatch →
+wrapper type → tests) and is independently verifiable. All are AFK.
+
+| # | Slice | Blocked by |
+|---|-------|-----------|
+| [1](01-walking-skeleton-sorted-map.md) | Walking skeleton: string/keyword-keyed sorted map (read + write) | — |
+| [2](02-sorted-protocol-map.md) | `clojure.lang.Sorted` for the map — subseq/rsubseq/rseq/nth/sorted? | 1 |
+| [3](03-numeric-temporal-key-codec.md) | Numeric & temporal key codec — long, double, inst/date | 1 |
+| [4](04-sorted-set.md) | Sorted set end-to-end (`XITDBSortedSet`/`XITDBWriteSortedSet`) | 2, 3 |
+| [5](05-rank-and-pagination.md) | `rank` + pagination public helpers | 1, 4 |
+
+## Suggested order
+
+1 → (2 and 3 in parallel) → 4 → 5
diff --git a/doc/sorted-map-prd.md b/doc/sorted-map-prd.md
new file mode 100644
index 0000000..3401781
--- /dev/null
+++ b/doc/sorted-map-prd.md
@@ -0,0 +1,331 @@
+# PRD: Sorted Map & Sorted Set support for xitdb-clj
+
+Status: ready-for-implementation
+Date: 2026-06-23
+
+## Problem Statement
+
+As a user of xitdb-clj, I can persist hash maps, hash sets, array lists and
+linked lists, but I have no way to keep keys (or set members) **in order** on
+disk. When I need range queries, ordered iteration, pagination, or "the entry
+at position N", I have to load the whole collection into memory and sort it in
+Clojure on every read. That defeats the point of an embedded, immutable,
+on-disk database — and it does not scale to large collections or secondary
+indexes (e.g. "all posts created between T1 and T2, page 3").
+
+The upstream Java library (`io.github.radarroark.xitdb`) now ships a
+rank-augmented B-tree exposed as `SortedMap` and `SortedSet`. xitdb-clj has no
+Clojure types that wrap them, so none of this capability is reachable from
+Clojure today.
+
+## Solution
+
+Add two new pairs of wrapper types — `XITDBSortedMap` / `XITDBWriteSortedMap`
+and `XITDBSortedSet` / `XITDBWriteSortedSet` — that wrap the Java
+`ReadSortedMap`/`WriteSortedMap` and `ReadSortedSet`/`WriteSortedSet`. These
+behave like first-class Clojure sorted collections: they implement
+`clojure.lang.Sorted`, so `sorted?`, `subseq`, `rsubseq`, and `seq` work out of
+the box, and they additionally implement `clojure.lang.Indexed` (O(log n) `nth`
+by rank) and `clojure.lang.Reversible` (which is what `rseq` requires).
+
+Construction is fully idiomatic and requires **no new public API**: when a value
+written to the database is a `clojure.lang.PersistentTreeMap` (i.e. a
+`sorted-map`) or a `clojure.lang.PersistentTreeSet` (a `sorted-set`), xitdb-clj
+persists it as an on-disk `SORTED_MAP` / `SORTED_SET`. Reading it back returns
+the corresponding `XITDBSorted*` type. So:
+
+```clojure
+(reset! db (sorted-map 3 :c 1 :a 2 :b))
+(subseq @db >= 2)   ;; => ([2 :b] [3 :c])
+(nth @db 0)         ;; => [1 :a]   ; O(log n), not O(n)
+(rseq @db)          ;; => ([3 :c] [2 :b] [1 :a])
+(sorted? @db)       ;; => true
+```
+
+The single honest limitation: ordering is the engine's fixed natural ordering
+(produced by an order-preserving key codec). **Custom comparators
+(`sorted-map-by` / `sorted-set-by`) are not supported** — the comparison lives
+in the Java B-tree as unsigned byte comparison, not in a pluggable Clojure fn.
+
+## User Stories
+
+1. As a developer, I want to write a `(sorted-map ...)` to the db, so that it is
+   persisted as an ordered on-disk structure without me learning a new API.
+2. As a developer, I want to write a `(sorted-set ...)` to the db, so that set
+   members are kept in sorted order on disk.
+3. As a developer, I want `(sorted? db-value)` to return `true` for a persisted
+   sorted map/set, so that generic code can detect orderedness.
+4. As a developer, I want to call `(subseq m >= k)`, `(subseq m > k)`,
+   `(subseq m <= k)`, `(subseq m < k)` and the two-bound form, so that I can run
+   ascending range queries directly against disk.
+5. As a developer, I want `(rsubseq m ...)` with the same test/bound forms, so
+   that I can run descending range queries.
+6. As a developer, I want `(seq m)` to iterate entries in ascending key order, so
+   that ordered traversal is the default.
+7. As a developer, I want `(rseq m)` to iterate entries in descending key order,
+   so that I can walk from the largest key down.
+8. As a developer, I want `(nth m i)` to return the entry at rank `i` in
+   O(log n), so that positional access and pagination are cheap even for large
+   maps.
+9. As a developer, I want `(nth m -1)`/negative indexing semantics surfaced via a
+   helper, so that I can get the last/last-k entries without counting.
+10. As a developer, I want `(get m k)` / `(m k)` / `(:k m)` lookups, so that a
+    sorted map is a drop-in associative read.
+11. As a developer, I want `(contains? m k)` and `(find m k)`, so that presence
+    checks and entry retrieval work like any map.
+12. As a developer, I want `(count m)` to be O(1), so that size checks are cheap.
+13. As a developer, I want to `(swap! db assoc k v)` a sorted map inside a
+    transaction, so that inserts keep the structure ordered and persistent.
+14. As a developer, I want to `(swap! db dissoc k)` a sorted map, so that I can
+    remove keys while preserving order.
+15. As a developer, I want `(swap! db conj v)` / `(swap! db disj v)` on a sorted
+    set, so that membership edits preserve order.
+16. As a developer, I want re-assoc'ing an existing key to replace the value and
+    not change the count, so that updates behave like a normal map.
+17. As a developer, I want string and keyword keys to sort in their natural
+    (code-point) order, so that text indexes read correctly.
+18. As a developer, I want long/integer keys to sort numerically (so `9 < 10`,
+    and negatives before positives), so that numeric indexes behave intuitively.
+19. As a developer, I want double keys to sort numerically, so that floating
+    point indexes are ordered correctly.
+20. As a developer, I want `java.time.Instant` and `java.util.Date` keys to sort
+    chronologically, so that I can build time-ordered secondary indexes.
+21. As a developer, I want keys to round-trip to their exact original Clojure
+    type on read, so that `(keys m)` and entry keys are not stringly-typed.
+22. As a developer, I want to build a timestamp→id secondary index and paginate
+    it (offset/limit) efficiently, so that I can serve ordered, paged queries
+    from disk.
+23. As a developer, I want a `rank` operation ("how many keys are strictly less
+    than k") in O(log n), so that I can compute a key's position / build
+    pagination cursors.
+24. As a developer, I want sorted maps/sets to nest inside other structures
+    (e.g. a hash map whose value is a sorted map), so that I can model rich
+    documents.
+25. As a developer, I want a sorted map value to nest arbitrary values (vectors,
+    maps, sets) as its values, so that the value side is as flexible as a hash
+    map's.
+26. As a developer, I want `(materialize sorted-map-value)` to return a plain
+    Clojure `sorted-map` with the same ordering, so that I can fully realise it
+    in memory.
+27. As a developer, I want `(empty sorted-map)` semantics to produce an empty
+    ordered structure, so that clearing works in a transaction.
+28. As a developer, I want `=` / `equiv` to compare a persisted sorted map to a
+    plain Clojure map by contents, so that test assertions read naturally.
+29. As a developer reading the result of `assoc`/`dissoc` on a **read-only**
+    sorted map (outside a transaction), I want a plain Clojure sorted collection
+    back, so that the immutable-read contract matches the existing hash
+    map/set types.
+30. As a developer, I want a clear, early error if I try to persist a
+    `sorted-map-by`/`sorted-set-by` with a custom comparator, so that I am not
+    silently given a different ordering.
+31. As a developer, I want a clear error if I use an unsupported key type, so
+    that I fail fast instead of getting corrupt ordering.
+32. As a developer, I want the print representation of a persisted sorted map/set
+    to be distinguishable (e.g. `#XITDBSortedMap{...}`) and ordered, so that REPL
+    output is legible.
+33. As a developer using multiple threads, I want sorted-map reads to work from
+    reader threads like the other types, so that concurrency behaves consistently.
+
+## Implementation Decisions
+
+### Construction trigger (no new public API)
+
+- `conversion/v->slot!` and the nested writers (`coll->ArrayListCursor!`,
+  `map->WriteHashMapCursor!`, etc.) gain branches that detect
+  `clojure.lang.PersistentTreeMap` and `clojure.lang.PersistentTreeSet` **before**
+  the generic `map?` / `set?` branches (a tree map is also a `map?`, so order of
+  checks matters). These write `SORTED_MAP` / `SORTED_SET` respectively.
+- If the detected tree map/set carries a **non-default comparator**, throw
+  `IllegalArgumentException` ("custom comparators are not supported; sorted
+  collections use natural ordering"). Detection: compare `.comparator` against
+  `clojure.lang.RT/DEFAULT_COMPARATOR` / `compare`.
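A sketch of the branch ordering and the comparator check, under stated assumptions: `storage-kind` is a hypothetical classifier standing in for the dispatch inside `v->slot!` (it returns a keyword instead of writing a slot), and natural ordering is recognised by identity with `clojure.lang.RT/DEFAULT_COMPARATOR` or `compare`, following the detection rule above.

```clojure
(ns construction-detection-sketch)

(defn- natural-order?
  "True when a sorted collection uses Clojure's default ordering."
  [^clojure.lang.Sorted coll]
  (let [c (.comparator coll)]
    (or (identical? c clojure.lang.RT/DEFAULT_COMPARATOR)
        (identical? c compare))))

(defn- require-natural-order! [coll]
  (when-not (natural-order? coll)
    (throw (IllegalArgumentException.
            "custom comparators are not supported; sorted collections use natural ordering"))))

(defn storage-kind
  "Hypothetical stand-in for the dispatch in v->slot!: returns the on-disk
  kind instead of writing a slot. The tree checks come first because a
  PersistentTreeMap is also map? and a PersistentTreeSet is also set?."
  [v]
  (cond
    (instance? clojure.lang.PersistentTreeMap v)
    (do (require-natural-order! v) :sorted-map)

    (instance? clojure.lang.PersistentTreeSet v)
    (do (require-natural-order! v) :sorted-set)

    (map? v) :hash-map
    (set? v) :hash-set
    :else    :other))
```

Because the check is by identity, a wrapper such as `#(compare %1 %2)` is rejected even though it orders keys the same way; erring towards rejection fits the fail-fast intent of user story 30.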
+
+### Read dispatch
+
+- `xitdb-types/read-from-cursor` gains `SORTED_MAP` and `SORTED_SET` cases that
+  return `XITDBSortedMap`/`XITDBSortedSet` (read) or the `Write` variants when
+  `for-writing?` is true — mirroring the existing `HASH_MAP`/`HASH_SET` cases.
+
+### New module: key codec (deep module — the heart of this PRD)
+
+A new namespace (e.g. `xitdb.util.sorted-key`) provides a bijective,
+**order-preserving** encoding between supported Clojure key values and `byte[]`,
+such that `Arrays.compareUnsigned(encode(a), encode(b))` has the same sign as the
+natural ordering of `a` and `b`.
+
+- Interface (small, stable):
+  - `encode-key ^bytes [k]` — Clojure key → order-preserving bytes.
+  - `decode-key [^bytes b]` — bytes → original Clojure key (exact type).
+- Each encoding is prefixed with a 1-byte **type tag** that also defines a stable
+  cross-type ordering, so heterogeneous keys never throw (a strict improvement
+  over Clojure's `compare`, which throws on most cross-type comparisons, such as
+  a string against a keyword). Even though v1's primary contract is single-type
+  maps, the tag makes the encoding total.
+- Supported key types for v1 (per decision):
+  - **String** → tag + UTF-8 bytes. UTF-8 byte order equals Unicode code-point
+    order, so no transformation is needed.
+  - **Keyword** → tag + UTF-8 of the (namespace-qualified) name. Reuses
+    `conversion/keyname`.
+  - **Long / integer** → tag + 8-byte big-endian with the **sign bit flipped**
+    (XOR `0x80` on the top byte), which makes signed integers sort correctly as
+    unsigned bytes (negatives < positives, ascending within each). This is the
+    same big-endian-with-flipped-sign technique used in the Java library's own
+    `testSortedMap` example for a creation-time index.
+  - **Double** → tag + IEEE-754 8-byte big-endian with the order-preserving bit
+    flip (if sign bit set, flip all bits; else flip only the sign bit). Handles
+    negative/positive ordering. (NaN handling: documented as undefined / rejected.)
+  - **Instant / Date** → tag + big-endian epoch encoding (e.g. epoch-second +
+    nanos, or epoch-milli) so chronological order = byte order. The signed
+    epoch value needs the same sign-bit flip as longs, because times before
+    1970 are negative. `Date` decodes back to `Date`, `Instant` back to
+    `Instant` (distinct tags).
+- The codec is the single place ordering correctness lives; it is pure
+  (no DB handle needed) and unit-testable in isolation.
+
+> Note: this is intentionally **separate** from the existing
+> `conversion/db-key-hash`, which SHA-1-hashes keys for hash maps. Hashing
+> destroys order and identity; sorted keys must be stored as their
+> order-preserving bytes and recovered via the key cursor.
+
+### New module: sorted-map operations
+
+A namespace parallel to `xitdb.util.operations` (e.g. `xitdb.util.sorted-operations`)
+holding the imperative bridge between the wrapper types and the Java API:
+
+- `sorted-map-assoc-value!` — `encode-key` then `WriteSortedMap.putCursor(key)` +
+  write value slot via `conversion/v->slot!`.
+- `sorted-map-dissoc-key!` — `WriteSortedMap.remove(encoded)`.
+- `sorted-map-get-cursor` — `ReadSortedMap.getCursor(encoded)` (nil when absent).
+- `sorted-map-contains?` — non-nil cursor / `getKeyValuePair`.
+- `sorted-map-count` — `ReadSortedMap.count()` (O(1)).
+- `sorted-map-rank` — `ReadSortedMap.rank(encoded)`.
+- `sorted-map-nth` — `ReadSortedMap.getIndexKeyValuePair(i)` → MapEntry
+  (decode key, read value); supports negative indices per Java semantics.
+- `sorted-map-seq` — lazy seq of `MapEntry` from `iterator()` (ascending),
+  decoding keys.
+- `sorted-map-seq-from` — ascending lazy seq from `iteratorFrom(encoded)`.
+- `sorted-map-rseq` / descending-from — built on `rank` + descending
+  `getIndexKeyValuePair` walk (no native reverse iterator exists; this is the
+  agreed implementation strategy). Lazy and low-memory.
+- Set variants (`sorted-set-*`) over `ReadSortedSet`/`WriteSortedSet`
+  (`put`/`remove`/`contains`/`rank`/`getIndexKeyValuePair`/iterators); members
+  are decoded keys, no value side.
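A minimal sketch of the codec described above, limited to the four types whose encodings are fully specified (strings, keywords, longs, doubles; `Instant`/`Date` are left out). The tag values 1 to 4 are assumptions, since the PRD only requires one stable tag per type, and the sketch assumes JDK 9+ for `java.util.Arrays/compareUnsigned`. It is not the `xitdb.util.sorted-key` implementation.

```clojure
(ns sorted-key-sketch
  (:import (java.nio ByteBuffer)
           (java.nio.charset StandardCharsets)
           (java.util Arrays)))

;; Hypothetical tag values (1 = string, 2 = keyword, 3 = long, 4 = double).
;; The numeric order of the tags is what orders keys of different types.

(defn- tagged
  "Prefixes `payload` with the 1-byte type tag."
  ^bytes [tag ^bytes payload]
  (let [out (byte-array (inc (alength payload)))]
    (aset-byte out 0 tag)
    (System/arraycopy payload 0 out 1 (alength payload))
    out))

(defn- utf8 ^bytes [^String s]
  (.getBytes s StandardCharsets/UTF_8))

(defn- long->bytes
  "8-byte big-endian (ByteBuffer's default byte order)."
  ^bytes [n]
  (.array (.putLong (ByteBuffer/allocate 8) (long n))))

(defn encode-key
  "Clojure key -> order-preserving bytes."
  ^bytes [k]
  (cond
    (string? k)  (tagged 1 (utf8 k))
    ;; (subs (str k) 1) drops the leading colon, keeping "ns/name" or "name".
    (keyword? k) (tagged 2 (utf8 (subs (str k) 1)))
    ;; XOR with Long/MIN_VALUE flips only the sign bit (0x80 of the top byte).
    (int? k)     (tagged 3 (long->bytes (bit-xor (long k) Long/MIN_VALUE)))
    (double? k)  (let [bits (Double/doubleToLongBits (double k))]
                   ;; Negative: flip all bits. Otherwise: flip only the sign bit.
                   (tagged 4 (long->bytes (if (neg? bits)
                                            (bit-not bits)
                                            (bit-xor bits Long/MIN_VALUE)))))
    :else (throw (IllegalArgumentException.
                  (str "Unsupported sorted key type: " (type k))))))

(defn decode-key
  "Bytes produced by encode-key -> the original key."
  [^bytes b]
  (let [body (Arrays/copyOfRange b 1 (alength b))
        body-long #(.getLong (ByteBuffer/wrap body))]
    (case (int (aget b 0))
      1 (String. body StandardCharsets/UTF_8)
      2 (keyword (String. body StandardCharsets/UTF_8))
      3 (bit-xor (body-long) Long/MIN_VALUE)
      4 (let [f (body-long)]
          ;; Undo the encode-time flip: a set top bit means the double was positive.
          (Double/longBitsToDouble
            (if (neg? f) (bit-xor f Long/MIN_VALUE) (bit-not f))))
      (throw (IllegalArgumentException. "Unknown sorted key tag")))))

(defn compare-encoded
  "Unsigned lexicographic byte comparison, as the B-tree performs it."
  [^bytes a ^bytes b]
  (Arrays/compareUnsigned a b))
```

The sketch also makes one subtlety easy to check: for keywords, this byte order is not always the order of `clojure.core/compare`, which puts every unqualified keyword before every qualified one. `(compare :z :a/b)` is negative, while `"z"` sorts after `"a/b"` as UTF-8. That is one more reason for the `Sorted` `comparator` to be derived from the codec rather than from `compare`.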
+
+### New module: the wrapper types
+
+`xitdb.sorted-map` and `xitdb.sorted-set`, modelled on `xitdb.hash-map` /
+`xitdb.hash-set`.
+
+- `XITDBSortedMap` (read) implements:
+  - `clojure.lang.ILookup`, `Associative`, `IPersistentMap`,
+    `IPersistentCollection`, `Counted`, `Seqable`, `IFn`, `Iterable`,
+    `IKVReduce`, plus `common/ISlot` / `IUnwrap` / `IMaterialize` /
+    `IMaterializeShallow`.
+  - `clojure.lang.Sorted`: `comparator`, `entryKey`, `seq(ascending?)`,
+    `seqFrom(k, ascending?)` — this is what powers `subseq`/`rsubseq`/`sorted?`.
+  - `clojure.lang.Indexed`: `nth` → rank-based `getIndexKeyValuePair`.
+  - `clojure.lang.Reversible`: `rseq`.
+  - Read-only `assoc`/`dissoc`/`cons` materialise-shallow then operate (return a
+    plain Clojure `sorted-map`), matching `XITDBHashMap` behaviour.
+- `XITDBWriteSortedMap` (write) implements the mutating `assoc`/`without`/`cons`/
+  `empty` against the live `WriteSortedMap`, plus `IReadOnly`, mirroring
+  `XITDBWriteHashMap`.
+- `XITDBSortedSet` / `XITDBWriteSortedSet` analogously implement
+  `IPersistentSet` + `Sorted` + `Indexed` + `Reversible`.
+- `print-method` registered for each, ordered output, distinct tags
+  (`#XITDBSortedMap` / `#XITDBSortedSet`).
+
+### `clojure.lang.Sorted` contract mapping (the load-bearing detail)
+
+- `seqFrom(k, true)` → `iteratorFrom(encode k)` (native O(log n) lower bound).
+- `seqFrom(k, false)` → `rank(encode k)` then descending index walk, starting at
+  index `rank` if `k` is present and at `rank - 1` otherwise, so the seq begins
+  at the greatest key `<= k`.
+- `seq(true)` → `iterator()`; `seq(false)` → descending index walk.
+- `comparator` → a comparator consistent with the codec's natural ordering
+  (so `subseq`'s own bound checks agree with the engine).
+- `entryKey` → `key` of the MapEntry (map) / identity (set).
+
+### Public surface for the "superpowers"
+
+Expose, from a stable namespace (e.g. `xitdb.db` or a new `xitdb.sorted`):
+- `rank` — key → index, O(log n).
+- (Optional) an `nth`/`get-by-index` convenience already covered by `Indexed`.
+- (Optional) a pagination helper built on `iteratorFromIndex`.
+
+## Testing Decisions
+
+Good tests here verify **external behavior** — what a user observes through the
+Clojure collection API — not the internal byte layout of the B-tree or the
+private shape of the operations namespace. Assertions compare against plain
+Clojure `sorted-map`/`sorted-set` built with the same data, which is the
+ground-truth oracle for ordering.
+
+Prior art to mirror:
+- `test/xitdb/set_test.clj` and `test/xitdb/map_test.clj` — `with-db` fixture,
+  `reset!`/`swap!`, `(tu/db-equal-to-atom? db)` round-trip checks,
+  read-only-return-type assertions.
+- `test/xitdb/data_types_test.clj` — per-key-type round-tripping.
+- `test/xitdb/generated_data_test.clj` / `gen_map.clj` — generative coverage.
+
+Modules and what to test:
+
+1. **Key codec (`xitdb.util.sorted-key`)** — the priority for unit tests, in
+   isolation, no DB:
+   - Round-trip: `(= k (decode-key (encode-key k)))` for each supported type.
+   - **Order preservation (property-based)**: for random pairs `a`, `b` of the
+     same type, `sign(compareUnsigned(encode a, encode b)) = sign(compare a b)`.
+     Cover negatives, zero, large longs, `Long/MIN_VALUE`/`MAX_VALUE`, negative
+     and positive doubles, sub-second instants. For keywords, `compare` puts
+     every unqualified keyword before every qualified one, so the UTF-8
+     `ns/name` encoding does not match it when the two are mixed; generate
+     keywords accordingly or test against the codec's documented order.
+   - Cross-type total ordering is stable (no exceptions across types).
+   - Unsupported type / custom-comparator → throws.
+2. **Sorted map/set integration tests** (with `with-db`), mirroring the Java
+   `testSortedMap` scenario and the existing set/map tests:
+   - Build from `(sorted-map ...)` / `(sorted-set ...)`; `@db` equals the plain
+     sorted collection (order-sensitive comparison).
+   - `subseq`/`rsubseq` for all six test/bound forms vs. the plain-Clojure oracle.
+   - `seq` ascending, `rseq` descending, `nth` (including negative), `count` O(1).
+   - `assoc`/`dissoc` in a `swap!` keep order; re-assoc replaces without changing
+     count; `disj`/`conj` on the set.
+   - Key-type matrix: string, keyword, long, double, inst/date keys each iterate
+     in correct natural order.
+   - `sorted?` is true; `materialize` returns a plain `sorted-map` with matching
+     order; read-only `assoc`/`dissoc` returns a plain Clojure sorted collection.
+   - `rank` returns correct positions and is the inverse of `nth`.
+   - Empty / none-cursor cases: `subseq` and iteration on an empty sorted map
+     yield nothing.
+   - Nesting: sorted map as a value inside a hash map, and rich values inside a
+     sorted map.
+3. **Multi-threaded read** — a light check that reader threads can `subseq`/read a
+   sorted map, consistent with `multi_threaded_test.clj`.
+
+Generative tests (`test.check`) are the recommended vehicle for the codec
+ordering property and for "insert a random key set, assert iteration order ==
+`(sort ...)`".
+
+## Out of Scope
+
+- **Custom comparators** (`sorted-map-by` / `sorted-set-by`). The engine's order
+  is fixed; custom comparators are rejected with a clear error, not supported.
+- Key types beyond strings, keywords, longs, doubles, `Instant`/`Date` in v1
+  (e.g. booleans, `nil`, `BigInteger`/`BigDecimal`, vectors/tuples as keys,
+  `ratio`). These can be added later by extending the codec's tag table.
+- A native streaming **reverse iterator** in the Java layer — descending is
+  implemented via rank + index walk in Clojure; we do not modify the Java lib.
+- Changing how hash maps/sets are stored or their key hashing.
+- A bespoke public constructor API (decision: reuse `sorted-map`/`sorted-set`).
+
+## Further Notes
+
+- **Why the codec is the risk center**: every ordering guarantee in the feature
+  reduces to "does `encode-key` preserve order". It is pure and isolated
+  specifically so it can be proven correct independently before the wrapper types
+  are trusted. De-risking the codec first (round-trip + property tests) is the
+  recommended build order.
+- **Headline win over `clojure.core`**: `nth`/positional access and `rank` are
+  O(log n) here, whereas Clojure's in-memory `sorted-map` is O(n) for positional
+  access. Combined with `iteratorFromIndex`, this makes `XITDBSortedMap` an
+  excellent fit for **on-disk secondary indexes** with efficient pagination — the
+  motivating use case demonstrated in the Java library's own tests.
+- **Check ordering of type checks** in `v->slot!`: `PersistentTreeMap` satisfies
+  `map?` and `PersistentTreeSet` satisfies `set?`, so the sorted branches must be
+  evaluated first or the generic hash branches will shadow them.
+- **UTF-8 ordering caveat**: UTF-8 byte order is Unicode code-point order, while
+  Clojure `compare` on strings uses UTF-16 code units. The two disagree only when
+  the first differing characters are a supplementary character (U+10000 and up)
+  and a BMP character in U+E000..U+FFFF; this is acceptable and should be noted
+  in docs. ASCII keys (the common case) are exact.
diff --git a/src/xitdb/sorted_map.clj b/src/xitdb/sorted_map.clj
new file mode 100644
index 0000000..4832058
--- /dev/null
+++ b/src/xitdb/sorted_map.clj
@@ -0,0 +1,205 @@
+(ns xitdb.sorted-map
+  "On-disk sorted map wrapper types, modelled on `xitdb.hash-map`.
+
+  `XITDBSortedMap` is the read view; `XITDBWriteSortedMap` is the mutable view
+  used inside a transaction. Ordering is by the engine's unsigned byte
+  comparison over order-preserving encoded keys (see `xitdb.util.sorted-key`).
+
+  The `clojure.lang.Sorted`/`Indexed`/`Reversible` interfaces (subseq, nth, rseq)
+  are added in a later slice; this slice provides ascending ordered `seq` only."
+  (:require
+   [xitdb.common :as common]
+   [xitdb.util.conversion :as conversion]
+   [xitdb.util.sorted-operations :as sorted-ops])
+  (:import
+   [io.github.radarroark.xitdb
+    ReadCursor ReadSortedMap WriteCursor WriteSortedMap]))
+
+(defn smap-seq [rsm]
+  (sorted-ops/smap-seq rsm common/-read-from-cursor))
+
+(deftype XITDBSortedMap [^ReadSortedMap rsm]
+
+  clojure.lang.ILookup
+  (valAt [this key]
+    (.valAt this key nil))
+
+  (valAt [this key not-found]
+    (let [cursor (sorted-ops/smap-read-cursor rsm key)]
+      (if (nil? cursor)
+        not-found
+        (common/-read-from-cursor cursor))))
+
+  clojure.lang.Associative
+  (containsKey [this key]
+    (sorted-ops/smap-contains-key? rsm key))
+
+  (entryAt [this key]
+    (when (.containsKey this key)
+      (clojure.lang.MapEntry. key (.valAt this key nil))))
+
+  (assoc [this k v]
+    (assoc (common/-materialize-shallow this) k v))
+
+  clojure.lang.IPersistentMap
+  (without [this k]
+    (dissoc (common/-materialize-shallow this) k))
+
+  (count [this]
+    (sorted-ops/smap-item-count rsm))
+
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (. clojure.lang.RT (conj (common/-materialize-shallow this) o)))
+
+  (empty [this]
+    (sorted-map))
+
+  (equiv [this other]
+    (and (instance? clojure.lang.IPersistentMap other)
+         (= (into {} this) (into {} other))))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (smap-seq rsm))
+
+  clojure.lang.IFn
+  (invoke [this k]
+    (.valAt this k))
+
+  (invoke [this k not-found]
+    (.valAt this k not-found))
+
+  java.lang.Iterable
+  (iterator [this]
+    (let [iter (clojure.lang.SeqIterator. (seq this))]
+      (reify java.util.Iterator
+        (hasNext [_]
+          (.hasNext iter))
+        (next [_]
+          (.next iter))
+        (remove [_]
+          (throw (UnsupportedOperationException.
+                  "XITDBSortedMap iterator is read-only"))))))
+
+  clojure.core.protocols/IKVReduce
+  (kv-reduce [this f init]
+    (sorted-ops/smap-kv-reduce rsm common/-read-from-cursor f init))
+
+  common/ISlot
+  (-slot [this]
+    (-> rsm .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [this]
+    rsm)
+
+  common/IMaterialize
+  (-materialize [this]
+    (reduce (fn [m [k v]]
+              (assoc m k (common/materialize v))) (sorted-map) (seq this)))
+
+  common/IMaterializeShallow
+  (-materialize-shallow [this]
+    (reduce (fn [m [k v]]
+              (assoc m k v)) (sorted-map) (seq this)))
+
+  Object
+  (toString [this]
+    (str (into (sorted-map) this))))
+
+(defmethod print-method XITDBSortedMap [o ^java.io.Writer w]
+  (.write w "#XITDBSortedMap")
+  (print-method (into (sorted-map) o) w))
+
+;---------------------------------------------------
+
+(deftype XITDBWriteSortedMap [^WriteSortedMap wsm]
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (cond
+      (instance? clojure.lang.MapEntry o)
+      (.assoc this (key o) (val o))
+
+      (map? o)
+      (doseq [[k v] (seq o)]
+        (.assoc this k v))
+
+      (and (sequential? o) (= 2 (count o)))
+      (.assoc this (first o) (second o))
+
+      :else
+      (throw (IllegalArgumentException. "Can only cons MapEntries or key-value pairs onto maps")))
+    this)
+
+  (empty [this]
+    (sorted-ops/smap-empty! wsm)
+    this)
+
+  (equiv [this other]
+    (and (= (count this) (count other))
+         (every? (fn [[k v]] (= v (get other k ::not-found)))
+                 (seq this))))
+
+  clojure.lang.Associative
+  (assoc [this k v]
+    (sorted-ops/smap-assoc-value! wsm k (common/unwrap v))
+    this)
+
+  (containsKey [this key]
+    (sorted-ops/smap-contains-key? wsm key))
+
+  (entryAt [this key]
+    (when (.containsKey this key)
+      (clojure.lang.MapEntry. key (.valAt this key nil))))
+
+  clojure.lang.IPersistentMap
+  (without [this key]
+    (sorted-ops/smap-dissoc-key! wsm key)
+    this)
+
+  (count [this]
+    (sorted-ops/smap-item-count wsm))
+
+  clojure.lang.ILookup
+  (valAt [this key]
+    (.valAt this key nil))
+
+  (valAt [this key not-found]
+    (let [cursor (sorted-ops/smap-read-cursor wsm key)]
+      (if (nil? cursor)
+        not-found
+        (common/-read-from-cursor cursor))))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (smap-seq wsm))
+
+  clojure.core.protocols/IKVReduce
+  (kv-reduce [this f init]
+    (sorted-ops/smap-kv-reduce wsm common/-read-from-cursor f init))
+
+  common/ISlot
+  (-slot [this]
+    (-> wsm .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [this]
+    wsm)
+
+  common/IReadOnly
+  (-read-only [this]
+    (XITDBSortedMap. wsm))
+
+  Object
+  (toString [this]
+    (str "XITDBWriteSortedMap")))
+
+(defmethod print-method XITDBWriteSortedMap [o ^java.io.Writer w]
+  (.write w "#XITDBWriteSortedMap")
+  (print-method (into (sorted-map) (common/-read-only o)) w))
+
+(defn xwrite-sorted-map [^WriteCursor write-cursor]
+  (->XITDBWriteSortedMap (WriteSortedMap. write-cursor)))
+
+(defn xsorted-map [^ReadCursor read-cursor]
+  (->XITDBSortedMap (ReadSortedMap. read-cursor)))
diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj
index 1774bc6..e16843b 100644
--- a/src/xitdb/util/conversion.clj
+++ b/src/xitdb/util/conversion.clj
@@ -1,11 +1,13 @@
 (ns xitdb.util.conversion
   (:require
+   [xitdb.util.sorted-key :as sorted-key]
    [xitdb.util.validation :as validation])
   (:import
+   [clojure.lang PersistentTreeMap]
    [io.github.radarroark.xitdb
     Database Database$Bytes Database$Float Database$Int ReadCursor Slot
     Slotted Tag WriteArrayList WriteCountedHashMap WriteCountedHashSet WriteCursor
-    WriteHashMap WriteHashSet WriteLinkedArrayList]
+    WriteHashMap WriteHashSet WriteLinkedArrayList WriteSortedMap]
    [java.io OutputStream OutputStreamWriter]
    [java.security DigestOutputStream]))
 
@@ -133,6 +135,13 @@
 (declare ^WriteCursor coll->ArrayListCursor!)
 (declare ^WriteCursor list->LinkedArrayListCursor!)
 (declare ^WriteCursor set->WriteCursor!)
+(declare ^WriteCursor sorted-map->WriteSortedMapCursor!)
+
+(defn default-sorted-comparator?
+  "True if `tm` uses Clojure's natural ordering (no custom comparator).
+  Custom comparators cannot be honoured by the engine's fixed byte ordering."
+  [^PersistentTreeMap tm]
+  (identical? clojure.lang.RT/DEFAULT_COMPARATOR (.comparator tm)))
 
 (defn ^Slot v->slot!
   "Converts a value to a XitDB slot.
@@ -148,6 +157,16 @@
     (instance? Slotted v)
     (.slot ^Slotted v)
 
+    ;; A sorted map is also `map?`, so it MUST be checked before the generic
+    ;; hash-map branch or it would be shadowed and stored as a hash map.
+    (instance? PersistentTreeMap v)
+    (do
+      (when-not (default-sorted-comparator? v)
+        (throw (IllegalArgumentException.
+                "sorted-map-by with a custom comparator is not supported; only natural ordering is allowed.")))
+      (.write cursor nil)
+      (.slot (sorted-map->WriteSortedMapCursor! cursor v)))
+
     (map? v)
     (do
       (.write cursor nil)
@@ -250,6 +269,17 @@
         (.write cursor (v->slot! cursor v))))
     (.-cursor whm)))
 
+(defn ^WriteCursor sorted-map->WriteSortedMapCursor!
+  "Writes a Clojure sorted map `m` to a XitDB WriteSortedMap.
+  Keys are encoded with the order-preserving codec; values are written
+  recursively via `v->slot!`. Returns the cursor of the created WriteSortedMap."
+  [^WriteCursor cursor m]
+  (let [wsm (WriteSortedMap. cursor)]
+    (doseq [[k v] m]
+      (let [value-cursor (.putCursor wsm (sorted-key/encode-key k))]
+        (.write value-cursor (v->slot! value-cursor v))))
+    (.-cursor wsm)))
+
 (defn ^WriteCursor set->WriteCursor!
   "Writes a Clojure set `s` to a XitDB WriteHashSet.
   Returns the cursor of the created WriteHashSet."
diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj
new file mode 100644
index 0000000..d6cb0fd
--- /dev/null
+++ b/src/xitdb/util/sorted_key.clj
@@ -0,0 +1,68 @@
+(ns xitdb.util.sorted-key
+  "Order-preserving, reversible key codec for on-disk sorted maps/sets.
+
+  Unlike hash maps (which SHA-1-hash their keys), sorted collections store the
+  real key bytes so they can be recovered on read and compared by the engine's
+  unsigned lexicographic byte comparison (`Arrays.compareUnsigned`). The codec
+  must therefore be:
+
+    1. reversible       - `decode-key (encode-key k)` == k
+    2. order-preserving - `sign(compareUnsigned (encode a) (encode b))`
+                          == `sign(compare a b)` for any two keys of the same type.
+
+  Every encoding carries a leading 1-byte type tag. The tag both identifies the
+  type on decode and establishes a total order across types, so heterogeneous
+  keys never throw.
+
+  This namespace currently implements string and keyword keys (UTF-8 bytes,
+  which already sort in code-point order). Numeric/temporal keys are added in a
+  later slice."
+  (:import
+   [java.io ByteArrayOutputStream]
+   [java.nio.charset StandardCharsets]))
+
+;; Type tags. Ordering of the tag values defines the cross-type order; they are
+;; intentionally sparse to leave room for numeric/temporal types between/around
+;; them in later slices.
+(def ^:const tag-string (int 0x20))
+(def ^:const tag-keyword (int 0x21))
+
+(defn- ^bytes utf8 [^String s]
+  (.getBytes s StandardCharsets/UTF_8))
+
+(defn- ^bytes tagged [tag ^bytes body]
+  (let [out (ByteArrayOutputStream. (inc (alength body)))]
+    (.write out (int tag))
+    (.write out body 0 (alength body))
+    (.toByteArray out)))
+
+(defn ^String keyname
+  "String form of a keyword key, namespace-qualified when present."
+  [k]
+  (if (namespace k)
+    (str (namespace k) "/" (name k))
+    (name k)))
+
+(defn encode-key
+  "Encodes Clojure key `k` to an order-preserving, reversible byte array."
+  ^bytes [k]
+  (cond
+    (string? k)
+    (tagged tag-string (utf8 k))
+
+    (keyword? k)
+    (tagged tag-keyword (utf8 (keyname k)))
+
+    :else
+    (throw (IllegalArgumentException.
+            (str "Unsupported sorted-map key type: " (type k))))))
+
+(defn decode-key
+  "Decodes a byte array produced by `encode-key` back to the Clojure key."
+  [^bytes ba]
+  (let [tag  (bit-and (int (aget ba 0)) 0xff)
+        body (String. ba 1 (dec (alength ba)) StandardCharsets/UTF_8)]
+    (condp = tag
+      tag-string  body
+      tag-keyword (keyword body)
+      (throw (IllegalArgumentException. (str "Unknown sorted-key tag: " tag))))))
diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj
new file mode 100644
index 0000000..8334d8a
--- /dev/null
+++ b/src/xitdb/util/sorted_operations.clj
@@ -0,0 +1,78 @@
+(ns xitdb.util.sorted-operations
+  "Bridges the XITDBSorted* wrapper types to the Java Read/WriteSortedMap.
+  Keys are encoded/decoded through `xitdb.util.sorted-key` (order-preserving,
+  reversible) rather than hashed, so the real key is recoverable on read."
+  (:require
+   [xitdb.util.conversion :as conversion]
+   [xitdb.util.sorted-key :as sorted-key])
+  (:import
+   [io.github.radarroark.xitdb ReadCursor WriteCursor ReadSortedMap WriteSortedMap]))
+
+(defn smap-item-count
+  "O(1) entry count, delegating to the rank-augmented B-tree."
+  [^ReadSortedMap rsm]
+  (.count rsm))
+
+(defn- decode-key-cursor [^ReadCursor key-cursor]
+  (sorted-key/decode-key (.readBytes key-cursor nil)))
+
+(defn smap-read-cursor
+  "Read cursor for `key`, or nil if absent."
+  [^ReadSortedMap rsm key]
+  (.getCursor rsm (sorted-key/encode-key key)))
+
+(defn smap-contains-key?
+  [^ReadSortedMap rsm key]
+  (some? (smap-read-cursor rsm key)))
+
+(defn smap-assoc-value!
+  "Encodes `k`, writes value `v` at its cursor. Returns the WriteSortedMap."
+  [^WriteSortedMap wsm k v]
+  (let [value-cursor (.putCursor wsm (sorted-key/encode-key k))]
+    (.write value-cursor (conversion/v->slot! value-cursor v))
+    wsm))
+
+(defn smap-dissoc-key!
+  [^WriteSortedMap wsm k]
+  (.remove wsm (sorted-key/encode-key k))
+  wsm)
+
+(defn smap-empty!
+  "Replaces contents with an empty sorted map, in place."
+  [^WriteSortedMap wsm]
+  (let [^WriteCursor cursor (.-cursor wsm)]
+    (.write cursor nil)
+    ;; re-init an empty sorted map at the same cursor the wsm holds
+    (WriteSortedMap. cursor))
+  wsm)
+
+(defn smap-seq
+  "Lazy ascending seq of key-value MapEntry pairs, or nil if empty.
+  `read-from-cursor` converts a value cursor to a Clojure value."
+  [^ReadSortedMap rsm read-from-cursor]
+  (let [it (.iterator rsm)]
+    (when (.hasNext it)
+      (letfn [(step []
+                (lazy-seq
+                 (when (.hasNext it)
+                   (let [cursor (.next it)
+                         kv     (.readKeyValuePair cursor)
+                         k      (decode-key-cursor (.-keyCursor kv))
+                         v      (read-from-cursor (.-valueCursor kv))]
+                     (cons (clojure.lang.MapEntry. k v) (step))))))]
+        (step)))))
+
+(defn smap-kv-reduce
+  [^ReadSortedMap rsm read-from-cursor f init]
+  (let [it (.iterator rsm)]
+    (loop [result init]
+      (if (.hasNext it)
+        (let [cursor     (.next it)
+              kv         (.readKeyValuePair cursor)
+              k          (decode-key-cursor (.-keyCursor kv))
+              v          (read-from-cursor (.-valueCursor kv))
+              new-result (f result k v)]
+          (if (reduced? new-result)
+            @new-result
+            (recur new-result)))
+        result))))
diff --git a/src/xitdb/xitdb_types.clj b/src/xitdb/xitdb_types.clj
index 2e1c2d4..6be9d2d 100644
--- a/src/xitdb/xitdb_types.clj
+++ b/src/xitdb/xitdb_types.clj
@@ -5,6 +5,7 @@
    [xitdb.hash-map :as xhash-map]
    [xitdb.hash-set :as xhash-set]
    [xitdb.linked-list :as xlinked-list]
+   [xitdb.sorted-map :as xsorted-map]
    [xitdb.util.conversion :as conversion])
   (:import
    [io.github.radarroark.xitdb ReadCursor Slot Tag WriteCursor]))
@@ -51,6 +52,11 @@
         (xhash-set/xwrite-hash-set-counted cursor)
         (xhash-set/xhash-set-counted cursor))
 
+      (= value-tag Tag/SORTED_MAP)
+      (if for-writing?
+        (xsorted-map/xwrite-sorted-map cursor)
+        (xsorted-map/xsorted-map cursor))
+
       (= value-tag Tag/ARRAY_LIST)
       (if for-writing?
(xarray-list/xwrite-array-list cursor) diff --git a/test/xitdb/sorted_key_test.clj b/test/xitdb/sorted_key_test.clj new file mode 100644 index 0000000..5cf64c7 --- /dev/null +++ b/test/xitdb/sorted_key_test.clj @@ -0,0 +1,23 @@ +(ns xitdb.sorted-key-test + (:require + [clojure.test :refer :all] + [xitdb.util.sorted-key :as sk])) + +(defn cmp-unsigned [^bytes a ^bytes b] + (java.util.Arrays/compareUnsigned a b)) + +(deftest string-roundtrip + (testing "strings encode and decode back to the same string" + (doseq [s ["" "a" "hello" "with spaces" "unicode-é-字"]] + (is (= s (sk/decode-key (sk/encode-key s))))))) + +(deftest keyword-roundtrip + (testing "keywords round-trip, including namespaced" + (doseq [k [:a :foo/bar :a-much-longer-keyword]] + (is (= k (sk/decode-key (sk/encode-key k))))))) + +(deftest string-order-preserved + (testing "byte order matches code-point order for strings" + (doseq [[a b] [["a" "b"] ["a" "ab"] ["abc" "abd"] ["" "a"] ["k0009" "k0010"]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj new file mode 100644 index 0000000..a18bd26 --- /dev/null +++ b/test/xitdb/sorted_map_test.clj @@ -0,0 +1,119 @@ +(ns xitdb.sorted-map-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.test-utils :as tu :refer [with-db]])) + +(deftest lookups-and-count + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2 "c" 3)) + (let [m @db] + (testing "get / invoke / find / contains?" + (is (= 1 (get m "a"))) + (is (= 2 (m "b"))) + (is (= ::nf (get m "z" ::nf))) + (is (true? (contains? m "c"))) + (is (false? (contains? m "z"))) + (is (= (clojure.lang.MapEntry. "a" 1) (find m "a"))) + (is (nil? (find m "z")))) + (testing "count is correct" + (is (= 3 (count m))))))) + +(deftest mutation-keeps-order + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-map "b" 2 "d" 4)) + (testing "assoc inserts in order" + (swap! db assoc "c" 3) + (swap! db assoc "a" 1) + (is (= ["a" "b" "c" "d"] (map key (seq @db))))) + (testing "dissoc removes and preserves order" + (swap! db dissoc "b") + (is (= ["a" "c" "d"] (map key (seq @db))))) + (testing "re-assoc replaces value without changing count" + (swap! db assoc "c" 30) + (is (= 3 (count @db))) + (is (= 30 (get @db "c")))))) + +(deftest keyword-keys + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map :banana 2 :apple 1 :cherry 3)) + (testing "keyword keys round-trip as keywords, in sorted order" + (is (= [:apple :banana :cherry] (map key (seq @db)))) + (is (every? keyword? (map key (seq @db)))) + (is (= 1 (get @db :apple)))))) + +(deftest materialize-returns-plain-sorted-map + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (let [m (tu/materialize @db)] + (is (sorted? m)) + (is (not (instance? xitdb.sorted_map.XITDBSortedMap m))) + (is (= ["a" "b"] (keys m))) + (is (= {"a" 1 "b" 2} m))))) + +(deftest read-only-ops-return-plain-collections + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (let [m @db] + (testing "assoc outside a transaction returns a plain sorted map" + (let [r (assoc m "c" 3)] + (is (not (instance? xitdb.sorted_map.XITDBSortedMap r))) + (is (sorted? r)) + (is (= ["a" "b" "c"] (keys r))))) + (testing "dissoc outside a transaction returns a plain sorted map" + (let [r (dissoc m "a")] + (is (not (instance? xitdb.sorted_map.XITDBSortedMap r))) + (is (sorted? r)) + (is (= ["b"] (keys r)))))))) + +(deftest custom-comparator-rejected + (with-open [db (xdb/xit-db :memory)] + (is (thrown? IllegalArgumentException + (reset! db (sorted-map-by > 1 :a 2 :b)))))) + +(deftest nesting-and-complex-values + (testing "sorted map nests inside a hash map value" + (with-open [db (xdb/xit-db :memory)] + (reset! db {:idx (sorted-map "b" 2 "a" 1)}) + (is (instance? 
xitdb.sorted_map.XITDBSortedMap (:idx @db))) + (is (= ["a" "b"] (map key (seq (:idx @db))))))) + (testing "nested sorted map round-trips against an in-memory atom" + (with-db [db (tu/test-db)] + (reset! db {:idx (sorted-map "b" 2 "a" 1)}) + (is (tu/db-equal-to-atom? db)))) + (testing "sorted map values may be vectors, maps and sets" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "vec" [1 2 3] + "map" {:x 1} + "set" #{:a :b})) + (is (= [1 2 3] (tu/materialize (get @db "vec")))) + (is (= {:x 1} (tu/materialize (get @db "map")))) + (is (= #{:a :b} (tu/materialize (get @db "set"))))))) + +(deftest empty-clears-map + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (swap! db empty) + (is (= 0 (count @db))) + (is (empty? (seq @db))) + (swap! db assoc "c" 3) + (is (= ["c"] (map key (seq @db)))))) + +(deftest print-method-ordered + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (let [s (pr-str @db)] + (is (clojure.string/starts-with? s "#XITDBSortedMap")) + (is (clojure.string/includes? s "\"a\" 1, \"b\" 2"))))) + +(deftest tracer-bullet-ordered-seq + (testing "a persisted sorted-map is stored as a sorted map and seqs in key order" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (= [["a" 1] ["b" 2]] (map (juxt key val) (seq @db)))))) + (testing "ordering holds for many keys regardless of insertion order" + (with-open [db (xdb/xit-db :memory)] + (let [ks (map #(format "k%04d" %) (shuffle (range 50)))] + (reset! 
db (into (sorted-map) (map vector ks (range)))) + (is (= (sort ks) (map key (seq @db)))))))) From 1742800031d2eecc601234c6ab56f6c57c3254aa Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 18:05:45 +0200 Subject: [PATCH 02/15] Add numeric/temporal key codec (Issue 3): long, double, Instant, Date Extend the order-preserving sorted-key codec with tagged encodings for long, double, java.time.Instant and java.util.Date, so numeric and temporal keys sort in their natural order on disk. - long: 8-byte big-endian with the sign bit flipped (XOR 0x80) so signed integers sort correctly under unsigned byte comparison. - double: IEEE-754 big-endian with the order-preserving bit flip (flip all bits when sign set, else flip sign bit). NaN is rejected (ordering undefined). - Instant: epoch-second (sign-flipped) + nano-of-second (4-byte BE). - Date: epoch-millis, sign-flipped 8-byte BE; distinct tag from Instant. Cross-type tag order: long < double < instant < date < string < keyword, making heterogeneous-key comparison total (never throws). Tests: round-trip + order-preservation property tests (deterministic randomized loops, fixed seeds; test.check is not a dependency), boundary values, cross-type-never-throws, and :memory-DB integration showing numeric/temporal keys iterate in natural order. Co-Authored-By: Claude Opus 4.8 --- src/xitdb/util/sorted_key.clj | 107 +++++++++++++++++++++++++-- test/xitdb/sorted_key_test.clj | 128 ++++++++++++++++++++++++++++++++- test/xitdb/sorted_map_test.clj | 38 +++++++++- 3 files changed, 264 insertions(+), 9 deletions(-) diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj index d6cb0fd..e9faea2 100644 --- a/src/xitdb/util/sorted_key.clj +++ b/src/xitdb/util/sorted_key.clj @@ -19,11 +19,20 @@ later slice." (:import [java.io ByteArrayOutputStream] - [java.nio.charset StandardCharsets])) + [java.nio ByteBuffer] + [java.nio.charset StandardCharsets] + [java.time Instant] + [java.util Date])) ;; Type tags. 
Ordering of the tag values defines the cross-type order; they are -;; intentionally sparse to leave room for numeric/temporal types between/around -;; them in later slices. +;; intentionally sparse to leave room for additional types in later slices. +;; Current cross-type order (by ascending tag byte): +;; long (0x10) < double (0x11) < instant (0x18) < date (0x19) +;; < string (0x20) < keyword (0x21) +(def ^:const tag-long (int 0x10)) +(def ^:const tag-double (int 0x11)) +(def ^:const tag-instant (int 0x18)) +(def ^:const tag-date (int 0x19)) (def ^:const tag-string (int 0x20)) (def ^:const tag-keyword (int 0x21)) @@ -36,6 +45,68 @@ (.write out body 0 (alength body)) (.toByteArray out))) +(defn- ^bytes long->bytes + "8-byte big-endian with the sign bit flipped, so signed longs sort correctly + under unsigned byte comparison (negatives before positives)." + [^long n] + (let [buf (doto (ByteBuffer/allocate 8) (.putLong (bit-xor n Long/MIN_VALUE)))] + (.array buf))) + +(defn- bytes->long + "Inverse of `long->bytes`. Reads 8 big-endian bytes starting at `off`." + [^bytes ba off] + (bit-xor (.getLong (ByteBuffer/wrap ba (int off) 8)) Long/MIN_VALUE)) + +(defn- ^bytes double->bytes + "IEEE-754 8-byte big-endian with an order-preserving bit flip: if the sign bit + is set, flip all bits; otherwise flip only the sign bit. This makes doubles + sort numerically under unsigned byte comparison. NaN is rejected by + `encode-key` (its ordering is undefined), so it never reaches here." + [^double d] + (let [bits (Double/doubleToLongBits d) + flipped (if (neg? bits) + (bit-not bits) + (bit-or bits Long/MIN_VALUE)) + buf (doto (ByteBuffer/allocate 8) (.putLong flipped))] + (.array buf))) + +(defn- bytes->double + "Inverse of `double->bytes`. Reads 8 big-endian bytes starting at `off`." + [^bytes ba off] + (let [flipped (.getLong (ByteBuffer/wrap ba (int off) 8)) + bits (if (neg? 
flipped) + (bit-and flipped Long/MAX_VALUE) + (bit-not flipped))] + (Double/longBitsToDouble bits))) + +(defn- ^bytes instant->bytes + "12 bytes: epoch-second (8-byte big-endian, sign-flipped so negative epochs + sort first) followed by nano-of-second (4-byte big-endian; always 0..1e9-1, so + unsigned order is chronological). Byte order therefore equals chronological + order across the full Instant range." + [^Instant i] + (let [buf (doto (ByteBuffer/allocate 12) + (.putLong (bit-xor (.getEpochSecond i) Long/MIN_VALUE)) + (.putInt (int (.getNano i))))] + (.array buf))) + +(defn- ^Instant bytes->instant [^bytes ba off] + (let [bb (ByteBuffer/wrap ba (int off) 12) + secs (bit-xor (.getLong bb) Long/MIN_VALUE) + nano (.getInt bb)] + (Instant/ofEpochSecond secs nano))) + +(defn- ^bytes date->bytes + "8-byte big-endian epoch-millis with the sign bit flipped, so pre-epoch dates + sort before the epoch. Byte order equals chronological order." + [^Date d] + (let [buf (doto (ByteBuffer/allocate 8) + (.putLong (bit-xor (.getTime d) Long/MIN_VALUE)))] + (.array buf))) + +(defn- ^Date bytes->date [^bytes ba off] + (Date. (bit-xor (.getLong (ByteBuffer/wrap ba (int off) 8)) Long/MIN_VALUE))) + (defn ^String keyname "String form of a keyword key, namespace-qualified when present." [k] @@ -53,16 +124,38 @@ (keyword? k) (tagged tag-keyword (utf8 (keyname k))) + (integer? k) + (tagged tag-long (long->bytes (long k))) + + (float? k) + (let [d (double k)] + (when (Double/isNaN d) + (throw (IllegalArgumentException. + "NaN is not a valid sorted-map key (ordering undefined)"))) + (tagged tag-double (double->bytes d))) + + (instance? Instant k) + (tagged tag-instant (instant->bytes k)) + + (instance? Date k) + (tagged tag-date (date->bytes k)) + :else (throw (IllegalArgumentException. (str "Unsupported sorted-map key type: " (type k)))))) +(defn- ^String utf8-body [^bytes ba] + (String. 
ba 1 (dec (alength ba)) StandardCharsets/UTF_8)) + (defn decode-key "Decodes a byte array produced by `encode-key` back to the Clojure key." [^bytes ba] - (let [tag (bit-and (int (aget ba 0)) 0xff) - body (String. ba 1 (dec (alength ba)) StandardCharsets/UTF_8)] + (let [tag (bit-and (int (aget ba 0)) 0xff)] (condp = tag - tag-string body - tag-keyword (keyword body) + tag-string (utf8-body ba) + tag-keyword (keyword (utf8-body ba)) + tag-long (bytes->long ba 1) + tag-double (bytes->double ba 1) + tag-instant (bytes->instant ba 1) + tag-date (bytes->date ba 1) (throw (IllegalArgumentException. (str "Unknown sorted-key tag: " tag)))))) diff --git a/test/xitdb/sorted_key_test.clj b/test/xitdb/sorted_key_test.clj index 5cf64c7..44e8413 100644 --- a/test/xitdb/sorted_key_test.clj +++ b/test/xitdb/sorted_key_test.clj @@ -1,7 +1,10 @@ (ns xitdb.sorted-key-test (:require [clojure.test :refer :all] - [xitdb.util.sorted-key :as sk])) + [xitdb.util.sorted-key :as sk]) + (:import + [java.time Instant] + [java.util Date])) (defn cmp-unsigned [^bytes a ^bytes b] (java.util.Arrays/compareUnsigned a b)) @@ -21,3 +24,126 @@ (doseq [[a b] [["a" "b"] ["a" "ab"] ["abc" "abd"] ["" "a"] ["k0009" "k0010"]]] (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) (str a " < " b))))) + +(deftest long-roundtrip + (testing "longs round-trip, including boundary values" + (doseq [n [0 1 -1 42 -42 Long/MIN_VALUE Long/MAX_VALUE + (long Integer/MIN_VALUE) (long Integer/MAX_VALUE)]] + (is (= n (sk/decode-key (sk/encode-key n))) + (str "roundtrip " n))))) + +(deftest long-order-preserved + (testing "byte order matches numeric order, negatives before positives" + (doseq [[a b] [[1 2] [9 10] [-5 0] [-5 3] [0 3] + [Long/MIN_VALUE -1] [Long/MIN_VALUE Long/MAX_VALUE] + [-1 0] [0 Long/MAX_VALUE]]] + (is (neg? 
(cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +(deftest double-roundtrip + (testing "doubles round-trip, including extremes" + (doseq [d [0.0 1.0 -1.0 3.14 -3.14 1.0e308 -1.0e308 1.0e-308 -1.0e-308 + Double/MIN_VALUE Double/MAX_VALUE]] + (is (= d (sk/decode-key (sk/encode-key d))) + (str "roundtrip " d))))) + +(deftest double-order-preserved + (testing "byte order matches numeric order across sign and magnitude" + (doseq [[a b] [[1.0 2.0] [-1.0 0.0] [-2.0 -1.0] [-1.0e308 1.0e308] + [0.0 1.0e-308] [-3.14 3.14] [-1.0e-308 0.0]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +(deftest date-roundtrip + (testing "dates round-trip to Date" + (doseq [d [(Date. 0) (Date. 1719100000000) (Date. -100000) (Date.)]] + (is (= d (sk/decode-key (sk/encode-key d))) + (str "roundtrip " d)) + (is (instance? Date (sk/decode-key (sk/encode-key d))))))) + +(deftest date-order-preserved + (testing "byte order matches chronological order, including pre-epoch" + (doseq [[a b] [[(Date. 0) (Date. 1)] + [(Date. -5000) (Date. 0)] + [(Date. 1000) (Date. 2000)] + [(Date. -2000) (Date. -1000)]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +;; ----- property-based ordering (deterministic randomized loops, fixed seed) --- + +(defn- order-agrees? [a b] + (= (Integer/signum (compare a b)) + (Integer/signum (cmp-unsigned (sk/encode-key a) (sk/encode-key b))))) + +(deftest prop-long-order + (testing "for 2000 random long pairs, byte order == numeric order" + (let [r (java.util.Random. 42)] + (is (every? (fn [_] (order-agrees? (.nextLong r) (.nextLong r))) + (range 2000)))))) + +(deftest prop-double-order + (testing "for 2000 random finite double pairs, byte order == numeric order" + (let [r (java.util.Random. 43) + rand-d (fn [] (* (.nextDouble r) + (Math/pow 10 (- (.nextInt r 40) 20)) + (if (.nextBoolean r) 1.0 -1.0)))] + (is (every? (fn [_] (order-agrees? 
(rand-d) (rand-d))) + (range 2000)))))) + +(deftest prop-instant-order + (testing "for 2000 random instant pairs, byte order == chronological order" + (let [r (java.util.Random. 44) + rand-i (fn [] (Instant/ofEpochSecond + (- (.nextLong (java.util.Random. (.nextLong r)) + 4000000000) 2000000000) + (.nextInt r 1000000000)))] + (is (every? (fn [_] (order-agrees? (rand-i) (rand-i))) + (range 2000)))))) + +(deftest prop-roundtrip + (testing "random keys of every type round-trip exactly" + (let [r (java.util.Random. 45)] + (is (every? + (fn [_] + (let [k (case (.nextInt r 5) + 0 (.nextLong r) + 1 (* (.nextDouble r) (if (.nextBoolean r) 1e9 -1e-9)) + 2 (Instant/ofEpochSecond (.nextInt r 2000000000) + (.nextInt r 1000000000)) + 3 (Date. (long (.nextInt r 2000000000))) + 4 (str "k" (.nextInt r 100000)))] + (= k (sk/decode-key (sk/encode-key k))))) + (range 2000)))))) + +(deftest cross-type-never-throws + (testing "encoding any supported type and comparing across types never throws" + (let [vals [0 -1 Long/MAX_VALUE 3.14 -2.0 "abc" :kw + (Instant/ofEpochSecond 5) (Date. 1000)] + encoded (map sk/encode-key vals)] + (doseq [a encoded b encoded] + (is (integer? (cmp-unsigned a b))))))) + +(deftest unsupported-key-throws + (is (thrown? IllegalArgumentException (sk/encode-key nil))) + (is (thrown? IllegalArgumentException (sk/encode-key true))) + (is (thrown? IllegalArgumentException (sk/encode-key Double/NaN)))) + +(deftest instant-roundtrip + (testing "instants round-trip to Instant, preserving sub-second precision" + (doseq [i [(Instant/ofEpochSecond 0) + (Instant/ofEpochSecond 1719100000 123456789) + (Instant/ofEpochSecond -100 500) + Instant/EPOCH]] + (is (= i (sk/decode-key (sk/encode-key i))) + (str "roundtrip " i)) + (is (instance? Instant (sk/decode-key (sk/encode-key i))))))) + +(deftest instant-order-preserved + (testing "byte order matches chronological order, incl. 
negative epoch & nanos" + (doseq [[a b] [[(Instant/ofEpochSecond 0) (Instant/ofEpochSecond 1)] + [(Instant/ofEpochSecond -5) (Instant/ofEpochSecond 0)] + [(Instant/ofEpochSecond 10 100) (Instant/ofEpochSecond 10 200)] + [(Instant/ofEpochSecond 10 999999999) (Instant/ofEpochSecond 11 0)]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index a18bd26..f9e7903 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -2,7 +2,10 @@ (:require [clojure.test :refer :all] [xitdb.db :as xdb] - [xitdb.test-utils :as tu :refer [with-db]])) + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant] + [java.util Date])) (deftest lookups-and-count (with-open [db (xdb/xit-db :memory)] @@ -106,6 +109,39 @@ (is (clojure.string/starts-with? s "#XITDBSortedMap")) (is (clojure.string/includes? s "\"a\" 1, \"b\" 2"))))) +(deftest numeric-keys-iterate-numerically + (testing "long keys iterate in numeric, not lexical, order" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map 9 :a 10 :b 1 :c)) + (is (= [1 9 10] (map key (seq @db)))) + (is (= [:c :a :b] (map val (seq @db)))))) + (testing "negative and positive longs sort together, incl. extremes" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) + (map vector [3 -5 0 Long/MIN_VALUE Long/MAX_VALUE] + (range)))) + (is (= [Long/MIN_VALUE -5 0 3 Long/MAX_VALUE] (map key (seq @db)))))) + (testing "double keys sort numerically, incl. negatives and zero" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-map 3.5 :a -1.5 :b 0.0 :c 1.0e308 :d -1.0e308 :e)) + (is (= [-1.0e308 -1.5 0.0 3.5 1.0e308] (map key (seq @db))))))) + +(deftest temporal-keys-iterate-chronologically + (testing "Instant keys iterate chronologically and round-trip to Instant" + (with-open [db (xdb/xit-db :memory)] + (let [t0 (Instant/ofEpochSecond 100) + t1 (Instant/ofEpochSecond 200 500) + t2 (Instant/ofEpochSecond 200 999)] + (reset! db (sorted-map t2 :c t0 :a t1 :b)) + (is (= [t0 t1 t2] (map key (seq @db)))) + (is (every? #(instance? Instant %) (map key (seq @db))))))) + (testing "Date keys iterate chronologically and round-trip to Date" + (with-open [db (xdb/xit-db :memory)] + (let [d0 (Date. 0) d1 (Date. 1000) d2 (Date. 2000)] + (reset! db (sorted-map d2 :c d0 :a d1 :b)) + (is (= [d0 d1 d2] (map key (seq @db)))) + (is (every? #(instance? Date %) (map key (seq @db)))))))) + (deftest tracer-bullet-ordered-seq (testing "a persisted sorted-map is stored as a sorted map and seqs in key order" (with-open [db (xdb/xit-db :memory)] From f6d5e962704703e52ba939c806ac17b1d6ef4699 Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 18:12:01 +0200 Subject: [PATCH 03/15] Implement Sorted/Indexed/Reversible on XITDBSortedMap (Issue 2) Make XITDBSortedMap a fully sorted Clojure collection so sorted?, subseq, rsubseq, seq, rseq and indexed nth work against disk. - clojure.lang.Sorted: comparator (Arrays.compareUnsigned over encoded keys, consistent with the engine and total across types), entryKey, seq(ascending?), seqFrom(k, ascending?). Ascending seqFrom uses the native O(log n) iteratorFrom lower-bound seek; descending uses rank + a lazy descending getIndexKeyValuePair index walk (no native reverse iterator exists). - clojure.lang.Indexed: nth via getIndexKeyValuePair (negative indices count from the end; nth/2 returns not-found, nth/1 throws IndexOutOfBoundsException out of range). - clojure.lang.Reversible: rseq (lazy descending walk). 
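The descending `seqFrom` strategy above (take the rank, then walk downward by index) can be sketched in plain Clojure. This is a minimal illustration that uses a sorted vector as a stand-in for the rank-augmented B-tree. The binary search plays the role of the engine's O(log n) `rank`. The names `rank*`, `desc-start-index` and `seq-from-desc` are hypothetical and not part of the patch.

```clojure
(ns example.descending-seq-from)

;; Hypothetical stand-in for the engine's `rank`: the number of elements of
;; the sorted vector `v` strictly less than `k` (lower-bound binary search).
(defn rank* [v k]
  (loop [lo 0, hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (nth v mid) k))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

;; Largest index whose element is <= k: the rank itself when k is present,
;; otherwise one below it (-1 when every element is greater than k).
(defn desc-start-index [v k]
  (let [r (rank* v k)]
    (if (and (< r (count v)) (= k (nth v r)))
      r
      (dec r))))

;; Lazy descending walk by index from the start position down to 0.
;; Returns nil (not an empty seq) when there is nothing to walk.
(defn seq-from-desc [v k]
  (let [start (desc-start-index v k)]
    (when (>= start 0)
      (map #(nth v %) (range start -1 -1)))))
```

For example, `(seq-from-desc [0 2 4 6] 5)` yields `(4 2 0)`, the same elements as `(rsubseq (sorted-set 0 2 4 6) <= 5)`.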
seqFrom/rseq helpers return nil (not an empty lazy-seq) when empty so clojure.core/subseq's when-let short-circuits instead of NPEing. Added sorted-key/key-comparator and sorted-operations helpers (smap-seq-from, smap-nth, smap-rank, smap-rseq). Tests assert subseq / rsubseq / rseq / nth entry-for-entry against a plain sorted-map oracle for all bound forms, plus empty-map and negative-index edge cases. Co-Authored-By: Claude Opus 4.8 --- src/xitdb/sorted_map.clj | 43 +++++++++++++++ src/xitdb/util/sorted_key.clj | 9 ++++ src/xitdb/util/sorted_operations.clj | 51 ++++++++++++++++++ test/xitdb/sorted_map_test.clj | 78 ++++++++++++++++++++++++++++ 4 files changed, 181 insertions(+) diff --git a/src/xitdb/sorted_map.clj b/src/xitdb/sorted_map.clj index 4832058..799b892 100644 --- a/src/xitdb/sorted_map.clj +++ b/src/xitdb/sorted_map.clj @@ -10,6 +10,7 @@ (:require [xitdb.common :as common] [xitdb.util.conversion :as conversion] + [xitdb.util.sorted-key :as sorted-key] [xitdb.util.sorted-operations :as sorted-ops]) (:import [io.github.radarroark.xitdb @@ -18,6 +19,16 @@ (defn smap-seq [rsm] (sorted-ops/smap-seq rsm common/-read-from-cursor)) +(defn- descending-start-index + "Index to begin a descending walk for `seqFrom(key, false)`: the largest rank + whose key is <= `key`. Uses `rank` (count of keys strictly < key); if `key` + itself is present, include it." + [^ReadSortedMap rsm key] + (let [r (sorted-ops/smap-rank rsm key)] + (if (sorted-ops/smap-contains-key? rsm key) + r + (dec r)))) + (deftype XITDBSortedMap [^ReadSortedMap rsm] clojure.lang.ILookup @@ -63,6 +74,38 @@ (seq [_] (smap-seq rsm)) + clojure.lang.Sorted + (comparator [_] + sorted-key/key-comparator) + + (entryKey [_ entry] + (key entry)) + + (seq [_ ascending?] + (if ascending? + (smap-seq rsm) + (sorted-ops/smap-rseq rsm common/-read-from-cursor))) + + (seqFrom [_ key ascending?] + (if ascending? 
+ (sorted-ops/smap-seq-from rsm common/-read-from-cursor key) + (sorted-ops/smap-rseq rsm common/-read-from-cursor + (descending-start-index rsm key)))) + + clojure.lang.Indexed + (nth [this i] + (let [e (.nth this i ::not-found)] + (if (identical? e ::not-found) + (throw (IndexOutOfBoundsException.)) + e))) + + (nth [_ i not-found] + (sorted-ops/smap-nth rsm common/-read-from-cursor i not-found)) + + clojure.lang.Reversible + (rseq [_] + (sorted-ops/smap-rseq rsm common/-read-from-cursor)) + clojure.lang.IFn (invoke [this k] (.valAt this k)) diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj index e9faea2..c8d5288 100644 --- a/src/xitdb/util/sorted_key.clj +++ b/src/xitdb/util/sorted_key.clj @@ -147,6 +147,15 @@ (defn- ^String utf8-body [^bytes ba] (String. ba 1 (dec (alength ba)) StandardCharsets/UTF_8)) +(def key-comparator + "A `java.util.Comparator` consistent with the engine's natural ordering: + compares two keys by `Arrays.compareUnsigned` over their encoded bytes. Use + this (not `clojure.core/compare`) so `subseq`/`rsubseq` bound checks agree with + on-disk order across all supported types, including heterogeneous keys." + (reify java.util.Comparator + (compare [_ a b] + (java.util.Arrays/compareUnsigned (encode-key a) (encode-key b))))) + (defn decode-key "Decodes a byte array produced by `encode-key` back to the Clojure key." [^bytes ba] diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj index 8334d8a..d52cf41 100644 --- a/src/xitdb/util/sorted_operations.clj +++ b/src/xitdb/util/sorted_operations.clj @@ -62,6 +62,57 @@ (cons (clojure.lang.MapEntry. k v) (step))))))] (step))))) +(defn- kvpair->entry + "Turns a Java KeyValuePair (with .-keyCursor/.-valueCursor) into a Clojure + MapEntry (decoded key, read value)." + [kv read-from-cursor] + (clojure.lang.MapEntry. 
+ (decode-key-cursor (.-keyCursor kv)) + (read-from-cursor (.-valueCursor kv)))) + +(defn smap-seq-from + "Lazy ascending seq of MapEntry pairs starting at the first key >= `key`, + using the engine's native O(log n) lower-bound seek. nil if none." + [^ReadSortedMap rsm read-from-cursor key] + (let [it (.iteratorFrom rsm (sorted-key/encode-key key))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (kvpair->entry (.readKeyValuePair (.next it)) + read-from-cursor) + (step)))))] + (step))))) + +(defn smap-nth + "MapEntry at rank `index` (negative counts from the end), or `not-found` when + out of range. O(log n) via the rank-augmented B-tree." + [^ReadSortedMap rsm read-from-cursor index not-found] + (let [kv (.getIndexKeyValuePair rsm (long index))] + (if (nil? kv) + not-found + (kvpair->entry kv read-from-cursor)))) + +(defn smap-rank + "Number of keys strictly less than `key`. O(log n)." + [^ReadSortedMap rsm key] + (.rank rsm (sorted-key/encode-key key))) + +(defn smap-rseq + "Lazy descending seq of MapEntry pairs, walking `getIndexKeyValuePair` from + index `start` down to 0. Stays low-memory (one entry materialised at a time)." + ([^ReadSortedMap rsm read-from-cursor] + (smap-rseq rsm read-from-cursor (dec (.count rsm)))) + ([^ReadSortedMap rsm read-from-cursor start] + (when (>= start 0) + (letfn [(step [i] + (lazy-seq + (when (>= i 0) + (let [kv (.getIndexKeyValuePair rsm (long i))] + (when kv + (cons (kvpair->entry kv read-from-cursor) (step (dec i))))))))] + (step start))))) + (defn smap-kv-reduce [^ReadSortedMap rsm read-from-cursor f init] (let [it (.iterator rsm)] diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index f9e7903..7c039d4 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -109,6 +109,84 @@ (is (clojure.string/starts-with? s "#XITDBSortedMap")) (is (clojure.string/includes? 
s "\"a\" 1, \"b\" 2"))))) +(deftest sorted-predicate-and-comparator + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map 3 :c 1 :a 2 :b)) + (testing "sorted? is true for a persisted sorted map" + (is (sorted? @db))) + (testing "comparator is consistent with iteration order" + (let [^java.util.Comparator c (.comparator ^clojure.lang.Sorted @db)] + (is (neg? (.compare c 1 2))) + (is (pos? (.compare c 2 1))) + (is (zero? (.compare c 2 2))) + ;; cross-type bound checks must agree with the engine (not core/compare) + (is (neg? (.compare c 5 "x"))))))) + +(deftest nth-indexed + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 20)) (range 20)))] + (reset! db oracle) + (let [m @db + ov (vec oracle)] + (testing "nth by positive index matches the oracle's entry at that rank" + (doseq [i (range 20)] + (is (= (nth ov i) (nth m i)) (str "nth " i)))) + (testing "negative index counts from the end (-1 = last)" + (is (= (last ov) (nth m -1))) + (is (= (nth ov 18) (nth m -2)))) + (testing "out-of-range nth/2 returns not-found" + (is (= ::nf (nth m 100 ::nf))) + (is (= ::nf (nth m -100 ::nf)))) + (testing "out-of-range nth/1 throws like a vector" + (is (thrown? IndexOutOfBoundsException (nth m 100)))))))) + +(deftest subseq-matches-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 0 40 2)) (range)))] + (reset! 
db oracle) + (let [m @db] + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound subseq at " k) + (is (= (subseq oracle >= k) (subseq m >= k)) (str ">= " k)) + (is (= (subseq oracle > k) (subseq m > k)) (str "> " k)) + (is (= (subseq oracle <= k) (subseq m <= k)) (str "<= " k)) + (is (= (subseq oracle < k) (subseq m < k)) (str "< " k)))) + (testing "two-bound subseq" + (is (= (subseq oracle >= 10 <= 30) (subseq m >= 10 <= 30))) + (is (= (subseq oracle > 10 < 30) (subseq m > 10 < 30))) + (is (= (subseq oracle >= 11 <= 29) (subseq m >= 11 <= 29)))))))) + +(deftest rseq-and-rsubseq-match-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 0 40 2)) (range)))] + (reset! db oracle) + (let [m @db] + (testing "rseq is the full descending sequence" + (is (= (rseq oracle) (rseq m)))) + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound rsubseq at " k) + (is (= (rsubseq oracle >= k) (rsubseq m >= k)) (str ">= " k)) + (is (= (rsubseq oracle > k) (rsubseq m > k)) (str "> " k)) + (is (= (rsubseq oracle <= k) (rsubseq m <= k)) (str "<= " k)) + (is (= (rsubseq oracle < k) (rsubseq m < k)) (str "< " k)))) + (testing "two-bound rsubseq" + (is (= (rsubseq oracle >= 10 <= 30) (rsubseq m >= 10 <= 30))) + (is (= (rsubseq oracle > 10 < 30) (rsubseq m > 10 < 30))) + (is (= (rsubseq oracle >= 11 <= 29) (rsubseq m >= 11 <= 29)))))))) + +(deftest empty-sorted-map-range-queries + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map)) + (let [m @db] + (testing "range queries on an empty (none-cursor) sorted map yield nothing" + (is (nil? (seq m))) + (is (nil? (rseq m))) + (is (empty? (subseq m >= 5))) + (is (empty? (subseq m < 5))) + (is (empty? (rsubseq m >= 5))) + (is (empty? 
(rsubseq m <= 5))) + (is (= ::nf (nth m 0 ::nf))))))) + (deftest numeric-keys-iterate-numerically (testing "long keys iterate in numeric, not lexical, order" (with-open [db (xdb/xit-db :memory)] From 32b96a8d2dec970c54fc1bd0dfa4ffef4e501e84 Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 18:19:35 +0200 Subject: [PATCH 04/15] Implement on-disk sorted set (Issue 4) Persist clojure.lang.PersistentTreeSet as a SORTED_SET (a SortedMap with no values, member-as-key) and expose it as a fully ordered Clojure set. - conversion/v->slot!: detect PersistentTreeSet before the generic set? branch; reject non-default comparators. Generalise default-sorted-comparator? to any clojure.lang.Sorted. - sorted-operations: sset-* helpers (count/contains/assoc/disj/empty/seq/ seq-from/nth/rank/rseq), decoding members from each entry's key cursor. - xitdb.sorted-set: XITDBSortedSet (read) implementing IPersistentSet, Counted, Seqable, IFn, Iterable plus Sorted/Indexed/Reversible (subseq/rsubseq/rseq/ nth/sorted?); XITDBWriteSortedSet (write) with mutating conj/disj/empty. - read-from-cursor: SORTED_SET tag dispatch. Members of string/keyword/long/double/Instant/Date all iterate in natural order; ordered ops verified entry-for-entry against a plain sorted-set oracle. 
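A plain `java.util.TreeMap` can illustrate the "member as key" layout. Its keys are encoded bytes compared with `Arrays/compareUnsigned`, and its values are unused. This sketch rests on two assumptions. First, it handles only longs. Second, its encoding (flip the sign bit, then write big-endian) is a common order-preserving trick, not necessarily the codec in `xitdb.util.sorted-key`. The names `encode-long`, `decode-long`, `tree-set-of` and `members` are hypothetical.

```clojure
(ns example.member-as-key
  (:import [java.nio ByteBuffer]
           [java.util Arrays Comparator TreeMap]))

;; Hypothetical order-preserving encoding for longs only: flipping the sign
;; bit and writing big-endian makes unsigned byte order match numeric order.
(defn encode-long ^bytes [n]
  (.array (.putLong (ByteBuffer/allocate 8) (bit-xor (long n) Long/MIN_VALUE))))

(defn decode-long [^bytes ba]
  (bit-xor (.getLong (ByteBuffer/wrap ba)) Long/MIN_VALUE))

;; Same ordering rule as the engine: unsigned lexicographic byte comparison.
(def unsigned-byte-order
  (reify Comparator
    (compare [_ a b]
      (Arrays/compareUnsigned ^bytes a ^bytes b))))

;; A "sorted set" as a sorted map whose keys are the encoded members and
;; whose values are unused (nil). Equal encodings collapse to one key.
(defn tree-set-of [members]
  (let [m (TreeMap. ^Comparator unsigned-byte-order)]
    (doseq [x members]
      (.put m (encode-long x) nil))
    m))

;; Decode members back from the keys, in ascending key order.
(defn members [^TreeMap m]
  (map decode-long (.keySet m)))
```

Iterating `(members (tree-set-of [3 -5 0]))` decodes each key in unsigned byte order, which for this encoding is ascending numeric order.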
Co-Authored-By: Claude Opus 4.8 --- src/xitdb/sorted_set.clj | 211 +++++++++++++++++++++++++++ src/xitdb/util/conversion.clj | 34 ++++- src/xitdb/util/sorted_operations.clj | 98 ++++++++++++- src/xitdb/xitdb_types.clj | 6 + test/xitdb/sorted_set_test.clj | 211 +++++++++++++++++++++++++++ 5 files changed, 553 insertions(+), 7 deletions(-) create mode 100644 src/xitdb/sorted_set.clj create mode 100644 test/xitdb/sorted_set_test.clj diff --git a/src/xitdb/sorted_set.clj b/src/xitdb/sorted_set.clj new file mode 100644 index 0000000..62eaf52 --- /dev/null +++ b/src/xitdb/sorted_set.clj @@ -0,0 +1,211 @@ +(ns xitdb.sorted-set + "On-disk sorted set wrapper types, modelled on `xitdb.hash-set` (set shape) and + `xitdb.sorted-map` (the `Sorted`/`Indexed`/`Reversible` machinery). + + A `SORTED_SET` is a `SORTED_MAP` with no values: each member is its own key. + `XITDBSortedSet` is the read view; `XITDBWriteSortedSet` is the mutable view + used inside a transaction. Ordering is by the engine's unsigned byte + comparison over order-preserving encoded members (see `xitdb.util.sorted-key`)." + (:require + [xitdb.common :as common] + [xitdb.util.sorted-key :as sorted-key] + [xitdb.util.sorted-operations :as sorted-ops]) + (:import + [io.github.radarroark.xitdb + ReadCursor ReadSortedSet WriteCursor WriteSortedSet])) + +(defn- descending-start-index + "Index to begin a descending walk for `seqFrom(member, false)`: the largest + rank whose member is <= `member`. Uses `rank` (count of members strictly < + member); if `member` itself is present, include it." + [^ReadSortedSet rss member] + (let [r (sorted-ops/sset-rank rss member)] + (if (sorted-ops/sset-contains? rss member) + r + (dec r)))) + +(deftype XITDBSortedSet [^ReadSortedSet rss] + + clojure.lang.IPersistentSet + (disjoin [this k] + (disj (common/-materialize-shallow this) k)) + + (contains [this k] + (sorted-ops/sset-contains? 
rss k)) + + (get [this k] + (when (.contains this k) + k)) + + clojure.lang.IPersistentCollection clojure.lang.Counted + (cons [this o] + (. clojure.lang.RT (conj (common/-materialize-shallow this) o))) + + (empty [this] + (sorted-set)) + + (equiv [this other] + (and (instance? clojure.lang.IPersistentSet other) + (= (count this) (count other)) + (every? #(.contains this %) other))) + + (count [_] + (sorted-ops/sset-item-count rss)) + + clojure.lang.Seqable + (seq [_] + (sorted-ops/sset-seq rss)) + + clojure.lang.Sorted + (comparator [_] + sorted-key/key-comparator) + + (entryKey [_ entry] + entry) + + (seq [_ ascending?] + (if ascending? + (sorted-ops/sset-seq rss) + (sorted-ops/sset-rseq rss))) + + (seqFrom [_ member ascending?] + (if ascending? + (sorted-ops/sset-seq-from rss member) + (sorted-ops/sset-rseq rss (descending-start-index rss member)))) + + clojure.lang.Indexed + (nth [this i] + (let [e (.nth this i ::not-found)] + (if (identical? e ::not-found) + (throw (IndexOutOfBoundsException.)) + e))) + + (nth [_ i not-found] + (sorted-ops/sset-nth rss i not-found)) + + clojure.lang.Reversible + (rseq [_] + (sorted-ops/sset-rseq rss)) + + clojure.lang.ILookup + (valAt [this k] + (.valAt this k nil)) + + (valAt [this k not-found] + (if (.contains this k) + k + not-found)) + + clojure.lang.IFn + (invoke [this k] + (.valAt this k)) + + (invoke [this k not-found] + (.valAt this k not-found)) + + java.lang.Iterable + (iterator [this] + (let [iter (clojure.lang.SeqIterator. (seq this))] + (reify java.util.Iterator + (hasNext [_] + (.hasNext iter)) + (next [_] + (.next iter)) + (remove [_] + (throw (UnsupportedOperationException.
"XITDBSortedSet iterator is read-only")))))) + + common/ISlot + (-slot [this] + (-> rss .cursor .slot)) + + common/IUnwrap + (-unwrap [this] + rss) + + common/IMaterialize + (-materialize [this] + (into (sorted-set) (map common/materialize (seq this)))) + + common/IMaterializeShallow + (-materialize-shallow [this] + (into (sorted-set) (seq this))) + + Object + (toString [this] + (str (into (sorted-set) this)))) + +(defmethod print-method XITDBSortedSet [o ^java.io.Writer w] + (.write w "#XITDBSortedSet") + (print-method (into (sorted-set) o) w)) + +;--------------------------------------------------- + +(deftype XITDBWriteSortedSet [^WriteSortedSet wss] + clojure.lang.IPersistentSet + (disjoin [this v] + (sorted-ops/sset-disj-value! wss (common/unwrap v)) + this) + + (contains [this v] + (sorted-ops/sset-contains? wss (common/unwrap v))) + + (get [this k] + (when (.contains this (common/unwrap k)) + k)) + + clojure.lang.IPersistentCollection + (cons [this o] + (sorted-ops/sset-assoc-value! wss (common/unwrap o)) + this) + + (empty [this] + (sorted-ops/sset-empty! wss) + this) + + (equiv [this other] + (and (instance? clojure.lang.IPersistentSet other) + (= (count this) (count other)) + (every? #(.contains this %) other))) + + (count [_] + (sorted-ops/sset-item-count wss)) + + clojure.lang.Seqable + (seq [_] + (sorted-ops/sset-seq wss)) + + clojure.lang.ILookup + (valAt [this k] + (.valAt this k nil)) + + (valAt [this k not-found] + (if (.contains this k) + k + not-found)) + + common/ISlot + (-slot [_] + (-> wss .cursor .slot)) + + common/IUnwrap + (-unwrap [_] + wss) + + common/IReadOnly + (-read-only [this] + (XITDBSortedSet. 
wss)) + + Object + (toString [_] + (str "XITDBWriteSortedSet"))) + +(defmethod print-method XITDBWriteSortedSet [o ^java.io.Writer w] + (.write w "#XITDBWriteSortedSet") + (print-method (into (sorted-set) (common/-read-only o)) w)) + +;; Constructor functions +(defn xwrite-sorted-set [^WriteCursor write-cursor] + (->XITDBWriteSortedSet (WriteSortedSet. write-cursor))) + +(defn xsorted-set [^ReadCursor read-cursor] + (->XITDBSortedSet (ReadSortedSet. read-cursor))) diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj index e16843b..ccc41f8 100644 --- a/src/xitdb/util/conversion.clj +++ b/src/xitdb/util/conversion.clj @@ -3,11 +3,11 @@ [xitdb.util.sorted-key :as sorted-key] [xitdb.util.validation :as validation]) (:import - [clojure.lang PersistentTreeMap] + [clojure.lang PersistentTreeMap PersistentTreeSet] [io.github.radarroark.xitdb Database Database$Bytes Database$Float Database$Int ReadCursor Slot Slotted Tag WriteArrayList WriteCountedHashMap WriteCountedHashSet WriteCursor - WriteHashMap WriteHashSet WriteLinkedArrayList WriteSortedMap] + WriteHashMap WriteHashSet WriteLinkedArrayList WriteSortedMap WriteSortedSet] [java.io OutputStream OutputStreamWriter] [java.security DigestOutputStream])) @@ -136,12 +136,14 @@ (declare ^WriteCursor list->LinkedArrayListCursor!) (declare ^WriteCursor set->WriteCursor!) (declare ^WriteCursor sorted-map->WriteSortedMapCursor!) +(declare ^WriteCursor sorted-set->WriteSortedSetCursor!) (defn default-sorted-comparator? - "True if `tm` uses Clojure's natural ordering (no custom comparator). - Custom comparators cannot be honoured by the engine's fixed byte ordering." - [^PersistentTreeMap tm] - (identical? clojure.lang.RT/DEFAULT_COMPARATOR (.comparator tm))) + "True if `coll` (a PersistentTreeMap or PersistentTreeSet) uses Clojure's + natural ordering (no custom comparator). Custom comparators cannot be honoured + by the engine's fixed byte ordering." + [coll] + (identical? 
clojure.lang.RT/DEFAULT_COMPARATOR (.comparator ^clojure.lang.Sorted coll))) (defn ^Slot v->slot! "Converts a value to a XitDB slot. @@ -177,6 +179,16 @@ (.write cursor nil) (.slot (list->LinkedArrayListCursor! cursor v))) + ;; A sorted set is also `set?`, so it MUST be checked before the generic + ;; hash-set branch or it would be shadowed and stored as a hash set. + (instance? PersistentTreeSet v) + (do + (when-not (default-sorted-comparator? v) + (throw (IllegalArgumentException. + "sorted-set-by with a custom comparator is not supported; only natural ordering is allowed."))) + (.write cursor nil) + (.slot (sorted-set->WriteSortedSetCursor! cursor v))) + (set? v) (do (.write cursor nil) @@ -280,6 +292,16 @@ (.write value-cursor (v->slot! value-cursor v)))) (.-cursor wsm))) +(defn ^WriteCursor sorted-set->WriteSortedSetCursor! + "Writes a Clojure sorted set `s` to a XitDB WriteSortedSet. + Members are encoded with the order-preserving codec. Returns the cursor of the + created WriteSortedSet." + [^WriteCursor cursor s] + (let [wss (WriteSortedSet. cursor)] + (doseq [member s] + (.put wss (sorted-key/encode-key member))) + (.-cursor wss))) + (defn ^WriteCursor set->WriteCursor! "Writes a Clojure set `s` to a XitDB WriteHashSet. Returns the cursor of the created WriteHashSet." diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj index d52cf41..0c9c792 100644 --- a/src/xitdb/util/sorted_operations.clj +++ b/src/xitdb/util/sorted_operations.clj @@ -6,7 +6,7 @@ [xitdb.util.conversion :as conversion] [xitdb.util.sorted-key :as sorted-key]) (:import - [io.github.radarroark.xitdb ReadCursor WriteCursor ReadSortedMap WriteSortedMap])) + [io.github.radarroark.xitdb ReadCursor WriteCursor ReadSortedMap WriteSortedMap ReadSortedSet WriteSortedSet])) (defn smap-item-count "O(1) entry count, delegating to the rank-augmented B-tree." 
@@ -113,6 +113,102 @@ (cons (kvpair->entry kv read-from-cursor) (step (dec i))))))))] (step start))))) +;; --------------------------------------------------------------------------- +;; Sorted SET helpers. A SORTED_SET is a SortedMap with no values: the MEMBER +;; is the key. We decode the member from the key cursor of each entry. +;; --------------------------------------------------------------------------- + +(defn sset-item-count + "O(1) member count." + [^ReadSortedSet rss] + (.count rss)) + +(defn sset-contains? + [^ReadSortedSet rss member] + (.contains rss (sorted-key/encode-key member))) + +(defn sset-assoc-value! + "Adds `member` to the set (no-op if already present). Returns the WriteSortedSet." + [^WriteSortedSet wss member] + (.put wss (sorted-key/encode-key member)) + wss) + +(defn sset-disj-value! + "Removes `member` from the set (no-op if absent). Returns the WriteSortedSet." + [^WriteSortedSet wss member] + (.remove wss (sorted-key/encode-key member)) + wss) + +(defn sset-empty! + "Replaces contents with an empty sorted set, in place." + [^WriteSortedSet wss] + (let [^WriteCursor cursor (.-cursor wss)] + (.write cursor nil) + (WriteSortedSet. cursor)) + wss) + +(defn- member-from-cursor + "Decodes the member from a set entry cursor (its key cursor)." + [cursor] + (decode-key-cursor (.-keyCursor (.readKeyValuePair cursor)))) + +(defn- kvpair->member + "Decodes the member from a Java KeyValuePair (its key cursor)." + [kv] + (decode-key-cursor (.-keyCursor kv))) + +(defn sset-seq + "Lazy ascending seq of members, or nil if empty." + [^ReadSortedSet rss] + (let [it (.iterator rss)] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + +(defn sset-seq-from + "Lazy ascending seq of members starting at the first member >= `member`, + using the engine's native O(log n) lower-bound seek. nil if none." 
+ [^ReadSortedSet rss member] + (let [it (.iteratorFrom rss (sorted-key/encode-key member))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + +(defn sset-nth + "Member at rank `index` (negative counts from the end), or `not-found` when + out of range. O(log n)." + [^ReadSortedSet rss index not-found] + (let [kv (.getIndexKeyValuePair rss (long index))] + (if (nil? kv) + not-found + (kvpair->member kv)))) + +(defn sset-rank + "Number of members strictly less than `member`. O(log n)." + [^ReadSortedSet rss member] + (.rank rss (sorted-key/encode-key member))) + +(defn sset-rseq + "Lazy descending seq of members, walking `getIndexKeyValuePair` from index + `start` down to 0. Low-memory (one member at a time)." + ([^ReadSortedSet rss] + (sset-rseq rss (dec (.count rss)))) + ([^ReadSortedSet rss start] + (when (>= start 0) + (letfn [(step [i] + (lazy-seq + (when (>= i 0) + (let [kv (.getIndexKeyValuePair rss (long i))] + (when kv + (cons (kvpair->member kv) (step (dec i))))))))] + (step start))))) + (defn smap-kv-reduce [^ReadSortedMap rsm read-from-cursor f init] (let [it (.iterator rsm)] diff --git a/src/xitdb/xitdb_types.clj b/src/xitdb/xitdb_types.clj index 6be9d2d..b011b11 100644 --- a/src/xitdb/xitdb_types.clj +++ b/src/xitdb/xitdb_types.clj @@ -6,6 +6,7 @@ [xitdb.hash-set :as xhash-set] [xitdb.linked-list :as xlinked-list] [xitdb.sorted-map :as xsorted-map] + [xitdb.sorted-set :as xsorted-set] [xitdb.util.conversion :as conversion]) (:import [io.github.radarroark.xitdb ReadCursor Slot Tag WriteCursor])) @@ -57,6 +58,11 @@ (xsorted-map/xwrite-sorted-map cursor) (xsorted-map/xsorted-map cursor)) + (= value-tag Tag/SORTED_SET) + (if for-writing? + (xsorted-set/xwrite-sorted-set cursor) + (xsorted-set/xsorted-set cursor)) + (= value-tag Tag/ARRAY_LIST) (if for-writing? 
(xarray-list/xwrite-array-list cursor) diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj new file mode 100644 index 0000000..b80004b --- /dev/null +++ b/test/xitdb/sorted_set_test.clj @@ -0,0 +1,211 @@ +(ns xitdb.sorted-set-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant] + [java.util Date])) + +(deftest tracer-bullet-ordered-seq + (testing "a persisted sorted-set is stored as a sorted set and seqs in order" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (= [1 2 3] (seq @db)))))) + +(deftest membership-and-count + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (let [s @db] + (testing "contains? / get / invoke" + (is (true? (contains? s 2))) + (is (false? (contains? s 9))) + (is (= 2 (get s 2))) + (is (nil? (get s 9))) + (is (= 3 (s 3))) + (is (nil? (s 9)))) + (testing "count is correct and O(1)" + (is (= 3 (count s))))))) + +(deftest mutation-keeps-order + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1)) + (testing "conj inserts in order" + (swap! db conj 5) + (swap! db conj 2) + (is (= [1 2 3 5] (seq @db)))) + (testing "disj removes and preserves order" + (swap! db disj 3) + (is (= [1 2 5] (seq @db)))) + (testing "conj of a duplicate is a no-op and does not change count" + (swap! db conj 2) + (is (= 3 (count @db))) + (is (= [1 2 5] (seq @db)))))) + +(deftest materialize-returns-plain-sorted-set + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (let [s (tu/materialize @db)] + (is (sorted? s)) + (is (not (instance? xitdb.sorted_set.XITDBSortedSet s))) + (is (= [1 2 3] (seq s))) + (is (= #{1 2 3} s))))) + +(deftest read-only-ops-return-plain-collections + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-set 1 2)) + (let [s @db] + (testing "conj outside a transaction returns a plain sorted set" + (let [r (conj s 3)] + (is (not (instance? xitdb.sorted_set.XITDBSortedSet r))) + (is (sorted? r)) + (is (= [1 2 3] (seq r))))) + (testing "disj outside a transaction returns a plain sorted set" + (let [r (disj s 1)] + (is (not (instance? xitdb.sorted_set.XITDBSortedSet r))) + (is (sorted? r)) + (is (= [2] (seq r)))))))) + +(deftest sorted-predicate-and-comparator + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (testing "sorted? is true for a persisted sorted set" + (is (sorted? @db))) + (testing "comparator is consistent with iteration order" + (let [^java.util.Comparator c (.comparator ^clojure.lang.Sorted @db)] + (is (neg? (.compare c 1 2))) + (is (pos? (.compare c 2 1))) + (is (zero? (.compare c 2 2))) + ;; cross-type bound checks must agree with the engine (not core/compare) + (is (neg? (.compare c 5 "x"))))))) + +(deftest nth-indexed + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 20)))] + (reset! db oracle) + (let [s @db + ov (vec oracle)] + (testing "nth by positive index matches the oracle's member at that rank" + (doseq [i (range 20)] + (is (= (nth ov i) (nth s i)) (str "nth " i)))) + (testing "negative index counts from the end (-1 = last)" + (is (= (last ov) (nth s -1))) + (is (= (nth ov 18) (nth s -2)))) + (testing "out-of-range nth/2 returns not-found" + (is (= ::nf (nth s 100 ::nf))) + (is (= ::nf (nth s -100 ::nf)))) + (testing "out-of-range nth/1 throws like a vector" + (is (thrown? IndexOutOfBoundsException (nth s 100)))))))) + +(deftest subseq-matches-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 0 40 2)))] + (reset! 
db oracle) + (let [s @db] + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound subseq at " k) + (is (= (subseq oracle >= k) (subseq s >= k)) (str ">= " k)) + (is (= (subseq oracle > k) (subseq s > k)) (str "> " k)) + (is (= (subseq oracle <= k) (subseq s <= k)) (str "<= " k)) + (is (= (subseq oracle < k) (subseq s < k)) (str "< " k)))) + (testing "two-bound subseq" + (is (= (subseq oracle >= 10 <= 30) (subseq s >= 10 <= 30))) + (is (= (subseq oracle > 10 < 30) (subseq s > 10 < 30))) + (is (= (subseq oracle >= 11 <= 29) (subseq s >= 11 <= 29)))))))) + +(deftest rseq-and-rsubseq-match-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 0 40 2)))] + (reset! db oracle) + (let [s @db] + (testing "rseq is the full descending sequence" + (is (= (rseq oracle) (rseq s)))) + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound rsubseq at " k) + (is (= (rsubseq oracle >= k) (rsubseq s >= k)) (str ">= " k)) + (is (= (rsubseq oracle > k) (rsubseq s > k)) (str "> " k)) + (is (= (rsubseq oracle <= k) (rsubseq s <= k)) (str "<= " k)) + (is (= (rsubseq oracle < k) (rsubseq s < k)) (str "< " k)))) + (testing "two-bound rsubseq" + (is (= (rsubseq oracle >= 10 <= 30) (rsubseq s >= 10 <= 30))) + (is (= (rsubseq oracle > 10 < 30) (rsubseq s > 10 < 30))) + (is (= (rsubseq oracle >= 11 <= 29) (rsubseq s >= 11 <= 29)))))))) + +(deftest empty-sorted-set-range-queries + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set)) + (let [s @db] + (testing "range queries on an empty (none-cursor) sorted set yield nothing" + (is (nil? (seq s))) + (is (nil? (rseq s))) + (is (empty? (subseq s >= 5))) + (is (empty? (subseq s < 5))) + (is (empty? (rsubseq s >= 5))) + (is (empty? (rsubseq s <= 5))) + (is (= ::nf (nth s 0 ::nf))))))) + +(deftest member-types-iterate-in-natural-order + (testing "string members iterate lexicographically" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-set "banana" "apple" "cherry")) + (is (= ["apple" "banana" "cherry"] (seq @db))) + (is (every? string? (seq @db))))) + (testing "keyword members iterate in natural order and round-trip" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set :c :a :b)) + (is (= [:a :b :c] (seq @db))) + (is (every? keyword? (seq @db))))) + (testing "long members iterate numerically, incl. extremes" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) [3 -5 0 Long/MIN_VALUE Long/MAX_VALUE])) + (is (= [Long/MIN_VALUE -5 0 3 Long/MAX_VALUE] (seq @db))))) + (testing "double members iterate numerically, incl. negatives and zero" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3.5 -1.5 0.0 1.0e308 -1.0e308)) + (is (= [-1.0e308 -1.5 0.0 3.5 1.0e308] (seq @db))))) + (testing "Instant members iterate chronologically and round-trip to Instant" + (with-open [db (xdb/xit-db :memory)] + (let [t0 (Instant/ofEpochSecond 100) + t1 (Instant/ofEpochSecond 200 500) + t2 (Instant/ofEpochSecond 200 999)] + (reset! db (sorted-set t2 t0 t1)) + (is (= [t0 t1 t2] (seq @db))) + (is (every? #(instance? Instant %) (seq @db)))))) + (testing "Date members iterate chronologically and round-trip to Date" + (with-open [db (xdb/xit-db :memory)] + (let [d0 (Date. 0) d1 (Date. 1000) d2 (Date. 2000)] + (reset! db (sorted-set d2 d0 d1)) + (is (= [d0 d1 d2] (seq @db))) + (is (every? #(instance? Date %) (seq @db))))))) + +(deftest custom-comparator-rejected + (with-open [db (xdb/xit-db :memory)] + (is (thrown? IllegalArgumentException + (reset! db (sorted-set-by > 1 2 3)))))) + +(deftest print-method-ordered + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (let [s (pr-str @db)] + (is (clojure.string/starts-with? s "#XITDBSortedSet")) + (is (clojure.string/includes? s "1 2 3"))))) + +(deftest nesting-and-round-trip + (testing "sorted set nests inside a hash map value" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db {:idx (sorted-set 3 1 2)}) + (is (instance? xitdb.sorted_set.XITDBSortedSet (:idx @db))) + (is (= [1 2 3] (seq (:idx @db)))))) + (testing "nested sorted set round-trips against an in-memory atom" + (with-db [db (tu/test-db)] + (reset! db {:idx (sorted-set 3 1 2)}) + (is (tu/db-equal-to-atom? db)))) + (testing "empty clears the set in place" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (swap! db empty) + (is (= 0 (count @db))) + (is (empty? (seq @db))) + (swap! db conj 7) + (is (= [7] (seq @db)))))) From 5c6f0ed373917f8d46d52cf87272a240ec268d3d Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 18:22:40 +0200 Subject: [PATCH 05/15] Add rank + pagination public helpers (Issue 5) New public namespace xitdb.sorted exposing the rank-augmented B-tree's superpowers over both XITDBSortedMap and XITDBSortedSet: - (rank coll k) - O(log n) count of entries strictly less than k; the index of a present key/member, or the would-be insertion index of an absent one. Inverse of nth. - (from-index coll n) - lazy ordered seq starting at rank n, backed by the engine's iteratorFromIndex (O(log n) seek + streaming walk); does not materialise the whole collection. - (page coll offset limit) - lazy ordered page, stops cleanly at the end. For a sorted map elements are MapEntry pairs; for a sorted set, members. The public ns is a thin dispatch over common/-unwrap; the streaming work lives in sorted-operations (smap-seq-from-index / sset-seq-from-index). Documented and tested as a timestamp->id secondary index that is paged chronologically. rank/nth verified as inverses; pagination verified lazy against a 2000-entry collection. 
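For reviewers, the intended semantics of `rank` and `page` can be sketched against a plain in-memory `sorted-map` using only `clojure.core`. The names `rank-oracle` and `page-oracle` are hypothetical and exist only for illustration. They are O(n) reference oracles, of the kind the tests compare against, and not the O(log n) on-disk implementation in xitdb.sorted.

```clojure
;; Hypothetical O(n) reference oracles over an in-memory sorted collection.
;; They mirror the documented semantics of xitdb.sorted/rank and
;; xitdb.sorted/page; they are not the on-disk implementation.
(defn rank-oracle
  "Count of entries in sorted collection `sc` whose key/member is strictly
  less than `k`."
  [sc k]
  (count (subseq sc < k)))

(defn page-oracle
  "At most `limit` elements of `sc`, starting at 0-based rank `offset`."
  [sc offset limit]
  (take limit (drop offset (seq sc))))

;; Same shape as the test fixture: keys 0, 2, ..., 38 mapped to 0..19.
(def m (into (sorted-map) (map vector (range 0 40 2) (range))))

(rank-oracle m 10)   ;; => 5  (present key: its index)
(rank-oracle m 11)   ;; => 6  (absent key: would-be insertion index)
(page-oracle m 18 5) ;; => ([36 18] [38 19]) (stops cleanly at the end)
```

Because `rank` counts keys strictly less than `k`, an absent key yields its insertion index, and for a present key `rank` is exactly the inverse of `nth`.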
Co-Authored-By: Claude Opus 4.8 --- src/xitdb/sorted.clj | 70 +++++++++++++++ src/xitdb/util/sorted_operations.clj | 28 ++++++ test/xitdb/sorted_pagination_test.clj | 123 ++++++++++++++++++++++++++ 3 files changed, 221 insertions(+) create mode 100644 src/xitdb/sorted.clj create mode 100644 test/xitdb/sorted_pagination_test.clj diff --git a/src/xitdb/sorted.clj b/src/xitdb/sorted.clj new file mode 100644 index 0000000..5a3ee19 --- /dev/null +++ b/src/xitdb/sorted.clj @@ -0,0 +1,70 @@ +(ns xitdb.sorted + "Public helpers for on-disk sorted collections (`XITDBSortedMap` / + `XITDBSortedSet`) that go beyond `clojure.core`'s in-memory sorted + collections, exposing the rank-augmented B-tree's superpowers: + + - `rank` - O(log n) index of a key/member (inverse of indexed `nth`). + - `page` - lazy ordered page starting at a rank (offset/limit). + - `from-index` - lazy ordered seq starting at a rank. + + These are the recommended way to build and paginate on-disk secondary indexes. + For example, a timestamp -> id index over events: + + (reset! db (sorted-map)) + (doseq [e events] + (swap! db assoc (:ts e) (:id e))) + + ;; serve the page [offset, offset+limit) in chronological order, + ;; reading only that page from disk: + (xsorted/page @db offset limit) + + ;; or, starting from a known timestamp boundary: + (->> (subseq @db >= start-ts) + (take limit)) + + Both `rank` and the pagination helpers work on `XITDBSortedMap` (yielding + `MapEntry` pairs) and `XITDBSortedSet` (yielding members)." + (:require + [xitdb.common :as common] + [xitdb.util.sorted-operations :as sorted-ops]) + (:import + [io.github.radarroark.xitdb ReadSortedMap ReadSortedSet])) + +(defn rank + "Number of entries in the sorted collection `coll` strictly less than `k`, + in O(log n). For a present key/member this is its index (the inverse of + `nth`); for an absent one it is the would-be insertion index. Works on both + `XITDBSortedMap` and `XITDBSortedSet`." 
+ [coll k] + (let [u (common/-unwrap coll)] + (cond + (instance? ReadSortedMap u) (sorted-ops/smap-rank u k) + (instance? ReadSortedSet u) (sorted-ops/sset-rank u k) + :else (throw (IllegalArgumentException. + (str "rank requires an XITDBSortedMap or XITDBSortedSet, got: " + (type coll))))))) + +(defn from-index + "Lazy ordered seq of the sorted collection `coll` starting at rank `n` + (0-based), backed by the engine's `iteratorFromIndex` (O(log n) seek, then a + streaming walk). Does not materialise the whole collection. For a sorted map + the elements are `MapEntry` pairs; for a sorted set they are members. Returns + nil when `n` is at or past the end." + [coll n] + (let [u (common/-unwrap coll)] + (cond + (instance? ReadSortedMap u) + (sorted-ops/smap-seq-from-index u common/-read-from-cursor n) + (instance? ReadSortedSet u) + (sorted-ops/sset-seq-from-index u n) + :else (throw (IllegalArgumentException. + (str "from-index requires an XITDBSortedMap or XITDBSortedSet, got: " + (type coll))))))) + +(defn page + "Lazy ordered page of `coll`: at most `limit` elements starting at rank + `offset`. Equivalent to `(take limit (from-index coll offset))`, and stops + cleanly at the end of the collection. Lazy and low-memory. For a sorted map + the elements are `MapEntry` pairs; for a sorted set they are members." + [coll offset limit] + (take limit (from-index coll offset))) diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj index 0c9c792..7084f5d 100644 --- a/src/xitdb/util/sorted_operations.clj +++ b/src/xitdb/util/sorted_operations.clj @@ -98,6 +98,21 @@ [^ReadSortedMap rsm key] (.rank rsm (sorted-key/encode-key key))) +(defn smap-seq-from-index + "Lazy ascending seq of MapEntry pairs starting at rank `index` (0-based), + using the engine's native O(log n) `iteratorFromIndex` seek. nil if none. + Streams one entry at a time; does not materialise the whole collection." 
+ [^ReadSortedMap rsm read-from-cursor index] + (let [it (.iteratorFromIndex rsm (long index))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (kvpair->entry (.readKeyValuePair (.next it)) + read-from-cursor) + (step)))))] + (step))))) + (defn smap-rseq "Lazy descending seq of MapEntry pairs, walking `getIndexKeyValuePair` from index `start` down to 0. Stays low-memory (one entry materialised at a time)." @@ -194,6 +209,19 @@ [^ReadSortedSet rss member] (.rank rss (sorted-key/encode-key member))) +(defn sset-seq-from-index + "Lazy ascending seq of members starting at rank `index` (0-based), using the + engine's native O(log n) `iteratorFromIndex` seek. nil if none. Streams one + member at a time; does not materialise the whole collection." + [^ReadSortedSet rss index] + (let [it (.iteratorFromIndex rss (long index))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + (defn sset-rseq "Lazy descending seq of members, walking `getIndexKeyValuePair` from index `start` down to 0. Low-memory (one member at a time)." diff --git a/test/xitdb/sorted_pagination_test.clj b/test/xitdb/sorted_pagination_test.clj new file mode 100644 index 0000000..5d050e6 --- /dev/null +++ b/test/xitdb/sorted_pagination_test.clj @@ -0,0 +1,123 @@ +(ns xitdb.sorted-pagination-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.sorted :as xsorted] + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant])) + +(deftest rank-on-sorted-map + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (range 0 40 2) (range)))] + (reset! 
db oracle) + (let [m @db] + (testing "rank of a present key is its index" + (doseq [[i k] (map-indexed vector (keys oracle))] + (is (= i (xsorted/rank m k)) (str "rank of present " k)))) + (testing "rank of an absent key is its would-be insertion index" + (is (= 0 (xsorted/rank m -1))) + (is (= 1 (xsorted/rank m 1))) + (is (= 20 (xsorted/rank m 100)))))))) + +(deftest rank-on-sorted-set + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (range 0 40 2))] + (reset! db oracle) + (let [s @db] + (testing "rank of a present member is its index" + (doseq [[i k] (map-indexed vector (seq oracle))] + (is (= i (xsorted/rank s k)) (str "rank of present " k)))) + (testing "rank of an absent member is its would-be insertion index" + (is (= 0 (xsorted/rank s -1))) + (is (= 1 (xsorted/rank s 1))) + (is (= 20 (xsorted/rank s 100)))))))) + +(deftest rank-and-nth-are-inverses + (testing "on a sorted map: (= i (rank m (key (nth m i))))" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (shuffle (range 30)) (range)))) + (let [m @db] + (doseq [i (range (count m))] + (is (= i (xsorted/rank m (key (nth m i)))) (str "i=" i)))))) + (testing "on a sorted set: (= i (rank s (nth s i)))" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (shuffle (range 30)))) + (let [s @db] + (doseq [i (range (count s))] + (is (= i (xsorted/rank s (nth s i))) (str "i=" i))))))) + +(deftest pagination-on-map + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (range 0 40 2) (range)))] + (reset! 
db oracle) + (let [m @db + ov (vec oracle)] + (testing "page returns the correct ordered window" + (is (= (subvec ov 5 10) (xsorted/page m 5 5))) + (is (= (take 3 ov) (xsorted/page m 0 3)))) + (testing "page stops cleanly at the end of the collection" + (is (= (subvec ov 18 20) (xsorted/page m 18 5))) + (is (= 2 (count (xsorted/page m 18 100))))) + (testing "from-index streams from a rank to the end" + (is (= (subvec ov 17 20) (xsorted/from-index m 17)))))))) + +(deftest pagination-on-set + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (range 0 40 2))] + (reset! db oracle) + (let [s @db + ov (vec oracle)] + (testing "page returns the correct ordered window of members" + (is (= (subvec ov 5 10) (xsorted/page s 5 5))) + (is (= (take 3 ov) (xsorted/page s 0 3)))) + (testing "page stops cleanly at the end of the collection" + (is (= (subvec ov 18 20) (xsorted/page s 18 5))) + (is (= 2 (count (xsorted/page s 18 100))))) + (testing "from-index streams from a rank to the end" + (is (= (subvec ov 17 20) (xsorted/from-index s 17)))))))) + +(deftest pagination-is-lazy + (testing "from-index returns a lazy seq and does not realise a large collection" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 2000) (range)))) + (let [m @db + p (xsorted/page m 0 5)] + (is (instance? clojure.lang.LazySeq (xsorted/from-index m 0))) + (is (= 5 (count p))) + (is (= (map vector (range 5) (range 5)) + (map (juxt key val) p)))))) + (testing "from-index on a set is lazy over a large collection" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 2000))) + (let [s @db] + (is (instance? 
clojure.lang.LazySeq (xsorted/from-index s 0))) + (is (= (range 0 5) (xsorted/page s 0 5))))))) + +(deftest doc-example-timestamp-id-secondary-index + (testing "build a timestamp -> id secondary index and page through it" + (with-open [db (xdb/xit-db :memory)] + ;; Events arrive out of order; index them by their (unique) timestamp. + (let [base (Instant/parse "2024-01-01T00:00:00Z") + events (for [i (shuffle (range 100))] + {:id i :ts (.plusSeconds base i)})] + (reset! db (sorted-map)) + (doseq [e events] + (swap! db assoc (:ts e) (:id e))) + + (testing "rank gives the chronological position of a timestamp" + (is (= 0 (xsorted/rank @db base))) + (is (= 50 (xsorted/rank @db (.plusSeconds base 50))))) + + (testing "page serves a chronological window of [ts id] pairs" + (let [pg (xsorted/page @db 10 5)] + (is (= [(.plusSeconds base 10) + (.plusSeconds base 11) + (.plusSeconds base 12) + (.plusSeconds base 13) + (.plusSeconds base 14)] + (map key pg))) + (is (= [10 11 12 13 14] (map val pg))))) + + (testing "paging to the end stops cleanly" + (is (= 3 (count (xsorted/page @db 97 10))))))))) From 620b2a74fe7987361230885f3c7dc68ad091800f Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 18:25:44 +0200 Subject: [PATCH 06/15] Fix sorted map/set nested directly in a vector/list stored as hash The array-list and linked-list element writers predate sorted-collection support and dispatched a PersistentTreeMap/PersistentTreeSet through their generic map?/set? branches, silently persisting it as a HASH_MAP/HASH_SET. Ordering was then lost (sorted? false, subseq/rank broken); it only looked correct when key hashes happened to iterate in order. Delegate the tree-type cases to v->slot! (which already checks the tree types before the hash branches and rejects custom comparators) in both coll->ArrayListCursor! and list->LinkedArrayListCursor!. 
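A minimal sketch of the dispatch-order problem, using hypothetical `classify-*` helpers rather than the real writers in xitdb.util.conversion: because a PersistentTreeMap also satisfies `map?` (and a PersistentTreeSet satisfies `set?`), a `cond` that tests the generic predicates first never reaches the tree-type branches.

```clojure
;; Hypothetical classifiers, not the real conversion code. They only show
;; why the tree-type checks must come before the generic map?/set? branches.
(defn classify-buggy [v]
  (cond
    (map? v) :hash-map   ; also true for a sorted-map, so it wins
    (set? v) :hash-set   ; also true for a sorted-set
    (instance? clojure.lang.PersistentTreeMap v) :sorted-map ; never reached
    (instance? clojure.lang.PersistentTreeSet v) :sorted-set ; never reached
    :else :other))

(defn classify-fixed [v]
  (cond
    (instance? clojure.lang.PersistentTreeMap v) :sorted-map
    (instance? clojure.lang.PersistentTreeSet v) :sorted-set
    (map? v) :hash-map
    (set? v) :hash-set
    :else :other))

(classify-buggy (sorted-map 3 :c 1 :a)) ;; => :hash-map
(classify-fixed (sorted-map 3 :c 1 :a)) ;; => :sorted-map
(classify-fixed (sorted-set 30 10))     ;; => :sorted-set
(classify-fixed {:x 1})                 ;; => :hash-map
```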
Co-Authored-By: Claude Opus 4.8 --- src/xitdb/util/conversion.clj | 12 ++++++++++++ test/xitdb/sorted_map_test.clj | 15 ++++++++++++++- test/xitdb/sorted_set_test.clj | 7 +++++++ 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj index ccc41f8..d999037 100644 --- a/src/xitdb/util/conversion.clj +++ b/src/xitdb/util/conversion.clj @@ -215,6 +215,12 @@ (let [write-array (WriteArrayList. cursor)] (doseq [v coll] (cond + ;; Sorted map/set are also map?/set?, so delegate to v->slot! (which + ;; checks the tree types first) before the generic hash branches. + (or (instance? PersistentTreeMap v) (instance? PersistentTreeSet v)) + (let [v-cursor (.appendCursor write-array)] + (.write v-cursor (v->slot! v-cursor v))) + (map? v) (let [v-cursor (.appendCursor write-array)] (map->WriteHashMapCursor! v-cursor v)) @@ -245,6 +251,12 @@ (doseq [v coll] (when *debug?* (println "v=" v)) (cond + ;; Sorted map/set are also map?/set?, so delegate to v->slot! (which + ;; checks the tree types first) before the generic hash branches. + (or (instance? PersistentTreeMap v) (instance? PersistentTreeSet v)) + (let [v-cursor (.appendCursor write-list)] + (.write v-cursor (v->slot! v-cursor v))) + (map? v) (let [v-cursor (.appendCursor write-list)] (map->WriteHashMapCursor! v-cursor v)) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index 7c039d4..c0b6a06 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -91,7 +91,20 @@ "set" #{:a :b})) (is (= [1 2 3] (tu/materialize (get @db "vec")))) (is (= {:x 1} (tu/materialize (get @db "map")))) - (is (= #{:a :b} (tu/materialize (get @db "set"))))))) + (is (= #{:a :b} (tu/materialize (get @db "set")))))) + (testing "a sorted map nested directly inside a vector stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! db [(sorted-map 3 :c 1 :a 2 :b)]) + (let [m (first (seq @db))] + (is (instance? 
xitdb.sorted_map.XITDBSortedMap m)) + (is (sorted? m)) + (is (= [1 2 3] (map key (seq m))))))) + (testing "a sorted map nested inside a list stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! db (list (sorted-map 3 :c 1 :a 2 :b))) + (let [m (first (seq @db))] + (is (instance? xitdb.sorted_map.XITDBSortedMap m)) + (is (= [1 2 3] (map key (seq m)))))))) (deftest empty-clears-map (with-open [db (xdb/xit-db :memory)] diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj index b80004b..5d71e47 100644 --- a/test/xitdb/sorted_set_test.clj +++ b/test/xitdb/sorted_set_test.clj @@ -201,6 +201,13 @@ (with-db [db (tu/test-db)] (reset! db {:idx (sorted-set 3 1 2)}) (is (tu/db-equal-to-atom? db)))) + (testing "a sorted set nested directly inside a vector stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! db [(sorted-set 30 10 20)]) + (let [s (first (seq @db))] + (is (instance? xitdb.sorted_set.XITDBSortedSet s)) + (is (sorted? s)) + (is (= [10 20 30] (seq s)))))) (testing "empty clears the set in place" (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-set 1 2 3)) From e62bcb839ea9f39a8cd6f52b9072ba83f6907773 Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Tue, 23 Jun 2026 19:21:38 +0200 Subject: [PATCH 07/15] Remove sorted-collections PRD and issue planning docs These were planning artifacts for the sorted map/set work, not intended to live in the repo. 
Co-Authored-By: Claude Opus 4.8 --- doc/issues/01-walking-skeleton-sorted-map.md | 61 ---- doc/issues/02-sorted-protocol-map.md | 47 --- doc/issues/03-numeric-temporal-key-codec.md | 48 --- doc/issues/04-sorted-set.md | 51 --- doc/issues/05-rank-and-pagination.md | 40 --- doc/issues/README.md | 17 - doc/sorted-map-prd.md | 331 ------------------- 7 files changed, 595 deletions(-) delete mode 100644 doc/issues/01-walking-skeleton-sorted-map.md delete mode 100644 doc/issues/02-sorted-protocol-map.md delete mode 100644 doc/issues/03-numeric-temporal-key-codec.md delete mode 100644 doc/issues/04-sorted-set.md delete mode 100644 doc/issues/05-rank-and-pagination.md delete mode 100644 doc/issues/README.md delete mode 100644 doc/sorted-map-prd.md diff --git a/doc/issues/01-walking-skeleton-sorted-map.md b/doc/issues/01-walking-skeleton-sorted-map.md deleted file mode 100644 index eb771eb..0000000 --- a/doc/issues/01-walking-skeleton-sorted-map.md +++ /dev/null @@ -1,61 +0,0 @@ -# Issue 1: Walking skeleton — string/keyword-keyed sorted map (read + write) - -Type: AFK -Status: ready-for-agent - -## Parent - -[Sorted Map & Sorted Set PRD](../sorted-map-prd.md) - -## What to build - -The end-to-end walking skeleton that makes a persisted `sorted-map` a working, -ordered, on-disk Clojure collection — for **string and keyword keys only**. -This slice threads every integration layer once so the remaining slices can -extend it: - -- A small **order-preserving key codec** (`xitdb.util.sorted-key`) with a stable - 1-byte type tag per key type. This slice implements the tag infrastructure plus - the **string** and **keyword** encodings (UTF-8 bytes, which already sort in - code-point order). Interface: `encode-key ^bytes [k]` and `decode-key [^bytes]`. -- **Construction detection**: `conversion/v->slot!` (and the nested writers) - recognise `clojure.lang.PersistentTreeMap` and persist it as a `SORTED_MAP`. 
- The tree-map branch must be checked **before** the generic `map?` branch, since - a tree map is also a `map?`. If the tree map carries a non-default comparator, - throw `IllegalArgumentException` (custom comparators are not supported). -- **Read dispatch**: `xitdb-types/read-from-cursor` returns `XITDBSortedMap` (read) - or `XITDBWriteSortedMap` (write) for the `SORTED_MAP` tag, mirroring the - existing `HASH_MAP` cases. -- **Wrapper types** (`xitdb.sorted-map`), modelled on `xitdb.hash-map`: - - `XITDBSortedMap` (read): `ILookup`, `Associative`, `IPersistentMap`, - `Counted`, `Seqable` (ascending ordered `seq`), `IFn`, `Iterable`, - `IKVReduce`, plus `common/ISlot`/`IUnwrap`/`IMaterialize`/ - `IMaterializeShallow`. Read-only `assoc`/`dissoc`/`cons` materialise-shallow - and return a plain Clojure `sorted-map`. - - `XITDBWriteSortedMap` (write): mutating `assoc`/`without`/`cons`/`empty` - against the live `WriteSortedMap`, plus `IReadOnly`. -- A **sorted-operations** namespace (`xitdb.util.sorted-operations`) bridging the - types to the Java `Read/WriteSortedMap` (`put`/`remove`/`getCursor`/`count`/ - `iterator`, decoding keys on read). -- `print-method` for both types (ordered output, `#XITDBSortedMap`). - -`Sorted`/`Indexed`/`Reversible` (subseq, nth, rseq) are intentionally deferred to -Issue 2. Numeric/temporal keys are deferred to Issue 3. - -## Acceptance criteria - -- [ ] `(reset! db (sorted-map "b" 2 "a" 1))` then `@db` seqs as `(["a" 1] ["b" 2])` in key order. -- [ ] `(get @db "a")`, `(@db "a")`, `(:k ...)`-style lookup, `(contains? @db "a")`, `(find @db "a")` all work. -- [ ] `(count @db)` is correct and O(1) (delegates to `ReadSortedMap.count()`). -- [ ] `(swap! db assoc "c" 3)` keeps order; `(swap! db dissoc "a")` removes and preserves order; re-assoc of an existing key replaces the value without changing count. -- [ ] Keyword keys round-trip to keywords and sort correctly; string keys round-trip to strings. -- [ ] `(sorted? 
@db)` is **not** required yet, but `(materialize @db)` returns a plain Clojure `sorted-map` with matching order. -- [ ] Read-only `assoc`/`dissoc` (outside a transaction) returns a plain Clojure sorted collection, not an `XITDB*` type — consistent with `XITDBHashMap`. -- [ ] Persisting a `sorted-map-by` with a custom comparator throws `IllegalArgumentException`. -- [ ] A sorted map nests inside a hash map value and round-trips; values may be vectors/maps/sets. -- [ ] `(tu/db-equal-to-atom? db)` style round-trip holds for a string/keyword-keyed sorted map. -- [ ] `print-method` renders ordered, distinguishable output. - -## Blocked by - -None - can start immediately. diff --git a/doc/issues/02-sorted-protocol-map.md b/doc/issues/02-sorted-protocol-map.md deleted file mode 100644 index 7f77694..0000000 --- a/doc/issues/02-sorted-protocol-map.md +++ /dev/null @@ -1,47 +0,0 @@ -# Issue 2: `clojure.lang.Sorted` for the sorted map — subseq / rsubseq / rseq / nth / sorted? - -Type: AFK -Status: ready-for-agent - -## Parent - -[Sorted Map & Sorted Set PRD](../sorted-map-prd.md) - -## What to build - -Make `XITDBSortedMap` a fully sorted Clojure collection by implementing the -three interfaces that `clojure.core` builds its ordered operations on, so -`sorted?`, `subseq`, `rsubseq`, `rseq`, and indexed `nth` all work against disk. - -- `clojure.lang.Sorted`: - - `comparator` → a comparator consistent with the codec's natural ordering, so - `subseq`'s own bound checks agree with the engine. - - `entryKey` → `key` of the MapEntry. - - `seq(ascending?)` → ascending uses `iterator()`; descending uses a - rank-based index walk (there is no native reverse iterator). - - `seqFrom(k, ascending?)` → ascending maps directly to - `ReadSortedMap.iteratorFrom(encode k)` (native O(log n) lower-bound seek); - descending uses `rank(encode k)` + a descending `getIndexKeyValuePair` walk. 
-- `clojure.lang.Indexed`: - - `nth(i)` / `nth(i, not-found)` → `getIndexKeyValuePair(i)` returning a - MapEntry (decode key, read value). Support negative indices per Java - semantics (`-1` = last). -- `clojure.lang.Reversible`: - - `rseq` → descending lazy seq (index walk from `count-1` down). - -Descending seqs must stay lazy and low-memory (step via `getIndexKeyValuePair`, -do not materialise the whole map). - -## Acceptance criteria - -- [ ] `(sorted? @db)` returns `true` for a persisted sorted map. -- [ ] `(subseq @db >= k)`, `> k`, `<= k`, `< k`, and the two-bound form all return the same entries (in order) as the equivalent plain-Clojure `sorted-map` oracle. -- [ ] `(rsubseq @db ...)` mirrors the plain-Clojure oracle for all test/bound forms. -- [ ] `(seq @db)` is ascending; `(rseq @db)` is descending; both lazy. -- [ ] `(nth @db i)` returns the entry at rank `i` in O(log n); `(nth @db -1)` returns the last entry; out-of-range honours `not-found`/throws like a vector. -- [ ] `subseq`/`rsubseq` on an empty (none-cursor) sorted map yield nothing. -- [ ] `(comparator @db)` is consistent with iteration order (subseq bound filtering agrees with engine order). - -## Blocked by - -- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md) diff --git a/doc/issues/03-numeric-temporal-key-codec.md b/doc/issues/03-numeric-temporal-key-codec.md deleted file mode 100644 index fbf9021..0000000 --- a/doc/issues/03-numeric-temporal-key-codec.md +++ /dev/null @@ -1,48 +0,0 @@ -# Issue 3: Numeric & temporal key codec — long, double, inst/date - -Type: AFK -Status: ready-for-agent - -## Parent - -[Sorted Map & Sorted Set PRD](../sorted-map-prd.md) - -## What to build - -Extend the order-preserving key codec (`xitdb.util.sorted-key`) with tagged -encodings for the remaining v1 key types, so that numeric and temporal keys sort -in their natural order on disk. 
No wrapper-type changes are needed — the sorted -map (and later the sorted set) call `encode-key`/`decode-key`, so they gain these -key types automatically once the codec supports them. - -Encodings (each carries its own type tag; the tag also defines a stable -cross-type order so heterogeneous keys never throw): - -- **Long / integer** → tag + 8-byte big-endian with the **sign bit flipped** - (XOR `0x80` on the top byte). Makes signed integers sort correctly as unsigned - bytes: negatives before positives, ascending within each. (Same technique the - Java library uses for its creation-time index example.) -- **Double** → tag + IEEE-754 8-byte big-endian with the order-preserving bit - flip: if the sign bit is set, flip all bits; otherwise flip only the sign bit. -- **Instant** → tag + big-endian epoch encoding (e.g. epoch-second + nanos) so - chronological order equals byte order; decodes back to `Instant`. -- **Date** → tag + big-endian epoch encoding (distinct tag from `Instant`); - decodes back to `java.util.Date`. - -This slice is the correctness-critical one and must ship with property-based -ordering tests (see Testing Decisions in the PRD). - -## Acceptance criteria - -- [ ] `(reset! db (sorted-map 9 :a 10 :b 1 :c))` iterates numerically as `1, 9, 10` (not lexically). -- [ ] Negative and positive long keys sort correctly together (e.g. `-5 < 0 < 3`), including `Long/MIN_VALUE` and `Long/MAX_VALUE`. -- [ ] Double keys sort numerically, including negatives, zero, and large magnitudes. -- [ ] `Instant` keys iterate in chronological order and round-trip to `Instant`; `Date` keys likewise round-trip to `Date`. -- [ ] Round-trip property: `(= k (decode-key (encode-key k)))` for every supported type. -- [ ] Order-preservation property (generative): for random same-type pairs `a`,`b`, `sign(compareUnsigned(encode a, encode b)) == sign(compare a b)`. -- [ ] Cross-type ordering is total and never throws. -- [ ] Unsupported key types throw a clear error. 
- -## Blocked by - -- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md) diff --git a/doc/issues/04-sorted-set.md b/doc/issues/04-sorted-set.md deleted file mode 100644 index 93668c3..0000000 --- a/doc/issues/04-sorted-set.md +++ /dev/null @@ -1,51 +0,0 @@ -# Issue 4: Sorted set end-to-end — `XITDBSortedSet` / `XITDBWriteSortedSet` - -Type: AFK -Status: ready-for-agent - -## Parent - -[Sorted Map & Sorted Set PRD](../sorted-map-prd.md) - -## What to build - -The set counterpart to the sorted map: persist a `clojure.lang.PersistentTreeSet` -as an on-disk `SORTED_SET` and expose it as a fully ordered Clojure set. Reuses -the key codec (Issues 1 + 3) and the `Sorted`/`Indexed`/`Reversible` machinery -established for the map (Issue 2). - -- **Construction detection**: `conversion/v->slot!` recognises - `clojure.lang.PersistentTreeSet` (checked before the generic `set?` branch) and - writes a `SORTED_SET`. Reject non-default comparators with `IllegalArgumentException`. -- **Read dispatch**: `read-from-cursor` returns `XITDBSortedSet` / - `XITDBWriteSortedSet` for the `SORTED_SET` tag. -- **Wrapper types** (`xitdb.sorted-set`), modelled on `xitdb.hash-set`: - - `XITDBSortedSet` (read): `IPersistentSet` (`contains?`/`get`/`disjoin`), - `Counted`, `Seqable` (ordered), `IFn`, `Iterable`, plus `ISlot`/`IUnwrap`/ - `IMaterialize`/`IMaterializeShallow`, **and** `Sorted`/`Indexed`/`Reversible` - so `subseq`/`rsubseq`/`rseq`/`nth`/`sorted?` work over the set. Read-only - `conj`/`disj` return a plain Clojure `sorted-set`. - - `XITDBWriteSortedSet` (write): mutating `conj`/`disjoin`/`empty` against the - live `WriteSortedSet`, plus `IReadOnly`. -- Set operations in `xitdb.util.sorted-operations` over the Java - `Read/WriteSortedSet` (`put`/`remove`/`contains`/`count`/`iterator`/ - `getIndexKeyValuePair`), decoding members on read. -- `print-method` (`#XITDBSortedSet`, ordered). 
-- `materialize` returns a plain Clojure `sorted-set` with matching order. - -## Acceptance criteria - -- [ ] `(reset! db (sorted-set 3 1 2))` then `@db` seqs as `(1 2 3)`. -- [ ] `(contains? @db 2)` works; `(get @db 2)` returns the member. -- [ ] `(swap! db conj 5)` and `(swap! db disj 1)` keep order; adding a duplicate is a no-op and does not change count. -- [ ] `(count @db)` is correct and O(1). -- [ ] `(sorted? @db)` is `true`; `(subseq @db >= 2)`, `(rsubseq ...)`, `(nth @db 0)`, `(rseq @db)` all match the plain-Clojure `sorted-set` oracle. -- [ ] String, keyword, long, double, and inst/date members each iterate in correct natural order. -- [ ] Read-only `conj`/`disj` (outside a transaction) returns a plain Clojure sorted set, not an `XITDB*` type. -- [ ] `(materialize @db)` returns a plain `sorted-set` with matching order. -- [ ] A sorted set nests inside other structures and round-trips. - -## Blocked by - -- [Issue 2: `clojure.lang.Sorted` for the sorted map](02-sorted-protocol-map.md) -- [Issue 3: Numeric & temporal key codec](03-numeric-temporal-key-codec.md) diff --git a/doc/issues/05-rank-and-pagination.md b/doc/issues/05-rank-and-pagination.md deleted file mode 100644 index bbeb7c2..0000000 --- a/doc/issues/05-rank-and-pagination.md +++ /dev/null @@ -1,40 +0,0 @@ -# Issue 5: `rank` + pagination public helpers - -Type: AFK -Status: ready-for-agent - -## Parent - -[Sorted Map & Sorted Set PRD](../sorted-map-prd.md) - -## What to build - -Expose the rank-augmented B-tree "superpowers" that go beyond `clojure.core`'s -in-memory sorted collections, as a small public surface usable on both -`XITDBSortedMap` and `XITDBSortedSet`. - -- **`rank`** — given a key/member, return the number of entries strictly less - than it (its index), in O(log n). Backed by `ReadSortedMap.rank` / - `ReadSortedSet.rank`. It is the inverse of indexed `nth`. 
-- **Pagination helper** — an offset/limit (or "from index N, take K") accessor - backed by `ReadSortedMap.iteratorFromIndex` / `iteratorFromIndex`, returning a - lazy ordered seq starting at a rank. This makes serving ordered, paged queries - from disk efficient (the motivating secondary-index use case in the PRD). - -Place these in a stable namespace (e.g. `xitdb.sorted` or extend `xitdb.db`) -and document them as the recommended way to build/paginate on-disk secondary -indexes. - -## Acceptance criteria - -- [ ] `(rank m k)` returns the correct index for present keys/members, and the would-be insertion index for absent ones, in O(log n). -- [ ] `rank` and indexed `nth` are inverses: `(= i (rank m (key (nth m i))))` for all `i`. -- [ ] The pagination helper returns the correct ordered page for a given offset/limit and stops at the end of the collection. -- [ ] Both helpers work on `XITDBSortedMap` and `XITDBSortedSet`. -- [ ] Pagination is lazy and does not materialise the whole collection. -- [ ] A doc example shows building a timestamp→id secondary index and paging through it. - -## Blocked by - -- [Issue 1: Walking skeleton — string/keyword-keyed sorted map](01-walking-skeleton-sorted-map.md) -- [Issue 4: Sorted set end-to-end](04-sorted-set.md) diff --git a/doc/issues/README.md b/doc/issues/README.md deleted file mode 100644 index 59af80c..0000000 --- a/doc/issues/README.md +++ /dev/null @@ -1,17 +0,0 @@ -# Sorted Map & Sorted Set — implementation issues - -Tracer-bullet slices for [the PRD](../sorted-map-prd.md). Each is a thin vertical -slice through every layer (codec → construction detection → read dispatch → -wrapper type → tests) and is independently verifiable. All are AFK. - -| # | Slice | Blocked by | -|---|-------|-----------| -| [1](01-walking-skeleton-sorted-map.md) | Walking skeleton: string/keyword-keyed sorted map (read + write) | — | -| [2](02-sorted-protocol-map.md) | `clojure.lang.Sorted` for the map — subseq/rsubseq/rseq/nth/sorted? 
| 1 | -| [3](03-numeric-temporal-key-codec.md) | Numeric & temporal key codec — long, double, inst/date | 1 | -| [4](04-sorted-set.md) | Sorted set end-to-end (`XITDBSortedSet`/`XITDBWriteSortedSet`) | 2, 3 | -| [5](05-rank-and-pagination.md) | `rank` + pagination public helpers | 1, 4 | - -## Suggested order - -1 → (2 and 3 in parallel) → 4 → 5 diff --git a/doc/sorted-map-prd.md b/doc/sorted-map-prd.md deleted file mode 100644 index 3401781..0000000 --- a/doc/sorted-map-prd.md +++ /dev/null @@ -1,331 +0,0 @@ -# PRD: Sorted Map & Sorted Set support for xitdb-clj - -Status: ready-for-implementation -Date: 2026-06-23 - -## Problem Statement - -As a user of xitdb-clj, I can persist hash maps, hash sets, array lists and -linked lists, but I have no way to keep keys (or set members) **in order** on -disk. When I need range queries, ordered iteration, pagination, or "the entry -at position N", I have to load the whole collection into memory and sort it in -Clojure on every read. That defeats the point of an embedded, immutable, -on-disk database — and it does not scale to large collections or secondary -indexes (e.g. "all posts created between T1 and T2, page 3"). - -The upstream Java library (`io.github.radarroark.xitdb`) now ships a -rank-augmented B-tree exposed as `SortedMap` and `SortedSet`. xitdb-clj has no -Clojure types that wrap them, so none of this capability is reachable from -Clojure today. - -## Solution - -Add two new pairs of wrapper types — `XITDBSortedMap` / `XITDBWriteSortedMap` -and `XITDBSortedSet` / `XITDBWriteSortedSet` — that wrap the Java -`ReadSortedMap`/`WriteSortedMap` and `ReadSortedSet`/`WriteSortedSet`. These -behave like first-class Clojure sorted collections: they implement -`clojure.lang.Sorted`, so `sorted?`, `subseq`, `rsubseq`, `seq`, and `rseq` -work out of the box, and they additionally implement `clojure.lang.Indexed` -(O(log n) `nth` by rank) and `clojure.lang.Reversible`. 
- -Construction is fully idiomatic and requires **no new public API**: when a value -written to the database is a `clojure.lang.PersistentTreeMap` (i.e. a -`sorted-map`) or a `clojure.lang.PersistentTreeSet` (a `sorted-set`), xitdb-clj -persists it as an on-disk `SORTED_MAP` / `SORTED_SET`. Reading it back returns -the corresponding `XITDBSorted*` type. So: - -```clojure -(reset! db (sorted-map 3 :c 1 :a 2 :b)) -(subseq @db >= 2) ;; => ([2 :b] [3 :c]) -(nth @db 0) ;; => [1 :a] ; O(log n), not O(n) -(rseq @db) ;; => ([3 :c] [2 :b] [1 :a]) -(sorted? @db) ;; => true -``` - -The single honest limitation: ordering is the engine's fixed natural ordering -(produced by an order-preserving key codec). **Custom comparators -(`sorted-map-by` / `sorted-set-by`) are not supported** — the comparison lives -in the Java B-tree as unsigned byte comparison, not in a pluggable Clojure fn. - -## User Stories - -1. As a developer, I want to write a `(sorted-map ...)` to the db, so that it is - persisted as an ordered on-disk structure without me learning a new API. -2. As a developer, I want to write a `(sorted-set ...)` to the db, so that set - members are kept in sorted order on disk. -3. As a developer, I want `(sorted? db-value)` to return `true` for a persisted - sorted map/set, so that generic code can detect orderedness. -4. As a developer, I want to call `(subseq m >= k)`, `(subseq m > k)`, - `(subseq m <= k)`, `(subseq m < k)` and the two-bound form, so that I can run - ascending range queries directly against disk. -5. As a developer, I want `(rsubseq m ...)` with the same test/bound forms, so - that I can run descending range queries. -6. As a developer, I want `(seq m)` to iterate entries in ascending key order, so - that ordered traversal is the default. -7. As a developer, I want `(rseq m)` to iterate entries in descending key order, - so that I can walk from the largest key down. -8. 
As a developer, I want `(nth m i)` to return the entry at rank `i` in - O(log n), so that positional access and pagination are cheap even for large - maps. -9. As a developer, I want `(nth m -1)`/negative indexing semantics surfaced via a - helper, so that I can get the last/last-k entries without counting. -10. As a developer, I want `(get m k)` / `(m k)` / `(:k m)` lookups, so that a - sorted map is a drop-in associative read. -11. As a developer, I want `(contains? m k)` and `(find m k)`, so that presence - checks and entry retrieval work like any map. -12. As a developer, I want `(count m)` to be O(1), so that size checks are cheap. -13. As a developer, I want to `(swap! db assoc k v)` a sorted map inside a - transaction, so that inserts keep the structure ordered and persistent. -14. As a developer, I want to `(swap! db dissoc k)` a sorted map, so that I can - remove keys while preserving order. -15. As a developer, I want `(swap! db conj v)` / `(swap! db disj v)` on a sorted - set, so that membership edits preserve order. -16. As a developer, I want re-assoc'ing an existing key to replace the value and - not change the count, so that updates behave like a normal map. -17. As a developer, I want string and keyword keys to sort in their natural - (code-point) order, so that text indexes read correctly. -18. As a developer, I want long/integer keys to sort numerically (so `9 < 10`, - and negatives before positives), so that numeric indexes behave intuitively. -19. As a developer, I want double keys to sort numerically, so that floating - point indexes are ordered correctly. -20. As a developer, I want `java.time.Instant` and `java.util.Date` keys to sort - chronologically, so that I can build time-ordered secondary indexes. -21. As a developer, I want keys to round-trip to their exact original Clojure - type on read, so that `(keys m)` and entry keys are not stringly-typed. -22. 
As a developer, I want to build a timestamp→id secondary index and paginate - it (offset/limit) efficiently, so that I can serve ordered, paged queries - from disk. -23. As a developer, I want a `rank` operation ("how many keys are strictly less - than k") in O(log n), so that I can compute a key's position / build - pagination cursors. -24. As a developer, I want sorted maps/sets to nest inside other structures - (e.g. a hash map whose value is a sorted map), so that I can model rich - documents. -25. As a developer, I want a sorted map value to nest arbitrary values (vectors, - maps, sets) as its values, so that the value side is as flexible as a hash - map's. -26. As a developer, I want `(materialize sorted-map-value)` to return a plain - Clojure `sorted-map` with the same ordering, so that I can fully realise it - in memory. -27. As a developer, I want `(empty sorted-map)` semantics to produce an empty - ordered structure, so that clearing works in a transaction. -28. As a developer, I want `=` / `equiv` to compare a persisted sorted map to a - plain Clojure map by contents, so that test assertions read naturally. -29. As a developer reading the result of `assoc`/`dissoc` on a **read-only** - sorted map (outside a transaction), I want a plain Clojure sorted collection - back, so that the immutable-read contract matches the existing hash - map/set types. -30. As a developer, I want a clear, early error if I try to persist a - `sorted-map-by`/`sorted-set-by` with a custom comparator, so that I am not - silently given a different ordering. -31. As a developer, I want a clear error if I use an unsupported key type, so - that I fail fast instead of getting corrupt ordering. -32. As a developer, I want the print representation of a persisted sorted map/set - to be distinguishable (e.g. `#XITDBSortedMap{...}`) and ordered, so that REPL - output is legible. -33. 
As a developer using multiple threads, I want sorted-map reads to work from - reader threads like the other types, so that concurrency behaves consistently. - -## Implementation Decisions - -### Construction trigger (no new public API) - -- `conversion/v->slot!` and the nested writers (`coll->ArrayListCursor!`, - `map->WriteHashMapCursor!`, etc.) gain branches that detect - `clojure.lang.PersistentTreeMap` and `clojure.lang.PersistentTreeSet` **before** - the generic `map?` / `set?` branches (a tree map is also a `map?`, so order of - checks matters). These write `SORTED_MAP` / `SORTED_SET` respectively. -- If the detected tree map/set carries a **non-default comparator**, throw - `IllegalArgumentException` ("custom comparators are not supported; sorted - collections use natural ordering"). Detection: compare `.comparator` against - `clojure.lang.RT/DEFAULT_COMPARATOR` / `compare`. - -### Read dispatch - -- `xitdb-types/read-from-cursor` gains `SORTED_MAP` and `SORTED_SET` cases that - return `XITDBSortedMap`/`XITDBSortedSet` (read) or the `Write` variants when - `for-writing?` is true — mirroring the existing `HASH_MAP`/`HASH_SET` cases. - -### New module: key codec (deep module — the heart of this PRD) - -A new namespace (e.g. `xitdb.util.sorted-key`) provides a bijective, -**order-preserving** encoding between supported Clojure key values and `byte[]`, -such that `Arrays.compareUnsigned(encode(a), encode(b))` has the same sign as the -natural ordering of `a` and `b`. - -- Interface (small, stable): - - `encode-key ^bytes [k]` — Clojure key → order-preserving bytes. - - `decode-key [^bytes b]` — bytes → original Clojure key (exact type). -- Each encoding is prefixed with a 1-byte **type tag** that also defines a stable - cross-type ordering, so heterogeneous keys never throw (a strict improvement - over Clojure's `compare`, which throws across classes). Even though v1's - primary contract is single-type maps, the tag makes the encoding total. 
-- Supported key types for v1 (per decision): - - **String** → tag + UTF-8 bytes. UTF-8 byte order equals Unicode code-point - order, so no transformation needed. - - **Keyword** → tag + UTF-8 of the (namespace-qualified) name. Reuses - `conversion/keyname`. - - **Long / integer** → tag + 8-byte big-endian with the **sign bit flipped** - (XOR `0x80` on the top byte), which makes signed integers sort correctly as - unsigned bytes (negatives < positives, ascending within each). This is the - same big-endian-with-flipped-sign technique used in the Java library's own - `testSortedMap` example for a creation-time index. - - **Double** → tag + IEEE-754 8-byte big-endian with the order-preserving bit - flip (if sign bit set, flip all bits; else flip only the sign bit). Handles - negative/positive ordering. (NaN handling: documented as undefined / rejected.) - - **Instant / Date** → tag + big-endian epoch encoding (e.g. epoch-second + - nanos, or epoch-milli) so chronological order = byte order. `Date` decodes - back to `Date`, `Instant` back to `Instant` (distinct tags). -- The codec is the single place ordering correctness lives; it is pure - (no DB handle needed) and unit-testable in isolation. - -> Note: this is intentionally **separate** from the existing -> `conversion/db-key-hash`, which SHA-1-hashes keys for hash maps. Hashing -> destroys order and identity; sorted keys must be stored as their -> order-preserving bytes and recovered via the key cursor. - -### New module: sorted-map operations - -A namespace parallel to `xitdb.util.operations` (e.g. `xitdb.util.sorted-operations`) -holding the imperative bridge between the wrapper types and the Java API: - -- `sorted-map-assoc-value!` — `encode-key` then `WriteSortedMap.putCursor(key)` - + write value slot via `conversion/v->slot!`. -- `sorted-map-dissoc-key!` — `WriteSortedMap.remove(encoded)`. -- `sorted-map-get-cursor` — `ReadSortedMap.getCursor(encoded)` (nil when absent). 
-- `sorted-map-contains?` — non-nil cursor / `getKeyValuePair`. -- `sorted-map-count` — `ReadSortedMap.count()` (O(1)). -- `sorted-map-rank` — `ReadSortedMap.rank(encoded)`. -- `sorted-map-nth` — `ReadSortedMap.getIndexKeyValuePair(i)` → MapEntry - (decode key, read value); supports negative indices per Java semantics. -- `sorted-map-seq` — lazy seq of `MapEntry` from `iterator()` (ascending), - decoding keys. -- `sorted-map-seq-from` — ascending lazy seq from `iteratorFrom(encoded)`. -- `sorted-map-rseq` / descending-from — built on `rank` + descending - `getIndexKeyValuePair` walk (no native reverse iterator exists; this is the - agreed implementation strategy). Lazy and low-memory. -- Set variants (`sorted-set-*`) over `ReadSortedSet`/`WriteSortedSet` - (`put`/`remove`/`contains`/`rank`/`getIndexKeyValuePair`/iterators); members - are decoded keys, no value side. - -### New module: the wrapper types - -`xitdb.sorted-map` and `xitdb.sorted-set`, modelled on `xitdb.hash-map` / -`xitdb.hash-set`. - -- `XITDBSortedMap` (read) implements: - - `clojure.lang.ILookup`, `Associative`, `IPersistentMap`, - `IPersistentCollection`, `Counted`, `Seqable`, `IFn`, `Iterable`, - `IKVReduce`, plus `common/ISlot` / `IUnwrap` / `IMaterialize` / - `IMaterializeShallow`. - - `clojure.lang.Sorted`: `comparator`, `entryKey`, `seq(ascending?)`, - `seqFrom(k, ascending?)` — this is what powers `subseq`/`rsubseq`/`sorted?`. - - `clojure.lang.Indexed`: `nth` → rank-based `getIndexKeyValuePair`. - - `clojure.lang.Reversible`: `rseq`. - - Read-only `assoc`/`dissoc`/`cons` materialise-shallow then operate (return a - plain Clojure `sorted-map`), matching `XITDBHashMap` behaviour. -- `XITDBWriteSortedMap` (write) implements the mutating `assoc`/`without`/`cons`/ - `empty` against the live `WriteSortedMap`, plus `IReadOnly`, mirroring - `XITDBWriteHashMap`. -- `XITDBSortedSet` / `XITDBWriteSortedSet` analogously implement - `IPersistentSet` + `Sorted` + `Indexed` + `Reversible`. 
-- `print-method` registered for each, ordered output, distinct tags - (`#XITDBSortedMap` / `#XITDBSortedSet`). - -### `clojure.lang.Sorted` contract mapping (the load-bearing detail) - -- `seqFrom(k, true)` → `iteratorFrom(encode k)` (native O(log n) lower bound). -- `seqFrom(k, false)` → `rank(encode k)` then descending index walk. -- `seq(true)` → `iterator()`; `seq(false)` → descending index walk. -- `comparator` → a comparator consistent with the codec's natural ordering - (so `subseq`'s own bound checks agree with the engine). -- `entryKey` → `key` of the MapEntry (map) / identity (set). - -### Public surface for the "superpowers" - -Expose, from a stable namespace (e.g. `xitdb.db` or a new `xitdb.sorted`): -- `rank` — key → index, O(log n). -- (Optional) a `nth`/`get-by-index` convenience already covered by `Indexed`. -- (Optional) a paginate helper built on `iteratorFromIndex`. - -## Testing Decisions - -Good tests here verify **external behavior** — what a user observes through the -Clojure collection API — not the internal byte layout of the B-tree or the -private shape of the operations namespace. Assertions compare against plain -Clojure `sorted-map`/`sorted-set` built with the same data, which is the -ground-truth oracle for ordering. - -Prior art to mirror: -- `test/xitdb/set_test.clj` and `test/xitdb/map_test.clj` — `with-db` fixture, - `reset!`/`swap!`, `(tu/db-equal-to-atom? db)` round-trip checks, - read-only-return-type assertions. -- `test/xitdb/data_types_test.clj` — per-key-type round-tripping. -- `test/xitdb/generated_data_test.clj` / `gen_map.clj` — generative coverage. - -Modules and what to test: - -1. **Key codec (`xitdb.util.sorted-key`)** — the priority for unit tests, in - isolation, no DB: - - Round-trip: `(= k (decode-key (encode-key k)))` for each supported type. - - **Order preservation (property-based)**: for random pairs `a`, `b` of the - same type, `sign(compareUnsigned(encode a, encode b)) = sign(compare a b)`. 
- Cover negatives, zero, large longs, `Long/MIN_VALUE`/`MAX_VALUE`, negative - and positive doubles, sub-second instants. - - Cross-type total ordering is stable (no exceptions across types). - - Unsupported type / custom-comparator → throws. -2. **Sorted map/set integration tests** (with `with-db`), mirroring the Java - `testSortedMap` scenario and the existing set/map tests: - - Build from `(sorted-map ...)` / `(sorted-set ...)`; `@db` equals the plain - sorted collection (order-sensitive comparison). - - `subseq`/`rsubseq` for all six test/bound forms vs. the plain-Clojure oracle. - - `seq` ascending, `rseq` descending, `nth` (including negative), `count` O(1). - - `assoc`/`dissoc` in a `swap!` keep order; re-assoc replaces without changing - count; `disj`/`conj` on the set. - - Key-type matrix: string, keyword, long, double, inst/date keys each iterate - in correct natural order. - - `sorted?` is true; `materialize` returns a plain `sorted-map` with matching - order; read-only `assoc`/`dissoc` returns a plain Clojure sorted collection. - - `rank` returns correct positions and is the inverse of `nth`. - - Empty / none-cursor cases: `subseq` and iteration on an empty sorted map - yield nothing. - - Nesting: sorted map as a value inside a hash map, and rich values inside a - sorted map. -3. **Multi-threaded read** — a light check that reader threads can `subseq`/read a - sorted map, consistent with `multi_threaded_test.clj`. - -Generative tests (`test.check`) are the recommended vehicle for the codec -ordering property and for "insert a random key set, assert iteration order == -`(sort ...)`". - -## Out of Scope - -- **Custom comparators** (`sorted-map-by` / `sorted-set-by`). The engine's order - is fixed; custom comparators are rejected with a clear error, not supported. -- Key types beyond strings, keywords, longs, doubles, `Instant`/`Date` in v1 - (e.g. booleans, `nil`, `BigInteger`/`BigDecimal`, vectors/tuples as keys, - `ratio`). 
These can be added later by extending the codec's tag table. -- A native streaming **reverse iterator** in the Java layer — descending is - implemented via rank + index walk in Clojure; we do not modify the Java lib. -- Changing how hash maps/sets are stored or their key hashing. -- A bespoke public constructor API (decision: reuse `sorted-map`/`sorted-set`). - -## Further Notes - -- **Why the codec is the risk center**: every ordering guarantee in the feature - reduces to "does `encode-key` preserve order". It is pure and isolated - specifically so it can be proven correct independently before the wrapper types - are trusted. De-risking the codec first (round-trip + property tests) is the - recommended build order. -- **Headline win over `clojure.core`**: `nth`/positional access and `rank` are - O(log n) here, whereas Clojure's in-memory `sorted-map` is O(n) for positional - access. Combined with `iteratorFromIndex`, this makes `XITDBSortedMap` an - excellent fit for **on-disk secondary indexes** with efficient pagination — the - motivating use case demonstrated in the Java library's own tests. -- **Check ordering of type checks** in `v->slot!`: `PersistentTreeMap` satisfies - `map?` and `PersistentTreeSet` satisfies `set?`, so the sorted branches must be - evaluated first or the generic hash branches will shadow them. -- **UTF-8 ordering caveat**: UTF-8 byte order matches Unicode code-point order, - which matches Clojure `compare` on strings for the entire BMP and beyond except - for the surrogate-pair region edge cases; this is acceptable and should be - noted in docs. ASCII keys (the common case) are exact. 
From 43b71492ab04c67279125791f6abb1d8ec09299d Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Wed, 24 Jun 2026 18:09:20 +0200 Subject: [PATCH 08/15] Fix sorted-collection reviewer issues: keyword codec, heterogeneous keys, cursor writes, negative offsets - Keyword keys now encode with a namespace-presence flag + ns/name layout so byte order matches Clojure's default comparator (non-namespaced before namespaced) and (keyword nil "a/b") no longer collides with :a/b. - Materialization/printing/empty rebuild with (sorted-map-by key-comparator) / (sorted-set-by key-comparator) so heterogeneous key types don't throw. - write-cursor-for-key handles Tag/SORTED_MAP so keypath writes can traverse into sorted maps; xit-tag->keyword maps the sorted tags. - from-index/page reject negative ranks eagerly with IllegalArgumentException. Co-Authored-By: Claude Opus 4.8 --- src/xitdb/sorted.clj | 3 ++ src/xitdb/sorted_map.clj | 12 ++--- src/xitdb/sorted_set.clj | 12 ++--- src/xitdb/util/conversion.clj | 8 ++++ src/xitdb/util/sorted_key.clj | 67 ++++++++++++++++++++++----- test/xitdb/cursor_test.clj | 22 +++++++++ test/xitdb/sorted_key_test.clj | 25 ++++++++++ test/xitdb/sorted_map_test.clj | 29 ++++++++++++ test/xitdb/sorted_pagination_test.clj | 13 ++++++ test/xitdb/sorted_set_test.clj | 14 ++++++ 10 files changed, 182 insertions(+), 23 deletions(-) diff --git a/src/xitdb/sorted.clj b/src/xitdb/sorted.clj index 5a3ee19..923d9c6 100644 --- a/src/xitdb/sorted.clj +++ b/src/xitdb/sorted.clj @@ -51,6 +51,9 @@ the elements are `MapEntry` pairs; for a sorted set they are members. Returns nil when `n` is at or past the end." [coll n] + (when (neg? n) + (throw (IllegalArgumentException. + (str "from-index requires a non-negative rank, got: " n)))) (let [u (common/-unwrap coll)] (cond (instance? ReadSortedMap u) diff --git a/src/xitdb/sorted_map.clj b/src/xitdb/sorted_map.clj index 799b892..b27e338 100644 --- a/src/xitdb/sorted_map.clj +++ b/src/xitdb/sorted_map.clj @@ -64,7 +64,7 @@ (. 
clojure.lang.RT (conj (common/-materialize-shallow this) o))) (empty [this] - (sorted-map)) + (sorted-map-by sorted-key/key-comparator)) (equiv [this other] (and (instance? clojure.lang.IPersistentMap other) @@ -139,20 +139,20 @@ common/IMaterialize (-materialize [this] (reduce (fn [m [k v]] - (assoc m k (common/materialize v))) (sorted-map) (seq this))) + (assoc m k (common/materialize v))) (sorted-map-by sorted-key/key-comparator) (seq this))) common/IMaterializeShallow (-materialize-shallow [this] (reduce (fn [m [k v]] - (assoc m k v)) (sorted-map) (seq this))) + (assoc m k v)) (sorted-map-by sorted-key/key-comparator) (seq this))) Object (toString [this] - (str (into (sorted-map) this)))) + (str (into (sorted-map-by sorted-key/key-comparator) this)))) (defmethod print-method XITDBSortedMap [o ^java.io.Writer w] (.write w "#XITDBSortedMap") - (print-method (into (sorted-map) o) w)) + (print-method (into (sorted-map-by sorted-key/key-comparator) o) w)) ;--------------------------------------------------- @@ -239,7 +239,7 @@ (defmethod print-method XITDBWriteSortedMap [o ^java.io.Writer w] (.write w "#XITDBWriteSortedMap") - (print-method (into (sorted-map) (common/-read-only o)) w)) + (print-method (into (sorted-map-by sorted-key/key-comparator) (common/-read-only o)) w)) (defn xwrite-sorted-map [^WriteCursor write-cursor] (->XITDBWriteSortedMap (WriteSortedMap. write-cursor))) diff --git a/src/xitdb/sorted_set.clj b/src/xitdb/sorted_set.clj index 62eaf52..e69733a 100644 --- a/src/xitdb/sorted_set.clj +++ b/src/xitdb/sorted_set.clj @@ -42,7 +42,7 @@ (. clojure.lang.RT (conj (common/-materialize-shallow this) o))) (empty [this] - (sorted-set)) + (sorted-set-by sorted-key/key-comparator)) (equiv [this other] (and (instance? 
clojure.lang.IPersistentSet other) @@ -124,19 +124,19 @@ common/IMaterialize (-materialize [this] - (into (sorted-set) (map common/materialize (seq this)))) + (into (sorted-set-by sorted-key/key-comparator) (map common/materialize (seq this)))) common/IMaterializeShallow (-materialize-shallow [this] - (into (sorted-set) (seq this))) + (into (sorted-set-by sorted-key/key-comparator) (seq this))) Object (toString [this] - (str (into (sorted-set) this)))) + (str (into (sorted-set-by sorted-key/key-comparator) this)))) (defmethod print-method XITDBSortedSet [o ^java.io.Writer w] (.write w "#XITDBSortedSet") - (print-method (into (sorted-set) o) w)) + (print-method (into (sorted-set-by sorted-key/key-comparator) o) w)) ;--------------------------------------------------- @@ -201,7 +201,7 @@ (defmethod print-method XITDBWriteSortedSet [o ^java.io.Writer w] (.write w "#XITDBWriteSortedSet") - (print-method (into (sorted-set) (common/-read-only o)) w)) + (print-method (into (sorted-set-by sorted-key/key-comparator) (common/-read-only o)) w)) ;; Constructor functions (defn xwrite-sorted-set [^WriteCursor write-cursor] diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj index d999037..1c1acce 100644 --- a/src/xitdb/util/conversion.clj +++ b/src/xitdb/util/conversion.clj @@ -20,6 +20,8 @@ (= tag Tag/ARRAY_LIST) :array-list (= tag Tag/LINKED_ARRAY_LIST) :linked-array-list (= tag Tag/HASH_MAP) :hash-map + (= tag Tag/SORTED_MAP) :sorted-map + (= tag Tag/SORTED_SET) :sorted-set (= tag Tag/KV_PAIR) :kv-pair (= tag Tag/BYTES) :bytes (= tag Tag/SHORT_BYTES) :short-bytes @@ -390,6 +392,12 @@ (= value-tag Tag/COUNTED_HASH_MAP) (map-write-cursor (WriteCountedHashMap. cursor) current-key) + ;; Sorted maps store the real key bytes (order-preserving codec), so a + ;; keypath write resolves a value cursor by the encoded key, mirroring the + ;; read-side dispatch in `read-from-cursor`. + (= value-tag Tag/SORTED_MAP) + (.putCursor (WriteSortedMap. 
cursor) (sorted-key/encode-key current-key)) + (= value-tag Tag/HASH_SET) (set-write-cursor (WriteHashSet. cursor) current-key) diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj index c8d5288..320a44a 100644 --- a/src/xitdb/util/sorted_key.clj +++ b/src/xitdb/util/sorted_key.clj @@ -14,9 +14,11 @@ type on decode and establishes a total order across types, so heterogeneous keys never throw. - This namespace currently implements string and keyword keys (UTF-8 bytes, - which already sort in code-point order). Numeric/temporal keys are added in a - later slice." + Supported key types: string, keyword, long, double, Instant and Date. Strings + encode as their UTF-8 bytes (already code-point ordered); keywords use a flag + + namespace + name layout so they sort like Clojure's default comparator (see + `keyword->bytes`); numeric/temporal keys use order-preserving big-endian + encodings." (:import [java.io ByteArrayOutputStream] [java.nio ByteBuffer] @@ -107,12 +109,40 @@ (defn- ^Date bytes->date [^bytes ba off] (Date. (bit-xor (.getLong (ByteBuffer/wrap ba (int off) 8)) Long/MIN_VALUE))) -(defn ^String keyname - "String form of a keyword key, namespace-qualified when present." - [k] - (if (namespace k) - (str (namespace k) "/" (name k)) - (name k))) +;; Keyword presence-of-namespace flag (the first body byte). 0 sorts before 1, +;; so non-namespaced keywords sort before namespaced ones, matching Clojure's +;; default comparator (clojure.lang.Symbol.compareTo). +(def ^:const kw-no-ns (int 0x00)) +(def ^:const kw-has-ns (int 0x01)) + +(defn- keyword->bytes + "Order-preserving, collision-free encoding of a keyword, matching Clojure's + default comparator: non-namespaced keywords sort before namespaced ones, then + by namespace, then by name. + + Layout (after the type tag): a flag byte, then the parts. 
+ - no namespace : `kw-no-ns` ++ name-utf8 + - namespaced : `kw-has-ns` ++ ns-utf8 ++ 0x00 ++ name-utf8 + + The 0x00 separator can never appear inside UTF-8 keyword text (NUL is not a + legal keyword character), so it sorts below every namespace byte and cleanly + delimits namespace from name. The flag byte also keeps `(keyword nil \"a/b\")` + (no namespace, name \"a/b\") distinct from `:a/b` (namespace \"a\", name + \"b\"), which would otherwise both flatten to \"a/b\"." + ^bytes [k] + (let [out (ByteArrayOutputStream.) + ns (namespace k) + nm ^bytes (utf8 (name k))] + (if ns + (let [nsb ^bytes (utf8 ns)] + (.write out kw-has-ns) + (.write out nsb 0 (alength nsb)) + (.write out (int 0x00)) + (.write out nm 0 (alength nm))) + (do + (.write out kw-no-ns) + (.write out nm 0 (alength nm)))) + (.toByteArray out))) (defn encode-key "Encodes Clojure key `k` to an order-preserving, reversible byte array." @@ -122,7 +152,7 @@ (tagged tag-string (utf8 k)) (keyword? k) - (tagged tag-keyword (utf8 (keyname k))) + (tagged tag-keyword (keyword->bytes k)) (integer? k) (tagged tag-long (long->bytes (long k))) @@ -147,6 +177,21 @@ (defn- ^String utf8-body [^bytes ba] (String. ba 1 (dec (alength ba)) StandardCharsets/UTF_8)) +(defn- decode-keyword + "Inverse of `keyword->bytes`. `ba[0]` is the type tag, `ba[1]` is the + namespace-presence flag, the remainder is the part(s)." + [^bytes ba] + (let [flag (bit-and (int (aget ba 1)) 0xff)] + (if (= flag kw-no-ns) + ;; Use the 2-arg form with a nil namespace so a name containing \"/\" + ;; is not re-parsed into a namespace (which would corrupt the key). + (keyword nil (String. ba 2 (- (alength ba) 2) StandardCharsets/UTF_8)) + (let [sep (loop [i 2] + (if (zero? (aget ba i)) i (recur (inc i)))) + ns (String. ba 2 (- sep 2) StandardCharsets/UTF_8) + nm (String. 
ba (inc sep) (- (alength ba) (inc sep)) StandardCharsets/UTF_8)] + (keyword ns nm))))) + (def key-comparator "A `java.util.Comparator` consistent with the engine's natural ordering: compares two keys by `Arrays.compareUnsigned` over their encoded bytes. Use @@ -162,7 +207,7 @@ (let [tag (bit-and (int (aget ba 0)) 0xff)] (condp = tag tag-string (utf8-body ba) - tag-keyword (keyword (utf8-body ba)) + tag-keyword (decode-keyword ba) tag-long (bytes->long ba 1) tag-double (bytes->double ba 1) tag-instant (bytes->instant ba 1) diff --git a/test/xitdb/cursor_test.clj b/test/xitdb/cursor_test.clj index acbe375..7677819 100644 --- a/test/xitdb/cursor_test.clj +++ b/test/xitdb/cursor_test.clj @@ -28,3 +28,25 @@ (testing "Correctly handles invalid cursor path" (is (thrown? IndexOutOfBoundsException @(xdb/xdb-cursor db [:foo :bar 999]))))))) + +(deftest cursor-into-sorted-map + (with-open [db (xdb/xit-db :memory)] + (reset! db {:idx (sorted-map 1 {:name "a"} 2 {:name "b"})}) + (let [c (xdb/xdb-cursor db [:idx 1 :name])] + (testing "read through a sorted-map key" + (is (= "a" @c))) + (testing "reset! through a sorted-map key writes back to the db" + (reset! c "A") + (is (= "A" @c)) + (is (= "A" (get-in (xdb/materialize @db) [:idx 1 :name]))) + (is (= "b" (get-in (xdb/materialize @db) [:idx 2 :name]))))))) + +(deftest cursor-into-sorted-set + (with-open [db (xdb/xit-db :memory)] + (reset! db {:tags (sorted-set "a" "b" "c")}) + (let [c (xdb/xdb-cursor db [:tags])] + (testing "read a sorted set through a cursor" + (is (= ["a" "b" "c"] (seq (xdb/materialize @c))))) + (testing "swap! mutates the sorted set at the cursor" + (swap! 
c conj "d") + (is (= ["a" "b" "c" "d"] (seq (xdb/materialize @c)))))))) diff --git a/test/xitdb/sorted_key_test.clj b/test/xitdb/sorted_key_test.clj index 44e8413..ea1caf2 100644 --- a/test/xitdb/sorted_key_test.clj +++ b/test/xitdb/sorted_key_test.clj @@ -19,6 +19,31 @@ (doseq [k [:a :foo/bar :a-much-longer-keyword]] (is (= k (sk/decode-key (sk/encode-key k))))))) +(deftest keyword-order-matches-clojure + (testing "byte order matches Clojure's default keyword comparator: + non-namespaced keywords sort before namespaced ones" + (doseq [[a b] [[:a :aa] [:aa :b] + ;; every non-namespaced keyword sorts before any namespaced + [:b :a/a] [:zzz :a/a] + ;; among namespaced: by namespace then name + [:a/a :a/b] [:a/x :ab/a] [:a/b :b/a]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b)) + ;; and consistent with clojure.core/compare on keywords + (is (= (Integer/signum (compare a b)) + (Integer/signum (cmp-unsigned (sk/encode-key a) (sk/encode-key b)))) + (str "order-agrees " a " " b))))) + +(deftest keyword-namespace-no-collision + (testing "(keyword nil \"a/b\") and :a/b are distinct keys that both round-trip" + (let [k1 (keyword nil "a/b") ;; no namespace, name contains a slash + k2 :a/b] ;; namespace \"a\", name \"b\" + (is (not= k1 k2)) + (is (= k1 (sk/decode-key (sk/encode-key k1)))) + (is (= k2 (sk/decode-key (sk/encode-key k2)))) + (is (not (java.util.Arrays/equals (sk/encode-key k1) (sk/encode-key k2))) + "encodings must differ so the keys do not collide on disk")))) + (deftest string-order-preserved (testing "byte order matches code-point order for strings" (doseq [[a b] [["a" "b"] ["a" "ab"] ["abc" "abd"] ["" "a"] ["k0009" "k0010"]]] diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index c0b6a06..b5491b2 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -45,6 +45,35 @@ (is (every? keyword? 
(map key (seq @db)))) (is (= 1 (get @db :apple)))))) +(deftest namespaced-keyword-keys-match-clojure-order + (with-open [db (xdb/xit-db :memory)] + (let [oracle (sorted-map :b 2 :a/a 3 :a 1 :aa 4)] + (reset! db oracle) + (testing "namespaced keywords sort like Clojure's default comparator + (non-namespaced before namespaced), not as flattened strings" + (is (= (keys oracle) (map key (seq @db)))) + (is (= [:a :aa :b :a/a] (map key (seq @db))))) + (testing "subseq agrees with the Clojure oracle" + (is (= (vec (subseq oracle >= :aa)) + (vec (subseq @db >= :aa))))) + (testing "values round-trip under namespaced keys" + (is (= 3 (get @db :a/a))))))) + +(deftest heterogeneous-keys-materialize-and-print + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map)) + (swap! db assoc 1 :one) + (swap! db assoc "x" :ex) + (testing "seq works with mixed key types" + (is (= [1 "x"] (map key (seq @db))))) + (testing "materialize does not throw on mixed key types" + (let [m (tu/materialize @db)] + (is (sorted? m)) + (is (= [1 "x"] (keys m))) + (is (= {1 :one "x" :ex} (into {} m))))) + (testing "pr-str does not throw on mixed key types" + (is (string? (pr-str @db)))))) + (deftest materialize-returns-plain-sorted-map (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-map "b" 2 "a" 1)) diff --git a/test/xitdb/sorted_pagination_test.clj b/test/xitdb/sorted_pagination_test.clj index 5d050e6..b9d7ee9 100644 --- a/test/xitdb/sorted_pagination_test.clj +++ b/test/xitdb/sorted_pagination_test.clj @@ -77,6 +77,19 @@ (testing "from-index streams from a rank to the end" (is (= (subvec ov 17 20) (xsorted/from-index s 17)))))))) +(deftest negative-offset-rejected + (testing "from-index/page reject a negative rank eagerly, not on realisation" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 10) (range)))) + (let [m @db] + (is (thrown? IllegalArgumentException (xsorted/from-index m -1))) + (is (thrown? 
IllegalArgumentException (xsorted/page m -1 5))))) + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 10))) + (let [s @db] + (is (thrown? IllegalArgumentException (xsorted/from-index s -1))) + (is (thrown? IllegalArgumentException (xsorted/page s -1 5))))))) + (deftest pagination-is-lazy (testing "from-index returns a lazy seq and does not realise a large collection" (with-open [db (xdb/xit-db :memory)] diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj index 5d71e47..0a5a3a3 100644 --- a/test/xitdb/sorted_set_test.clj +++ b/test/xitdb/sorted_set_test.clj @@ -52,6 +52,20 @@ (is (= [1 2 3] (seq s))) (is (= #{1 2 3} s))))) +(deftest heterogeneous-members-materialize-and-print + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set)) + (swap! db conj 1) + (swap! db conj "x") + (testing "seq works with mixed member types" + (is (= [1 "x"] (seq @db)))) + (testing "materialize does not throw on mixed member types" + (let [s (tu/materialize @db)] + (is (sorted? s)) + (is (= [1 "x"] (seq s))))) + (testing "pr-str does not throw on mixed member types" + (is (string? (pr-str @db)))))) + (deftest read-only-ops-return-plain-collections (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-set 1 2)) From c88e6804e6bc1ce70f9f568ea058afc21ef61d3e Mon Sep 17 00:00:00 2001 From: Florin Braghis Date: Wed, 24 Jun 2026 18:38:47 +0200 Subject: [PATCH 09/15] Address follow-up review: writable materialized sorted colls + clear sorted-set member error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - default-sorted-comparator? now also accepts sorted-key/key-comparator (the engine's own byte ordering), so a materialized sorted map/set — which materialize stamps with that comparator — can be written back into a db instead of being rejected as a custom comparator. 
- write-cursor-for-key now handles Tag/SORTED_SET with a clear, specific error: sorted-set members are immutable B-tree keys (the engine only exposes a writeable value slot, unused by sets), so there is no in-place member cursor. The message points to conj/disj on the set, replacing the confusing generic "Cannot get cursor ... tag ':sorted-set'". Co-Authored-By: Claude Opus 4.8 --- src/xitdb/util/conversion.clj | 23 +++++++++++++++++++---- test/xitdb/cursor_test.clj | 9 +++++++++ test/xitdb/sorted_map_test.clj | 20 ++++++++++++++++++++ test/xitdb/sorted_set_test.clj | 10 ++++++++++ 4 files changed, 58 insertions(+), 4 deletions(-) diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj index 1c1acce..3b345c0 100644 --- a/src/xitdb/util/conversion.clj +++ b/src/xitdb/util/conversion.clj @@ -141,11 +141,15 @@ (declare ^WriteCursor sorted-set->WriteSortedSetCursor!) (defn default-sorted-comparator? - "True if `coll` (a PersistentTreeMap or PersistentTreeSet) uses Clojure's - natural ordering (no custom comparator). Custom comparators cannot be honoured - by the engine's fixed byte ordering." + "True if `coll` (a PersistentTreeMap or PersistentTreeSet) is ordered in a way + the engine can honour with its fixed unsigned byte ordering: either Clojure's + natural ordering (no custom comparator) or `sorted-key/key-comparator`, which + *is* that byte ordering and is what `materialize` stamps onto the collections + it rebuilds. Any other custom comparator is rejected." [coll] - (identical? clojure.lang.RT/DEFAULT_COMPARATOR (.comparator ^clojure.lang.Sorted coll))) + (let [cmp (.comparator ^clojure.lang.Sorted coll)] + (or (identical? clojure.lang.RT/DEFAULT_COMPARATOR cmp) + (identical? sorted-key/key-comparator cmp)))) (defn ^Slot v->slot! "Converts a value to a XitDB slot. @@ -401,6 +405,17 @@ (= value-tag Tag/HASH_SET) (set-write-cursor (WriteHashSet. 
cursor) current-key) + ;; A sorted-set member is stored as an immutable B-tree key (the engine + ;; only exposes a writeable value slot, which a set never uses), so there + ;; is no in-place "member cursor" to hand back the way a hash set has. + ;; Mutating membership goes through conj/disj on the set itself. + (= value-tag Tag/SORTED_SET) + (throw (IllegalArgumentException. + (format (str "Cannot get a write cursor to sorted-set member '%s': " + "sorted-set members are immutable keys. Use conj/disj " + "on the sorted set itself to change membership.") + current-key))) + (= value-tag Tag/COUNTED_HASH_SET) (set-write-cursor (WriteCountedHashSet. cursor) current-key) diff --git a/test/xitdb/cursor_test.clj b/test/xitdb/cursor_test.clj index 7677819..9473093 100644 --- a/test/xitdb/cursor_test.clj +++ b/test/xitdb/cursor_test.clj @@ -50,3 +50,12 @@ (testing "swap! mutates the sorted set at the cursor" (swap! c conj "d") (is (= ["a" "b" "c" "d"] (seq (xdb/materialize @c)))))))) + +(deftest cursor-into-sorted-set-member-is-rejected + (with-open [db (xdb/xit-db :memory)] + (reset! db {:tags (sorted-set "a" "b" "c")}) + (testing "writing into a sorted-set member throws a clear, specific error + (members are immutable keys; use conj/disj on the set itself)" + (let [c (xdb/xdb-cursor db [:tags "a"]) + ex (is (thrown? IllegalArgumentException (reset! c "z")))] + (is (re-find #"sorted-set member" (.getMessage ex))))))) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index b5491b2..a60ae2a 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -74,6 +74,26 @@ (testing "pr-str does not throw on mixed key types" (is (string? (pr-str @db)))))) +(deftest materialized-sorted-map-can-be-written-back + (testing "a materialized sorted map (which carries key-comparator) can be + stored into another db without being rejected as a custom comparator" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! 
db1 (sorted-map "b" 2 "a" 1)) + (let [m (tu/materialize @db1)] + (reset! db2 m) + (is (= ["a" "b"] (map key (seq @db2)))) + (is (= 1 (get @db2 "a")))))) + (testing "round-trips through materialize even with heterogeneous keys" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! db1 (sorted-map)) + (swap! db1 assoc 1 :one) + (swap! db1 assoc "x" :ex) + (let [m (tu/materialize @db1)] + (reset! db2 m) + (is (= [1 "x"] (map key (seq @db2)))))))) + (deftest materialize-returns-plain-sorted-map (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-map "b" 2 "a" 1)) diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj index 0a5a3a3..91d1c90 100644 --- a/test/xitdb/sorted_set_test.clj +++ b/test/xitdb/sorted_set_test.clj @@ -66,6 +66,16 @@ (testing "pr-str does not throw on mixed member types" (is (string? (pr-str @db)))))) +(deftest materialized-sorted-set-can-be-written-back + (testing "a materialized sorted set (which carries key-comparator) can be + stored into another db without being rejected as a custom comparator" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! db1 (sorted-set 3 1 2)) + (let [s (tu/materialize @db1)] + (reset! db2 s) + (is (= [1 2 3] (seq @db2))))))) + (deftest read-only-ops-return-plain-collections (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-set 1 2)) From fe35c274d05e0569a7e55151ed1bc8155ce58fba Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 26 Jun 2026 12:15:50 +0000 Subject: [PATCH 10/15] Address sorted-collection review: unified numeric ordering + 5 fixes Implements the six items from the code review, developed test-first (TDD). 1. Numeric keys now share one order-preserving space so longs and doubles interleave by value (1 < 1.5 < 2) instead of being split into two adjacent type-tagged ranges. Previously a double bound on a long-keyed map (e.g. 
`(subseq m >= 1.5)`) returned nothing, and storing a sorted-map with mixed numeric keys silently reordered it on disk. A single `tag-number` now carries a double-precision sort key + subtype + exact bytes (reversible, type-preserving). Same-type ordering is exact; the only residual imprecision is the relative order of a long and a double that differ only beyond 2^53, documented in `number-body`. NOTE: changes the on-disk encoding of numeric sorted keys (unreleased feature).

2. Remove unused `xitdb.util.conversion` require from `xitdb.sorted-map`.

3. `smap-empty!` / `sset-empty!` now mirror `operations/map-empty!` / `operations/set-empty!` and write a fresh empty SORTED_MAP/SET slot. The previous discarded-constructor approach silently degraded the collection to a hash map/set after `(swap! db empty)`, so keys reinserted afterwards lost sorted semantics. Now verified by tests.

4. `XITDBWriteSortedMap` / `XITDBWriteSortedSet` implement Sorted/Indexed/Reversible, so `nth`/`subseq`/`rseq`/`comparator` work on the value handed to `swap!` (the write types are Read* subclasses, so ops delegate directly).

5. Integer keys outside the signed 64-bit long range now fail fast with a clear, key-specific message instead of a raw numeric-cast error.

6. README: new "Sorted collections" section (ordered ops + `xitdb.sorted` rank/page/from-index) and sorted maps/sets added to Supported Data Types.

Full suite: 164 tests, 1050 assertions, 0 failures.
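To illustrate the unified numeric key layout from item 1 ([tag][8-byte sort key][1-byte subtype][8-byte exact value]), here is a minimal, hypothetical Java sketch. The class and method names are invented, and the sign-flip encodings are assumed to behave like `double->bytes` / `long->bytes`; this is not the code in `sorted_key.clj`. It shows that `Arrays.compareUnsigned` over the encoded bytes interleaves longs and doubles by value and that the subtype byte breaks ties.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch (not xitdb's implementation) of a unified numeric key:
// [tag 0x10][8-byte sort key][1-byte subtype][8-byte exact value] = 18 bytes.
public final class NumberKeySketch {
    static final byte TAG_NUMBER = 0x10;
    static final byte NUM_LONG = 0x00;   // sorts before NUM_DOUBLE on a tie
    static final byte NUM_DOUBLE = 0x01;

    // Order-preserving double bits: flip every bit of negatives and only the
    // sign bit of non-negatives, so unsigned comparison matches numeric order.
    public static long orderedDoubleBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
    }

    // Order-preserving long bits: flip the sign bit so negatives sort first.
    public static long orderedLongBits(long n) {
        return n ^ Long.MIN_VALUE;
    }

    public static byte[] encodeLong(long n) {
        return ByteBuffer.allocate(18)           // ByteBuffer is big-endian by default
            .put(TAG_NUMBER)
            .putLong(orderedDoubleBits((double) n)) // shared double-precision sort key
            .put(NUM_LONG)
            .putLong(orderedLongBits(n))            // exact value, reversible
            .array();
    }

    public static byte[] encodeDouble(double d) {
        if (Double.isNaN(d)) throw new IllegalArgumentException("NaN has no ordering");
        long key = orderedDoubleBits(d);
        return ByteBuffer.allocate(18)
            .put(TAG_NUMBER).putLong(key).put(NUM_DOUBLE).putLong(key)
            .array();
    }

    // Subtype at offset 9 (after tag + sort key), exact value at offset 10.
    public static Number decode(byte[] ba) {
        long exact = ByteBuffer.wrap(ba).getLong(10);
        if (ba[9] == NUM_LONG) return exact ^ Long.MIN_VALUE;
        long bits = exact < 0 ? exact ^ Long.MIN_VALUE : ~exact; // undo the flip
        return Double.longBitsToDouble(bits);
    }

    public static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] one = encodeLong(1), oneHalf = encodeDouble(1.5), two = encodeLong(2);
        System.out.println(compare(one, oneHalf) < 0 && compare(oneHalf, two) < 0); // true
        System.out.println(compare(one, encodeDouble(1.0)) < 0);                  // true
        System.out.println(decode(encodeLong(-7)));                               // -7
        System.out.println(decode(encodeDouble(-2.5)));                           // -2.5
    }
}
```

Because rounding a long to a double is monotonic, two longs never invert on the sort key. Longs above 2^53 that round to the same double are separated by the exact bytes. That is why same-type ordering stays exact, and only a long/double pair that differs beyond double precision can order by the tie-break.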
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH --- README.md | 60 ++++++++++++++++++++++++++ src/xitdb/sorted_map.clj | 41 ++++++++++++++++-- src/xitdb/sorted_set.clj | 34 +++++++++++++++ src/xitdb/util/sorted_key.clj | 63 ++++++++++++++++++++++++---- src/xitdb/util/sorted_operations.clj | 16 +++---- test/xitdb/sorted_key_test.clj | 43 +++++++++++++++++++ test/xitdb/sorted_map_test.clj | 55 ++++++++++++++++++++++++ test/xitdb/sorted_set_test.clj | 32 +++++++++++++- 8 files changed, 325 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index b5734e4..34cc3f8 100644 --- a/README.md +++ b/README.md @@ -108,6 +108,64 @@ Here's a taste of how your queries could look like: ``` +## Sorted collections + +In addition to (unordered) hash maps and sets, xitdb supports **on-disk sorted +maps and sets**, backed by the engine's rank-augmented B-tree. Store a Clojure +`sorted-map` / `sorted-set` and it is persisted as a sorted collection that keeps +its keys/members ordered on disk: + +```clojure +(reset! db (sorted-map "banana" 2 "apple" 1 "cherry" 3)) + +@db +;; => #XITDBSortedMap{"apple" 1, "banana" 2, "cherry" 3} + +(swap! db assoc "date" 4) ;; inserted in order, not appended +``` + +Reading back yields an `XITDBSortedMap` / `XITDBSortedSet` that implements +Clojure's ordered interfaces, so `seq`, `rseq`, `nth`, `subseq` and `rsubseq` +all work and read only what they touch from disk: + +```clojure +(reset! db (into (sorted-map) (map vector (range 0 100 2) (range)))) + +(nth @db 10) ;; => [20 10] ;; O(log n), no full scan +(subseq @db >= 90) ;; => ([90 45] [92 46] [94 47] [96 48] [98 49]) +(rseq @db) ;; => lazy descending seq of entries +``` + +Supported key/member types are strings, keywords, longs, doubles, `Instant` +and `Date`. 
They are stored with an order-preserving codec, so they iterate in +natural order — numeric for numbers, chronological for temporals, lexicographic +(by code point) for strings. Longs and doubles share a single numeric ordering, +so they interleave by value (e.g. `1 < 1.5 < 2`). Only the default ordering is +supported: `sorted-map-by` / `sorted-set-by` with a custom comparator is +rejected. + +### Ranking & pagination + +The `xitdb.sorted` namespace exposes the B-tree's O(log n) superpowers, which +are handy for building and paging on-disk secondary indexes: + +- `(rank coll k)` — number of entries strictly less than `k` (i.e. the index of + `k`, or its would-be insertion index if absent). +- `(from-index coll n)` — lazy ordered seq starting at rank `n`. +- `(page coll offset limit)` — lazy ordered page `[offset, offset+limit)`. + +```clojure +(require '[xitdb.sorted :as xsorted]) + +;; build a timestamp -> id index; events can arrive out of order +(reset! db (sorted-map)) +(doseq [e events] + (swap! db assoc (:ts e) (:id e))) + +(xsorted/rank @db some-ts) ;; chronological position of some-ts +(xsorted/page @db 100 20) ;; the 20 entries at ranks [100, 120) +``` + ## History Since the database is immutable, all previous values are accessed by reading @@ -199,8 +257,10 @@ The Clojure wrapper adds: ### Supported Data Types - **Maps** - Hash maps with efficient key-value access +- **Sorted maps** - On-disk B-tree maps with ordered iteration, `subseq`/`nth`/`rank` - **Vectors** - Array lists with indexed access - **Sets** - Hash sets with unique element storage +- **Sorted sets** - On-disk B-tree sets with ordered iteration and ranking - **Lists** - Linked lists and RRB tree-based linked array lists - **Primitives** - Numbers, strings, keywords, booleans, dates. diff --git a/src/xitdb/sorted_map.clj b/src/xitdb/sorted_map.clj index b27e338..9f09d37 100644 --- a/src/xitdb/sorted_map.clj +++ b/src/xitdb/sorted_map.clj @@ -5,11 +5,11 @@ used inside a transaction. 
Ordering is by the engine's unsigned byte comparison over order-preserving encoded keys (see `xitdb.util.sorted-key`). - The `clojure.lang.Sorted`/`Indexed`/`Reversible` protocols (subseq, nth, rseq) - are added in a later slice; this slice provides ascending ordered `seq` only." + Both views implement `clojure.lang.Sorted`/`Indexed`/`Reversible` (subseq, + nth, rseq) on top of the rank-augmented B-tree, in addition to ascending + ordered `seq`." (:require [xitdb.common :as common] - [xitdb.util.conversion :as conversion] [xitdb.util.sorted-key :as sorted-key] [xitdb.util.sorted-operations :as sorted-ops]) (:import @@ -217,6 +217,41 @@ (seq [_] (smap-seq wsm)) + ;; The same ordered read machinery as XITDBSortedMap, reading the + ;; in-transaction (uncommitted) state. `WriteSortedMap` is a `ReadSortedMap` + ;; subclass, so the rank/index-based ops apply directly to `wsm`. + clojure.lang.Sorted + (comparator [_] + sorted-key/key-comparator) + + (entryKey [_ entry] + (key entry)) + + (seq [_ ascending?] + (if ascending? + (smap-seq wsm) + (sorted-ops/smap-rseq wsm common/-read-from-cursor))) + + (seqFrom [_ key ascending?] + (if ascending? + (sorted-ops/smap-seq-from wsm common/-read-from-cursor key) + (sorted-ops/smap-rseq wsm common/-read-from-cursor + (descending-start-index wsm key)))) + + clojure.lang.Indexed + (nth [this i] + (let [e (.nth this i ::not-found)] + (if (identical? 
e ::not-found) + (throw (IndexOutOfBoundsException.)) + e))) + + (nth [_ i not-found] + (sorted-ops/smap-nth wsm common/-read-from-cursor i not-found)) + + clojure.lang.Reversible + (rseq [_] + (sorted-ops/smap-rseq wsm common/-read-from-cursor)) + clojure.core.protocols/IKVReduce (kv-reduce [this f init] (sorted-ops/smap-kv-reduce wsm common/-read-from-cursor f init)) diff --git a/src/xitdb/sorted_set.clj b/src/xitdb/sorted_set.clj index e69733a..0134d99 100644 --- a/src/xitdb/sorted_set.clj +++ b/src/xitdb/sorted_set.clj @@ -174,6 +174,40 @@ (seq [_] (sorted-ops/sset-seq wss)) + ;; The same ordered read machinery as XITDBSortedSet, reading the + ;; in-transaction (uncommitted) state. `WriteSortedSet` is a `ReadSortedSet` + ;; subclass, so the rank/index-based ops apply directly to `wss`. + clojure.lang.Sorted + (comparator [_] + sorted-key/key-comparator) + + (entryKey [_ entry] + entry) + + (seq [_ ascending?] + (if ascending? + (sorted-ops/sset-seq wss) + (sorted-ops/sset-rseq wss))) + + (seqFrom [_ member ascending?] + (if ascending? + (sorted-ops/sset-seq-from wss member) + (sorted-ops/sset-rseq wss (descending-start-index wss member)))) + + clojure.lang.Indexed + (nth [this i] + (let [e (.nth this i ::not-found)] + (if (identical? e ::not-found) + (throw (IndexOutOfBoundsException.)) + e))) + + (nth [_ i not-found] + (sorted-ops/sset-nth wss i not-found)) + + clojure.lang.Reversible + (rseq [_] + (sorted-ops/sset-rseq wss)) + clojure.lang.ILookup (valAt [this k] (.valAt this k nil)) diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj index 320a44a..14be05d 100644 --- a/src/xitdb/util/sorted_key.clj +++ b/src/xitdb/util/sorted_key.clj @@ -29,15 +29,22 @@ ;; Type tags. Ordering of the tag values defines the cross-type order; they are ;; intentionally sparse to leave room for additional types in later slices. 
;; Current cross-type order (by ascending tag byte): -;; long (0x10) < double (0x11) < instant (0x18) < date (0x19) -;; < string (0x20) < keyword (0x21) -(def ^:const tag-long (int 0x10)) -(def ^:const tag-double (int 0x11)) +;; number (0x10) < instant (0x18) < date (0x19) < string (0x20) < keyword (0x21) +;; Longs and doubles share the single `tag-number` so they interleave by numeric +;; value (see `number-body`) instead of being split into two adjacent ranges. +(def ^:const tag-number (int 0x10)) (def ^:const tag-instant (int 0x18)) (def ^:const tag-date (int 0x19)) (def ^:const tag-string (int 0x20)) (def ^:const tag-keyword (int 0x21)) +;; Numeric subtype byte (the body byte right after the 8-byte sort key). +;; `num-long` sorts before `num-double`, so a long and a double of equal numeric +;; value order deterministically (long first); it is only ever consulted as a +;; tie-breaker between values that share a sort key. +(def ^:const num-long (int 0x00)) +(def ^:const num-double (int 0x01)) + (defn- ^bytes utf8 [^String s] (.getBytes s StandardCharsets/UTF_8)) @@ -81,6 +88,37 @@ (bit-not flipped))] (Double/longBitsToDouble bits))) +(defn- ^bytes number-body + "Order-preserving, reversible body shared by long and double keys, so the two + types interleave by numeric value in one space. + + Layout (17 bytes, after the type tag): + [8-byte sort key][1-byte subtype][8-byte exact value] + + The sort key is the value rendered through `double->bytes`, so unsigned byte + comparison of two sort keys matches numeric comparison to double precision + (~53 significant bits). The `subtype` byte (long < double) and the exact 8 + bytes break ties between values that share a sort key, keeping the order total + and making the original long/double recoverable on decode. + + Caveat: because the sort key has double precision, two values that differ only + beyond 2^53 *and* have different types (one long, one double) can order by the + tie-break rather than strictly by value. 
Same-type ordering is always exact." + [subtype ^bytes sortkey ^bytes exact] + (let [out (ByteArrayOutputStream. 17)] + (.write out sortkey 0 8) + (.write out (int subtype)) + (.write out exact 0 8) + (.toByteArray out))) + +(defn- decode-number + "Inverse of `number-body`. The subtype byte is at offset 9 (1 type tag + 8 + sort-key bytes) and the exact value occupies the 8 bytes at offset 10." + [^bytes ba] + (if (= (bit-and (int (aget ba 9)) 0xff) num-long) + (bytes->long ba 10) + (bytes->double ba 10))) + (defn- ^bytes instant->bytes "12 bytes: epoch-second (8-byte big-endian, sign-flipped so negative epochs sort first) followed by nano-of-second (4-byte big-endian; always 0..1e9-1, so @@ -155,14 +193,24 @@ (tagged tag-keyword (keyword->bytes k)) (integer? k) - (tagged tag-long (long->bytes (long k))) + (let [n (try + (long k) + (catch IllegalArgumentException _ + (throw (IllegalArgumentException. + (str "Integer sorted-map key out of range: " k + ". Sorted collections encode integer keys as signed " + "64-bit longs, so keys must be within " + "[" Long/MIN_VALUE ", " Long/MAX_VALUE "].")))))] + (tagged tag-number + (number-body num-long (double->bytes (double n)) (long->bytes n)))) (float? k) (let [d (double k)] (when (Double/isNaN d) (throw (IllegalArgumentException. "NaN is not a valid sorted-map key (ordering undefined)"))) - (tagged tag-double (double->bytes d))) + (tagged tag-number + (number-body num-double (double->bytes d) (double->bytes d)))) (instance? Instant k) (tagged tag-instant (instant->bytes k)) @@ -208,8 +256,7 @@ (condp = tag tag-string (utf8-body ba) tag-keyword (decode-keyword ba) - tag-long (bytes->long ba 1) - tag-double (bytes->double ba 1) + tag-number (decode-number ba) tag-instant (bytes->instant ba 1) tag-date (bytes->date ba 1) (throw (IllegalArgumentException. 
(str "Unknown sorted-key tag: " tag)))))) diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj index 7084f5d..466b7db 100644 --- a/src/xitdb/util/sorted_operations.clj +++ b/src/xitdb/util/sorted_operations.clj @@ -38,12 +38,13 @@ wsm) (defn smap-empty! - "Replaces contents with an empty sorted map, in place." + "Replaces contents with an empty sorted map, in place. Returns the + WriteSortedMap. Mirrors `operations/map-empty!`: writes a fresh empty + SORTED_MAP slot at the cursor so the value stays a sorted map (the + `key-comparator` makes `v->slot!` accept it as a default-ordered tree)." [^WriteSortedMap wsm] (let [^WriteCursor cursor (.-cursor wsm)] - (.write cursor nil) - ;; re-init an empty sorted map at the same cursor the wsm holds - (WriteSortedMap. cursor)) + (.write cursor (conversion/v->slot! cursor (sorted-map-by sorted-key/key-comparator)))) wsm) (defn smap-seq @@ -155,11 +156,12 @@ wss) (defn sset-empty! - "Replaces contents with an empty sorted set, in place." + "Replaces contents with an empty sorted set, in place. Returns the + WriteSortedSet. Mirrors `operations/set-empty!`: writes a fresh empty + SORTED_SET slot at the cursor so the value stays a sorted set." [^WriteSortedSet wss] (let [^WriteCursor cursor (.-cursor wss)] - (.write cursor nil) - (WriteSortedSet. cursor)) + (.write cursor (conversion/v->slot! 
cursor (sorted-set-by sorted-key/key-comparator)))) wss) (defn- member-from-cursor diff --git a/test/xitdb/sorted_key_test.clj b/test/xitdb/sorted_key_test.clj index ea1caf2..60bfebe 100644 --- a/test/xitdb/sorted_key_test.clj +++ b/test/xitdb/sorted_key_test.clj @@ -101,6 +101,37 @@ (= (Integer/signum (compare a b)) (Integer/signum (cmp-unsigned (sk/encode-key a) (sk/encode-key b))))) +(deftest cross-type-numeric-order + (testing "longs and doubles interleave by numeric value, not by type tag" + (doseq [[a b] [[1 1.5] [1.5 2] [1 2.0] [2.0 3] + [-1 -0.5] [-1.5 -1] [-1.0 0] [0 0.5] + ;; magnitudes that exercise the gap between the two old tags + [3 3.5] [-3.5 -3] + ;; a long below and a double above zero and vice-versa + [-2 1.0] [-1.5 2]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b)) + (is (pos? (cmp-unsigned (sk/encode-key b) (sk/encode-key a))) + (str b " > " a))))) + +(deftest prop-cross-type-numeric-order + (testing "for 4000 random long/double pairs of distinct value, byte order + agrees in sign with clojure.core/compare (the numeric order)" + (let [r (java.util.Random. 77) + rand-num (fn [] (if (.nextBoolean r) + (long (.nextInt r 2000000)) ;; small long + (* (.nextDouble r) 1.0e6 + (if (.nextBoolean r) 1.0 -1.0))))] ;; small double + (is (every? + (fn [_] + (let [a (rand-num) b (rand-num) + c (compare a b)] + (or (zero? c) ;; numerically equal (e.g. 1 vs 1.0): order is unspecified + (= (Integer/signum c) + (Integer/signum (cmp-unsigned (sk/encode-key a) + (sk/encode-key b))))))) + (range 4000)))))) + (deftest prop-long-order (testing "for 2000 random long pairs, byte order == numeric order" (let [r (java.util.Random. 42)] @@ -154,6 +185,18 @@ (is (thrown? IllegalArgumentException (sk/encode-key true))) (is (thrown? 
IllegalArgumentException (sk/encode-key Double/NaN)))) +(deftest out-of-range-integer-key-throws-clearly + (testing "an integer key past the 64-bit long range is rejected eagerly with a + clear, key-specific message rather than a raw numeric-cast error" + (doseq [big [(.pow (java.math.BigInteger. "2") 100) + (.negate (.pow (java.math.BigInteger. "2") 100)) + (bigint "99999999999999999999999")]] + (let [ex (is (thrown? IllegalArgumentException (sk/encode-key big)))] + (is (re-find #"(?i)key" (.getMessage ex)) + "message should mention it is about a key") + (is (re-find #"(?i)long" (.getMessage ex)) + "message should mention the long range"))))) + (deftest instant-roundtrip (testing "instants round-trip to Instant, preserving sub-second precision" (doseq [i [(Instant/ofEpochSecond 0) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj index a60ae2a..77a0fa7 100644 --- a/test/xitdb/sorted_map_test.clj +++ b/test/xitdb/sorted_map_test.clj @@ -164,6 +164,20 @@ (swap! db assoc "c" 3) (is (= ["c"] (map key (seq @db)))))) +(deftest empty-then-reassoc-stays-a-sorted-map + (testing "after (swap! db empty) the value is still a sorted map, so keys + re-inserted afterwards keep sorted (not hash-map) semantics" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (swap! db empty) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (sorted? @db)) + (is (= 0 (count @db))) + (swap! db assoc "c" 3 "a" 1) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (sorted? @db)) + (is (= ["a" "c"] (map key (seq @db))))))) + (deftest print-method-ordered (with-open [db (xdb/xit-db :memory)] (reset! db (sorted-map "b" 2 "a" 1)) @@ -266,6 +280,28 @@ (reset! 
db (sorted-map 3.5 :a -1.5 :b 0.0 :c 1.0e308 :d -1.0e308 :e)) (is (= [-1.0e308 -1.5 0.0 3.5 1.0e308] (map key (seq @db))))))) +(deftest mixed-long-double-keys-interleave-numerically + (testing "long and double keys sort by numeric value and round-trip with type" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) + (map vector [1 0.5 2 1.5 3 -1.5] (range)))) + (is (= [-1.5 0.5 1 1.5 2 3] (map key (seq @db)))) + (is (some #(instance? Double %) (map key (seq @db)))) + (is (some integer? (map key (seq @db)))))) + (testing "matches the in-memory Clojure sorted-map oracle for mixed keys" + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) + (map vector [10 2.5 7 0.25 -3 -3.5 100.0 4] (range)))] + (reset! db oracle) + (is (= (keys oracle) (map key (seq @db))))))) + (testing "a double bound queries a long-keyed map at the right place (subseq)" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector [1 2 3 4 5] (range)))) + (let [m @db] + (is (= [2 3 4 5] (map key (subseq m >= 1.5)))) + (is (= [1 2 3] (map key (subseq m <= 3.5)))) + (is (= [3 4] (map key (subseq m > 2.5 < 4.5)))))))) + (deftest temporal-keys-iterate-chronologically (testing "Instant keys iterate chronologically and round-trip to Instant" (with-open [db (xdb/xit-db :memory)] @@ -282,6 +318,25 @@ (is (= [d0 d1 d2] (map key (seq @db)))) (is (every? #(instance? Date %) (map key (seq @db)))))))) +(deftest write-view-supports-sorted-indexed-reversible + (testing "the writeable sorted map handed to swap! supports nth/subseq/rseq + and exposes the same comparator as the read view" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 5) (range 5)))) + (swap! db + (fn [m] + (is (= (clojure.lang.MapEntry. 
0 0) (nth m 0))) + (is (= 2 (key (nth m 2)))) + (is (= ::nf (nth m 99 ::nf))) + (is (= [2 3 4] (map key (subseq m >= 2)))) + (is (= [0 1] (map key (subseq m < 2)))) + (is (= [4 3 2 1 0] (map key (rseq m)))) + (is (instance? java.util.Comparator + (.comparator ^clojure.lang.Sorted m))) + m)) + (testing "the data is unchanged after read-only queries in the txn" + (is (= [0 1 2 3 4] (map key (seq @db)))))))) + (deftest tracer-bullet-ordered-seq (testing "a persisted sorted-map is stored as a sorted map and seqs in key order" (with-open [db (xdb/xit-db :memory)] diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj index 91d1c90..a904a14 100644 --- a/test/xitdb/sorted_set_test.clj +++ b/test/xitdb/sorted_set_test.clj @@ -215,6 +215,25 @@ (is (clojure.string/starts-with? s "#XITDBSortedSet")) (is (clojure.string/includes? s "1 2 3"))))) +(deftest write-view-supports-sorted-indexed-reversible + (testing "the writeable sorted set handed to swap! supports nth/subseq/rseq + and exposes the same comparator as the read view" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 5))) + (swap! db + (fn [s] + (is (= 0 (nth s 0))) + (is (= 2 (nth s 2))) + (is (= ::nf (nth s 99 ::nf))) + (is (= [2 3 4] (subseq s >= 2))) + (is (= [0 1] (subseq s < 2))) + (is (= [4 3 2 1 0] (rseq s))) + (is (instance? java.util.Comparator + (.comparator ^clojure.lang.Sorted s))) + s)) + (testing "the data is unchanged after read-only queries in the txn" + (is (= [0 1 2 3 4] (seq @db))))))) + (deftest nesting-and-round-trip (testing "sorted set nests inside a hash map value" (with-open [db (xdb/xit-db :memory)] @@ -239,4 +258,15 @@ (is (= 0 (count @db))) (is (empty? (seq @db))) (swap! db conj 7) - (is (= [7] (seq @db)))))) + (is (= [7] (seq @db))))) + (testing "after empty the value is still a sorted set, so re-inserted members + keep sorted (not hash-set) semantics" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (swap! 
db empty) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (sorted? @db)) + (swap! db conj 5 1 3) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (sorted? @db)) + (is (= [1 3 5] (seq @db)))))) From b8cbdbb77168711a521542bac532dc82b37d50b7 Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 26 Jun 2026 12:25:25 +0000 Subject: [PATCH 11/15] Remove reflection warnings from sorted read path Type-hint the per-element decode helpers so `seq`/iteration over sorted maps and sets no longer reflect: - sorted_operations.clj: hint `kvpair->entry`, `member-from-cursor` and `kvpair->member` params with `ReadCursor$KeyValuePairCursor` / `ReadCursor` (added the inner class to the import). These run once per element. - sorted_key.clj: coerce the `String(byte[], int, int, Charset)` offset/length args to int in `decode-keyword`'s namespaced branch. Both feature namespaces now compile reflection-clean. Full suite: 164 tests, 1050 assertions, 0 failures. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH --- src/xitdb/util/sorted_key.clj | 4 ++-- src/xitdb/util/sorted_operations.clj | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj index 14be05d..9de283d 100644 --- a/src/xitdb/util/sorted_key.clj +++ b/src/xitdb/util/sorted_key.clj @@ -236,8 +236,8 @@ (keyword nil (String. ba 2 (- (alength ba) 2) StandardCharsets/UTF_8)) (let [sep (loop [i 2] (if (zero? (aget ba i)) i (recur (inc i)))) - ns (String. ba 2 (- sep 2) StandardCharsets/UTF_8) - nm (String. ba (inc sep) (- (alength ba) (inc sep)) StandardCharsets/UTF_8)] + ns (String. ba (int 2) (int (- sep 2)) StandardCharsets/UTF_8) + nm (String. 
ba (int (inc sep)) (int (- (alength ba) (inc sep))) StandardCharsets/UTF_8)] (keyword ns nm))))) (def key-comparator diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj index 466b7db..147723d 100644 --- a/src/xitdb/util/sorted_operations.clj +++ b/src/xitdb/util/sorted_operations.clj @@ -6,7 +6,7 @@ [xitdb.util.conversion :as conversion] [xitdb.util.sorted-key :as sorted-key]) (:import - [io.github.radarroark.xitdb ReadCursor WriteCursor ReadSortedMap WriteSortedMap ReadSortedSet WriteSortedSet])) + [io.github.radarroark.xitdb ReadCursor ReadCursor$KeyValuePairCursor WriteCursor ReadSortedMap WriteSortedMap ReadSortedSet WriteSortedSet])) (defn smap-item-count "O(1) entry count, delegating to the rank-augmented B-tree." @@ -66,7 +66,7 @@ (defn- kvpair->entry "Turns a Java KeyValuePair (with .-keyCursor/.-valueCursor) into a Clojure MapEntry (decoded key, read value)." - [kv read-from-cursor] + [^ReadCursor$KeyValuePairCursor kv read-from-cursor] (clojure.lang.MapEntry. (decode-key-cursor (.-keyCursor kv)) (read-from-cursor (.-valueCursor kv)))) @@ -166,12 +166,12 @@ (defn- member-from-cursor "Decodes the member from a set entry cursor (its key cursor)." - [cursor] + [^ReadCursor cursor] (decode-key-cursor (.-keyCursor (.readKeyValuePair cursor)))) (defn- kvpair->member "Decodes the member from a Java KeyValuePair (its key cursor)." - [kv] + [^ReadCursor$KeyValuePairCursor kv] (decode-key-cursor (.-keyCursor kv))) (defn sset-seq From 04b277aca88631c3bac561f15a5f712f8c1e79d1 Mon Sep 17 00:00:00 2001 From: radar roark <122068506+xeubie@users.noreply.github.com> Date: Fri, 26 Jun 2026 10:34:31 -0400 Subject: [PATCH 12/15] readme: update architecture section --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 34cc3f8..901cd76 100644 --- a/README.md +++ b/README.md @@ -243,8 +243,8 @@ Note that this is not doing an expensive copy of the fruits vector. 
We are benef ### Architecture `xitdb-clj` builds on [xitdb-java](https://github.com/xit-vcs/xitdb-java) which implements: -- **Hash Array Mapped Trie (HAMT)** - For efficient map and set operations -- **RRB Trees** - For vector operations with good concatenation performance +- **Hash Array Mapped Trie (HAMT)** - For HashMap/Set and ArrayList +- **B-trees** - For SortedMap/Set and LinkedArrayList (ArrayList with efficient slice and concat) - **Structural Sharing** - Minimizes memory usage across versions - **Copy-on-Write** - Ensures immutability while maintaining performance From 4fa853c76a155b8e94f295ff7a393eb5efee2d3f Mon Sep 17 00:00:00 2001 From: radar roark <122068506+xeubie@users.noreply.github.com> Date: Fri, 26 Jun 2026 11:12:27 -0400 Subject: [PATCH 13/15] readme: use clojure terminology and fix another RRB reference --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 901cd76..5c4593e 100644 --- a/README.md +++ b/README.md @@ -243,8 +243,8 @@ Note that this is not doing an expensive copy of the fruits vector. 
We are benef ### Architecture `xitdb-clj` builds on [xitdb-java](https://github.com/xit-vcs/xitdb-java) which implements: -- **Hash Array Mapped Trie (HAMT)** - For HashMap/Set and ArrayList -- **B-trees** - For SortedMap/Set and LinkedArrayList (ArrayList with efficient slice and concat) +- **Hash Array Mapped Trie (HAMT)** - For hash map/set and vector +- **B-trees** - For sorted map/set and linked list - **Structural Sharing** - Minimizes memory usage across versions - **Copy-on-Write** - Ensures immutability while maintaining performance @@ -261,7 +261,7 @@ The Clojure wrapper adds: - **Vectors** - Array lists with indexed access - **Sets** - Hash sets with unique element storage - **Sorted sets** - On-disk B-tree sets with ordered iteration and ranking -- **Lists** - Linked lists and RRB tree-based linked array lists +- **Lists** - B-tree-based linked array lists - **Primitives** - Numbers, strings, keywords, booleans, dates. ## Performance Characteristics From 6e2330c665818fbdd22e806017ffee32f5c09889 Mon Sep 17 00:00:00 2001 From: radar roark <122068506+xeubie@users.noreply.github.com> Date: Sat, 27 Jun 2026 20:08:44 -0400 Subject: [PATCH 14/15] bump xitdb --- deps.edn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deps.edn b/deps.edn index 6cee079..a1a9e7f 100644 --- a/deps.edn +++ b/deps.edn @@ -1,6 +1,6 @@ {:paths ["src" "test"] :deps {org.clojure/clojure {:mvn/version "1.12.0"} - io.github.radarroark/xitdb {:mvn/version "0.30.0"}} + io.github.radarroark/xitdb {:mvn/version "0.31.0"}} :aliases {:test {:extra-deps {io.github.cognitect-labs/test-runner From 1d67b03127fcf8998c7761ba12f950782d650650 Mon Sep 17 00:00:00 2001 From: radar roark <122068506+xeubie@users.noreply.github.com> Date: Sun, 28 Jun 2026 09:33:36 -0400 Subject: [PATCH 15/15] bump xitdb again --- deps.edn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deps.edn b/deps.edn index a1a9e7f..2c831a9 100644 --- a/deps.edn +++ b/deps.edn @@ -1,6 +1,6 @@ {:paths 
["src" "test"] :deps {org.clojure/clojure {:mvn/version "1.12.0"} - io.github.radarroark/xitdb {:mvn/version "0.31.0"}} + io.github.radarroark/xitdb {:mvn/version "0.32.0"}} :aliases {:test {:extra-deps {io.github.cognitect-labs/test-runner