diff --git a/README.md b/README.md
index b5734e4..5c4593e 100644
--- a/README.md
+++ b/README.md
@@ -108,6 +108,64 @@ Here's a taste of how your queries could look like:
 ```
 
+## Sorted collections
+
+In addition to (unordered) hash maps and sets, xitdb supports **on-disk sorted
+maps and sets**, backed by the engine's rank-augmented B-tree. Store a Clojure
+`sorted-map` / `sorted-set` and it is persisted as a sorted collection that keeps
+its keys/members ordered on disk:
+
+```clojure
+(reset! db (sorted-map "banana" 2 "apple" 1 "cherry" 3))
+
+@db
+;; => #XITDBSortedMap{"apple" 1, "banana" 2, "cherry" 3}
+
+(swap! db assoc "date" 4) ;; inserted in order, not appended
+```
+
+Reading back yields an `XITDBSortedMap` / `XITDBSortedSet` that implements
+Clojure's ordered interfaces, so `seq`, `rseq`, `nth`, `subseq` and `rsubseq`
+all work and read only what they touch from disk:
+
+```clojure
+(reset! db (into (sorted-map) (map vector (range 0 100 2) (range))))
+
+(nth @db 10) ;; => [20 10] ;; O(log n), no full scan
+(subseq @db >= 90) ;; => ([90 45] [92 46] [94 47] [96 48] [98 49])
+(rseq @db) ;; => lazy descending seq of entries
+```
+
+Supported key/member types are strings, keywords, longs, doubles, `Instant`
+and `Date`. They are stored with an order-preserving codec, so they iterate in
+natural order — numeric for numbers, chronological for temporals, lexicographic
+(by code point) for strings. Longs and doubles share a single numeric ordering,
+so they interleave by value (e.g. `1 < 1.5 < 2`). Only the default ordering is
+supported: `sorted-map-by` / `sorted-set-by` with a custom comparator is
+rejected.
+
+### Ranking & pagination
+
+The `xitdb.sorted` namespace exposes the B-tree's O(log n) superpowers, which
+are handy for building and paging on-disk secondary indexes:
+
+- `(rank coll k)` — number of entries strictly less than `k` (i.e. the index of
+  `k`, or its would-be insertion index if absent).
+- `(from-index coll n)` — lazy ordered seq starting at rank `n`.
+- `(page coll offset limit)` — lazy ordered page `[offset, offset+limit)`.
+
+```clojure
+(require '[xitdb.sorted :as xsorted])
+
+;; build a timestamp -> id index; events can arrive out of order
+(reset! db (sorted-map))
+(doseq [e events]
+  (swap! db assoc (:ts e) (:id e)))
+
+(xsorted/rank @db some-ts) ;; chronological position of some-ts
+(xsorted/page @db 100 20) ;; the 20 entries at ranks [100, 120)
+```
+
 ## History
 
 Since the database is immutable, all previous values are accessed by reading
@@ -185,8 +243,8 @@ Note that this is not doing an expensive copy of the fruits vector. We are benef
 ### Architecture
 
 `xitdb-clj` builds on [xitdb-java](https://github.com/xit-vcs/xitdb-java) which implements:
-- **Hash Array Mapped Trie (HAMT)** - For efficient map and set operations
-- **RRB Trees** - For vector operations with good concatenation performance
+- **Hash Array Mapped Trie (HAMT)** - For hash map/set and vector
+- **B-trees** - For sorted map/set and linked list
 - **Structural Sharing** - Minimizes memory usage across versions
 - **Copy-on-Write** - Ensures immutability while maintaining performance
 
@@ -199,9 +257,11 @@ The Clojure wrapper adds:
 ### Supported Data Types
 
 - **Maps** - Hash maps with efficient key-value access
+- **Sorted maps** - On-disk B-tree maps with ordered iteration, `subseq`/`nth`/`rank`
 - **Vectors** - Array lists with indexed access
 - **Sets** - Hash sets with unique element storage
-- **Lists** - Linked lists and RRB tree-based linked array lists
+- **Sorted sets** - On-disk B-tree sets with ordered iteration and ranking
+- **Lists** - B-tree-based linked array lists
 - **Primitives** - Numbers, strings, keywords, booleans, dates.
 
 ## Performance Characteristics
diff --git a/deps.edn b/deps.edn
index 6cee079..2c831a9 100644
--- a/deps.edn
+++ b/deps.edn
@@ -1,6 +1,6 @@
 {:paths ["src" "test"]
  :deps {org.clojure/clojure {:mvn/version "1.12.0"}
-        io.github.radarroark/xitdb {:mvn/version "0.30.0"}}
+        io.github.radarroark/xitdb {:mvn/version "0.32.0"}}
 
  :aliases
  {:test {:extra-deps {io.github.cognitect-labs/test-runner
diff --git a/src/xitdb/sorted.clj b/src/xitdb/sorted.clj
new file mode 100644
index 0000000..923d9c6
--- /dev/null
+++ b/src/xitdb/sorted.clj
@@ -0,0 +1,73 @@
+(ns xitdb.sorted
+  "Public helpers for on-disk sorted collections (`XITDBSortedMap` /
+  `XITDBSortedSet`) that go beyond `clojure.core`'s in-memory sorted
+  collections, exposing the rank-augmented B-tree's superpowers:
+
+  - `rank` - O(log n) index of a key/member (inverse of indexed `nth`).
+  - `page` - lazy ordered page starting at a rank (offset/limit).
+  - `from-index` - lazy ordered seq starting at a rank.
+
+  These are the recommended way to build and paginate on-disk secondary indexes.
+  For example, a timestamp -> id index over events:
+
+      (reset! db (sorted-map))
+      (doseq [e events]
+        (swap! db assoc (:ts e) (:id e)))
+
+      ;; serve the page [offset, offset+limit) in chronological order,
+      ;; reading only that page from disk:
+      (xsorted/page @db offset limit)
+
+      ;; or, starting from a known timestamp boundary:
+      (->> (subseq @db >= start-ts)
+           (take limit))
+
+  Both `rank` and the pagination helpers work on `XITDBSortedMap` (yielding
+  `MapEntry` pairs) and `XITDBSortedSet` (yielding members)."
+  (:require
+   [xitdb.common :as common]
+   [xitdb.util.sorted-operations :as sorted-ops])
+  (:import
+   [io.github.radarroark.xitdb ReadSortedMap ReadSortedSet]))
+
+(defn rank
+  "Number of entries in the sorted collection `coll` strictly less than `k`,
+  in O(log n). For a present key/member this is its index (the inverse of
+  `nth`); for an absent one it is the would-be insertion index. Works on both
+  `XITDBSortedMap` and `XITDBSortedSet`."
+  [coll k]
+  (let [u (common/-unwrap coll)]
+    (cond
+      (instance? ReadSortedMap u) (sorted-ops/smap-rank u k)
+      (instance? ReadSortedSet u) (sorted-ops/sset-rank u k)
+      :else (throw (IllegalArgumentException.
+                    (str "rank requires an XITDBSortedMap or XITDBSortedSet, got: "
+                         (type coll)))))))
+
+(defn from-index
+  "Lazy ordered seq of the sorted collection `coll` starting at rank `n`
+  (0-based), backed by the engine's `iteratorFromIndex` (O(log n) seek, then a
+  streaming walk). Does not materialise the whole collection. For a sorted map
+  the elements are `MapEntry` pairs; for a sorted set they are members. Returns
+  nil when `n` is at or past the end."
+  [coll n]
+  (when (neg? n)
+    (throw (IllegalArgumentException.
+            (str "from-index requires a non-negative rank, got: " n))))
+  (let [u (common/-unwrap coll)]
+    (cond
+      (instance? ReadSortedMap u)
+      (sorted-ops/smap-seq-from-index u common/-read-from-cursor n)
+      (instance? ReadSortedSet u)
+      (sorted-ops/sset-seq-from-index u n)
+      :else (throw (IllegalArgumentException.
+                    (str "from-index requires an XITDBSortedMap or XITDBSortedSet, got: "
+                         (type coll)))))))
+
+(defn page
+  "Lazy ordered page of `coll`: at most `limit` elements starting at rank
+  `offset`. Equivalent to `(take limit (from-index coll offset))`, and stops
+  cleanly at the end of the collection. Lazy and low-memory. For a sorted map
+  the elements are `MapEntry` pairs; for a sorted set they are members."
+  [coll offset limit]
+  (take limit (from-index coll offset)))
diff --git a/src/xitdb/sorted_map.clj b/src/xitdb/sorted_map.clj
new file mode 100644
index 0000000..9f09d37
--- /dev/null
+++ b/src/xitdb/sorted_map.clj
@@ -0,0 +1,283 @@
+(ns xitdb.sorted-map
+  "On-disk sorted map wrapper types, modelled on `xitdb.hash-map`.
+
+  `XITDBSortedMap` is the read view; `XITDBWriteSortedMap` is the mutable view
+  used inside a transaction. Ordering is by the engine's unsigned byte
+  comparison over order-preserving encoded keys (see `xitdb.util.sorted-key`).
+
+  Both views implement `clojure.lang.Sorted`/`Indexed`/`Reversible` (subseq,
+  nth, rseq) on top of the rank-augmented B-tree, in addition to ascending
+  ordered `seq`."
+  (:require
+   [xitdb.common :as common]
+   [xitdb.util.sorted-key :as sorted-key]
+   [xitdb.util.sorted-operations :as sorted-ops])
+  (:import
+   [io.github.radarroark.xitdb
+    ReadCursor ReadSortedMap WriteCursor WriteSortedMap]))
+
+(defn smap-seq [rsm]
+  (sorted-ops/smap-seq rsm common/-read-from-cursor))
+
+(defn- descending-start-index
+  "Index to begin a descending walk for `seqFrom(key, false)`: the largest rank
+  whose key is <= `key`. Uses `rank` (count of keys strictly < key); if `key`
+  itself is present, include it."
+  [^ReadSortedMap rsm key]
+  (let [r (sorted-ops/smap-rank rsm key)]
+    (if (sorted-ops/smap-contains-key? rsm key)
+      r
+      (dec r))))
+
+(deftype XITDBSortedMap [^ReadSortedMap rsm]
+
+  clojure.lang.ILookup
+  (valAt [this key]
+    (.valAt this key nil))
+
+  (valAt [this key not-found]
+    (let [cursor (sorted-ops/smap-read-cursor rsm key)]
+      (if (nil? cursor)
+        not-found
+        (common/-read-from-cursor cursor))))
+
+  clojure.lang.Associative
+  (containsKey [this key]
+    (sorted-ops/smap-contains-key? rsm key))
+
+  (entryAt [this key]
+    (when (.containsKey this key)
+      (clojure.lang.MapEntry. key (.valAt this key nil))))
+
+  (assoc [this k v]
+    (assoc (common/-materialize-shallow this) k v))
+
+  clojure.lang.IPersistentMap
+  (without [this k]
+    (dissoc (common/-materialize-shallow this) k))
+
+  (count [this]
+    (sorted-ops/smap-item-count rsm))
+
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (. clojure.lang.RT (conj (common/-materialize-shallow this) o)))
+
+  (empty [this]
+    (sorted-map-by sorted-key/key-comparator))
+
+  (equiv [this other]
+    (and (instance? clojure.lang.IPersistentMap other)
+         (= (into {} this) (into {} other))))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (smap-seq rsm))
+
+  clojure.lang.Sorted
+  (comparator [_]
+    sorted-key/key-comparator)
+
+  (entryKey [_ entry]
+    (key entry))
+
+  (seq [_ ascending?]
+    (if ascending?
+      (smap-seq rsm)
+      (sorted-ops/smap-rseq rsm common/-read-from-cursor)))
+
+  (seqFrom [_ key ascending?]
+    (if ascending?
+      (sorted-ops/smap-seq-from rsm common/-read-from-cursor key)
+      (sorted-ops/smap-rseq rsm common/-read-from-cursor
+                            (descending-start-index rsm key))))
+
+  clojure.lang.Indexed
+  (nth [this i]
+    (let [e (.nth this i ::not-found)]
+      (if (identical? e ::not-found)
+        (throw (IndexOutOfBoundsException.))
+        e)))
+
+  (nth [_ i not-found]
+    (sorted-ops/smap-nth rsm common/-read-from-cursor i not-found))
+
+  clojure.lang.Reversible
+  (rseq [_]
+    (sorted-ops/smap-rseq rsm common/-read-from-cursor))
+
+  clojure.lang.IFn
+  (invoke [this k]
+    (.valAt this k))
+
+  (invoke [this k not-found]
+    (.valAt this k not-found))
+
+  java.lang.Iterable
+  (iterator [this]
+    (let [iter (clojure.lang.SeqIterator. (seq this))]
+      (reify java.util.Iterator
+        (hasNext [_]
+          (.hasNext iter))
+        (next [_]
+          (.next iter))
+        (remove [_]
+          (throw (UnsupportedOperationException. "XITDBSortedMap iterator is read-only"))))))
+
+  clojure.core.protocols/IKVReduce
+  (kv-reduce [this f init]
+    (sorted-ops/smap-kv-reduce rsm common/-read-from-cursor f init))
+
+  common/ISlot
+  (-slot [this]
+    (-> rsm .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [this]
+    rsm)
+
+  common/IMaterialize
+  (-materialize [this]
+    (reduce (fn [m [k v]]
+              (assoc m k (common/materialize v))) (sorted-map-by sorted-key/key-comparator) (seq this)))
+
+  common/IMaterializeShallow
+  (-materialize-shallow [this]
+    (reduce (fn [m [k v]]
+              (assoc m k v)) (sorted-map-by sorted-key/key-comparator) (seq this)))
+
+  Object
+  (toString [this]
+    (str (into (sorted-map-by sorted-key/key-comparator) this))))
+
+(defmethod print-method XITDBSortedMap [o ^java.io.Writer w]
+  (.write w "#XITDBSortedMap")
+  (print-method (into (sorted-map-by sorted-key/key-comparator) o) w))
+
+;---------------------------------------------------
+
+(deftype XITDBWriteSortedMap [^WriteSortedMap wsm]
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (cond
+      (instance? clojure.lang.MapEntry o)
+      (.assoc this (key o) (val o))
+
+      (map? o)
+      (doseq [[k v] (seq o)]
+        (.assoc this k v))
+
+      (and (sequential? o) (= 2 (count o)))
+      (.assoc this (first o) (second o))
+
+      :else
+      (throw (IllegalArgumentException. "Can only cons MapEntries or key-value pairs onto maps")))
+    this)
+
+  (empty [this]
+    (sorted-ops/smap-empty! wsm)
+    this)
+
+  (equiv [this other]
+    (and (= (count this) (count other))
+         (every? (fn [[k v]] (= v (get other k ::not-found)))
+                 (seq this))))
+
+  clojure.lang.Associative
+  (assoc [this k v]
+    (sorted-ops/smap-assoc-value! wsm k (common/unwrap v))
+    this)
+
+  (containsKey [this key]
+    (sorted-ops/smap-contains-key? wsm key))
+
+  (entryAt [this key]
+    (when (.containsKey this key)
+      (clojure.lang.MapEntry. key (.valAt this key nil))))
+
+  clojure.lang.IPersistentMap
+  (without [this key]
+    (sorted-ops/smap-dissoc-key! wsm key)
+    this)
+
+  (count [this]
+    (sorted-ops/smap-item-count wsm))
+
+  clojure.lang.ILookup
+  (valAt [this key]
+    (.valAt this key nil))
+
+  (valAt [this key not-found]
+    (let [cursor (sorted-ops/smap-read-cursor wsm key)]
+      (if (nil? cursor)
+        not-found
+        (common/-read-from-cursor cursor))))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (smap-seq wsm))
+
+  ;; The same ordered read machinery as XITDBSortedMap, reading the
+  ;; in-transaction (uncommitted) state. `WriteSortedMap` is a `ReadSortedMap`
+  ;; subclass, so the rank/index-based ops apply directly to `wsm`.
+  clojure.lang.Sorted
+  (comparator [_]
+    sorted-key/key-comparator)
+
+  (entryKey [_ entry]
+    (key entry))
+
+  (seq [_ ascending?]
+    (if ascending?
+      (smap-seq wsm)
+      (sorted-ops/smap-rseq wsm common/-read-from-cursor)))
+
+  (seqFrom [_ key ascending?]
+    (if ascending?
+      (sorted-ops/smap-seq-from wsm common/-read-from-cursor key)
+      (sorted-ops/smap-rseq wsm common/-read-from-cursor
+                            (descending-start-index wsm key))))
+
+  clojure.lang.Indexed
+  (nth [this i]
+    (let [e (.nth this i ::not-found)]
+      (if (identical? e ::not-found)
+        (throw (IndexOutOfBoundsException.))
+        e)))
+
+  (nth [_ i not-found]
+    (sorted-ops/smap-nth wsm common/-read-from-cursor i not-found))
+
+  clojure.lang.Reversible
+  (rseq [_]
+    (sorted-ops/smap-rseq wsm common/-read-from-cursor))
+
+  clojure.core.protocols/IKVReduce
+  (kv-reduce [this f init]
+    (sorted-ops/smap-kv-reduce wsm common/-read-from-cursor f init))
+
+  common/ISlot
+  (-slot [this]
+    (-> wsm .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [this]
+    wsm)
+
+  common/IReadOnly
+  (-read-only [this]
+    (XITDBSortedMap. wsm))
+
+  Object
+  (toString [this]
+    (str "XITDBWriteSortedMap")))
+
+(defmethod print-method XITDBWriteSortedMap [o ^java.io.Writer w]
+  (.write w "#XITDBWriteSortedMap")
+  (print-method (into (sorted-map-by sorted-key/key-comparator) (common/-read-only o)) w))
+
+(defn xwrite-sorted-map [^WriteCursor write-cursor]
+  (->XITDBWriteSortedMap (WriteSortedMap. write-cursor)))
+
+(defn xsorted-map [^ReadCursor read-cursor]
+  (->XITDBSortedMap (ReadSortedMap. read-cursor)))
diff --git a/src/xitdb/sorted_set.clj b/src/xitdb/sorted_set.clj
new file mode 100644
index 0000000..0134d99
--- /dev/null
+++ b/src/xitdb/sorted_set.clj
@@ -0,0 +1,245 @@
+(ns xitdb.sorted-set
+  "On-disk sorted set wrapper types, modelled on `xitdb.hash-set` (set shape) and
+  `xitdb.sorted-map` (the `Sorted`/`Indexed`/`Reversible` machinery).
+
+  A `SORTED_SET` is a `SORTED_MAP` with no values: each member is its own key.
+  `XITDBSortedSet` is the read view; `XITDBWriteSortedSet` is the mutable view
+  used inside a transaction. Ordering is by the engine's unsigned byte
+  comparison over order-preserving encoded members (see `xitdb.util.sorted-key`)."
+  (:require
+   [xitdb.common :as common]
+   [xitdb.util.sorted-key :as sorted-key]
+   [xitdb.util.sorted-operations :as sorted-ops])
+  (:import
+   [io.github.radarroark.xitdb
+    ReadCursor ReadSortedSet WriteCursor WriteSortedSet]))
+
+(defn- descending-start-index
+  "Index to begin a descending walk for `seqFrom(member, false)`: the largest
+  rank whose member is <= `member`. Uses `rank` (count of members strictly <
+  member); if `member` itself is present, include it."
+  [^ReadSortedSet rss member]
+  (let [r (sorted-ops/sset-rank rss member)]
+    (if (sorted-ops/sset-contains? rss member)
+      r
+      (dec r))))
+
+(deftype XITDBSortedSet [^ReadSortedSet rss]
+
+  clojure.lang.IPersistentSet
+  (disjoin [this k]
+    (disj (common/-materialize-shallow this) k))
+
+  (contains [this k]
+    (sorted-ops/sset-contains? rss k))
+
+  (get [this k]
+    (when (.contains this k)
+      k))
+
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (. clojure.lang.RT (conj (common/-materialize-shallow this) o)))
+
+  (empty [this]
+    (sorted-set-by sorted-key/key-comparator))
+
+  (equiv [this other]
+    (and (instance? clojure.lang.IPersistentSet other)
+         (= (count this) (count other))
+         (every? #(.contains this %) other)))
+
+  (count [_]
+    (sorted-ops/sset-item-count rss))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (sorted-ops/sset-seq rss))
+
+  clojure.lang.Sorted
+  (comparator [_]
+    sorted-key/key-comparator)
+
+  (entryKey [_ entry]
+    entry)
+
+  (seq [_ ascending?]
+    (if ascending?
+      (sorted-ops/sset-seq rss)
+      (sorted-ops/sset-rseq rss)))
+
+  (seqFrom [_ member ascending?]
+    (if ascending?
+      (sorted-ops/sset-seq-from rss member)
+      (sorted-ops/sset-rseq rss (descending-start-index rss member))))
+
+  clojure.lang.Indexed
+  (nth [this i]
+    (let [e (.nth this i ::not-found)]
+      (if (identical? e ::not-found)
+        (throw (IndexOutOfBoundsException.))
+        e)))
+
+  (nth [_ i not-found]
+    (sorted-ops/sset-nth rss i not-found))
+
+  clojure.lang.Reversible
+  (rseq [_]
+    (sorted-ops/sset-rseq rss))
+
+  clojure.lang.ILookup
+  (valAt [this k]
+    (.valAt this k nil))
+
+  (valAt [this k not-found]
+    (if (.contains this k)
+      k
+      not-found))
+
+  clojure.lang.IFn
+  (invoke [this k]
+    (.valAt this k))
+
+  (invoke [this k not-found]
+    (.valAt this k not-found))
+
+  java.lang.Iterable
+  (iterator [this]
+    (let [iter (clojure.lang.SeqIterator. (seq this))]
+      (reify java.util.Iterator
+        (hasNext [_]
+          (.hasNext iter))
+        (next [_]
+          (.next iter))
+        (remove [_]
+          (throw (UnsupportedOperationException. "XITDBSortedSet iterator is read-only"))))))
+
+  common/ISlot
+  (-slot [this]
+    (-> rss .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [this]
+    rss)
+
+  common/IMaterialize
+  (-materialize [this]
+    (into (sorted-set-by sorted-key/key-comparator) (map common/materialize (seq this))))
+
+  common/IMaterializeShallow
+  (-materialize-shallow [this]
+    (into (sorted-set-by sorted-key/key-comparator) (seq this)))
+
+  Object
+  (toString [this]
+    (str (into (sorted-set-by sorted-key/key-comparator) this))))
+
+(defmethod print-method XITDBSortedSet [o ^java.io.Writer w]
+  (.write w "#XITDBSortedSet")
+  (print-method (into (sorted-set-by sorted-key/key-comparator) o) w))
+
+;---------------------------------------------------
+
+(deftype XITDBWriteSortedSet [^WriteSortedSet wss]
+  clojure.lang.IPersistentSet
+  (disjoin [this v]
+    (sorted-ops/sset-disj-value! wss (common/unwrap v))
+    this)
+
+  (contains [this v]
+    (sorted-ops/sset-contains? wss (common/unwrap v)))
+
+  (get [this k]
+    (when (.contains this (common/unwrap k))
+      k))
+
+  clojure.lang.IPersistentCollection
+  (cons [this o]
+    (sorted-ops/sset-assoc-value! wss (common/unwrap o))
+    this)
+
+  (empty [this]
+    (sorted-ops/sset-empty! wss)
+    this)
+
+  (equiv [this other]
+    (and (instance? clojure.lang.IPersistentSet other)
+         (= (count this) (count other))
+         (every? #(.contains this %) other)))
+
+  (count [_]
+    (sorted-ops/sset-item-count wss))
+
+  clojure.lang.Seqable
+  (seq [_]
+    (sorted-ops/sset-seq wss))
+
+  ;; The same ordered read machinery as XITDBSortedSet, reading the
+  ;; in-transaction (uncommitted) state. `WriteSortedSet` is a `ReadSortedSet`
+  ;; subclass, so the rank/index-based ops apply directly to `wss`.
+  clojure.lang.Sorted
+  (comparator [_]
+    sorted-key/key-comparator)
+
+  (entryKey [_ entry]
+    entry)
+
+  (seq [_ ascending?]
+    (if ascending?
+      (sorted-ops/sset-seq wss)
+      (sorted-ops/sset-rseq wss)))
+
+  (seqFrom [_ member ascending?]
+    (if ascending?
+      (sorted-ops/sset-seq-from wss member)
+      (sorted-ops/sset-rseq wss (descending-start-index wss member))))
+
+  clojure.lang.Indexed
+  (nth [this i]
+    (let [e (.nth this i ::not-found)]
+      (if (identical? e ::not-found)
+        (throw (IndexOutOfBoundsException.))
+        e)))
+
+  (nth [_ i not-found]
+    (sorted-ops/sset-nth wss i not-found))
+
+  clojure.lang.Reversible
+  (rseq [_]
+    (sorted-ops/sset-rseq wss))
+
+  clojure.lang.ILookup
+  (valAt [this k]
+    (.valAt this k nil))
+
+  (valAt [this k not-found]
+    (if (.contains this k)
+      k
+      not-found))
+
+  common/ISlot
+  (-slot [_]
+    (-> wss .cursor .slot))
+
+  common/IUnwrap
+  (-unwrap [_]
+    wss)
+
+  common/IReadOnly
+  (-read-only [this]
+    (XITDBSortedSet. wss))
+
+  Object
+  (toString [_]
+    (str "XITDBWriteSortedSet")))
+
+(defmethod print-method XITDBWriteSortedSet [o ^java.io.Writer w]
+  (.write w "#XITDBWriteSortedSet")
+  (print-method (into (sorted-set-by sorted-key/key-comparator) (common/-read-only o)) w))
+
+;; Constructor functions
+(defn xwrite-sorted-set [^WriteCursor write-cursor]
+  (->XITDBWriteSortedSet (WriteSortedSet. write-cursor)))
+
+(defn xsorted-set [^ReadCursor read-cursor]
+  (->XITDBSortedSet (ReadSortedSet. read-cursor)))
diff --git a/src/xitdb/util/conversion.clj b/src/xitdb/util/conversion.clj
index 1774bc6..3b345c0 100644
--- a/src/xitdb/util/conversion.clj
+++ b/src/xitdb/util/conversion.clj
@@ -1,11 +1,13 @@
 (ns xitdb.util.conversion
   (:require
+   [xitdb.util.sorted-key :as sorted-key]
    [xitdb.util.validation :as validation])
   (:import
+   [clojure.lang PersistentTreeMap PersistentTreeSet]
    [io.github.radarroark.xitdb
     Database Database$Bytes Database$Float Database$Int
     ReadCursor Slot Slotted Tag WriteArrayList WriteCountedHashMap WriteCountedHashSet WriteCursor
-    WriteHashMap WriteHashSet WriteLinkedArrayList]
+    WriteHashMap WriteHashSet WriteLinkedArrayList WriteSortedMap WriteSortedSet]
    [java.io OutputStream OutputStreamWriter]
    [java.security DigestOutputStream]))
 
@@ -18,6 +20,8 @@
     (= tag Tag/ARRAY_LIST) :array-list
     (= tag Tag/LINKED_ARRAY_LIST) :linked-array-list
     (= tag Tag/HASH_MAP) :hash-map
+    (= tag Tag/SORTED_MAP) :sorted-map
+    (= tag Tag/SORTED_SET) :sorted-set
     (= tag Tag/KV_PAIR) :kv-pair
     (= tag Tag/BYTES) :bytes
     (= tag Tag/SHORT_BYTES) :short-bytes
@@ -133,6 +137,19 @@
 (declare ^WriteCursor coll->ArrayListCursor!)
 (declare ^WriteCursor list->LinkedArrayListCursor!)
 (declare ^WriteCursor set->WriteCursor!)
+(declare ^WriteCursor sorted-map->WriteSortedMapCursor!)
+(declare ^WriteCursor sorted-set->WriteSortedSetCursor!)
+
+(defn default-sorted-comparator?
+  "True if `coll` (a PersistentTreeMap or PersistentTreeSet) is ordered in a way
+  the engine can honour with its fixed unsigned byte ordering: either Clojure's
+  natural ordering (no custom comparator) or `sorted-key/key-comparator`, which
+  *is* that byte ordering and is what `materialize` stamps onto the collections
+  it rebuilds. Any other custom comparator is rejected."
+  [coll]
+  (let [cmp (.comparator ^clojure.lang.Sorted coll)]
+    (or (identical? clojure.lang.RT/DEFAULT_COMPARATOR cmp)
+        (identical? sorted-key/key-comparator cmp))))
 
 (defn ^Slot v->slot!
   "Converts a value to a XitDB slot.
@@ -148,6 +165,16 @@
     (instance? Slotted v)
     (.slot ^Slotted v)
 
+    ;; A sorted map is also `map?`, so it MUST be checked before the generic
+    ;; hash-map branch or it would be shadowed and stored as a hash map.
+    (instance? PersistentTreeMap v)
+    (do
+      (when-not (default-sorted-comparator? v)
+        (throw (IllegalArgumentException.
+                "sorted-map-by with a custom comparator is not supported; only natural ordering is allowed.")))
+      (.write cursor nil)
+      (.slot (sorted-map->WriteSortedMapCursor! cursor v)))
+
     (map? v)
     (do
       (.write cursor nil)
@@ -158,6 +185,16 @@
       (.write cursor nil)
       (.slot (list->LinkedArrayListCursor! cursor v)))
 
+    ;; A sorted set is also `set?`, so it MUST be checked before the generic
+    ;; hash-set branch or it would be shadowed and stored as a hash set.
+    (instance? PersistentTreeSet v)
+    (do
+      (when-not (default-sorted-comparator? v)
+        (throw (IllegalArgumentException.
+                "sorted-set-by with a custom comparator is not supported; only natural ordering is allowed.")))
+      (.write cursor nil)
+      (.slot (sorted-set->WriteSortedSetCursor! cursor v)))
+
     (set? v)
     (do
       (.write cursor nil)
@@ -184,6 +221,12 @@
   (let [write-array (WriteArrayList. cursor)]
     (doseq [v coll]
       (cond
+        ;; Sorted map/set are also map?/set?, so delegate to v->slot! (which
+        ;; checks the tree types first) before the generic hash branches.
+        (or (instance? PersistentTreeMap v) (instance? PersistentTreeSet v))
+        (let [v-cursor (.appendCursor write-array)]
+          (.write v-cursor (v->slot! v-cursor v)))
+
         (map? v)
         (let [v-cursor (.appendCursor write-array)]
           (map->WriteHashMapCursor! v-cursor v))
@@ -214,6 +257,12 @@
     (doseq [v coll]
       (when *debug?* (println "v=" v))
       (cond
+        ;; Sorted map/set are also map?/set?, so delegate to v->slot! (which
+        ;; checks the tree types first) before the generic hash branches.
+        (or (instance? PersistentTreeMap v) (instance? PersistentTreeSet v))
+        (let [v-cursor (.appendCursor write-list)]
+          (.write v-cursor (v->slot! v-cursor v)))
+
         (map? v)
         (let [v-cursor (.appendCursor write-list)]
           (map->WriteHashMapCursor! v-cursor v))
@@ -250,6 +299,27 @@
         (.write cursor (v->slot! cursor v))))
     (.-cursor whm)))
 
+(defn ^WriteCursor sorted-map->WriteSortedMapCursor!
+  "Writes a Clojure sorted map `m` to a XitDB WriteSortedMap.
+  Keys are encoded with the order-preserving codec; values are written
+  recursively via `v->slot!`. Returns the cursor of the created WriteSortedMap."
+  [^WriteCursor cursor m]
+  (let [wsm (WriteSortedMap. cursor)]
+    (doseq [[k v] m]
+      (let [value-cursor (.putCursor wsm (sorted-key/encode-key k))]
+        (.write value-cursor (v->slot! value-cursor v))))
+    (.-cursor wsm)))
+
+(defn ^WriteCursor sorted-set->WriteSortedSetCursor!
+  "Writes a Clojure sorted set `s` to a XitDB WriteSortedSet.
+  Members are encoded with the order-preserving codec. Returns the cursor of the
+  created WriteSortedSet."
+  [^WriteCursor cursor s]
+  (let [wss (WriteSortedSet. cursor)]
+    (doseq [member s]
+      (.put wss (sorted-key/encode-key member)))
+    (.-cursor wss)))
+
 (defn ^WriteCursor set->WriteCursor!
   "Writes a Clojure set `s` to a XitDB WriteHashSet.
   Returns the cursor of the created WriteHashSet."
@@ -326,9 +396,26 @@
       (= value-tag Tag/COUNTED_HASH_MAP)
       (map-write-cursor (WriteCountedHashMap. cursor) current-key)
 
+      ;; Sorted maps store the real key bytes (order-preserving codec), so a
+      ;; keypath write resolves a value cursor by the encoded key, mirroring the
+      ;; read-side dispatch in `read-from-cursor`.
+      (= value-tag Tag/SORTED_MAP)
+      (.putCursor (WriteSortedMap. cursor) (sorted-key/encode-key current-key))
+
       (= value-tag Tag/HASH_SET)
       (set-write-cursor (WriteHashSet. cursor) current-key)
 
+      ;; A sorted-set member is stored as an immutable B-tree key (the engine
+      ;; only exposes a writeable value slot, which a set never uses), so there
+      ;; is no in-place "member cursor" to hand back the way a hash set has.
+      ;; Mutating membership goes through conj/disj on the set itself.
+      (= value-tag Tag/SORTED_SET)
+      (throw (IllegalArgumentException.
+              (format (str "Cannot get a write cursor to sorted-set member '%s': "
+                           "sorted-set members are immutable keys. Use conj/disj "
+                           "on the sorted set itself to change membership.")
+                      current-key)))
+
       (= value-tag Tag/COUNTED_HASH_SET)
       (set-write-cursor (WriteCountedHashSet. cursor) current-key)
 
diff --git a/src/xitdb/util/sorted_key.clj b/src/xitdb/util/sorted_key.clj
new file mode 100644
index 0000000..9de283d
--- /dev/null
+++ b/src/xitdb/util/sorted_key.clj
@@ -0,0 +1,262 @@
+(ns xitdb.util.sorted-key
+  "Order-preserving, reversible key codec for on-disk sorted maps/sets.
+
+  Unlike hash maps (which SHA-1-hash their keys), sorted collections store the
+  real key bytes so they can be recovered on read and compared by the engine's
+  unsigned lexicographic byte comparison (`Arrays.compareUnsigned`). The codec
+  must therefore be:
+
+    1. reversible - `decode-key (encode-key k)` == k
+    2. order-preserving - `sign(compareUnsigned (encode a) (encode b))`
+       == `sign(compare a b)` for any two keys.
+
+  Every encoding carries a leading 1-byte type tag. The tag both identifies the
+  type on decode and establishes a total order across types, so heterogeneous
+  keys never throw.
+
+  Supported key types: string, keyword, long, double, Instant and Date. Strings
+  encode as their UTF-8 bytes (already code-point ordered); keywords use a flag +
+  namespace + name layout so they sort like Clojure's default comparator (see
+  `keyword->bytes`); numeric/temporal keys use order-preserving big-endian
+  encodings."
+  (:import
+   [java.io ByteArrayOutputStream]
+   [java.nio ByteBuffer]
+   [java.nio.charset StandardCharsets]
+   [java.time Instant]
+   [java.util Date]))
+
+;; Type tags. Ordering of the tag values defines the cross-type order; they are
+;; intentionally sparse to leave room for additional types in later slices.
+;; Current cross-type order (by ascending tag byte):
+;;   number (0x10) < instant (0x18) < date (0x19) < string (0x20) < keyword (0x21)
+;; Longs and doubles share the single `tag-number` so they interleave by numeric
+;; value (see `number-body`) instead of being split into two adjacent ranges.
+(def ^:const tag-number (int 0x10))
+(def ^:const tag-instant (int 0x18))
+(def ^:const tag-date (int 0x19))
+(def ^:const tag-string (int 0x20))
+(def ^:const tag-keyword (int 0x21))
+
+;; Numeric subtype byte (the body byte right after the 8-byte sort key).
+;; `num-long` sorts before `num-double`, so a long and a double of equal numeric
+;; value order deterministically (long first); it is only ever consulted as a
+;; tie-breaker between values that share a sort key.
+(def ^:const num-long (int 0x00))
+(def ^:const num-double (int 0x01))
+
+(defn- ^bytes utf8 [^String s]
+  (.getBytes s StandardCharsets/UTF_8))
+
+(defn- ^bytes tagged [tag ^bytes body]
+  (let [out (ByteArrayOutputStream. (inc (alength body)))]
+    (.write out (int tag))
+    (.write out body 0 (alength body))
+    (.toByteArray out)))
+
+(defn- ^bytes long->bytes
+  "8-byte big-endian with the sign bit flipped, so signed longs sort correctly
+  under unsigned byte comparison (negatives before positives)."
+  [^long n]
+  (let [buf (doto (ByteBuffer/allocate 8) (.putLong (bit-xor n Long/MIN_VALUE)))]
+    (.array buf)))
+
+(defn- bytes->long
+  "Inverse of `long->bytes`. Reads 8 big-endian bytes starting at `off`."
+  [^bytes ba off]
+  (bit-xor (.getLong (ByteBuffer/wrap ba (int off) 8)) Long/MIN_VALUE))
+
+(defn- ^bytes double->bytes
+  "IEEE-754 8-byte big-endian with an order-preserving bit flip: if the sign bit
+  is set, flip all bits; otherwise flip only the sign bit. This makes doubles
+  sort numerically under unsigned byte comparison. NaN is rejected by
+  `encode-key` (its ordering is undefined), so it never reaches here."
+  [^double d]
+  (let [bits (Double/doubleToLongBits d)
+        flipped (if (neg? bits)
+                  (bit-not bits)
+                  (bit-or bits Long/MIN_VALUE))
+        buf (doto (ByteBuffer/allocate 8) (.putLong flipped))]
+    (.array buf)))
+
+(defn- bytes->double
+  "Inverse of `double->bytes`. Reads 8 big-endian bytes starting at `off`."
+  [^bytes ba off]
+  (let [flipped (.getLong (ByteBuffer/wrap ba (int off) 8))
+        bits (if (neg? flipped)
+               (bit-and flipped Long/MAX_VALUE)
+               (bit-not flipped))]
+    (Double/longBitsToDouble bits)))
+
+(defn- ^bytes number-body
+  "Order-preserving, reversible body shared by long and double keys, so the two
+  types interleave by numeric value in one space.
+
+  Layout (17 bytes, after the type tag):
+    [8-byte sort key][1-byte subtype][8-byte exact value]
+
+  The sort key is the value rendered through `double->bytes`, so unsigned byte
+  comparison of two sort keys matches numeric comparison to double precision
+  (~53 significant bits). The `subtype` byte (long < double) and the exact 8
+  bytes break ties between values that share a sort key, keeping the order total
+  and making the original long/double recoverable on decode.
+
+  Caveat: because the sort key has double precision, two values that differ only
+  beyond 2^53 *and* have different types (one long, one double) can order by the
+  tie-break rather than strictly by value. Same-type ordering is always exact."
+  [subtype ^bytes sortkey ^bytes exact]
+  (let [out (ByteArrayOutputStream. 17)]
+    (.write out sortkey 0 8)
+    (.write out (int subtype))
+    (.write out exact 0 8)
+    (.toByteArray out)))
+
+(defn- decode-number
+  "Inverse of `number-body`. The subtype byte is at offset 9 (1 type tag + 8
+  sort-key bytes) and the exact value occupies the 8 bytes at offset 10."
+ [^bytes ba] + (if (= (bit-and (int (aget ba 9)) 0xff) num-long) + (bytes->long ba 10) + (bytes->double ba 10))) + +(defn- ^bytes instant->bytes + "12 bytes: epoch-second (8-byte big-endian, sign-flipped so negative epochs + sort first) followed by nano-of-second (4-byte big-endian; always 0..1e9-1, so + unsigned order is chronological). Byte order therefore equals chronological + order across the full Instant range." + [^Instant i] + (let [buf (doto (ByteBuffer/allocate 12) + (.putLong (bit-xor (.getEpochSecond i) Long/MIN_VALUE)) + (.putInt (int (.getNano i))))] + (.array buf))) + +(defn- ^Instant bytes->instant [^bytes ba off] + (let [bb (ByteBuffer/wrap ba (int off) 12) + secs (bit-xor (.getLong bb) Long/MIN_VALUE) + nano (.getInt bb)] + (Instant/ofEpochSecond secs nano))) + +(defn- ^bytes date->bytes + "8-byte big-endian epoch-millis with the sign bit flipped, so pre-epoch dates + sort before the epoch. Byte order equals chronological order." + [^Date d] + (let [buf (doto (ByteBuffer/allocate 8) + (.putLong (bit-xor (.getTime d) Long/MIN_VALUE)))] + (.array buf))) + +(defn- ^Date bytes->date [^bytes ba off] + (Date. (bit-xor (.getLong (ByteBuffer/wrap ba (int off) 8)) Long/MIN_VALUE))) + +;; Keyword presence-of-namespace flag (the first body byte). 0 sorts before 1, +;; so non-namespaced keywords sort before namespaced ones, matching Clojure's +;; default comparator (clojure.lang.Symbol.compareTo). +(def ^:const kw-no-ns (int 0x00)) +(def ^:const kw-has-ns (int 0x01)) + +(defn- keyword->bytes + "Order-preserving, collision-free encoding of a keyword, matching Clojure's + default comparator: non-namespaced keywords sort before namespaced ones, then + by namespace, then by name. + + Layout (after the type tag): a flag byte, then the parts. 
+ - no namespace : `kw-no-ns` ++ name-utf8 + - namespaced : `kw-has-ns` ++ ns-utf8 ++ 0x00 ++ name-utf8 + + The 0x00 separator can never appear inside UTF-8 keyword text (NUL is not a + legal keyword character), so it sorts below every namespace byte and cleanly + delimits namespace from name. The flag byte also keeps `(keyword nil \"a/b\")` + (no namespace, name \"a/b\") distinct from `:a/b` (namespace \"a\", name + \"b\"), which would otherwise both flatten to \"a/b\"." + ^bytes [k] + (let [out (ByteArrayOutputStream.) + ns (namespace k) + nm ^bytes (utf8 (name k))] + (if ns + (let [nsb ^bytes (utf8 ns)] + (.write out kw-has-ns) + (.write out nsb 0 (alength nsb)) + (.write out (int 0x00)) + (.write out nm 0 (alength nm))) + (do + (.write out kw-no-ns) + (.write out nm 0 (alength nm)))) + (.toByteArray out))) + +(defn encode-key + "Encodes Clojure key `k` to an order-preserving, reversible byte array." + ^bytes [k] + (cond + (string? k) + (tagged tag-string (utf8 k)) + + (keyword? k) + (tagged tag-keyword (keyword->bytes k)) + + (integer? k) + (let [n (try + (long k) + (catch IllegalArgumentException _ + (throw (IllegalArgumentException. + (str "Integer sorted-map key out of range: " k + ". Sorted collections encode integer keys as signed " + "64-bit longs, so keys must be within " + "[" Long/MIN_VALUE ", " Long/MAX_VALUE "].")))))] + (tagged tag-number + (number-body num-long (double->bytes (double n)) (long->bytes n)))) + + (float? k) + (let [d (double k)] + (when (Double/isNaN d) + (throw (IllegalArgumentException. + "NaN is not a valid sorted-map key (ordering undefined)"))) + (tagged tag-number + (number-body num-double (double->bytes d) (double->bytes d)))) + + (instance? Instant k) + (tagged tag-instant (instant->bytes k)) + + (instance? Date k) + (tagged tag-date (date->bytes k)) + + :else + (throw (IllegalArgumentException. + (str "Unsupported sorted-map key type: " (type k)))))) + +(defn- ^String utf8-body [^bytes ba] + (String. 
ba 1 (dec (alength ba)) StandardCharsets/UTF_8)) + +(defn- decode-keyword + "Inverse of `keyword->bytes`. `ba[0]` is the type tag, `ba[1]` is the + namespace-presence flag, the remainder is the part(s)." + [^bytes ba] + (let [flag (bit-and (int (aget ba 1)) 0xff)] + (if (= flag kw-no-ns) + ;; Use the 2-arg form with a nil namespace so a name containing \"/\" + ;; is not re-parsed into a namespace (which would corrupt the key). + (keyword nil (String. ba 2 (- (alength ba) 2) StandardCharsets/UTF_8)) + (let [sep (loop [i 2] + (if (zero? (aget ba i)) i (recur (inc i)))) + ns (String. ba (int 2) (int (- sep 2)) StandardCharsets/UTF_8) + nm (String. ba (int (inc sep)) (int (- (alength ba) (inc sep))) StandardCharsets/UTF_8)] + (keyword ns nm))))) + +(def key-comparator + "A `java.util.Comparator` consistent with the engine's natural ordering: + compares two keys by `Arrays.compareUnsigned` over their encoded bytes. Use + this (not `clojure.core/compare`) so `subseq`/`rsubseq` bound checks agree with + on-disk order across all supported types, including heterogeneous keys." + (reify java.util.Comparator + (compare [_ a b] + (java.util.Arrays/compareUnsigned (encode-key a) (encode-key b))))) + +(defn decode-key + "Decodes a byte array produced by `encode-key` back to the Clojure key." + [^bytes ba] + (let [tag (bit-and (int (aget ba 0)) 0xff)] + (condp = tag + tag-string (utf8-body ba) + tag-keyword (decode-keyword ba) + tag-number (decode-number ba) + tag-instant (bytes->instant ba 1) + tag-date (bytes->date ba 1) + (throw (IllegalArgumentException. (str "Unknown sorted-key tag: " tag)))))) diff --git a/src/xitdb/util/sorted_operations.clj b/src/xitdb/util/sorted_operations.clj new file mode 100644 index 0000000..147723d --- /dev/null +++ b/src/xitdb/util/sorted_operations.clj @@ -0,0 +1,255 @@ +(ns xitdb.util.sorted-operations + "Bridges the XITDBSorted* wrapper types to the Java Read/WriteSortedMap. 
+ Keys are encoded/decoded through `xitdb.util.sorted-key` (order-preserving, + reversible) rather than hashed, so the real key is recoverable on read." + (:require + [xitdb.util.conversion :as conversion] + [xitdb.util.sorted-key :as sorted-key]) + (:import + [io.github.radarroark.xitdb ReadCursor ReadCursor$KeyValuePairCursor WriteCursor ReadSortedMap WriteSortedMap ReadSortedSet WriteSortedSet])) + +(defn smap-item-count + "O(1) entry count, delegating to the rank-augmented B-tree." + [^ReadSortedMap rsm] + (.count rsm)) + +(defn- decode-key-cursor [^ReadCursor key-cursor] + (sorted-key/decode-key (.readBytes key-cursor nil))) + +(defn smap-read-cursor + "Read cursor for `key`, or nil if absent." + [^ReadSortedMap rsm key] + (.getCursor rsm (sorted-key/encode-key key))) + +(defn smap-contains-key? + [^ReadSortedMap rsm key] + (some? (smap-read-cursor rsm key))) + +(defn smap-assoc-value! + "Encodes `k`, writes value `v` at its cursor. Returns the WriteSortedMap." + [^WriteSortedMap wsm k v] + (let [value-cursor (.putCursor wsm (sorted-key/encode-key k))] + (.write value-cursor (conversion/v->slot! value-cursor v)) + wsm)) + +(defn smap-dissoc-key! + [^WriteSortedMap wsm k] + (.remove wsm (sorted-key/encode-key k)) + wsm) + +(defn smap-empty! + "Replaces contents with an empty sorted map, in place. Returns the + WriteSortedMap. Mirrors `operations/map-empty!`: writes a fresh empty + SORTED_MAP slot at the cursor so the value stays a sorted map (the + `key-comparator` makes `v->slot!` accept it as a default-ordered tree)." + [^WriteSortedMap wsm] + (let [^WriteCursor cursor (.-cursor wsm)] + (.write cursor (conversion/v->slot! cursor (sorted-map-by sorted-key/key-comparator)))) + wsm) + +(defn smap-seq + "Lazy ascending seq of key-value MapEntry pairs, or nil if empty. + `read-from-cursor` converts a value cursor to a Clojure value." 
+ [^ReadSortedMap rsm read-from-cursor] + (let [it (.iterator rsm)] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (let [cursor (.next it) + kv (.readKeyValuePair cursor) + k (decode-key-cursor (.-keyCursor kv)) + v (read-from-cursor (.-valueCursor kv))] + (cons (clojure.lang.MapEntry. k v) (step))))))] + (step))))) + +(defn- kvpair->entry + "Turns a Java KeyValuePair (with .-keyCursor/.-valueCursor) into a Clojure + MapEntry (decoded key, read value)." + [^ReadCursor$KeyValuePairCursor kv read-from-cursor] + (clojure.lang.MapEntry. + (decode-key-cursor (.-keyCursor kv)) + (read-from-cursor (.-valueCursor kv)))) + +(defn smap-seq-from + "Lazy ascending seq of MapEntry pairs starting at the first key >= `key`, + using the engine's native O(log n) lower-bound seek. nil if none." + [^ReadSortedMap rsm read-from-cursor key] + (let [it (.iteratorFrom rsm (sorted-key/encode-key key))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (kvpair->entry (.readKeyValuePair (.next it)) + read-from-cursor) + (step)))))] + (step))))) + +(defn smap-nth + "MapEntry at rank `index` (negative counts from the end), or `not-found` when + out of range. O(log n) via the rank-augmented B-tree." + [^ReadSortedMap rsm read-from-cursor index not-found] + (let [kv (.getIndexKeyValuePair rsm (long index))] + (if (nil? kv) + not-found + (kvpair->entry kv read-from-cursor)))) + +(defn smap-rank + "Number of keys strictly less than `key`. O(log n)." + [^ReadSortedMap rsm key] + (.rank rsm (sorted-key/encode-key key))) + +(defn smap-seq-from-index + "Lazy ascending seq of MapEntry pairs starting at rank `index` (0-based), + using the engine's native O(log n) `iteratorFromIndex` seek. nil if none. + Streams one entry at a time; does not materialise the whole collection." 
+ [^ReadSortedMap rsm read-from-cursor index] + (let [it (.iteratorFromIndex rsm (long index))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (kvpair->entry (.readKeyValuePair (.next it)) + read-from-cursor) + (step)))))] + (step))))) + +(defn smap-rseq + "Lazy descending seq of MapEntry pairs, walking `getIndexKeyValuePair` from + index `start` down to 0. Stays low-memory (one entry materialised at a time)." + ([^ReadSortedMap rsm read-from-cursor] + (smap-rseq rsm read-from-cursor (dec (.count rsm)))) + ([^ReadSortedMap rsm read-from-cursor start] + (when (>= start 0) + (letfn [(step [i] + (lazy-seq + (when (>= i 0) + (let [kv (.getIndexKeyValuePair rsm (long i))] + (when kv + (cons (kvpair->entry kv read-from-cursor) (step (dec i))))))))] + (step start))))) + +;; --------------------------------------------------------------------------- +;; Sorted SET helpers. A SORTED_SET is a SortedMap with no values: the MEMBER +;; is the key. We decode the member from the key cursor of each entry. +;; --------------------------------------------------------------------------- + +(defn sset-item-count + "O(1) member count." + [^ReadSortedSet rss] + (.count rss)) + +(defn sset-contains? + [^ReadSortedSet rss member] + (.contains rss (sorted-key/encode-key member))) + +(defn sset-assoc-value! + "Adds `member` to the set (no-op if already present). Returns the WriteSortedSet." + [^WriteSortedSet wss member] + (.put wss (sorted-key/encode-key member)) + wss) + +(defn sset-disj-value! + "Removes `member` from the set (no-op if absent). Returns the WriteSortedSet." + [^WriteSortedSet wss member] + (.remove wss (sorted-key/encode-key member)) + wss) + +(defn sset-empty! + "Replaces contents with an empty sorted set, in place. Returns the + WriteSortedSet. Mirrors `operations/set-empty!`: writes a fresh empty + SORTED_SET slot at the cursor so the value stays a sorted set." 
+ [^WriteSortedSet wss] + (let [^WriteCursor cursor (.-cursor wss)] + (.write cursor (conversion/v->slot! cursor (sorted-set-by sorted-key/key-comparator)))) + wss) + +(defn- member-from-cursor + "Decodes the member from a set entry cursor (its key cursor)." + [^ReadCursor cursor] + (decode-key-cursor (.-keyCursor (.readKeyValuePair cursor)))) + +(defn- kvpair->member + "Decodes the member from a Java KeyValuePair (its key cursor)." + [^ReadCursor$KeyValuePairCursor kv] + (decode-key-cursor (.-keyCursor kv))) + +(defn sset-seq + "Lazy ascending seq of members, or nil if empty." + [^ReadSortedSet rss] + (let [it (.iterator rss)] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + +(defn sset-seq-from + "Lazy ascending seq of members starting at the first member >= `member`, + using the engine's native O(log n) lower-bound seek. nil if none." + [^ReadSortedSet rss member] + (let [it (.iteratorFrom rss (sorted-key/encode-key member))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + +(defn sset-nth + "Member at rank `index` (negative counts from the end), or `not-found` when + out of range. O(log n)." + [^ReadSortedSet rss index not-found] + (let [kv (.getIndexKeyValuePair rss (long index))] + (if (nil? kv) + not-found + (kvpair->member kv)))) + +(defn sset-rank + "Number of members strictly less than `member`. O(log n)." + [^ReadSortedSet rss member] + (.rank rss (sorted-key/encode-key member))) + +(defn sset-seq-from-index + "Lazy ascending seq of members starting at rank `index` (0-based), using the + engine's native O(log n) `iteratorFromIndex` seek. nil if none. Streams one + member at a time; does not materialise the whole collection." 
+ [^ReadSortedSet rss index] + (let [it (.iteratorFromIndex rss (long index))] + (when (.hasNext it) + (letfn [(step [] + (lazy-seq + (when (.hasNext it) + (cons (member-from-cursor (.next it)) (step)))))] + (step))))) + +(defn sset-rseq + "Lazy descending seq of members, walking `getIndexKeyValuePair` from index + `start` down to 0. Low-memory (one member at a time)." + ([^ReadSortedSet rss] + (sset-rseq rss (dec (.count rss)))) + ([^ReadSortedSet rss start] + (when (>= start 0) + (letfn [(step [i] + (lazy-seq + (when (>= i 0) + (let [kv (.getIndexKeyValuePair rss (long i))] + (when kv + (cons (kvpair->member kv) (step (dec i))))))))] + (step start))))) + +(defn smap-kv-reduce + [^ReadSortedMap rsm read-from-cursor f init] + (let [it (.iterator rsm)] + (loop [result init] + (if (.hasNext it) + (let [cursor (.next it) + kv (.readKeyValuePair cursor) + k (decode-key-cursor (.-keyCursor kv)) + v (read-from-cursor (.-valueCursor kv)) + new-result (f result k v)] + (if (reduced? new-result) + @new-result + (recur new-result))) + result)))) diff --git a/src/xitdb/xitdb_types.clj b/src/xitdb/xitdb_types.clj index 2e1c2d4..b011b11 100644 --- a/src/xitdb/xitdb_types.clj +++ b/src/xitdb/xitdb_types.clj @@ -5,6 +5,8 @@ [xitdb.hash-map :as xhash-map] [xitdb.hash-set :as xhash-set] [xitdb.linked-list :as xlinked-list] + [xitdb.sorted-map :as xsorted-map] + [xitdb.sorted-set :as xsorted-set] [xitdb.util.conversion :as conversion]) (:import [io.github.radarroark.xitdb ReadCursor Slot Tag WriteCursor])) @@ -51,6 +53,16 @@ (xhash-set/xwrite-hash-set-counted cursor) (xhash-set/xhash-set-counted cursor)) + (= value-tag Tag/SORTED_MAP) + (if for-writing? + (xsorted-map/xwrite-sorted-map cursor) + (xsorted-map/xsorted-map cursor)) + + (= value-tag Tag/SORTED_SET) + (if for-writing? + (xsorted-set/xwrite-sorted-set cursor) + (xsorted-set/xsorted-set cursor)) + (= value-tag Tag/ARRAY_LIST) (if for-writing? 
(xarray-list/xwrite-array-list cursor) diff --git a/test/xitdb/cursor_test.clj b/test/xitdb/cursor_test.clj index acbe375..9473093 100644 --- a/test/xitdb/cursor_test.clj +++ b/test/xitdb/cursor_test.clj @@ -28,3 +28,34 @@ (testing "Correctly handles invalid cursor path" (is (thrown? IndexOutOfBoundsException @(xdb/xdb-cursor db [:foo :bar 999]))))))) + +(deftest cursor-into-sorted-map + (with-open [db (xdb/xit-db :memory)] + (reset! db {:idx (sorted-map 1 {:name "a"} 2 {:name "b"})}) + (let [c (xdb/xdb-cursor db [:idx 1 :name])] + (testing "read through a sorted-map key" + (is (= "a" @c))) + (testing "reset! through a sorted-map key writes back to the db" + (reset! c "A") + (is (= "A" @c)) + (is (= "A" (get-in (xdb/materialize @db) [:idx 1 :name]))) + (is (= "b" (get-in (xdb/materialize @db) [:idx 2 :name]))))))) + +(deftest cursor-into-sorted-set + (with-open [db (xdb/xit-db :memory)] + (reset! db {:tags (sorted-set "a" "b" "c")}) + (let [c (xdb/xdb-cursor db [:tags])] + (testing "read a sorted set through a cursor" + (is (= ["a" "b" "c"] (seq (xdb/materialize @c))))) + (testing "swap! mutates the sorted set at the cursor" + (swap! c conj "d") + (is (= ["a" "b" "c" "d"] (seq (xdb/materialize @c)))))))) + +(deftest cursor-into-sorted-set-member-is-rejected + (with-open [db (xdb/xit-db :memory)] + (reset! db {:tags (sorted-set "a" "b" "c")}) + (testing "writing into a sorted-set member throws a clear, specific error + (members are immutable keys; use conj/disj on the set itself)" + (let [c (xdb/xdb-cursor db [:tags "a"]) + ex (is (thrown? IllegalArgumentException (reset! 
c "z")))] + (is (re-find #"sorted-set member" (.getMessage ex))))))) diff --git a/test/xitdb/sorted_key_test.clj b/test/xitdb/sorted_key_test.clj new file mode 100644 index 0000000..60bfebe --- /dev/null +++ b/test/xitdb/sorted_key_test.clj @@ -0,0 +1,217 @@ +(ns xitdb.sorted-key-test + (:require + [clojure.test :refer :all] + [xitdb.util.sorted-key :as sk]) + (:import + [java.time Instant] + [java.util Date])) + +(defn cmp-unsigned [^bytes a ^bytes b] + (java.util.Arrays/compareUnsigned a b)) + +(deftest string-roundtrip + (testing "strings encode and decode back to the same string" + (doseq [s ["" "a" "hello" "with spaces" "unicode-é-字"]] + (is (= s (sk/decode-key (sk/encode-key s))))))) + +(deftest keyword-roundtrip + (testing "keywords round-trip, including namespaced" + (doseq [k [:a :foo/bar :a-much-longer-keyword]] + (is (= k (sk/decode-key (sk/encode-key k))))))) + +(deftest keyword-order-matches-clojure + (testing "byte order matches Clojure's default keyword comparator: + non-namespaced keywords sort before namespaced ones" + (doseq [[a b] [[:a :aa] [:aa :b] + ;; every non-namespaced keyword sorts before any namespaced + [:b :a/a] [:zzz :a/a] + ;; among namespaced: by namespace then name + [:a/a :a/b] [:a/x :ab/a] [:a/b :b/a]]] + (is (neg? 
(cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b)) + ;; and consistent with clojure.core/compare on keywords + (is (= (Integer/signum (compare a b)) + (Integer/signum (cmp-unsigned (sk/encode-key a) (sk/encode-key b)))) + (str "order-agrees " a " " b))))) + +(deftest keyword-namespace-no-collision + (testing "(keyword nil \"a/b\") and :a/b are distinct keys that both round-trip" + (let [k1 (keyword nil "a/b") ;; no namespace, name contains a slash + k2 :a/b] ;; namespace \"a\", name \"b\" + (is (not= k1 k2)) + (is (= k1 (sk/decode-key (sk/encode-key k1)))) + (is (= k2 (sk/decode-key (sk/encode-key k2)))) + (is (not (java.util.Arrays/equals (sk/encode-key k1) (sk/encode-key k2))) + "encodings must differ so the keys do not collide on disk")))) + +(deftest string-order-preserved + (testing "byte order matches code-point order for strings" + (doseq [[a b] [["a" "b"] ["a" "ab"] ["abc" "abd"] ["" "a"] ["k0009" "k0010"]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +(deftest long-roundtrip + (testing "longs round-trip, including boundary values" + (doseq [n [0 1 -1 42 -42 Long/MIN_VALUE Long/MAX_VALUE + (long Integer/MIN_VALUE) (long Integer/MAX_VALUE)]] + (is (= n (sk/decode-key (sk/encode-key n))) + (str "roundtrip " n))))) + +(deftest long-order-preserved + (testing "byte order matches numeric order, negatives before positives" + (doseq [[a b] [[1 2] [9 10] [-5 0] [-5 3] [0 3] + [Long/MIN_VALUE -1] [Long/MIN_VALUE Long/MAX_VALUE] + [-1 0] [0 Long/MAX_VALUE]]] + (is (neg? 
(cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +(deftest double-roundtrip + (testing "doubles round-trip, including extremes" + (doseq [d [0.0 1.0 -1.0 3.14 -3.14 1.0e308 -1.0e308 1.0e-308 -1.0e-308 + Double/MIN_VALUE Double/MAX_VALUE]] + (is (= d (sk/decode-key (sk/encode-key d))) + (str "roundtrip " d))))) + +(deftest double-order-preserved + (testing "byte order matches numeric order across sign and magnitude" + (doseq [[a b] [[1.0 2.0] [-1.0 0.0] [-2.0 -1.0] [-1.0e308 1.0e308] + [0.0 1.0e-308] [-3.14 3.14] [-1.0e-308 0.0]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +(deftest date-roundtrip + (testing "dates round-trip to Date" + (doseq [d [(Date. 0) (Date. 1719100000000) (Date. -100000) (Date.)]] + (is (= d (sk/decode-key (sk/encode-key d))) + (str "roundtrip " d)) + (is (instance? Date (sk/decode-key (sk/encode-key d))))))) + +(deftest date-order-preserved + (testing "byte order matches chronological order, including pre-epoch" + (doseq [[a b] [[(Date. 0) (Date. 1)] + [(Date. -5000) (Date. 0)] + [(Date. 1000) (Date. 2000)] + [(Date. -2000) (Date. -1000)]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) + +;; ----- property-based ordering (deterministic randomized loops, fixed seed) --- + +(defn- order-agrees? [a b] + (= (Integer/signum (compare a b)) + (Integer/signum (cmp-unsigned (sk/encode-key a) (sk/encode-key b))))) + +(deftest cross-type-numeric-order + (testing "longs and doubles interleave by numeric value, not by type tag" + (doseq [[a b] [[1 1.5] [1.5 2] [1 2.0] [2.0 3] + [-1 -0.5] [-1.5 -1] [-1.0 0] [0 0.5] + ;; magnitudes that exercise the gap between the two old tags + [3 3.5] [-3.5 -3] + ;; a long below and a double above zero and vice-versa + [-2 1.0] [-1.5 2]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b)) + (is (pos? 
(cmp-unsigned (sk/encode-key b) (sk/encode-key a))) + (str b " > " a))))) + +(deftest prop-cross-type-numeric-order + (testing "for 4000 random long/double pairs of distinct value, byte order + agrees in sign with clojure.core/compare (the numeric order)" + (let [r (java.util.Random. 77) + rand-num (fn [] (if (.nextBoolean r) + (long (.nextInt r 2000000)) ;; small long + (* (.nextDouble r) 1.0e6 + (if (.nextBoolean r) 1.0 -1.0))))] ;; small double + (is (every? + (fn [_] + (let [a (rand-num) b (rand-num) + c (compare a b)] + (or (zero? c) ;; numerically equal (e.g. 1 vs 1.0): order is unspecified + (= (Integer/signum c) + (Integer/signum (cmp-unsigned (sk/encode-key a) + (sk/encode-key b))))))) + (range 4000)))))) + +(deftest prop-long-order + (testing "for 2000 random long pairs, byte order == numeric order" + (let [r (java.util.Random. 42)] + (is (every? (fn [_] (order-agrees? (.nextLong r) (.nextLong r))) + (range 2000)))))) + +(deftest prop-double-order + (testing "for 2000 random finite double pairs, byte order == numeric order" + (let [r (java.util.Random. 43) + rand-d (fn [] (* (.nextDouble r) + (Math/pow 10 (- (.nextInt r 40) 20)) + (if (.nextBoolean r) 1.0 -1.0)))] + (is (every? (fn [_] (order-agrees? (rand-d) (rand-d))) + (range 2000)))))) + +(deftest prop-instant-order + (testing "for 2000 random instant pairs, byte order == chronological order" + (let [r (java.util.Random. 44) + rand-i (fn [] (Instant/ofEpochSecond + (- (.nextLong (java.util.Random. (.nextLong r)) + 4000000000) 2000000000) + (.nextInt r 1000000000)))] + (is (every? (fn [_] (order-agrees? (rand-i) (rand-i))) + (range 2000)))))) + +(deftest prop-roundtrip + (testing "random keys of every type round-trip exactly" + (let [r (java.util.Random. 45)] + (is (every? + (fn [_] + (let [k (case (.nextInt r 5) + 0 (.nextLong r) + 1 (* (.nextDouble r) (if (.nextBoolean r) 1e9 -1e-9)) + 2 (Instant/ofEpochSecond (.nextInt r 2000000000) + (.nextInt r 1000000000)) + 3 (Date. 
(long (.nextInt r 2000000000))) + 4 (str "k" (.nextInt r 100000)))] + (= k (sk/decode-key (sk/encode-key k))))) + (range 2000)))))) + +(deftest cross-type-never-throws + (testing "encoding any supported type and comparing across types never throws" + (let [vals [0 -1 Long/MAX_VALUE 3.14 -2.0 "abc" :kw + (Instant/ofEpochSecond 5) (Date. 1000)] + encoded (map sk/encode-key vals)] + (doseq [a encoded b encoded] + (is (integer? (cmp-unsigned a b))))))) + +(deftest unsupported-key-throws + (is (thrown? IllegalArgumentException (sk/encode-key nil))) + (is (thrown? IllegalArgumentException (sk/encode-key true))) + (is (thrown? IllegalArgumentException (sk/encode-key Double/NaN)))) + +(deftest out-of-range-integer-key-throws-clearly + (testing "an integer key past the 64-bit long range is rejected eagerly with a + clear, key-specific message rather than a raw numeric-cast error" + (doseq [big [(.pow (java.math.BigInteger. "2") 100) + (.negate (.pow (java.math.BigInteger. "2") 100)) + (bigint "99999999999999999999999")]] + (let [ex (is (thrown? IllegalArgumentException (sk/encode-key big)))] + (is (re-find #"(?i)key" (.getMessage ex)) + "message should mention it is about a key") + (is (re-find #"(?i)long" (.getMessage ex)) + "message should mention the long range"))))) + +(deftest instant-roundtrip + (testing "instants round-trip to Instant, preserving sub-second precision" + (doseq [i [(Instant/ofEpochSecond 0) + (Instant/ofEpochSecond 1719100000 123456789) + (Instant/ofEpochSecond -100 500) + Instant/EPOCH]] + (is (= i (sk/decode-key (sk/encode-key i))) + (str "roundtrip " i)) + (is (instance? Instant (sk/decode-key (sk/encode-key i))))))) + +(deftest instant-order-preserved + (testing "byte order matches chronological order, incl. 
negative epoch & nanos" + (doseq [[a b] [[(Instant/ofEpochSecond 0) (Instant/ofEpochSecond 1)] + [(Instant/ofEpochSecond -5) (Instant/ofEpochSecond 0)] + [(Instant/ofEpochSecond 10 100) (Instant/ofEpochSecond 10 200)] + [(Instant/ofEpochSecond 10 999999999) (Instant/ofEpochSecond 11 0)]]] + (is (neg? (cmp-unsigned (sk/encode-key a) (sk/encode-key b))) + (str a " < " b))))) diff --git a/test/xitdb/sorted_map_test.clj b/test/xitdb/sorted_map_test.clj new file mode 100644 index 0000000..77a0fa7 --- /dev/null +++ b/test/xitdb/sorted_map_test.clj @@ -0,0 +1,350 @@ +(ns xitdb.sorted-map-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant] + [java.util Date])) + +(deftest lookups-and-count + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2 "c" 3)) + (let [m @db] + (testing "get / invoke / find / contains?" + (is (= 1 (get m "a"))) + (is (= 2 (m "b"))) + (is (= ::nf (get m "z" ::nf))) + (is (true? (contains? m "c"))) + (is (false? (contains? m "z"))) + (is (= (clojure.lang.MapEntry. "a" 1) (find m "a"))) + (is (nil? (find m "z")))) + (testing "count is correct" + (is (= 3 (count m))))))) + +(deftest mutation-keeps-order + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "d" 4)) + (testing "assoc inserts in order" + (swap! db assoc "c" 3) + (swap! db assoc "a" 1) + (is (= ["a" "b" "c" "d"] (map key (seq @db))))) + (testing "dissoc removes and preserves order" + (swap! db dissoc "b") + (is (= ["a" "c" "d"] (map key (seq @db))))) + (testing "re-assoc replaces value without changing count" + (swap! db assoc "c" 30) + (is (= 3 (count @db))) + (is (= 30 (get @db "c")))))) + +(deftest keyword-keys + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map :banana 2 :apple 1 :cherry 3)) + (testing "keyword keys round-trip as keywords, in sorted order" + (is (= [:apple :banana :cherry] (map key (seq @db)))) + (is (every? keyword? 
(map key (seq @db)))) + (is (= 1 (get @db :apple)))))) + +(deftest namespaced-keyword-keys-match-clojure-order + (with-open [db (xdb/xit-db :memory)] + (let [oracle (sorted-map :b 2 :a/a 3 :a 1 :aa 4)] + (reset! db oracle) + (testing "namespaced keywords sort like Clojure's default comparator + (non-namespaced before namespaced), not as flattened strings" + (is (= (keys oracle) (map key (seq @db)))) + (is (= [:a :aa :b :a/a] (map key (seq @db))))) + (testing "subseq agrees with the Clojure oracle" + (is (= (vec (subseq oracle >= :aa)) + (vec (subseq @db >= :aa))))) + (testing "values round-trip under namespaced keys" + (is (= 3 (get @db :a/a))))))) + +(deftest heterogeneous-keys-materialize-and-print + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map)) + (swap! db assoc 1 :one) + (swap! db assoc "x" :ex) + (testing "seq works with mixed key types" + (is (= [1 "x"] (map key (seq @db))))) + (testing "materialize does not throw on mixed key types" + (let [m (tu/materialize @db)] + (is (sorted? m)) + (is (= [1 "x"] (keys m))) + (is (= {1 :one "x" :ex} (into {} m))))) + (testing "pr-str does not throw on mixed key types" + (is (string? (pr-str @db)))))) + +(deftest materialized-sorted-map-can-be-written-back + (testing "a materialized sorted map (which carries key-comparator) can be + stored into another db without being rejected as a custom comparator" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! db1 (sorted-map "b" 2 "a" 1)) + (let [m (tu/materialize @db1)] + (reset! db2 m) + (is (= ["a" "b"] (map key (seq @db2)))) + (is (= 1 (get @db2 "a")))))) + (testing "round-trips through materialize even with heterogeneous keys" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! db1 (sorted-map)) + (swap! db1 assoc 1 :one) + (swap! db1 assoc "x" :ex) + (let [m (tu/materialize @db1)] + (reset! 
db2 m) + (is (= [1 "x"] (map key (seq @db2)))))))) + +(deftest materialize-returns-plain-sorted-map + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (let [m (tu/materialize @db)] + (is (sorted? m)) + (is (not (instance? xitdb.sorted_map.XITDBSortedMap m))) + (is (= ["a" "b"] (keys m))) + (is (= {"a" 1 "b" 2} m))))) + +(deftest read-only-ops-return-plain-collections + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (let [m @db] + (testing "assoc outside a transaction returns a plain sorted map" + (let [r (assoc m "c" 3)] + (is (not (instance? xitdb.sorted_map.XITDBSortedMap r))) + (is (sorted? r)) + (is (= ["a" "b" "c"] (keys r))))) + (testing "dissoc outside a transaction returns a plain sorted map" + (let [r (dissoc m "a")] + (is (not (instance? xitdb.sorted_map.XITDBSortedMap r))) + (is (sorted? r)) + (is (= ["b"] (keys r)))))))) + +(deftest custom-comparator-rejected + (with-open [db (xdb/xit-db :memory)] + (is (thrown? IllegalArgumentException + (reset! db (sorted-map-by > 1 :a 2 :b)))))) + +(deftest nesting-and-complex-values + (testing "sorted map nests inside a hash map value" + (with-open [db (xdb/xit-db :memory)] + (reset! db {:idx (sorted-map "b" 2 "a" 1)}) + (is (instance? xitdb.sorted_map.XITDBSortedMap (:idx @db))) + (is (= ["a" "b"] (map key (seq (:idx @db))))))) + (testing "nested sorted map round-trips against an in-memory atom" + (with-db [db (tu/test-db)] + (reset! db {:idx (sorted-map "b" 2 "a" 1)}) + (is (tu/db-equal-to-atom? db)))) + (testing "sorted map values may be vectors, maps and sets" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "vec" [1 2 3] + "map" {:x 1} + "set" #{:a :b})) + (is (= [1 2 3] (tu/materialize (get @db "vec")))) + (is (= {:x 1} (tu/materialize (get @db "map")))) + (is (= #{:a :b} (tu/materialize (get @db "set")))))) + (testing "a sorted map nested directly inside a vector stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db [(sorted-map 3 :c 1 :a 2 :b)]) + (let [m (first (seq @db))] + (is (instance? xitdb.sorted_map.XITDBSortedMap m)) + (is (sorted? m)) + (is (= [1 2 3] (map key (seq m))))))) + (testing "a sorted map nested inside a list stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! db (list (sorted-map 3 :c 1 :a 2 :b))) + (let [m (first (seq @db))] + (is (instance? xitdb.sorted_map.XITDBSortedMap m)) + (is (= [1 2 3] (map key (seq m)))))))) + +(deftest empty-clears-map + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (swap! db empty) + (is (= 0 (count @db))) + (is (empty? (seq @db))) + (swap! db assoc "c" 3) + (is (= ["c"] (map key (seq @db)))))) + +(deftest empty-then-reassoc-stays-a-sorted-map + (testing "after (swap! db empty) the value is still a sorted map, so keys + re-inserted afterwards keep sorted (not hash-map) semantics" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "a" 1 "b" 2)) + (swap! db empty) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (sorted? @db)) + (is (= 0 (count @db))) + (swap! db assoc "c" 3 "a" 1) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (sorted? @db)) + (is (= ["a" "c"] (map key (seq @db))))))) + +(deftest print-method-ordered + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (let [s (pr-str @db)] + (is (clojure.string/starts-with? s "#XITDBSortedMap")) + (is (clojure.string/includes? s "\"a\" 1, \"b\" 2"))))) + +(deftest sorted-predicate-and-comparator + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map 3 :c 1 :a 2 :b)) + (testing "sorted? is true for a persisted sorted map" + (is (sorted? @db))) + (testing "comparator is consistent with iteration order" + (let [^java.util.Comparator c (.comparator ^clojure.lang.Sorted @db)] + (is (neg? (.compare c 1 2))) + (is (pos? (.compare c 2 1))) + (is (zero? (.compare c 2 2))) + ;; cross-type bound checks must agree with the engine (not core/compare) + (is (neg? 
(.compare c 5 "x"))))))) + +(deftest nth-indexed + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 20)) (range 20)))] + (reset! db oracle) + (let [m @db + ov (vec oracle)] + (testing "nth by positive index matches the oracle's entry at that rank" + (doseq [i (range 20)] + (is (= (nth ov i) (nth m i)) (str "nth " i)))) + (testing "negative index counts from the end (-1 = last)" + (is (= (last ov) (nth m -1))) + (is (= (nth ov 18) (nth m -2)))) + (testing "out-of-range nth/2 returns not-found" + (is (= ::nf (nth m 100 ::nf))) + (is (= ::nf (nth m -100 ::nf)))) + (testing "out-of-range nth/1 throws like a vector" + (is (thrown? IndexOutOfBoundsException (nth m 100)))))))) + +(deftest subseq-matches-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 0 40 2)) (range)))] + (reset! db oracle) + (let [m @db] + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound subseq at " k) + (is (= (subseq oracle >= k) (subseq m >= k)) (str ">= " k)) + (is (= (subseq oracle > k) (subseq m > k)) (str "> " k)) + (is (= (subseq oracle <= k) (subseq m <= k)) (str "<= " k)) + (is (= (subseq oracle < k) (subseq m < k)) (str "< " k)))) + (testing "two-bound subseq" + (is (= (subseq oracle >= 10 <= 30) (subseq m >= 10 <= 30))) + (is (= (subseq oracle > 10 < 30) (subseq m > 10 < 30))) + (is (= (subseq oracle >= 11 <= 29) (subseq m >= 11 <= 29)))))))) + +(deftest rseq-and-rsubseq-match-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (shuffle (range 0 40 2)) (range)))] + (reset! 
db oracle) + (let [m @db] + (testing "rseq is the full descending sequence" + (is (= (rseq oracle) (rseq m)))) + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound rsubseq at " k) + (is (= (rsubseq oracle >= k) (rsubseq m >= k)) (str ">= " k)) + (is (= (rsubseq oracle > k) (rsubseq m > k)) (str "> " k)) + (is (= (rsubseq oracle <= k) (rsubseq m <= k)) (str "<= " k)) + (is (= (rsubseq oracle < k) (rsubseq m < k)) (str "< " k)))) + (testing "two-bound rsubseq" + (is (= (rsubseq oracle >= 10 <= 30) (rsubseq m >= 10 <= 30))) + (is (= (rsubseq oracle > 10 < 30) (rsubseq m > 10 < 30))) + (is (= (rsubseq oracle >= 11 <= 29) (rsubseq m >= 11 <= 29)))))))) + +(deftest empty-sorted-map-range-queries + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map)) + (let [m @db] + (testing "range queries on an empty (none-cursor) sorted map yield nothing" + (is (nil? (seq m))) + (is (nil? (rseq m))) + (is (empty? (subseq m >= 5))) + (is (empty? (subseq m < 5))) + (is (empty? (rsubseq m >= 5))) + (is (empty? (rsubseq m <= 5))) + (is (= ::nf (nth m 0 ::nf))))))) + +(deftest numeric-keys-iterate-numerically + (testing "long keys iterate in numeric, not lexical, order" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map 9 :a 10 :b 1 :c)) + (is (= [1 9 10] (map key (seq @db)))) + (is (= [:c :a :b] (map val (seq @db)))))) + (testing "negative and positive longs sort together, incl. extremes" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) + (map vector [3 -5 0 Long/MIN_VALUE Long/MAX_VALUE] + (range)))) + (is (= [Long/MIN_VALUE -5 0 3 Long/MAX_VALUE] (map key (seq @db)))))) + (testing "double keys sort numerically, incl. negatives and zero" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-map 3.5 :a -1.5 :b 0.0 :c 1.0e308 :d -1.0e308 :e)) + (is (= [-1.0e308 -1.5 0.0 3.5 1.0e308] (map key (seq @db))))))) + +(deftest mixed-long-double-keys-interleave-numerically + (testing "long and double keys sort by numeric value and round-trip with type" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) + (map vector [1 0.5 2 1.5 3 -1.5] (range)))) + (is (= [-1.5 0.5 1 1.5 2 3] (map key (seq @db)))) + (is (some #(instance? Double %) (map key (seq @db)))) + (is (some integer? (map key (seq @db)))))) + (testing "matches the in-memory Clojure sorted-map oracle for mixed keys" + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) + (map vector [10 2.5 7 0.25 -3 -3.5 100.0 4] (range)))] + (reset! db oracle) + (is (= (keys oracle) (map key (seq @db))))))) + (testing "a double bound queries a long-keyed map at the right place (subseq)" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector [1 2 3 4 5] (range)))) + (let [m @db] + (is (= [2 3 4 5] (map key (subseq m >= 1.5)))) + (is (= [1 2 3] (map key (subseq m <= 3.5)))) + (is (= [3 4] (map key (subseq m > 2.5 < 4.5)))))))) + +(deftest temporal-keys-iterate-chronologically + (testing "Instant keys iterate chronologically and round-trip to Instant" + (with-open [db (xdb/xit-db :memory)] + (let [t0 (Instant/ofEpochSecond 100) + t1 (Instant/ofEpochSecond 200 500) + t2 (Instant/ofEpochSecond 200 999)] + (reset! db (sorted-map t2 :c t0 :a t1 :b)) + (is (= [t0 t1 t2] (map key (seq @db)))) + (is (every? #(instance? Instant %) (map key (seq @db))))))) + (testing "Date keys iterate chronologically and round-trip to Date" + (with-open [db (xdb/xit-db :memory)] + (let [d0 (Date. 0) d1 (Date. 1000) d2 (Date. 2000)] + (reset! db (sorted-map d2 :c d0 :a d1 :b)) + (is (= [d0 d1 d2] (map key (seq @db)))) + (is (every? #(instance? 
Date %) (map key (seq @db)))))))) + +(deftest write-view-supports-sorted-indexed-reversible + (testing "the writeable sorted map handed to swap! supports nth/subseq/rseq + and exposes the same comparator as the read view" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 5) (range 5)))) + (swap! db + (fn [m] + (is (= (clojure.lang.MapEntry. 0 0) (nth m 0))) + (is (= 2 (key (nth m 2)))) + (is (= ::nf (nth m 99 ::nf))) + (is (= [2 3 4] (map key (subseq m >= 2)))) + (is (= [0 1] (map key (subseq m < 2)))) + (is (= [4 3 2 1 0] (map key (rseq m)))) + (is (instance? java.util.Comparator + (.comparator ^clojure.lang.Sorted m))) + m)) + (testing "the data is unchanged after read-only queries in the txn" + (is (= [0 1 2 3 4] (map key (seq @db)))))))) + +(deftest tracer-bullet-ordered-seq + (testing "a persisted sorted-map is stored as a sorted map and seqs in key order" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-map "b" 2 "a" 1)) + (is (instance? xitdb.sorted_map.XITDBSortedMap @db)) + (is (= [["a" 1] ["b" 2]] (map (juxt key val) (seq @db)))))) + (testing "ordering holds for many keys regardless of insertion order" + (with-open [db (xdb/xit-db :memory)] + (let [ks (map #(format "k%04d" %) (shuffle (range 50)))] + (reset! db (into (sorted-map) (map vector ks (range)))) + (is (= (sort ks) (map key (seq @db)))))))) diff --git a/test/xitdb/sorted_pagination_test.clj b/test/xitdb/sorted_pagination_test.clj new file mode 100644 index 0000000..b9d7ee9 --- /dev/null +++ b/test/xitdb/sorted_pagination_test.clj @@ -0,0 +1,136 @@ +(ns xitdb.sorted-pagination-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.sorted :as xsorted] + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant])) + +(deftest rank-on-sorted-map + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (range 0 40 2) (range)))] + (reset! 
db oracle) + (let [m @db] + (testing "rank of a present key is its index" + (doseq [[i k] (map-indexed vector (keys oracle))] + (is (= i (xsorted/rank m k)) (str "rank of present " k)))) + (testing "rank of an absent key is its would-be insertion index" + (is (= 0 (xsorted/rank m -1))) + (is (= 1 (xsorted/rank m 1))) + (is (= 20 (xsorted/rank m 100)))))))) + +(deftest rank-on-sorted-set + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (range 0 40 2))] + (reset! db oracle) + (let [s @db] + (testing "rank of a present member is its index" + (doseq [[i k] (map-indexed vector (seq oracle))] + (is (= i (xsorted/rank s k)) (str "rank of present " k)))) + (testing "rank of an absent member is its would-be insertion index" + (is (= 0 (xsorted/rank s -1))) + (is (= 1 (xsorted/rank s 1))) + (is (= 20 (xsorted/rank s 100)))))))) + +(deftest rank-and-nth-are-inverses + (testing "on a sorted map: (= i (rank m (key (nth m i))))" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (shuffle (range 30)) (range)))) + (let [m @db] + (doseq [i (range (count m))] + (is (= i (xsorted/rank m (key (nth m i)))) (str "i=" i)))))) + (testing "on a sorted set: (= i (rank s (nth s i)))" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (shuffle (range 30)))) + (let [s @db] + (doseq [i (range (count s))] + (is (= i (xsorted/rank s (nth s i))) (str "i=" i))))))) + +(deftest pagination-on-map + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-map) (map vector (range 0 40 2) (range)))] + (reset! 
db oracle) + (let [m @db + ov (vec oracle)] + (testing "page returns the correct ordered window" + (is (= (subvec ov 5 10) (xsorted/page m 5 5))) + (is (= (take 3 ov) (xsorted/page m 0 3)))) + (testing "page stops cleanly at the end of the collection" + (is (= (subvec ov 18 20) (xsorted/page m 18 5))) + (is (= 2 (count (xsorted/page m 18 100))))) + (testing "from-index streams from a rank to the end" + (is (= (subvec ov 17 20) (xsorted/from-index m 17)))))))) + +(deftest pagination-on-set + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (range 0 40 2))] + (reset! db oracle) + (let [s @db + ov (vec oracle)] + (testing "page returns the correct ordered window of members" + (is (= (subvec ov 5 10) (xsorted/page s 5 5))) + (is (= (take 3 ov) (xsorted/page s 0 3)))) + (testing "page stops cleanly at the end of the collection" + (is (= (subvec ov 18 20) (xsorted/page s 18 5))) + (is (= 2 (count (xsorted/page s 18 100))))) + (testing "from-index streams from a rank to the end" + (is (= (subvec ov 17 20) (xsorted/from-index s 17)))))))) + +(deftest negative-offset-rejected + (testing "from-index/page reject a negative rank eagerly, not on realisation" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 10) (range)))) + (let [m @db] + (is (thrown? IllegalArgumentException (xsorted/from-index m -1))) + (is (thrown? IllegalArgumentException (xsorted/page m -1 5))))) + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 10))) + (let [s @db] + (is (thrown? IllegalArgumentException (xsorted/from-index s -1))) + (is (thrown? IllegalArgumentException (xsorted/page s -1 5))))))) + +(deftest pagination-is-lazy + (testing "from-index returns a lazy seq and does not realise a large collection" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-map) (map vector (range 2000) (range)))) + (let [m @db + p (xsorted/page m 0 5)] + (is (instance? 
clojure.lang.LazySeq (xsorted/from-index m 0))) + (is (= 5 (count p))) + (is (= (map vector (range 5) (range 5)) + (map (juxt key val) p)))))) + (testing "from-index on a set is lazy over a large collection" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 2000))) + (let [s @db] + (is (instance? clojure.lang.LazySeq (xsorted/from-index s 0))) + (is (= (range 0 5) (xsorted/page s 0 5))))))) + +(deftest doc-example-timestamp-id-secondary-index + (testing "build a timestamp -> id secondary index and page through it" + (with-open [db (xdb/xit-db :memory)] + ;; Events arrive out of order; index them by their (unique) timestamp. + (let [base (Instant/parse "2024-01-01T00:00:00Z") + events (for [i (shuffle (range 100))] + {:id i :ts (.plusSeconds base i)})] + (reset! db (sorted-map)) + (doseq [e events] + (swap! db assoc (:ts e) (:id e))) + + (testing "rank gives the chronological position of a timestamp" + (is (= 0 (xsorted/rank @db base))) + (is (= 50 (xsorted/rank @db (.plusSeconds base 50))))) + + (testing "page serves a chronological window of [ts id] pairs" + (let [pg (xsorted/page @db 10 5)] + (is (= [(.plusSeconds base 10) + (.plusSeconds base 11) + (.plusSeconds base 12) + (.plusSeconds base 13) + (.plusSeconds base 14)] + (map key pg))) + (is (= [10 11 12 13 14] (map val pg))))) + + (testing "paging to the end stops cleanly" + (is (= 3 (count (xsorted/page @db 97 10))))))))) diff --git a/test/xitdb/sorted_set_test.clj b/test/xitdb/sorted_set_test.clj new file mode 100644 index 0000000..a904a14 --- /dev/null +++ b/test/xitdb/sorted_set_test.clj @@ -0,0 +1,272 @@ +(ns xitdb.sorted-set-test + (:require + [clojure.test :refer :all] + [xitdb.db :as xdb] + [xitdb.test-utils :as tu :refer [with-db]]) + (:import + [java.time Instant] + [java.util Date])) + +(deftest tracer-bullet-ordered-seq + (testing "a persisted sorted-set is stored as a sorted set and seqs in order" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-set 3 1 2)) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (= [1 2 3] (seq @db)))))) + +(deftest membership-and-count + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (let [s @db] + (testing "contains? / get / invoke" + (is (true? (contains? s 2))) + (is (false? (contains? s 9))) + (is (= 2 (get s 2))) + (is (nil? (get s 9))) + (is (= 3 (s 3))) + (is (nil? (s 9)))) + (testing "count is correct and O(1)" + (is (= 3 (count s))))))) + +(deftest mutation-keeps-order + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1)) + (testing "conj inserts in order" + (swap! db conj 5) + (swap! db conj 2) + (is (= [1 2 3 5] (seq @db)))) + (testing "disj removes and preserves order" + (swap! db disj 3) + (is (= [1 2 5] (seq @db)))) + (testing "conj of a duplicate is a no-op and does not change count" + (swap! db conj 2) + (is (= 3 (count @db))) + (is (= [1 2 5] (seq @db)))))) + +(deftest materialize-returns-plain-sorted-set + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (let [s (tu/materialize @db)] + (is (sorted? s)) + (is (not (instance? xitdb.sorted_set.XITDBSortedSet s))) + (is (= [1 2 3] (seq s))) + (is (= #{1 2 3} s))))) + +(deftest heterogeneous-members-materialize-and-print + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set)) + (swap! db conj 1) + (swap! db conj "x") + (testing "seq works with mixed member types" + (is (= [1 "x"] (seq @db)))) + (testing "materialize does not throw on mixed member types" + (let [s (tu/materialize @db)] + (is (sorted? s)) + (is (= [1 "x"] (seq s))))) + (testing "pr-str does not throw on mixed member types" + (is (string? (pr-str @db)))))) + +(deftest materialized-sorted-set-can-be-written-back + (testing "a materialized sorted set (which carries key-comparator) can be + stored into another db without being rejected as a custom comparator" + (with-open [db1 (xdb/xit-db :memory) + db2 (xdb/xit-db :memory)] + (reset! 
db1 (sorted-set 3 1 2)) + (let [s (tu/materialize @db1)] + (reset! db2 s) + (is (= [1 2 3] (seq @db2))))))) + +(deftest read-only-ops-return-plain-collections + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2)) + (let [s @db] + (testing "conj outside a transaction returns a plain sorted set" + (let [r (conj s 3)] + (is (not (instance? xitdb.sorted_set.XITDBSortedSet r))) + (is (sorted? r)) + (is (= [1 2 3] (seq r))))) + (testing "disj outside a transaction returns a plain sorted set" + (let [r (disj s 1)] + (is (not (instance? xitdb.sorted_set.XITDBSortedSet r))) + (is (sorted? r)) + (is (= [2] (seq r)))))))) + +(deftest sorted-predicate-and-comparator + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (testing "sorted? is true for a persisted sorted set" + (is (sorted? @db))) + (testing "comparator is consistent with iteration order" + (let [^java.util.Comparator c (.comparator ^clojure.lang.Sorted @db)] + (is (neg? (.compare c 1 2))) + (is (pos? (.compare c 2 1))) + (is (zero? (.compare c 2 2))) + ;; cross-type bound checks must agree with the engine (not core/compare) + (is (neg? (.compare c 5 "x"))))))) + +(deftest nth-indexed + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 20)))] + (reset! db oracle) + (let [s @db + ov (vec oracle)] + (testing "nth by positive index matches the oracle's member at that rank" + (doseq [i (range 20)] + (is (= (nth ov i) (nth s i)) (str "nth " i)))) + (testing "negative index counts from the end (-1 = last)" + (is (= (last ov) (nth s -1))) + (is (= (nth ov 18) (nth s -2)))) + (testing "out-of-range nth/2 returns not-found" + (is (= ::nf (nth s 100 ::nf))) + (is (= ::nf (nth s -100 ::nf)))) + (testing "out-of-range nth/1 throws like a vector" + (is (thrown? IndexOutOfBoundsException (nth s 100)))))))) + +(deftest subseq-matches-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 0 40 2)))] + (reset! 
db oracle) + (let [s @db] + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound subseq at " k) + (is (= (subseq oracle >= k) (subseq s >= k)) (str ">= " k)) + (is (= (subseq oracle > k) (subseq s > k)) (str "> " k)) + (is (= (subseq oracle <= k) (subseq s <= k)) (str "<= " k)) + (is (= (subseq oracle < k) (subseq s < k)) (str "< " k)))) + (testing "two-bound subseq" + (is (= (subseq oracle >= 10 <= 30) (subseq s >= 10 <= 30))) + (is (= (subseq oracle > 10 < 30) (subseq s > 10 < 30))) + (is (= (subseq oracle >= 11 <= 29) (subseq s >= 11 <= 29)))))))) + +(deftest rseq-and-rsubseq-match-oracle + (with-open [db (xdb/xit-db :memory)] + (let [oracle (into (sorted-set) (shuffle (range 0 40 2)))] + (reset! db oracle) + (let [s @db] + (testing "rseq is the full descending sequence" + (is (= (rseq oracle) (rseq s)))) + (doseq [k [10 11 0 38 39 -1 50]] + (testing (str "single-bound rsubseq at " k) + (is (= (rsubseq oracle >= k) (rsubseq s >= k)) (str ">= " k)) + (is (= (rsubseq oracle > k) (rsubseq s > k)) (str "> " k)) + (is (= (rsubseq oracle <= k) (rsubseq s <= k)) (str "<= " k)) + (is (= (rsubseq oracle < k) (rsubseq s < k)) (str "< " k)))) + (testing "two-bound rsubseq" + (is (= (rsubseq oracle >= 10 <= 30) (rsubseq s >= 10 <= 30))) + (is (= (rsubseq oracle > 10 < 30) (rsubseq s > 10 < 30))) + (is (= (rsubseq oracle >= 11 <= 29) (rsubseq s >= 11 <= 29)))))))) + +(deftest empty-sorted-set-range-queries + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set)) + (let [s @db] + (testing "range queries on an empty (none-cursor) sorted set yield nothing" + (is (nil? (seq s))) + (is (nil? (rseq s))) + (is (empty? (subseq s >= 5))) + (is (empty? (subseq s < 5))) + (is (empty? (rsubseq s >= 5))) + (is (empty? (rsubseq s <= 5))) + (is (= ::nf (nth s 0 ::nf))))))) + +(deftest member-types-iterate-in-natural-order + (testing "string members iterate lexicographically" + (with-open [db (xdb/xit-db :memory)] + (reset! 
db (sorted-set "banana" "apple" "cherry")) + (is (= ["apple" "banana" "cherry"] (seq @db))) + (is (every? string? (seq @db))))) + (testing "keyword members iterate in natural order and round-trip" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set :c :a :b)) + (is (= [:a :b :c] (seq @db))) + (is (every? keyword? (seq @db))))) + (testing "long members iterate numerically, incl. extremes" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) [3 -5 0 Long/MIN_VALUE Long/MAX_VALUE])) + (is (= [Long/MIN_VALUE -5 0 3 Long/MAX_VALUE] (seq @db))))) + (testing "double members iterate numerically, incl. negatives and zero" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3.5 -1.5 0.0 1.0e308 -1.0e308)) + (is (= [-1.0e308 -1.5 0.0 3.5 1.0e308] (seq @db))))) + (testing "Instant members iterate chronologically and round-trip to Instant" + (with-open [db (xdb/xit-db :memory)] + (let [t0 (Instant/ofEpochSecond 100) + t1 (Instant/ofEpochSecond 200 500) + t2 (Instant/ofEpochSecond 200 999)] + (reset! db (sorted-set t2 t0 t1)) + (is (= [t0 t1 t2] (seq @db))) + (is (every? #(instance? Instant %) (seq @db)))))) + (testing "Date members iterate chronologically and round-trip to Date" + (with-open [db (xdb/xit-db :memory)] + (let [d0 (Date. 0) d1 (Date. 1000) d2 (Date. 2000)] + (reset! db (sorted-set d2 d0 d1)) + (is (= [d0 d1 d2] (seq @db))) + (is (every? #(instance? Date %) (seq @db))))))) + +(deftest custom-comparator-rejected + (with-open [db (xdb/xit-db :memory)] + (is (thrown? IllegalArgumentException + (reset! db (sorted-set-by > 1 2 3)))))) + +(deftest print-method-ordered + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 3 1 2)) + (let [s (pr-str @db)] + (is (clojure.string/starts-with? s "#XITDBSortedSet")) + (is (clojure.string/includes? s "1 2 3"))))) + +(deftest write-view-supports-sorted-indexed-reversible + (testing "the writeable sorted set handed to swap! 
supports nth/subseq/rseq + and exposes the same comparator as the read view" + (with-open [db (xdb/xit-db :memory)] + (reset! db (into (sorted-set) (range 5))) + (swap! db + (fn [s] + (is (= 0 (nth s 0))) + (is (= 2 (nth s 2))) + (is (= ::nf (nth s 99 ::nf))) + (is (= [2 3 4] (subseq s >= 2))) + (is (= [0 1] (subseq s < 2))) + (is (= [4 3 2 1 0] (rseq s))) + (is (instance? java.util.Comparator + (.comparator ^clojure.lang.Sorted s))) + s)) + (testing "the data is unchanged after read-only queries in the txn" + (is (= [0 1 2 3 4] (seq @db))))))) + +(deftest nesting-and-round-trip + (testing "sorted set nests inside a hash map value" + (with-open [db (xdb/xit-db :memory)] + (reset! db {:idx (sorted-set 3 1 2)}) + (is (instance? xitdb.sorted_set.XITDBSortedSet (:idx @db))) + (is (= [1 2 3] (seq (:idx @db)))))) + (testing "nested sorted set round-trips against an in-memory atom" + (with-db [db (tu/test-db)] + (reset! db {:idx (sorted-set 3 1 2)}) + (is (tu/db-equal-to-atom? db)))) + (testing "a sorted set nested directly inside a vector stays sorted" + (with-open [db (xdb/xit-db :memory)] + (reset! db [(sorted-set 30 10 20)]) + (let [s (first (seq @db))] + (is (instance? xitdb.sorted_set.XITDBSortedSet s)) + (is (sorted? s)) + (is (= [10 20 30] (seq s)))))) + (testing "empty clears the set in place" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (swap! db empty) + (is (= 0 (count @db))) + (is (empty? (seq @db))) + (swap! db conj 7) + (is (= [7] (seq @db))))) + (testing "after empty the value is still a sorted set, so re-inserted members + keep sorted (not hash-set) semantics" + (with-open [db (xdb/xit-db :memory)] + (reset! db (sorted-set 1 2 3)) + (swap! db empty) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (sorted? @db)) + (swap! db conj 5 1 3) + (is (instance? xitdb.sorted_set.XITDBSortedSet @db)) + (is (sorted? @db)) + (is (= [1 3 5] (seq @db))))))