66 changes: 63 additions & 3 deletions README.md
@@ -108,6 +108,64 @@ Here's a taste of how your queries could look like:

```

## Sorted collections

In addition to (unordered) hash maps and sets, xitdb supports **on-disk sorted
maps and sets**, backed by the engine's rank-augmented B-tree. Store a Clojure
`sorted-map` / `sorted-set` and it is persisted as a sorted collection that keeps
its keys/members ordered on disk:

```clojure
(reset! db (sorted-map "banana" 2 "apple" 1 "cherry" 3))

@db
;; => #XITDBSortedMap{"apple" 1, "banana" 2, "cherry" 3}

(swap! db assoc "date" 4) ;; inserted in order, not appended
```

Reading back yields an `XITDBSortedMap` / `XITDBSortedSet` that implements
Clojure's ordered interfaces, so `seq`, `rseq`, `nth`, `subseq` and `rsubseq`
all work and read only what they touch from disk:

```clojure
(reset! db (into (sorted-map) (map vector (range 0 100 2) (range))))

(nth @db 10) ;; => [20 10] ;; O(log n), no full scan
(subseq @db >= 90) ;; => ([90 45] [92 46] [94 47] [96 48] [98 49])
(rseq @db) ;; => lazy descending seq of entries
```

Supported key/member types are strings, keywords, longs, doubles, `Instant`
and `Date`. They are stored with an order-preserving codec, so they iterate in
natural order — numeric for numbers, chronological for temporals, lexicographic
(by code point) for strings. Longs and doubles share a single numeric ordering,
so they interleave by value (e.g. `1 < 1.5 < 2`). Only the default ordering is
supported: `sorted-map-by` / `sorted-set-by` with a custom comparator is
rejected.
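
The ordering rules above can be previewed with Clojure's in-memory sorted collections, whose default comparator also orders longs and doubles by value. This is a sketch using only `clojure.core`, not xitdb itself; it assumes the on-disk order matches for these inputs, as the paragraph above describes. The var names are illustrative. String order may differ from xitdb's for characters outside the Basic Multilingual Plane, because `String.compareTo` compares UTF-16 code units rather than code points.

```clojure
;; In-memory analogue only: plain clojure.core, no xitdb involved.
(def mixed-numbers (sorted-set 2 1.5 1 -3.25 0))

;; Longs and doubles interleave by numeric value.
(def number-order (vec mixed-numbers))

;; Strings sort lexicographically, so uppercase ASCII precedes lowercase.
(def string-order (vec (sorted-set "banana" "Cherry" "apple")))

;; java.time.Instant is Comparable, so it sorts chronologically.
(def instant-order
  (vec (sorted-set (java.time.Instant/parse "2024-03-01T00:00:00Z")
                   (java.time.Instant/parse "2023-12-31T23:59:59Z"))))
```

Evaluating `number-order` gives `[-3.25 0 1 1.5 2]` and `string-order` gives `["Cherry" "apple" "banana"]`.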

### Ranking & pagination

The `xitdb.sorted` namespace exposes the B-tree's O(log n) superpowers, which
are handy for building and paging on-disk secondary indexes:

- `(rank coll k)` — number of entries strictly less than `k` (i.e. the index of
`k`, or its would-be insertion index if absent).
- `(from-index coll n)` — lazy ordered seq starting at rank `n`.
- `(page coll offset limit)` — lazy ordered page `[offset, offset+limit)`.

```clojure
(require '[xitdb.sorted :as xsorted])

;; build a timestamp -> id index; events can arrive out of order
(reset! db (sorted-map))
(doseq [e events]
  (swap! db assoc (:ts e) (:id e)))

(xsorted/rank @db some-ts) ;; chronological position of some-ts
(xsorted/page @db 100 20) ;; the 20 entries at ranks [100, 120)
```
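
To pin down what the three helpers return, here is a reference model over an ordinary in-memory sorted map. It is only a sketch of the semantics described in the bullets above: the names `rank*`, `from-index*` and `page*` are hypothetical, and each one walks the collection in O(n), whereas the `xitdb.sorted` functions are described as O(log n) seeks on the B-tree.

```clojure
;; Reference semantics only: O(n) over an in-memory sorted collection.
;; rank*, from-index* and page* are hypothetical names, not part of xitdb.
(defn rank*
  "Count of elements strictly less than k."
  [coll k]
  (count (subseq coll < k)))

(defn from-index*
  "Ordered seq starting at 0-based rank n, or nil when n is at or past the end."
  [coll n]
  (seq (drop n coll)))

(defn page*
  "At most limit elements starting at rank offset."
  [coll offset limit]
  (take limit (from-index* coll offset)))

(def index (sorted-map 10 :a 20 :b 30 :c 40 :d))

(rank* index 20)      ; key present: its index, 1
(rank* index 25)      ; key absent: would-be insertion index, 2
(page* index 1 2)     ; entries at ranks [1, 3): ([20 :b] [30 :c])
(from-index* index 4) ; at the end: nil
```

A model like this can double as a test oracle: feed the same data to an on-disk sorted map and compare the results of the real helpers against it.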

## History

Since the database is immutable, all previous values are accessed by reading
@@ -185,8 +243,8 @@ Note that this is not doing an expensive copy of the fruits vector. We are benef
### Architecture
`xitdb-clj` builds on [xitdb-java](https://github.com/xit-vcs/xitdb-java) which implements:

- - **Hash Array Mapped Trie (HAMT)** - For efficient map and set operations
- - **RRB Trees** - For vector operations with good concatenation performance
+ - **Hash Array Mapped Trie (HAMT)** - For hash map/set and vector
+ - **B-trees** - For sorted map/set and linked list
- **Structural Sharing** - Minimizes memory usage across versions
- **Copy-on-Write** - Ensures immutability while maintaining performance

@@ -199,9 +257,11 @@ The Clojure wrapper adds:
### Supported Data Types

- **Maps** - Hash maps with efficient key-value access
- **Sorted maps** - On-disk B-tree maps with ordered iteration, `subseq`/`nth`/`rank`
- **Vectors** - Array lists with indexed access
- **Sets** - Hash sets with unique element storage
- - **Lists** - Linked lists and RRB tree-based linked array lists
+ - **Sorted sets** - On-disk B-tree sets with ordered iteration and ranking
+ - **Lists** - B-tree-based linked array lists
- **Primitives** - Numbers, strings, keywords, booleans, dates.

## Performance Characteristics
2 changes: 1 addition & 1 deletion deps.edn
@@ -1,6 +1,6 @@
 {:paths ["src" "test"]
  :deps {org.clojure/clojure {:mvn/version "1.12.0"}
-        io.github.radarroark/xitdb {:mvn/version "0.30.0"}}
+        io.github.radarroark/xitdb {:mvn/version "0.32.0"}}

  :aliases
  {:test {:extra-deps {io.github.cognitect-labs/test-runner
73 changes: 73 additions & 0 deletions src/xitdb/sorted.clj
@@ -0,0 +1,73 @@
(ns xitdb.sorted
  "Public helpers for on-disk sorted collections (`XITDBSortedMap` /
  `XITDBSortedSet`) that go beyond `clojure.core`'s in-memory sorted
  collections, exposing the rank-augmented B-tree's superpowers:

  - `rank` - O(log n) index of a key/member (inverse of indexed `nth`).
  - `page` - lazy ordered page starting at a rank (offset/limit).
  - `from-index` - lazy ordered seq starting at a rank.

  These are the recommended way to build and paginate on-disk secondary indexes.
  For example, a timestamp -> id index over events:

      (reset! db (sorted-map))
      (doseq [e events]
        (swap! db assoc (:ts e) (:id e)))

      ;; serve the page [offset, offset+limit) in chronological order,
      ;; reading only that page from disk:
      (xsorted/page @db offset limit)

      ;; or, starting from a known timestamp boundary:
      (->> (subseq @db >= start-ts)
           (take limit))

  Both `rank` and the pagination helpers work on `XITDBSortedMap` (yielding
  `MapEntry` pairs) and `XITDBSortedSet` (yielding members)."
  (:require
   [xitdb.common :as common]
   [xitdb.util.sorted-operations :as sorted-ops])
  (:import
   [io.github.radarroark.xitdb ReadSortedMap ReadSortedSet]))

(defn rank
  "Number of entries in the sorted collection `coll` strictly less than `k`,
  in O(log n). For a present key/member this is its index (the inverse of
  `nth`); for an absent one it is the would-be insertion index. Works on both
  `XITDBSortedMap` and `XITDBSortedSet`."
  [coll k]
  (let [u (common/-unwrap coll)]
    (cond
      (instance? ReadSortedMap u) (sorted-ops/smap-rank u k)
      (instance? ReadSortedSet u) (sorted-ops/sset-rank u k)
      :else (throw (IllegalArgumentException.
                    (str "rank requires an XITDBSortedMap or XITDBSortedSet, got: "
                         (type coll)))))))

(defn from-index
  "Lazy ordered seq of the sorted collection `coll` starting at rank `n`
  (0-based), backed by the engine's `iteratorFromIndex` (O(log n) seek, then a
  streaming walk). Does not materialise the whole collection. For a sorted map
  the elements are `MapEntry` pairs; for a sorted set they are members. Returns
  nil when `n` is at or past the end."
  [coll n]
  (when (neg? n)
    (throw (IllegalArgumentException.
            (str "from-index requires a non-negative rank, got: " n))))
  (let [u (common/-unwrap coll)]
    (cond
      (instance? ReadSortedMap u)
      (sorted-ops/smap-seq-from-index u common/-read-from-cursor n)
      (instance? ReadSortedSet u)
      (sorted-ops/sset-seq-from-index u n)
      :else (throw (IllegalArgumentException.
                    (str "from-index requires an XITDBSortedMap or XITDBSortedSet, got: "
                         (type coll)))))))

(defn page
  "Lazy ordered page of `coll`: at most `limit` elements starting at rank
  `offset`. Equivalent to `(take limit (from-index coll offset))`, and stops
  cleanly at the end of the collection. Lazy and low-memory. For a sorted map
  the elements are `MapEntry` pairs; for a sorted set they are members."
  [coll offset limit]
  (take limit (from-index coll offset)))
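
The namespace docstring mentions two ways to paginate: by rank (`page`) and from a known key boundary (`subseq`). The boundary style can be sketched as follows. This is an in-memory illustration using only `clojure.core`; `next-page` is a hypothetical helper, not part of this PR, and it assumes `subseq` behaves the same on an `XITDBSortedMap` as the README states.

```clojure
;; In-memory sketch with clojure.core only; next-page is a hypothetical helper.
(defn next-page
  "Up to `limit` entries of sorted map `index` whose keys are strictly greater
  than `last-key`, or the first `limit` entries when `last-key` is nil."
  [index last-key limit]
  (take limit
        (if (nil? last-key)
          (seq index)
          (subseq index > last-key))))

(def events (sorted-map 100 :e1 200 :e2 300 :e3 400 :e4 500 :e5))

;; First page, then continue from the last key the client has seen.
(def page-1 (next-page events nil 2))
(def page-2 (next-page events (key (last page-1)) 2))
```

Here `page-1` holds the entries for keys 100 and 200, and `page-2` those for 300 and 400. Continuing from the last seen key keeps pages stable when earlier entries are inserted between requests, whereas a rank offset shifts in that case; rank-based paging is the simpler choice when jumping to an arbitrary page number.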