Skip to content

[FEATURE] Provide a possibility to store and load an empty graph#685

Open
reta wants to merge 6 commits into
datastax:mainfrom
reta:issue-684
Open

[FEATURE] Provide a possibility to store and load an empty graph#685
reta wants to merge 6 commits into
datastax:mainfrom
reta:issue-684

Conversation

@reta

@reta reta commented Jun 19, 2026

Copy link
Copy Markdown

Provide a possibility to store and load an empty graph, see please #684 for motivation and background.

Closes #684

@reta reta force-pushed the issue-684 branch 3 times, most recently from c9c2ef2 to a2b357f Compare June 19, 2026 17:58
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta reta marked this pull request as ready for review June 19, 2026 18:43

@ashkrisk ashkrisk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta thanks for this PR. I've left a few comments. Additionally, I tried running a test by saving and loading an empty heap graph to an OnDiskGraphIndex, but this failed an assertion while loading the graph. The test in this PR only covers OnHeapGraphIndex::save and OnHeapGraphIndex::load.

I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.

Failure details

The error:

java.lang.AssertionError: Node ID -1419795886 out of bounds for layer 1

The sample program is below, please run with assertions enabled.

import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

import io.github.jbellis.jvector.disk.ReaderSupplierFactory;

import io.github.jbellis.jvector.graph.GraphIndexBuilder;
import io.github.jbellis.jvector.graph.ListRandomAccessVectorValues;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriter;
import io.github.jbellis.jvector.graph.disk.GraphIndexWriterTypes;
import io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex;
import io.github.jbellis.jvector.graph.disk.feature.FeatureId;
import io.github.jbellis.jvector.graph.disk.feature.InlineVectors;
import io.github.jbellis.jvector.graph.similarity.BuildScoreProvider;
import io.github.jbellis.jvector.vector.VectorSimilarityFunction;

public class Test {
    public static void main(String[] forwardArgs) throws IOException {
        
        var dim = 1024;
        var vsf = VectorSimilarityFunction.DOT_PRODUCT;

        var M = 32;
        var ef = 100;
        var nOv = 1.2f;
        var alpha = 1.2f;
        var hier = true;

        var ravv = new ListRandomAccessVectorValues(List.of(), dim);
        var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, vsf);
        var graphPath = Path.of("./local/tmp.jvgraph");

        try (
            var builder = new GraphIndexBuilder(bsp, dim, M, ef, nOv, alpha, hier);
            var graph = builder.build(ravv);
            var writer = GraphIndexWriter.getBuilderFor(GraphIndexWriterTypes.RANDOM_ACCESS_PARALLEL, graph, graphPath)
                .with(new InlineVectors(dim))
                .build();
        ) {
            writer.write(Map.of(FeatureId.INLINE_VECTORS, i -> new InlineVectors.State(ravv.getVector(i))));
            System.out.println("OHG max level = " + graph.getMaxLevel());
        }

        try (
            var readerSupplier = ReaderSupplierFactory.open(graphPath);
            var graph = OnDiskGraphIndex.load(readerSupplier);
        ) {
            var count = 0;
            var niter = graph.getNodes(0);
            while (niter.hasNext()) {
                count++;
                System.out.println(">>> " + niter.next());
            }
            System.out.println(String.format("Counted %d nodes", count));
            graph.getDegree(0);
        }
    }
}

reta added 2 commits June 22, 2026 08:24
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta

reta commented Jun 23, 2026

Copy link
Copy Markdown
Author

I'll try and take a look at this later. In the meantime, if you manage to figure out the issue, please do update the PR.

Thanks @ashkrisk , sorry forgot to update on this one,

Jun 23, 2026 8:53:10 A.M. io.github.jbellis.jvector.vector.VectorizationProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
Jun 23, 2026 8:53:16 A.M. io.github.jbellis.jvector.util.PhysicalCoreExecutor resolvePhysicalCoreCount
INFO: PhysicalCoreExecutor: using 10 worker threads, determined by OS topology probe on "Mac OS X" (detected 10 physical cores, clamped to availableProcessors=10).
Jun 23, 2026 8:53:16 A.M. io.github.jbellis.jvector.util.PhysicalCoreExecutor resolvePhysicalCoreCount
WARNING: PhysicalCoreExecutor: previous auto-sized worker threads on this node would have been 5, but aligning to cores now sets them to 10. Set the jvector.physical_core_count system property to override.
08:53:16.350 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 0
08:53:16.351 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.351 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 288
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 296
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 584
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing common header at position 0
08:53:16.361 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Writing 1 layers
08:53:16.362 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished writing at position 288
OHG max level = 0
....

08:53:16.386 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex from offset=0
08:53:16.386 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Loading common header at position 0
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - 1 layers
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished reading at position 288
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Header loaded: version=6, dimension=1024, entryNode=-1, layerInfoCount=1
08:53:16.388 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Position after reading header=296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Version 5+ onwards uses a footer instead of header for metadata. Loading from footer
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex footer from offset=600
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading header offset=592
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Loading OnDiskGraphIndex header from offset=296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Loading common header at position 296
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - 1 layers
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.CommonHeader - Common header finished reading at position 584
08:53:16.389 [main] DEBUG io.github.jbellis.jvector.graph.disk.OnDiskGraphIndex - Header loaded: version=6, dimension=1024, entryNode=-1, layerInfoCount=1, Position after reading header=592
Counted 0 nodes

I run it against latest changes with assertions enabled - no issues, please let me know if I am missing something, thanks!

@tlwillke tlwillke added the enhancement New feature or request label Jun 24, 2026

@ashkrisk ashkrisk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta the new test for for the empty OnDiskGraphIndex looks good. I think the earlier failure I was seeing had to do with improper file cleanup on my system, I sorted that out and it runs fine now.

My remaining comments are mostly nitpicks. There are a lot of lines consisting of multiple spaces followed by a newline (Bob has a tendency to preserve indentation for empty lines), it would be better if you can remove those as well.

Comment thread jvector-tests/src/test/java/io/github/jbellis/jvector/TestUtil.java
Comment thread jvector-tests/src/test/java/io/github/jbellis/jvector/TestUtil.java Outdated

@Override
public int maxDegree() {
return 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return a valid maxDegree instead of defaulting to zero. Similarly for maxDegrees, etc.

reta added 2 commits June 25, 2026 09:52
Signed-off-by: Andriy Redko <drreta@gmail.com>
Signed-off-by: Andriy Redko <drreta@gmail.com>
@reta

reta commented Jun 25, 2026

Copy link
Copy Markdown
Author

My remaining comments are mostly nitpicks. There are a lot of lines consisting of multiple spaces followed by a newline (Bob has a tendency to preserve indentation for empty lines), it would be better if you can remove those as well.

Thanks @ashkrisk , really appreciate it, I think we should be all set now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Provide a possibility to store and load an empty graph

3 participants