ORACLE

ORACLE is research software for oral and craniofacial single-cell analysis. The current release provides an oral-adapted scGPT query workflow for generating cell embeddings, predicting hierarchical cell type labels, and estimating candidate unknown-state scores.

The ORACLE source code is distributed through this repository. The cap5000 ORACLE model artifacts required for query embedding and annotation are distributed separately through the ORACLE GitHub Release assets. Large training atlases, raw sequencing data, processed private datasets, and manuscript analysis outputs are not included.

Status: pre-release research software. ORACLE is not a clinical or diagnostic tool, and outputs should be interpreted with appropriate biological validation.

Features

Query embedding using the oral-adapted scGPT encoder distributed as a release asset.
Reference-based level 1 to level 4 cell type annotation.
Candidate unknown-state scoring for cells that may not match the current reference label space.
tumor and non_tumor label policies for controlling the prediction label space.
Output embedding key: X_oracle.

Installation

Use Python 3.10-3.13. The full query workflow requires the scientific Python and scGPT runtime stack.

Clone the repository and install the package:

git clone https://github.com/Teichlab/ORACLE.git
cd ORACLE
python -m pip install -e .

Download the cap5000 model artifacts from the ORACLE GitHub Release:

oracle download-model

By default this installs the model into:

~/.cache/oracle/models/cap5000

If you manually download the release asset, extract it to a local directory and either set:

export ORACLE_CAP5000_DIR=/path/to/cap5000

or pass --model-dir /path/to/cap5000/encoder --artifact /path/to/cap5000/classifier to oracle run-query.
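
The lookup described above can be sketched in Python. This is an illustration only, not ORACLE's internal code: the helper name resolve_cap5000_dir is hypothetical, and the precedence (ORACLE_CAP5000_DIR first, then the default cache path) is an assumption based on the text above.

```python
import os
from pathlib import Path

# Default install location documented for `oracle download-model`.
DEFAULT_CAP5000_DIR = Path.home() / ".cache" / "oracle" / "models" / "cap5000"


def resolve_cap5000_dir():
    """Hypothetical helper: choose the cap5000 directory.

    Assumed precedence: ORACLE_CAP5000_DIR if set, else the default cache path.
    """
    env_value = os.environ.get("ORACLE_CAP5000_DIR")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_CAP5000_DIR


model_dir = resolve_cap5000_dir()
print(model_dir, "(exists)" if model_dir.is_dir() else "(not found)")
```

A check like this can help confirm where the model is expected before starting a long query run.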

Development test dependencies can be installed with:

python -m pip install -e ".[dev]"
pytest

Quick Start

import oracle

print(oracle.__version__)

Inspect the installed model:

oracle --version
oracle info

Run ORACLE on a query AnnData file:

oracle run-query \
  --h5ad query.h5ad \
  --outdir oracle_query_output \
  --label-policy tumor

Use --label-policy non_tumor to mask tumor-source epithelial labels during prediction while retaining immune, stromal, endothelial, neural, mural, and muscle labels. Both policies use the same query embedding workflow and produce the same X_oracle embedding for a given input.
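
To run the same query under both policies from a script, the command can be assembled in Python. The sketch below uses only the flags shown above; the helper name build_run_query_cmd and the per-policy output directory names are hypothetical.

```python
import shlex

# The two label policies named in this README.
LABEL_POLICIES = ("tumor", "non_tumor")


def build_run_query_cmd(h5ad, outdir, label_policy):
    """Hypothetical helper: assemble the documented `oracle run-query` call."""
    if label_policy not in LABEL_POLICIES:
        raise ValueError(
            f"label_policy must be one of {LABEL_POLICIES}, got {label_policy!r}"
        )
    return [
        "oracle", "run-query",
        "--h5ad", str(h5ad),
        "--outdir", str(outdir),
        "--label-policy", label_policy,
    ]


for policy in LABEL_POLICIES:
    cmd = build_run_query_cmd("query.h5ad", f"oracle_query_output_{policy}", policy)
    # Print the shell form; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Writing each policy to its own output directory keeps the two sets of annotations from overwriting each other.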

Typical outputs include:

query_oracle_embedded.h5ad: query object with adata.obsm["X_oracle"].
query_oracle_annotated.h5ad: query object with ORACLE embedding, predicted labels, and unknown scores.
query_oracle_embedded.stats.json: lightweight embedding summary.

Predicted annotation columns follow the current ORACLE schema:

oral_scgpt_pred_level1
oral_scgpt_pred_level2
oral_scgpt_pred_level3
oral_scgpt_pred_level4
oral_scgpt_unknown_score_level1
oral_scgpt_unknown_score_level2
oral_scgpt_unknown_score_level3
oral_scgpt_unknown_score_level4
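
The column names follow a regular pattern across the four levels, so downstream scripts can generate them instead of hard-coding each one. The sketch below is illustrative; the helper names are hypothetical, and a plain list stands in for the column index of an annotated object.

```python
# Annotation levels covered by the current schema.
LEVELS = (1, 2, 3, 4)


def expected_oracle_columns(levels=LEVELS):
    """Build the prediction and unknown-score column names for the given levels."""
    pred = [f"oral_scgpt_pred_level{i}" for i in levels]
    unknown = [f"oral_scgpt_unknown_score_level{i}" for i in levels]
    return pred + unknown


def missing_oracle_columns(columns):
    """Hypothetical helper: return expected ORACLE columns absent from `columns`."""
    present = set(columns)
    return [name for name in expected_oracle_columns() if name not in present]


# With an AnnData object loaded, `adata.obs.columns` would be passed instead.
example_columns = ["sample_id", "oral_scgpt_pred_level1", "oral_scgpt_unknown_score_level1"]
print(missing_oracle_columns(example_columns))
```

An empty result means every column in the schema is present.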

Repository Contents

oracle/: importable Python package.
oracle/resources/: lightweight package resource namespace. Large cap5000 model artifacts are distributed through GitHub Release assets rather than committed directly to the repository.
tests/: minimal smoke tests that do not require private datasets.

Data and Model Resource Policy

The repository intentionally excludes raw data, private h5ad files, large reference atlases, manuscript analysis folders, generated figures, and large model files. The cap5000 model checkpoint and classifier artifacts are distributed as versioned GitHub Release assets because they are required for ORACLE query embedding and annotation.

Before reusing ORACLE outputs in a publication or downstream biological analysis, users should confirm that the input data are processed appropriately and that predicted rare or unknown states are supported by independent marker, sample, or experimental evidence.

License

ORACLE is distributed under the Apache License 2.0. Unless otherwise noted, this license applies to the source code and ORACLE cap5000 model artifacts distributed with the corresponding GitHub Release.

Citation

Citation information will be added when the associated manuscript or formal software release is available.

Contact

Weimin Lin
