Skip to content

vymalo/lightbridge-code-intelligence

Repository files navigation

Lightbridge Code Intelligence

License: MIT GitHub App

Lightbridge is a GitHub App for intelligent code review and repository Q&A. It listens for GitHub webhook events, records work in a Rust control plane, and runs each task in an isolated, short-lived Kubernetes Job. The work is backed by repository-aware retrieval over two complementary indexes — a Neo4j knowledge graph (structure) and pgvector (semantics) — with reasoning performed by a review agent over those retrieval tools (OpenCode today; moving to a native Rust agent loop, ADR-0026).

The Rust control plane is the trust boundary: it holds the GitHub App private key and mints short-lived, per-task installation tokens — the App key itself never reaches a Job. The agent only proposes; the control plane validates results before any write-back to GitHub.


System architecture

One control-plane binary runs in two roles (serve and dispatcher); the actual repository work runs in disposable per-task Jobs. A Job never receives the GitHub App key — it bootstraps a short-lived installation token at runtime — but it is injected with the shared runner bearer (AGENT_RUNNER_TOKEN) and the embeddings API key it needs to do its work (see Secrets a Job holds).

flowchart TD
    subgraph external["External services"]
        GH["GitHub App<br/>(webhooks + API)"]
        KC["Keycloak<br/>(OIDC IdP)"]
        EAIG["eaig / core-gateway<br/>(OpenAI-compatible embeddings)"]
    end

    WEB["Next.js web console<br/>(OIDC Auth Code + PKCE)"]

    subgraph cp["Rust control plane — trust boundary"]
        SERVE["role: serve<br/>HTTP API · mints tokens · validates JWT"]
        DISP["role: dispatcher<br/>queue consumer"]
        Q[("Postgres<br/>work queue")]
    end

    subgraph job["Kubernetes Job (ephemeral, per task)"]
        RUNNER["agent-runner<br/>clone · index · reason"]
        TS["tree-sitter chunker"]
        GFY["Graphify"]
    end

    VEC[("pgvector<br/>semantic")]
    NEO[("Neo4j<br/>structural")]
    AGENT["Review agent<br/>(OpenCode → native, ADR-0026)"]

    KC -. login .-> WEB
    WEB -->|"Bearer JWT (read)"| SERVE
    GH -->|webhook| SERVE
    SERVE -->|enqueue| Q
    Q -->|"claim (SKIP LOCKED)"| DISP
    DISP -->|"one Job per task"| RUNNER
    SERVE <-->|"context · token · status · chunks"| RUNNER
    RUNNER --> TS
    RUNNER --> GFY
    TS -->|embeddings| EAIG
    EAIG --> VEC
    GFY --> NEO
    VEC -. MCP .-> AGENT
    NEO -. MCP .-> AGENT
    AGENT -. result .-> SERVE
    SERVE -. writeback .-> GH
Loading

The full flow — clone → dual index → review → validated write-back — is implemented and deployed. The review agent runs via OpenCode today and is moving to a native Rust loop (ADR-0026). The per-task Job is also being restructured into a control sidecar + main container (ADR-0028).

See docs/architecture.md, docs/jobs-and-lifecycle.md, and docs/INDEX.md for the full picture.


Why two indexers? (tree-sitter chunker and Graphify)

This is the most common point of confusion, so it's worth being explicit: the Rust tree-sitter chunker and Graphify are not doing the same job twice. They feed two different stores that answer two different kinds of question — the dual-retrieval design (ADR-0003, ADR-0010). A good code review needs both kinds of recall, and no single store does both well.

flowchart LR
    SRC["Repo checkout<br/>(one clone, in the runner)"]

    SRC --> TS["tree-sitter chunker<br/>splits into semantic units"]
    SRC --> GFY["Graphify<br/>extracts symbols + relationships"]

    TS --> VEC[("pgvector<br/>embedding per chunk")]
    GFY --> NEO[("Neo4j<br/>typed graph")]

    VEC --> QS["Semantic question:<br/>'where is similar behaviour?'"]
    NEO --> QG["Structural question:<br/>'what calls this? PR impact?'"]
Loading
tree-sitter chunker → pgvector Graphify → Neo4j
Kind of recall Semantic (vector similarity) Structural (graph traversal)
Question it answers "where is similar code / behaviour?", natural-language search "what calls this function?", "what does this PR touch?", containment, test ownership
What it emits embedding-sized chunks with stable source ranges nodes (symbols, files) + edges (defines, calls, imports)
Why this tool purpose-built, lightweight, in-process Rust we control; chunk boundaries are a chunking concern specialised multi-modal graph extractor; relationships are a graph concern
Can the other store answer it? ❌ a graph can't rank by semantic similarity ❌ vector search can't enumerate exact callers
Status ✅ built (slice 2) ✅ built (slice 3)

Both run in the same runner Job over the same checkout — one indexes for fuzzy retrieval, the other for exact retrieval. The reasoning agent (slice 5) then queries each store via MCP for the question it's best at.


Task lifecycle

A task is created from a webhook, parked in the Postgres queue, claimed by a dispatcher under a lease, and executed in a Job. Statuses below are the ones the runner reports back to the control plane.

stateDiagram-v2
    [*] --> queued: webhook → enqueue (idempotent)
    queued --> running: dispatcher claims (lease) + Job starts

    state running {
        [*] --> cloning
        cloning --> indexing: checkout ready
        indexing --> reasoning: indexes ready (or reused, ADR-0025)
        reasoning --> [*]: review submitted
    }

    running --> posting_result: validated write-back to GitHub
    posting_result --> succeeded
    running --> succeeded: index-only task

    running --> failed: error reported
    running --> timed_out: activeDeadline exceeded
    queued --> cancelled: superseded / cancelled

    succeeded --> [*]
    failed --> [*]
    timed_out --> [*]
    cancelled --> [*]
Loading

There are two job kinds: an index task (on repo approval) and a review task (on a PR or @mention). A warm review reuses the base index (ADR-0025). The authoritative state machine, cancellation, and data-purge flows live in docs/jobs-and-lifecycle.md.


Indexing flow (sequence)

How a single task gets from a webhook to stored vectors. Note the runner never holds the GitHub App key — it borrows a short-lived installation token from the control plane just-in-time (ADR-0002, ADR-0017).

sequenceDiagram
    participant GH as GitHub
    participant CP as Control plane (serve)
    participant DB as Postgres (+pgvector)
    participant DP as Dispatcher
    participant R as Agent runner (Job)
    participant E as eaig (embeddings)

    GH->>CP: webhook (PR / command)
    CP->>DB: enqueue task (idempotent)
    DP->>DB: claim next (SKIP LOCKED) + lease
    DP->>R: launch Kubernetes Job
    R->>CP: GET /internal/tasks/{id}
    CP-->>R: context + short-lived install token
    R->>GH: clone @ head SHA (token)
    R->>R: tree-sitter chunk (Rust/TS/JS/Python)
    loop batches of 32 chunks
        R->>E: POST /v1/embeddings
        E-->>R: vectors
        R->>CP: POST /internal/tasks/{id}/chunks
        CP->>DB: upsert code_chunks (vector)
    end
    R->>CP: POST /internal/tasks/{id}/status = succeeded
    Note over R,GH: A review task also runs Graphify→Neo4j + the review agent → validated write-back
Loading

Secrets a Job holds

The trust boundary is specifically about the GitHub App private key, which never leaves the control plane — a Job mints a short-lived (~1h), installation-scoped token at runtime instead. A Job is not credential-free, though. Today the dispatcher injects into every runner pod (k8s.rs):

Secret Source Lifetime Notes
GitHub installation token minted per task by serve ~1h, auto-expires the only GitHub credential a Job sees
AGENT_RUNNER_TOKEN plaintext env in the pod spec long-lived, shared across all Jobs bearer for the internal API; a hardening target (move to a secretKeyRef, per-task scoping)
EMBEDDINGS_API_KEY secretKeyReflightbridge-agent-secrets long-lived, shared the embeddings gateway key (ADR-0018)
internal-CA cert mounted from a Secret n/a trusts the internal HTTPS embeddings gateway

So "no long-lived secrets in the Job" is not accurate — only the GitHub App key is withheld. Narrowing the shared AGENT_RUNNER_TOKEN's exposure (secret ref + per-task scoping) is tracked as a follow-up.


Monorepo layout

A pnpm + Turborepo monorepo with a Cargo workspace and an xtask for Rust automation (ADR-0009). Three language stacks live side by side: TypeScript (apps/, packages/), Rust (services/, xtask/), and Python (tools/).

Path Stack What it is
apps/web TS Next.js (App Router) web console; OIDC Auth Code + PKCE login against Keycloak (ADR-0014). README
packages/auth TS Shared OIDC/JWT helpers (token verification, claims, session cookie)
packages/tsconfig TS Shared TypeScript configs
services/control-plane Rust Axum control plane; Postgres via hand-written SQLx (cratestack deferred — ADR-0005). Runs as serve or dispatcher. README
services/agent-runner Rust Per-task Job: bootstraps from the control plane, clones, indexes (pgvector + Neo4j), and runs the review agent. README
services/config Rust Shared config loader: one JSON file + {env:VAR:-default} substitution, used by both Rust services. README
xtask Rust Cargo xtask workspace automation — the Rust side of just (cargo xtask ci|fmt|lint|test|build)
tools/dashboard-gen Python Generates the Grafana dashboards-as-code into the Helm chart. README
docs/ Documentation set, ADRs, RFCs, ways of working
deploy/ Per-environment Helm values (deploy/envs/) consumed by the ai-helm chart; image tags promoted by argocd-image-updater (GitOps). See docs/kubernetes-deployment.md.

Kubernetes layout

The control plane (serve + dispatcher) and the web console run in the platform namespace; each task runs as a Job in a dedicated agents namespace (AGENT_NAMESPACE, default lightbridge-agents); the data stores (Postgres/pgvector, Neo4j) are managed services. See docs/kubernetes-deployment.md.


Prerequisites

  • Node ≥ 22 (pinned in .nvmrc)
  • pnpm
  • Rust (stable toolchain + cargo)
  • just (task runner)
  • docker (for the local data plane via docker compose)

Optional: cargo-nextest (test runner), multipass (tentative local k3s cluster).

Quick start

Using just (the single human-facing entrypoint):

just setup   # pnpm install + cargo fetch
just up      # docker compose up -d  (Postgres+pgvector, Neo4j)
just dev     # run web + control plane via Turborepo

Raw equivalents (if you prefer not to use just):

pnpm install && cargo fetch          # just setup
docker compose up -d                 # just up
pnpm dev                             # just dev

# Run only one side / role:
cargo run -p control-plane           # serve role (default)
cargo run -p control-plane dispatcher # dispatcher role
pnpm --filter @lightbridge/web dev   # just dev-web

Quality gates

Run these locally before pushing (shift-left):

just lint   # pnpm lint + cargo clippy --all-targets -- -D warnings
just test   # pnpm test + cargo nextest run
just fmt    # biome + rustfmt

just ci runs the full local gate (lint, build, cargo xtask ci).

How we work

We follow XP + Lean + DevOps with shift-left delivery, under the ADORSYS-GIS AI Governance framework (Definition of Ready/Done, AI usage declarations). See ways of working and OKRs.

Documentation

Development status

Core shipped and running in production. The end-to-end path is live: webhooks → Postgres work queue + dispatcher → per-task Job → dual index (pgvector + Neo4j) → review agent → validated write-back, plus the admin/governance web console (repo approval gate, permission-based authz, runs + insights). Deployment is GitOps (per-env Helm values → the ai-helm chart → ArgoCD).

In flight: the native Rust review agent (replacing OpenCode, ADR-0026) and the agent-Job restructure — a control sidecar

  • main container with repo-level review config (ADR-00280031). See the Issues for the roadmap.

License

MIT — see LICENSE.

Acknowledgments

About

Revbot: Intelligent code review and repository Q&A using Graphify, Neo4j, pgvector, and OpenCode agents

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors