
Ingest pipeline upgrade #537

Merged
akhileshh merged 2 commits into main from ingest-pipeline-upgrade on Jun 11, 2026

akhileshh commented Jun 9, 2026

Add pychunkedgraph.pipeline — cross-branch chunk-batch core (ingest + meshing)

Adds a self-contained pychunkedgraph/pipeline/ package that runs chunk-grid workloads (ingest, meshing) as stock Kubernetes Indexed Jobs — no Redis/RQ, no scheduler. This is the worker side of the new GKE Autopilot orchestration in CAVEpipelines; each workload is a module entrypoint the pipeline image runs.

Layout

pychunkedgraph/pipeline/
  grid.py          # fixed-seed scatter (Feistel permutation): batch index -> scattered coords
  lock.py          # per-chunk Bigtable claim/done CAS (token-fenced; one writer per chunk)
  exit_codes.py    # SUCCESS / TRANSIENT / FATAL — mapped to the Job podFailurePolicy
  worker.py        # generic harness: run(make_processor) -> exit code
  ingest/          # dispatch (branch shim) + setup (table+meta) + worker (lock+heartbeat)
  meshing/         # meta (MeshConfig) + setup (mesh-metadata) + worker (marching cubes / stitching)
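grid.py is described only as a fixed-seed Feistel scatter from batch index to chunk coords. Below is a minimal sketch of one way to build that, assuming a balanced Feistel network over the smallest even-width power-of-two domain that covers the grid, cycle walking to stay inside [0, n), and a row-major (x, y, z) layout for flat chunk ids. FeistelPermutation, batch_coords and num_batches are hypothetical names, not the module's actual API.

```python
import hashlib
import math
from typing import List, Tuple

Coord = Tuple[int, int, int]


def _round_fn(half: int, seed: int, rnd: int, bits: int) -> int:
    # Keyed round function. hashlib (not the built-in hash()) keeps the output
    # identical in every pod, so all workers derive the same permutation.
    digest = hashlib.blake2b(f"{seed}:{rnd}:{half}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


class FeistelPermutation:
    """Seeded bijection on [0, n): balanced Feistel network plus cycle walking."""

    def __init__(self, n: int, seed: int, rounds: int = 4) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n, self.seed, self.rounds = n, seed, rounds
        bits = max(2, (n - 1).bit_length())
        bits += bits % 2  # even width so both halves have the same size
        self.half_bits = bits // 2
        self.mask = (1 << self.half_bits) - 1

    def _encrypt(self, x: int) -> int:
        left, right = x >> self.half_bits, x & self.mask
        for r in range(self.rounds):
            left, right = right, left ^ _round_fn(right, self.seed, r, self.half_bits)
        return (left << self.half_bits) | right

    def _decrypt(self, y: int) -> int:
        left, right = y >> self.half_bits, y & self.mask
        for r in reversed(range(self.rounds)):
            left, right = right ^ _round_fn(left, self.seed, r, self.half_bits), left
        return (left << self.half_bits) | right

    def forward(self, i: int) -> int:
        # Cycle walking: re-apply until the value lands back inside [0, n).
        x = self._encrypt(i)
        while x >= self.n:
            x = self._encrypt(x)
        return x

    def inverse(self, j: int) -> int:
        # Walk the same cycle backwards to recover the original position.
        y = self._decrypt(j)
        while y >= self.n:
            y = self._decrypt(y)
        return y


def num_batches(n: int, batch_size: int) -> int:
    # Number of Job completion indexes needed to cover n chunks.
    return math.ceil(n / batch_size)


def batch_coords(b: int, batch_size: int, perm: FeistelPermutation,
                 shape: Coord) -> List[Coord]:
    # Batch b owns permuted positions [b*batch_size, (b+1)*batch_size).
    nx, ny, nz = shape
    if nx * ny * nz != perm.n:
        raise ValueError("grid shape does not match permutation size")
    start = b * batch_size
    stop = min(start + batch_size, perm.n)
    coords = []
    for pos in range(start, stop):
        cid = perm.forward(pos)
        # Row-major (C-order) unravel of the flat chunk id into (x, y, z).
        coords.append((cid // (ny * nz), (cid // nz) % ny, cid % nz))
    return coords
```

Because the permutation depends only on (n, seed), each pod can compute its own batch from its index with no coordination, and the batches partition the grid exactly once.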

Entrypoints (run in the container)

python -m pychunkedgraph.pipeline.ingest              # ingest worker
python -m pychunkedgraph.pipeline.ingest.setup <id>   # create table + graph meta
python -m pychunkedgraph.pipeline.meshing             # mesh worker (L2 cubes / L>2 stitching)
python -m pychunkedgraph.pipeline.meshing.setup <id>  # one-shot mesh metadata
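Each worker entrypoint presumably ends by exiting with the code the generic harness returns. A minimal sketch of what `run(make_processor) -> exit code` could look like follows; the numeric codes (75, borrowed from the sysexits.h EX_TEMPFAIL convention, and 3), the FatalError class, and the choice of which exceptions count as transient are all assumptions, not what exit_codes.py or worker.py actually define.

```python
import logging
from enum import IntEnum
from typing import Callable

log = logging.getLogger(__name__)


class ExitCode(IntEnum):
    SUCCESS = 0
    TRANSIENT = 75  # hypothetical; borrows sysexits.h EX_TEMPFAIL
    FATAL = 3       # hypothetical


class FatalError(Exception):
    """Hypothetical: raised for failures a retry cannot fix (bad config, missing graph)."""


# Hypothetical classification: network-style errors are worth retrying.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

Processor = Callable[[], None]


def run(make_processor: Callable[[], Processor]) -> int:
    """Build the processor, run it once, and translate the outcome to an exit code."""
    try:
        processor = make_processor()  # setup errors are classified like run errors
        processor()
    except FatalError:
        log.exception("fatal error; retrying will not help")
        return ExitCode.FATAL
    except TRANSIENT_ERRORS:
        log.exception("transient error; the pod may be retried")
        return ExitCode.TRANSIENT
    except Exception:
        # Unknown errors: failing fast avoids burning retries on a deterministic bug.
        log.exception("unexpected error; treating as fatal")
        return ExitCode.FATAL
    return ExitCode.SUCCESS
```

A module's `__main__.py` would then call `sys.exit(run(make_processor))`. A podFailurePolicy rule with action FailJob on the FATAL code can then stop the Job immediately, so only transient failures consume the backoff limit.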

Workers read the Indexed-Job env contract (JOB_COMPLETION_INDEX, PCG_GRAPH_ID/LAYER/PERM_SEED/BATCH_SIZE), map the index to a scattered batch of chunk coords, and process each — ingest under a per-chunk lock, meshing idempotently (no lock). Graph state is read from Bigtable by graph id (no yaml/Redis at run time).
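Reading that env contract can be sketched as below, assuming the PCG_GRAPH_ID/LAYER/PERM_SEED/BATCH_SIZE shorthand expands to four variables sharing a PCG_ prefix (PCG_GRAPH_ID, PCG_LAYER, PCG_PERM_SEED, PCG_BATCH_SIZE). JOB_COMPLETION_INDEX is the variable Kubernetes itself sets in each pod of an Indexed Job. WorkerEnv and batch_positions are hypothetical names; the returned positions are what a seeded permutation would then map to chunk coords.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerEnv:
    completion_index: int  # from JOB_COMPLETION_INDEX (set by Kubernetes)
    graph_id: str
    layer: int
    perm_seed: int
    batch_size: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "WorkerEnv":
        env = os.environ if env is None else env
        try:
            parsed = cls(
                completion_index=int(env["JOB_COMPLETION_INDEX"]),
                graph_id=env["PCG_GRAPH_ID"],  # assumed variable names below
                layer=int(env["PCG_LAYER"]),
                perm_seed=int(env["PCG_PERM_SEED"]),
                batch_size=int(env["PCG_BATCH_SIZE"]),
            )
        except (KeyError, ValueError) as exc:
            raise ValueError(f"invalid worker environment: {exc!r}") from exc
        if parsed.completion_index < 0 or parsed.batch_size < 1:
            raise ValueError("completion index must be >= 0 and batch size >= 1")
        return parsed

    def batch_positions(self, total_chunks: int) -> range:
        # This pod's slice of the permuted order; empty if the index is past the end.
        start = min(self.completion_index * self.batch_size, total_chunks)
        stop = min(start + self.batch_size, total_chunks)
        return range(start, stop)
```

Raising on a missing or malformed variable is deliberate: a misconfigured Job fails identically on every retry, so it is a natural candidate for the FATAL exit code rather than TRANSIENT.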

Design notes

  • Self-contained: depends on the rest of the repo only for the actual chunk-compute functions (add_atomic_edges/add_layer, get_atomic_chunk_data/get_active_edges, meshgen.*) and the core graph.* classes. Config and setup live in the package: a minimal IngestConfig and an inlined graph-meta build, instead of reusing the IngestConfig/bootstrap code from the existing ingest/ package.
  • Branch-portable: the only branch-specific file is pipeline/ingest/dispatch.py (the chunk-body shim). On pcgv3 it needs a variant using that branch's chunk bodies (called out in the file docstring + pipeline/README.md); everything else cherry-picks unchanged.
  • Additive: no existing modules touched; pipeline is not auto-imported by the top-level package, so existing imports/tests are unaffected.

Tests

tests/test_pipeline_grid.py — pure unit tests for the permutation (bijection, invertibility, scatter, batch partition). Lock/worker are validated against the Bigtable emulator outside the committed suite, keeping the package importable on branches without that fixture.
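Because the lock tests run against the Bigtable emulator outside the committed suite, the claim/heartbeat/done protocol can also be pinned down with an in-memory stand-in. The sketch below is an assumption about how token fencing could work: a random token per claim, a lease renewed by heartbeat, and "done" accepted only from the current token holder. On Bigtable each method would presumably be a single conditional mutation (CheckAndMutateRow) on the chunk's row. InMemoryChunkLock and its methods are hypothetical, not lock.py's API.

```python
import threading
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[int, int, int]


@dataclass
class _Row:
    token: str
    state: str  # "claimed" or "done"
    expires_at: float


class InMemoryChunkLock:
    """Hypothetical fake of a per-chunk claim/done CAS; the mutex stands in
    for Bigtable's single-row atomicity."""

    def __init__(self, lease_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rows: Dict[Tuple[int, Coord], _Row] = {}
        self._mu = threading.Lock()
        self._lease_s = lease_s
        self._clock = clock

    def claim(self, layer: int, coord: Coord) -> Optional[str]:
        """Return a fresh token if the chunk is free or its lease expired."""
        with self._mu:
            now = self._clock()
            row = self._rows.get((layer, coord))
            if row is not None and (row.state == "done" or row.expires_at > now):
                return None
            token = uuid.uuid4().hex
            self._rows[(layer, coord)] = _Row(token, "claimed", now + self._lease_s)
            return token

    def _holds(self, layer: int, coord: Coord, token: str) -> Optional[_Row]:
        row = self._rows.get((layer, coord))
        if row is None or row.state != "claimed" or row.token != token:
            return None
        return row

    def heartbeat(self, layer: int, coord: Coord, token: str) -> bool:
        """Extend the lease; fails once another worker has re-claimed the chunk."""
        with self._mu:
            row = self._holds(layer, coord, token)
            if row is None:
                return False
            row.expires_at = self._clock() + self._lease_s
            return True

    def mark_done(self, layer: int, coord: Coord, token: str) -> bool:
        """Compare-and-set claimed -> done, fenced on the caller's token."""
        with self._mu:
            row = self._holds(layer, coord, token)
            if row is None:
                return False
            row.state = "done"
            return True
```

The fencing matters in the stale-holder case: a worker whose lease expired and whose chunk was re-claimed can no longer heartbeat or mark the chunk done, so only one writer finishes each chunk.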

Follow-up

Consumed by the CAVEpipelines PR (the pipeline CLI builds Indexed Jobs that run these entrypoints).
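On the CAVEpipelines side, each of these runs is a batch/v1 Job in Indexed completion mode. A minimal sketch of such a manifest as a plain dict follows; it could be submitted with kubectl or the Kubernetes Python client. The function name, container name, PCG_* variable names, and the exit-code value are assumptions that must match whatever the worker image and exit_codes.py actually use. completionMode, completions, parallelism, backoffLimit, podFailurePolicy, and restartPolicy are standard Job API fields, and podFailurePolicy requires restartPolicy: Never.

```python
import math
from typing import Any, Dict, List


def build_indexed_job(
    name: str,
    image: str,
    module: str,  # e.g. "pychunkedgraph.pipeline.ingest"
    graph_id: str,
    layer: int,
    total_chunks: int,
    batch_size: int,
    perm_seed: int,
    parallelism: int = 100,
    backoff_limit: int = 10,
    fatal_code: int = 3,  # hypothetical; must equal the worker's FATAL code
) -> Dict[str, Any]:
    completions = math.ceil(total_chunks / batch_size)
    # Kubernetes injects JOB_COMPLETION_INDEX itself; env values must be strings.
    env: List[Dict[str, str]] = [
        {"name": "PCG_GRAPH_ID", "value": graph_id},
        {"name": "PCG_LAYER", "value": str(layer)},
        {"name": "PCG_PERM_SEED", "value": str(perm_seed)},
        {"name": "PCG_BATCH_SIZE", "value": str(batch_size)},
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completionMode": "Indexed",
            "completions": completions,
            "parallelism": min(parallelism, completions),
            "backoffLimit": backoff_limit,  # other non-zero exits count against this
            "podFailurePolicy": {
                "rules": [
                    # FATAL exit code: fail the whole Job instead of retrying.
                    {"action": "FailJob",
                     "onExitCodes": {"containerName": "worker",
                                     "operator": "In",
                                     "values": [fatal_code]}},
                    # Pod disruptions (e.g. eviction, preemption) are not counted.
                    {"action": "Ignore",
                     "onPodConditions": [{"type": "DisruptionTarget"}]},
                ]
            },
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # required by podFailurePolicy
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        "command": ["python", "-m", module],
                        "env": env,
                    }],
                }
            },
        },
    }
```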

akhileshh force-pushed the ingest-pipeline-upgrade branch 2 times, most recently from f51d81e to a440800 on June 9, 2026 22:37
Add pychunkedgraph/pipeline/: a workload-agnostic core (grid scatter, per-chunk
Bigtable lock, exit-code contract, worker harness) shared by ingest and meshing
subpackages. Ingest builds L2/parent chunks under a per-chunk lock; meshing runs
marching cubes / sharded stitching, idempotent, plus one-shot mesh-metadata setup.
Self-contained except the chunk-compute functions; dispatch.py is the only
branch-specific shim. Entrypoints: python -m pychunkedgraph.pipeline.{ingest,meshing}[.setup].
akhileshh force-pushed the ingest-pipeline-upgrade branch from a440800 to 6d64961 on June 10, 2026 23:26
akhileshh merged commit a5cff4c into main on Jun 11, 2026
1 check passed
akhileshh deleted the ingest-pipeline-upgrade branch on June 11, 2026 00:01