Ingest pipeline upgrade #537
Merged
Add pychunkedgraph/pipeline/: a workload-agnostic core (grid scatter, per-chunk
Bigtable lock, exit-code contract, worker harness) shared by the ingest and meshing
subpackages. Ingest builds L2/parent chunks under a per-chunk lock; meshing runs
marching cubes / sharded stitching idempotently, plus a one-shot mesh-metadata setup.
The package is self-contained except for the chunk-compute functions; dispatch.py is
the only branch-specific shim. Entrypoints: python -m pychunkedgraph.pipeline.{ingest,meshing}[.setup].
Add pychunkedgraph.pipeline: cross-branch chunk-batch core (ingest + meshing)

Adds a self-contained pychunkedgraph/pipeline/ package that runs chunk-grid workloads (ingest, meshing) as stock Kubernetes Indexed Jobs, with no Redis/RQ and no scheduler. This is the worker side of the new GKE Autopilot orchestration in CAVEpipelines; each workload is a module entrypoint that the pipeline image runs.

Layout
Entrypoints (run in the container)
Workers read the Indexed-Job env contract (JOB_COMPLETION_INDEX, PCG_GRAPH_ID/LAYER/PERM_SEED/BATCH_SIZE), map the index to a scattered batch of chunk coords, and process each chunk: ingest under a per-chunk lock, meshing idempotently (no lock). Graph state is read from Bigtable by graph id (no yaml/Redis at run time).

Design notes
(add_atomic_edges/add_layer, get_atomic_chunk_data/get_active_edges, meshgen.*) and core graph.* classes. Config and setup live in the package (minimal IngestConfig, inlined meta build; no IngestConfig/bootstrap from ingest/).

pipeline/ingest/dispatch.py (the chunk-body shim). On pcgv3 it needs a variant using that branch's chunk bodies (called out in the file docstring + pipeline/README.md); everything else cherry-picks unchanged.

pipeline is not auto-imported by the top-level package, so existing imports/tests are unaffected.

Tests
tests/test_pipeline_grid.py: pure unit tests for the permutation (bijection, invertibility, scatter, batch partition). The lock and worker are validated against the Bigtable emulator outside the committed suite, keeping the package importable on branches without that fixture.

Follow-up
Consumed by the CAVEpipelines PR (the pipeline CLI builds Indexed Jobs that run these entrypoints).
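
For reviewers who want a concrete picture of the index-to-batch mapping described under Entrypoints, and of the properties listed under Tests (bijection, invertibility, batch partition), here is a minimal sketch. It is not the code in this PR: the affine permutation is a stand-in chosen for brevity, the function names (make_affine_permutation, unravel, batch_for_index, batch_from_env) are hypothetical, and the expanded variable names PCG_PERM_SEED and PCG_BATCH_SIZE are assumed from the abbreviated list above. JOB_COMPLETION_INDEX is the variable Kubernetes sets for Indexed Jobs.

```python
import math
import os
import random


def make_affine_permutation(n, seed):
    """Return (forward, inverse) for p(i) = (a*i + b) % n on range(n).

    Stand-in permutation for illustration; the package's real scatter
    function is not shown in this PR description.
    """
    if n < 1:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    # The affine map is a bijection exactly when a is coprime with n.
    a = 1
    if n > 2:
        a = rng.randrange(1, n)
        while math.gcd(a, n) != 1:
            a = rng.randrange(1, n)
    b = rng.randrange(n)
    a_inv = pow(a, -1, n) if n > 1 else 0  # modular inverse (Python 3.8+)

    def forward(i):
        return (a * i + b) % n

    def inverse(j):
        return (a_inv * (j - b)) % n

    return forward, inverse


def unravel(linear, grid_shape):
    """Convert a linear chunk index into (x, y, z) grid coordinates."""
    _, ny, nz = grid_shape
    x, rem = divmod(linear, ny * nz)
    y, z = divmod(rem, nz)
    return (x, y, z)


def batch_for_index(job_index, grid_shape, batch_size, seed):
    """Chunk coords handled by one worker: a contiguous slice of the
    permuted order, so every chunk belongs to exactly one job index."""
    n = math.prod(grid_shape)
    forward, _ = make_affine_permutation(n, seed)
    start = job_index * batch_size
    stop = min(start + batch_size, n)
    return [unravel(forward(pos), grid_shape) for pos in range(start, stop)]


def batch_from_env(grid_shape, env=None):
    """Read the (assumed) env contract and return this worker's batch."""
    env = os.environ if env is None else env
    job_index = int(env["JOB_COMPLETION_INDEX"])
    seed = int(env.get("PCG_PERM_SEED", "0"))
    batch_size = int(env["PCG_BATCH_SIZE"])
    return batch_for_index(job_index, grid_shape, batch_size, seed)
```

With a grid of N chunks and batch size B, the Indexed Job would need ceil(N / B) completions under this scheme; the last index gets a short batch. An affine map moves neighbouring positions a fixed stride apart, which is a weak form of scatter, so treat it only as a demonstration of the bijection and partition properties.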