build(web_api): slim CPU/GPU extras + image size reduction#1263

Open
magic-vladyslav wants to merge 6 commits into main from fix/build

@magic-vladyslav

@magic-vladyslav magic-vladyslav commented May 28, 2026

Contributor

Summary

Carves the web_api deploy images away from the monolithic modules extra and onto purpose-built, slim web_api (CPU) / web_api-gpu extras that contain only what web_api/app and internal.alignment actually import. Also makes the CPU image ship CPU-only torch (no CUDA wheels) and the GPU image drop the unused TensorRT/CUDA-13 stack, and restores the newest torch for everyone else.

Net effect: dramatically smaller, faster web_api images, and a web_api dev on Apple Silicon no longer has to resolve the other team's tensorrt/training deps.

What web_api actually imports → covering extra

Verified by tracing every import in web_api/app/*.py and recursively through internal.alignment (sift → misalignment_detector → manual_correspondence → field → online_finetuner). Nothing reachable from web_api imports mazepa, mazepa_addons, training, lightning, wandb, torchmetrics, meshing/skeletonization/chunkedgraph/montaging/calcada, or TensorRT.
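
For reviewers who want to spot-check the trace, a static import scan can be sketched as follows. This is not the tooling used for this PR (the description does not say how the trace was done). `top_level_imports` is a hypothetical helper that handles a single source string; a recursive trace would repeat it for each first-party module it discovers.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) stay inside the package, so skip them.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


# Illustrative input only; not copied from the repository.
sample = """
import torch
from einops import rearrange
from . import field
import scipy.ndimage as ndi
"""

print(sorted(top_level_imports(sample)))  # prints ['einops', 'scipy', 'torch']
```

A static scan like this misses dynamic imports (for example `importlib.import_module`), so it complements rather than replaces an actual smoke import.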

| web_api import | external pkg | covering extra |
|---|---|---|
| task_management.* (tasks.py) | | task_management (→ databackends, sql, tenacity, pcg_skel→caveclient) |
| db_annotations.* (annotations/collections/layers/...) | | databackends+tenacity (via task_management) |
| layer.volumetric + .cloudvol + .annotation (painting/precomputed) | einops | cloudvol + tensorstore (volumetric __init__ loads .tensorstore) + tensor_ops |
| internal.alignment.sift | scipy, cv2, numpy, torch | tensor_ops; scipy declared explicitly |
| internal.alignment.misalignment_detector | einops, torch | convnet |
| internal.alignment.manual_correspondence/field/online_finetuner | torch, einops, torchfields | tensor_ops |
| segmentation.py | cutie, hydra-core, omegaconf, torch, google.cloud.storage | new cutie sub-extra; hydra/omegaconf explicit; gcs via base cloud-files |
| main.py | google-auth | google-cloud-iap (web stack) |

Hidden / transitive gaps closed

  • scipy (internal/alignment/sift.py, alignment.py) — only transitive via scikit-image → declared explicitly.
  • hydra-core / omegaconf (segmentation.py) — only transitive via the git cutie → declared explicitly.
  • caveclient / tenacity / google-cloud-storage — verified not gaps (caveclient via pcg_skel, tenacity via task_management, gcs via cloud-files).
  • cchardet — pruned (doesn't build on 3.12+); cutie needs it → CPU image keeps the faust-cchardet/stub shim.

New / changed extras (pyproject.toml)

  • New cutie sub-extra; segmentation now references zetta_utils[cutie] (modules/segmentation semantics unchanged for the other team).
  • New web-api-base (shared deps) + web-api / web-api-gpu leaf extras.
  • CPU-only torch for web-api: a pytorch-cpu index + [tool.uv.sources] bind torch to it for the web-api extra, with [tool.uv] conflicts between web-api/web-api-gpu. requirements.web_api.txt resolves torch==…+cpu with 0 nvidia-* CUDA wheels. uv-only — plain pip install '.[web-api-gpu]' ignores it, so the GPU image keeps its base cu121 torch.
  • web-api-gpu omits the gpu/tensorrt extra — web_api only calls convnet.load_model(tensorrt_enabled=False), so TensorRT is never imported; layering a CUDA-13 runtime on the CUDA-12.1 base was pure bloat + a version mismatch.
  • torch floors: restored >= 2.11 on training/alignment/montaging; kept >= 2.5 on the web_api-path extras (convnet, tensor-typing, web-api) so the cu121 base's torch 2.5.1 is honored. --resolution highest still pins 2.12 everywhere else, so non-GPU consumers get the newest torch.

Removed from the web_api images

training (lightning/wandb/torchmetrics), mazepa_addons (kubernetes/awscli/mitmproxy/gcloud SDKs), meshing, skeletonization, montaging, chunkedgraph, calcada, and (CPU) the native abiss/waterz/lsds builds + their numpy==1.26.4/cython/nanobind machinery — plus the full nvidia-* CUDA stack (CPU) and tensorrt-cu13 + CUDA-13 libs (GPU).

Image size wins

  • CPU image: CPU-only torch, no CUDA libs / no triton → ~4–5 GB lighter.
  • GPU image: no CUDA-13 TensorRT stack → ~3–5 GB lighter, and the CUDA-12.1/13 mismatch is gone.

Dockerfiles

  • web_api/Dockerfile (CPU): install requirements.web_api.txt --no-deps with PIP_EXTRA_INDEX_URL=…/whl/cpu; keep the cchardet shim; drop libboost/unixodbc + standalone cutie + abiss/waterz/lsd machinery; replace RUN zetta --help (pulls kubernetes) with a python -c "import app.main" smoke test.
  • web_api/gpu.Dockerfile: pip install '.[web-api-gpu]' on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-runtime base, keeping PIP_EXTRA_INDEX_URL=…/cu121; drop numpy/cython/lsd/nanobind.
  • Deleted web_api/requirements.txt — single source of truth is the extra.

Scripts & CI

  • update_pinned_requirements.sh: exports requirements.web_api.txt / requirements.web_api_gpu.txt (lock/prune/fork-strategy untouched).
  • install_zutils.py: --mode gains web_api / web_api_gpu.
  • build_web_api.py: builds the CPU & GPU variants in parallel by default (--no-parallel to opt out) with per-variant prefixed output.
  • .github/workflows/testing.yaml: new web-api-extras-build job (py 3.11/3.12/3.14: clean CPU install with the cpu torch index → smoke-import app.main → assert lightning/wandb/torchmetrics/mitmproxy/awscli/kubernetes/tensorrt absent) and web-api-gpu-build job (full GPU docker build). Both added to all-checks-test.
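
The parallel build with per-variant prefixed output can be pictured roughly as below. This is a sketch under assumptions, not the actual code of build_web_api.py: `run_prefixed`, `build_all`, and the placeholder commands are hypothetical, and a real run would pass the `docker build` invocations instead.

```python
import subprocess
import sys
import threading
from collections.abc import Callable


def run_prefixed(name: str, cmd: list[str], emit: Callable[[str], None]) -> int:
    """Run cmd and emit each output line tagged with the variant name."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        emit(f"[{name}] {line.rstrip()}")
    return proc.wait()


def build_all(
    variants: dict[str, list[str]],
    parallel: bool = True,
    emit: Callable[[str], None] = print,
) -> dict[str, int]:
    """Run every variant's command and return its exit code, keyed by variant."""
    lock = threading.Lock()
    codes: dict[str, int] = {}

    def locked_emit(line: str) -> None:
        # Serialize writes so interleaved logs stay whole lines.
        with lock:
            emit(line)

    def worker(name: str, cmd: list[str]) -> None:
        codes[name] = run_prefixed(name, cmd, locked_emit)

    if parallel and len(variants) > 1:
        threads = [threading.Thread(target=worker, args=item) for item in variants.items()]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    else:
        # Sequential fallback, also used when only one variant is selected.
        for item in variants.items():
            worker(*item)
    return codes


if __name__ == "__main__":
    # Placeholder commands standing in for the real image builds.
    demo = {
        "cpu": [sys.executable, "-c", "print('building cpu')"],
        "gpu": [sys.executable, "-c", "print('building gpu')"],
    }
    print(build_all(demo))
```

Merging stderr into stdout keeps each variant's log in a single ordered stream, at the cost of no longer distinguishing the two.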

Verification

  • web_api.app.main imports on CPU (darwin) including internal.alignment + cutie + hydra/omegaconf/scipy; no hardcoded .cuda().
  • Pinned files confirmed: web_api → torch==…+cpu, 0 CUDA libs, no heavy pkgs; web_api_gpu → CUDA torch, no tensorrt; modules/all → CUDA torch + tensorrt unchanged (other team unaffected).
  • web_api-gpu resolves cleanly with torch==2.5.1; modules now correctly rejects torch==2.5.1 (requires >=2.11).
  • update_pinned_requirements.sh runs on Apple Silicon without choking on tensorrt (static-metadata workaround).
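
The "torch==…+cpu, 0 CUDA libs" check on the pinned file could be automated along these lines. A minimal sketch, assuming the pinned file uses plain `name==version` lines; `cuda_findings` is a hypothetical helper and the versions in the sample are made up for illustration.

```python
import re

# Matches "name==version", stopping the version at whitespace, ';' or '\'.
PIN = re.compile(r"([A-Za-z0-9._-]+)\s*==\s*([^\s;\\]+)")


def cuda_findings(requirements_text: str) -> list[str]:
    """Return pins that suggest the file is not CPU-only."""
    problems: list[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        match = PIN.match(line)
        if not match:
            continue  # options such as --extra-index-url, blank lines, etc.
        name = match.group(1).lower().replace("_", "-")
        version = match.group(2)
        if name.startswith("nvidia-") or name == "triton":
            problems.append(f"{name}=={version} (CUDA-side package)")
        elif name == "torch" and not version.endswith("+cpu"):
            problems.append(f"torch=={version} (missing +cpu local version)")
    return problems


# Hypothetical pins, not the contents of requirements.web_api.txt.
sample = """
torch==2.5.1+cpu
numpy==1.26.4  # comment
"""
print(cuda_findings(sample))  # prints []
```

Treating `triton` as a CUDA-side package follows the "no CUDA libs / no triton" claim above.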

Note for the other team

pyproject.toml is shared, but modules/training/segmentation/all semantics are preserved: modules still resolves with tensorrt + CUDA torch, segmentation still includes cutie (via the cutie sub-extra). No undeclared internal dependency was added — scipy/hydra-core/omegaconf were already resolved transitively and are now declared explicitly in the web_api extra only.

🤖 Generated with Claude Code

@magic-vladyslav magic-vladyslav marked this pull request as ready for review May 28, 2026 16:18
@codecov

codecov Bot commented May 28, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (72cd1e9) to head (06ea78c).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1263   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          211       211           
  Lines        11292     11314   +22     
=========================================
+ Hits         11292     11314   +22     


@dodamih

dodamih commented May 29, 2026

Collaborator

This would break tensorrt which requires CUDA 13. What needs CUDA 12.1?

@nkemnitz

Collaborator

I had to modify the dependency installation, partly because of cutie (but not only). pip respects the requires_python settings that maintainers specify in their packages, but the upper bounds are often too restrictive: in many cases the package just hasn't been tested on newer Python yet works fine, and/or it has simply been abandoned, like cutie.
For update_pinned_requirements.sh and install_zutils.sh I am now using uv, which apparently ignores the requires_python upper bounds by design. That should also resolve the need to rebuild the super-outdated cchardet: https://github.com/ZettaAI/zetta_utils/blob/main/install_zutils.py#L779-L780

Also please make sure to rerun update_pinned_requirements.sh whenever you really need to update pyproject.toml dependencies.

@nkemnitz

Collaborator

> This would break tensorrt which requires CUDA 13. What needs CUDA 12.1?

I think it's because Cloud Run's drivers for L4 GPUs can't be updated: https://github.com/ZettaAI/zetta_utils/blob/main/web_api/gpu.Dockerfile#L3-L6

magic-vladyslav and others added 5 commits May 29, 2026 17:34
Replace the monolithic `modules` extra in both web_api images with new
`web_api` (CPU) and `web_api-gpu` extras that pull only what web_api/app and
internal.alignment actually import, dropping training/mazepa_addons/meshing/
skeletonization/chunkedgraph/segmentation-native and tensorrt from the
deploy images.

- pyproject: add `web-api-base` + `web-api`/`web-api-gpu` leaf extras and a
  `cutie` sub-extra (still pulled by `segmentation`). Restore torch>=2.11 on
  the non-web extras (training/alignment/montaging) while the web_api path
  stays torch>=2.5 so the cu121 GPU base image keeps torch 2.5.1.
- web_api resolves CPU-only torch via the pytorch-cpu index ([tool.uv.sources]
  + conflicts), so installing without a GPU pulls no nvidia-* CUDA wheels.
- web_api-gpu omits the gpu/tensorrt extra: web_api only calls
  convnet.load_model with tensorrt_enabled=False, so the CUDA-13 stack is dead
  weight on the CUDA-12.1 base.
- Dockerfiles install the slim extras (CPU: pinned --no-deps + cpu torch index,
  cchardet shim retained for cutie; GPU: resolution on the cu121 base). Drop the
  abiss/waterz/lsd build machinery and the `zetta --help` check (the CLI pulls
  kubernetes); smoke-import app.main instead.
- update_pinned_requirements.sh exports requirements.web_api{,_gpu}.txt;
  install_zutils gains web_api/web_api_gpu modes; web_api/requirements.txt is
  removed (single source of truth is the extra).
- CI: add web-api-extras-build (clean CPU install + smoke import + heavy-package
  absence assert) and web-api-gpu-build (full GPU docker build) jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Run the two image variants concurrently by default (--no-parallel to opt
out, and automatic when only one variant is selected), streaming
per-variant prefixed, line-buffered output so the interleaved logs stay
readable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- web-api-extras-build: set UV_INDEX_STRATEGY=unsafe-best-match so uv finds
  exact pins on PyPI even though the pytorch CPU index mirrors some packages
  (e.g. certifi) at older versions; default first-index strategy stopped there.
- gpu.Dockerfile: restore the cchardet stub + faust-cchardet shim before the
  resolution install, since cutie's cchardet>=2.1.7 does not build on the base
  image's Python.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The bash assert step reported all packages absent but still exited 1
under bash -l {0} (login shell + conda). Replace it with an
importlib.metadata check + sys.exit so the result is deterministic and
self-documenting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
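
An importlib.metadata absence check of the kind described in the last commit might look like the following. This is a sketch rather than the actual CI step: `installed_among` and `main` are hypothetical names, and it assumes the distribution names match the package names listed in the PR description.

```python
import sys
from importlib import metadata

# Packages the slim web_api install is expected not to contain (from the PR text).
FORBIDDEN = [
    "lightning", "wandb", "torchmetrics", "mitmproxy",
    "awscli", "kubernetes", "tensorrt",
]


def installed_among(names: list[str]) -> list[str]:
    """Return the subset of names that resolve to an installed distribution."""
    found = []
    for name in names:
        try:
            metadata.distribution(name)
        except metadata.PackageNotFoundError:
            continue
        found.append(name)
    return found


def main(names: list[str] = FORBIDDEN) -> int:
    """Return a process exit code: 0 if none are installed, 1 otherwise."""
    present = installed_among(names)
    if present:
        print("unexpected packages installed:", ", ".join(present), file=sys.stderr)
        return 1
    print("ok: none of the heavy packages are installed")
    return 0


# In a CI step this would end with: sys.exit(main())
```

Returning the exit code from `main` and passing it to `sys.exit` makes the result explicit instead of depending on the shell's handling of the last command's status.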
@magic-vladyslav magic-vladyslav changed the title from "fix(web_api): revert pytorch to 2.5 for cuda 12.1 compatibility" to "build(web_api): slim CPU/GPU extras + image size reduction" May 30, 2026

@nkemnitz nkemnitz left a comment

Collaborator

See comments, but a major one regarding the web_api/gpu build:

The R535 limitation on Cloud Run does not prevent you from using newer Pytorch / CUDA driver.

  • Updating to Pytorch 2.12 + CUDA 12.6 in the requirements should be straightforward, thanks to minor-version compatibility. That already resolves the issue with Pytorch 2.5 <-> 2.12
  • CUDA 13.0 is also possible with apt install cuda-compat-13-0, and prepending LD_LIBRARY_PATH with /usr/local/cuda-13.0/compat, which should contain the libcuda.so.
  • I would drop the pytorch/pytorch image in either case and rely on the pinned requirements files to install the version zutils actually prefers. Otherwise we need to keep paying close attention to the requirements and update the base image version to stay in sync with the resolved dependencies.

Comment thread pyproject.toml
Comment on lines +356 to +359
# backend; pip still pulls sub-package deps from the index at install time.
[[tool.uv.dependency-metadata]]
name = "tensorrt-cu13"
requires-dist = []

Collaborator

The install scripts explicitly use --no-deps, so this change drops the tensorrt libs and bindings from the pinned requirements and breaks the main image. Try preserving the dependencies for the metapackage; that might still bypass the macOS issue.

Suggested change
- # backend; pip still pulls sub-package deps from the index at install time.
- [[tool.uv.dependency-metadata]]
- name = "tensorrt-cu13"
- requires-dist = []
+ # backend
+ [[tool.uv.dependency-metadata]]
+ name = "tensorrt-cu13"
+ requires-dist = ["tensorrt-cu13-libs", "tensorrt-cu13-bindings"]

Comment thread web_api/gpu.Dockerfile
Comment on lines +29 to +37
# cutie declares cchardet>=2.1.7, which does not build on the base image's
# Python. Install an empty stub to satisfy the requirement (so the resolution
# install below does not try to build the real one) plus faust-cchardet to
# provide the actual top-level `cchardet` module.
RUN mkdir -p /tmp/cc_stub \
&& printf 'from setuptools import setup\nsetup(name="cchardet", version="2.1.7", py_modules=[])\n' > /tmp/cc_stub/setup.py \
&& pip install --no-deps /tmp/cc_stub \
&& rm -rf /tmp/cc_stub
RUN --mount=type=cache,target=/root/.cache/pip pip install faust-cchardet

Collaborator

cchardet is declared by cutie, but never used. You neither need cchardet nor faust-cchardet.

Comment thread web_api/Dockerfile
Comment on lines 23 to 32
@@ -33,7 +32,8 @@ RUN --mount=type=cache,target=/root/.cache/pip \
pip install faust-cchardet

Collaborator

cchardet is declared by cutie, but never used. You neither need cchardet nor faust-cchardet.

Collaborator

Nothing consumes this file. In gpu.Dockerfile you are relying on pip+pyproject.toml to resolve dependencies. Also note that this requirements file here pins torch==2.12 and CUDA 13 libraries, which you wanted to avoid in the web_api/gpu.Dockerfile.

Comment thread web_api/Dockerfile

  RUN --mount=type=cache,target=/root/.cache/pip \
- pip install "cutie @ git+https://github.com/hkchengrex/Cutie.git"
+ pip install --no-deps -r requirements.web_api.txt \

Collaborator

Switching to uv pip here (like the base image does via install_zutils.sh) may help with consistency/reproducibility. It's also faster than pip

@supersergiy

Member

Could you please address Nico's input?
