
feat(manifest/bazel): sub-workspace lockfile discovery for socket manifest bazel #1336

Draft
Simon (simonhj) wants to merge 1 commit into v1.x from simon/bazel-subworkspace-discovery-v1x

Summary

socket manifest bazel only inspects the root MODULE.bazel / WORKSPACE. Ruleset repos with per-example sub-workspaces (rules_kotlin/examples/*, rules_js/examples/*, rules_rust, rules_python) declare additional Maven artifacts in nested MODULE.bazel projects that carry their own maven_install.json lockfiles. Those lockfiles were silently dropped, leaving the CLI's SBOM a strict subset of what depscan's server-side parser already returns for the same tree.

This PR adds a walker that finds every checked-in maven_install.json under the invocation cwd, parses each one via the existing v2-lockfile path, and merges the resulting artifacts into the SBOM after the bazel-query extraction step.
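For illustration, a bounded walk of this kind could look roughly like the TypeScript sketch below. It is not the code in bazel-lockfile-discovery.mts: findMavenLockfiles, shouldPrune, PRUNED_DIRS and the MAX_* constants are hypothetical names, and only the pruned directory names and the 256 lockfile / depth 16 / 1 GiB limits come from this PR. It assumes Node's fs and path modules and uses an explicit stack instead of recursion.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical: directory names the walker never descends into (from the PR text).
const PRUNED_DIRS = new Set([".git", "node_modules", ".socket-auto-manifest"]);

// Hypothetical constant names; the values are the caps stated in the PR.
const MAX_LOCKFILES = 256;
const MAX_DEPTH = 16;
const MAX_FILE_BYTES = 1024 ** 3; // 1 GiB

// Hypothetical helper: skip the pruned names and anything named bazel-*.
function shouldPrune(name: string): boolean {
  return PRUNED_DIRS.has(name) || name.startsWith("bazel-");
}

// Hypothetical sketch of the bounded walk; returns absolute lockfile paths.
function findMavenLockfiles(root: string): string[] {
  const found: string[] = [];
  // Explicit stack of [directory, depth] pairs; root is depth 0.
  const stack: Array<[string, number]> = [[root, 0]];
  while (stack.length > 0 && found.length < MAX_LOCKFILES) {
    const [dir, depth] = stack.pop()!;
    let entries: fs.Dirent[];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      continue; // unreadable directory: skip it rather than abort the scan
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        // Dirent describes the entry itself, so symlinked directories report
        // isSymbolicLink() instead of isDirectory() and are never followed.
        if (!shouldPrune(entry.name) && depth < MAX_DEPTH) {
          stack.push([full, depth + 1]);
        }
      } else if (entry.isFile() && entry.name === "maven_install.json") {
        if (fs.statSync(full).size <= MAX_FILE_BYTES) {
          found.push(full);
          if (found.length >= MAX_LOCKFILES) break;
        }
      }
    }
  }
  // Sort so callers see a stable order regardless of traversal order.
  return found.sort();
}
```

Because readdirSync with withFileTypes reports symlinks as symlinks, Bazel's bazel-* convenience links into the output base are skipped even without the name check; the name check additionally prunes real directories that happen to be named bazel-*.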

What changed

  • New module src/commands/manifest/bazel/bazel-lockfile-discovery.mts (196 lines): a bounded walker that prunes .git, node_modules, .socket-auto-manifest, and Bazel's bazel-* convenience symlinks, and caps the walk at 256 lockfiles, a depth of 16, and 1 GiB per file; plus a parse-and-tag helper that defers to the existing parseUnsortedDepsJson. Synthetic sourceRepo tags use the lockfile's relative directory, so two sub-workspaces that pin the same rule name don't collide downstream.
  • Wired into the orchestrator as a "Step 5b" merge in extract_bazel_to_maven.mts, between the per-repo bazel query extraction and the normalizeToMavenInstallJson step. Dedup is keyed on mavenCoordinates, so the root workspace's lockfile (which bazel query already extracts) is not double-counted. Conflicting group:artifact versions across sub-workspaces still surface through the existing loud-failure path in normalizeToMavenInstallJson.
  • 8 new unit tests in bazel-lockfile-discovery.test.mts covering: walk pruning (.git / node_modules / .socket-auto-manifest / bazel-* symlinks at arbitrary depth), v2-lockfile parsing, sourceRepo tagging, the dedup-merge path, and a rules_kotlin-shaped strict-superset assertion.
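A minimal sketch of the parse-and-tag and dedup-merge steps, assuming a simplified artifact record with only mavenCoordinates and sourceRepo fields. parseV2Lockfile is a hypothetical stand-in for the existing parseUnsortedDepsJson path (whose real signature is not shown here) and assumes the rules_jvm_external v2 lockfile layout, where a top-level artifacts object maps group:artifact to an entry with a version field. sourceRepoTag and mergeDiscovered are also hypothetical names, and the exact tag format is an assumption.

```typescript
import * as path from "node:path";

// Hypothetical record shape; the CLI's real types likely carry more fields.
interface TaggedArtifact {
  mavenCoordinates: string; // "group:artifact:version"
  sourceRepo: string;
}

// Hypothetical stand-in for parseUnsortedDepsJson, assuming the v2 layout
// { "artifacts": { "group:artifact": { "version": "..." } }, "version": "2" }.
function parseV2Lockfile(json: string, sourceRepo: string): TaggedArtifact[] {
  const doc = JSON.parse(json) as {
    artifacts?: Record<string, { version?: string }>;
  };
  const out: TaggedArtifact[] = [];
  for (const [ga, entry] of Object.entries(doc.artifacts ?? {})) {
    if (typeof entry?.version === "string") {
      out.push({ mavenCoordinates: `${ga}:${entry.version}`, sourceRepo });
    }
  }
  return out;
}

// Synthetic sourceRepo tag (assumed format): the lockfile's directory relative
// to cwd with forward slashes, so examples/a and examples/b never collide.
function sourceRepoTag(cwd: string, lockfilePath: string): string {
  const rel = path.relative(cwd, path.dirname(lockfilePath));
  return rel === "" ? "." : rel.split(path.sep).join("/");
}

// Merge keyed on the full mavenCoordinates string: bazel-query results win,
// and a lockfile artifact is appended only if that exact string is new.
// The same group:artifact at another version is a different key, so it is
// kept for the downstream conflict check rather than silently dropped.
function mergeDiscovered(
  fromQuery: TaggedArtifact[],
  fromLockfiles: TaggedArtifact[],
): TaggedArtifact[] {
  const seen = new Set(fromQuery.map((a) => a.mavenCoordinates));
  const merged = [...fromQuery];
  for (const artifact of fromLockfiles) {
    if (seen.has(artifact.mavenCoordinates)) continue;
    seen.add(artifact.mavenCoordinates);
    merged.push(artifact);
  }
  return merged;
}
```

Keying the merge on the full coordinate string rather than on group:artifact is what keeps the two steps compatible in this sketch: dedup only removes exact duplicates (such as the root lockfile that bazel query already covered), while disagreeing versions stay visible to normalizeToMavenInstallJson.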

Verification

Run against the real bazel-bench/oss/rules_kotlin tree:

Measure | Before | After
--- | --- | ---
Sub-workspace lockfiles surfaced | 0 | 10
Unique artifacts merged from sub-workspaces (after cross-workspace dedup) | 0 | 393
Total artifacts emitted (with root @kotlin_rules_maven) | ~70 | ~463
Server-side depscan parser baseline | 582 | 582

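This brings the CLI to roughly 80% of the server-side parser's count (~463 of 582) and closes about three quarters of the previous gap (393 of ~512 missing artifacts). The remaining ~119 artifacts are most likely explained by classifier-jar accounting and would likely be closed by a follow-up that recursively invokes bazel query per sub-workspace.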

No regression on tink-java (0 lockfiles; behavior unchanged) or protobuf (1 root lockfile that bazel query already extracts via @maven, deduped on mavenCoordinates).

Test plan

  • Run against rules_kotlin and confirm ≥393 sub-workspace artifacts merge into the SBOM (per-workspace breakdown logged with --verbose).
  • Run against tink-java (no checked-in maven_install.json); SBOM unchanged.
  • Run against protobuf (1 root maven_install.json); SBOM artifact count unchanged (dedup against bazel query).
  • Run against a monorepo whose two sub-workspaces pin the same group:artifact at conflicting versions; confirm the existing "Conflicting versions for ..." error fires.
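The last test-plan item relies on the existing loud-failure path in normalizeToMavenInstallJson, which this PR does not change. As a rough, hypothetical illustration of what such a check has to detect (not its actual implementation), grouping by group:artifact could look like this. findVersionConflicts and assertNoVersionConflicts are invented names, the sketch assumes the version is the last ":"-separated segment of mavenCoordinates, and only the "Conflicting versions for" prefix of the error comes from the PR text.

```typescript
// Hypothetical record shape, matching the fields named in the PR.
interface TaggedArtifact {
  mavenCoordinates: string;
  sourceRepo: string;
}

// Group by "group:artifact" and record each distinct version together with
// the first sourceRepo that pinned it. Assumption: version is the last segment.
function findVersionConflicts(
  artifacts: TaggedArtifact[],
): Map<string, Map<string, string>> {
  const byGa = new Map<string, Map<string, string>>();
  for (const { mavenCoordinates, sourceRepo } of artifacts) {
    const parts = mavenCoordinates.split(":");
    if (parts.length < 3) continue; // no version segment; ignored in this sketch
    const ga = `${parts[0]}:${parts[1]}`;
    const version = parts[parts.length - 1];
    let versions = byGa.get(ga);
    if (!versions) {
      versions = new Map<string, string>();
      byGa.set(ga, versions);
    }
    if (!versions.has(version)) versions.set(version, sourceRepo);
  }
  // Keep only group:artifact keys that resolved to more than one version.
  const conflicts = new Map<string, Map<string, string>>();
  byGa.forEach((versions, ga) => {
    if (versions.size > 1) conflicts.set(ga, versions);
  });
  return conflicts;
}

// Fail loudly instead of silently picking a winner. The message format after
// the "Conflicting versions for" prefix is illustrative only.
function assertNoVersionConflicts(artifacts: TaggedArtifact[]): void {
  const conflicts = findVersionConflicts(artifacts);
  if (conflicts.size === 0) return;
  const lines: string[] = [];
  conflicts.forEach((versions, ga) => {
    const detail = Array.from(versions, ([v, repo]) => `${v} (${repo})`).join(", ");
    lines.push(`Conflicting versions for ${ga}: ${detail}`);
  });
  throw new Error(lines.join("\n"));
}
```

In this sketch, two sub-workspaces pinning the same version of a group:artifact are not a conflict; only distinct versions under one group:artifact key trigger the error.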

…ub-workspace discovery

The existing bazel-query discovery path only inspects MODULE.bazel /
WORKSPACE at the invocation cwd. Ruleset repos with per-example
sub-workspaces (rules_kotlin/examples, rules_js/examples, rules_rust,
rules_python) declare additional Maven artifacts in nested MODULE.bazel
projects with their own maven_install.json lockfiles. Those files were
silently dropped, leaving the CLI's SBOM a strict subset of what the
server-side depscan parser already returns from the same tree.

Add a walker that finds every checked-in maven_install.json under cwd
(pruning .git, node_modules, .socket-auto-manifest, and Bazel's
bazel-* convenience symlinks into <output_base>), parses each via the
existing parseUnsortedDepsJson v2-lockfile path, and merges the
artifacts into the SBOM after the bazel-query extraction step. Merge
is keyed by mavenCoordinates so the root workspace's lockfile (which
bazel-query already extracts) does not double-count; conflicting
group:artifact versions across sub-workspaces continue to surface as
the existing loud-failure error in normalizeToMavenInstallJson.

Verified against bazel-bench/oss/rules_kotlin: walker now surfaces all
10 examples/*/maven_install.json files and merges 393 unique artifacts
into the SBOM beyond what the root @kotlin_rules_maven discovery
returns. No regression on tink-java (0 lockfiles) or protobuf (1 root
lockfile, deduped against bazel-query's @maven extraction).