Drop support for SemanticDB #891
Draft
jupblb wants to merge 21 commits into
…ission
First two milestones of dropping the intermediate SemanticDB step in favour
of direct SCIP shard output from the Java compiler plugin.
Adds, with no behaviour change in the default config:
semanticdb-javac:
- ScipSymbols: helper that maps SemanticDB symbol strings to SCIP
symbol strings. Globals get the '. . . . ' placeholder prefix that
the aggregator later rewrites into 'scip-java maven g a v ...'.
Locals are normalised to the canonical 'local N' form.
- ScipShardWriter: write-or-merge helper for *.scip shards that
deduplicates documents/symbols/occurrences across compiler rounds.
- ScipShardFromSemanticdb: intermediate translator that converts the
in-memory Semanticdb.TextDocument into a single-document Scip.Index
shard. To be replaced by a direct-from-AST ScipVisitor in Milestone 3.
- SemanticdbJavacOptions: new -emit-scip:on|off flag (default off).
- SemanticdbTaskListener: when -emit-scip:on is set, also writes a
*.scip shard under META-INF/scip/ alongside the existing *.semanticdb
file, reusing the already-built TextDocument.
scip-semanticdb:
- ScipShardWalker: recursively collects *.scip shards under the
configured targetroots, mirroring SemanticdbWalker.
- SymbolRewriter: rewrites placeholder global symbols into the final
'scip-java maven ...' form using PackageTable. Locals and already
rewritten symbols pass through unchanged.
build.sbt:
- javacPlugin now depends on scipProto so the plugin can emit Scip.*
protobuf messages directly.
- Discard top-level Bazel BUILD files from fat-jar merge so the new
scipProto resources don't collide with semanticdb-java.
tests/unit:
- ScipSymbolsSuite: unit tests for ScipSymbols and SymbolRewriter,
including the local/global discrimination and Package.EMPTY fallback.
- ScipShardEmissionSuite: end-to-end test that drives javac with the
semanticdb plugin and -emit-scip:on, then parses the produced
Scip.Index shard to assert the document layout and that every
emitted symbol either uses the placeholder prefix or is a 'local N'.
All 29 unit tests pass.
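The placeholder scheme described above can be sketched roughly as follows. This is a minimal illustration, not the PR's ScipSymbols or SymbolRewriter: the class name, the method signatures, the assumption that SemanticDB locals look like "local" followed by digits, and the pre-formatted mavenCoordinates argument are all hypothetical.

```java
// Hypothetical sketch; names and formats are illustrative, not the PR's code.
final class PlaceholderSymbolSketch {
    // One "." each for scheme, package manager, package name, and version.
    static final String PLACEHOLDER_PREFIX = ". . . . ";

    // Assumption: a SemanticDB local looks like "local" followed by digits.
    static String fromSemanticdbSymbol(String symbol) {
        String digits = symbol.startsWith("local") ? symbol.substring(5) : "";
        if (!digits.isEmpty() && digits.chars().allMatch(Character::isDigit)) {
            return "local " + digits;
        }
        return PLACEHOLDER_PREFIX + symbol;
    }

    // Assumption: mavenCoordinates is the already-formatted "g a v" part.
    static String rewrite(String symbol, String mavenCoordinates) {
        if (!symbol.startsWith(PLACEHOLDER_PREFIX)) {
            return symbol; // locals and already rewritten symbols pass through
        }
        String descriptors = symbol.substring(PLACEHOLDER_PREFIX.length());
        return "scip-java maven " + mavenCoordinates + " " + descriptors;
    }
}
```

Because the rewrite step only touches symbols that start with the placeholder prefix, running it twice over the same symbol is harmless, which matches the pass-through behaviour described for SymbolRewriter.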
Milestone 3 of the SemanticDB->SCIP migration: replace the bridge that
went through ScipShardFromSemanticdb with a direct AST walk that
produces Scip.Document values.
- ScipVisitor: fork of SemanticdbVisitor with identical traversal
semantics. Emits Scip.Occurrence, Scip.SymbolInformation, and
Scip.Relationship directly. Symbols still come from the existing
GlobalSymbolsCache/LocalSymbolsCache and are translated to the
placeholder SCIP form via ScipSymbols.fromSemanticdbSymbol at the
emission boundary. Skips signatures and annotations for now;
ScipSignatureFormatter in Milestone 4 will add signature_documentation.
- SemanticdbTaskListener: when -emit-scip:on is set, runs ScipVisitor
directly instead of converting from Semanticdb.TextDocument. This is a
second AST walk during the transition; SemanticdbVisitor remains the
sole producer of legacy *.semanticdb files until Milestone 8.
- ScipShardFromSemanticdb: deleted; no longer needed now that ScipVisitor
produces the same shard format natively.
All 29 unit tests pass, including the end-to-end ScipShardEmissionSuite
that exercises the new ScipVisitor through real javac invocations.
Milestone 4: emit SCIP signature_documentation directly from the compiler
plugin, eliminating the need to format signatures from a SemanticDB
intermediate representation.
- ScipSignatureFormatter: walks javac Element/TypeMirror and produces
a readable Java declaration string. Supports classes, interfaces,
annotations, enums, methods, constructors, fields, parameters,
locals, enum constants, and type parameters with bounds. The internal
TypePrinter handles declared types, type arguments, arrays,
primitives, type variables, wildcards, intersections, and void.
Suppresses implicit 'extends Object' and 'java.lang.Object' supertypes.
- ScipVisitor: when a definition is emitted, the formatter is invoked
and (when the result is non-empty) the signature is attached to
SymbolInformation.signature_documentation with language 'Java' and
the current source's relative path.
- ScipShardEmissionSuite: extended end-to-end checks. Verifies the
shard contains at least one signature_documentation block, that the
Foo class symbol's signature contains 'class Foo', and that the bar()
method symbol's signature contains 'int bar('.
All 29 unit tests pass.
Milestone 5: parallel aggregator that walks *.scip shards produced by
ScipVisitor and emits a final scip-java-scheme index.scip. The existing
SemanticDB-based ScipSemanticdb.run() is untouched.
- ScipShardAggregator:
* walks for *.scip shards (and *.jar files containing them) via
ScipShardWalker
* parses each shard into a Scip.Index
* rewrites placeholder global symbols ('. . . . ' prefix) into the
final 'scip-java maven g a v ...' form via SymbolRewriter
* deduplicates documents by relative_path, merging occurrences and
symbol-info entries from annotation-processor rounds
* computes inverse 'is_implementation && is_reference' relationships
across the whole project, gated on options.emitInverseRelationships
* emits one Metadata Index plus one Index per merged Document via
ScipWriter
- ScipAggregationSuite: end-to-end test that compiles a Java source with
-emit-scip:on, runs ScipShardAggregator over the produced shards, and
asserts the final index has metadata with the scip-java tool name and
that every emitted symbol/occurrence is either local or starts with
'scip-java maven '.
All 30 unit tests pass.
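The "deduplicate documents by relative_path" step above might look roughly like the sketch below. It uses a hypothetical Doc class with string occurrence keys in place of the generated Scip.Document type, so it shows only the merge shape, not the aggregator's real data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; Doc stands in for Scip.Document.
final class ShardMergeSketch {
    static final class Doc {
        final String relativePath;
        // Insertion-ordered set: duplicates from later rounds are dropped.
        final Set<String> occurrences = new LinkedHashSet<>();

        Doc(String relativePath, String... occurrences) {
            this.relativePath = relativePath;
            this.occurrences.addAll(Arrays.asList(occurrences));
        }
    }

    // Merges shards that describe the same source file into one document,
    // keeping the order in which paths were first seen.
    static List<Doc> mergeByRelativePath(List<Doc> shards) {
        Map<String, Doc> merged = new LinkedHashMap<>();
        for (Doc shard : shards) {
            merged
                .computeIfAbsent(shard.relativePath, path -> new Doc(path))
                .occurrences
                .addAll(shard.occurrences);
        }
        return new ArrayList<>(merged.values());
    }
}
```

A LinkedHashMap is used here so the output order depends only on input order, which keeps regenerated snapshots stable in a setup like this one.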
Milestone 6: surface the direct-SCIP path through the existing
index-semanticdb command and through the Maven / ScipBuildTool paths so
end-to-end indexing can use ScipShardAggregator. Default behaviour is
unchanged.
- IndexSemanticdbCommand: new --use-scip-shards flag. When set, the
command runs ScipShardAggregator (walking META-INF/scip/*.scip) instead
of ScipSemanticdb (walking META-INF/semanticdb/*.semanticdb).
- SemanticdbOptionBuilder: reads -Dsemanticdb.emit-scip and appends
'-emit-scip:on' to the injected -Xplugin:semanticdb argument so the
custom javac wrapper emits SCIP shards.
- Embedded.customJavac: new optional emitScip parameter; when true,
propagates -Dsemanticdb.emit-scip=true into the launched javac
wrapper.
- MavenBuildTool: forwards index.indexSemanticdb.useScipShards to the
customJavac wrapper.
- ScipBuildTool: when useScipShards is on, appends '-emit-scip:on'
to the directly-constructed -Xplugin:semanticdb arguments used by
the in-process javac compilation.
Not yet wired (deferred):
- SemanticdbGradlePlugin propagation
- BazelBuildTool / scip_java.bzl
- Kotlin guard for projects that mix Java+Kotlin sources
All 30 unit tests pass.
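The flag propagation in this milestone (system property in, '-emit-scip:on' appended to the plugin argument) can be illustrated with a small sketch. The class and method names are hypothetical, and the base option string is passed in by the caller rather than built the way SemanticdbOptionBuilder builds it.

```java
// Hypothetical sketch of the Milestone 6 flag plumbing; not the PR's code.
final class EmitScipOptionSketch {
    // Property name as given in the commit text (-Dsemanticdb.emit-scip).
    static final String EMIT_SCIP_PROPERTY = "semanticdb.emit-scip";

    // True only when the JVM was started with -Dsemanticdb.emit-scip=true.
    static boolean emitScipRequested() {
        return Boolean.getBoolean(EMIT_SCIP_PROPERTY);
    }

    // Appends the opt-in flag to an existing -Xplugin:semanticdb argument.
    static String xpluginOption(String baseOption, boolean emitScip) {
        return emitScip ? baseOption + " -emit-scip:on" : baseOption;
    }
}
```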
…hots
Drives the minimized snapshot suite through the new SCIP-direct path
(via --use-scip-shards) and reconciles the resulting output so it can be
locked in as the canonical scheme.
semanticdb-javac:
- ScipVisitor: lowercase Document.language to 'java' (matching the
historical ScipSemanticdb output) and add (range, symbol, roles)
dedup of occurrences, preferring the variant that carries an
enclosing_range. Without this, multiple ANALYZE rounds emit a second
definition occurrence without enclosing_range that survives the
structural-equality dedup in ScipShardWriter.
- ScipVisitor: treat ENUM the same as CLASS/INTERFACE/ANNOTATION_TYPE
in supportsReferenceRelationship so parent relationships don't get
a spurious is_reference flag.
- ScipShardWriter: switch occurrence merge to the looser
(range, symbol, roles) key, preferring entries with enclosing_range.
- SemanticdbTaskListener: delete the stale .scip shard alongside the
.semanticdb file on ENTER so re-runs don't accumulate occurrences
across builds.
scip-semanticdb:
- ScipShardAggregator: mergeInto now uses the same (range, symbol,
roles) dedup with enclosing_range preference, and merges duplicate
symbol relationships across shards.
build.sbt:
- Add -emit-scip:on to the minimized javac plugin invocation so the
tests/minimized targetroot always contains shard files.
tests/snapshots:
- MinimizedSnapshotScipGenerator now passes --use-scip-shards to
drive the snapshot suite through ScipShardAggregator.
- Regenerate all 23 minimized snapshots under the new 'scip-java'
symbol scheme.
tests/unit:
- ScipShardEmissionSuite: update assertions to expect the lowercase
'java' language string.
Full snapshot suite passes (102 tests). Unit suite passes (30 tests).
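The looser occurrence key described above, with its preference for entries that carry an enclosing_range, could be sketched like this. Occ is a hypothetical stand-in for Scip.Occurrence, and an empty list is assumed to mean "no enclosing_range".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Occ stands in for Scip.Occurrence.
final class OccurrenceDedupSketch {
    static final class Occ {
        final List<Integer> range;
        final String symbol;
        final int roles;
        final List<Integer> enclosingRange; // empty when absent (assumption)

        Occ(List<Integer> range, String symbol, int roles, List<Integer> enclosingRange) {
            this.range = range;
            this.symbol = symbol;
            this.roles = roles;
            this.enclosingRange = enclosingRange;
        }

        // The (range, symbol, roles) key; enclosingRange is deliberately excluded.
        List<Object> key() {
            return Arrays.asList(range, symbol, roles);
        }

        boolean hasEnclosingRange() {
            return !enclosingRange.isEmpty();
        }
    }

    // Keeps one occurrence per key, upgrading to a variant with an
    // enclosing range when a later duplicate provides one.
    static List<Occ> dedup(List<Occ> input) {
        Map<List<Object>, Occ> byKey = new LinkedHashMap<>();
        for (Occ occ : input) {
            Occ existing = byKey.get(occ.key());
            if (existing == null
                || (!existing.hasEnclosingRange() && occ.hasEnclosingRange())) {
                byKey.put(occ.key(), occ);
            }
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Structural equality alone would treat the two variants as different messages, which is why a key that ignores enclosing_range is needed to collapse them.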
After M3-M7 the per-source SCIP shard format is stable and the
ScipShardAggregator produces equivalent output to the legacy
SemanticDB->SCIP path. This commit promotes the cheap compiler-side
half of the dual-emission to be on by default so that:
- any javac plugin invocation (sbt, Maven, Bazel, ad-hoc) writes a
*.scip shard under META-INF/scip/ alongside the *.semanticdb file
without needing an explicit -emit-scip:on flag;
- users (or build tools) that want to consume the new path only need
to pass the CLI switch (--use-scip-shards) when the indexer runs;
- legacy callers that only read *.semanticdb files are unaffected.
The CLI default for index-semanticdb's --use-scip-shards remains false
because the broader ecosystem (notably the Kotlin compiler and the
existing snapshot/build tool integrations) still produces only
*.semanticdb. That flip is deferred to a follow-up PR.
semanticdb-javac:
- SemanticdbJavacOptions.emitScip defaults to true. -emit-scip:off
is now the explicit opt-out and is documented as the legacy path.
scip-java:
- SnapshotCommand: skip per-source shards (those without a
metadata.project_root) so 'scip-java snapshot' continues to render
only the top-level aggregator output. Per-source shards have no
project_root and would otherwise crash with 'missing scheme'
when their relative paths are resolved into a URI.
build.sbt:
- Drop the now-redundant -emit-scip:on flag from the minimized
project; the plugin default already emits shards.
tests/unit:
- ScipShardEmissionSuite: invert the off-path test so it explicitly
passes -emit-scip:off; the previous test relied on the old
default of false.
Full snapshot suite (102 tests) and unit suite (30 tests) green.
Post-PR1 cleanup of dead code, redundant flag plumbing, and duplication.
No behavioral changes; snapshot suite (102 passed) and unit suite (28 passed)
remain green.
Dead code removed:
- ScipShardAggregator: drop unused documentsFromShards{,Collected}
and their Stream/Collectors imports.
- ScipSymbols: drop unused isPlaceholderGlobal/descriptorPath; only
fromSemanticdbSymbol + PLACEHOLDER_PREFIX are needed in production.
- ScipSymbolsSuite: drop the tests for the removed helpers.
Redundant -emit-scip:on plumbing removed:
With compiler-side default emitScip=true (M8), the CLI/build-tool
machinery that conditionally toggled the flag is purely cosmetic.
- Embedded.customJavac: drop emitScip param + emitScipProp system
property prefix.
- MavenBuildTool: stop passing emitScip = useScipShards.
- ScipBuildTool: stop appending -emit-scip:on to the -Xplugin string.
- SemanticdbOptionBuilder: drop EMIT_SCIP system-property handling and
the corresponding xpluginOption branch.
- SemanticdbJavacOptions still parses -emit-scip:on / -emit-scip:off as
the compiler-side opt-out.
- IndexSemanticdbCommand help text no longer implies the shards require
an extra compiler flag.
Internal duplication removed:
- New ScipOccurrences package-private helper centralizes the
(symbol, range, roles) dedup key and the 'prefer enclosing_range'
merge rule that ScipVisitor and ScipShardWriter both used.
- ScipShardWriter.mergeSymbol now uses LinkedHashMap for relationships
so output ordering is deterministic.
Small ScipVisitor cleanups:
- Drop dependency on Semanticdb Property bitmask; compute isStatic /
isAbstract directly from Modifier set.
- Make 'source' final and initialized via a static sourceText helper.
- Merge identical switch arms for ENUM/CLASS/INTERFACE/ANNOTATION_TYPE
in emitSymbolInformation.
- Refresh stale class-level javadoc; signature docs are now produced
via ScipSignatureFormatter.
Add the scaffolding required for the Kotlin compiler plug-in to emit SCIP
shards directly, mirroring the Java side from PR1. This commit is passive:
the new types are not wired into the analyzer yet, so behavior is
unchanged.
- semanticdb-kotlinc now depends on scipProto so it can reference the
generated SCIP protobuf types.
- ScipSymbols: placeholder symbol formatter that produces the same
'. . . . <path>' globals and canonical 'local N' locals the aggregator
already understands.
- ScipOccurrences: deduplicates occurrences by (symbol, range, roles),
preferring entries that carry an enclosing_range.
- ScipShardWriter: writes a per-source-file Scip.Index shard with
overwrite semantics, matching ScipShardWriter on the Java side.
- ScipTextDocumentBuilder: assembles a Scip.Document for one source using
the above helpers.
Wire the Kotlin compiler plug-in so a single analyzer pass populates both
the existing SemanticdbTextDocumentBuilder and the new
ScipTextDocumentBuilder. The PostAnalysisExtension now writes:
- META-INF/semanticdb/<path>.semanticdb (unchanged)
- META-INF/scip/<path>.scip (new)
Behavior of consumers that still read .semanticdb is preserved; the
companion CLI change to actually consume the .scip shards lands in K4.
Legacy SemanticDB emission is intentionally kept for now and will be
removed in a later cleanup PR.
Two small robustness fixes uncovered while validating PR2 end-to-end:
- ScipWriter.build(): create parent directories before moving the
temporary aggregated output into place so callers may target paths whose
enclosing directory does not yet exist (e.g. target/scip-index/).
- ScipShardWalker: restrict the walk to files under META-INF/scip/ so an
aggregated index.scip co-located inside a targetroot is not re-ingested
as a shard on subsequent runs.
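Both fixes map onto standard java.nio.file calls. The sketch below is an assumption about their general shape, not the PR's ScipWriter or ScipShardWalker: the class, the method names, and the demo() helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the two robustness fixes; not the PR's code.
final class ShardFilesSketch {
    // Creates the output's parent directories, then moves the temp file.
    static void moveIntoPlace(Path temporary, Path output) throws IOException {
        Path parent = output.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.move(temporary, output, StandardCopyOption.REPLACE_EXISTING);
    }

    // Walks only <targetroot>/META-INF/scip, so an aggregated index.scip
    // stored elsewhere under the targetroot is never picked up as a shard.
    static List<Path> findShards(Path targetroot) throws IOException {
        Path shardRoot = targetroot.resolve("META-INF").resolve("scip");
        if (!Files.isDirectory(shardRoot)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(shardRoot)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".scip"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Exercises both helpers in a temp directory; returns shard file names.
    static List<String> demo() {
        try {
            Path targetroot = Files.createTempDirectory("targetroot");
            Path shard = targetroot.resolve("META-INF/scip/a/Foo.java.scip");
            Files.createDirectories(shard.getParent());
            Files.write(shard, new byte[0]);
            Path temporary = Files.createTempFile("index", ".scip");
            moveIntoPlace(temporary, targetroot.resolve("scip-index/index.scip"));
            return findShards(targetroot).stream()
                .map(p -> p.getFileName().toString())
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In demo(), the aggregated index.scip lands under scip-index/ inside the targetroot, and the walk still returns only the shard under META-INF/scip/.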
Now that both javac and kotlinc emit .scip shards under META-INF/scip/,
switch the CLI default to read from those shards and update build wiring +
a unit fixture that asserted the old scheme.
- IndexSemanticdbCommand: --use-scip-shards defaults to true; the help
text reflects that javac and kotlinc both ship shards.
- build.sbt (kotlincSnapshots task): pass --use-scip-shards and write the
aggregated index.scip into target/scip-index/ so the next invocation does
not walk over its own previous output.
- SnapshotCommandSuite: expected symbol scheme is now
'scip-java maven ...' instead of 'semanticdb maven ...'.
Outputs reflect the new direct-from-SCIP scheme:
- symbols are emitted under 'scip-java maven ...' instead of the legacy
'semanticdb maven ...' scheme,
- Kotlin symbol info now carries SCIP-native fields such as
'signature_documentation kotlin ...' and 'kind ...'.
Regenerated with:
sbt 'snapshots/Test/runMain tests.SaveSnapshots'
Introduce small in-package value types so the direct-to-SCIP visitor no
longer reaches into the SemanticDB-generated protobuf classes:
* ScipRange: holds (startLine, startCharacter, endLine, endCharacter)
with an asScipRange() helper that produces the compact 3/4-int form
SCIP expects.
* ScipRole: minimal {DEFINITION, REFERENCE, SYNTHETIC_DEFINITION}
mirror of Semanticdb.SymbolOccurrence.Role.
Update ScipVisitor to use the new types end-to-end. Pure refactor: no
behavior change, no snapshot churn. Sets up D2..D5 where the legacy
SemanticDB types/modules will be deleted.
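For reference, SCIP encodes a range as three integers when it starts and ends on the same line and as four integers otherwise. A value type along the lines of ScipRange could therefore look like the sketch below; the class name and constructor are hypothetical, and only the field names and asScipRange() come from the commit text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a ScipRange-like value type; not the PR's code.
final class ScipRangeSketch {
    final int startLine;
    final int startCharacter;
    final int endLine;
    final int endCharacter;

    ScipRangeSketch(int startLine, int startCharacter, int endLine, int endCharacter) {
        this.startLine = startLine;
        this.startCharacter = startCharacter;
        this.endLine = endLine;
        this.endCharacter = endCharacter;
    }

    // Single-line ranges drop endLine: [startLine, startCharacter, endCharacter].
    // Multi-line ranges keep all four values.
    List<Integer> asScipRange() {
        if (startLine == endLine) {
            return Arrays.asList(startLine, startCharacter, endCharacter);
        }
        return Arrays.asList(startLine, startCharacter, endLine, endCharacter);
    }
}
```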
Drop the legacy SemanticDB code path from the Java compiler plug-in:
* SemanticdbTaskListener: stop building Semanticdb.TextDocument and
writing META-INF/semanticdb/*.semanticdb; ScipVisitor is now the only
emitter, producing META-INF/scip/*.scip directly. The shard path is
computed without going through a SemanticDB intermediate.
* Delete unused legacy emitter sources:
- SemanticdbVisitor.java
- SemanticdbTypeVisitor.java
- SemanticdbSignatures.java
- SemanticdbTrees.java
* SemanticdbJavacOptions: drop the emitScip field. Keep -emit-scip:on
and -emit-scip:off as deprecated no-ops so cached compiler options
keep working without erroring.
Migrate the test infrastructure to consume the SCIP shard output:
* CompileResult: replace textDocuments(Semanticdb.TextDocuments) with
documents(List[Scip.Document]) plus a documentsFromShard helper.
* TestCompiler: read META-INF/scip/<rel>.scip back from disk after
javac runs and surface the documents through CompileResult.
* OverridesSuite: assert on SymbolInformation.relationships
(is_implementation=true) instead of Semanticdb.getOverriddenSymbolsList.
* TargetedSuite: compare positions against Scip.Occurrence.range and
strip the placeholder prefix when comparing symbols.
* GeneratedConstructorSuite: switch the stub signature from
Semanticdb.TextDocument to Scip.Document.
* JavacClassesDirectorySuite: verify the shard lands at
META-INF/scip/.../Example.java.scip.
* ScipShardEmissionSuite: assert that the legacy .semanticdb file is
NOT produced; replace the -emit-scip:off shard-suppression test with
one that verifies the deprecated flag is still accepted as a no-op.
* BaseBuildToolSuite: rename semanticdbPattern/semanticdbFiles to
scipShardPattern/scipShards and match META-INF/scip/**.scip so the
Gradle/Maven build tool suites continue to count the right files.
* Delete tests/snapshots/src/main/scala/tests/SemanticdbFile.scala
(no remaining callers).
Validation: sbt unit/test (28 passing), sbt snapshots/test
(102 passing).
Drop the legacy SemanticDB code path from the Kotlin compiler plug-in:
* ScipRole: new local enum mirroring the DEFINITION/REFERENCE subset of
Semanticdb.SymbolOccurrence.Role.
* SemanticdbVisitor: drop the documentBuilder field and
build()/Semanticdb.TextDocument helper; the visitor now only feeds
ScipTextDocumentBuilder and uses ScipRole at every emit site.
* ScipTextDocumentBuilder: switch role parameter from
Semanticdb.SymbolOccurrence.Role to ScipRole.
* PostAnalysisExtension: remove the SemanticDB write path and the
(Semanticdb.TextDocument) -> Unit callback; the extension now only walks
the visitors and writes META-INF/scip/<rel>.scip shards.
* AnalyzerRegistrar: remove the SemanticDB callback parameter.
Delete the legacy implementation source:
* SemanticdbTextDocumentBuilder.kt
Delete the legacy Kotlin test suites that asserted on Semanticdb protobuf
output:
* src/test/kotlin/.../test/AnalyzerTest.kt (1528 lines)
* src/test/kotlin/.../test/SemanticdbSymbolsTest.kt (726 lines)
* src/test/kotlin/.../test/Utils.kt (203 lines)
The Kotlin compiler plug-in behavior remains covered end-to-end by the
existing snapshot suites (semanticdb-kotlinc/minimized fixtures + the
exposed-core library snapshot regenerated in PR2 K5).
Validation: sbt unit/test (28 passing), sbt snapshots/test
(102 passing).
The aggregator now consumes SCIP shards only. The legacy SemanticDB-based
reader/aggregator is removed.
Wiring:
* IndexSemanticdbCommand: remove the --use-scip-shards flag and the
ScipSemanticdb.run() else branch; always call ScipShardAggregator.
* BazelBuildTool: switch the Bazel main entry to ScipShardAggregator.
* MinimizedSnapshotScipGenerator: drop the --use-scip-shards argument
(the default switched in PR2 K4 and the flag is being removed now).
* build.sbt (kotlincSnapshots): drop --use-scip-shards from the
index-semanticdb invocation.
Delete the legacy SemanticDB-consuming aggregator sources, none of which
have any remaining callers:
* ScipSemanticdb.java
* SemanticdbWalker.java
* SemanticdbTreeVisitor.java
* ScipTextDocument.java
* SignatureFormatter.java
* SignatureFormatterException.java
* SymbolOccurrences.java
* Symtab.java
* RangeComparator.java
Validation: sbt unit/test (28 passing), sbt snapshots/test
(102 passing).
- Remove generated SemanticDB protobuf module:
  - semanticdb-java/src/main/protobuf/semanticdb.proto
  - semanticdb-java/src/main/protobuf/BUILD
  - semanticdb-kotlinc/src/main/proto/.../semanticdb.proto
- Delete unused SemanticDB builder helpers:
  - semanticdb-java/.../SemanticdbBuilders.java
  - semanticdb-kotlinc/.../SemanticdbBuilders.kt
- Keep a minimal semanticdb-java module that only ships
SemanticdbSymbols.java (a pure-Java symbol helper still consumed by
semanticdb-javac and scip-semanticdb), without protobuf generation.
- Update sbt and Bazel build files accordingly.
The Java/Kotlin compiler plugins now emit per-file SCIP shards directly
and the 'index-semanticdb' command aggregates those shards into a single
SCIP index. Update user-facing strings and docs to describe the actual
behavior instead of the now-removed SemanticDB->SCIP conversion step.
Keep compatibility names (-Xplugin:semanticdb, index-semanticdb CLI,
semanticdb-targetroot directory, semanticdb-javac module/package) so
existing build integrations keep working.
No description provided.