feat: wire mask and map (create_map) through codegen dispatch #4635
Open
andygrove wants to merge 1 commit into
Routes Mask and CreateMap through the JVM codegen dispatcher so they run inside the Comet pipeline with Spark-exact results, with SQL file test coverage and a reference status update.
Which issue does this PR close?
N/A. Tier 2 of expanding JVM codegen dispatch coverage (follow-on to the tier 1 PR).
Rationale for this change
mask and map are scalar expressions with supported output types that were falling back to Spark even though they are eligible for the codegen dispatch path (which runs Spark's own doGenCode inside the Comet pipeline for Spark-exact results).
What changes are included in this PR?
- mask (Mask): registered as CometCodegenDispatch.
- map (CreateMap): registered as CometCodegenDispatch.
- docs/source/user-guide/latest/expressions.md flips both from Planned to Supported.
Scoped down from the originally planned tier 2 set after empirical testing. The following were deferred because they need more than a one-line registration, and I would rather land them with proper handling:
- base64/encode: on Spark 4.x these lower to StaticInvoke(codec object), not the Base64/Encode case classes, so they need the StaticInvoke allowlist in statics.scala extended (and the path differs across Spark versions).
- split_part: rewrites to element_at(StringSplitSQL(...)). Dispatching StringSplitSQL and then composing it with element_at tripped a native panic (Arrays with inconsistent types passed to MutableArrayData), so it needs investigation.
- array_prepend: ArrayPrepend only exists in Spark 3.5+, so it needs the version-specific expression map rather than the shared one.
How are these changes tested?
New Comet SQL file tests string/mask.sql and map/create_map.sql run with CometSqlFileTestSuite and pass (native execution plus a result match against Spark). Both exercise column inputs, literals, nulls, and the multi-argument forms.
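As a review aid, the Spark behavior that the two SQL files compare against can be sketched as plain Scala reference models. MaskModel and CreateMapModel are hypothetical names; neither object is part of Spark, Comet, or this PR. The sketch assumes Spark's documented defaults: mask replaces uppercase letters with 'X', lowercase with 'x', digits with 'n', and keeps other characters unless otherChar is given, with a NULL replacement argument meaning "leave this class unchanged". map takes key/value pairs, rejects null keys, allows null values, and resolves duplicate keys according to spark.sql.mapKeyDedupPolicy (EXCEPTION by default, or LAST_WIN).

```scala
import scala.collection.mutable

// Hypothetical reference models of the Spark semantics checked by string/mask.sql
// and map/create_map.sql. Not Spark or Comet code; a sketch for reviewers only.
object MaskModel {
  // mask(input[, upperChar, lowerChar, digitChar, otherChar]). A None replacement
  // means "keep the original character", like passing NULL for that argument in SQL.
  def mask(
      input: String,
      upper: Option[Char] = Some('X'),
      lower: Option[Char] = Some('x'),
      digit: Option[Char] = Some('n'),
      other: Option[Char] = None): String = {
    if (input == null) return null // NULL input gives NULL output
    // Classifies UTF-16 chars, which is adequate for ASCII test inputs.
    input.map { c =>
      val t = Character.getType(c)
      val replacement =
        if (t == Character.UPPERCASE_LETTER) upper
        else if (t == Character.LOWERCASE_LETTER) lower
        else if (t == Character.DECIMAL_DIGIT_NUMBER) digit
        else other
      replacement.getOrElse(c)
    }
  }
}

object CreateMapModel {
  // map(k1, v1, k2, v2, ...). lastWin = false mirrors mapKeyDedupPolicy=EXCEPTION
  // (the default); lastWin = true mirrors LAST_WIN. Returns entries in insertion order.
  def createMap(args: Seq[Any], lastWin: Boolean = false): Seq[(Any, Any)] = {
    // Arguments come in key/value pairs, so an odd count is rejected.
    require(args.length % 2 == 0, "map expects an even number of arguments")
    val keys = mutable.ArrayBuffer.empty[Any]
    val values = mutable.ArrayBuffer.empty[Any]
    args.grouped(2).foreach { pair =>
      val k = pair(0)
      val v = pair(1) // null values are allowed
      if (k == null) throw new IllegalArgumentException("Cannot use null as map key")
      val idx = keys.indexOf(k)
      if (idx < 0) { keys += k; values += v }
      else if (lastWin) values(idx) = v // keep the key's first position, take the last value
      else throw new IllegalArgumentException(s"Duplicate map key: $k")
    }
    keys.zip(values).toSeq
  }
}
```

For example, MaskModel.mask("AbCD123-@$#", Some('Q'), Some('q')) yields "QqQQnnn-@$#", matching the corresponding example in Spark's function documentation. The map model only approximates Spark's failures with IllegalArgumentException rather than Spark's own error classes, so it is useful for reasoning about which inputs should fail, not for matching error messages.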