feat: wire mask and map (create_map) through codegen dispatch #4635

Open
andygrove wants to merge 1 commit into apache:main from andygrove:feat/codegen-dispatch-tier2

@andygrove

Member

Which issue does this PR close?

N/A. Tier 2 of expanding JVM codegen dispatch coverage (follow-on to the tier 1 PR).

Rationale for this change

mask and map are scalar expressions with supported output types that were falling back to Spark even though they are eligible for the codegen dispatch path (which runs Spark's own doGenCode inside the Comet pipeline for Spark-exact results).
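
As a rough illustration of the behaviour described above, the following toy model shows how an unregistered expression ends up falling back to Spark while a registered one takes the dispatch path. It is only a sketch: `Expr`, `DISPATCH_REGISTRY`, and `plan` are hypothetical names invented for this example and are not Comet's actual classes or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a Spark expression node (not a real Comet type)."""
    name: str  # e.g. "Mask", "CreateMap", "Base64"


# Hypothetical registry: expression class name -> how it is handled.
# Registering an expression here is the "one-line registration" in this toy model.
DISPATCH_REGISTRY = {
    "Mask": "codegen_dispatch",
    "CreateMap": "codegen_dispatch",
}


def plan(expr: Expr) -> str:
    """Return where the expression would run under this toy model."""
    handler = DISPATCH_REGISTRY.get(expr.name)
    if handler is None:
        # Nothing registered: the expression is left to Spark.
        return "fallback_to_spark"
    return handler


print(plan(Expr("Mask")))
print(plan(Expr("Base64")))
```

In this model, adding an entry to the registry is all it takes to move an eligible expression off the fallback path.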

What changes are included in this PR?

  • mask (Mask): registered as CometCodegenDispatch.
  • map (CreateMap): registered as CometCodegenDispatch.

The reference page docs/source/user-guide/latest/expressions.md is updated to flip both from Planned to Supported.

This PR is scoped down from the originally planned tier 2 set after empirical testing. The following were deferred because they need more than a one-line registration, and I would rather land them with proper handling:

  • base64 / encode: on Spark 4.x these lower to StaticInvoke (codec object), not the Base64 / Encode case classes, so they need the StaticInvoke allowlist in statics.scala extended (and the path differs across Spark versions).
  • split_part: rewrites to element_at(StringSplitSQL(...)). Dispatching StringSplitSQL then composing with element_at tripped a native panic (Arrays with inconsistent types passed to MutableArrayData), so it needs investigation.
  • array_prepend: ArrayPrepend only exists in Spark 3.5+, so it needs the version-specific expression map rather than the shared one.
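
To make the array_prepend point concrete, here is a minimal sketch of a shared expression map combined with a version-specific overlay. All names (`SHARED_EXPRESSIONS`, `VERSION_SPECIFIC`, `expressions_for`) are hypothetical and chosen for illustration; this PR does not register ArrayPrepend, and Comet's real maps may be organised differently.

```python
# Hypothetical shared map: entries valid on every supported Spark version.
SHARED_EXPRESSIONS = {
    "Mask": "codegen_dispatch",
    "CreateMap": "codegen_dispatch",
}

# Hypothetical overlay keyed by the minimum (major, minor) Spark version
# in which the expression class exists.
VERSION_SPECIFIC = {
    (3, 5): {"ArrayPrepend": "codegen_dispatch"},
}


def expressions_for(spark_version: tuple) -> dict:
    """Merge the shared map with every overlay the given version qualifies for."""
    merged = dict(SHARED_EXPRESSIONS)
    for min_version, extra in VERSION_SPECIFIC.items():
        # Tuples compare element by element, so (4, 0) >= (3, 5) is True.
        if spark_version >= min_version:
            merged.update(extra)
    return merged


print(sorted(expressions_for((3, 4))))
print(sorted(expressions_for((3, 5))))
```

Putting ArrayPrepend in the shared map instead would reference a class that does not exist on older Spark versions, which is why the overlay is needed.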

How are these changes tested?

New Comet SQL file tests, string/mask.sql and map/create_map.sql, run under CometSqlFileTestSuite and pass (native execution plus result match against Spark). Both exercise column inputs, literals, nulls, and the multi-argument forms.
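
For readers unfamiliar with the two functions, the following is a small reference model of the semantics that a result match against Spark would be checking. It is a sketch under stated assumptions: it handles ASCII input only (Python's character classes differ from the JVM's for some Unicode characters), it assumes Spark's default replacement characters for mask, and it assumes the default duplicate-key policy for map, which raises an error. It does not reproduce the contents of the SQL test files.

```python
from typing import Any, Dict, Optional


def mask(s: Optional[str],
         upper: Optional[str] = "X",
         lower: Optional[str] = "x",
         digit: Optional[str] = "n",
         other: Optional[str] = None) -> Optional[str]:
    """Model of mask(): a None replacement keeps the original character."""
    if s is None:
        return None  # null input yields null
    out = []
    for ch in s:
        if ch.isupper():
            rep = upper
        elif ch.islower():
            rep = lower
        elif ch.isdigit():
            rep = digit
        else:
            rep = other
        out.append(ch if rep is None else rep)
    return "".join(out)


def create_map(*args: Any) -> Dict[Any, Any]:
    """Model of map(k1, v1, k2, v2, ...): alternating keys and values."""
    if len(args) % 2 != 0:
        raise ValueError("create_map expects an even number of arguments")
    result: Dict[Any, Any] = {}
    for key, value in zip(args[0::2], args[1::2]):
        if key is None:
            raise ValueError("map keys cannot be null")
        if key in result:
            # Assumption: default dedup policy, which rejects duplicate keys.
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value  # null values are allowed
    return result


print(mask("abcd-EFGH-8765-4321"))                # xxxx-XXXX-nnnn-nnnn
print(mask("AbCD123-@$#", "Q", "q", "d", "o"))    # QqQQdddoooo
print(create_map("a", 1, "b", None))              # {'a': 1, 'b': None}
```

The multi-argument call shows why the optional parameters matter for test coverage: each one changes how a different character class is rewritten.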

Routes Mask and CreateMap through the JVM codegen dispatcher so they run inside
the Comet pipeline with Spark-exact results, with SQL file test coverage and a
reference status update.
@andygrove andygrove added this to the 0.17.0 milestone Jun 12, 2026