feat: wire try_to_number and general filter lambda through codegen dispatch #4634
Open
andygrove wants to merge 3 commits into
…spatch

Routes TryToNumber and the general-lambda form of ArrayFilter through the JVM codegen dispatcher, matching the existing to_number / higher-order-function serdes. The array_compact form of filter keeps its native fast path. Also adds SQL file test coverage for regexp_count, regexp_substr, and try_to_binary, which already run natively via their RuntimeReplaceable rewrites, and flips all five expressions to Supported in the reference.
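The filter split described in this commit message comes down to one routing check. The sketch below is a minimal Python model, not Comet's Scala serde. The expression classes (LambdaVar, IsNotNull, OtherExpr), the function route_filter, and the route labels are all hypothetical stand-ins. It assumes only the rule stated in this PR: a lambda whose body is exactly "x is not null" on its own parameter keeps the native array_compact path, and every other lambda goes to the JVM codegen dispatcher.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for Catalyst expression nodes; not Comet or Spark classes.
@dataclass(frozen=True)
class LambdaVar:
    name: str


@dataclass(frozen=True)
class IsNotNull:
    child: object


@dataclass(frozen=True)
class OtherExpr:
    sql: str


def route_filter(param: LambdaVar, body: object) -> str:
    """Pick an execution path for filter(arr, param -> body)."""
    # filter(arr, x -> x IS NOT NULL) drops nulls, i.e. it is array_compact(arr),
    # so it can stay on the native fast path and avoid the per-batch JNI cost.
    if isinstance(body, IsNotNull) and body.child == param:
        return "native:array_compact"
    # Any other lambda body runs Spark's own generated code via the dispatcher.
    return "jvm:codegen_dispatch"


x = LambdaVar("x")
print(route_filter(x, IsNotNull(x)))        # native:array_compact
print(route_filter(x, OtherExpr("x > 1")))  # jvm:codegen_dispatch
```

The comparison against the lambda's own parameter matters: a body such as "y is not null" that tests some other variable is not the array_compact shape, so the sketch sends it to the dispatcher.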
mbutrovich approved these changes on Jun 12, 2026

mbutrovich (Contributor) left a comment:
Looks good to me, thanks @andygrove!
…-tier1

# Conflicts:
#   docs/source/user-guide/latest/expressions.md
#   spark/src/main/scala/org/apache/comet/serde/strings.scala
Which issue does this PR close?
N/A. Part of expanding the JVM codegen dispatch coverage (the path that runs Spark's own doGenCode inside the Comet native pipeline for Spark-exact results).

Rationale for this change
Several scalar string and array expressions still fell back to Spark even though they were eligible for the codegen dispatch path. Each one either needed only a one-line dispatch registration, or already worked through a RuntimeReplaceable rewrite and was simply mislabeled in the reference. This PR closes those gaps.

What changes are included in this PR?
Two genuine wirings:
- try_to_number (TryToNumber): registered as CometCodegenDispatch, mirroring the existing to_number (CometToNumber).
- filter general lambda (ArrayFilter): the general lambda form now routes through the codegen dispatcher, like the other higher-order functions (transform, exists, forall). The array_compact form (filter(arr, x -> x is not null)) keeps its native fast path to avoid the per-batch JNI cost.

Three expressions that already ran natively through their RuntimeReplaceable rewrites get SQL test coverage and a corrected reference status (they were marked Planned but already worked, similar to the recent dayname/monthname correction):

- regexp_count rewrites to size(regexp_extract_all(...)).
- regexp_substr rewrites to nullif(regexp_extract(...), '').
- try_to_binary rewrites to try_eval(to_binary(...)).

docs/source/user-guide/latest/expressions.md flips all five from Planned to Supported.

How are these changes tested?
New Comet SQL file tests under spark/src/test/resources/sql-tests/expressions/: string/try_to_number.sql, string/regexp_count.sql, string/regexp_substr.sql, and string/try_to_binary.sql, plus the existing array/array_filter.sql upgraded from spark_answer_only to query so that the general lambda now asserts native execution. Each was run with CometSqlFileTestSuite on Spark 3.5 and passes (native execution plus a result match against Spark).
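The expected results in regexp_count.sql and regexp_substr.sql follow directly from the rewrites listed under "What changes are included in this PR?". The Python sketch below models those two rewrites for quick reasoning about edge cases. It is only an approximation: Python's re module and java.util.regex differ in some syntax and matching details, and the NULL handling shown (a NULL subject yields NULL) is an assumption about Spark's behavior rather than something this PR states. The function names mirror the SQL functions but are illustrative only.

```python
import re
from typing import Optional


def regexp_count(subject: Optional[str], pattern: str) -> Optional[int]:
    """Model of size(regexp_extract_all(subject, pattern, 0))."""
    if subject is None:
        return None  # assumed: a NULL subject yields a NULL count
    # regexp_extract_all(..., 0) collects every whole-pattern match; size() counts them.
    return sum(1 for _ in re.finditer(pattern, subject))


def regexp_substr(subject: Optional[str], pattern: str) -> Optional[str]:
    """Model of nullif(regexp_extract(subject, pattern, 0), '')."""
    if subject is None:
        return None  # assumed: a NULL subject yields NULL
    match = re.search(pattern, subject)
    # regexp_extract returns '' when nothing matches ...
    extracted = match.group(0) if match else ""
    # ... and nullif turns that '' into NULL.
    return None if extracted == "" else extracted


s = "Steven Jones and Stephen Smith are the best players"
print(regexp_count(s, "Ste(v|ph)en"))   # 2
print(regexp_substr(s, "Ste(v|ph)en"))  # Steven
print(regexp_substr(s, "Jeck"))         # None (SQL NULL)
```

One consequence of the nullif rewrite that is easy to miss: a pattern that matches the empty string, such as a* against 'bbb', also comes back as NULL rather than ''.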