
feat: wire try_to_number and general filter lambda through codegen dispatch #4634

Open: andygrove wants to merge 3 commits into apache:main from andygrove:feat/codegen-dispatch-tier1

andygrove (Member) commented Jun 12, 2026

Which issue does this PR close?

N/A. Part of expanding the JVM codegen dispatch coverage (the path that runs Spark's own doGenCode inside the Comet native pipeline for Spark-exact results).

Rationale for this change

Several scalar string and array expressions were either falling back to Spark or listed as Planned in the reference, even though they are eligible for the codegen dispatch path. Two needed only a dispatch registration; the other three already worked through a RuntimeReplaceable rewrite and were simply mislabeled in the reference. This PR closes those gaps.

What changes are included in this PR?

Two genuine wirings:

  • try_to_number (TryToNumber): registered as CometCodegenDispatch, mirroring the existing to_number (CometToNumber).
  • filter with a general lambda (ArrayFilter): the general-lambda form now routes through the codegen dispatcher, like the other higher-order functions (transform, exists, forall). The array_compact form (filter(arr, x -> x is not null)) keeps its native fast path to avoid the per-batch JNI cost.
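The routing rule described above can be pictured with a small Python model. This is a hypothetical sketch, not Comet's Scala serde code: the node classes (`LambdaVar`, `IsNotNull`, `GreaterThan`, `ArrayFilter`, `TryToNumber`) and the returned path labels are assumptions chosen for illustration. The point it shows is that only the exact shape `x -> x is not null` on the lambda's own parameter keeps the native path; every other lambda goes to codegen dispatch.

```python
from dataclasses import dataclass

# Hypothetical, simplified expression nodes (not Comet's or Spark's real classes).
@dataclass(frozen=True)
class LambdaVar:
    name: str

@dataclass(frozen=True)
class IsNotNull:
    child: object

@dataclass(frozen=True)
class GreaterThan:
    left: object
    right: object

@dataclass(frozen=True)
class ArrayFilter:
    param: LambdaVar
    body: object

@dataclass(frozen=True)
class TryToNumber:
    value: object
    fmt: object

def route(expr) -> str:
    """Pick an execution path for an expression (illustrative model only)."""
    if isinstance(expr, ArrayFilter):
        body = expr.body
        # filter(arr, x -> x is not null) is array_compact in disguise:
        # keep the native fast path and skip the per-batch JNI round trip.
        if isinstance(body, IsNotNull) and body.child == expr.param:
            return "native"
        # Any other lambda runs Spark's own generated code via the dispatcher.
        return "codegen_dispatch"
    if isinstance(expr, TryToNumber):
        # Registered the same way as to_number.
        return "codegen_dispatch"
    # Everything else is outside this sketch.
    return "spark_fallback"
```

Note that `IsNotNull` applied to a different variable than the lambda parameter does not match the array_compact shape, so it also goes to the dispatcher in this model.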

Three expressions that already ran natively through their RuntimeReplaceable rewrites now get SQL test coverage and a corrected reference status. They were marked Planned despite already working, similar to the recent dayname / monthname correction:

  • regexp_count rewrites to size(regexp_extract_all(...)).
  • regexp_substr rewrites to nullif(regexp_extract(...), '').
  • try_to_binary rewrites to try_eval(to_binary(...)).
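To make the first two rewrites concrete, here is a Python sketch of the results they produce. It is an approximation, not Spark's implementation: Python's `re` module stands in for Java regular expressions (the two dialects differ in places), the extract calls are assumed to use group index 0 (the whole match), and try_to_binary is left out.

```python
import re
from typing import Optional

def regexp_count(s: Optional[str], pattern: Optional[str]) -> Optional[int]:
    """size(regexp_extract_all(s, pattern, 0)): number of whole-pattern matches."""
    if s is None or pattern is None:
        return None  # assumed: NULL input yields NULL
    # finditer + group(0) collects whole matches even when the pattern has groups.
    return len([m.group(0) for m in re.finditer(pattern, s)])

def regexp_substr(s: Optional[str], pattern: Optional[str]) -> Optional[str]:
    """nullif(regexp_extract(s, pattern, 0), ''): first match, or NULL if none."""
    if s is None or pattern is None:
        return None
    m = re.search(pattern, s)
    # regexp_extract returns '' when nothing matches; nullif turns '' into NULL.
    extracted = m.group(0) if m else ""
    return None if extracted == "" else extracted
```

For example, in this sketch `regexp_count("Steven Jones and Stephen Smith", "Ste(v|ph)en")` returns 2, and `regexp_substr` on the same input returns `"Steven"`.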

docs/source/user-guide/latest/expressions.md now lists all five as Supported instead of Planned.

How are these changes tested?

New Comet SQL file tests are added under spark/src/test/resources/sql-tests/expressions/: string/try_to_number.sql, string/regexp_count.sql, string/regexp_substr.sql, and string/try_to_binary.sql. The existing array/array_filter.sql is upgraded from spark_answer_only to query, so the general-lambda case now asserts native execution. Each file was run with CometSqlFileTestSuite on Spark 3.5 and passes (native execution plus a result match against Spark).

…spatch

Routes TryToNumber and the general-lambda form of ArrayFilter through the JVM
codegen dispatcher, matching the existing to_number / higher-order-function
serdes. The array_compact form of filter keeps its native fast path.

Also adds SQL file test coverage for regexp_count, regexp_substr, and
try_to_binary, which already run natively via their RuntimeReplaceable
rewrites, and flips all five expressions to Supported in the reference.
andygrove added this to the 0.17.0 milestone Jun 12, 2026

mbutrovich (Contributor) left a comment

Looks good to me, thanks @andygrove!

…-tier1

# Conflicts:
#	docs/source/user-guide/latest/expressions.md
#	spark/src/main/scala/org/apache/comet/serde/strings.scala