feat: opt array_intersect, array_except, array_join into codegen dispatch #4636

Open
andygrove wants to merge 1 commit into apache:main from andygrove:feat/codegen-dispatch-incompat-arrays

@andygrove

Member

Which issue does this PR close?

Part of #4596 (the array group: array_intersect, array_except, array_join).

Rationale for this change

These three expressions report Incompatible because the native DataFusion implementation can diverge from Spark in null handling and element ordering. With allowIncompatible unset, that result forces the whole projection to fall back to Spark. All three have a real Spark doGenCode and supported input/output types, so they are eligible for the CodegenDispatchFallback path introduced in #4538. That path routes the Incompatible result through the JVM codegen dispatcher (Spark's own doGenCode inside the Comet pipeline), so the projection stays native while matching Spark exactly.

What changes are included in this PR?

  • CometArrayIntersect, CometArrayExcept, CometArrayJoin mix in CodegenDispatchFallback. ArrayIntersect's Unsupported collation case is unchanged (still falls back); only the Incompatible case dispatches.
  • docs/source/user-guide/latest/expressions.md notes updated: these now route through the dispatcher by default, with the native incompatible path opt-in via allowIncompatible.

The native opt-in path (allowIncompatible=true) is unchanged.
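
For reviewers, the routing described above can be modelled roughly as follows. This is a minimal sketch, not Comet's actual code: `SerdeSketch`, `DispatchFallbackSketch`, `Route`, and `route` are hypothetical names invented for illustration, and treating Compatible as "always native" is an assumption.

```scala
// Hypothetical model of the routing decision. None of these names are Comet's API.
object DispatchSketch {
  sealed trait SupportLevel
  case object Compatible extends SupportLevel
  case object Incompatible extends SupportLevel
  case object Unsupported extends SupportLevel

  sealed trait Route
  case object NativeDataFusion extends Route   // native implementation
  case object JvmCodegenDispatch extends Route // Spark's doGenCode inside the Comet pipeline
  case object SparkFallback extends Route      // whole projection falls back to Spark

  // Stand-in for an expression serde that reports a support level.
  trait SerdeSketch { def supportLevel: SupportLevel }

  // Stand-in for the marker trait that the three array serdes mix in.
  trait DispatchFallbackSketch

  def route(serde: SerdeSketch, allowIncompatible: Boolean): Route =
    serde.supportLevel match {
      case Compatible => NativeDataFusion // assumption: compatible stays native
      case Incompatible if allowIncompatible => NativeDataFusion // opt-in path, unchanged
      case Incompatible if serde.isInstanceOf[DispatchFallbackSketch] => JvmCodegenDispatch
      case Incompatible | Unsupported => SparkFallback // e.g. the collation case
    }

  def main(args: Array[String]): Unit = {
    val arraySerde = new SerdeSketch with DispatchFallbackSketch {
      val supportLevel: SupportLevel = Incompatible
    }
    println(route(arraySerde, allowIncompatible = false)) // prints JvmCodegenDispatch
    println(route(arraySerde, allowIncompatible = true))  // prints NativeDataFusion
  }
}
```

In this model only the Incompatible branch changes when the marker trait is present, which mirrors the statement that the Unsupported collation case and the allowIncompatible=true path behave as before.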

How are these changes tested?

  • array_join.sql: upgraded from spark_answer_only to query so it now asserts native execution matching Spark, including the array('a', NULL, 'c') null case.
  • New array_intersect_dispatch.sql and array_except_dispatch.sql: exercise the dispatch path with allowIncompatible unset over the exact inputs the native path handles incompatibly (the right-longer-than-left ordering case for intersect, and the literal/literal case for except that the native path could not handle). Both assert native execution matching Spark with no sort_array workaround.
  • The existing array_intersect.sql / array_except.sql tests (native allowIncompatible=true path) still pass.

All run with CometSqlFileTestSuite on Spark 3.5 and pass.
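
As a reference for what "matching Spark" means in these tests, here is a small model of the Spark semantics as I understand them from Spark's function documentation: array_join skips null elements unless a null replacement is supplied, and array_intersect / array_except return distinct elements in the order of the left array. The helper names are hypothetical, nulls are represented as `None`, and null handling for intersect and except is not modelled. It is a sketch of the expected results, not Spark's or Comet's implementation.

```scala
// Hypothetical reference model of the Spark-side semantics the dispatch tests assert.
object ArraySemanticsSketch {
  // array_join: null elements are dropped unless a replacement string is given.
  def arrayJoin(
      xs: Seq[Option[String]],
      delimiter: String,
      nullReplacement: Option[String] = None): String =
    xs.flatMap(x => x.orElse(nullReplacement)).mkString(delimiter)

  // array_intersect: distinct elements of `left` that also occur in `right`, in `left` order.
  def arrayIntersect[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.distinct.filter(rightSet.contains)
  }

  // array_except: distinct elements of `left` that do not occur in `right`, in `left` order.
  def arrayExcept[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.distinct.filterNot(rightSet.contains)
  }

  def main(args: Array[String]): Unit = {
    println(arrayJoin(Seq(Some("a"), None, Some("c")), ","))            // prints a,c
    println(arrayJoin(Seq(Some("a"), None, Some("c")), ",", Some("?"))) // prints a,?,c
    println(arrayIntersect(Seq(3, 1, 2), Seq(2, 9, 3, 7)))              // prints List(3, 2)
  }
}
```

The intersect call uses a right array longer than the left one, the shape the new array_intersect_dispatch.sql test targets; the result follows the left array's order regardless of the right array's length.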

…atch

Mixes CodegenDispatchFallback into the three array serdes so their Incompatible
results route through the JVM codegen dispatcher (Spark's own doGenCode inside
the Comet pipeline) by default, keeping the projection native while matching
Spark exactly. The native incompatible path stays opt-in via allowIncompatible.

Part of apache#4596.
@andygrove added this to the 0.17.0 milestone on Jun 12, 2026