feat: opt array_intersect, array_except, array_join into codegen dispatch #4636
Open
andygrove wants to merge 1 commit into
…atch Mixes CodegenDispatchFallback into the three array serdes so their Incompatible results route through the JVM codegen dispatcher (Spark's own doGenCode inside the Comet pipeline) by default, keeping the projection native while matching Spark exactly. The native incompatible path stays opt-in via allowIncompatible. Part of apache#4596.
Which issue does this PR close?
Part of #4596 (the array group: `array_intersect`, `array_except`, `array_join`).

Rationale for this change
These three expressions report `Incompatible` (null handling and element ordering can diverge from DataFusion's native implementation), so with `allowIncompatible` unset they cause the whole projection to fall back to Spark. They all have a real Spark `doGenCode` and supported input/output types, so they are eligible for the `CodegenDispatchFallback` path introduced in #4538: route the `Incompatible` result through the JVM codegen dispatcher (Spark's own `doGenCode` inside the Comet pipeline) so the projection stays native while matching Spark exactly.

What changes are included in this PR?
- `CometArrayIntersect`, `CometArrayExcept`, `CometArrayJoin` mix in `CodegenDispatchFallback`.
- `ArrayIntersect`'s `Unsupported` collation case is unchanged (still falls back); only the `Incompatible` case dispatches.
- `docs/source/user-guide/latest/expressions.md` notes updated: these now route through the dispatcher by default, with the native incompatible path opt-in via `allowIncompatible`.

The native opt-in path (`allowIncompatible=true`) is unchanged.

How are these changes tested?
- `array_join.sql`: upgraded from `spark_answer_only` to `query` so it now asserts native execution matching Spark, including the `array('a', NULL, 'c')` null case.
- `array_intersect_dispatch.sql` and `array_except_dispatch.sql`: exercise the dispatch path with `allowIncompatible` unset over the exact inputs the native path handles incompatibly (the right-longer-than-left ordering case for intersect, and the literal/literal case for except that the native path could not handle). Both assert native execution matching Spark with no `sort_array` workaround.
- `array_intersect.sql` / `array_except.sql` tests (native `allowIncompatible=true` path) still pass.

All run with `CometSqlFileTestSuite` on Spark 3.5 and pass.
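
For reviewers who want the routing rules in one place, here is a minimal sketch of the decision described above. It is a Python model, not Comet code: the names `Support`, `Route`, and `choose_route` are hypothetical, and the real logic lives in the Scala serdes and the `CodegenDispatchFallback` trait, whose actual signatures are not shown in this PR description.

```python
from enum import Enum, auto


class Support(Enum):
    """Hypothetical stand-in for the support level a serde reports."""
    COMPATIBLE = auto()
    INCOMPATIBLE = auto()
    UNSUPPORTED = auto()


class Route(Enum):
    """Hypothetical labels for where the expression ends up running."""
    NATIVE = auto()            # native implementation
    CODEGEN_DISPATCH = auto()  # Spark's doGenCode, inside the Comet pipeline
    SPARK_FALLBACK = auto()    # whole projection falls back to Spark


def choose_route(support, allow_incompatible, has_dispatch_fallback):
    """Model of the routing described in this PR (an assumption, not the real API)."""
    if support is Support.COMPATIBLE:
        return Route.NATIVE
    if support is Support.UNSUPPORTED:
        # e.g. the ArrayIntersect collation case: unchanged, still falls back
        return Route.SPARK_FALLBACK
    # Incompatible: the native path is opt-in only
    if allow_incompatible:
        return Route.NATIVE
    # With the mixin, Incompatible dispatches by default instead of falling back
    if has_dispatch_fallback:
        return Route.CODEGEN_DISPATCH
    return Route.SPARK_FALLBACK
```

Under this model, the PR changes only the case where the result is `Incompatible` and `allowIncompatible` is unset: it moves from `SPARK_FALLBACK` to `CODEGEN_DISPATCH` for the three array expressions.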