docs: explain native vs codegen-dispatch implementation model#4629
Merged
andygrove merged 1 commit intoJun 12, 2026
Merged
Conversation
…owIncompatible opt-in
comphead
approved these changes
Jun 12, 2026
Contributor
There was a problem hiding this comment.
Thanks @andygrove it actually makes me think if we should let user to select codegen/native implementation to choose? 🤔
Not sure if it makes sense, just a thought
Member
Author
I'm not sure I understand. What are you suggesting the approach should be? |
Member
Author
I think things will be clearer to users once #4509 merges |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4310.
Rationale for this change
#4310 noted that Comet's model for choosing between a native (incompatible, faster) implementation and a Spark-compatible codegen-dispatch implementation was confusing, and that the regex family made it visible. The behavior that actually landed (via #4239) is general, not regex-specific: any expression that has both a native and a codegen-dispatch implementation defaults to codegen dispatch (Spark-compatible, runs natively with a per-batch JNI cost), and the user opts into the native path per expression with that expression's
allowIncompatibleflag. This was documented per-family in the regex and JSON guides but never stated as the general model. This PR explains it once, centrally.What changes are included in this PR?
compatibility/index.md: adds a "Native and codegen-dispatch implementations" section describing the two implementation kinds, why codegen dispatch is the default, howspark.comet.expression.<Expr>.allowIncompatible=trueopts into the native path, and the fallthrough-to-dispatcher behavior for cases the native path does not cover. It also distinguishes this from expressions that have no codegen-dispatch path (for examplecast), where the default is a Spark fallback.expressions.md: corrects the support-reference intro, which previously implied incompatible cases always fall back to Spark by default, and links to the new section.No code changes; the regex (
compatibility/regex.md) and JSON (compatibility/json.md) guides already document their per-expression configs and specific differences, so this PR only adds the general framing they are instances of.How are these changes tested?
Documentation-only change. The described model was verified against the serdes in
spark/src/main/scala/org/apache/comet/serde/strings.scala(regex family) and thespark.comet.exec.scalaUDF.codegen.enabledconfig inCometConf.scala.compatibility/index.mdwas run through prettier (unchanged);expressions.mdis listed in.prettierignore.