docs: explain native vs codegen-dispatch implementation model #4629

Merged
andygrove merged 1 commit into
apache:main from
andygrove:docs/native-vs-codegen-dispatch-model
Jun 12, 2026

@andygrove

Member

Which issue does this PR close?

Closes #4310.

Rationale for this change

#4310 noted that Comet's model for choosing between a native (incompatible, faster) implementation and a Spark-compatible codegen-dispatch implementation was confusing, and that the regex family made it visible. The behavior that actually landed (via #4239) is general, not regex-specific: any expression that has both a native and a codegen-dispatch implementation defaults to codegen dispatch (Spark-compatible, runs natively with a per-batch JNI cost), and the user opts into the native path per expression with that expression's allowIncompatible flag. This was documented per-family in the regex and JSON guides but never stated as the general model. This PR explains it once, centrally.

What changes are included in this PR?

  • compatibility/index.md: adds a "Native and codegen-dispatch implementations" section describing the two implementation kinds, why codegen dispatch is the default, how spark.comet.expression.<Expr>.allowIncompatible=true opts into the native path, and the fallthrough-to-dispatcher behavior for cases the native path does not cover. It also distinguishes this from expressions that have no codegen-dispatch path (for example cast), where the default is a Spark fallback.
  • expressions.md: corrects the support-reference intro, which previously implied incompatible cases always fall back to Spark by default, and links to the new section.
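
The selection rules described above can be sketched as a small, self-contained Scala model. This is only an illustration of the documented behavior, not Comet's actual serde code: `ExprSupport`, `ImplementationModel`, `choose`, and the `nativeCovers` flag are hypothetical names, and `RLike` and `Cast` are used as assumed values for the `<Expr>` placeholder in the config key.

```scala
// Hypothetical model of the selection rules; none of these types exist in Comet.
sealed trait Plan
case object NativePath extends Plan      // incompatible but faster native implementation
case object CodegenDispatch extends Plan // Spark-compatible, per-batch JNI cost
case object SparkFallback extends Plan   // expression is left to Spark

// Describes an expression whose native implementation is marked incompatible.
// nativeCovers models whether the native path handles this particular call.
final case class ExprSupport(
    name: String,
    hasNative: Boolean,
    hasCodegenDispatch: Boolean,
    nativeCovers: Boolean = true)

object ImplementationModel {
  // Builds the per-expression opt-in key from the pattern in the PR description.
  def allowIncompatibleKey(expr: String): String =
    s"spark.comet.expression.$expr.allowIncompatible"

  def choose(e: ExprSupport, conf: Map[String, String]): Plan = {
    val optedIn =
      conf.get(allowIncompatibleKey(e.name)).exists(_.equalsIgnoreCase("true"))
    if (e.hasNative && optedIn && e.nativeCovers) NativePath
    else if (e.hasCodegenDispatch) CodegenDispatch // default, and fallthrough target
    else SparkFallback                             // no codegen-dispatch path exists
  }
}
```

Under these assumptions, an expression with both implementations resolves to `CodegenDispatch` when the config map is empty and to `NativePath` once its key is set to `true`. An expression with only a native implementation (the `cast` case) resolves to `SparkFallback` until the user opts in.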

No code changes; the regex (compatibility/regex.md) and JSON (compatibility/json.md) guides already document their per-expression configs and specific differences, so this PR only adds the general framing they are instances of.

How are these changes tested?

Documentation-only change. The described model was verified against the serdes in spark/src/main/scala/org/apache/comet/serde/strings.scala (regex family) and the spark.comet.exec.scalaUDF.codegen.enabled config in CometConf.scala. compatibility/index.md was run through prettier (unchanged); expressions.md is listed in .prettierignore.

@comphead comphead left a comment

Contributor

Thanks @andygrove, this actually makes me wonder whether we should let the user choose between the codegen and native implementations? 🤔

Not sure if it makes sense, just a thought

@andygrove

Member Author

> Thanks @andygrove, this actually makes me wonder whether we should let the user choose between the codegen and native implementations? 🤔
>
> Not sure if it makes sense, just a thought

I'm not sure I understand. What are you suggesting the approach should be?

@andygrove andygrove merged commit 754289a into apache:main Jun 12, 2026
16 checks passed
@andygrove andygrove deleted the docs/native-vs-codegen-dispatch-model branch June 12, 2026 17:43
@andygrove

Member Author

> > Thanks @andygrove, this actually makes me wonder whether we should let the user choose between the codegen and native implementations? 🤔
> > Not sure if it makes sense, just a thought
>
> I'm not sure I understand. What are you suggesting the approach should be?

I think things will be clearer to users once #4509 merges

Development

Successfully merging this pull request may close these issues.

[DISCUSS] Simplify regex engine + incompatibility config model

2 participants