apache · andygrove · Jun 12, 2026 · Jun 11, 2026
diff --git a/docs/source/user-guide/latest/compatibility/index.md b/docs/source/user-guide/latest/compatibility/index.md
@@ -30,3 +30,30 @@ This guide documents areas where Comet's behavior is known to differ from Spark.
 - **Expressions**: per-expression compatibility notes, including cast.
 - **JSON**: choosing between the native and Spark-compatible engines for JSON expressions.
 - **Spark versions**: version-specific known issues and limitations.
+
+## Native and codegen-dispatch implementations
+
+Some Spark expressions have two implementations in Comet:
+
+- A **codegen-dispatch** implementation that runs Spark's own generated code for the
+  expression inside Comet's native pipeline (via the Arrow-direct codegen dispatcher). This
+  produces byte-exact Spark results at the cost of one JNI round-trip per batch. It is gated
+  globally by `spark.comet.exec.scalaUDF.codegen.enabled` (enabled by default); when the
+  dispatcher is disabled, these expressions fall back to Spark.
+- A **native** (Rust / DataFusion) implementation that is faster, with no JNI overhead, but
+  has known semantic differences from Spark for some inputs or patterns.
+
+Because the codegen-dispatch path matches Spark exactly, Comet uses it by **default**. The
+faster native path is **opt-in per expression** via that expression's
+`spark.comet.expression.<ExprClassName>.allowIncompatible=true` flag, which declares that you
+accept its differences from Spark. There is no global opt-in. When the native path is enabled
+but a specific input or pattern has no native implementation, Comet routes that case back
+through the codegen dispatcher rather than running something incompatible.
+
+This is the model behind the [regular expression](regex.md) and [JSON](json.md) families,
+which document their per-expression configs and the specific differences to expect.
+
+Expressions with **no** codegen-dispatch path behave differently: their incompatible cases
+fall back to Spark by default, and `allowIncompatible=true` runs the native (incompatible)
+path instead. `cast` is the main example; see the
+[expression reference](../expressions.md) for which expressions have incompatible cases.
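
The routing rules added in this hunk can be summarized as a small decision function. The sketch below is an illustrative Python model only, not Comet's implementation: `Path`, `choose_path`, `allow_incompatible_key`, and all parameter names are hypothetical. One branch is an assumption the text does not state: when the native path is opted into, does not cover the input, and the dispatcher is also disabled, the sketch assumes a fall back to Spark.

```python
from enum import Enum


class Path(Enum):
    NATIVE = "native"
    CODEGEN_DISPATCH = "codegen-dispatch"
    SPARK = "spark-fallback"


def allow_incompatible_key(expr_class_name: str) -> str:
    """Build the per-expression opt-in key described in the docs."""
    return f"spark.comet.expression.{expr_class_name}.allowIncompatible"


def choose_path(
    *,
    has_codegen_dispatch: bool,
    allow_incompatible: bool,
    dispatcher_enabled: bool = True,   # models spark.comet.exec.scalaUDF.codegen.enabled
    native_covers_case: bool = True,   # does a native implementation exist for this input?
) -> Path:
    """Pick the path for a case where the native implementation differs from Spark."""
    # The native path runs only when the user opted in AND it covers this input.
    if allow_incompatible and native_covers_case:
        return Path.NATIVE
    # Otherwise prefer the Spark-exact codegen dispatcher when the expression has one.
    if has_codegen_dispatch and dispatcher_enabled:
        return Path.CODEGEN_DISPATCH
    # No dispatcher available: fall back to Spark.
    # (For opt-in + no native coverage + dispatcher disabled, this is an assumption.)
    return Path.SPARK


if __name__ == "__main__":
    # Regex/JSON-style expression with default settings.
    print(choose_path(has_codegen_dispatch=True, allow_incompatible=False))
    # cast-style expression (no codegen-dispatch path) with default settings.
    print(choose_path(has_codegen_dispatch=False, allow_incompatible=False))
    print(allow_incompatible_key("Cast"))
```

Modelling the choice as one ordered chain of checks keeps the two families in a single function: the only difference between a regex-style expression and `cast` is whether `has_codegen_dispatch` is true.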
diff --git a/docs/source/user-guide/latest/expressions.md b/docs/source/user-guide/latest/expressions.md
@@ -26,10 +26,14 @@ transparently falls back to Spark for that part of the plan; results are unaffec
 
 Expressions marked ✅ Supported are enabled by default and produce Spark-compatible results.
 
-Some ✅ Supported expressions have specific incompatible cases that fall back to Spark by
-default. Those cases must be opted into per expression with
+Some ✅ Supported expressions have specific incompatible cases that do not run natively by default.
+Those cases must be opted into per expression with
 `spark.comet.expression.EXPRNAME.allowIncompatible=true` (where `EXPRNAME` is the Spark
-expression class name, for example `Cast`). There is no global opt-in.
+expression class name, for example `Cast`). There is no global opt-in. By default such a case
+either falls back to Spark (for example `cast`) or, when the expression has a Spark-compatible
+codegen-dispatch implementation, runs through that instead (for example the regex and JSON
+families). See [Native and codegen-dispatch implementations](compatibility/index.md#native-and-codegen-dispatch-implementations)
+for how Comet chooses.
 
 Most expressions can also be disabled with `spark.comet.expression.EXPRNAME.enabled=false`, where
 `EXPRNAME` is the Spark expression class name (for example `Length` or `StartsWith`). See the