
bug: scalar subquery in RepartitionByExpression crashes with "Subquery N not found" #4787

Description

@mbutrovich

Describe the bug

A scalar subquery inside a RepartitionByExpression (for example, one created by DISTRIBUTE BY) crashes during native execution because the subquery is never registered on the native side for this plan shape. The crash reproduces over a plain Parquet scan.

Steps to reproduce

Add the following test to a suite that extends CometTestBase:

test("scalar subquery in repartition") {
  withParquetTable((0 until 10).map(i => (i, i)), "t") {
    val df = sql("SELECT * FROM t DISTRIBUTE BY (_1 + (SELECT max(_2) FROM t))")
    checkSparkAnswer(df)
  }
}

Running the test fails with:

org.apache.spark.SparkException: Job aborted due to stage failure: ... org.apache.comet.CometNativeException: Error inserting batch: External error: org.apache.comet.CometRuntimeException: Subquery 75 not found for plan 12.
    at org.apache.comet.Native.executePlan(Native Method)
    at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$2(CometExecIterator.scala:155)
    at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:173)
    at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:154)
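A possible companion check, sketched under the assumption of Spark 3.x catalyst APIs (TreeNode.collect and TreeNode.find): it confirms the plan shape described above, an uncorrelated scalar subquery inside the partitioning expressions of a RepartitionByExpression, without executing the query. The test name is hypothetical, and the check covers only the optimized logical plan, not the Comet physical plan or the native-side subquery registration.

```scala
// Local imports are legal Scala; they could equally sit at the top of the suite file.
import org.apache.spark.sql.catalyst.expressions.ScalarSubquery
import org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression

// Hypothetical companion test: inspects the logical plan only and never executes
// the query, so it should pass even while the native crash is present.
test("scalar subquery in repartition: plan shape") {
  withParquetTable((0 until 10).map(i => (i, i)), "t") {
    val df = sql("SELECT * FROM t DISTRIBUTE BY (_1 + (SELECT max(_2) FROM t))")

    // Collect every RepartitionByExpression node in the optimized logical plan.
    val repartitions = df.queryExecution.optimizedPlan.collect {
      case r: RepartitionByExpression => r
    }
    assert(repartitions.nonEmpty)

    // The uncorrelated scalar subquery should still sit inside a partitioning expression.
    val hasSubquery = repartitions.exists { r =>
      r.partitionExpressions.exists(_.find(_.isInstanceOf[ScalarSubquery]).isDefined)
    }
    assert(hasSubquery)

    // Print the physical plan without running it; "formatted" mode lists subqueries separately.
    df.explain("formatted")
  }
}
```

Comparing the formatted explain output with and without Comet enabled may help show which native operator receives the partitioning expression that references the unregistered subquery.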

Expected behavior

The query runs and returns the same results as Spark, with no native crash.

Additional context

Found while enabling CometLocalTableScanExec by default (#4393), but the bug is not specific to that operator: it also reproduces over a plain Parquet scan. Upstream test: "subquery in repartition".

Metadata


Assignees

No one assigned

    Labels

    Type

    No type


    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
