Describe the bug
flatten on an array<array<T>> column returns wrong results and silently drops nulls when a row's outer array contains a null sub-array. Spark returns null for any such row; Comet returns a non-null, misaligned array. Silent data corruption, not a crash.
Steps to reproduce
Add to a suite extending CometTestBase:
test("flatten with null sub-array") {
  val data = Seq(
    Tuple1(Seq(Seq(1, 2, 3), Seq(4, 5))),
    Tuple1(Seq[Seq[Int]](Seq(1), null)), // Spark: flatten -> null
    Tuple1(Seq[Seq[Int]](null, null))) // Spark: flatten -> null
  withParquetTable(data, "t") {
    checkSparkAnswerAndOperator("SELECT flatten(_1) FROM t")
  }
}
== Results ==
!== Spark Answer - 3 ==          == Comet Answer - 3 ==
struct<flatten(_1):array<int>>   struct<flatten(_1):array<int>>
![List(1, 2, 3, 4, 5)]           [List()]
![null]                          [List(1)]
![null]                          [List(1, 2, 3, 4, 5)]
Expected behavior
Match Spark: a null sub-array makes flatten return null for that row.
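For reference, the expected null handling can be modelled in plain Scala. This is only a sketch of the semantics described above, not Spark or Comet code: `flattenLikeSpark` is a hypothetical helper name, and `None` stands in for SQL NULL at both the row level and the sub-array level.

```scala
// Plain-Scala model of the expected null semantics; not Spark or Comet code.
// `flattenLikeSpark` is a hypothetical name; None stands in for SQL NULL.
object FlattenNullSemantics {
  def flattenLikeSpark[T](outer: Option[Seq[Option[Seq[T]]]]): Option[Seq[T]] =
    outer.flatMap { subArrays =>
      // Any null sub-array makes the whole result null for that row.
      if (subArrays.exists(_.isEmpty)) None
      // Otherwise unwrap each sub-array and concatenate them in order.
      else Some(subArrays.flatten.flatten)
    }

  def main(args: Array[String]): Unit = {
    // Same three rows as the repro above.
    println(flattenLikeSpark[Int](Some(Seq(Some(Seq(1, 2, 3)), Some(Seq(4, 5)))))) // Some(List(1, 2, 3, 4, 5))
    println(flattenLikeSpark[Int](Some(Seq(Some(Seq(1)), None))))                  // None
    println(flattenLikeSpark[Int](Some(Seq(None, None))))                          // None
  }
}
```

The three calls mirror the three rows in the repro, so the model's output lines up with the Spark Answer column: one flattened array and two nulls.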
Additional context
Found while enabling CometLocalTableScanExec by default (#4393), but reproduces over a plain Parquet scan. Upstream test: DataFrameFunctionsSuite "flatten function".