[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207
Open
stevomitric wants to merge 3 commits into
Open
[SPARK-57103][SQL] Wire ordering for nanosecond timestamp types#56207stevomitric wants to merge 3 commits into
stevomitric wants to merge 3 commits into
Conversation
### What changes were proposed in this pull request? Implement `Ordering` for `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)`, both in the interpreted path (`PhysicalDataType.ordering`) and the codegen path (`CodeGenerator.genComp`). This is the second of three PRs for SPARK-57103. The first (`[SPARK-57103][SQL] Add Comparable to TimestampNanosVal`, commit 0b0ffb7) made the value class `Comparable`. The remaining PR will extend `hash`, `xxhash64`, and `murmur3` for the two nanos types. Changes: - `PhysicalDataType.scala`: replace the two `orderedOperationUnsupportedByDataTypeError` throws with `implicitly[Ordering[InternalType]]`, following the `PhysicalGeographyType` / `PhysicalGeometryType` precedent. Resolves via `scala.math.Ordering.ordered[T <: Comparable[T]]` now that `TimestampNanosVal` is `Comparable`. - `CodeGenerator.scala`: add an explicit `genComp` arm that calls `compareTo`. Required because the existing AtomicType fallback would emit `c1.compare(c2)`, which fails to compile on `TimestampNanosVal` (it has `compareTo`, not `compare`). - Updated the scaladoc on both physical types to note that only hash remains as future work. ### Why are the changes needed? Without ordering, SQL operators that need a total order on the type (`ORDER BY`, sort-merge join, sort-based `GROUP BY`, `DISTINCT`) cannot execute against nanos-precision columns. The two physical types previously threw at runtime. ### Does this PR introduce _any_ user-facing change? No. The nanos types remain gated behind `spark.sql.timestampNanosTypes.enabled`; this PR only fills in the ordering hole their `PhysicalDataType` had. ### How was this patch tested? Added 10 unit tests in `OrderingSuite` (5 cases × 2 types): equal values, `epochMicros` primary key, `nanosWithinMicro` tie-breaker, `Long.MinValue` / `Long.MaxValue` boundary, and pre-epoch (negative `epochMicros`). Each case verifies both `InterpretedOrdering` and `LazilyGeneratedOrdering` agree on ASC and DESC, and that `compare(a, a) == 0`. ``` build/mvn test -pl sql/catalyst \ -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.OrderingSuite ``` Tests: 66/66 passing (10 new). Also ran `DataTypeSuite` and the catalyst-side `TimestampNanos*Suite` set: 344/344 passing. Not adding nanos types to `DataTypeTestUtils.atomicTypes` yet because the generic `GenerateOrdering with $dataType` test there uses `RandomDataGenerator`, which does not yet support nanos types (tracked in SPARK-57034). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.7)
Contributor
Author
|
@MaxGekk please take a look at this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Implement
OrderingforTimestampNTZNanosType(p)andTimestampLTZNanosType(p), both in the interpreted path and the codegen path.Why are the changes needed?
Without ordering, SQL operators that need a total order on the type (
ORDER BY, sort-merge join, sort-basedGROUP BY,DISTINCT) cannot execute against nanos-precision columns.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UT in this PR.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7