DRILL-8537: Bump Calcite to Version 1.42 #3025
Major Changes

1. Function Type Inference
EXTRACT Function
Problem: EXTRACT(SECOND) was returning BIGINT instead of DOUBLE, losing fractional seconds
Files Modified:
TIMESTAMPDIFF Function
Problem: Type mismatch between validation (BIGINT) and conversion (INTEGER)
Files Modified:
TIMESTAMPADD Function
Problem: Calcite 1.35 was adding precision to DATE types, causing assertion errors
Files Modified:

2. Function Registration & Resolution
Vararg Functions (CONCAT, COALESCE, etc.)
Problem: Function resolution failures for functions with variable arguments
Files Modified:
Niladic Special Functions (CURRENT_DATE, SESSION_USER, etc.)
Problem: Special functions not properly recognized in Calcite 1.35
Files Modified:

3. COUNT(*) Handling
Problem: COUNT(*) type inference changed in Calcite 1.35
Files Modified:

4. Aggregate Cost Estimation
Problem: Deprecated
Files Modified:

5. TIMESTAMPADD Implementation
Problem: Complete signature change in Calcite 1.35
Files Modified:

6. Complex Writer Functions
Problem: FLATTEN, CONVERT_FROM, CONVERT_TO require ProjectRecordBatch context
Files Modified:

7. FLATTEN in Aggregates Validation
Problem: FLATTEN only validated in COUNT, allowed in other aggregates
Files Modified:

8. Error Handling & Validation
Prepared Statement Errors
Problem: Parse errors wrapped differently in RPC layer, appearing as SYSTEM instead of VALIDATION
Files Modified:
Invalid CAST Operations
Problem: Calcite 1.35 correctly rejects semantically invalid CAST(DATE as TIME)
Files Modified:

Test Updates
Core Module Tests
JDBC Storage Plugin Tests

Key Behavioral Changes
Type Inference
Function Resolution
Validation

Files Created
Core Engine
Tests

Migration Notes for Developers
If you use EXTRACT(SECOND):
If you use COUNT(*):
If you use SQRT or math functions:
If you cast DATE to TIME:

Compatibility
Backward Compatibility
Breaking Changes
Update from Calcite 1.38 to 1.42 and adapt Drill to the breaking API changes between those releases:
- pom.xml: calcite 1.42.0, avatica 1.28.0.
- DrillRelDataTypeSystem: drop getMaxNumericPrecision()/getMaxNumericScale() overrides; they became final in 1.42 and now delegate to getMaxPrecision()/getMaxScale(), which Drill already overrides for DECIMAL.
- JdbcExpressionCheck: implement RexVisitor.visitNodeAndFieldIndex() added in Calcite 1.41.
- Parser.jj (TRIM): CalciteResource.illegalFromEmpty() was removed; raise a ParseException for the TRIM(FROM x) case while keeping Drill's TRIM(<flag> <chars>) extension.
- DynamicSchema/DynamicRootSchema: port to the new schema lookup API (CALCITE-6029, Calcite 1.39) which removed getImplicitSubSchema/getImplicitTable. Storage-plugin schemas are still loaded lazily, now via an overridden subSchemas() Lookup; table lookups stay exact/case-sensitive and temporary tables are resolved through an overridden tables() Lookup.
- WindowPrule: Calcite no longer names window output columns "w<group>$..."; select each group's output fields positionally instead of by name prefix.
- TestLiteralAggFunction: update baselines where a column constrained to a single constant (WHERE col = N) is now constant-folded to an INT literal.
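The positional selection described for WindowPrule can be pictured with a small sketch. It assumes, as a simplification, that each window group's output columns are appended after the input fields in group order. The class and method names are hypothetical and are not taken from Drill.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Drill's WindowPrule code. */
final class WindowFieldSketch {
    /**
     * Returns, for each window group, the output-row indexes of its fields.
     * Assumption: window outputs follow the input fields, one group after another.
     */
    static List<List<Integer>> groupOutputIndexes(int inputFieldCount, int[] groupSizes) {
        List<List<Integer>> result = new ArrayList<>();
        int next = inputFieldCount; // first window column sits right after the inputs
        for (int size : groupSizes) {
            List<Integer> indexes = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                indexes.add(next++);
            }
            result.add(indexes);
        }
        return result;
    }
}
```

With three input fields and two groups producing two and one columns, the groups map to indexes 3 and 4, then 5, regardless of what the columns are named.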
Avatica 1.28 changed the Cursor.Accessor / AvaticaSite API:
- AvaticaDrillSqlAccessor: implement the new unsigned accessors getUByte()/getUShort()/getUInt()/getULong() (returning jOOU types). Drill has no unsigned integer types, so they mirror the signed getters; the jOOU valueOf(<primitive>) overloads reinterpret the bits and never throw.
- DrillResultSetImpl: AvaticaSite.get() gained a 'signed' parameter; pass true (Drill numeric types are signed), preserving the previous behaviour.
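As a rough illustration of what "reinterpret the bits and never throw" means, the sketch below uses only JDK helpers in place of the jOOU types. The class and method names are hypothetical. A negative signed value simply maps to a large unsigned one, so every input is accepted.

```java
/** Hypothetical illustration using JDK helpers in place of jOOU types. */
final class UnsignedViewSketch {
    // Each method reads the same bits as an unsigned quantity.
    static int asUByte(byte v) {
        return Byte.toUnsignedInt(v); // range 0..255
    }

    static int asUShort(short v) {
        return Short.toUnsignedInt(v); // range 0..65535
    }

    static long asUInt(int v) {
        return Integer.toUnsignedLong(v); // range 0..2^32-1
    }

    static String asULong(long v) {
        return Long.toUnsignedString(v); // decimal text, since long cannot hold 2^64-1
    }
}
```

Non-negative values are unchanged by the reinterpretation, which is why mirroring the signed getters preserves results for the values Drill actually produces.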
Two issues surfaced by Calcite 1.42's stronger plan-time constant reduction (ReduceExpressionsRule / predicate inference building Sargs):
- DATE vs TIMESTAMP: expressions that SQL types as DATE but that Drill computes as a TIMESTAMP (e.g. DATE + INTERVAL YEAR) were folded to a TIMESTAMP literal. When such a literal landed in a DATE Sarg, RexSimplify failed with "TimestampString cannot be cast to DateString". Now emit a DATE literal (DateString value) whenever Calcite types the expression as DATE.
- Null interpreter output: some CHAR constants now reach DrillConstExecutor and Drill's interpreter returns no value, causing an NPE. Leave the expression unfolded in that case instead of failing; it is evaluated at execution time.
Fixes the TPC-H planning/execution suites (TestTpchPlanning, TestTpchExplain, TestTpchSingleMode, TestTpchDistributed, etc.).
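The DATE fix amounts to truncating the computed timestamp to a calendar date whenever the expression's SQL type is DATE. Here is a minimal sketch, assuming the computed value is epoch milliseconds interpreted in UTC (an assumption made for illustration). The names are hypothetical and do not mirror DrillConstExecutor.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical illustration only; not DrillConstExecutor's code. */
final class DateLiteralSketch {
    /** Converts a timestamp in epoch millis to the calendar date it falls on (UTC assumed). */
    static LocalDate toDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Renders a literal according to the SQL type assigned to the expression. */
    static String literalFor(String sqlTypeName, long epochMillis) {
        if ("DATE".equals(sqlTypeName)) {
            // Emit a date-valued literal so it can safely sit next to other DATE values.
            return "DATE '" + toDate(epochMillis) + "'";
        }
        return "TIMESTAMP(" + epochMillis + " ms)";
    }
}
```

The point of the design is that the literal's value class follows the declared type, not the type the evaluator happened to compute.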
DrillOptiq's ROW handling built each field-name string by calling getRexBuilder().makeLiteral(name).accept(this). The RexBuilder is null when converting a standalone expression with no input rel (e.g. constant folding via DrillConstExecutor), so ROW(...) over constants NPE'd under Calcite 1.42's stronger reduction. Build the Drill string literal directly instead.
Under Calcite 1.42, "WHERE employee_id = 1" constant-folds employee_id to the INT literal 1, so the column is INT rather than BIGINT. Update the baseline value accordingly (same behaviour change as TestLiteralAggFunction).
When a VARDECIMAL union computes a precision above the maximum of 38, Calcite 1.42 reduces the scale to fit (preserving integer digits), producing DECIMAL(38,4) instead of the previous DECIMAL(38,6). Both represent the values correctly; update TestVarlenDecimal.testWideningLimit to the new type.
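The scale reduction can be sketched as simple arithmetic. The inputs below (34 integer digits, scale 6) are chosen only to reproduce the DECIMAL(38,4) outcome mentioned above; the actual operand types in the test are not stated here, and the helper is hypothetical rather than Calcite's implementation.

```java
/** Hypothetical illustration only; not Calcite's or Drill's type-derivation code. */
final class DecimalWideningSketch {
    static final int MAX_PRECISION = 38; // maximum DECIMAL precision named in the text

    /** Returns {precision, scale} after capping, keeping all integer digits if possible. */
    static int[] capPreservingIntegerDigits(int integerDigits, int scale) {
        int precision = integerDigits + scale;
        if (precision <= MAX_PRECISION) {
            return new int[] {precision, scale};
        }
        // Give up fractional digits first: shrink the scale by the overflow, never below zero.
        int reducedScale = Math.max(0, scale - (precision - MAX_PRECISION));
        return new int[] {MAX_PRECISION, reducedScale};
    }
}
```

Under these assumed inputs the raw precision is 40, so two fractional digits are dropped and the result is precision 38 with scale 4.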
Calcite 1.42's RexSimplify.simplifyLike mishandles a LIKE whose ESCAPE
character is itself a wildcard ('%' or '_'): simplifyMixedWildcards collapses
the escaped wildcard and produces an invalid/altered pattern (e.g. 'ABC%%'
ESCAPE '%' -> 'ABC%' ESCAPE '%'). Skip filter expression reduction when the
condition contains such a LIKE so the original pattern is preserved and
executed correctly (RegexpUtil is left strict).
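To see why the collapsed pattern is a problem, consider a simplified LIKE-to-regex translation. This is a sketch written for illustration, not Drill's RegexpUtil; the names are hypothetical, and treating a dangling escape as an error is an assumption about what "strict" means.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; not Drill's RegexpUtil. */
final class LikeEscapeSketch {
    /** True when the ESCAPE character is itself a LIKE wildcard. */
    static boolean escapeIsWildcard(char escape) {
        return escape == '%' || escape == '_';
    }

    /** Translates a LIKE pattern to a Java regex, honoring the escape character. */
    static String likeToRegex(String pattern, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escape) {
                // The escape check comes first, so '%' used as escape is not a wildcard here.
                if (i + 1 >= pattern.length()) {
                    throw new IllegalArgumentException("dangling escape in: " + pattern);
                }
                regex.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }
}
```

In this sketch 'ABC%%' ESCAPE '%' matches only the literal string ABC%, while the collapsed 'ABC%' ESCAPE '%' ends in a dangling escape and is rejected, which is why a guard such as escapeIsWildcard is a convenient place to skip the reduction.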
Calcite 1.42 derives collations for constant/single-row VALUES and propagates them up through Project/Filter. Drill applies ordering in a later physical phase and has no logical collation-conversion rules, so such a collation requirement is unsatisfiable and planning fails with "CannotPlanException: ... sort=[...]". Strip derived collations from the tree before logical Volcano planning so the input and the derived target traits are collation-free. A Sort's own collation (an explicit ORDER BY) is preserved so ordering still works.
Calcite 1.42 no longer expands GROUP BY items that reference a SELECT-list alias (it still does so for HAVING), so queries like "SELECT length(n_name) AS len ... GROUP BY len" failed validation with "Expression 'n_name' is not being grouped" even though DrillConformance.isGroupByAlias() is true. Add a pre-validation GroupByAliasRewriter that expands such GROUP BY aliases into their defining expressions, restoring Drill's historical behavior.
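The idea behind such a rewrite can be sketched over plain strings. This is a deliberately simplified, hypothetical model: the real rewriter works on the parsed SQL tree, and details such as case sensitivity or a column that shares its name with an alias are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Drill's GroupByAliasRewriter. */
final class GroupByAliasSketch {
    /**
     * Replaces each GROUP BY item that names a SELECT-list alias with the
     * expression the alias stands for; other items pass through unchanged.
     */
    static List<String> expand(Map<String, String> selectAliases, List<String> groupBy) {
        List<String> expanded = new ArrayList<>();
        for (String item : groupBy) {
            expanded.add(selectAliases.getOrDefault(item, item));
        }
        return expanded;
    }
}
```

For the query in the text, the alias map holds len -> length(n_name), so GROUP BY len becomes GROUP BY length(n_name) before validation sees it.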
Calcite 1.42 coerces BOOLEAN to a numeric type when a boolean is used in
arithmetic (e.g. "<boolean expr> * col"), emitting casts such as castINT(BIT)
that Drill did not implement ("Missing function implementation:
[castINT(BIT-OPTIONAL)]"). Generate Bit->Int/BigInt/Float4/Float8 cast
functions (true -> 1, false -> 0) alongside the existing Bit->TinyInt cast.
With Calcite 1.42, BOOLEAN is coerced to the numeric column type in set operations (true -> 1, false -> 0), and the BIT->numeric cast functions added for boolean arithmetic let such a union execute. The two DRILL-2590 tests that expected "SELECT int_col UNION [ALL] SELECT bool_col" to fail now assert that the union succeeds with the coerced values.
The previous collation stripping removed collations from every non-Sort node, which dropped the collation that an ORDER BY in a view (or subquery) propagates to the Project/Limit above its Sort -- causing the ordering to be lost (e.g. a view "... ORDER BY x DESC" feeding an outer LIMIT returned unsorted rows). Track whether a Sort exists in each node's subtree and strip a collation only when it is not anchored by a Sort. This still removes the spurious collations Calcite 1.42 attaches to constant VALUES (fixing CannotPlanException) while preserving ORDER BY semantics.
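The "anchored by a Sort" rule can be sketched as a bottom-up walk over a toy plan tree. The Node class below is a hypothetical stand-in for a relational node and its collation trait; it is not Calcite's RelNode API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; a toy tree, not Calcite's RelNode API. */
final class CollationStripSketch {
    static final class Node {
        final String kind;       // e.g. "Sort", "Project", "Values"
        String collation;        // null means "no collation"
        final List<Node> inputs;

        Node(String kind, String collation, Node... inputs) {
            this.kind = kind;
            this.collation = collation;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Strips collations not anchored by a Sort; returns whether the subtree contains a Sort. */
    static boolean strip(Node node) {
        boolean hasSort = node.kind.equals("Sort");
        for (Node input : node.inputs) {
            hasSort |= strip(input); // |= always evaluates strip, so every input is visited
        }
        if (!hasSort) {
            node.collation = null; // a derived collation with no Sort beneath it is dropped
        }
        return hasSort;
    }
}
```

A Project over constant Values loses its derived collation, while a Limit above a Sort keeps the ordering that the ORDER BY established.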
…errors
Calcite 1.42 constant-propagates scalars into calls such as FLATTEN, then asks
DrillConstExecutor to fold e.g. flatten('Sheri'). There is no flatten(<scalar>)
implementation, so materialization failed with "Missing function
implementation: [flatten(VARCHAR)]" and DrillConstExecutor threw a generic
constant-folding plan error -- masking the real runtime error.
When folding fails because no function implementation matches the argument
types, leave the expression unfolded so execution produces the proper error
(e.g. "Flatten does not support inputs of non-list values").
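A minimal sketch of the "leave it unfolded" policy follows, using a hypothetical registry of function implementations keyed by signature. None of these names come from Drill; an empty Optional stands for "do not fold, let execution handle it".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical illustration only; not DrillConstExecutor's code. */
final class FoldFallbackSketch {
    /** A toy registry: one working function and one that yields no value. */
    static Map<String, Function<String, String>> demoRegistry() {
        Map<String, Function<String, String>> impls = new HashMap<>();
        impls.put("upper(VARCHAR)", s -> s.toUpperCase());
        impls.put("nothing(CHAR)", s -> null); // models an interpreter returning no value
        return impls;
    }

    /** Tries to fold call(arg) at plan time; empty means "leave the expression unfolded". */
    static Optional<String> tryFold(Map<String, Function<String, String>> impls,
                                    String signature, String arg) {
        Function<String, String> impl = impls.get(signature);
        if (impl == null) {
            return Optional.empty(); // no implementation matches the argument types
        }
        return Optional.ofNullable(impl.apply(arg)); // a null result also means "unfolded"
    }
}
```

Folding flatten(VARCHAR) finds no implementation and returns empty, so the original call survives planning and the runtime can report its own, more specific error.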
DrillPushFilterPastProjectRule still fires and correctly identifies the flatten-independent predicate (rownum = 100) as pushable, but it runs only in the cost-based (Volcano) logical phase. Under Calcite 1.42 the planner keeps the conjunctive filter combined above the flatten for this small input rather than splitting it -- the split plan's extra Filter operator is not cheaper on this dataset. The query result is unchanged. Update testPushFilterPastProjectWithFlatten to assert the combined filter above the flatten with no filter pushed below it.
DRILL-8537: Bump Calcite to Version 1.42
Description
I am attempting a new approach: instead of bumping Calcite from 1.34 -> 1.40 in one jump, I'm going to bump one version at a time and see how far we get.
After reaching a known-good 1.38 (all unit tests passing), the remaining bump was done in a single step from 1.38 → 1.42 (skipping the 1.39 regressions, which are resolved by 1.42). Avatica was bumped 1.23 → 1.28 to match Calcite 1.42.
Current Status:
Significant Changes in Calcite 1.35
- literal_agg function which allows literals in aggregate queries.
- VARDECIMAL handling.

Significant Changes in Calcite 1.36
There are no significant changes in Calcite 1.36.
Significant Changes in Calcite 1.37
Significant Changes in Calcite 1.38
- DrillSqlToRelConverter for graceful handling of Calcite 1.38's strict type checking
- ASOF joins, but Drill does not yet support them.

Significant Changes in Calcite 1.39 – 1.42
Bumped directly from 1.38 to 1.42 (with Avatica 1.23 → 1.28). The notable breaking changes and how Drill adapts to them:
- Schema resolution rewritten to a Lookup API (CALCITE-6029, 1.39). CalciteSchema no longer exposes getImplicitSubSchema/getImplicitTable; sub-schema and table resolution now go through subSchemas()/tables() Lookups. DynamicSchema/DynamicRootSchema were rewritten accordingly:
  - Storage-plugin schemas are still loaded lazily, now via an overridden subSchemas(). Both get and getIgnoreCase trigger the lazy load, so single-identifier multi-level names such as `cp.default`/`dfs.tmp` keep working.
  - Table lookups stay exact/case-sensitive, and temporary tables are resolved through an overridden tables().
- RexVisitor.visitNodeAndFieldIndex (1.41). Implemented in JdbcExpressionCheck (the only direct RexVisitor implementation); all other visitors extend RexVisitorImpl and inherit the default.
- Type-system methods made final (1.42). RelDataTypeSystem.getMaxNumericPrecision()/getMaxNumericScale() are now final; removed Drill's overrides, since the logic already lives in getMaxPrecision/getMaxScale (which Drill overrides to 38 for DECIMAL).
- Window output column naming (1.42). Calcite no longer names window-function output columns w<group>$.... WindowPrule now selects each group's output fields positionally (after the input fields) instead of by name prefix; the old name-based filter silently dropped all window columns, producing "field sizes are not equal".
- TRIM grammar (1.42). CalciteResource.illegalFromEmpty() was removed; Parser.jj raises a ParseException for TRIM(FROM x) while preserving Drill's TRIM(<flag> <chars>) extension.
- Avatica 1.28. Cursor.Accessor gained unsigned accessors (getUByte/getUShort/getUInt/getULong, returning jOOU types) and AvaticaSite.get gained a signed parameter. Implemented the unsigned accessors in AvaticaDrillSqlAccessor (Drill has only signed types, so they mirror the signed getters) and pass signed=true.
- Stronger plan-time constant reduction. Calcite 1.42 reduces more aggressively and builds Sargs during predicate inference, which surfaced two issues in DrillConstExecutor:
  - DATE + INTERVAL YEAR is typed DATE by Calcite but computed as a TIMESTAMP by Drill; folding it to a TIMESTAMP literal that then landed in a DATE Sarg failed with "TimestampString cannot be cast to DateString". The executor now emits a DATE literal whenever Calcite types the expression as DATE.
  - Some CHAR constants make Drill's interpreter return no value; the executor now leaves such expressions unfolded instead of NPE-ing.
- ROW() in a constant context (1.42). DrillOptiq's ROW handling built field-name literals via getRexBuilder(), which is null when converting a standalone (constant-folded) expression; it now builds the Drill string literal directly.
- LIKE ... ESCAPE when the escape character is a wildcard. Calcite 1.42's RexSimplify.simplifyLike mishandles a LIKE whose ESCAPE character is also a wildcard ('%'/'_'): it collapses the escaped wildcard and produces an altered pattern. Filter-expression reduction is skipped when the condition contains such a LIKE, preserving the original pattern.
- Constant VALUES collation. Calcite 1.42 derives collations for constant/single-row VALUES and propagates them. Drill applies ordering in a later physical phase and has no logical collation-conversion rules, so this caused CannotPlanException: ... sort=[...]. Derived collations are stripped before logical Volcano planning; a Sort's own collation (an explicit ORDER BY) is preserved.
- GROUP BY by alias. Calcite 1.42 stopped expanding GROUP BY items that reference a SELECT alias (it still does for HAVING), so GROUP BY <alias> failed validation even though DrillConformance.isGroupByAlias() is true. A pre-validation GroupByAliasRewriter restores the behavior.
- BOOLEAN in arithmetic. Calcite 1.42 coerces BOOLEAN to a numeric type in arithmetic, emitting casts (e.g. castINT(BIT)) that Drill did not implement. Added Bit -> Int/BigInt/Float4/Float8 cast functions (true -> 1, false -> 0).

Other behavioral notes (test baselines updated):
- A column constrained to a single constant (WHERE col = N) is now constant-folded to an INT literal, so such columns are reported as INT rather than BIGINT.
- When a VARDECIMAL union computes a precision above the maximum of 38, the scale is reduced to fit (DECIMAL(38,4)) rather than dropping integer digits (DECIMAL(38,6)).

Documentation
No user-facing changes.
Testing
Ran existing unit tests.