fix(datafusion): return single row with count 0 for empty inserts#2712
fix(datafusion): return single row with count 0 for empty inserts#2712u70b3 wants to merge 1 commit into
Conversation
0f90cfc to
21ba002
Compare
huan233usc
left a comment
There was a problem hiding this comment.
Code looks good — returning one row with count = 0 is the right fix, and it matches
what the normal (non-empty) path already does. Tests are fine too.
One small thing about the description: it says this PR also stops empty inserts from
creating a snapshot. But that part already works on main — the code returns early
when there are no data files, before it ever starts a transaction, so an empty insert
never created a snapshot in the first place. The only real change here is what the
batch looks like (empty → one row with count 0). The "no snapshot" checks in the new
tests are still nice to have as guards, they just aren't testing anything new.
Could you drop that "skips creating a snapshot" line from the description? Just so it
matches what the PR actually does. Otherwise good to go.
| "Empty insert on partitioned table must not create a new snapshot" | ||
| ); | ||
|
|
||
| Ok(()) |
There was a problem hiding this comment.
Do you think we can migrate these tests to sqlogictest? The plan is to migrate off integration tests completely and use sqlogictest for logic like this
There was a problem hiding this comment.
yes, thanks for comment, already migrate some of the necessary testcase to sqlogictest
8402d58 to
14fbd0d
Compare
When an INSERT produces no data files, IcebergCommitExec previously returned an empty RecordBatch. DataFusion expects a single-row count result, so this change returns a batch with count=0 and skips creating a new snapshot. Includes unit and integration tests for unpartitioned and partitioned tables.
14fbd0d to
2ae10db
Compare
Which issue does this PR close?
What changes are included in this PR?
Fix empty inserts in the DataFusion integration.
When an
INSERTproduces no data files (e.g.INSERT INTO ... SELECT ... WHERE false),IcebergCommitExecpreviously returned an emptyRecordBatch. DataFusion expects a single-row count result for DML statements, so this PR changes the empty-data path to return a batch with count0.for empty inserts, the code returns early when there are no data files, before it ever starts a transaction, so it will not create snapshot.
Are these changes tested?
Yes. Added tests verifying that:
UInt64count batch with value0.Local verification:
cargo test -p iceberg-datafusioncargo clippy -p iceberg-datafusion --all-targets -- -D warningscargo fmt --all -- --checkgit diff --check