fix(datafusion): return single row with count 0 for empty inserts #2712

Open
u70b3 wants to merge 1 commit into apache:main from u70b3:fix/datafusion-reject-non-append-inserts

@u70b3 u70b3 commented Jun 25, 2026

Which issue does this PR close?

What changes are included in this PR?

Fix empty inserts in the DataFusion integration.

When an INSERT produces no data files (e.g. INSERT INTO ... SELECT ... WHERE false), IcebergCommitExec previously returned an empty RecordBatch. DataFusion expects a single-row count result for DML statements, so this PR changes the empty-data path to return a batch with count 0.

For empty inserts, the code returns early when there are no data files, before it starts a transaction, so no snapshot is created.
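
The control flow described above can be sketched roughly as follows. This is a simplified model, not the actual `IcebergCommitExec` code: it uses plain stand-in types instead of Arrow's `RecordBatch` and iceberg's data file type, and `CountBatch`, `Table`, `DataFile`, and `commit` are hypothetical names chosen for illustration only.

```rust
/// Hypothetical stand-in for a written data file (the real type carries much more metadata).
#[derive(Debug, Clone)]
struct DataFile {
    record_count: u64,
}

/// Hypothetical stand-in for the single-column UInt64 count batch returned for DML.
#[derive(Debug, PartialEq)]
struct CountBatch {
    rows: Vec<u64>,
}

/// Hypothetical stand-in for a table; each entry models one committed snapshot.
#[derive(Debug, Default)]
struct Table {
    snapshots: Vec<u64>,
}

/// Models the commit step: always returns exactly one count row.
fn commit(table: &mut Table, data_files: &[DataFile]) -> CountBatch {
    if data_files.is_empty() {
        // Early return: no transaction is started, so no snapshot is created.
        // The result is still a single row, with count 0, rather than an empty batch.
        return CountBatch { rows: vec![0] };
    }

    // Non-empty path: sum the written rows and record one new snapshot.
    let total: u64 = data_files.iter().map(|f| f.record_count).sum();
    let next_id = table.snapshots.len() as u64 + 1;
    table.snapshots.push(next_id);
    CountBatch { rows: vec![total] }
}
```

In this model, calling `commit` with an empty slice yields one row containing 0 and leaves `table.snapshots` untouched, while a non-empty slice yields one row containing the total record count and adds a snapshot.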

Are these changes tested?

Yes. Added tests verifying that:

  • Empty inserts return a single-row UInt64 count batch with value 0.
  • Existing insert tests continue to pass.

Local verification:

  • cargo test -p iceberg-datafusion
  • cargo clippy -p iceberg-datafusion --all-targets -- -D warnings
  • cargo fmt --all -- --check
  • git diff --check

u70b3 force-pushed the fix/datafusion-reject-non-append-inserts branch from 0f90cfc to 21ba002 on June 25, 2026 12:07
u70b3 changed the title from "fix(datafusion): reject non-append insert operations" to "fix(datafusion): return single row with count 0 for empty inserts" on Jun 25, 2026

@huan233usc huan233usc left a comment

Contributor

Code looks good — returning one row with count = 0 is the right fix, and it matches
what the normal (non-empty) path already does. Tests are fine too.

One small thing about the description: it says this PR also stops empty inserts from
creating a snapshot. But that part already works on main — the code returns early
when there are no data files, before it ever starts a transaction, so an empty insert
never created a snapshot in the first place. The only real change here is what the
batch looks like (empty → one row with count 0). The "no snapshot" checks in the new
tests are still nice to have as guards, they just aren't testing anything new.

Could you drop that "skips creating a snapshot" line from the description? Just so it
matches what the PR actually does. Otherwise good to go.

"Empty insert on partitioned table must not create a new snapshot"
);

Ok(())

Collaborator

Do you think we can migrate these tests to sqllogictest? The plan is to migrate off integration tests completely and use sqllogictest for logic like this.

Author

Yes, thanks for the comment. I have already migrated some of the necessary test cases to sqllogictest.

u70b3 closed this on Jun 30, 2026
u70b3 reopened this on Jun 30, 2026
u70b3 force-pushed the fix/datafusion-reject-non-append-inserts branch from 8402d58 to 14fbd0d on June 30, 2026 04:34

When an INSERT produces no data files, IcebergCommitExec previously returned an empty RecordBatch. DataFusion expects a single-row count result, so this change returns a batch with count=0 and skips creating a new snapshot.

Includes unit and integration tests for unpartitioned and partitioned tables.

u70b3 force-pushed the fix/datafusion-reject-non-append-inserts branch from 14fbd0d to 2ae10db on June 30, 2026 04:41

Development

Successfully merging this pull request may close these issues.

DataFusion empty INSERT returns empty batch and creates unnecessary snapshot

3 participants