feat(transaction): add update_column_doc to UpdateSchemaAction#2743

Open
viirya wants to merge 1 commit into apache:main from viirya:feat/update-column-doc

viirya commented Jun 29, 2026

Member

Which issue does this PR close?

Closes apache#2742.

What changes are included in this PR?

UpdateSchemaAction only supported add_column / delete_column, so changing a column's doc meant delete + re-add — which assigns a fresh field ID. Because Iceberg resolves columns by field ID, that breaks reads of older data files (the column projects as all-NULL) and severs schema-evolution history. Editing a comment shouldn't do that.
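
To make the failure mode concrete, here is a minimal sketch of ID-based projection. It is a toy model, not iceberg-rust code: `DataFile`, `project`, and `old_file` are hypothetical names, and the field IDs 3 and 4 are made up for illustration.

```rust
use std::collections::HashMap;

/// Toy data file: values are stored per field ID, not per column name.
struct DataFile {
    columns: HashMap<i32, Vec<Option<i64>>>,
    row_count: usize,
}

/// Project one column out of a data file by its field ID.
/// A field ID the file has never seen reads back as all-NULL.
fn project(file: &DataFile, field_id: i32) -> Vec<Option<i64>> {
    match file.columns.get(&field_id) {
        Some(values) => values.clone(),
        None => vec![None; file.row_count],
    }
}

/// Hypothetical file written while column "z" still had field ID 3.
fn old_file() -> DataFile {
    let mut columns = HashMap::new();
    columns.insert(3, vec![Some(10), Some(20)]);
    DataFile { columns, row_count: 2 }
}
```

Under these assumptions, `project(&old_file(), 3)` returns the stored values, while `project(&old_file(), 4)` (the ID a re-added "z" would receive) returns only `None`s, even though the column name never changed.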

This adds update_column_doc, modeled after iceberg-java's UpdateSchema.updateColumnDoc(String name, String newDoc):

```rust
tx.update_schema()
    .update_column_doc("z", Some("the z column".to_string()))         // set/replace
    .update_column_doc("person.name", Some("full name".to_string()))  // nested, dotted path
    .update_column_doc("legacy", None);                               // clear
```
  • Some(doc) sets/replaces; None clears — the Option<String> equivalent of Java's String + null.
  • The field ID and every other attribute are preserved.
  • Nested fields use the dotted path form (e.g. "address.city"), consistent with delete_column. They work without special handling: doc updates are keyed by field ID (resolved from the name via field_by_name), and the existing schema rebuild already recurses into structs/lists/maps.
  • The column must exist at commit time, else the commit fails with PreconditionFailed. When the same column is updated more than once, the last update wins.

Implementation is confined to transaction/update_schema.rs: a doc_updates field, the builder method, name→field-id resolution in commit (same pattern as delete validation), and a small resolved_doc helper applied in rebuild_field.
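
For readers without the diff at hand, the shape of that change can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: `Field`, `Type`, `field_id_by_name`, and `resolve_doc_updates` are hypothetical, the error is a plain `String` rather than the crate's `PreconditionFailed` error, and only structs are modeled (no lists or maps). `resolved_doc` and `rebuild_field` borrow the names used above, but their bodies here are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Primitive,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
    doc: Option<String>,
    ty: Type,
}

/// Resolve a dotted path such as "person.name" to a field ID (toy lookup).
fn field_id_by_name(fields: &[Field], path: &str) -> Option<i32> {
    let (head, rest) = match path.split_once('.') {
        Some((h, r)) => (h, Some(r)),
        None => (path, None),
    };
    let field = fields.iter().find(|f| f.name == head)?;
    match (rest, &field.ty) {
        (None, _) => Some(field.id),
        (Some(r), Type::Struct(children)) => field_id_by_name(children, r),
        (Some(_), Type::Primitive) => None,
    }
}

/// Turn queued (name, doc) updates into a map keyed by field ID.
/// Later entries overwrite earlier ones, so the last update wins.
fn resolve_doc_updates(
    fields: &[Field],
    updates: &[(String, Option<String>)],
) -> Result<HashMap<i32, Option<String>>, String> {
    let mut by_id = HashMap::new();
    for (name, doc) in updates {
        let id = field_id_by_name(fields, name)
            .ok_or_else(|| format!("cannot update doc: column `{name}` not found"))?;
        by_id.insert(id, doc.clone());
    }
    Ok(by_id)
}

/// Doc for a rebuilt field: the queued update if there is one, else the existing doc.
fn resolved_doc(field: &Field, by_id: &HashMap<i32, Option<String>>) -> Option<String> {
    match by_id.get(&field.id) {
        Some(new_doc) => new_doc.clone(),
        None => field.doc.clone(),
    }
}

/// Rebuild a field tree, changing only docs; IDs, names, and types are copied.
fn rebuild_field(field: &Field, by_id: &HashMap<i32, Option<String>>) -> Field {
    let ty = match &field.ty {
        Type::Primitive => Type::Primitive,
        Type::Struct(children) => {
            Type::Struct(children.iter().map(|c| rebuild_field(c, by_id)).collect())
        }
    };
    Field { id: field.id, name: field.name.clone(), doc: resolved_doc(field, by_id), ty }
}
```

Keying the map by field ID is what lets nested fields work without extra handling in this sketch: once the name is resolved, the recursive rebuild only compares IDs and never needs to know how deep a field sits.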

Are these changes tested?

Four new tests in transaction::update_schema:

  • test_update_column_doc_preserves_field_id: sets a doc on a root column and asserts that the doc changed, the field ID is unchanged, and sibling columns are untouched.
  • test_update_column_doc_on_nested_field: sets a doc on person.name (nested) and asserts the new doc plus the IDs of the field and its enclosing struct.
  • test_update_column_doc_clears_doc: applies Some(...) then None to the same column; the last update wins, so the doc is cleared.
  • test_update_column_doc_missing_column_errors: updating a missing column fails with PreconditionFailed.

The first test fails if the doc-resolution logic is reverted (verified). Full iceberg lib suite (1375 tests) passes; clippy and rustfmt clean.

Changing a column's doc previously required deleting and re-adding it, which
assigns a fresh field ID. Since Iceberg identifies columns by field ID, that
breaks reads of older data files (the column projects as all-NULL) and severs
schema-evolution history.

Add `UpdateSchemaAction::update_column_doc(name, Option<String>)`, mirroring
iceberg-java's `UpdateSchema.updateColumnDoc`: it updates only the doc and
preserves the field ID and all other attributes. `Some(doc)` sets/replaces the
doc, `None` clears it (the Option equivalent of Java's String + null). Nested
fields are addressed by dotted path, consistent with `delete_column`, and are
supported naturally since the schema rebuild already recurses.

Closes apache#2742

viirya commented Jun 30, 2026

Member Author

The CI failure is unrelated to this change; a Go module download failed with a stream error:

Error: /home/runner/go/pkg/mod/github.com/go-git/go-git/v5@v5.13.0/plumbing/hash/hash.go:10:2: github.com/pjbgf/sha1cd@v0.3.0: read "https://proxy.golang.org/github.com/pjbgf/sha1cd/@v/v0.3.0.zip": stream error: stream ID 123; INTERNAL_ERROR; received from peer

Development

Successfully merging this pull request may close these issues.

Add update_column_doc to UpdateSchemaAction (changing a column doc shouldn't reassign its field ID)
