fix(arrow): clarify error when Arrow field is missing field id #2655
Closed
glitchy wants to merge 1 commit into
When matching an Arrow record batch to an Iceberg schema by field id, ArrowArrayAccessor::field_partner failed with an opaque "Field id N not found in struct array" when the Arrow field carried no PARQUET:field_id metadata, pointing at the symptom rather than the cause. In id mode the error now names the field and notes the likely cause (missing PARQUET:field_id metadata), pointing at current_schema().as_ref().try_into() to preserve field ids; name mode gets a matching by-name message. Message-only change: the error fires in exactly the same cases as before. Closes apache#2654
Author: this is bleeding impl details--closing.
Which issue does this PR close?
Closes #2654
When matching an Arrow record batch to an Iceberg schema by field id, `ArrowArrayAccessor::field_partner` (`crates/iceberg/src/arrow/value.rs`) fails with an opaque `Field id N not found in struct array` when the incoming Arrow field carries no `PARQUET:field_id` metadata. That points at the symptom rather than the cause--a downstream consumer that hand-builds a record batch schema gets no hint that the fix is to derive it from the table schema.
This enriches the error at its source. In `FieldMatchMode::Id` the message now names the field and notes the likely cause (a missing `PARQUET:field_id`), pointing at `current_schema().as_ref().try_into()` to preserve field ids. `FieldMatchMode::Name` gets a matching by-name message.
It is a message-only change--the error fires in exactly the same cases as before, so there is no behavior change and nothing new to reject.
Reported by @malon64 while testing #2185 from a downstream Rust ingestion tool.
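For readers unfamiliar with the failure mode, here is a rough, std-only sketch of the idea. It is not the code in `value.rs`: `ArrowField`, `MatchMode`, and `find_partner` are hypothetical stand-ins, and the error wording is illustrative rather than the exact message from this PR. The only detail taken from the description is the `PARQUET:field_id` metadata key and the fact that the error is returned in the same cases as before, with only the text changed.

```rust
use std::collections::HashMap;

/// Metadata key under which field ids travel on Arrow fields.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone)]
struct ArrowField {
    name: String,
    metadata: HashMap<String, String>,
}

impl ArrowField {
    /// Build a field, attaching the field-id metadata only when an id is given.
    fn new(name: &str, field_id: Option<i32>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(id) = field_id {
            metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
        }
        Self { name: name.to_string(), metadata }
    }

    /// Read the field id back out of the metadata, if present and numeric.
    fn field_id(&self) -> Option<i32> {
        self.metadata.get(FIELD_ID_KEY)?.parse().ok()
    }
}

/// Hypothetical mirror of the two match modes named in the description.
#[derive(Debug, Clone, Copy)]
enum MatchMode {
    Id,
    Name,
}

/// Locate the Arrow field that partners with the requested schema field.
/// The lookup fails in the same cases regardless of the message; only the
/// error text tries to point at the likely cause.
fn find_partner(
    fields: &[ArrowField],
    mode: MatchMode,
    want_id: i32,
    want_name: &str,
) -> Result<usize, String> {
    let pos = match mode {
        MatchMode::Id => fields.iter().position(|f| f.field_id() == Some(want_id)),
        MatchMode::Name => fields.iter().position(|f| f.name == want_name),
    };
    pos.ok_or_else(|| match mode {
        MatchMode::Id => format!(
            "Field '{}' (id {}) not found in struct array; the Arrow field is likely missing `{}` metadata",
            want_name, want_id, FIELD_ID_KEY
        ),
        MatchMode::Name => format!("Field '{}' not found by name in struct array", want_name),
    })
}
```

In this model, a hand-built field created with `ArrowField::new("a", None)` cannot be matched in `MatchMode::Id` even though its name is right, and the error now says why; the same field still matches in `MatchMode::Name`.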
Are these changes tested?
New unit test `test_field_partner_missing_field_id_error_is_actionable` in `arrow::value::test`: matching a field by id against a struct array whose Arrow field lacks `PARQUET:field_id` returns a `DataInvalid` error that names the missing metadata key. Existing `value.rs` accessor tests continue to pass.