fix(arrow): clarify error when Arrow field is missing field id #2655

Closed
glitchy wants to merge 1 commit into apache:main from glitchy:fix/parquet-writer-field-id-validation
glitchy commented Jun 16, 2026

Which issue does this PR close?

Closes #2654

When matching an Arrow record batch to an Iceberg schema by field id, ArrowArrayAccessor::field_partner (crates/iceberg/src/arrow/value.rs) fails with an opaque "Field id N not found in struct array" error when the incoming Arrow field carries no PARQUET:field_id metadata. That points at the symptom rather than the cause: a downstream consumer that hand-builds a record batch schema gets no hint that the fix is to derive it from the table schema.

This PR enriches the error at its source. In FieldMatchMode::Id the message now names the field, notes the likely cause (a missing PARQUET:field_id), and points to current_schema().as_ref().try_into() as the way to preserve field ids. FieldMatchMode::Name gets an equivalent by-name message.

It is a message-only change: the error fires in exactly the same cases as before, so there is no behavior change and nothing new is rejected.
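
For illustration only, here is a minimal std-only sketch of the idea behind the enriched message. It is not the code in value.rs: the Field struct, find_by_id, and the exact error wording are hypothetical stand-ins, and only the PARQUET:field_id key comes from the description above.

```rust
use std::collections::HashMap;

// Metadata key named in the PR description.
const FIELD_ID_KEY: &str = "PARQUET:field_id";

// Hypothetical stand-in for an Arrow field: a name plus string metadata.
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    // Builds a field, attaching the field id metadata only when one is given.
    fn new(name: &str, field_id: Option<i32>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(id) = field_id {
            metadata.insert(FIELD_ID_KEY.to_string(), id.to_string());
        }
        Field { name: name.to_string(), metadata }
    }

    // Reads the field id from metadata; None if absent or not an integer.
    fn field_id(&self) -> Option<i32> {
        self.metadata.get(FIELD_ID_KEY)?.parse().ok()
    }
}

// Hypothetical by-id lookup. It fails in the same cases as a bare
// "not found" lookup would; only the message carries more context.
fn find_by_id(fields: &[Field], wanted_name: &str, wanted_id: i32) -> Result<usize, String> {
    if let Some(pos) = fields.iter().position(|f| f.field_id() == Some(wanted_id)) {
        return Ok(pos);
    }
    // Collect the names of fields that carry no id, to hint at the likely cause.
    let untagged: Vec<&str> = fields
        .iter()
        .filter(|f| f.field_id().is_none())
        .map(|f| f.name.as_str())
        .collect();
    Err(format!(
        "Field '{}' (id {}) not found in struct array; Arrow fields without `{}` metadata: {:?}. \
         Derive the Arrow schema from the table schema to preserve field ids.",
        wanted_name, wanted_id, FIELD_ID_KEY, untagged
    ))
}

fn main() {
    // "name" is hand-built without field id metadata, as in the reported case.
    let fields = vec![Field::new("id", Some(1)), Field::new("name", None)];
    println!("{:?}", find_by_id(&fields, "id", 1));
    println!("{:?}", find_by_id(&fields, "name", 2));
}
```

The lookup itself is unchanged in this sketch; the extra work happens only on the failure path, which mirrors the message-only nature of the change.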

Reported by @malon64 while testing #2185 from a downstream Rust ingestion tool.

Are these changes tested?

New unit test test_field_partner_missing_field_id_error_is_actionable in arrow::value::test: matching a field by id against a struct array whose Arrow field lacks PARQUET:field_id returns a DataInvalid error that names the missing metadata key. Existing value.rs accessor tests continue to pass.

glitchy force-pushed the fix/parquet-writer-field-id-validation branch 2 times, most recently from 57e910a to 814a93b, Jun 16, 2026 03:54
glitchy changed the title from "fix(writer): fail fast when record batch schema is missing field ids" to "fix(writer): validate field ids in record batch schema", Jun 16, 2026
glitchy force-pushed the fix/parquet-writer-field-id-validation branch from 814a93b to eb26f8f, Jun 16, 2026 03:58
When matching an Arrow record batch to an Iceberg schema by field id,
ArrowArrayAccessor::field_partner failed with an opaque "Field id N not found
in struct array" when the Arrow field carried no PARQUET:field_id metadata,
pointing at the symptom rather than the cause.

In id mode the error now names the field and notes the likely cause (missing
PARQUET:field_id metadata), pointing at current_schema().as_ref().try_into() to
preserve field ids; name mode gets a matching by-name message. Message-only
change: the error fires in exactly the same cases as before.

Closes apache#2654
glitchy changed the title from "fix(writer): validate field ids in record batch schema" to "fix(arrow): clarify error when an Arrow field is missing its field id", Jun 16, 2026
glitchy force-pushed the fix/parquet-writer-field-id-validation branch from eb26f8f to 9d8a82b, Jun 16, 2026 04:27
glitchy changed the title from "fix(arrow): clarify error when an Arrow field is missing its field id" to "fix(arrow): clarify error when Arrow field is missing its field id", Jun 16, 2026
glitchy changed the title from "fix(arrow): clarify error when Arrow field is missing its field id" to "fix(arrow): clarify error when Arrow field is missing field id", Jun 16, 2026
glitchy commented Jun 16, 2026

this is bleeding impl details--closing.

glitchy closed this Jun 16, 2026
Development

Successfully merging this pull request may close these issues.

ParquetWriter: opaque error when record batch schema is missing field ids