GH-46179: [Python] Bump index level once if pandas df already contains __index_level_i__ column#46884
GH-46179: [Python] Bump index level once if pandas df already contains __index_level_i__ column#46884AlenkaF wants to merge 2 commits into
Conversation
|
|
c915159 to
3ee4599
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses GH-46179 in PyArrow’s pandas conversion by avoiding duplicate Arrow field names when a pandas DataFrame already contains __index_level_i__ columns, ensuring generated index columns use a non-conflicting name.
Changes:
- Update generated index column naming to pick the next available
__index_level_{j}__name if the default collides with existing columns. - Ensure uniqueness across both DataFrame columns and previously generated index columns when multiple index levels are serialized.
- Add regression tests for single-index and MultiIndex cases where
__index_level_0__already exists as a DataFrame column.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/pyarrow/pandas_compat.py | Adjusts index-level name generation to avoid collisions with existing column names and previously assigned index column names. |
| python/pyarrow/tests/test_pandas.py | Updates existing metadata assertion and adds new regression tests validating the bumped index column names. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| j = i | ||
| while f'__index_level_{j:d}__' in column_names: | ||
| j += 1 | ||
| return f'__index_level_{j:d}__' |
There was a problem hiding this comment.
Isn't schema based conversion already buggy without this change when it comes to the index levels? It probably silently ignores the duplicated level 0 currently?
There was a problem hiding this comment.
OK, getting used to this :) Copilot can't answer. Well, I think the change suggested can be a possible follow-up if we see this would be needed. But I do not think it is in the scope of this PR.
|
@jorisvandenbossche what do you think of the proposed change in this PR? |
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?