GH-46179: [Python] Bump index level once if pandas df already contains __index_level_i__ column by AlenkaF · Pull Request #46884 · apache/arrow

AlenkaF · 2025-06-23T14:27:24Z

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

GitHub Issue: [Python] Table.from_pandas creates duplicate column names if the dataframe already contains __index_level_i__ columns #46179
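A pure-Python sketch of how the collision arises. It assumes, as the issue title describes, that an index level without a usable name falls back to the positional name `__index_level_{i}__` and never checks whether a data column already uses that name. `naive_index_level_name` and `serialized_names` are hypothetical helpers for illustration, not pyarrow functions.

```python
def naive_index_level_name(index_name, i, column_names):
    # Hypothetical pre-fix behaviour: use the index's own name if it is set and
    # free, otherwise fall back to the positional name WITHOUT checking whether
    # a data column already uses it.
    if index_name is not None and index_name not in column_names:
        return index_name
    return f'__index_level_{i:d}__'


def serialized_names(column_names, index_names):
    """Field names a conversion would emit: data columns, then index levels."""
    generated = [naive_index_level_name(name, i, column_names)
                 for i, name in enumerate(index_names)]
    return list(column_names) + generated


# Data columns already include '__index_level_0__'; the single index level
# is unnamed, so it falls back to the same positional name.
names = serialized_names(['a', '__index_level_0__'], [None])
print(names)  # ['a', '__index_level_0__', '__index_level_0__']
print(len(names) != len(set(names)))  # True: duplicate field name
```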

github-actions · 2025-06-23T14:27:50Z

⚠️ GitHub issue #46179 has been automatically assigned in GitHub to PR creator.

Copilot

Pull request overview

This PR addresses GH-46179 in PyArrow’s pandas conversion by avoiding duplicate Arrow field names when a pandas DataFrame already contains __index_level_i__ columns, ensuring generated index columns use a non-conflicting name.

Changes:

Update generated index column naming to pick the next available __index_level_{j}__ name if the default collides with existing columns.
Ensure uniqueness across both DataFrame columns and previously generated index columns when multiple index levels are serialized.
Add regression tests for single-index and MultiIndex cases where __index_level_0__ already exists as a DataFrame column.
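A minimal sketch of the approach described above, assuming index level names are generated one level at a time and checked against a set of names already in use. `index_level_name` and `index_column_names` are hypothetical names; the commit list mentions an update to pyarrow's private `_index_level_name`, but this is not the merged code. The key detail is that each generated name is added to the taken set, so a later level cannot reuse a name an earlier level just claimed.

```python
def index_level_name(index_name, i, taken):
    # Prefer the index's own name when it is set and not already in use.
    if index_name is not None and index_name not in taken:
        return index_name
    # Otherwise start at the positional default and bump j until the
    # generated name is free (same loop as the one quoted in the review).
    j = i
    while f'__index_level_{j:d}__' in taken:
        j += 1
    return f'__index_level_{j:d}__'


def index_column_names(column_names, index_names):
    """Generate one unique field name per index level (hypothetical helper)."""
    taken = set(column_names)
    result = []
    for i, name in enumerate(index_names):
        new_name = index_level_name(name, i, taken)
        result.append(new_name)
        # Reserve the name so later index levels cannot reuse it.
        taken.add(new_name)
    return result


single = index_column_names(['a', '__index_level_0__'], [None])
multi = index_column_names(['a', '__index_level_0__'], [None, None])
print(single)  # ['__index_level_1__']
print(multi)   # ['__index_level_1__', '__index_level_2__']
```

With `__index_level_0__` already taken by a data column, a single unnamed level becomes `__index_level_1__`, and a two-level MultiIndex becomes `__index_level_1__` and `__index_level_2__`. Without the `taken.add` step, the second level would also get `__index_level_1__`.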

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
python/pyarrow/pandas_compat.py	Adjusts index-level name generation to avoid collisions with existing column names and previously assigned index column names.
python/pyarrow/tests/test_pandas.py	Updates existing metadata assertion and adds new regression tests validating the bumped index column names.


AlenkaF · 2026-05-26T09:13:46Z

+        j = i
+        while f'__index_level_{j:d}__' in column_names:
+            j += 1
+        return f'__index_level_{j:d}__'


Isn't schema-based conversion already buggy without this change when it comes to index levels? It probably silently ignores the duplicated level 0 at the moment?

OK, getting used to this :) Copilot can't answer. I think the suggested change could be a follow-up if it turns out to be needed, but I don't think it is in scope for this PR.
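The concern in the question above can be illustrated generically. Assuming (not verified against pyarrow) that some conversion step resolves fields through a name-keyed mapping, a duplicated name leaves only one entry in that mapping, so one of the two `__index_level_0__` fields becomes unreachable without any error. `fields_by_name` is a hypothetical helper.

```python
def fields_by_name(field_names):
    # A name-keyed mapping keeps only the last position seen for each name,
    # so a duplicated field silently disappears from the lookup.
    return {name: pos for pos, name in enumerate(field_names)}


names = ['a', '__index_level_0__', '__index_level_0__']
lookup = fields_by_name(names)
print(len(names), len(lookup))  # 3 2
print(lookup['__index_level_0__'])  # 2: the field at position 1 is unreachable
```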

AlenkaF · 2026-05-26T09:28:55Z

@jorisvandenbossche what do you think of the proposed change in this PR?

github-actions Bot added the Component: Python and awaiting review labels Jun 23, 2025

AlenkaF mentioned this pull request Jun 23, 2025

[Python] Table.from_pandas creates duplicate column names if the dataframe already contains __index_level_i__ columns #46179

Open

AlenkaF changed the title ~~GH-46179: Bump index level once if pandas df already contains __index_level_i__ column~~ GH-46179: [Python] Bump index level once if pandas df already contains __index_level_i__ column Jun 23, 2025

AlenkaF added 2 commits May 25, 2026 13:32

Update _index_level_name

013df0c

Change the approach, add tests

3ee4599

AlenkaF force-pushed the gh-46179-duplicates-index-levels branch from c915159 to 3ee4599 May 25, 2026 14:34

Copilot AI review requested due to automatic review settings May 25, 2026 14:34

Copilot started reviewing on behalf of AlenkaF May 25, 2026 14:34

Copilot AI reviewed May 25, 2026

github-actions Bot added the awaiting committer review label and removed the awaiting review label May 26, 2026

AlenkaF marked this pull request as ready for review May 26, 2026 09:28

AlenkaF requested review from raulcd and rok as code owners May 26, 2026 09:28

AlenkaF wants to merge 2 commits into apache:main from AlenkaF:gh-46179-duplicates-index-levels

2 participants
