
fix: register dimensions without coordinates as columns #216

Merged
alxmrs merged 2 commits into xqlsystems:main from ghostiee-11:fix/203-coordless-dimension-columns on Jul 2, 2026

Conversation

@ghostiee-11

Contributor

Dimensions without coordinates (e.g. an image's channel/height/width axes) were dropped from the SQL schema, so SELECT * returned too few columns (2 instead of 5 in the #203 repro). This materialises a default integer index for each coordinate-less dimension at the reader boundary, so they show up as columns and keep their absolute position through chunked reads. Adds regression tests. Closes #203.

Before / after (issue repro)

Before (main), 2 columns:

 sample  images
      1     9.0
      1    10.0
      1    11.0
columns: ['sample', 'images']

After (this PR), 5 columns, values verified against xarray:

 sample  channel  height  width  images
      4        0       0      0    36.0
      4        0       0      1    37.0
      4        0       0      2    38.0
columns: ['sample', 'channel', 'height', 'width', 'images']
values equal xarray to_dataframe(): True

Full non-integration test suite passes locally.

…stems#203)

A Dataset can have "dimensions without coordinates" (e.g. an image's
channel/height/width axes). These are absent from ds.coords, so _parse_schema
dropped them from the SQL schema and SELECT * returned too few columns. Even if
they were added, slicing a block with isel synthesizes their index relative to
the block (restarting at 0 per partition), which would corrupt values on chunked
reads.
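
To make the failure mode concrete, here is a minimal sketch using plain xarray (not the project's reader code). The toy dataset, its sizes, and the variable names are assumptions chosen for illustration; only `sample` is given a coordinate, mirroring the layout described above.

```python
import numpy as np
import xarray as xr

# Toy stand-in for an image dataset: only "sample" has a coordinate.
ds = xr.Dataset(
    {"images": (("sample", "height"), np.arange(8.0).reshape(2, 4))},
    coords={"sample": ("sample", np.arange(2, dtype="int64"))},
)

# "height" is a dimension, but it is absent from ds.coords.
coordless = [dim for dim in ds.sizes if dim not in ds.coords]

# Slice the second half of "height", as a chunked reader might per block.
block = ds.isel(height=slice(2, 4))

# With no coordinate to carry positions, the index is synthesized
# relative to the block and restarts at 0.
block_heights = block.to_dataframe().reset_index()["height"].tolist()

print(coordless)      # ['height']
print(block_heights)  # [0, 1, 0, 1], although the block covers positions 2 and 3
```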

Materialise a default integer index coordinate for every coordinate-less
dimension once, at the reader entry points (read_xarray_table and
XarrayRecordBatchReader), via a new ensure_default_indexes() helper. This turns
them into ordinary dimension coordinates so the existing schema, record-batch,
filter-pushdown and round-trip machinery handles them correctly and they carry
their absolute position through chunked reads.
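
As a rough illustration of the idea, a helper along these lines could materialise the default indexes. This is a sketch under assumptions, not the merged implementation: the name `ensure_default_indexes_sketch` and the toy dataset are hypothetical, and the real helper may differ in dtype handling or edge cases.

```python
import numpy as np
import xarray as xr


def ensure_default_indexes_sketch(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical stand-in: add an arange index to each coordinate-less dim."""
    missing = {
        dim: np.arange(size, dtype="int64")
        for dim, size in ds.sizes.items()
        if dim not in ds.coords
    }
    # Datasets whose dimensions all have coordinates pass through unchanged.
    return ds.assign_coords(missing) if missing else ds


# Toy dataset (an assumption for illustration): "height" has no coordinate.
ds = xr.Dataset(
    {"images": (("sample", "height"), np.arange(8.0).reshape(2, 4))},
    coords={"sample": ("sample", np.arange(2, dtype="int64"))},
)

fixed = ensure_default_indexes_sketch(ds)

# The same block slice now keeps absolute positions along "height".
block = fixed.isel(height=slice(2, 4))
heights = block.to_dataframe().reset_index()["height"].tolist()
print(heights)  # [2, 3, 2, 3]
```

Doing this once before any slicing is what lets each block carry its absolute position; applying it per block would reproduce the restart-at-0 problem.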

Adds regression tests for the schema, both record-batch builders (chunked along
a coordinate-less dim), and the from_dataset query path.
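
A regression check in this spirit can be sketched with plain xarray and pandas. This is not the PR's actual test code; the dataset shape and block boundaries are assumptions. It compares rows assembled block by block along a coordinate-less dimension against a single `to_dataframe()` call.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset with no coordinates at all (an assumption for illustration).
ds = xr.Dataset({"images": (("sample", "height"), np.arange(12.0).reshape(3, 4))})

# Materialise default integer indexes once, before any slicing.
indexed = ds.assign_coords(
    {dim: np.arange(size) for dim, size in ds.sizes.items() if dim not in ds.coords}
)

expected = indexed.to_dataframe().reset_index()

# Read in two blocks along "height", then stitch the rows back together.
blocks = [indexed.isel(height=slice(start, start + 2)) for start in (0, 2)]
got = pd.concat([b.to_dataframe().reset_index() for b in blocks])
got = got.sort_values(["sample", "height"]).reset_index(drop=True)

# Raises AssertionError if the chunked read lost or shifted any position.
pd.testing.assert_frame_equal(got, expected)
```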
ghostiee-11 changed the title from "fix: register dimensions without coordinates as columns (closes #203)" to "fix: register dimensions without coordinates as columns" on Jul 2, 2026

alxmrs left a comment

Collaborator

Looking good so far. Some notes. Thanks for the quick fix.

Comment thread tests/test_sql.py Outdated

@pytest.fixture
def coordless_dims_ds(self):
    """Mirror the fashion-mnist layout from issue #203: a dimension

Collaborator

Let's not refer to the GH issues in code unless there is a TODO.

Contributor Author

Yeah, true. Will take care of this.

Comment thread tests/test_df.py Outdated


def _coordless_ds():
    """A 2-D dataset whose dimensions have NO coordinates (issue #203)."""

Collaborator

Let's omit the reference to the GH issue.

Collaborator

Same with the below tests.

Comment thread tests/test_sql.py
coords={"sample": ("sample", np.arange(n_sample, dtype="int64"))},
).chunk({"sample": 1})

def test_coordless_dims_appear_as_columns(self, coordless_dims_ds):

Collaborator

I like this test.

Comment thread xarray_sql/df.py Outdated
block* (restarting at 0 in every partition). Materialising an explicit
``arange`` index up front turns them into ordinary dimension coordinates, so
they appear as columns and carry their absolute position through chunked
reads (issue #203). Datasets whose dimensions already have coordinates are

Collaborator

Let's omit the gh issue ref.

Comment thread xarray_sql/df.py Outdated
Only *dimension coordinates* become dimension columns, so a dimension
without a coordinate would be dropped. Callers must run the Dataset through
:func:`ensure_default_indexes` first (the readers do) so every dimension has
a coordinate and appears as a column (issue #203).

Collaborator

This is a good contract to document. Though, let's omit the issue ref.

Comment thread xarray_sql/reader.py Outdated
each block dict just before it's converted to Arrow. This
allows tests to track when iteration actually occurs.
"""
ds = ensure_default_indexes(ds)

Collaborator

I think putting this in read_xarray below is better than the constructor.

Comment thread tests/test_df.py Outdated
assert batch.num_rows == expected_rows


def _coordless_ds():

Collaborator

I think we can omit the tests in this file altogether -- the public contract is covered by the sql tests file. Wdyt?

We don't mark it as such, but I see this new utility as a private function.

Privatise the index helper (_ensure_default_indexes), move its call from the
XarrayRecordBatchReader constructor to read_xarray, drop the df-level tests
(the public contract is covered by the sql tests), and remove the GitHub issue
references from code and test docstrings.

alxmrs left a comment

Collaborator

LGTM! Thanks for the fix!

alxmrs merged commit d8ffa52 into xqlsystems:main on Jul 2, 2026
12 checks passed

Development

Successfully merging this pull request may close these issues.

Table that should have N columns only has 2

2 participants