refactor: add take_n_true method to BooleanArray#9823

Merged
scovich merged 6 commits into apache:main from haohuaijin:add-take-n-true on Jul 1, 2026

@haohuaijin

Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Move truncate_filter_after_n_trues from read_plan.rs into a new take_n_true method on BooleanArray.
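
To make the intended behavior concrete, here is a minimal sketch of the semantics as they can be inferred from this thread: the first `n` `true` values are kept, every later `true` becomes `false`, and the length is unchanged. It is not the code from this PR. It models the array as a plain `Vec<Option<bool>>` (with `None` standing in for NULL), the name `take_n_true_sketch` is hypothetical, and leaving NULL slots untouched is an assumption based on the review remark that the method preserves NULL values.

```rust
/// Hypothetical stand-in for the method added in this PR, modeled on
/// `Vec<Option<bool>>` rather than arrow's `BooleanArray`.
/// `None` plays the role of a NULL slot.
fn take_n_true_sketch(filter: Vec<Option<bool>>, n: usize) -> Vec<Option<bool>> {
    // Number of `true` values kept so far.
    let mut seen = 0usize;
    filter
        .into_iter()
        .map(|v| match v {
            // Keep a `true` while fewer than `n` have been kept.
            Some(true) if seen < n => {
                seen += 1;
                Some(true)
            }
            // Every `true` past the first `n` is cleared to `false`.
            Some(true) => Some(false),
            // Assumption: `false` and NULL slots pass through unchanged.
            other => other,
        })
        .collect()
}
```

Under these assumptions, calling the sketch with `n = 2` on `[true, false, true, true, NULL, true]` yields `[true, false, true, false, NULL, false]`, and an input with at most `n` `true` values comes back unchanged.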

Are these changes tested?

Yes, existing test cases already cover this.

Are there any user-facing changes?

@github-actions (bot) added the labels parquet (Changes to the parquet crate) and arrow (Changes to the arrow crate) on Apr 25, 2026

@scovich scovich left a comment

Contributor

Thanks for looking into this.

Two questions:

  1. Do we expect this new take_n_true method to have more use cases beyond the rather specific one demonstrated in this PR? (trying to assess the value of a new pub method)
  2. Can we simplify the read plan logic even further, now that the new method handles nulls correctly (vs. old code did not)?

Comment thread arrow-array/src/array/boolean_array.rs Outdated
Comment thread arrow-array/src/array/boolean_array.rs Outdated
///
/// `filter` must not contain nulls (callers apply [`prep_null_mask_filter`]
/// first). If `filter` has at most `n` `true` values, a clone is returned.
fn truncate_filter_after_n_trues(filter: BooleanArray, n: usize) -> BooleanArray {

Contributor

Do callers still (need to) apply this prep_null_mask_filter function, when the new code handles nulls correctly?

@haohuaijin haohuaijin Jul 1, 2026

Contributor Author

It is still needed: we pass the filter to RowSelection::from_filters, which requires assert_eq!(filter.null_count(), 0); to hold.

Contributor

Ah, that function converts NULL to false, following SQL semantics that a filter only keeps rows for which the predicate is true?

Meanwhile, maybe it's worth adding a code comment to the prep_null_mask_filter call site in ReaderBuilder::with_predicate_options, explaining that RowSelection::from_filters can't handle NULL values (panic). Because nothing else in the method cares -- take_n_true correctly handles/preserves NULL values.
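
The interaction described above can be illustrated with a small sketch. It is not the parquet crate's code: `null_to_false` and `selected_rows` are hypothetical names, the filter is modeled as a slice of `Option<bool>` with `None` standing in for NULL, and the assertion only mimics the null-count requirement quoted earlier in this thread.

```rust
/// Hypothetical model of the NULL-to-false step: following SQL filter
/// semantics, a row is kept only when the predicate is `true`, so NULL
/// (`None`) is treated the same as `false`.
fn null_to_false(filter: &[Option<bool>]) -> Vec<bool> {
    filter.iter().map(|&v| v.unwrap_or(false)).collect()
}

/// Hypothetical consumer that, like the code path discussed in this thread,
/// refuses to run on a filter that still contains NULLs.
fn selected_rows(filter: &[Option<bool>]) -> Vec<usize> {
    let null_count = filter.iter().filter(|v| v.is_none()).count();
    assert_eq!(null_count, 0, "filter must not contain nulls");
    filter
        .iter()
        .enumerate()
        .filter_map(|(i, v)| if *v == Some(true) { Some(i) } else { None })
        .collect()
}
```

In this model, calling `selected_rows` directly on a filter with a NULL slot panics, while running the filter through `null_to_false` first makes the call safe. That is the reason a comment at the call site is useful: the conversion exists for the consumer's sake, not for the truncation step.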

Contributor Author

> Ah, that function converts NULL to false, following SQL semantics that a filter only keeps rows for which the predicate is true?

Yes.

Added a comment in 20b3d4b.

Comment thread parquet/src/arrow/arrow_reader/read_plan.rs Outdated

@alamb alamb left a comment

Contributor

Thanks @haohuaijin -- I think @scovich has some nice comments and suggestions

@haohuaijin

Contributor Author

Thanks @scovich @alamb, and sorry for the delay. I just returned from vacation and will address the comments soon.

@alamb alamb marked this pull request as draft May 6, 2026 15:01
@alamb

alamb commented May 6, 2026

Contributor

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@haohuaijin

Contributor Author

Thanks again for your reviews @scovich, and sorry for the delay. I addressed all comments in 7a37934; this is ready for review now.

@haohuaijin haohuaijin marked this pull request as ready for review July 1, 2026 01:51

@scovich scovich left a comment

Contributor

LGTM. One code comment suggestion to consider before merge.

@scovich scovich merged commit d8d3fa3 into apache:main Jul 1, 2026
32 checks passed
@haohuaijin

Contributor Author

thanks again @scovich

@haohuaijin haohuaijin deleted the add-take-n-true branch July 1, 2026 14:36

@alamb alamb left a comment

Contributor

Thanks @haohuaijin and @scovich

/// assert_eq!(r, BooleanArray::from(vec![true, false, true, false, false, false]));
/// ```
pub fn take_n_true(self, n: usize) -> BooleanArray {
let len = self.len();

Contributor

Since this takes self by value, we could potentially use Buffer::into_mutable to reuse the allocation when possible.

Contributor Author

Created an issue to track this: #10251
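
The allocation-reuse idea raised in this thread can be sketched with the standard library alone. This is an analogy, not arrow's API: it uses `Arc<Vec<u8>>` and `Arc::try_unwrap` in place of arrow's buffer types, and the helper names `into_owned_bytes` and `clear_from` are hypothetical. The point it illustrates is that taking the value by ownership lets the function mutate in place when nobody else holds the data, and fall back to a copy otherwise.

```rust
use std::sync::Arc;

/// Take the bytes out of a shared buffer. If this `Arc` is the only owner,
/// the existing allocation is reused; otherwise the bytes are copied.
fn into_owned_bytes(shared: Arc<Vec<u8>>) -> Vec<u8> {
    match Arc::try_unwrap(shared) {
        // Sole owner: move the Vec out without allocating.
        Ok(bytes) => bytes,
        // Still shared elsewhere: fall back to a copy.
        Err(still_shared) => (*still_shared).clone(),
    }
}

/// Hypothetical in-place edit: zero every byte from `start` onward,
/// reusing the input allocation when ownership allows it.
fn clear_from(shared: Arc<Vec<u8>>, start: usize) -> Vec<u8> {
    let mut bytes = into_owned_bytes(shared);
    for b in bytes.iter_mut().skip(start) {
        *b = 0;
    }
    bytes
}
```

When the buffer is uniquely owned, the returned `Vec` points at the same heap allocation as the input. When another handle exists, that handle still sees the original bytes after the call.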

Labels

arrow Changes to the arrow crate parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Add a new method BooleanArray::take_n_true

3 participants