refactor: add take_n_true method to BooleanArray#9823
Conversation
scovich
left a comment
There was a problem hiding this comment.
Thanks for looking into this.
Two questions:
- Do we expect this new
take_n_truemethod to have more use cases beyond the rather specific one demonstrated in this PR? (trying to assess the value of a new pub method) - Can we simplify the read plan logic even further, now that the new method handles nulls correctly (vs. old code did not)?
| /// | ||
| /// `filter` must not contain nulls (callers apply [`prep_null_mask_filter`] | ||
| /// first). If `filter` has at most `n` `true` values, a clone is returned. | ||
| fn truncate_filter_after_n_trues(filter: BooleanArray, n: usize) -> BooleanArray { |
There was a problem hiding this comment.
Do callers still (need to) apply this prep_null_mask_filter function, when the new code handles nulls correctly?
There was a problem hiding this comment.
it still need, we use the filter in the RowSelection::from_filters, and it need assert_eq!(filter.null_count(), 0);
There was a problem hiding this comment.
Ah, that function converts NULL to false, following SQL semantics that a filter only keeps rows for which the predicate is true?
Meanwhile, maybe it's worth adding a code comment to the prep_null_mask_filter call site in ReaderBuilder::with_predicate_options, explaining that RowSelection::from_filters can't handle NULL values (panic). Because nothing else in the method cares -- take_n_true correctly handles/preserves NULL values.
There was a problem hiding this comment.
Ah, that function converts NULL to false, following SQL semantics that a filter only keeps rows for which the predicate is true?
yes
added comment in 20b3d4b
alamb
left a comment
There was a problem hiding this comment.
Thanks @haohuaijin -- I think @scovich has some nice comments and suggestions
|
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look |
scovich
left a comment
There was a problem hiding this comment.
LGTM. One code comment suggestion to consider before merge.
| /// | ||
| /// `filter` must not contain nulls (callers apply [`prep_null_mask_filter`] | ||
| /// first). If `filter` has at most `n` `true` values, a clone is returned. | ||
| fn truncate_filter_after_n_trues(filter: BooleanArray, n: usize) -> BooleanArray { |
There was a problem hiding this comment.
Ah, that function converts NULL to false, following SQL semantics that a filter only keeps rows for which the predicate is true?
Meanwhile, maybe it's worth adding a code comment to the prep_null_mask_filter call site in ReaderBuilder::with_predicate_options, explaining that RowSelection::from_filters can't handle NULL values (panic). Because nothing else in the method cares -- take_n_true correctly handles/preserves NULL values.
|
thanks again @scovich |
alamb
left a comment
There was a problem hiding this comment.
Thanks @haohuaijin and @scovich
| /// assert_eq!(r, BooleanArray::from(vec![true, false, true, false, false, false])); | ||
| /// ``` | ||
| pub fn take_n_true(self, n: usize) -> BooleanArray { | ||
| let len = self.len(); |
There was a problem hiding this comment.
Since this takes self by value, we could potentially use Buffer.into_mutable to reuse the allocation if possible
Which issue does this PR close?
BooleanArray::take_n_true#9803Rationale for this change
BooleanArray::take_n_true#9803What changes are included in this PR?
move
truncate_filter_after_n_truesin read_plan.rs to BooleanArraytake_n_truemethodAre these changes tested?
yes, already have test case
Are there any user-facing changes?