feat(parquet): batch consecutive null/empty rows in write_list #9752
Conversation
Restructure `write_list()` to accumulate consecutive null and empty rows and flush them in a single `visit_leaves()` call using `extend(repeat_n(...))`, instead of calling `visit_leaves()` per row. With sparse data (99% nulls), a 4096-row batch previously triggered ~4000 individual tree traversals, each pushing a single value per leaf. Now consecutive null/empty runs are collapsed into one traversal that extends all leaf level buffers in bulk. This follows the same pattern already used by `write_struct()`. The `write_non_null_slice` path is unchanged since each non-null row has different offsets and cannot be batched.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
alamb
left a comment
Thank you @HippoBaro -- this looks like a nice improvement to me
One thought I had that could make this code easier to follow (maybe as a follow-on PR) would be to try and encapsulate the logic (pending_empties, pending_nulls, and then the logic to write the slices) somehow (maybe in a struct? Or a method on LevelInfoBuilder)
```rust
            let def_levels = leaf.def_levels.as_mut().unwrap();
            def_levels.push(ctx.def_level - 1);
        })
    let write_null_run = |child: &mut LevelInfoBuilder, count: usize| {
```
It took me a little while to grok that the empty and null cases got reversed here, but once I got over that it seems good to me
run benchmark arrow_writer

🤖 Arrow criterion benchmark running (GKE): comparing batch_consecute_null_rows (6244daf) to 89b1497 (merge-base)

🤖 Arrow criterion benchmark completed (GKE)
🚀 -- thanks @HippoBaro
Which issue does this PR close?
Rationale for this change
See #9731
What changes are included in this PR?
Restructure `write_list()` to accumulate consecutive null and empty rows and flush them in a single `visit_leaves()` call using `extend(repeat_n(...))`, instead of calling `visit_leaves()` per row.

With sparse data (99% nulls), a 4096-row batch previously triggered ~4000 individual tree traversals, each pushing a single value per leaf. Now consecutive null/empty runs are collapsed into one traversal that extends all leaf level buffers in bulk.

This follows the same pattern already used by `write_struct()`. The `write_non_null_slice` path is unchanged since each non-null row has different offsets and cannot be batched.

Are these changes tested?
All tests pass; the existing tests give 100% coverage of the changed code.
Are there any user-facing changes?
N/A