feat(parquet): compact level representation with generic writer dispatch #9831

Merged
alamb merged 4 commits into apache:main from HippoBaro:compact_lvl_repr
May 7, 2026

@HippoBaro
Contributor

Which issue does this PR close?

Rationale for this change

See #9731

What changes are included in this PR?

Represent definition and repetition levels as LevelData/LevelDataRef with Absent, Materialized, and Uniform variants, and thread this through Arrow level generation, CDC chunking, and the generic column writer.

Uniform level runs, such as required fields and all-null pages, can now be encoded without materializing dense Vec<i16> buffers. Add bulk run support to LevelEncoder/RleEncoder so repeated levels are encoded in amortized O(1) after the RLE warmup, while preserving histogram, row count, null count, page splitting, and CDC chunk accounting.
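
To make the three variants concrete, here is a minimal sketch of what such a pair of types could look like. It is not the PR's actual definition: the field names, the `len` and `slice` helpers, and the choice to treat `Absent` as length zero are assumptions made for illustration.

```rust
/// Hypothetical owned level buffer with three shapes.
#[derive(Debug, Clone, PartialEq)]
pub enum LevelData {
    /// The column carries no levels of this kind.
    Absent,
    /// One explicit level per slot.
    Materialized(Vec<i16>),
    /// `count` copies of `value`, stored without a dense buffer.
    Uniform { value: i16, count: usize },
}

/// Hypothetical borrowed counterpart handed to the column writer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LevelDataRef<'a> {
    Absent,
    Materialized(&'a [i16]),
    Uniform { value: i16, count: usize },
}

impl LevelData {
    /// Number of level slots (assumption: `Absent` reports zero).
    pub fn len(&self) -> usize {
        match self {
            LevelData::Absent => 0,
            LevelData::Materialized(levels) => levels.len(),
            LevelData::Uniform { count, .. } => *count,
        }
    }

    /// Sub-range of the levels; a uniform run only shrinks its count.
    pub fn slice(&self, offset: usize, len: usize) -> LevelData {
        match self {
            LevelData::Absent => LevelData::Absent,
            LevelData::Materialized(levels) => {
                LevelData::Materialized(levels[offset..offset + len].to_vec())
            }
            LevelData::Uniform { value, count } => {
                assert!(offset + len <= *count, "slice out of bounds");
                LevelData::Uniform { value: *value, count: len }
            }
        }
    }

    /// Borrow without copying; `Uniform` stays `Uniform`.
    pub fn as_ref(&self) -> LevelDataRef<'_> {
        match self {
            LevelData::Absent => LevelDataRef::Absent,
            LevelData::Materialized(levels) => LevelDataRef::Materialized(levels),
            LevelData::Uniform { value, count } => LevelDataRef::Uniform {
                value: *value,
                count: *count,
            },
        }
    }
}
```

In this sketch, slicing a million-entry uniform run costs the same as slicing a ten-entry one, which is the property the description relies on for page splitting and CDC chunking.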

Are these changes tested?

All tests passing. Coverage exercises bulk RLE level encoding, compact/uniform LevelData slicing and writer roundtrips across Parquet v1/v2, and CDC/Arrow writer behavior including all-null and nested-level cases.

Are there any user-facing changes?

None.

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Apr 25, 2026
@HippoBaro HippoBaro changed the title Compact lvl repr feat(parquet): compact level representation with generic writer dispatch Apr 26, 2026
@HippoBaro
Contributor Author

@alamb @etseidl This is the second-to-last PR from #9653 🙇 This one is bigger than the others. I haven’t found a clean way to split it further.

Contributor

@etseidl etseidl left a comment

Made it through the first pass. Everything looks good, but I want to do a second pass when my brain is more awake 😅

/// API wraps caller-provided `&[i16]` slices directly as `Materialized`, while the Arrow
/// writer path converts owned `LevelData` via `.as_ref()` (which may also produce `Uniform`).
#[derive(Debug, Clone, Copy)]
pub(crate) enum LevelDataRef<'a> {
Contributor

Just curious why this is defined here rather than where LevelData is defined.

Contributor Author

This is intentional: LevelData lives in the Arrow level builder, but GenericColumnWriter::write_batch_internal needs the borrowed representation for both Arrow-owned LevelData and the public non-Arrow write_batch API that receives &[i16]. Putting LevelDataRef next to LevelData would make column::writer depend upward on the Arrow writer module.

Contributor

Thanks. Just wanted to make sure it was intentional.

Contributor

@alamb alamb left a comment

I went over this PR carefully this morning and it is really nice work 👌 -- thank you so much @HippoBaro (and @etseidl!). As I mentioned, I think this PR makes the code easier to read AND faster, which seems like a win all around.

I reviewed correctness and code coverage (via cargo-llvm) and it all looks good to me

I had a bunch of suggestions that might improve the comments / readability but I don't think any of them is required to merge this PR.

Comment thread parquet/src/encodings/levels.rs
Comment thread parquet/src/encodings/levels.rs Outdated
Comment thread parquet/src/encodings/levels.rs Outdated
Comment thread parquet/src/arrow/arrow_writer/levels.rs
Comment thread parquet/src/arrow/arrow_writer/levels.rs
let rep_levels = self.rep_levels.as_ref().map(|levels| {
levels[chunk.level_offset..chunk.level_offset + chunk.num_levels].to_vec()
});
let def_levels = self.def_levels.slice(chunk.level_offset, chunk.num_levels);
Contributor

The new encapsulation makes this much easier to read, I think -- so not only does this PR make the code faster, it also makes it easier to follow ❤️

Comment thread parquet/src/column/writer/mod.rs Outdated
Comment thread parquet/src/column/writer/mod.rs Outdated
self.page_metrics.num_page_nulls += (levels.len() - values_to_write) as u64;
values_to_write
}
LevelDataRef::Uniform { value, count } => {
Contributor

I think this is the key part of the whole PR, right? A special case when the levels information is the same (all null, all non null)

Contributor Author

Yes! This whole refactor allows me to add this Uniform case, which makes all-null (or really any uniform) data much faster to encode.
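
As a rough illustration of why the `Uniform` arm is cheap, the sketch below counts values and nulls for one batch of definition levels. The enum, the function name `count_values`, and the `batch_len` parameter are hypothetical stand-ins; only the accounting idea (nulls = levels minus values to write) comes from the snippet under review.

```rust
/// Hypothetical borrowed level representation, mirroring the variants
/// discussed in this thread.
#[derive(Debug, Clone, Copy)]
pub enum LevelDataRef<'a> {
    Absent,
    Materialized(&'a [i16]),
    Uniform { value: i16, count: usize },
}

/// Returns `(values_to_write, nulls)` for one batch of definition levels.
///
/// `batch_len` is only consulted when levels are absent, in which case
/// every slot is assumed to hold a value.
pub fn count_values(
    levels: LevelDataRef<'_>,
    max_def_level: i16,
    batch_len: usize,
) -> (usize, u64) {
    match levels {
        LevelDataRef::Absent => (batch_len, 0),
        // Dense path: inspect every level once.
        LevelDataRef::Materialized(levels) => {
            let values = levels.iter().filter(|&&l| l == max_def_level).count();
            (values, (levels.len() - values) as u64)
        }
        // Uniform path: one comparison decides the whole batch.
        LevelDataRef::Uniform { value, count } => {
            if value == max_def_level {
                (count, 0)
            } else {
                (0, count as u64)
            }
        }
    }
}
```

For an all-null page this replaces a scan over every level with a single comparison, independent of the page size.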

Comment thread parquet/src/column/writer/mod.rs Outdated
@alamb
Contributor

alamb commented May 1, 2026

The clickbench results seem to report a few slower queries. I will rerun to reproduce

@HippoBaro
Contributor Author

Thank you all for the reviews and feedback! I’ll push a follow-up commit to address them, hopefully by EOD.

@alamb
Contributor

alamb commented May 3, 2026

Some of the benchmark runner results suggested this might slow things down. I started two more benchmark runs to see if there is a repeatable pattern.

@alamb
Contributor

alamb commented May 4, 2026

🤔 some of the benchmarks show a consistent slowdown -- @HippoBaro do you have any insight as to why they would get slower?

float_with_nans/zstd_parquet_2                     1.01    135.7±3.04ms   103.2 MB/sec    1.00    134.7±1.95ms   103.9 MB/sec
list_primitive/bloom_filter                        1.03   336.3±13.61ms  1621.7 MB/sec    1.00   327.6±13.07ms  1664.8 MB/sec
list_primitive/cdc                                 1.05    360.4±5.16ms  1513.3 MB/sec    1.00    344.1±3.20ms  1584.8 MB/sec
list_primitive/default                             1.06    249.3±4.24ms     2.1 GB/sec    1.00    235.8±3.44ms     2.3 GB/sec
list_primitive/parquet_2                           1.05    270.5±2.23ms  2016.2 MB/sec    1.00    257.5±2.07ms     2.1 GB/sec

list_primitive/bloom_filter                        1.10   375.0±12.71ms  1454.5 MB/sec    1.00   339.5±12.02ms  1606.5 MB/sec
list_primitive/cdc                                 1.07   377.0±11.73ms  1446.8 MB/sec    1.00    351.2±6.03ms  1553.1 MB/sec
list_primitive/default                             1.13    285.0±5.95ms  1913.7 MB/sec    1.00    253.1±5.94ms     2.1 GB/sec
list_primitive/parquet_2                           1.11    301.4±2.78ms  1809.7 MB/sec    1.00    271.8±3.79ms  2006.7 MB/sec
list_primitive/zstd                                1.09    530.4±6.52ms  1028.2 MB/sec    1.00    488.4±4.52ms  1116.6 MB/sec
list_primitive/zstd_parquet_2                      1.05    506.5±5.09ms  1076.7 MB/sec    1.00    483.5±2.12ms  1128.1 MB/sec
list_primitive_non_null/bloom_filter               1.05   472.1±27.45ms  1152.7 MB/sec    1.00   450.2±20.68ms  1208.9 MB/sec
list_primitive_non_null/cdc                        1.03   443.9±10.39ms  1226.1 MB/sec    1.00   431.9±11.28ms  1260.0 MB/sec
list_primitive_non_null/default                    1.01   310.0±13.33ms  1755.5 MB/sec    1.00   307.3±12.36ms  1771.2 MB/sec
list_primitive_non_null/parquet_2                  1.20    365.4±4.33ms  1489.3 MB/sec    1.00   303.7±14.49ms  1791.8 MB/sec

@HippoBaro
Contributor Author

Hum, I reran the above benchmarks locally on my laptop, both before and after rebasing onto main, and I can’t reproduce the slowdown. I get very consistent overlapping results (list_primitive_non_null/parquet_2):

main 0e478d8f5:    [279.28 ms 280.83 ms 282.68 ms]
branch ccb0ef114:  [277.20 ms 278.52 ms 280.15 ms]

I’m not sure I trust this particular benchmark runner result. For example, the worst offender in your list, list_primitive_non_null/parquet_2, was flat in the latest benchmark run:

 list_primitive_non_null/parquet_2  1.01  311.3±6.23ms  1748.4 MB/sec  1.00  308.4±13.64ms  1764.9 MB/sec

So I strongly suspect this is benchmark noise 🤷

@HippoBaro
Contributor Author

@alamb These numbers look a lot more reasonable. The single-digit regressions do look consistent, though. I can’t reproduce them locally either. On my Apple laptop, both main and this branch score within Criterion’s margin of error.

It could be a CPU architecture thing. If those regressions are unacceptable, I can try to profile the code on an x86 VM later in the week.

@etseidl
Contributor

etseidl commented May 5, 2026

@alamb These numbers look a lot more reasonable. The single-digit regressions do look consistent, though. I can’t reproduce them locally either. On my Apple laptop, both main and this branch score within Criterion’s margin of error.

It could be a CPU architecture thing. If those regressions are unacceptable, I can try to profile the code on an x86 VM later in the week.

I'll try reproducing on my Xeon workstation.

@etseidl
Contributor

etseidl commented May 5, 2026

So the regression is definitely there: around 5% on average for the list_primitive benchmarks. I ran list_primitive/default through samply, and it looks like a good bit of the slowdown is in LevelInfoBuilder::write_leaf. In main, extend accounts for 6100 samples; on this branch it is replaced by extend_from_iter (3480), extend (3070), and append_rep_level_run (1890), for a total of ~8400. Looking at the code, the only real difference is passing through all the `match`es to pick the right enum variant. Looking at just the eventual extend calls, they add up to around 6700, so the overhead accounts for the other 1700.

Would it be possible to have different versions of write_leaf for each variant?

Honestly I'm far more tolerant of write regressions than read. And 5% just for lists seems a fair trade.
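
The idea of per-variant versions can be sketched as hoisting the `match` out of the inner loop. The code below is a generic illustration, not `write_leaf` itself: `LevelSource`, `append_per_element`, and `append_specialized` are hypothetical names, and an optimizing compiler may already perform this transformation on simple loops.

```rust
/// Hypothetical source of levels with two shapes.
#[derive(Debug, Clone, Copy)]
pub enum LevelSource<'a> {
    Materialized(&'a [i16]),
    Uniform { value: i16, count: usize },
}

impl LevelSource<'_> {
    pub fn len(&self) -> usize {
        match self {
            LevelSource::Materialized(levels) => levels.len(),
            LevelSource::Uniform { count, .. } => *count,
        }
    }
}

/// Inspects the variant once per element: simple, but the dispatch sits
/// on the hot path.
pub fn append_per_element(src: LevelSource<'_>, out: &mut Vec<i16>) {
    for i in 0..src.len() {
        let level = match src {
            LevelSource::Materialized(levels) => levels[i],
            LevelSource::Uniform { value, .. } => value,
        };
        out.push(level);
    }
}

/// Inspects the variant once per call: each arm runs its own bulk copy.
pub fn append_specialized(src: LevelSource<'_>, out: &mut Vec<i16>) {
    match src {
        LevelSource::Materialized(levels) => out.extend_from_slice(levels),
        LevelSource::Uniform { value, count } => out.resize(out.len() + count, value),
    }
}
```

Both functions produce the same output; the second pays for the dispatch once per batch rather than once per level.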

@HippoBaro
Contributor Author

Thank you @etseidl! That’s super useful context 🙇 Let me see if I can refactor this.

@HippoBaro
Contributor Author

Thank you @etseidl and @alamb for pushing back! The regression was fairly straightforward: the compact level representation added extra branching/dispatch on a hot path. At first I thought this was the price to pay for speeding up the Uniform and Absent level representations. This was particularly expensive for list columns, because each non-empty list row called back into child level generation, reaching write_leaf for primitive children.

It turns out that I hadn't considered a good opportunity to batch writes there as well. We now batch consecutive non-empty list rows into a single child level write, then walk the appended repetition levels backwards to mark list-row boundaries.
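
A simplified sketch of that batching idea follows, assuming a non-nullable list of non-nullable primitives with a maximum repetition and definition level of 1. The function name `list_levels` and the offsets-based input are hypothetical and the PR's actual code is structured differently; only the two steps (one bulk child write per run of non-empty rows, then a backwards walk to mark row starts) follow the description above.

```rust
/// Computes `(rep_levels, def_levels)` for a list column described by
/// `offsets`, where row `i` spans `offsets[i]..offsets[i + 1]`.
pub fn list_levels(offsets: &[usize]) -> (Vec<i16>, Vec<i16>) {
    let num_rows = offsets.len().saturating_sub(1);
    let mut rep = Vec::new();
    let mut def = Vec::new();
    let mut row = 0;
    while row < num_rows {
        if offsets[row] == offsets[row + 1] {
            // Empty list: a single slot with no child value.
            rep.push(0);
            def.push(0);
            row += 1;
            continue;
        }
        // Extend the run over all consecutive non-empty rows.
        let run_start = row;
        while row < num_rows && offsets[row] < offsets[row + 1] {
            row += 1;
        }
        // One bulk "child write" for the whole run: every slot starts out
        // as a continuation (rep = 1) holding a value (def = 1).
        let base = rep.len();
        let n = offsets[row] - offsets[run_start];
        rep.resize(base + n, 1);
        def.resize(base + n, 1);
        // Walk the rows of the run backwards and mark where each starts.
        for r in (run_start..row).rev() {
            rep[base + offsets[r] - offsets[run_start]] = 0;
        }
    }
    (rep, def)
}
```

For offsets `[0, 2, 2, 5, 6]` (rows of length 2, 0, 3, and 1), the last two rows form one run, so the child levels are appended once for four values instead of once per row. In this simplified version the walk direction does not affect the result.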

On my laptop these benchmarks show low single-digit improvements for the list_primitive cases, but your mileage may vary.

@Dandandan
Contributor

list_primitive_sparse_99pct_null/bloom_filter      1.11     12.9±0.10ms     2.8 GB/sec    1.00     11.6±0.09ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.12     25.0±0.12ms  1492.7 MB/sec    1.00     22.3±0.07ms  1678.0 MB/sec
list_primitive_sparse_99pct_null/default           1.12     12.5±0.05ms     2.9 GB/sec    1.00     11.2±0.05ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.11     12.5±0.06ms     2.9 GB/sec    1.00     11.3±0.09ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.10     14.3±0.10ms     2.5 GB/sec    1.00     13.1±0.08ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.11     12.6±0.14ms     2.9 GB/sec    1.00     11.4±0.09ms     3.2 GB/sec

might this still be a regression?

@etseidl
Contributor

etseidl commented May 6, 2026

might this still be a regression?

A bit, but only when compared to previously optimized code. Compared to before @HippoBaro started, it's still quite nice. (comparing to #9654 f5d6dc3)

list_primitive_sparse_99pct_null/bloom_filter      1.00      9.1±0.07ms     4.0 GB/sec    3.15     28.6±0.36ms  1305.5 MB/sec
list_primitive_sparse_99pct_null/cdc               1.00     18.7±0.44ms  1997.9 MB/sec  
list_primitive_sparse_99pct_null/default           1.00      8.6±0.11ms     4.2 GB/sec    3.24     27.9±0.22ms  1337.2 MB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00      8.7±0.21ms     4.2 GB/sec    3.29     28.5±1.21ms  1309.4 MB/sec
list_primitive_sparse_99pct_null/zstd              1.00     10.2±0.07ms     3.6 GB/sec    2.92     29.7±0.28ms  1257.0 MB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00      8.7±0.05ms     4.2 GB/sec    3.23     28.2±0.30ms  1324.3 MB/sec

FWIW, the sparse regression isn't as bad on my WS

group                                              9831-2                                 main3
-----                                              ------                                 -----
list_primitive/bloom_filter                        1.00    284.4±1.73ms  1917.3 MB/sec    1.04    295.5±1.94ms  1845.3 MB/sec
list_primitive/cdc                                 1.00    300.8±6.25ms  1813.2 MB/sec    1.02    308.2±6.27ms  1769.8 MB/sec
list_primitive/default                             1.00    224.9±1.85ms     2.4 GB/sec    1.04    234.6±3.39ms     2.3 GB/sec
list_primitive/parquet_2                           1.00    235.3±5.39ms     2.3 GB/sec    1.06    249.0±5.09ms     2.1 GB/sec
list_primitive/zstd                                1.00    371.7±3.83ms  1467.0 MB/sec    1.02    379.9±3.56ms  1435.7 MB/sec
list_primitive/zstd_parquet_2                      1.00    366.4±1.82ms  1488.5 MB/sec    1.02    372.1±3.90ms  1465.6 MB/sec
list_primitive_non_null/bloom_filter               1.00    344.6±3.78ms  1579.5 MB/sec    1.02    352.3±7.27ms  1544.9 MB/sec
list_primitive_non_null/cdc                        1.00    358.6±5.60ms  1517.7 MB/sec    1.01    360.6±4.80ms  1509.1 MB/sec
list_primitive_non_null/default                    1.00    250.1±3.05ms     2.1 GB/sec    1.04    259.7±9.49ms     2.0 GB/sec
list_primitive_non_null/parquet_2                  1.00    269.9±8.56ms  2016.7 MB/sec    1.00    270.0±7.87ms  2015.7 MB/sec
list_primitive_non_null/zstd                       1.01    496.6±7.86ms  1096.0 MB/sec    1.00    492.2±2.43ms  1105.7 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    485.6±2.63ms  1120.8 MB/sec    1.01    490.2±1.64ms  1110.4 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.02      9.1±0.07ms     4.0 GB/sec    1.00      8.9±0.08ms     4.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.07     18.7±0.44ms  1997.9 MB/sec    1.00     17.4±0.11ms     2.1 GB/sec
list_primitive_sparse_99pct_null/default           1.02      8.6±0.11ms     4.2 GB/sec    1.00      8.5±0.10ms     4.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.05      8.7±0.21ms     4.2 GB/sec    1.00      8.3±0.06ms     4.4 GB/sec
list_primitive_sparse_99pct_null/zstd              1.01     10.2±0.07ms     3.6 GB/sec    1.00     10.1±0.09ms     3.6 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.02      8.7±0.05ms     4.2 GB/sec    1.00      8.5±0.08ms     4.3 GB/sec

@HippoBaro HippoBaro force-pushed the compact_lvl_repr branch from ae4f2b8 to 4d0f400 Compare May 7, 2026 01:19
HippoBaro added 3 commits May 6, 2026 21:28
Adds a bulk encoding method for repeated level values. After a small
warmup to enter RLE accumulation mode, remaining values are extended
in O(1) via the existing `extend_run` path.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
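
The shape of such a bulk method can be sketched with a toy run-length encoder. This is not the crate's `RleEncoder`: the `RunEncoder` type, its `runs` field, and the `WARMUP` constant of 8 are assumptions for illustration, and the toy stores plain `(value, run_length)` pairs rather than the hybrid RLE/bit-packed format.

```rust
/// Assumed number of values fed one by one before the encoder is known
/// to be accumulating a run.
const WARMUP: usize = 8;

/// Toy encoder that records `(value, run_length)` pairs.
#[derive(Debug, Default)]
pub struct RunEncoder {
    pub runs: Vec<(i16, usize)>,
}

impl RunEncoder {
    /// Per-value path: extends the current run or starts a new one.
    pub fn put(&mut self, value: i16) {
        match self.runs.last_mut() {
            Some((v, n)) if *v == value => *n += 1,
            _ => self.runs.push((value, 1)),
        }
    }

    /// Bulk path: a short warmup through `put`, then the rest of the
    /// run is added with a single addition.
    pub fn put_run(&mut self, value: i16, count: usize) {
        let warmup = count.min(WARMUP);
        for _ in 0..warmup {
            self.put(value);
        }
        let remaining = count - warmup;
        if remaining > 0 {
            // After the warmup the last run is guaranteed to hold `value`.
            if let Some((_, n)) = self.runs.last_mut() {
                *n += remaining;
            }
        }
    }
}
```

The cost of `put_run` in this sketch is bounded by the warmup, so encoding a run of a million identical levels takes about as long as encoding a run of eight.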
Introduces a `LevelData` enum (`Absent`, `Materialized`, `Uniform`) to
replace `Option<Vec<i16>>` for definition and repetition levels, and a
borrowed `LevelDataRef` counterpart for the writer path. Uniform columns
(e.g. required fields, all-null pages) are now encoded in O(1) without
materializing a dense list.

The CDC chunker, column writer, and arrow writer are migrated to the new
types.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
Replace the long positional uniform roundtrip helper with a named
`ColumnRoundTripUniform` fixture. This makes each test spell out the
level inputs, schema levels, and expected read-back values explicitly.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
@HippoBaro HippoBaro force-pushed the compact_lvl_repr branch from 4d0f400 to 73df248 Compare May 7, 2026 01:29
@HippoBaro
Contributor Author

HippoBaro commented May 7, 2026

Thank you @etseidl, @alamb, and @Dandandan 💯

After trying a few alternatives to reduce the cost of the extra level representation variants, I think the remaining overhead in the Materialized path is the inherent cost of specializing the Absent and Uniform cases. At the end of the day, we are making a tradeoff here.

A bit, but only when compared to previously optimized code. Compared to before @HippoBaro started, it's still quite nice. (comparing to #9654 f5d6dc3)

I think that's fair. This PR is part of a larger series that, taken together, improves runtime across most writer benchmarks.

I rebased the PR against main and worked through @alamb's feedback. I also added the builder pattern suggested here in 4d0f400. It's super neat!

@alamb
Contributor

alamb commented May 7, 2026

The MSRV failure is unrelated to this PR. For more details:

@alamb
Contributor

alamb commented May 7, 2026

run benchmarks arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4397941393-2053-njzt6 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing compact_lvl_repr (73df248) to ded985c (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4397942184-2054-x6xlm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing compact_lvl_repr (73df248) to ded985c (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@alamb
Contributor

alamb commented May 7, 2026

Merging up from main to get a clean CI run

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              compact_lvl_repr                       main
-----                                              ----------------                       ----
bool/bloom_filter                                  1.00     13.0±0.07ms    19.2 MB/sec    1.00     13.1±0.09ms    19.2 MB/sec
bool/cdc                                           1.01     16.0±0.11ms    15.6 MB/sec    1.00     15.8±0.10ms    15.8 MB/sec
bool/default                                       1.00     10.9±0.04ms    23.0 MB/sec    1.01     11.0±0.09ms    22.8 MB/sec
bool/parquet_2                                     1.00     14.7±0.06ms    17.0 MB/sec    1.00     14.7±0.10ms    17.1 MB/sec
bool/zstd                                          1.00     11.4±0.03ms    21.9 MB/sec    1.01     11.5±0.11ms    21.7 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.05ms    16.6 MB/sec    1.00     15.1±0.11ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.9 MB/sec    1.01      7.1±0.03ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.06ms    18.4 MB/sec    1.01      6.9±0.04ms    18.2 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.4 MB/sec    1.02      4.3±0.02ms    28.8 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.04ms    13.9 MB/sec    1.00      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.2 MB/sec    1.02      4.7±0.03ms    26.7 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.03ms    13.3 MB/sec    1.00      9.4±0.03ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.00     93.5±0.61ms   149.7 MB/sec    1.01     94.6±0.44ms   148.0 MB/sec
float_with_nans/cdc                                1.00     81.6±0.17ms   171.6 MB/sec    1.01     82.7±1.50ms   169.2 MB/sec
float_with_nans/default                            1.00     74.7±0.42ms   187.4 MB/sec    1.01     75.1±0.27ms   186.3 MB/sec
float_with_nans/parquet_2                          1.00     94.9±0.49ms   147.5 MB/sec    1.01     96.1±0.27ms   145.7 MB/sec
float_with_nans/zstd                               1.00    112.2±0.25ms   124.7 MB/sec    1.01    112.8±0.24ms   124.1 MB/sec
float_with_nans/zstd_parquet_2                     1.00    132.1±0.34ms   106.0 MB/sec    1.01    133.2±0.20ms   105.1 MB/sec
list_primitive/bloom_filter                        1.00    327.1±3.75ms  1667.2 MB/sec    1.02    334.7±4.18ms  1629.4 MB/sec
list_primitive/cdc                                 1.02    357.9±1.35ms  1523.6 MB/sec    1.00    349.4±3.88ms  1560.8 MB/sec
list_primitive/default                             1.00    247.0±1.57ms     2.2 GB/sec    1.02    252.5±1.78ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    267.7±0.74ms  2037.0 MB/sec    1.00    266.5±0.56ms  2046.6 MB/sec
list_primitive/zstd                                1.03    499.6±2.54ms  1091.6 MB/sec    1.00    485.2±1.43ms  1124.0 MB/sec
list_primitive/zstd_parquet_2                      1.02    491.4±0.39ms  1109.9 MB/sec    1.00    480.0±1.12ms  1136.1 MB/sec
list_primitive_non_null/bloom_filter               1.00    414.3±7.18ms  1313.6 MB/sec    1.03    425.1±6.10ms  1280.4 MB/sec
list_primitive_non_null/cdc                        1.01    435.4±8.14ms  1249.9 MB/sec    1.00    429.5±7.25ms  1267.1 MB/sec
list_primitive_non_null/default                    1.00    290.8±3.98ms  1871.6 MB/sec    1.01    293.0±6.35ms  1857.2 MB/sec
list_primitive_non_null/parquet_2                  1.02    298.2±5.41ms  1824.9 MB/sec    1.00   291.1±10.01ms  1869.4 MB/sec
list_primitive_non_null/zstd                       1.00    707.3±9.63ms   769.4 MB/sec    1.00   707.8±14.29ms   768.9 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    688.8±2.31ms   790.1 MB/sec    1.01    697.8±1.93ms   780.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.6±0.14ms     3.2 GB/sec    1.01     11.7±0.16ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.07     23.5±0.15ms  1590.0 MB/sec    1.00     22.0±0.10ms  1699.6 MB/sec
list_primitive_sparse_99pct_null/default           1.01     11.4±0.11ms     3.2 GB/sec    1.00     11.3±0.12ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.3±0.12ms     3.2 GB/sec    1.01     11.3±0.11ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.01     13.1±0.12ms     2.8 GB/sec    1.00     13.0±0.06ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.01     11.4±0.10ms     3.2 GB/sec    1.00     11.3±0.05ms     3.2 GB/sec
primitive/bloom_filter                             1.00    148.3±0.46ms   302.6 MB/sec    1.02    151.7±1.09ms   295.9 MB/sec
primitive/cdc                                      1.00    158.1±0.49ms   283.8 MB/sec    1.00    157.4±0.82ms   285.2 MB/sec
primitive/default                                  1.00    117.3±0.28ms   382.7 MB/sec    1.01    118.7±0.79ms   378.0 MB/sec
primitive/parquet_2                                1.00    132.2±0.41ms   339.4 MB/sec    1.02    135.2±2.00ms   332.0 MB/sec
primitive/zstd                                     1.00    147.0±0.30ms   305.2 MB/sec    1.01    147.8±0.93ms   303.7 MB/sec
primitive/zstd_parquet_2                           1.00    165.5±0.24ms   271.2 MB/sec    1.01    166.4±0.81ms   269.7 MB/sec
primitive_all_null/bloom_filter                    1.00     11.6±0.13ms     3.8 GB/sec    1.00     11.6±0.11ms     3.8 GB/sec
primitive_all_null/cdc                             1.05     29.4±0.54ms  1524.4 MB/sec    1.00     28.0±0.27ms  1601.4 MB/sec
primitive_all_null/default                         1.00     11.0±0.23ms     4.0 GB/sec    1.01     11.1±0.30ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.09ms     4.0 GB/sec    1.01     10.9±0.14ms     4.0 GB/sec
primitive_all_null/zstd                            1.01     11.2±0.26ms     3.9 GB/sec    1.00     11.1±0.18ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.10ms     4.0 GB/sec    1.01     11.1±0.24ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.00    105.6±0.40ms   416.6 MB/sec    1.07    113.0±0.98ms   389.2 MB/sec
primitive_non_null/cdc                             1.00     90.1±0.29ms   488.6 MB/sec    1.00     90.0±0.27ms   488.8 MB/sec
primitive_non_null/default                         1.00     67.3±0.25ms   654.0 MB/sec    1.01     68.1±0.50ms   646.6 MB/sec
primitive_non_null/parquet_2                       1.00     89.0±0.31ms   494.2 MB/sec    1.00     89.4±0.36ms   491.9 MB/sec
primitive_non_null/zstd                            1.00     98.2±0.30ms   448.0 MB/sec    1.07    105.0±0.49ms   419.0 MB/sec
primitive_non_null/zstd_parquet_2                  1.00    123.3±0.61ms   356.9 MB/sec    1.00    123.5±1.15ms   356.3 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.6±0.16ms     2.4 GB/sec    1.00     18.6±0.39ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.05     36.3±0.33ms  1236.6 MB/sec    1.00     34.7±0.25ms  1292.4 MB/sec
primitive_sparse_99pct_null/default                1.01     16.9±0.11ms     2.6 GB/sec    1.00     16.8±0.07ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.01     17.1±0.07ms     2.6 GB/sec    1.00     16.9±0.19ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.3±0.09ms     2.2 GB/sec    1.00     20.3±0.16ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     18.9±0.13ms     2.3 GB/sec    1.00     18.7±0.11ms     2.3 GB/sec
string/bloom_filter                                1.00   201.0±11.42ms     2.5 GB/sec    1.05   210.5±14.73ms     2.4 GB/sec
string/cdc                                         1.00    219.8±3.07ms     2.3 GB/sec    1.01    222.1±7.48ms     2.3 GB/sec
string/default                                     1.00   125.1±16.47ms     4.1 GB/sec    1.03   128.6±15.29ms     4.0 GB/sec
string/parquet_2                                   1.00    110.2±5.07ms     4.6 GB/sec    1.13    124.4±1.03ms     4.1 GB/sec
string/zstd                                        1.00    416.7±1.90ms  1258.2 MB/sec    1.07   444.1±17.97ms  1180.5 MB/sec
string/zstd_parquet_2                              1.00    394.4±0.41ms  1329.3 MB/sec    1.01    398.9±2.94ms  1314.3 MB/sec
string_and_binary_view/bloom_filter                1.00     65.5±0.30ms   492.6 MB/sec    1.03     67.6±0.77ms   477.3 MB/sec
string_and_binary_view/cdc                         1.00     59.2±0.45ms   544.4 MB/sec    1.02     60.4±1.24ms   533.8 MB/sec
string_and_binary_view/default                     1.00     47.5±0.10ms   678.5 MB/sec    1.03     48.7±0.17ms   661.6 MB/sec
string_and_binary_view/parquet_2                   1.00     59.1±0.33ms   545.6 MB/sec    1.02     60.1±0.41ms   536.3 MB/sec
string_and_binary_view/zstd                        1.00     84.9±0.64ms   379.7 MB/sec    1.01     85.8±0.49ms   375.7 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.1±0.34ms   440.9 MB/sec    1.02     74.7±1.02ms   431.5 MB/sec
string_dictionary/bloom_filter                     1.00     94.1±2.48ms     2.7 GB/sec    1.00     94.0±1.65ms     2.7 GB/sec
string_dictionary/cdc                              1.00     52.3±1.38ms     4.9 GB/sec    1.01     52.7±0.81ms     4.9 GB/sec
string_dictionary/default                          1.01     49.1±1.46ms     5.3 GB/sec    1.00     48.5±1.21ms     5.3 GB/sec
string_dictionary/parquet_2                        1.00     55.3±0.53ms     4.7 GB/sec    1.00     55.3±0.29ms     4.7 GB/sec
string_dictionary/zstd                             1.00    210.4±1.67ms  1255.6 MB/sec    1.00    210.2±1.49ms  1256.6 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.2±0.43ms  1326.2 MB/sec    1.01    200.3±0.52ms  1318.4 MB/sec
string_non_null/bloom_filter                       1.00   252.2±15.13ms     2.0 GB/sec    1.03   260.8±13.67ms  2009.2 MB/sec
string_non_null/cdc                                1.00   269.8±10.71ms  1942.1 MB/sec    1.02    275.1±6.47ms  1904.5 MB/sec
string_non_null/default                            1.00   138.0±12.83ms     3.7 GB/sec    1.00   137.7±11.81ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    137.6±4.90ms     3.7 GB/sec    1.03    142.2±7.78ms     3.6 GB/sec
string_non_null/zstd                               1.02    551.9±3.31ms   949.4 MB/sec    1.00    539.8±2.65ms   970.6 MB/sec
string_non_null/zstd_parquet_2                     1.02    516.4±3.08ms  1014.7 MB/sec    1.00    508.1±1.28ms  1031.3 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.3 GB/sec    1.65      4.2±0.02ms     3.8 GB/sec
struct_all_null/cdc                                1.00     10.0±0.15ms  1606.9 MB/sec    1.07     10.7±0.14ms  1503.6 MB/sec
struct_all_null/default                            1.00      2.2±0.00ms     7.0 GB/sec    1.73      3.9±0.03ms     4.1 GB/sec
struct_all_null/parquet_2                          1.00      2.2±0.00ms     7.0 GB/sec    1.73      3.9±0.03ms     4.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.72      3.9±0.03ms     4.0 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.73      3.9±0.06ms     4.0 GB/sec
struct_non_null/bloom_filter                       1.00     48.1±0.28ms   332.5 MB/sec    1.03     49.7±0.27ms   322.1 MB/sec
struct_non_null/cdc                                1.00     46.1±0.50ms   347.0 MB/sec    1.01     46.7±0.16ms   342.4 MB/sec
struct_non_null/default                            1.00     33.0±0.20ms   485.5 MB/sec    1.03     34.0±0.19ms   471.3 MB/sec
struct_non_null/parquet_2                          1.00     41.4±0.15ms   386.6 MB/sec    1.03     42.7±0.17ms   374.9 MB/sec
struct_non_null/zstd                               1.00     41.4±0.12ms   386.9 MB/sec    1.04     42.9±0.29ms   373.3 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.3±0.12ms   289.1 MB/sec    1.02     56.2±0.25ms   284.6 MB/sec
struct_sparse_99pct_null/bloom_filter              1.04      7.7±0.15ms     2.1 GB/sec    1.00      7.4±0.06ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.05     14.6±0.12ms  1106.8 MB/sec    1.00     13.9±0.15ms  1160.2 MB/sec
struct_sparse_99pct_null/default                   1.03      7.0±0.04ms     2.2 GB/sec    1.00      6.8±0.06ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.03      7.0±0.03ms     2.2 GB/sec    1.00      6.8±0.06ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.02      8.4±0.06ms  1923.6 MB/sec    1.00      8.2±0.07ms  1966.4 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.02      7.8±0.03ms     2.0 GB/sec    1.00      7.6±0.11ms     2.1 GB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 1935.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1874.7s
CPU sys 58.0s
Peak spill 0 B

branch

Metric Value
Wall time 1930.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1886.4s
CPU sys 43.2s
Peak spill 0 B

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                                              compact_lvl_repr                       main
-----                                              ----------------                       ----
bool/bloom_filter                                  1.02     13.2±0.12ms    19.0 MB/sec    1.00     13.0±0.05ms    19.3 MB/sec
bool/cdc                                           1.02     16.0±0.09ms    15.6 MB/sec    1.00     15.7±0.07ms    15.9 MB/sec
bool/default                                       1.01     11.0±0.09ms    22.7 MB/sec    1.00     10.9±0.05ms    23.0 MB/sec
bool/parquet_2                                     1.02     14.9±0.20ms    16.8 MB/sec    1.00     14.6±0.06ms    17.2 MB/sec
bool/zstd                                          1.01     11.5±0.09ms    21.7 MB/sec    1.00     11.4±0.05ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.2±0.10ms    16.5 MB/sec    1.00     14.9±0.05ms    16.7 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.3 MB/sec    1.01      6.9±0.04ms    18.1 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.4 MB/sec    1.02      4.3±0.03ms    28.8 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.8 MB/sec    1.00      9.0±0.05ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.01      4.7±0.03ms    26.7 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.04ms    13.2 MB/sec    1.00      9.4±0.03ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.01     94.2±0.34ms   148.6 MB/sec    1.00     93.1±0.39ms   150.3 MB/sec
float_with_nans/cdc                                1.01     82.1±0.23ms   170.6 MB/sec    1.00     81.6±0.19ms   171.6 MB/sec
float_with_nans/default                            1.00     74.7±0.22ms   187.5 MB/sec    1.00     74.7±0.26ms   187.5 MB/sec
float_with_nans/parquet_2                          1.00     95.1±0.43ms   147.3 MB/sec    1.00     94.9±0.22ms   147.6 MB/sec
float_with_nans/zstd                               1.00    112.3±0.22ms   124.7 MB/sec    1.00    112.2±0.24ms   124.8 MB/sec
float_with_nans/zstd_parquet_2                     1.01    133.2±0.61ms   105.1 MB/sec    1.00    132.0±0.23ms   106.0 MB/sec
list_primitive/bloom_filter                        1.03    328.5±1.18ms  1660.0 MB/sec    1.00    319.7±1.54ms  1705.8 MB/sec
list_primitive/cdc                                 1.03    356.7±1.19ms  1528.9 MB/sec    1.00    347.6±2.34ms  1569.1 MB/sec
list_primitive/default                             1.02    248.3±1.36ms     2.1 GB/sec    1.00    242.7±0.86ms     2.2 GB/sec
list_primitive/parquet_2                           1.02    267.9±0.42ms  2035.4 MB/sec    1.00    263.1±1.13ms     2.0 GB/sec
list_primitive/zstd                                1.02    498.4±1.95ms  1094.2 MB/sec    1.00    486.6±2.40ms  1120.7 MB/sec
list_primitive/zstd_parquet_2                      1.03    491.5±1.07ms  1109.5 MB/sec    1.00    479.2±0.63ms  1138.0 MB/sec
list_primitive_non_null/bloom_filter               1.02    430.2±5.67ms  1265.1 MB/sec    1.00    422.1±6.27ms  1289.2 MB/sec
list_primitive_non_null/cdc                        1.03    438.2±8.56ms  1242.1 MB/sec    1.00    424.0±5.99ms  1283.5 MB/sec
list_primitive_non_null/default                    1.00    287.9±4.07ms  1890.3 MB/sec    1.02    295.1±7.56ms  1844.5 MB/sec
list_primitive_non_null/parquet_2                  1.11   309.8±12.79ms  1756.5 MB/sec    1.00   279.2±19.84ms  1949.4 MB/sec
list_primitive_non_null/zstd                       1.05   706.2±11.47ms   770.6 MB/sec    1.00    675.2±2.56ms   806.1 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    678.1±2.30ms   802.7 MB/sec    1.01    686.0±1.92ms   793.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.4±0.08ms     3.2 GB/sec    1.01     11.6±0.06ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.06     23.2±0.10ms  1613.1 MB/sec    1.00     21.9±0.09ms  1702.9 MB/sec
list_primitive_sparse_99pct_null/default           1.00     11.2±0.06ms     3.3 GB/sec    1.00     11.2±0.04ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.1±0.04ms     3.3 GB/sec    1.01     11.2±0.05ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.9±0.06ms     2.8 GB/sec    1.01     13.1±0.07ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.2±0.05ms     3.2 GB/sec    1.01     11.4±0.05ms     3.2 GB/sec
primitive/bloom_filter                             1.00    150.5±0.88ms   298.2 MB/sec    1.00    150.6±0.52ms   298.0 MB/sec
primitive/cdc                                      1.02    159.3±0.66ms   281.8 MB/sec    1.00    156.4±0.43ms   286.9 MB/sec
primitive/default                                  1.01    118.4±0.52ms   379.0 MB/sec    1.00    117.7±0.41ms   381.4 MB/sec
primitive/parquet_2                                1.00    133.5±0.53ms   336.2 MB/sec    1.00    133.8±1.80ms   335.4 MB/sec
primitive/zstd                                     1.00    148.1±0.48ms   303.1 MB/sec    1.00    148.7±2.89ms   301.8 MB/sec
primitive/zstd_parquet_2                           1.01    166.7±0.56ms   269.1 MB/sec    1.00    165.7±0.40ms   270.8 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.10ms     3.8 GB/sec    1.01     11.6±0.21ms     3.8 GB/sec
primitive_all_null/cdc                             1.03     29.0±0.29ms  1547.1 MB/sec    1.00     28.1±0.33ms  1599.5 MB/sec
primitive_all_null/default                         1.00     10.8±0.05ms     4.1 GB/sec    1.01     10.9±0.16ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.8±0.10ms     4.0 GB/sec    1.01     11.0±0.23ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.09ms     4.0 GB/sec    1.01     11.1±0.23ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     10.9±0.11ms     4.0 GB/sec    1.02     11.2±0.28ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.00    105.9±0.39ms   415.3 MB/sec    1.06    112.8±0.39ms   390.2 MB/sec
primitive_non_null/cdc                             1.00     90.6±0.31ms   485.6 MB/sec    1.00     90.4±0.25ms   486.5 MB/sec
primitive_non_null/default                         1.00     67.6±0.23ms   650.9 MB/sec    1.00     67.8±0.25ms   648.7 MB/sec
primitive_non_null/parquet_2                       1.00     89.2±0.26ms   493.1 MB/sec    1.00     89.6±0.26ms   491.0 MB/sec
primitive_non_null/zstd                            1.00     98.5±0.27ms   446.7 MB/sec    1.07    105.4±0.33ms   417.5 MB/sec
primitive_non_null/zstd_parquet_2                  1.00    123.1±0.23ms   357.4 MB/sec    1.00    123.7±1.13ms   355.8 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     18.6±0.14ms     2.4 GB/sec    1.00     18.3±0.21ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.03     35.8±0.25ms  1252.7 MB/sec    1.00     34.8±0.30ms  1290.1 MB/sec
primitive_sparse_99pct_null/default                1.01     16.9±0.09ms     2.6 GB/sec    1.00     16.8±0.07ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.01     16.9±0.08ms     2.6 GB/sec    1.00     16.8±0.08ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.1±0.11ms     2.2 GB/sec    1.00     20.1±0.08ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     18.9±0.15ms     2.3 GB/sec    1.00     18.7±0.09ms     2.3 GB/sec
string/bloom_filter                                1.00   202.6±12.14ms     2.5 GB/sec    1.03   209.1±13.50ms     2.4 GB/sec
string/cdc                                         1.00    220.3±3.42ms     2.3 GB/sec    1.00    220.1±7.76ms     2.3 GB/sec
string/default                                     1.00   128.5±17.92ms     4.0 GB/sec    1.01   129.4±15.92ms     4.0 GB/sec
string/parquet_2                                   1.00    109.8±5.48ms     4.7 GB/sec    1.13    123.8±0.51ms     4.1 GB/sec
string/zstd                                        1.00    418.6±1.93ms  1252.4 MB/sec    1.05   441.6±17.68ms  1187.3 MB/sec
string/zstd_parquet_2                              1.00    395.6±0.65ms  1325.3 MB/sec    1.00    394.6±0.38ms  1328.5 MB/sec
string_and_binary_view/bloom_filter                1.00     64.5±0.37ms   499.9 MB/sec    1.00     64.7±0.33ms   498.3 MB/sec
string_and_binary_view/cdc                         1.01     58.4±0.28ms   552.3 MB/sec    1.00     57.8±0.13ms   557.7 MB/sec
string_and_binary_view/default                     1.00     47.8±0.21ms   675.0 MB/sec    1.00     47.9±0.19ms   673.0 MB/sec
string_and_binary_view/parquet_2                   1.00     58.7±0.27ms   549.5 MB/sec    1.00     58.8±0.18ms   548.2 MB/sec
string_and_binary_view/zstd                        1.00     84.4±0.29ms   382.0 MB/sec    1.00     84.7±0.41ms   380.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.4±0.25ms   445.3 MB/sec    1.00     72.6±0.22ms   444.1 MB/sec
string_dictionary/bloom_filter                     1.02     89.2±1.25ms     2.9 GB/sec    1.00     87.8±0.60ms     2.9 GB/sec
string_dictionary/cdc                              1.01     51.3±0.87ms     5.0 GB/sec    1.00     50.8±0.93ms     5.1 GB/sec
string_dictionary/default                          1.03     47.7±1.54ms     5.4 GB/sec    1.00     46.4±0.87ms     5.6 GB/sec
string_dictionary/parquet_2                        1.00     54.1±0.25ms     4.8 GB/sec    1.00     53.8±0.14ms     4.8 GB/sec
string_dictionary/zstd                             1.00    207.5±1.49ms  1273.2 MB/sec    1.00    207.0±1.61ms  1275.9 MB/sec
string_dictionary/zstd_parquet_2                   1.01    200.0±0.96ms  1320.9 MB/sec    1.00    198.6±0.34ms  1330.3 MB/sec
string_non_null/bloom_filter                       1.03   251.6±13.04ms     2.0 GB/sec    1.00   244.3±11.82ms     2.1 GB/sec
string_non_null/cdc                                1.01    274.8±9.29ms  1906.6 MB/sec    1.00   271.5±10.82ms  1929.8 MB/sec
string_non_null/default                            1.00   136.0±12.90ms     3.8 GB/sec    1.03   140.7±11.97ms     3.6 GB/sec
string_non_null/parquet_2                          1.05    139.9±8.56ms     3.7 GB/sec    1.00    132.6±3.61ms     3.9 GB/sec
string_non_null/zstd                               1.01   569.0±10.86ms   920.8 MB/sec    1.00    563.8±8.86ms   929.3 MB/sec
string_non_null/zstd_parquet_2                     1.00    521.4±6.97ms  1005.0 MB/sec    1.01   525.6±12.65ms   996.9 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.01ms     6.2 GB/sec    1.67      4.2±0.11ms     3.7 GB/sec
struct_all_null/cdc                                1.00     10.0±0.13ms  1619.0 MB/sec    1.08     10.7±0.17ms  1500.8 MB/sec
struct_all_null/default                            1.00      2.2±0.00ms     7.0 GB/sec    1.74      3.9±0.06ms     4.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.01ms     7.0 GB/sec    1.73      3.9±0.05ms     4.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.74      4.0±0.10ms     3.9 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.73      3.9±0.06ms     4.0 GB/sec
struct_non_null/bloom_filter                       1.00     46.2±0.18ms   346.7 MB/sec    1.10     50.7±0.18ms   315.4 MB/sec
struct_non_null/cdc                                1.00     45.4±0.16ms   352.7 MB/sec    1.03     46.7±0.13ms   342.3 MB/sec
struct_non_null/default                            1.00     31.9±0.09ms   502.2 MB/sec    1.07     34.2±0.12ms   467.8 MB/sec
struct_non_null/parquet_2                          1.00     40.6±0.11ms   393.8 MB/sec    1.05     42.7±0.11ms   375.0 MB/sec
struct_non_null/zstd                               1.00     40.7±0.25ms   392.9 MB/sec    1.06     43.0±0.09ms   372.3 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.7±0.14ms   292.6 MB/sec    1.03     56.3±0.14ms   284.2 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.4±0.04ms     2.1 GB/sec    1.00      7.4±0.06ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.03     14.4±0.10ms  1119.4 MB/sec    1.00     14.0±0.24ms  1154.2 MB/sec
struct_sparse_99pct_null/default                   1.00      6.9±0.02ms     2.3 GB/sec    1.00      6.9±0.03ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.01      6.9±0.02ms     2.3 GB/sec    1.00      6.9±0.03ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.01      8.3±0.03ms  1948.0 MB/sec    1.00      8.2±0.04ms  1960.0 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.7±0.03ms     2.0 GB/sec    1.00      7.6±0.04ms     2.1 GB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 1920.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1863.5s
CPU sys 53.7s
Peak spill 0 B

branch

Metric Value
Wall time 1940.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1888.9s
CPU sys 46.6s
Peak spill 0 B

@alamb
Contributor

alamb commented May 7, 2026

I think the benchmarks are looking good now -- maybe there is some small slowdown but it also looks like it is within the noise threshold.
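
One rough way to sanity-check a "within the noise" reading of the tables above is to compare the gap between two means against the reported ± figures. The sketch below is only an illustration: `Sample`, `within_noise`, `slowdown`, and the factor `k = 2.0` are hypothetical names and choices, not something criterion or the benchmark runner computes, and the inputs are copied from two rows of the second run.

```rust
/// One measurement as printed in the tables above: mean and its ± figure, in ms.
#[derive(Clone, Copy, Debug)]
struct Sample {
    mean_ms: f64,
    spread_ms: f64,
}

/// Hypothetical heuristic (an assumption, not criterion's statistical test):
/// two samples count as indistinguishable when their means differ by no more
/// than `k` times the sum of their reported ± figures.
fn within_noise(a: Sample, b: Sample, k: f64) -> bool {
    (a.mean_ms - b.mean_ms).abs() <= k * (a.spread_ms + b.spread_ms)
}

/// Ratio of branch time to base time: above 1.0 the branch is slower,
/// below 1.0 it is faster.
fn slowdown(branch: Sample, base: Sample) -> f64 {
    branch.mean_ms / base.mean_ms
}

fn main() {
    // string_non_null/zstd: compact_lvl_repr 569.0±10.86ms, main 563.8±8.86ms.
    let branch = Sample { mean_ms: 569.0, spread_ms: 10.86 };
    let base = Sample { mean_ms: 563.8, spread_ms: 8.86 };
    println!(
        "string_non_null/zstd: ratio {:.3}, within noise: {}",
        slowdown(branch, base),
        within_noise(branch, base, 2.0)
    );

    // struct_all_null/default: compact_lvl_repr 2.2±0.00ms, main 3.9±0.06ms.
    let branch = Sample { mean_ms: 2.2, spread_ms: 0.00 };
    let base = Sample { mean_ms: 3.9, spread_ms: 0.06 };
    println!(
        "struct_all_null/default: ratio {:.3}, within noise: {}",
        slowdown(branch, base),
        within_noise(branch, base, 2.0)
    );
}
```

With these inputs the first row falls inside the heuristic's noise band while the second does not, which matches the table's picture of a clear win on `struct_all_null` and small, ambiguous differences elsewhere. The table values are rounded, so ratios recomputed this way can differ slightly from the printed ratio column.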

@alamb alamb merged commit 13f5f94 into apache:main May 7, 2026
16 checks passed
@alamb
Contributor

alamb commented May 7, 2026

Thank you for your diligence and patience @HippoBaro 🙏 (and @etseidl and @Dandandan)

Labels

parquet Changes to the parquet crate performance