feat(ipc): Avoid zero-filling IPC reads with typed buffer handling#9971
feat(ipc): Avoid zero-filling IPC reads with typed buffer handling#9971pchintar wants to merge 1 commit into
Conversation
b59bbe2 to
e8845d0
Compare
a24d808 to
292cb21
Compare
292cb21 to
cabe3f2
Compare
|
run benchmark ipc_reader |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing ipc-typed-buffer-reads (cabe3f2) to 1ffd202 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
alamb
left a comment
There was a problem hiding this comment.
Thanks @pchintar -- this is getting closer but I think this still may cause issues.
I think any time we allocate (or reallcoate) a Vec<u8> we lose the alignment guarantee's. There are still several places in this PR that do Vec::new(...)
Poking around I found a way to to get an aligned vector that maybe we can try here
So instead of this code
let mut buf = MutableBuffer::from_len_zeroed(total_len);
reader.read_exact(&mut buf)?;
Ok(buf.into())We could do something like
let mut buf = aligned_vec(total_len);
reader.read_exact(&mut buf)?;
Ok(buf.into())And as long as aligned_vec was aligned to (1<<6) (64) I think it would be equivalent to the current mutable buffer. Maybe we should align to 128 to guarantee views work 🤔
| /// reads. | ||
| enum IpcBufferSource<'a> { | ||
| Buffer(&'a Buffer), | ||
| } |
There was a problem hiding this comment.
Stylistically if you want to make a wrapper type I think a more common pattern is a an "new type" struct -- something like
struct IpcBufferSource<'a>(&'a Buffer)That way you can still define methods on it, but you don't have to use match all over the place - you can just refer to the inner field as .0
| compression: Option<CompressionCodec>, | ||
| decompression_context: &mut DecompressionContext, | ||
| ) -> Result<Buffer, ArrowError> { | ||
| let byte_len = len |
There was a problem hiding this comment.
I found the naming a little confusing here as there are three buffers
I found this a little confusing as there three buffers -- self::Buffer, buf (the input buffer) and the output buffer
Can we maybe name them something more specific to distinguish them? I am in particular confused about the buffer in self and the one passed in via the argument
| ArrowError::IpcError("Buffer count mismatched with metadata".to_string()) | ||
| })?; | ||
|
|
||
| self.data.read_typed_buffer::<T>( |
There was a problem hiding this comment.
this part looks good, so far
| // Some invalid or legacy IPC inputs may contain shorter buffers than | ||
| // implied by the schema. Preserve the existing behavior and let array | ||
| // construction/validation report the error. | ||
| if buffer.len() <= byte_len || buffer.as_ptr().align_offset(std::mem::align_of::<T>()) != 0 |
There was a problem hiding this comment.
This doens't seem right -- our change should ensure that the buffer is always aligned correctly to the otutput requirements. The fact you have to check afterwards suggests this is not the case
| buf.len() | ||
| ))); | ||
| } | ||
| Ok(Buffer::from_vec(buf)) |
There was a problem hiding this comment.
I think this Vec and the one below (still) has no guaranteed alignment.
Which issue does this PR close?
Rationale for this change
This PR is a follow-up to the alignment concerns raised in #9778 when using
Vec<u8>for IPC body reads to replace the currentMutableBuffer::from_len_zeroedin IPC Reader.My earlier approach showed that reading directly into
Vec<u8>could substantially reduce redundant zero-filling in IPC reader paths, but some decode paths still relied on fixed-width typed buffers that could require additional alignment handling cost later during array construction.This PR keeps the
Vec<u8>-based read path for IPC message and block bodies, while adding typed IPC buffer handling for fixed-width physical buffers before array construction.This preserves the existing alignment behavior for those fixed-width decode paths while avoiding the additional alignment handling/copying costs that could otherwise occur later during array construction.
The typed-buffer handling now covers:
These paths now read their physical buffers through
next_typed_buffer::<T>()so the expected physical buffer lengths are derived from the native value type before array construction.Exception Case:
UnionArrayrequiresScalarBufferinputs directly, and external IPC producers may provide unaligned union type id or offset buffers. The updated Union handling therefore preserves the aligned fast path while falling back to alignedScalarBufferconstruction only for unaligned external union buffers.Container types such as
Struct,FixedSizeList,RunEndEncoded, and similar nested/container arrays were intentionally left on their existing decode paths because they do not directly own fixed-width value buffers at that level. Their child arrays continue to decode recursively through the updated typed-buffer paths where applicable.Are these changes tested?
Yes.
The existing IPC reader test suite was run with:
cargo test -p arrow-ipc --libIPC reader benchmark was also run with:
The non-compressed, non-mmap IPC reader paths showed consistent improvements locally. Compressed and mmap-heavy paths were mostly neutral, as expected.
Are there any user-facing changes?
No.