sha2: improve RISC-V Zknh backends #617
unsafe {
    asm!(
        "ld {left}, 0({bp})",
        "srl {left}, {left}, {off1}",
This asm! block (and several others like it) reads bits outside of the block (e.g. here we read off1 bits before the block). In my understanding, this is not UB. All loaded bits are guaranteed to reside on the same page as at least one byte of the block, so page faults should be impossible on such edge loads. The garbage bits are eliminated by the shift, which is intentionally part of the asm block, so outside of it we can observe only bits that belong to the block.
UPD: IRLO discussion
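To make the shift trick concrete, here is a minimal, portable sketch that simulates it on ordinary integers instead of inline assembly. The helper name `load_shifted` and the idea of modelling memory as a slice of already-loaded aligned words are assumptions for illustration, not code from this PR. It only shows that two aligned loads plus shifts reproduce the misaligned little-endian word and that the bits lying before the requested offset never reach the result.

```rust
/// Hypothetical helper, not taken from the PR.
///
/// `words` models memory as aligned 64-bit little-endian loads.
/// Returns the 64-bit little-endian word that starts `off` bytes
/// into `words[idx]`, using only aligned words and shifts.
fn load_shifted(words: &[u64], idx: usize, off: usize) -> u64 {
    assert!(off < 8, "offset must stay inside one 8-byte word");
    if off == 0 {
        // Already aligned: a single load is enough, and a 64-bit
        // shift by 64 would overflow, so handle this case separately.
        return words[idx];
    }
    let shift = 8 * off as u32;
    // Drop the `off` garbage bytes that precede the requested data.
    let lo = words[idx] >> shift;
    // Bring in the missing high bytes from the next aligned word.
    let hi = words[idx + 1] << (64 - shift);
    lo | hi
}
```

Note that SHA-2 consumes big-endian words, so a real backend would still need a byte swap after such a load; that step is left out here.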
llvm/llvm-project#150263 led me to this PR (the link there wasn't a permalink, so it took some effort to track the PR down).
Would it be worth adding a separate case and simplifying block loads when the +unaligned-scalar-mem feature is specified explicitly?
I have a virtual target that supports this, and LLVM already produces A LOT more compact code when this feature is enabled, so not needing extra alignment checks and having even less code should be even more beneficial. I do use Zknh support in the sha2 crate specifically.
I plan to completely remove the unaligned load hack in a future release and blame LLVM and RISC-V spec for terrible codegen with default compilation options.
Oh, does that also mean
I wouldn't say "by accident", but yes.
Annoyingly, RISC-V is really inconvenient when we have to deal with misaligned loads/stores. LLVM by default generates very inefficient code which loads every byte separately and combines them into a 32/64 bit integer. The ld instruction "may" support misaligned loads and for Linux user-space it's even guaranteed, but it can be (and IIUC often in practice is) "extremely slow", so we should not rely on it while writing performant code.

After asking around, it looks like this mess is here to stay, so we have no choice but to work around it. To do that this PR introduces two separate paths for loading block data: aligned and misaligned. The aligned path should be the most common one. In the misaligned path we have to rely on inline assembly since we have to load some bits outside of the block.
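As a rough illustration of the two-path idea, the sketch below dispatches on the alignment of a SHA-512 block. The function name `load_block` and the use of `align_to` are assumptions made for this example. The misaligned branch here is the portable byte-wise fallback (the pattern LLVM compiles into per-byte loads on RISC-V), not the inline-assembly path from this PR.

```rust
/// Hypothetical helper, not the PR's implementation: loads a 128-byte
/// SHA-512 block as sixteen big-endian 64-bit words.
fn load_block(block: &[u8]) -> [u64; 16] {
    assert_eq!(block.len(), 128, "SHA-512 blocks are 128 bytes");
    let mut out = [0u64; 16];

    // SAFETY: every bit pattern is a valid u64, so viewing suitably
    // aligned bytes as u64 values is sound.
    let (prefix, words, suffix) = unsafe { block.align_to::<u64>() };

    if prefix.is_empty() && suffix.is_empty() {
        // Aligned path: whole-word loads followed by an endianness fix.
        for (o, w) in out.iter_mut().zip(words) {
            *o = u64::from_be(*w);
        }
    } else {
        // Misaligned path (portable stand-in): assemble each word
        // from its individual bytes.
        for (o, chunk) in out.iter_mut().zip(block.chunks_exact(8)) {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(chunk);
            *o = u64::from_be_bytes(bytes);
        }
    }
    out
}
```

Both branches produce the same words; only the cost of getting them differs, which is why keeping the common aligned case on the cheap branch matters.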
Additionally, this PR makes inlining in the riscv-zknh backend less aggressive, which makes generated binary code 3-4 times smaller at the cost of one additional branch.

Generated assembly for RV64: