Skip to content

sha2: improve RISC-V Zknh backends#617

Merged
newpavlov merged 10 commits into
masterfrom
sha2/riscv_unaligned
Aug 27, 2024
Merged

sha2: improve RISC-V Zknh backends#617
newpavlov merged 10 commits into
masterfrom
sha2/riscv_unaligned

Conversation

@newpavlov
Copy link
Copy Markdown
Member

@newpavlov newpavlov commented Aug 26, 2024

Annoyingly, RISC-V is really inconvenient when we have to deal with misaligned loads/stores. LLVM by default generates very inefficient code which loads every byte separately and combines them into a 32/64 bit integer. The ld instruction "may" support misaligned loads and for Linux user-space it's even guaranteed, but it can be (and IIUC often in practice is) "extremely slow", so we should not rely on it while writing performant code.

After asking around, it looks like this mess is here to stay, so we have no choice but to work around it. To do that this PR introduces two separate paths for loading block data: aligned and misaligned. The aligned path should be the most common one. In the misaligned path we have to rely on inline assembly since we have to load some bits outside of the block.

Additionally, this PR makes inlining in the riscv-zknh backend less aggressive, which makes generated binary code 3-4 times smaller at the cost of one additional branch.

Generated assembly for RV64:

Comment thread sha2/src/sha512/riscv_zknh_utils.rs Outdated
unsafe {
asm!(
"ld {left}, 0({bp})",
"srl {left}, {left}, {off1}",
Copy link
Copy Markdown
Member Author

@newpavlov newpavlov Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asm! block (and several others like it) read bits outside of the block (e.g. here we read off1 bits before block). In my understanding, this is not an UB. All bits are guaranteed to reside on the same page as at least one byte of the block, meaning that page faults should be impossible on such edge loads. The garbage bits get eliminated by the shift which is intentionally part of the assembly block, so outside of this asm block we can observe only bits which belong to block.

UPD: IRLO discussion

@newpavlov newpavlov requested a review from tarcieri August 26, 2024 17:13
@newpavlov newpavlov merged commit b2312fa into master Aug 27, 2024
@newpavlov newpavlov deleted the sha2/riscv_unaligned branch August 27, 2024 09:56
Copy link
Copy Markdown

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm/llvm-project#150263 led me to this PR (the link there wasn't permalink, so it took some effort to chase the PR).

Would it be worth adding a separate case and simplifying block loads when +unaligned-scalar-mem feature is specified explicitly?

I have a virtual target that supports this and LLVM already produces A LOT more compact code when this feature is enabled, so not needing to do extra alignment checks + even less code should be even more beneficial. I do use Zknh support in sha2 crate specifically.

@newpavlov
Copy link
Copy Markdown
Member Author

I plan to completely remove the unaligned load hack in a future release and blame LLVM and RISC-V spec for terrible codegen with default compilation options.

@nazar-pc
Copy link
Copy Markdown

Oh, does that also mean +unaligned-scalar-mem will naturally result in more efficient code in that case by accident? I see why it is unpleasant to maintain such hacks downstream.

@newpavlov
Copy link
Copy Markdown
Member Author

I wouldn't say "by accident", but yes.

@newpavlov
Copy link
Copy Markdown
Member Author

@nazar-pc
See #879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants