Add SVE2 BitPerm intrinsics (BDEP, BEXT, BGRP) #2069
ADunfield wants to merge 3 commits into rust-lang:main
Add Rust bindings for the three SVE2 bit-permutation instructions
(FEAT_SVE_BitPerm): BDEP (bit deposit), BEXT (bit extract), and
BGRP (bit group). BDEP and BEXT are the scalable-vector equivalents of
x86 BMI2 PDEP/PEXT; BGRP has no x86 equivalent.
Intrinsics added (12 total):
- svbdep_u{8,16,32,64} — scatter source bits to mask positions
- svbext_u{8,16,32,64} — gather bits from mask positions
- svbgrp_u{8,16,32,64} — partition bits into two groups by mask
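For readers coming from x86, a scalar model of what one 64-bit lane computes may help. The sketch below follows the descriptions above (for BGRP: selected bits packed low, unselected bits packed high, order preserved). The names bdep_u64, bext_u64 and bgrp_u64 are hypothetical helpers, not the stdarch API.

```rust
/// Bit deposit (BDEP, cf. x86 PDEP): the low-order bits of `src` are placed,
/// in order, at the positions of the set bits of `mask`; all other bits are 0.
pub fn bdep_u64(src: u64, mask: u64) -> u64 {
    let mut result = 0u64;
    let mut m = mask;
    let mut k = 0u32; // index of the next source bit to consume
    while m != 0 {
        let lowest = m & m.wrapping_neg(); // isolate the lowest set mask bit
        if ((src >> k) & 1) == 1 {
            result |= lowest;
        }
        k += 1;
        m &= m - 1; // clear that mask bit
    }
    result
}

/// Bit extract (BEXT, cf. x86 PEXT): the bits of `src` at the set-bit
/// positions of `mask` are gathered, in order, into the low-order bits.
pub fn bext_u64(src: u64, mask: u64) -> u64 {
    let mut result = 0u64;
    let mut m = mask;
    let mut k = 0u32; // index of the next result bit to fill
    while m != 0 {
        let lowest = m & m.wrapping_neg();
        if (src & lowest) != 0 {
            result |= 1u64 << k;
        }
        k += 1;
        m &= m - 1;
    }
    result
}

/// Bit group (BGRP): bits selected by `mask` packed low, the remaining bits
/// packed directly above them, each group keeping its original order.
pub fn bgrp_u64(src: u64, mask: u64) -> u64 {
    let selected = bext_u64(src, mask);
    let unselected = bext_u64(src, !mask);
    // checked_shl covers mask == u64::MAX (a shift by 64), where
    // `unselected` is 0 anyway.
    selected | unselected.checked_shl(mask.count_ones()).unwrap_or(0)
}
```

For example, bdep_u64(0b1011, 0b1011_0100) yields 0b1001_0100, bext_u64 of that result with the same mask gives back 0b1011, and bgrp_u64(0xA5, 0xF0) yields 0x5A. The 8-, 16- and 32-bit variants apply the same rule within each narrower lane.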
All gated by #[target_feature(enable = "sve2-bitperm")] and the
unstable feature flag stdarch_aarch64_sve2_bitperm.
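Because the intrinsics carry #[target_feature(enable = "sve2-bitperm")], calling them is only sound on a CPU that actually has the feature. A minimal runtime check might look like the sketch below, assuming std's is_aarch64_feature_detected! macro with its "sve2-bitperm" feature name. has_sve2_bitperm and describe_path are hypothetical helpers, and the vector path itself is not shown.

```rust
/// Runtime check for SVE2 BitPerm on AArch64. A function compiled with
/// #[target_feature(enable = "sve2-bitperm")] must only be called when this
/// returns true; running it on a CPU without the feature is undefined behavior.
#[cfg(target_arch = "aarch64")]
pub fn has_sve2_bitperm() -> bool {
    std::arch::is_aarch64_feature_detected!("sve2-bitperm")
}

/// On every other architecture the feature can never be present.
#[cfg(not(target_arch = "aarch64"))]
pub fn has_sve2_bitperm() -> bool {
    false
}

/// Hypothetical dispatcher: names the path a caller would take. A real caller
/// would invoke an sve2-bitperm function in the first branch and a portable
/// scalar fallback in the second.
pub fn describe_path() -> &'static str {
    if has_sve2_bitperm() {
        "sve2-bitperm"
    } else {
        "scalar fallback"
    }
}
```

Keeping the check in a plain safe function lets the unsafe call to a target-feature function sit in exactly one branch.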
Also adds SVE scalable vector type definitions (svuint8_t through
svint64_t) using #[rustc_scalable_vector(N)] from rust#143924,
gated by stdarch_aarch64_sve.
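The PR binds each intrinsic to llvm.aarch64.sve.{bdep,bext,bgrp}.x.nxv{N}i{M}. SVE register widths are multiples of 128 bits, so an M-bit element type corresponds to the LLVM scalable vector <vscale x (128/M) x iM>, which is where the nxv{N}i{M} suffix comes from. The sketch below only enumerates the 12 Rust-name/LLVM-name pairs from that pattern; llvm_name and bindings are hypothetical helpers, and this is not how stdarch declares its bindings.

```rust
/// The three SVE2 BitPerm operations.
const OPS: [&str; 3] = ["bdep", "bext", "bgrp"];
/// Unsigned element widths: u8, u16, u32, u64.
const ELEMENT_BITS: [u32; 4] = [8, 16, 32, 64];

/// An M-bit element gives the scalable type <vscale x (128 / M) x iM>,
/// spelled nxv{N}i{M} in the intrinsic name.
fn llvm_name(op: &str, elem_bits: u32) -> String {
    let lanes = 128 / elem_bits;
    format!("llvm.aarch64.sve.{op}.x.nxv{lanes}i{elem_bits}")
}

/// Pairs each Rust intrinsic name (e.g. svbdep_u8) with its LLVM intrinsic.
fn bindings() -> Vec<(String, String)> {
    let mut out = Vec::new();
    for op in OPS {
        for bits in ELEMENT_BITS {
            out.push((format!("sv{op}_u{bits}"), llvm_name(op, bits)));
        }
    }
    out
}
```

For instance, svbdep_u64 maps to llvm.aarch64.sve.bdep.x.nxv2i64, matching the 64-bit .d lanes seen in the bdep z0.d, z0.d, z1.d assembly.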
Tested on AWS Graviton 4 (c8g), which has FEAT_SVE_BitPerm.
LLVM intrinsic names verified via Clang assembly output.
Scalar reference tests pass on both x86_64 and aarch64.
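One way to build scalar reference tests that run on both architectures is to check round-trip identities any correct BDEP/BEXT pair must satisfy, plus, on x86_64 CPUs with BMI2, agreement with hardware PDEP/PEXT. This is a sketch, not the PR's test suite: the helper names, the xorshift64 input generator and the iteration count are assumptions.

```rust
/// Naive bit-by-bit BDEP: bit k of `src` goes to the k-th set bit of `mask`.
fn bdep(src: u64, mask: u64) -> u64 {
    let (mut out, mut k) = (0u64, 0u32);
    for i in 0..64u32 {
        if ((mask >> i) & 1) == 1 {
            out |= ((src >> k) & 1) << i;
            k += 1;
        }
    }
    out
}

/// Naive bit-by-bit BEXT: the k-th set bit of `mask` selects result bit k.
fn bext(src: u64, mask: u64) -> u64 {
    let (mut out, mut k) = (0u64, 0u32);
    for i in 0..64u32 {
        if ((mask >> i) & 1) == 1 {
            out |= ((src >> i) & 1) << k;
            k += 1;
        }
    }
    out
}

/// Mask with the low `n` bits set (n may be 64).
fn low_bits(n: u32) -> u64 {
    if n >= 64 { u64::MAX } else { (1u64 << n) - 1 }
}

/// Round-trip identities every correct BDEP/BEXT pair must satisfy.
fn identities_hold(x: u64, m: u64) -> bool {
    bext(bdep(x, m), m) == x & low_bits(m.count_ones()) && bdep(bext(x, m), m) == x & m
}

/// On x86_64 with BMI2, compare against hardware PDEP/PEXT.
#[cfg(target_arch = "x86_64")]
fn matches_x86_bmi2(x: u64, m: u64) -> bool {
    if !std::arch::is_x86_feature_detected!("bmi2") {
        return true; // nothing to compare against on this CPU
    }
    // SAFETY: BMI2 support was confirmed at runtime just above.
    unsafe {
        std::arch::x86_64::_pdep_u64(x, m) == bdep(x, m)
            && std::arch::x86_64::_pext_u64(x, m) == bext(x, m)
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn matches_x86_bmi2(_x: u64, _m: u64) -> bool {
    true
}

/// Checks `iters` pseudo-random (x, mask) pairs and returns the number of
/// failures. `state` seeds xorshift64 and must be nonzero.
fn count_failures(mut state: u64, iters: usize) -> usize {
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    (0..iters)
        .filter(|_| {
            let (x, m) = (next(), next());
            !(identities_hold(x, m) && matches_x86_bmi2(x, m))
        })
        .count()
}
```

The bit-by-bit loops are deliberately naive so they stay easy to audit as a reference, independent of any faster implementation.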
r? @sayantn
rustbot has assigned @sayantn.
Does this even make sense yet? The Arm work in the main repo is still ongoing. We might also want to generate this from the YAML specification instead.
I need it for a project I am working on.
SVE scalable vector types use `#[rustc_scalable_vector]` which is only supported on aarch64/arm64ec. Gate the SVE module with a target_arch cfg to prevent compilation errors during doc builds on other targets. Add SVE type mappings to stdarch-verify so the intrinsic verification proc macro can parse functions using SVE types without panicking.
I appreciate the interest in upstreaming some SVE intrinsics, but I'd prefer we wait until these are added as part of #2071 (actively being worked on) - this adds a lot more SVE intrinsics and uses our intrinsic generator to do it, which should be more maintainable long-term. We're also still working to resolve some implementation issues in rustc related to scalable vectors as part of that work.
As per @davidtwco's comment, I'm going to close this in favor of #2071. We are very close to being able to land that PR and this one would only cause conflicts.
Summary
Add Rust bindings for the three SVE2 bit-permutation instructions (FEAT_SVE_BitPerm):
- BDEP (svbdep_u{8,16,32,64}), Bit Deposit: scatter consecutive low bits to mask-selected positions. SVE2 equivalent of x86 BMI2 PDEP.
- BEXT (svbext_u{8,16,32,64}), Bit Extract: gather bits from mask-selected positions into consecutive low bits. SVE2 equivalent of x86 BMI2 PEXT.
- BGRP (svbgrp_u{8,16,32,64}), Bit Group: partition bits into two groups by mask (selected bits packed low, unselected packed high). No x86 equivalent.
12 intrinsics total, all gated by #[target_feature(enable = "sve2-bitperm")].
Implementation
- SVE types (svuint8_t through svint64_t): defined using #[rustc_scalable_vector(N)] from rust#143924
- LLVM intrinsics: llvm.aarch64.sve.{bdep,bext,bgrp}.x.nxv{N}i{M}, verified correct via Clang assembly output on Graviton 4
- Feature gates: stdarch_aarch64_sve (types), stdarch_aarch64_sve2_bitperm (intrinsics)
- _n_ variants deferred until the base SVE1 intrinsics (svdup_n_u*) land
Testing
- … -emit-llvm on aarch64
- … (bdep z0.d, z0.d, z1.d, etc.) via Clang -O2 on AWS Graviton 4 (c8g, Neoverse V2)
- cargo check --target aarch64-unknown-linux-gnu passes on nightly 1.96.0
Dependencies
- rust#143924: #[rustc_scalable_vector(N)] (merged)
Hardware tested on
- /proc/cpuinfo confirms the svebitperm feature flag
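The same hardware check can be scripted. A minimal sketch, assuming a Linux host where the kernel lists SVE BitPerm support as svebitperm on the Features line of /proc/cpuinfo; it returns None where that file is absent. cpuinfo_has_svebitperm and host_has_svebitperm are hypothetical helpers.

```rust
use std::fs;

/// True if any "Features" line in the given cpuinfo text lists `svebitperm`
/// as a whole whitespace-separated token.
fn cpuinfo_has_svebitperm(cpuinfo: &str) -> bool {
    cpuinfo
        .lines()
        .filter(|line| line.starts_with("Features"))
        .any(|line| line.split_whitespace().any(|flag| flag == "svebitperm"))
}

/// Reads /proc/cpuinfo; returns None if the file cannot be read (e.g. non-Linux).
fn host_has_svebitperm() -> Option<bool> {
    fs::read_to_string("/proc/cpuinfo")
        .ok()
        .map(|text| cpuinfo_has_svebitperm(&text))
}
```

Matching whole tokens rather than substrings avoids false positives from any longer flag name that happens to contain svebitperm.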