Make mask types opaque by Shnatsel · Pull Request #218 · linebender/fearless_simd

Shnatsel · 2026-05-18T20:40:01Z

This is necessary for eventual AVX-512 support. Part of #179.

This does not add any AVX-512 stuff yet, just lays the groundwork by abstracting away the internal representation.

Contrary to what the description of #196 said, we don't actually need i64 vectors in the public API so long as we're not providing direct conversions between mask types and integer vector types, which aren't available even on main.

Summary of changes:

Add SimdMask<S> trait independent from SimdBase<S>.
Remove integer-vector-style APIs from masks:
- Deref
- Indexing
- Bytes
- public SimdSplit / SimdCombine
- public slide / slide_within_blocks
- public byte conversions
Remove scalar bit-op overloads for masks, so masks support mask-to-mask & | ^ ! but not mask & -1.

This was extricated from a larger changeset I had locally that also added some std::simd API compatibility functions, but that was getting too complicated to review. I'm happy to add more APIs if there's desire and review capacity for them.

A port of vello to this API can be found here.

Shnatsel · 2026-05-18T21:55:08Z

The test failure from a completely unrelated part is spooky. I checked it for UB under miri with various CPU feature combinations, but it passes there, as well as on real hardware (locally and in CI) and on Intel's emulator on CI.

I'm pinning it on a Rosetta bug, since real hardwware and 2 other emulators all pass.

When running x86 code on the macos-latest image, it runs through Rosetta. We need to switch over to macos-26-intel runner for x86 macos to get real x86 hardware.

Shnatsel · 2026-05-19T00:55:08Z

Porting vello was trivial: https://github.com/linebender/vello/compare/main...Shnatsel:opaque-masks?expand=1

So this is ready for review.

LaurenzV

Haven't finished looking through it, just some first comments.

LaurenzV · 2026-05-19T18:43:50Z

    fn as_array_mask8x16(self, a: mask8x16<Self>) -> [i8; 16usize] {
        unsafe { core::mem::transmute::<__m128i, [i8; 16usize]>(a.val.0) }
    }


Why can we still keep this method? Wouldn't this also expect that the inner representation is backed by actual integers?

And similarly, how would load_array_mask8x16 be implemented for AVX512 if it takes the array as an integer array?

We can't keep Deref into an integer slice/array, but explicit conversion into an owned instance is perfectly fine and still useful. So conversions to/from the integer array representation are left intact in the API on purpose. They're the cheapest way to create a mask right now, and are decently cheap even on AVX-512 (a single instruction).

LaurenzV · 2026-05-19T19:01:59Z

@@ -392,36 +396,12 @@ pub trait Simd:
    fn widen_u8x16(self, a: u8x16<Self>) -> u16x16<Self>;
    #[doc = "Reinterpret the bits of this vector as a vector of `u32` elements.\n\nThe total bit width is preserved; the number of elements changes accordingly."]
    fn reinterpret_u32_u8x16(self, a: u8x16<Self>) -> u32x4<Self>;
-    #[doc = "Create a SIMD vector with all elements set to the given value."]
+    #[doc = "Create a SIMD mask with all lanes set from the given signed integer mask value."]
    fn splat_mask8x16(self, val: i8) -> mask8x16<Self>;


How would splat_mask be implemented for AVX-512 since the representation of a single entry is always just a single bit?

I guess more fundamentally, the question is how we fundamentally want to allow construction of masks while preventing hidden footguns due to the exact internals of the SIMD level. For example, I imagine a from_bitmask method would be fast for AVX-512 since its the native representation but slow on NEON since you need to expand the bitmask manually, while a from_array, as it is now, is the best for NEON but would require manually encoding the mask into a single bitmask. I'm not sure what the best thing to do is here, but probably we can take inspiration from portable SIMD?

It would check if the value is 0, and treat any other value as 1.

I've actually started this work by taking inspiration from std::simd, and I have a local branch that is a much more complete mirror of std::simd APIs. This PR is a stripped down, MVP version of it.

std::simd only implements construction of masks from bits represented as a u64 and from arrays of booleans; not from [i32; 4]. Their trade-off is greater portability but worse performance. I implemented that at first, then rolled back the removal of integer-based APIs because the assembly for bool arrays looked gnarly.

Now that you point it out, I agree it's probably better to mirror std::simd and make splat operate on booleans. I guess writing == 0 is not too much to ask of the users. I'll make that change to better align with std::simd in this case.

I've experimented with this and looked at the assembly; it does add a neg instruction in the hot path for non-constant inputs on x86, so it's technically less efficient, but that's probably worth it for the nicer API.

…g std::simd API. Adds a `neg` on the hot path on x86 for non-constant inputs, but seems cheap enough and worth it for the nicer API.

… actual integer values involved, now that we have boolean APIs and the conversions are relevant

…re not clashing with std::simd API of the same name

…d again

Shnatsel · 2026-05-20T16:11:42Z

I'd like to add to_bitmask()/from_bitmask() as well as test and set to match std:simd, but I'll do it in a follow-up PR to avoid complicating review of this one.

LaurenzV · 2026-05-22T05:59:32Z

Will try to review the rest after RustWeek!

LaurenzV

LGTM, I haven't tried to understand the new codegen logic very thoroughly, but the new generated code seems good to me and I agree that making the mask representation opaque is the right move towards getting AVX-512 working. It's a bit unfortunate that there is now more code duplication in the codegen code, but not a big deal.

LaurenzV · 2026-05-23T09:45:43Z

-                "A SIMD mask of {len} {scalar_bits}-bit elements.\n\n\
-                When created from a comparison operation, and as it should be used in a [`Self::select`] operation, each element will be all ones if it's \"true\", and all zeroes if it's \"false\".",
+                "A SIMD mask of {len} logical lanes corresponding to {scalar_bits}-bit vector elements.\n\n\
+                The storage representation of this type is intentionally opaque. For compatibility with existing APIs, it may be converted to and from signed integer lanes where false is encoded as all zeroes (integer value 0) and true is encoded as all ones (integer value -1).",


Maybe we should add a note that the conversion to/from signed integers might not be zero-cost for all levels though?

I'll describe the efficiency in a follow-up where I add other conversion paths

LaurenzV · 2026-05-23T09:46:45Z

-        "Behavior on mask elements that are not all zeroes or all ones is unspecified. It may vary depending on architecture, feature level, the mask elements' width, the mask vector's width, or library version.\n\n\
-        The behavior is also not guaranteed to be logically consistent if mask elements are not all zeroes or all ones. `any_true` may not return the same result as `!all_false`, and `all_true` may not return the same result as `!any_false`.\n\n\
-        The [`select`](crate::Select::select) operation also has unspecified behavior for mask elements that are not all zeroes or all ones. That behavior may not match the behavior of this operation."
+        "Masks may be converted to and from signed integer lane arrays for compatibility with older APIs. For those conversions, false is encoded as all zeroes (integer value 0) and true is encoded as all ones (integer value -1).\n\n\


LaurenzV · 2026-05-23T09:52:16Z

It seems like you might have to rebase or open a new PR to make CI work correctly again.

`macos-latest` is ARM, and runs x86 binaries through rosetta. We need to specify the Intel runner explicitly to get x86 hardware. Motivation: linebender#218 (comment) Our test suite is now so thorough that it caught a Rosetta bug :sunglasses:

Shnatsel · 2026-05-23T10:50:35Z

I've reduced the duplication in the generation of splat().

There is more refactoring we could do, but I haven't found the right cut points for shared logic yet. The conditions end up being more awkward and hard to follow when they're not defined at the leaf.

Make mask types opaque

8dead9c

Shnatsel force-pushed the opaque-mask-representation-minimal branch from daee6e0 to 8dead9c Compare May 18, 2026 21:00

Shnatsel mentioned this pull request May 18, 2026

Run x86 apple job on x86 hardware #219

Merged

Shnatsel marked this pull request as ready for review May 19, 2026 00:54

LaurenzV self-requested a review May 19, 2026 05:36

LaurenzV reviewed May 19, 2026

View reviewed changes

Shnatsel added 6 commits May 19, 2026 23:28

Merge branch 'main' into opaque-mask-representation-minimal

505a4f3

Convert mask splat() to accept booleans instead of integers, mirrorin…

55c4db5

…g std::simd API. Adds a `neg` on the hot path on x86 for non-constant inputs, but seems cheap enough and worth it for the nicer API.

Doc comments: instead of just 'all zeroes' or 'all ones', specify the…

ce79eec

… actual integer values involved, now that we have boolean APIs and the conversions are relevant

Expose the high-level splat() level in masks once again, now that we'…

e2a1d52

…re not clashing with std::simd API of the same name

drop now-unnecessary explicit conversion; fixed build and test

ebc8f17

Drop SimdFrom generic bound now that we expose a public splat() metho…

5981ef3

…d again

LaurenzV approved these changes May 23, 2026

View reviewed changes

Merge branch 'main' into opaque-mask-representation-minimal

fcbe60c

Reduce code duplication in Mask.splat()

dcca45b

Shnatsel added this pull request to the merge queue May 23, 2026

Merged via the queue into linebender:main with commit a1f5cd2 May 23, 2026
22 checks passed

Shnatsel deleted the opaque-mask-representation-minimal branch May 23, 2026 10:51

Shnatsel mentioned this pull request May 23, 2026

Add more mask methods mirroring std::simd API #226

Merged

Conversation

Shnatsel commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Shnatsel commented May 18, 2026

Uh oh!

Shnatsel commented May 19, 2026

Uh oh!

LaurenzV left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Shnatsel May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Shnatsel commented May 20, 2026

Uh oh!

LaurenzV commented May 22, 2026

Uh oh!

LaurenzV left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LaurenzV commented May 23, 2026

Uh oh!

Shnatsel commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Shnatsel commented May 18, 2026 •

edited

Loading

Shnatsel May 19, 2026 •

edited

Loading