Add 64-bit integer vectors and operations on them by Shnatsel · Pull Request #253 · linebender/fearless_simd

Shnatsel · 2026-06-23T13:03:12Z

Stacked on top of #231 because many 64-bit ops (e.g. min/max) were only added in AVX-512

Supersedes #97

Shnatsel · 2026-06-23T14:12:09Z

The documentation for load/store_interleaved_128 was misleading. Both formulations (the documented one and the NEON vld4/vst4 one) are valid for 32-bit elements, but the 8- and 16-bit elements already behaved differently: they followed the NEON vld4/vst4 semantics rather than our documented semantics. This misled me into generalizing the op to 64-bit elements incorrectly.

I've changed the implementation back to vld4/vst4 semantics in subsequent commits and updated the documentation.
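
For readers unfamiliar with the vld4/vst4 layout, here is a minimal scalar sketch of what it implies for 64-bit elements. The function names and the plain-array representation are hypothetical stand-ins, not the crate's actual API; the sketch only assumes that a 128-bit vector holds two u64 lanes and that four such vectors are loaded or stored at once.

```rust
/// Hypothetical scalar model of a vld4-style de-interleaving load for u64.
/// `src` is laid out as [a0, b0, c0, d0, a1, b1, c1, d1]; the result is
/// four 2-lane "vectors": [a0, a1], [b0, b1], [c0, c1], [d0, d1].
fn load_interleaved_u64(src: &[u64; 8]) -> [[u64; 2]; 4] {
    let mut out = [[0u64; 2]; 4];
    for lane in 0..2 {
        for stream in 0..4 {
            // Element `lane` of stream `stream` sits at stride 4 in memory.
            out[stream][lane] = src[4 * lane + stream];
        }
    }
    out
}

/// Hypothetical scalar model of the matching vst4-style interleaving store.
fn store_interleaved_u64(vecs: &[[u64; 2]; 4]) -> [u64; 8] {
    let mut dst = [0u64; 8];
    for lane in 0..2 {
        for stream in 0..4 {
            dst[4 * lane + stream] = vecs[stream][lane];
        }
    }
    dst
}
```

Under this model the store is the exact inverse of the load, so a load followed by a store reproduces the original memory contents.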

Shnatsel · 2026-06-28T15:39:50Z

+pub(crate) fn unrolled_array(len: usize, item: impl FnMut(usize) -> TokenStream) -> TokenStream {
+    let items = (0..len).map(item).collect::<Vec<_>>();
+    quote! { [#(#items),*] }
+}
+
+pub(crate) fn scalar_binary(f: TokenStream, vec_ty: &VecType, simd: impl ToTokens) -> TokenStream {
+    let scalar = vec_ty.scalar.rust(vec_ty.scalar_bits);
+    let len = vec_ty.len;
+    let items = unrolled_array(len, |idx| quote! { #f(a[#idx], b[#idx]) });
+
+    quote! {
+        let a: [#scalar; #len] = a.into();
+        let b: [#scalar; #len] = b.into();
+        let result: [#scalar; #len] = #items;
+        result.simd_into(#simd)
+    }
+}
+
+pub(crate) fn scalar_binary_method(
+    method: &str,
+    vec_ty: &VecType,
+    simd: impl ToTokens,
+) -> TokenStream {
+    let method = Ident::new(method, Span::call_site());
+    let scalar = vec_ty.scalar.rust(vec_ty.scalar_bits);
+    let len = vec_ty.len;
+    let items = unrolled_array(len, |idx| quote! { a[#idx].#method(b[#idx]) });
+
+    quote! {
+        let a: [#scalar; #len] = a.into();
+        let b: [#scalar; #len] = b.into();
+        let result: [#scalar; #len] = #items;
+        result.simd_into(#simd)
+    }
+}
+
+pub(crate) fn scalar_shift(f: TokenStream, vec_ty: &VecType, simd: impl ToTokens) -> TokenStream {
+    let scalar = vec_ty.scalar.rust(vec_ty.scalar_bits);
+    let len = vec_ty.len;
+    let items = unrolled_array(len, |idx| quote! { #f(a[#idx], shift) });
+
+    quote! {
+        let a: [#scalar; #len] = a.into();
+        let result: [#scalar; #len] = #items;
+        result.simd_into(#simd)
+    }
+}
+
+pub(crate) fn scalar_compare(method: &str, vec_ty: &VecType, simd: impl ToTokens) -> TokenStream {
+    let scalar = vec_ty.scalar.rust(vec_ty.scalar_bits);
+    let mask_scalar = ScalarType::Mask.rust(vec_ty.scalar_bits);
+    let len = vec_ty.len;
+    let op = match method {
+        "simd_eq" => quote! { == },
+        "simd_lt" => quote! { < },
+        "simd_le" => quote! { <= },
+        "simd_ge" => quote! { >= },
+        "simd_gt" => quote! { > },
+        _ => unreachable!("unsupported scalar comparison: {method}"),
+    };
+    let items = unrolled_array(len, |idx| {
+        quote! { if a[#idx] #op b[#idx] { true_lane } else { false_lane } }
+    });
+
+    quote! {
+        let a: [#scalar; #len] = a.into();
+        let b: [#scalar; #len] = b.into();
+        let true_lane: #mask_scalar = !0;
+        let false_lane: #mask_scalar = 0;
+        let result: [#mask_scalar; #len] = #items;
+        result.simd_into(#simd)


This basically duplicates the scalar fallback code, but I didn't want to do a big refactoring here that would change the scalar fallback level.

That refactoring might still be worth it, considering that #256 also needs it.
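
To make the helpers in the diff above easier to picture, here is a hand-written sketch of the shape of code they would emit for a 2-lane u64 vector. This is an illustration under assumptions, not actual generator output: plain arrays stand in for the real vector types (so the `.into()` and `simd_into(...)` conversions are omitted), the function names are hypothetical, and the mask lane type is assumed to be `i64`.

```rust
/// Shape of `scalar_binary` output, assuming `f` is `u64::min`.
fn min_u64x2(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [u64::min(a[0], b[0]), u64::min(a[1], b[1])]
}

/// Shape of `scalar_binary_method` output, assuming `method` is "wrapping_add".
fn add_u64x2(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [a[0].wrapping_add(b[0]), a[1].wrapping_add(b[1])]
}

/// Shape of `scalar_shift` output, assuming `f` is `u64::wrapping_shr`.
/// The same `shift` amount is applied to every lane.
fn shr_u64x2(a: [u64; 2], shift: u32) -> [u64; 2] {
    [u64::wrapping_shr(a[0], shift), u64::wrapping_shr(a[1], shift)]
}

/// Shape of `scalar_compare` output for "simd_lt".
/// A true lane has all bits set, a false lane has none.
fn simd_lt_u64x2(a: [u64; 2], b: [u64; 2]) -> [i64; 2] {
    let true_lane: i64 = !0;
    let false_lane: i64 = 0;
    [
        if a[0] < b[0] { true_lane } else { false_lane },
        if a[1] < b[1] { true_lane } else { false_lane },
    ]
}
```

Each lane is spelled out explicitly rather than looped over, which mirrors what `unrolled_array` does at code-generation time.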

Add i64/u64 vector types and operations across the generated SIMD backends, with focused int64 coverage and optimized interleaved load/store paths where available.

LaurenzV · 2026-06-30T17:45:41Z

I'm curious, do you think this will overall impact the compile time for the crate a lot, even if none of the 64-bit stuff is used? Have you done any measurements?

Shnatsel · 2026-06-30T18:00:00Z

It really shouldn't. This is all generic code, so it is not actually instantiated and doesn't turn into monomorphized MIR or LLVM IR until something actually calls it.

The downside of generics is that if we call the same function 5 times, we can get 5 different instantiations of it, so 5x the IR for LLVM to chew through. But in our case we want all the intrinsics inlined anyway, so this is unavoidable, generics or not.
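
One way to observe that generic code is only instantiated on use is a compile-time assertion that depends on the type parameter. The sketch below is a generic Rust illustration, not code from this crate; `AssertEightBytes` and `lane_bytes` are hypothetical names.

```rust
use core::marker::PhantomData;
use core::mem::size_of;

/// Hypothetical helper carrying a per-type compile-time check.
struct AssertEightBytes<T>(PhantomData<T>);

impl<T> AssertEightBytes<T> {
    // Evaluated only for concrete `T`s for which this constant is actually used.
    const OK: () = assert!(size_of::<T>() == 8, "lane type must be 64 bits wide");
}

/// Returns the lane size in bytes, forcing the check for the chosen `T`.
fn lane_bytes<T>() -> usize {
    let () = AssertEightBytes::<T>::OK;
    size_of::<T>()
}
```

As long as nothing calls `lane_bytes::<u8>()`, the failing assertion for `u8` is never evaluated and the program builds; adding such a call would turn it into a compile error at instantiation time.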

Add 64-bit integer vector support

34c0f3c

Shnatsel force-pushed the 64-bit-ints branch from d5fae13 to 34c0f3c · June 28, 2026 21:25

Shnatsel wants to merge 1 commit into linebender:main from Shnatsel:64-bit-ints
