Pr 973 bench baseline by lerman25 · Pull Request #974 · RedisAI/VectorSimilarity

lerman25 · 2026-06-01T17:42:08Z

Describe the changes in the pull request

A clear and concise description of what the PR is solving.

Which issues this PR fixes

#...
MOD...

Main objects this PR modified

...
...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

Stacked on PR #970 (MOD-14954 x86 kernels). Mirrors x86 structure onto NEON_HP / SVE / SVE2 tiers. Zero CMake changes; reuses existing ARM TU compile flags. Scalar fallback already on main serves as reference. Bakes in PR #970 review lessons (assert(dim>=16), 4-accumulator ILP, formula anchor, load_unaligned<float> metadata, dispatcher-routed tier-walk tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
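For readers unfamiliar with the "4-accumulator ILP" lesson mentioned above, here is a minimal portable sketch of the idea. It is not the PR's kernel: `dot_four_accumulators` is a hypothetical name, plain `float` stands in for the SQ8 and FP16 operands, and the `assert(dim >= 16)` only mirrors the precondition named in the review lessons. Four independent partial sums mean consecutive adds do not wait on each other, at the cost of a summation order that differs from a single-accumulator loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): scalar dot product with four
// independent partial sums, so successive multiply-adds have no
// dependency on one another.
inline float dot_four_accumulators(const float *a, const float *b, std::size_t dim) {
    assert(dim >= 16); // precondition named in the review lessons
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    // Main loop: each accumulator owns one element of every group of four.
    for (; i + 4 <= dim; i += 4) {
        acc0 += a[i + 0] * b[i + 0];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    // Tail: at most three leftover elements.
    float tail = 0.0f;
    for (; i < dim; ++i) {
        tail += a[i] * b[i];
    }
    // Combine the partial sums pairwise at the end.
    return (acc0 + acc1) + (acc2 + acc3) + tail;
}
```

In a SIMD kernel each accumulator would be a vector register rather than a scalar, but the dependency structure is the same.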

14 bite-sized tasks following the spec at 2026-05-28-arm-sq8-fp16-design.md. Each task ends in a commit; assistant runs tests/ASan/benchmarks after the user confirms each ARM build cycle. Zero CMake changes; PR stacks on #970.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…OD-14972] The 9 ARM tier blocks (L2/IP/Cosine × SVE2/SVE/NEON_HP) were missing ASSERT_EQ(alignment, 0) after each ASSERT_NEAR, unlike the SQ8_FP32 sister blocks which assert it. Adds the assertions to lock the contract that ARM tiers leave the caller's alignment value untouched.

…D-14972] svcvt_f32_f16_x (FCVT) reads even-indexed FP16 elements: FP32[e] ← FP16[2e]. The step function loaded chunk consecutive FP16 values into positions 0..chunk-1, then passed them directly to svcvt_f32_f16_x, which picked positions 0,2,4,... and silently skipped positions 1,3,5,... For chunk=4 (128-bit SVE), only 2 of 4 FP16 values per step were used, producing wrong dot products. Fix: svzip1_f16(q_h, zeros) spreads values to even positions [v0,0,v1,0,...] so FCVT correctly reads v[0],v[1],v[2],... Applied to both the full step helper and the partial-chunk path. Discovered and fixed during ARM host verification (Task 14, MOD-14972).
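The lane mix-up described above can be reproduced without SVE hardware. The sketch below is a scalar model of a 128-bit vector (8 halfword lanes, 4 word lanes), not the PR's code. All function names are hypothetical, and `half_to_float` is a simplified software conversion that ignores inf/NaN. It compares a consecutive load, the zip-with-zeros layout, and a zero-extending load (the layout a zero-extending halfword load such as svld1uh_u32 would produce) under a conversion that reads only even halfword lanes.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kHalfLanes = 8; // 128-bit vector viewed as 16-bit lanes
constexpr int kWordLanes = 4; // the same vector viewed as 32-bit lanes

using HalfVec = std::array<std::uint16_t, kHalfLanes>;
using FloatVec = std::array<float, kWordLanes>;

// Simplified software FP16 -> FP32 for finite values (inf/NaN not handled).
inline float half_to_float(std::uint16_t h) {
    const int exp = (h >> 10) & 0x1F;
    const int man = h & 0x3FF;
    const float mag = (exp == 0)
                          ? std::ldexp(static_cast<float>(man), -24)
                          : std::ldexp(static_cast<float>(man + 1024), exp - 25);
    return (h & 0x8000) ? -mag : mag;
}

// Consecutive load: n values land in lanes 0..n-1, the rest stay zero.
inline HalfVec load_consecutive(const std::uint16_t *p, int n) {
    assert(n >= 0 && n <= kHalfLanes);
    HalfVec v{};
    for (int i = 0; i < n; ++i) v[i] = p[i];
    return v;
}

// Interleave the low half of v with zeros: [v0, 0, v1, 0, v2, 0, v3, 0].
inline HalfVec zip1_with_zeros(const HalfVec &v) {
    HalfVec out{};
    for (int i = 0; i < kHalfLanes / 2; ++i) out[2 * i] = v[i];
    return out;
}

// Zero-extending load: each halfword fills the low 16 bits of a 32-bit lane.
inline HalfVec load_zero_extended(const std::uint16_t *p, int n) {
    assert(n >= 0 && n <= kWordLanes);
    HalfVec v{};
    for (int e = 0; e < n; ++e) v[2 * e] = p[e];
    return v;
}

// Widening convert: word lane e reads the FP16 value in halfword lane 2e.
inline FloatVec convert_even_lanes(const HalfVec &v) {
    FloatVec out{};
    for (int e = 0; e < kWordLanes; ++e) out[e] = half_to_float(v[2 * e]);
    return out;
}

inline float lane_sum(const FloatVec &f) {
    float s = 0.0f;
    for (float x : f) s += x;
    return s;
}
```

With the FP16 inputs 1, 2, 3, 4, the consecutive load converts only lanes 0 and 2 (values 1 and 3), while both the zip-with-zeros and zero-extended layouts convert all four values.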

…D-14972]

SVE hot loop: replace svzip1_f16+svdup_f16+svwhilelt_b16 (4 ops) with svld1uh_u32 (1 op), which zero-extends each FP16 halfword into a 32-bit lane so svcvt_f32_f16_x reads the correct bits directly. Same fix applied to the partial-chunk path, which also drops the now-redundant pg16_partial predicate. Accumulator combine changed from svadd_f32_x to svadd_f32_z to match the SQ8_FP32 SVE sister.

NEON residual: replace the single 8-lane block + up-to-7 software-scalar iterations with three independent 4-lane sub-steps (r>=4, r>=8, r>=12), leaving at most 3 elements for scalar; this mirrors the SQ8_FP32 NEON sister exactly. Eliminates expensive vecsim_types::FP16_to_FP32 calls for residuals 4..15 (previously up to 7 software conversions per call).

Both IP headers: remove assert()+<cassert> (no sister kernel uses them).

Both L2 headers: drop redundant float16.h include and using declarations (they arrive transitively through the included IP header).
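The NEON residual change can be checked with a little arithmetic. The sketch below is an illustration, not the kernel itself: the struct and function names are hypothetical, and it assumes the residual `r` is the element count left after the main 16-element loop (0..15). It only counts how many elements each scheme hands to the scalar path.

```cpp
#include <cassert>

// Hypothetical bookkeeping type: how a residual of r elements is split.
struct ResidualSplit {
    int vector_steps;    // SIMD sub-steps taken
    int scalar_elements; // elements left for the software-scalar path
};

// Old scheme as described: one 8-lane block if it fits, the rest scalar.
inline ResidualSplit split_residual_8lane(int r) {
    assert(r >= 0 && r < 16);
    const int blocks = (r >= 8) ? 1 : 0;
    return {blocks, r - 8 * blocks};
}

// New scheme as described: up to three 4-lane sub-steps, gated on
// r >= 4, r >= 8 and r >= 12, the rest scalar.
inline ResidualSplit split_residual_4lane(int r) {
    assert(r >= 0 && r < 16);
    int steps = 0;
    if (r >= 4) ++steps;
    if (r >= 8) ++steps;
    if (r >= 12) ++steps;
    return {steps, r - 4 * steps};
}
```

For r = 15 the old split leaves 7 elements for scalar conversion and the new split leaves 3; for r = 4 the old split leaves all 4 for scalar and the new split leaves none.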

…MOD-14972]

- Remove docs/superpowers/ design and plan files (~1550 lines); sister PR #970 removed its equivalent doc before merge.
- Drop the 5-line "No alignment write" prose comment from the three AArch64 NEON_HP dispatcher blocks; the sister SQ8_FP32 ARM dispatchers carry no such comment, and the absent alignment write already encodes the intent.
- Trim GetDistFuncSQ8FP16Asymmetric to a 7-line template-mapping check at dim=15, matching the shape of GetDistFuncSQ8Asymmetric (SQ8_FP32 sister). The scalar-fallback assertion it previously duplicated is already covered by the trailing block of SQ8_FP16_SpacesOptimizationTest.

The bm_spaces_sq8_fp16 executable is built but was never emitted by benchmarks.sh, so no CI label (bm-spaces / benchmarks-all) would run it. Register it in bm-spaces, bm-spaces-sq8-full, benchmarks-all and benchmarks-default, and add a dedicated bm-spaces-sq8-fp16 case.

jit-ci · 2026-06-01T17:42:17Z

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

CLAassistant · 2026-06-01T17:42:17Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ dor-forer
✅ lerman25
❌ Ubuntu

Ubuntu does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

jit-ci · 2026-06-01T17:44:04Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

Security scan by Jit

dor-forer and others added 20 commits June 1, 2026 15:12

Add NEON_HP SQ8↔FP16 IP kernel header [MOD-14972]

4f0534c

Add NEON_HP SQ8↔FP16 L2 kernel header [MOD-14972]

d3c6415

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wire NEON_HP SQ8↔FP16 choosers [MOD-14972]

69cee3d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Dispatch SQ8↔FP16 to NEON_HP tier on AArch64 [MOD-14972]

1b36b38

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extend SQ8↔FP16 tier-walk tests with NEON_HP [MOD-14972]

1af4812

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add SVE SQ8↔FP16 IP kernel header [MOD-14972]

0ce0bce

Add SVE SQ8↔FP16 L2 kernel header [MOD-14972]

eb4952a

Wire SVE/SVE2 SQ8↔FP16 choosers [MOD-14972]

fcb01bb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Dispatch SQ8↔FP16 to SVE/SVE2 tiers on AArch64 [MOD-14972]

15fca69

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extend SQ8↔FP16 tier-walk tests with SVE/SVE2 [MOD-14972]

0fcd7d0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Register ARM SQ8↔FP16 microbenchmarks [MOD-14972]

6a783f8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Apply clang-format [MOD-14972]

10c03aa

Apply clang-format 18.1.8 (matches CI) [MOD-14972]

e1647dc

lerman25 added the bm-spaces label Jun 1, 2026

This was referenced Jun 1, 2026

Pr 973 bench treatment #975 (Draft)

Add SQ8↔FP16 ARM SIMD distance kernels [MOD-14972] #973 (Open)

Pr 973 bench FMLAL (FEAT_FHM) #976 (Draft)

lerman25 wants to merge 20 commits into main from pr-973-bench-baseline


3 participants
