feat(riscv): i64 Phase 2 — mul, shifts, rotates, compares, clz/ctz/popcnt, sign-extends #128
Open
avrabe wants to merge 1 commit into
Extends the RV32IMAC instruction selector with the harder i64 surface, building on the Phase-1 typed-vstack register-pair representation.

Implemented (each lowering to an RV32IMAC sequence):
- I64Mul: low-64 product via mul + mulhu carry + 2 cross-term muls
- I64Shl / I64ShrS / I64ShrU: runtime-amount shifts with a data-dependent branch on shamt >= 32 (cross-word) vs. < 32 (within-word + carry)
- I64Rotl / I64Rotr: composed from a pair of cross-word shifts ORed together
- I64Clz / I64Ctz / I64Popcnt: base-ISA software sequences (no Zbb): clz/ctz branch on the hi/lo half and use an unrolled binary-search clz_word; ctz reuses clz via `x & -x`; popcnt is the SWAR mul-collapse
- I64LtS/LtU/LeS/LeU/GtS/GtU/GeS/GeU: hi-then-lo compare ladder (hi signed or unsigned per op, lo always unsigned), reduced to less-than + invert
- I64Extend8S / I64Extend16S / I64Extend32S: sub-word sign extension with sign propagation into the high word via srai 31

Deferred to Phase 3:
- I64DivS / I64DivU / I64RemS / I64RemU: RV32 has no 64-bit divide; these need a __divdi3-style software long-division routine. They fall through to the existing `Unsupported` arm, so they fail loudly with no silent miscompile.

Tests: 23 new shape-assertion tests (one+ per implemented op, both shift cross-word cases, the clz hi-vs-lo branch, deferred-op Unsupported check).

Validation: cargo test (148 pass, was 125), clippy -D warnings clean, cargo fmt --check clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
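For reviewers who want to sanity-check the arithmetic behind the multiply and shift lowerings, here is a minimal host-side model in plain Rust. It is a sketch, not the selector code: the function names `mul64_from_halves` and `shl64_from_halves` are hypothetical, and each `u32` operation stands in for the RV32 instruction named in its comment. The two-step carry follows the `(lo>>1)>>(31-s)` form given in the PR description.

```rust
/// Low 64 bits of a 64x64 product, computed from 32-bit halves.
/// Models: lo = mul(al, bl); hi = mulhu(al, bl) + mul(al, bh) + mul(ah, bl).
/// (Hypothetical helper for illustration; not taken from the PR.)
fn mul64_from_halves(al: u32, ah: u32, bl: u32, bh: u32) -> (u32, u32) {
    let lo = al.wrapping_mul(bl); // mul: low 32 bits of al*bl
    let carry = ((al as u64 * bl as u64) >> 32) as u32; // mulhu: high 32 bits of al*bl
    let hi = carry
        .wrapping_add(al.wrapping_mul(bh)) // cross term al*bh (low 32 bits only)
        .wrapping_add(ah.wrapping_mul(bl)); // cross term ah*bl (low 32 bits only)
    (lo, hi)
}

/// 64-bit left shift by a runtime amount, on 32-bit halves.
/// (Hypothetical helper for illustration; not taken from the PR.)
fn shl64_from_halves(lo: u32, hi: u32, shamt: u32) -> (u32, u32) {
    let s = shamt & 63;
    if (s & 32) != 0 {
        // Cross-word case: the low half becomes zero and the high half
        // receives lo shifted by (s - 32), which equals s & 31 here.
        (0, lo << (s & 31))
    } else {
        // Within-word case. The bits carried into the high half are
        // lo >> (32 - s), written as two shifts so that s == 0 yields 0
        // instead of relying on a shift by 32.
        let carry = (lo >> 1) >> (31 - s);
        (lo << s, (hi << s) | carry)
    }
}
```

Joining the returned halves as `((hi as u64) << 32) | lo as u64` and comparing against `a.wrapping_mul(b)` or `a << s` for every `s` in `0..64` is a quick way to confirm the decomposition, including the `s == 0` and `s == 32` edge cases.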
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
Phase 2 of RV32IMAC i64 support, building on Phase 1 (v0.3.1, #119). 16 new i64 ops, all lowering to RV32IMAC base-ISA instruction sequences via the typed `VstackVal` register-pair model.

Implemented (16 ops)
- `I64Mul`: low-64 product. `lo = mul(al,bl)`; `hi = mulhu(al,bl) + mul(al,bh) + mul(ah,bl)`.
- `I64Shl` / `I64ShrS` / `I64ShrU`: runtime-amount shifts with a data-dependent `bne` on bit 5 of `shamt & 63`. Cross-half carry uses a two-step `(lo>>1)>>(31-s)` so it stays well-defined at `s==0` (avoids RV's `>>32 == >>0` masking).
- `I64Rotl` / `I64Rotr`: composed from two cross-word shifts ORed half-by-half.
- `I64Clz` / `I64Ctz` / `I64Popcnt`: base-ISA software (no Zbb). `clz` branches hi-vs-lo; `clz_word` is an unrolled branchless binary search. `ctz` via `31 - clz(x & -x)`. `popcnt` is SWAR per half.
- `I64{Lt,Le,Gt,Ge}{S,U}`: hi-then-lo comparison ladder; Le/Gt/Ge derived from Lt by operand swap + invert.
- `I64Extend8S/16S/32S`: `(x << (32-w)) >>s (32-w)` then `srai lo,31` to broadcast the sign.

Deferred to Phase 3
- `I64DivS/DivU/RemS/RemU`: RV32 has no 64-bit divide; an inline `__divdi3`-style long division would balloon the selector and there's no runtime-call path yet. They fall through to the existing `SelectorError::Unsupported` arm (a test pins this).

Tests
+23 tests (125 → 148 passing, 1 pre-existing ignored). One+ per op; both shift cross-word cases pinned (`i64_shl_big_case_zeroes_low_half`, `i64_shl_small_case_uses_register_shifts_and_carry_or`), the clz hi-vs-lo branch pinned, signed + unsigned compares both covered.

Note on the diff
`git diff --stat` reports `2312 insertions / 699 deletions`, but no functions or tests were removed. Verified: all 108 functions from `main` are present in the branch (now 145, +37). The "deletions" are blank-line reflow artifacts from `cargo fmt` interleaving with the large additions. Function-name set diff `main` → branch is empty for removals.

Validation
- `cargo test --package synth-backend-riscv`: 148 pass, 0 fail, 1 ignored.
- `cargo clippy --package synth-backend-riscv --all-targets -- -D warnings`: clean.
- `cargo fmt --check`: clean.

Follow-ups
- `__divdi3`-style runtime or a runtime-call path.
- `i64.load8_s` etc.) still unimplemented, noted in the module doc.

🤖 Generated with Claude Code
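As a reference for the bit-counting lowerings listed under Implemented, the following is a minimal host-side model in plain Rust. It is one possible formulation under stated assumptions, not the sequence the selector emits: all function names are hypothetical, the loop in `clz_word` would be fully unrolled in generated code, and the handling of a zero input in `ctz_word` is an assumption, since the PR text does not say how that case is treated.

```rust
/// Count leading zeros of a 32-bit word by binary search.
/// Each step checks whether the top k bits are zero and, if so, adds k to the
/// count and shifts them out. Multiplying by a 0/1 flag keeps it branch-free.
/// (Hypothetical model; the loop stands in for an unrolled sequence.)
fn clz_word(mut x: u32) -> u32 {
    let mut n = 0u32;
    for k in [16u32, 8, 4, 2, 1] {
        let top_zero = ((x >> (32 - k)) == 0) as u32; // 1 if the top k bits are all zero
        n += top_zero * k;
        x <<= top_zero * k;
    }
    // A nonzero input now has its top bit set; an all-zero input does not,
    // which accounts for the final count of 32.
    n + (1 - (x >> 31))
}

/// Count trailing zeros via 31 - clz(x & -x): `x & -x` isolates the lowest set bit.
fn ctz_word(x: u32) -> u32 {
    if x == 0 {
        return 32; // assumption: zero is handled separately from the identity
    }
    31 - clz_word(x & x.wrapping_neg())
}

/// SWAR population count with a multiply to sum the per-byte counts.
fn popcnt_word(mut x: u32) -> u32 {
    x = x - ((x >> 1) & 0x5555_5555); // 2-bit partial sums
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F_0F0F; // 8-bit partial sums
    x.wrapping_mul(0x0101_0101) >> 24 // add the four bytes into the top byte
}

/// 64-bit versions on (lo, hi) halves: pick the half that decides the answer.
fn clz64(lo: u32, hi: u32) -> u32 {
    if hi != 0 { clz_word(hi) } else { 32 + clz_word(lo) }
}

fn ctz64(lo: u32, hi: u32) -> u32 {
    if lo != 0 { ctz_word(lo) } else { 32 + ctz_word(hi) }
}

fn popcnt64(lo: u32, hi: u32) -> u32 {
    popcnt_word(lo) + popcnt_word(hi)
}
```

Each function can be compared against the standard library's `leading_zeros`, `trailing_zeros`, and `count_ones` on `u32` and `u64`, which gives a cheap oracle for the shape tests if value-level checks are added later.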