Skip to content

The new Yul optimizer toolkit#519

Draft
xermicus wants to merge 199 commits into
mainfrom
cl/newyork
Draft

The new Yul optimizer toolkit#519
xermicus wants to merge 199 commits into
mainfrom
cl/newyork

Conversation

@xermicus
Copy link
Copy Markdown
Member

No description provided.

xermicus and others added 30 commits February 2, 2026 21:28
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Implement type-inference-driven value narrowing in newyork codegen:

1. Let-binding narrowing: i256 values with inferred width ≤I64 and
   unsigned are truncated to i64 at Let binding sites, enabling native
   RISC-V arithmetic downstream.

2. Arithmetic dispatch: Add/Mul check inferred result width. If result
   fits in i64, use ensure_min_width for native ops. Sub always uses
   i256 to handle negative results correctly.

3. Pointer-site narrowing: Memory offsets/lengths narrowed from i256
   to i64 at use sites (mstore, mload, codecopy, etc.).

4. Fix condition_stmts forward inference: The forward type inference
   pass was not visiting Let statements inside for-loop condition
   blocks, causing large constants (e.g., type(int256).min) to retain
   default I1 width and be incorrectly truncated.

5. Fix inliner for large contracts: Single-call functions larger than
   40 IR nodes are now deferred to LLVM's inliner instead of being
   inlined at IR level. This prevents creating monolithic dispatcher
   functions that cause regressions on OpenZeppelin contracts.

Results: OZ ERC20 -3.5%, OZ ERC721 -3.9%, small benchmarks -1.5% to -3.1%.
All 62 integration tests, 5851 retester tests, and make test pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a truncate-and-extend pattern after loading the free memory pointer
(mload(64)) to prove to LLVM that the value fits in 64 bits. This
enables LLVM to eliminate downstream overflow checks and simplify
surrounding arithmetic, yielding significant code size reductions.

Also adds revert(0,K) block deduplication using shared basic blocks
keyed by constant revert length.

Results vs standard pipeline:
- Integration ERC20: -10.8% (was -5.1%)
- Integration SHA1: -7.9% (was -1.4%)
- OZ ERC20: -11.7% (74,508 vs 84,364 bytes)
- OZ ERC721: -9.4% (84,950 vs 93,730 bytes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Key changes:
- FMP range proof changed from 64-bit to 32-bit to match XLEN,
  eliminating 60 more overflow trap sites in OZ ERC20
- Added apply_range_proof() helper for reusable range proofs
- Added range proofs for timestamp (64), number (64), chainid (64), basefee (128)
- Documented which builtins already have implicit range proofs
- Memory region annotation in simplify pass using constant tracking
- Updated IR_PLAN.md with optimization analysis and failed experiments

Results: ERC20 -12.6%, Flipper -10.8%, Computation -10.5% vs standard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cture

Conditional sbrk NoInline: mark __sbrk_internal as NoInline only for
contracts with >20 heap operations. ERC20 saves 359 bytes while small
contracts remain unaffected.

Interprocedural parameter narrowing: infrastructure to narrow function
parameter types from I256 based on body-internal usage analysis.
Currently no parameters meet the strict criteria but the code path is
ready for broader narrowing rules.

ERC20: 14287 bytes (-14.7% vs standard 16757)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The try_narrow_let_binding was overriding the 32-bit FMP range proof
(trunc i256 → i32; zext i32 → i256) with a weaker 64-bit proof
(trunc i256 → i64). Now detects zext source width and preserves
tighter existing proofs. Result: 17 fewer overflow checks in LLVM IR,
288 fewer IR lines, though PVM blob size unchanged (PolkaVM already
handles dead overflow blocks efficiently).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major newyork IR optimizations:

1. Keccak256Pair IR node: fuses mstore(0,w0)+mstore(32,w1)+keccak256(0,64)
   into a single __revive_keccak256_two_words call. 13 sites in ERC20,
   saving ~529 bytes (-3.7%).

2. FMP constant propagation: replaces mload(0x40) with known constant
   when the free memory pointer hasn't been modified. Handles
   Statement::Block wrapping, Statement::Expr void calls, and builds
   transitive FMP-writer closure via call graph analysis.

3. Interprocedural parameter narrowing: propagates caller argument
   widths to callee parameters in the type inference forward pass.
   Narrows function signatures (e.g., assert_helper(i32),
   finalize_allocation(i64, i256)).

4. Conditional sbrk NoInline: only outline __sbrk_internal when the
   contract has >20 heap operations (saves ~359 bytes on ERC20 without
   regressing small contracts).

5. Load-after-store forwarding in mem_opt: when mload matches a tracked
   mstore to the same offset, forward the stored value directly.

6. Memory region annotation in simplify pass: tag MLoad/MStore with
   FreePointerSlot/Scratch/Dynamic regions from constant offset analysis.

Code size improvements (newyork vs previous):
- ERC20:    14287 → 13051 (-8.7%)
- SHA1:      6748 →  5930 (-12.1%)
- Computation: 1654 → 1577 (-4.7%)
- DivisionArithmetics: 13501 → 13243 (-1.9%)

All 62 integration tests pass with RESOLC_USE_NEWYORK=1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two new code size optimizations:

1. PanicRevert IR node: detects the Solidity panic pattern
   (mstore(0, 0x4e487b71...) + mstore(4, code) + revert(0, 0x24))
   and replaces it with a PanicRevert { code } statement that lowers
   to a shared helper function. Each panic code (0x11 overflow,
   0x22 encoding, 0x41 memory, etc.) gets its own shared block,
   deduplicating the 3-statement pattern across many call sites.

2. Shared return blocks: when multiple return(offset, length)
   statements have identical constant arguments, they branch to a
   single shared LLVM basic block instead of each generating their
   own exit sequence (sbrk + ptrtoint + seal_return).

Code size improvements:
- ERC20:              13051 → 12681 (-2.8%)
- DivisionArithmetics: 13243 → 12950 (-2.2%)

All 62 integration tests pass with RESOLC_USE_NEWYORK=1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Common subexpression elimination for pure environment reads
   (calldatasize, callvalue, caller, origin, address). After
   inlining, multiple switch cases redundantly call these builtins.
   CSE replaces 2nd+ occurrences with references to the first
   binding, enabling LLVM to eliminate redundant loads.

2. Storage load/store by-value functions: pass key and value as
   i256 directly instead of through alloca+pointer. Eliminates
   2x alloca + 2x store overhead per storage operation.

3. Outlined caller_word function: moves the syscall + bswap + zext
   sequence into a shared function to avoid code duplication.

4. Fix clippy: remove useless .into() on BasicValueEnum arguments.

Code size improvements:
- ERC20:    12681 → 12386 (-2.3%)
- Flipper:   1507 →  1493 (-0.9%)

All 62 integration tests pass with RESOLC_USE_NEWYORK=1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
xermicus and others added 10 commits May 8, 2026 14:11
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Comment thread crates/newyork/src/lib.rs
Comment on lines +37 to +39
/// Environment variable: when set, dumps the newyork IR for every translated object to
/// `/tmp/newyork_ir_<object>.txt` after optimization passes have run.
pub const NEWYORK_DUMP_IR_ENV: &str = "NEWYORK_DUMP_IR";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When debugging LLVM, I find it useful having the option to control after which specific pass to dump (e.g -print-after=<pass> instead of -print-after-all). We could consider adding something like that as well. It'd likely help debugging the optimizer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should consider CLI flags/arguments for this.

Comment thread crates/newyork/src/ssa.rs Outdated
Comment thread crates/newyork/src/ssa.rs Outdated
Comment thread crates/newyork/src/ssa.rs Outdated
Comment thread crates/newyork/src/ssa.rs Outdated
Comment thread crates/newyork/src/lib.rs Outdated
Comment thread crates/newyork/src/inline.rs
Comment thread crates/newyork/src/simplify.rs
Comment thread crates/newyork/src/simplify.rs Outdated
Late inline pass on top of the existing optimizer + cost-model retune + per-helper memory(...) attribute precision so GVN can fold across heap mstores. −8.95% on the OZ corpus vs the baseline.

Co-authored-by: xermicus <bigcyrill@hotmail.com>
Comment on lines +1844 to +1852
let removed_count = redirects.len();
redirect_calls_in_block(&mut object.code, &redirects);
for function in object.functions.values_mut() {
redirect_calls_in_block(&mut function.body, &redirects);
}

for dup_id in redirects.keys() {
object.functions.remove(dup_id);
}
Copy link
Copy Markdown
Contributor

@elle-j elle-j May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When redirecting the calls and removing the duplicate functinons (similarly in deduplicate_functions_fuzzy()), I don't see any updates to the canonicalized function's call_count.

I assume this would currently not set the NoInline attribute in some scenarios after deduplication even if the actual call count is >= 3, based on LlvmCodegen::set_inline_attribtutes()'s InlineDecision::CostBenefit match arm:

Some(crate::InlineDecision::CostBenefit) => {
let force_noinline = function.call_count >= 3
|| function.size_estimate >= LARGE_FUNCTION_NOINLINE_THRESHOLD;
if force_noinline {
Some(revive_llvm_context::PolkaVMAttribute::NoInline)
} else {
None
}
}


/// Results of memory optimization.
#[derive(Clone, Debug, Default)]
pub struct MemOptResults {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MemOptResults doesn't actually seem to be used. I noticed that FmpPropagation::loads_eliminated only seemed to be incremented but not read, and when taking a closer look the same seemed to be true for the fields of MemOptResults.

/// the constant 0x80 through them.
pub struct FmpPropagation {
/// Number of FMP loads eliminated.
pub loads_eliminated: usize,
Copy link
Copy Markdown
Contributor

@elle-j elle-j May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loads_eliminated doesn't seem to be used (see previous comment).

Comment on lines +831 to +832
let is_fmp_store =
region == MemoryRegion::FreePointerSlot || resolved_offset == Some(0x40);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't just a resolved_offset == Some(0x40) check sufficient on its own? If the additional MemoryRegion::FreePointerSlot check is necessary, perhaps extracting the whole region == MemoryRegion::FreePointerSlot || resolved_offset == Some(0x40) into e.g. an is_fmp_slot() would be beneficial for maintenance and make it clearer as to why this check includes the || (the check is used both for is_fmp_store and is_fmp_load).

if *region == MemoryRegion::Unknown && !pattern.is_aligned() {
if let Some(addr) = self.extract_static_offset(offset) {
if addr % 32 != 0 {
self.tainted_regions.insert(addr / 32 * 32);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addr / 32 * 32 is used very frequently in this file (and a few times in mem_opt.rs as well), should we introduce a word address helper?

neo and others added 8 commits May 20, 2026 20:57
Fix various soundness issues at the cost of ~1% code size growth.

Signed-off-by: xermicus <bigcyrill@hotmail.com>
Co-authored-by: xermicus <bigcyrill@hotmail.com>
Resolves conflicts after LLVM 21 → 22 upgrade and inkwell 0.8 → 0.9
landed on main:

* Cargo.toml: keep bumped versions from main; preserve revive-newyork.
* docs/: regenerated with mdbook against cl/newyork sources.
* function/mod.rs: keep ours' NoFree+NoUnwind PVM target attributes;
  keep main's is_middle_end_enabled gate around NoInline/OptimizeNone.
* context/mod.rs: drop narrow_divrem_instructions and
  strip_minsize_for_divrem — both were LLVM 21 backend workarounds
  removed by main in #501 once the upstream patch landed; LLVM 22
  bundles the fix.
* heap.rs: port inkwell 0.8 → 0.9 API breaks (custom_width_int_type
  now takes NonZeroU32 and returns Result).
* codesize.json: kept ours (LLVM 22 bumps integration sizes ~5–60%;
  update intentionally in a follow-up).

Verified: RESOLC_USE_NEWYORK=1 make test green (workspace + book).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a per-operation reference for the newyork IR to the compiler book
for easily being able to view printed syntax, operand and result types,
examples, etc. for any operation. Using easily readable syntax notation.
The new page is partly inspired by [MLIR's
docs](https://mlir.llvm.org/docs/Dialects/SCFDialect/).

<details>
  <summary>Example entry</summary>

### `and`

(`Expression::Binary` with `BinaryOperation::And`)

#### Description

Bitwise AND. The common idiom for type narrowing: a constant mask on the
right lets forward analysis pick up a tight result width.

#### Syntax

```text
and($lhs[: <type>], $rhs[: <type>])
```

#### Example

```text
let v2 := and(v0, v1)
let v3 := and(v0, 0xff)     // type inference narrows result to i8
```

#### Operands

| Name | Type | Notes |
|---|---|---|
| `lhs` | `i256` | — |
| `rhs` | `i256` | — |

#### Result and purity

| Result | Purity |
|---|---|
| `min(width(lhs), width(rhs))` — AND can only clear bits, so the result
fits in the narrower operand | Pure |

#### Annotations

None.

</details>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
* Drop the -O0 `optnone`/`noinline` gate from `set_default_attributes`
  (main #502). It pinned both attributes on every function when the
  middle end is off; with newyork-emitted IR (many small helpers
  expecting to be inlined) the backend's fast-isel produced PVM basic
  blocks over the 1000-instruction `BasicBlockTooLarge` limit. #502
  itself reported the change as NFC on main, so the safeguard was
  inert there but actively harmful here.

* Restore `narrow_divrem_instructions` after the LLVM pipeline.
  `strip_minsize_for_divrem` stays dropped — LLVM 22 carries the patch
  that made it unnecessary — but the narrowing safety net still helps
  i256 div/rem expand into compact backend code.

* In `LlvmCodegen::set_inline_attributes`, early-return when the
  middle end is disabled and route the attribute write through
  `PolkaVMFunction::set_attributes(force = true)`. The raw
  `add_attribute` call stacked `alwaysinline` on top of the
  forced `noinline`/`optnone`, which the LLVM IR verifier rejected
  during library post-link compilation.

`RESOLC_USE_NEWYORK=1 retester.sh`: 34888 passed, 6 failed, 144
ignored. The six failures are `blockhash(0xFFFF…)` at S- across all
Y M{0,1,2,3,s,z} — a pre-existing newyork edge case masked at S+ by
solc's optimizer folding the call.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-existing newyork edge case: `blockhash(0xFFFF…)` reverts instead
of returning 0. solc's optimizer constant-folds the call to 0 at S+,
so the failure is only visible across Y M{0,1,2,3,s,z} at S-.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: xermicus <bigcyrill@hotmail.com>
Comment thread crates/newyork/src/lib.rs
Comment on lines +139 to +141
if std::env::var(NEWYORK_DUMP_IR_ENV).is_ok() {
use std::io::Write;
let dump_path = format!("/tmp/newyork_ir_{}.txt", ir_object.name.replace('/', "_"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IR dumps for NEWYORK_DUMP_IR seem to be hardcoded to "/tmp/newyork_ir_{}.txt". Could we use e.g. our current --debug-output-dir instead in order to make this configurable? (Similarly for RESOLC_DEBUG_*.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the IR dumps don't show the inferred widths at e.g. let bindings, operands, if-statement conditions, etc. even after the narrow pass has run (RESOLC_DEBUG_IR=1 dumps post-narrow). Some examples from compiling Flipper.sol:

let v19 := 0xffffffffffffffff
let v31 := 0x20
let v34 := slt(v7, v31)
if v34 { ... }
let v39 := iszero(v38)
let v49 := 0xff
let v50 := and(v39, v49)

I think it'd be more useful (and less ambiguous) if it showed the types in all cases, so it's easier to reason about if a type has been narrowed from looking at the IR. Something like:

let v19: i64 := 0xffffffffffffffff
let v31: i8 := 0x20
let v34: i1 := slt(v7: i64, v31: i8)
if v34: i1 { ... }
let v39: i1 := iszero(v38: i256)
let v49: i8 := 0xff
let v50: i64 := and(v39: i1, v49: i8) /* v50 may be i64 instead of i1 due to using a demanded width */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function calls in subobjects are printed as func_<id> rather than <func_name>. From compiling ERC20.sol:

let v141 := func_0(v140, v681)        // should be abi_encode_string
let v154 := func_1(v689: i8)          // should be abi_decode_address
let v248 := func_3(v247, v221)        // should be checked_sub_uint256
let v270 := func_4(v269, v221)        // should be checked_add_uint256

But sometimes there is a name printed, e.g. extract_byte_array_length() (in the dump after setting NEWYORK_DUMP_IR) which is the name of a function in the parent object, so there seems to be a FunctionId looked up from the wrong map (or populated incorrectly). Worth also looking into whether this bug goes beyond the printer.

//! valid Yul but is designed to be human-readable and clearly show the SSA
//! structure with explicit value IDs.
//!
//! # Example Output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment's example output doesn't correspond to the actual current output (e.g. it uses = instead of :=).

Comment on lines +487 to +495
Statement::For {
initial_values,
loop_variables,
condition_statements,
condition,
body,
post,
outputs,
..
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The post_input_variables is dropped (..) here. (Also, the for-loop indentation is a little broken.)

} => {
self.write_indent();
let arg_strs: Vec<String> =
arguments.iter().map(|a| format!("v{}", a.id.0)).collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably use the existing write_value()/write_value_id().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants