
wip(benchmark): create simpler version of benchmark runner #2514

Draft
meyer9 wants to merge 16 commits into main from benchmark/port

Conversation


@meyer9 meyer9 commented May 4, 2026

The web interface can stay in base/benchmark and the benchmark runner will live here eventually. This removes a lot of the old dependencies on Geth and simplifies the tool significantly.

meyer9 and others added 7 commits May 4, 2026 12:49
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…xpansion

Adds the base-benchmark crate with:
- BenchmarkConfig YAML types with full matrix variable expansion (cartesian product, ≤100 run guard)
- BenchmarkError unified error enum
- src/params.rs with rollup consts (keys, gas limits, EIP-1559 params)
- Axum/tokio CLI binary stub with all flags and BASE_BENCH_ env prefix
- Output helpers (random_id, gzip_file, write_result_json, dump_log_tail)
- 8 unit tests (config round-trip, matrix expansion, gzip round-trip)
- flate2, nix, prometheus-parse added to workspace dependencies
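The cartesian-product expansion with a run cap can be sketched in plain Rust. The names here (`expand_matrix`, where the 100-run guard fires) are illustrative assumptions, not the crate's actual API:

```rust
use std::collections::BTreeMap;

/// Expand a matrix of variables into the cartesian product of all value
/// combinations, rejecting configurations that would produce more than
/// 100 runs (names are illustrative, not the crate's real types).
fn expand_matrix(
    matrix: &BTreeMap<String, Vec<String>>,
) -> Result<Vec<BTreeMap<String, String>>, String> {
    // Start with a single empty run, then multiply it by each variable's values.
    let mut runs: Vec<BTreeMap<String, String>> = vec![BTreeMap::new()];
    for (var, values) in matrix {
        let mut next = Vec::with_capacity(runs.len() * values.len());
        for run in &runs {
            for value in values {
                let mut expanded = run.clone();
                expanded.insert(var.clone(), value.clone());
                next.push(expanded);
            }
        }
        runs = next;
        if runs.len() > 100 {
            return Err(format!("matrix expands to {} runs, max is 100", runs.len()));
        }
    }
    Ok(runs)
}

fn main() {
    let mut matrix = BTreeMap::new();
    matrix.insert("gas_limit".to_string(), vec!["30M".into(), "60M".into()]);
    matrix.insert(
        "node_type".to_string(),
        vec!["reth".into(), "builder".into(), "geth".into()],
    );
    let runs = expand_matrix(&matrix).unwrap();
    println!("{}", runs.len()); // 2 * 3 = 6 combinations
}
```

Checking the guard after each variable keeps the intermediate product bounded rather than only validating the final count.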

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…nager

Infrastructure utilities for child process orchestration:
- PortManager: TcpListener::bind probe over 10000-65535 with mutex-guarded HashSet, acquire/release/acquire_n
- ProcessHandle: tokio::process::Command with SIGINT via nix, 5s grace period, SIGKILL escalation, structured tracing
- SnapshotManager: sha256(command)[:12] cache key, force_clean, explicit datadir passthrough, external script invocation
- 20 tests passing (up from 8)
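A minimal std-only sketch of the PortManager idea described above: probe for a free port with a real bind, and track handed-out ports in a mutex-guarded set. Type and method names mirror the bullet but are assumptions about the real implementation:

```rust
use std::collections::HashSet;
use std::net::TcpListener;
use std::sync::Mutex;

/// Illustrative PortManager: a mutex-guarded set of ports already handed
/// out, plus a bind probe over the 10000-65535 range.
struct PortManager {
    used: Mutex<HashSet<u16>>,
}

impl PortManager {
    fn new() -> Self {
        Self { used: Mutex::new(HashSet::new()) }
    }

    fn acquire(&self) -> Option<u16> {
        let mut used = self.used.lock().expect("port set mutex poisoned");
        for port in 10000..=65535u16 {
            if used.contains(&port) {
                continue;
            }
            // A successful bind proves the port is free right now; the
            // listener is dropped immediately, so there is a small TOCTOU
            // window before the child process binds it itself.
            if TcpListener::bind(("127.0.0.1", port)).is_ok() {
                used.insert(port);
                return Some(port);
            }
        }
        None
    }

    fn release(&self, port: u16) {
        self.used.lock().expect("port set mutex poisoned").remove(&port);
    }
}

fn main() {
    let pm = PortManager::new();
    let a = pm.acquire().expect("a free port in 10000..=65535");
    let b = pm.acquire().expect("a second free port");
    assert_ne!(a, b);
    pm.release(a);
    println!("acquired {a} and {b}");
}
```

The set, not the probe, is what prevents two concurrent `acquire` calls from returning the same port before either child has bound it.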

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
… BuilderClient

Client abstraction layer for EL process orchestration:
- ExecutionClient trait (run/stop/rpc_url/auth_rpc_url/metrics_port/get_version/set_head/flashblocks_client/supports_flashblocks)
- BaseRethNodeClient: 4 ports (EL/Auth/Metrics/P2P), full reth arg list, txpool backup cleanup, 240s RPC readiness poll, SIGINT stop
- BuilderClient: wraps BaseRethNodeClient, acquires flashblocks WS port, appends --flashblocks.port/block-time/rollup.chain-block-time
- setup_node() dispatch fn mapping node_type string to boxed trait object
- FlashblocksClient stub in src/flashblocks/ (full WS impl in Phase 7)
- 25 tests passing (up from 20)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Implements BaseConsensusClient (wraps BaseEngineClient with FCU/getPayload/newPayload),
SequencerConsensusClient (proposes blocks with fake L1BlockInfo deposit tx and Holocene
EIP-1559 params), SyncingConsensusClient (replays payloads for validator sync benchmarking),
FakeMempool (deposit-txs-first ordering), BlockMetrics (Prometheus delta scraping,
Histogram/Gauge/Counter handling), MetricsCollector, write_metrics_json, check_thresholds,
and all metric name consts. 37/37 tests passing.
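The FakeMempool's deposit-txs-first ordering can be illustrated with a stable sort; the `Tx` type and `order_for_payload` helper here are invented for the example:

```rust
/// Stand-in transaction type for the example (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
enum Tx {
    Deposit(&'static str),
    User(&'static str),
}

/// Deposit transactions must come first in each payload. `sort_by_key`
/// is a stable sort, so arrival order is preserved within each group.
fn order_for_payload(mut txs: Vec<Tx>) -> Vec<Tx> {
    txs.sort_by_key(|tx| match tx {
        Tx::Deposit(_) => 0,
        Tx::User(_) => 1,
    });
    txs
}

fn main() {
    let ordered = order_for_payload(vec![
        Tx::User("u1"),
        Tx::Deposit("l1-info"),
        Tx::User("u2"),
    ]);
    assert_eq!(ordered[0], Tx::Deposit("l1-info"));
    println!("{ordered:?}");
}
```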
Adds axum-based JSON-RPC intercept proxy (eth_sendRawTransaction → FakeMempool,
everything else forwarded to upstream), PayloadWorker trait, and LoadTestPayloadWorker
that writes a temp YAML config and spawns base-load-test as a subprocess with
FUNDER_KEY env and duration=99999s. 41/41 tests passing.

cb-heimdall commented May 4, 2026

🟡 Heimdall Review Status

Reviews: 🟡 0/1 required approvals


vercel Bot commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project: base | Deployment: Ignored | Updated (UTC): May 5, 2026 2:37pm


Implements FlashblocksClient (tokio-tungstenite WS consumer with auto-reconnect,
exponential backoff, and Flashblock::try_decode_message for JSON/Brotli messages)
and FlashblockReplayServer (axum WS broadcast server replaying stored flashblock
payloads to benchmarked validator clients). 43/43 tests passing.
Comment on lines +306 to +307
let fb_block_time = options.flashblocks_block_time_ms.unwrap_or(block_time_ms / 4);
let inner = BaseRethNodeClient::new(options, internal, port_manager, None);

BuilderClient creates a BaseRethNodeClient internally, which uses self.options.reth_bin as the binary (via self.binary()). The builder_bin field in ClientOptions is never read — so the builder client runs the reth binary instead of the builder binary.

You likely need to swap reth_bin for builder_bin here (or override the binary in the inner client):

Suggested change
  let fb_block_time = options.flashblocks_block_time_ms.unwrap_or(block_time_ms / 4);
- let inner = BaseRethNodeClient::new(options, internal, port_manager, None);
+ let mut options_for_inner = options.clone();
+ options_for_inner.reth_bin = options.builder_bin.clone();
+ let inner = BaseRethNodeClient::new(options_for_inner, internal, port_manager, None);

Comment on lines +63 to +68
let req: JsonRpcRequest = match serde_json::from_slice(&body) {
Ok(r) => r,
Err(_) => {
return proxy_raw(&state, &headers, body).await.into_response();
}
};

This only handles single JSON-RPC requests. If the load-test binary sends batch requests (JSON arrays), from_slice::<JsonRpcRequest> will fail and the entire batch gets forwarded to the upstream via proxy_raw — bypassing the mempool interception.

Consider also trying to parse as Vec<JsonRpcRequest> in the error branch, extracting any eth_sendRawTransaction calls from the batch.

Comment on lines +108 to +111
fn histogram_sum_count(buckets: &[prometheus_parse::HistogramCount]) -> (f64, f64) {
let sum = buckets.iter().filter(|b| b.less_than == f64::INFINITY).map(|b| b.count).sum();
let count = buckets.last().map(|b| b.count).unwrap_or(0.0);
(sum, count)

This function looks incorrect for extracting histogram sum and count. In the prometheus_parse crate, HistogramCount represents bucket entries with less_than (the bucket boundary) and count (the cumulative count up to that boundary). The histogram _sum and _count are separate parsed entries — _sum is typically not accessible from the bucket list.

The current code:

  • sum: filters for less_than == +Inf and sums their count — but +Inf bucket's count is the total sample count, not the sum of observed values
  • count: takes the last bucket's count, which happens to be the same total count

So both sum and count end up returning the total count, and the delta computation becomes (count_delta) / (count_delta) == 1.0 always.

You likely need to extract _sum and _count from separate Sample entries rather than from bucket data.
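The separate `_sum`/`_count` extraction this comment asks for can be sketched with a plain map standing in for parsed Prometheus series. The metric name and the `delta_average` helper are illustrative, not the crate's API:

```rust
use std::collections::HashMap;

/// Compute a per-interval histogram average as delta_sum / delta_count,
/// reading `<base>_sum` and `<base>_count` as separate scraped series
/// rather than deriving both from the bucket list.
fn delta_average(
    prev: &HashMap<String, f64>,
    cur: &HashMap<String, f64>,
    base: &str,
) -> Option<f64> {
    let sum_key = format!("{base}_sum");
    let count_key = format!("{base}_count");
    let delta_sum = cur.get(&sum_key)? - prev.get(&sum_key).copied().unwrap_or(0.0);
    let delta_count = cur.get(&count_key)? - prev.get(&count_key).copied().unwrap_or(0.0);
    if delta_count <= 0.0 {
        return None; // no new observations this interval
    }
    Some(delta_sum / delta_count)
}

fn main() {
    let prev = HashMap::from([
        ("engine_new_payload_sum".to_string(), 1.5),
        ("engine_new_payload_count".to_string(), 10.0),
    ]);
    let cur = HashMap::from([
        ("engine_new_payload_sum".to_string(), 2.5),
        ("engine_new_payload_count".to_string(), 14.0),
    ]);
    // 1.0 second of newly observed time over 4 new samples = 0.25s average.
    assert_eq!(delta_average(&prev, &cur, "engine_new_payload"), Some(0.25));
    println!("ok");
}
```

Because `_sum` accumulates observed values and `_count` accumulates sample counts, the ratio of their deltas gives the true per-interval mean, unlike the bucket-only version the comment flags.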

Comment on lines +25 to +28
fn fake_beacon_root() -> B256 {
use alloy_primitives::keccak256;
keccak256(FAKE_BEACON_ROOT_PREIMAGE)
}

Per CLAUDE.md: "All use imports must be at the top of the file or the top of a mod block. Never place use statements inside function bodies." This use alloy_primitives::keccak256 should be moved to the top of the file alongside the other alloy imports.

/// Required in every payload's transaction list. Since the benchmark has no
/// real L1 chain, all L1 fields are zero.
fn build_l1_info_deposit_tx() -> Bytes {
use alloy_rlp::Encodable;

Same issue — use alloy_rlp::Encodable should be at the top of the file, not inside a function body.

Comment on lines +17 to +21
pub const BATCHER_KEY: B256 =
b256!("d2ba8e70072983384203c438d4e94bf399cbd88bbcafb82b61cc96ed12541707");

pub const PREFUND_KEY: B256 =
b256!("ad0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80");

These keys don't match the values in PORTING_REFERENCE.md which specifies:

  • Batcher key: 0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80
  • Prefund key: 0x59c6995e998f97a5a0044966f0945389dc9e86dae88c7a8412f4603b6b78690d

If these were intentionally changed, the porting reference should be updated accordingly. If not, these need to match the Go source so benchmarks produce comparable results.

meyer9 added 7 commits May 4, 2026 13:36
Adds NetworkBenchmark (run_all/run_one loop), RunnerOptions (binary paths),
and full end-to-end orchestration: tempdir setup, snapshot preparation, node
lifecycle, proxy + load-test subprocess, SequencerConsensusClient block loop,
MetricsCollector, threshold violation checks, and per-run JSON result output.
44/44 tests passing.
Adds service::run_benchmark (reads YAML config, constructs NetworkBenchmark,
reports violations, fails on error-severity threshold breaches) and wires
src/bin/base_bench.rs to call it with PREFUND_KEY env support. 45/45 tests.
Adds examples/devnet.yaml (devnet throughput run with erc20-transfer payload,
warning/error metric thresholds) and rewrites README with architecture diagram,
usage instructions, config field reference, and output description.
Mark all phases complete (1-10), check off all task items, update module
layout to match actual files built (proxy/, no rollup.rs, service.rs role),
and condense verbose task descriptions to what was actually implemented.
… run

- BuilderClient: use builder_bin instead of reth_bin (was launching wrong binary)
- MetricsCollector: use node.metrics_port() instead of dead pre-acquired port
- Remove unused metrics_port acquisition from runner
- genesis_json_from_rollup_config: generate EL genesis.json from rollup config so
  fork timestamps (canyonTime, ecotoneTime, etc.) are properly picked up by reth
  chain spec loading -- without this, no hardforks were activated and FCU V3 was
  rejected as 'withdrawals pre-Shanghai'
- BaseConsensusClient: init_from_genesis() seeds head hash/number/timestamp from
  the node's actual genesis block via eth_getBlockByNumber
- BaseConsensusClient: pass real RollupConfig to engine client (not Default)
- Engine API: use FCU V3 + getPayload V4 + newPayload V4 (latest supported)
- PayloadAttributes: withdrawals=Some([]) required post-Canyon; None caused -38003
- README: tag architecture code block as 'text' to suppress doctest
- 45/45 tests passing; benchmark completes 20 blocks, success=true, 0 violations
Write per-run output directories with metadata.json, metrics-sequencer.json,
and metrics-validator.json matching the visualization dashboard's expected format.

- Metric keys use latency/fork_choice_updated, latency/new_payload, etc.
- Per-block metrics in nanoseconds, summary metrics in seconds
- gas/per_second uses wall-clock block-building time (sequencer) and
  newPayload duration (validator)
- Validator replay via SyncingConsensusClient collects Prometheus metrics
- Runner writes self-contained output dir with sequencer/validator logs
- config_path threaded through for sourceFile in metadata
- node_args wired from BenchmarkDefinition to extra_args on node clients
…plit

Fix a series of bugs preventing user transactions from landing in blocks:
- Pre-fund the funder account in genesis alloc using the prefund key address
- Set genesis timestamp to now() so FCU timestamps are never in the past
- Set FCU timestamp to now+block_time so the builder has time for flashblocks
- Add post-newPayload FCU with updated head hash to advance builder canonical chain
- Use proper L1BlockInfoEcotone calldata and addresses for the deposit tx so
  eth_getBlockReceipts does not error on block decoding
- Pipe load-test stdout to detect when funding is complete (removes warmup_blocks)
- Run unmeasured setup blocks via tokio::select! until load test signals ready,
  then start the measured test phase
- Reset child process signal mask via pre_exec and use process_group(0) to
  prevent inherited signal state from triggering spurious shutdowns
- Move install_signal_handler to after fund_accounts in the load-test binary
- Zero out TransferPayload default value (no ETH transfer, just gas consumption)
- Add sender_count and funding_amount fields to the load-test YAML config
- Add skip_serializing_if on WeightedTx optional fields to avoid YAML parse errors
- Add base-protocol, alloy-signer-local, and libc dependencies
Comment on lines +63 to +70
unsafe {
cmd.pre_exec(|| {
let mut sigset = std::mem::zeroed::<libc::sigset_t>();
libc::sigemptyset(&mut sigset);
libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut());
Ok(())
});
}

Missing // SAFETY: comment on the unsafe block. Document why this is sound — e.g. that pre_exec runs after fork() in the child process, sigemptyset/pthread_sigmask are async-signal-safe, and no Rust invariants are broken because the child hasn't yet executed any Rust code post-fork.

Suggested change (adds the missing comment above the otherwise unchanged block):

+ // SAFETY: `pre_exec` runs in the forked child process before exec.
+ // `sigemptyset` and `pthread_sigmask` are async-signal-safe (POSIX),
+ // so they are safe to call in this context. We clear the signal mask
+ // so the child process does not inherit blocked signals from the
+ // parent's tokio runtime.
  unsafe {
      cmd.pre_exec(|| {
          let mut sigset = std::mem::zeroed::<libc::sigset_t>();
          libc::sigemptyset(&mut sigset);
          libc::pthread_sigmask(libc::SIG_SETMASK, &sigset, std::ptr::null_mut());
          Ok(())
      });
  }

Comment on lines +231 to +236
let now_secs = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.expect("system time after epoch")
.as_secs();
let block_time_secs = block_time.as_secs().max(1);
let next_timestamp = now_secs + block_time_secs;

Bug: The Go source (per PORTING_REFERENCE.md §5) increments the timestamp from the previous block's timestamp by the block time, i.e. self.base.head_block_timestamp + block_time_secs. Using wall-clock time (SystemTime::now()) here means:

  1. Timestamps can drift if block production doesn't match real-time.
  2. If two blocks are proposed quickly (e.g. during setup), the timestamp could be the same or very close, causing the node to reject the payload.
  3. Results are non-reproducible across runs.
Suggested change
- let now_secs = std::time::SystemTime::now()
-     .duration_since(std::time::UNIX_EPOCH)
-     .expect("system time after epoch")
-     .as_secs();
- let block_time_secs = block_time.as_secs().max(1);
- let next_timestamp = now_secs + block_time_secs;
+ let next_timestamp = self.base.head_block_timestamp + block_time.as_secs().max(1);

Comment on lines +385 to +401
let alloc = {
use alloy_signer_local::PrivateKeySigner;
let key = prefund_key.trim_start_matches("0x");
let signer = PrivateKeySigner::from_bytes(
&alloy_primitives::hex::decode(key)
.expect("valid hex prefund key")
.as_slice()
.try_into()
.expect("32-byte private key"),
)
.expect("valid private key");
let addr = format!("{:?}", signer.address());
serde_json::json!({
addr: {
"balance": "0x3635C9ADC5DEA00000000",
}
})

Three .expect() calls on user-supplied data (prefund_key). If the prefund key is invalid hex or not 32 bytes, this panics the benchmark runner instead of returning a BenchmarkError. Convert to ?-based error handling:

Suggested change
  let alloc = {
      use alloy_signer_local::PrivateKeySigner;
      let key = prefund_key.trim_start_matches("0x");
-     let signer = PrivateKeySigner::from_bytes(
-         &alloy_primitives::hex::decode(key)
-             .expect("valid hex prefund key")
-             .as_slice()
-             .try_into()
-             .expect("32-byte private key"),
-     )
-     .expect("valid private key");
+     let key_bytes = alloy_primitives::hex::decode(key)
+         .map_err(|e| BenchmarkError::Config(format!("invalid prefund key hex: {e}")))?;
+     let key_array: [u8; 32] = key_bytes.as_slice()
+         .try_into()
+         .map_err(|_| BenchmarkError::Config(format!("prefund key must be 32 bytes, got {}", key_bytes.len())))?;
+     let signer = PrivateKeySigner::from_bytes(&key_array)
+         .map_err(|e| BenchmarkError::Config(format!("invalid prefund private key: {e}")))?;
      let addr = format!("{:?}", signer.address());
      serde_json::json!({
          addr: {
              "balance": "0x3635C9ADC5DEA00000000",
          }
      })
  };

Also: use alloy_signer_local::PrivateKeySigner is inside a function body. Per CLAUDE.md, use imports must be at the top of the file.

Comment on lines +194 to +196
if let Err(e) = {
use std::io::Seek;
file.seek(io::SeekFrom::End(-(max_bytes as i64)))

Two issues:

  1. use inside function body: use std::io::Seek should be at the top of the file, per project conventions.

  2. Potential overflow: -(max_bytes as i64) will produce a wrong value if max_bytes > i64::MAX as u64. While unlikely in practice (this is called with 1 MB), a defensive check or i64::try_from(max_bytes) would be safer.
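A std-only sketch of the safer tail read the comment suggests. The `read_tail` name and the clamp-to-file-length behavior are assumptions for illustration, not the crate's actual helper:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Read at most `max_bytes` from the end of the file, converting the
/// offset with i64::try_from instead of `as`, and clamping to the file
/// length so the negative seek never goes before the start of the file.
fn read_tail(file: &mut std::fs::File, max_bytes: u64) -> io::Result<Vec<u8>> {
    let len = file.metadata()?.len();
    let tail = max_bytes.min(len);
    let offset = i64::try_from(tail)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "tail size exceeds i64"))?;
    file.seek(SeekFrom::End(-offset))?;
    let mut buf = Vec::with_capacity(tail as usize);
    file.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Demonstrate on a temp file containing 10 bytes, reading the last 4.
    let path = std::env::temp_dir().join("tail_demo.bin");
    std::fs::write(&path, b"0123456789")?;
    let mut f = std::fs::File::open(&path)?;
    let tail = read_tail(&mut f, 4)?;
    assert_eq!(tail, b"6789");
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Clamping to the file length also handles the case the original code misses: a file shorter than `max_bytes`, where `SeekFrom::End(-(max_bytes as i64))` would fail outright.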

Comment thread PORTING_REFERENCE.md
@@ -0,0 +1,746 @@
# Benchmark Tool — Rust Port Reference

This file is a byte-for-byte copy of crates/infra/benchmark/PORTING_REFERENCE.md. Having it at the repo root seems unintentional — the benchmark-specific porting reference should live only in the benchmark crate directory. Consider removing this duplicate.

Comment on lines +257 to +260
node_type: "base-reth-node".to_string(),
extra_args: vec![],
reth_bin: self.options.reth_bin.clone(),
builder_bin: self.options.builder_bin.clone(),

new_payload already sets self.base.head_block_hash, head_block_number, and head_block_timestamp internally (see consensus/mod.rs:197). This manual reassignment on line 259 is redundant — you can remove it.

prometheus_parse emits histogram _sum/_count as separate Counter samples.
Add BlockMetrics::compute_delta_averages() which computes (delta_sum / delta_count)
relative to the previous scrape and inserts the result as <base>_avg, matching
the Go runner's per-interval histogram average logic.
Comment on lines +291 to +324
async fn batch_send_txs(&self, txs: &[Bytes]) -> Result<(), BenchmarkError> {
const BATCH_SIZE: usize = 100;
let client = reqwest::Client::new();
let mut set = JoinSet::new();

for batch in txs.chunks(BATCH_SIZE) {
let batch: Vec<serde_json::Value> = batch
.iter()
.map(|tx| {
serde_json::json!({
"jsonrpc": "2.0",
"method": "eth_sendRawTransaction",
"params": [format!("0x{}", hex::encode(tx))],
"id": 1,
})
})
.collect();

let url = self.rpc_url.clone();
let c = client.clone();
set.spawn(async move {
match c.post(&url).json(&batch).send().await {
Ok(resp) => {
if let Ok(body) = resp.text().await {
tracing::debug!(response = %body, "batch_send_txs response");
}
}
Err(e) => tracing::warn!(error = %e, "batch_send_txs http error"),
}
});
}

while set.join_next().await.is_some() {}
Ok(())

batch_send_txs silently swallows all HTTP errors — it logs warnings but always returns Ok(()). If the RPC endpoint is down or every batch fails, propose will continue as if transactions were submitted, producing empty blocks without any indication of failure to the caller.

Consider propagating at least one error from the JoinSet, or returning the count of failed batches so the caller can decide whether to continue:

async fn batch_send_txs(&self, txs: &[Bytes]) -> Result<(), BenchmarkError> {
    const BATCH_SIZE: usize = 100;
    let client = reqwest::Client::new();
    let mut set = JoinSet::new();

    for batch in txs.chunks(BATCH_SIZE) {
        let batch: Vec<serde_json::Value> = batch
            .iter()
            .map(|tx| {
                serde_json::json!({
                    "jsonrpc": "2.0",
                    "method": "eth_sendRawTransaction",
                    "params": [format!("0x{}", hex::encode(tx))],
                    "id": 1,
                })
            })
            .collect();

        let url = self.rpc_url.clone();
        let c = client.clone();
        set.spawn(async move {
            c.post(&url).json(&batch).send().await.map(|_| ())
        });
    }

    let mut last_err = None;
    while let Some(result) = set.join_next().await {
        if let Ok(Err(e)) = result {
            tracing::warn!(error = %e, "batch_send_txs http error");
            last_err = Some(e);
        }
    }
    if let Some(e) = last_err {
        return Err(BenchmarkError::Client(format!("batch tx send failed: {e}")));
    }
    Ok(())
}

  impl Default for TransferPayload {
      fn default() -> Self {
-         Self { min_value: U256::from(1_000u64), max_value: U256::from(100_000u64) }
+         Self { min_value: U256::ZERO, max_value: U256::ZERO }

This changes TransferPayload::default() from sending 1,000–100,000 wei transfers to sending zero-value transfers for all users of the load-test crate, not just the benchmark runner. If any other caller relies on the load-test binary sending non-zero transfers (e.g. to verify balance changes or trigger specific EVM behavior), this silently breaks them.

If zero-value is only needed for benchmarking, consider making this configurable via the YAML config instead of changing the default.

Comment on lines +222 to +234
let mut ready = std::pin::pin!(worker.wait_until_ready());
loop {
tokio::select! {
biased;
result = &mut ready => {
result?;
break;
}
result = sequencer.propose(&mempool, block_time, gas_limit) => {
result?;
}
}
}

Cancellation safety issue: sequencer.propose() is not cancel-safe. When ready completes, the in-flight propose future is dropped. If propose was mid-way through (e.g., new_payload completed but the subsequent update_fork_choice(None) on line 260 hasn't run yet), the sequencer's internal state (head_block_hash, etc.) will be stale or inconsistent for the next propose call in the measured loop.

Consider running the setup propose to completion before re-checking readiness:

loop {
    tokio::select! {
        biased;
        result = &mut ready => {
            result?;
            break;
        }
        result = sequencer.propose(&mempool, block_time, gas_limit) => {
            result?;
            // Check readiness after each full block rather than mid-propose
        }
    }
}

Actually, the current structure already does this since propose is a single arm — but biased means ready is polled first, and if ready resolves exactly when propose has partially completed an await point, the propose future is cancelled. A safer pattern would be to poll ready only between blocks:

loop {
    if worker.is_ready() { break; }
    sequencer.propose(&mempool, block_time, gas_limit).await?;
}

Comment on lines +52 to +104
use prometheus_parse::Value;

let sum_keys: Vec<String> = self
.execution_metrics
.keys()
.filter(|k| k.ends_with("_sum"))
.cloned()
.collect();

for sum_key in sum_keys {
let base = &sum_key[..sum_key.len() - 4];
let count_key = format!("{base}_count");

let cur_sum = match self.execution_metrics.get(&sum_key).copied() {
Some(v) => v,
None => continue,
};
let cur_count = match self.execution_metrics.get(&count_key).copied() {
Some(v) => v,
None => continue,
};

let prev_sum = self
.prev_prometheus
.get(&sum_key)
.and_then(|s| if let Value::Counter(v) | Value::Gauge(v) | Value::Untyped(v) = s.value { Some(v) } else { None })
.unwrap_or(0.0);
let prev_count = self
.prev_prometheus
.get(&count_key)
.and_then(|s| if let Value::Counter(v) | Value::Gauge(v) | Value::Untyped(v) = s.value { Some(v) } else { None })
.unwrap_or(0.0);

let delta_count = cur_count - prev_count;
if delta_count > 0.0 {
let avg = (cur_sum - prev_sum) / delta_count;
if !avg.is_nan() {
self.execution_metrics.insert(format!("{base}_avg"), avg);
}
}
}
}

/// Update a metric from a new Prometheus sample, computing deltas for
/// Histogram and Summary types (deltaSum / deltaCount).
///
/// NaN results (e.g. from zero delta count) are silently skipped.
pub fn update_prometheus_metric(
&mut self,
name: &str,
current: &prometheus_parse::Sample,
) {
use prometheus_parse::Value;

Per project conventions (CLAUDE.md): use prometheus_parse::Value; appears inside function/method bodies at lines 52 and 104. These should be moved to the file-level imports at the top.

jwt_secret: alloy_rpc_types_engine::JwtSecret,
cfg: Arc<RollupConfig>,
) -> Result<Self, BenchmarkError> {
let dummy_l1: Url = "http://127.0.0.1:1".parse().unwrap();

Bare .unwrap() on a hardcoded URL parse. While this will never fail, the project convention is to avoid .unwrap() in non-test code. Use .expect() with a message:

Suggested change
- let dummy_l1: Url = "http://127.0.0.1:1".parse().unwrap();
+ let dummy_l1: Url = "http://127.0.0.1:1".parse().expect("valid hardcoded dummy L1 URL");

Comment on lines 139 to 142
let mut runner = LoadRunner::new(load_config.clone())?;
runner.set_config_summary(config_summary.clone());

// Install signal handler before any long-running work. First signal sets
// the stop flag so `run()` exits its loop gracefully and the drain sequence
// runs. A second signal force-exits.
let stop_flag = runner.stop_flag();

The stop_flag binding on line 142 is now unused between its creation and passing to run_test_phases. Previously, install_signal_handler(stop_flag) was called here — now that's moved inside run_test_phases. This leaves a blank line and the stop_flag binding dangling far from its use as a function argument. Minor, but worth cleaning up:

Suggested change
  let mut runner = LoadRunner::new(load_config.clone())?;
  runner.set_config_summary(config_summary.clone());
- // Install signal handler before any long-running work. First signal sets
- // the stop flag so `run()` exits its loop gracefully and the drain sequence
- // runs. A second signal force-exits.
  let stop_flag = runner.stop_flag();


github-actions Bot commented May 5, 2026

Review Summary

This PR adds a new base-benchmark crate (~5700 lines) that ports a Go benchmark runner to Rust, plus minor changes to the load-test binary. The overall structure is reasonable — clean module separation, proper error enum with thiserror, and async tokio-based orchestration. However, there are several correctness and safety issues that should be addressed before merge.

Critical Issues

# File Issue
1 metrics/mod.rs:155-158 histogram_sum_count is incorrect — extracts +Inf bucket count as "sum" and last bucket count as "count", making both the same value. Delta averages will always be ~1.0. (existing comment)
2 consensus/mod.rs:231-236 Wall-clock timestamp instead of incremental — uses SystemTime::now() instead of head_block_timestamp + block_time, causing non-reproducible results and potential node rejection. (existing comment)
3 client/mod.rs:309 Builder runs reth binary: BuilderClient uses reth_bin instead of builder_bin for the inner node client. (existing comment)
4 consensus/mod.rs:291-324 batch_send_txs swallows all errors — always returns Ok(()) even if every HTTP request fails, leading to silent empty blocks. (new)
5 runner/mod.rs:222-234 Cancel-unsafe tokio::select! in setup loop — dropping a partially-completed propose future can leave sequencer consensus state inconsistent. (new)

Moderate Issues

# File Issue
6 runner/mod.rs:388-395 .expect() on user-supplied prefund key — panics instead of returning BenchmarkError. (existing)
7 process.rs:63-70 Missing // SAFETY: comment on unsafe block. (existing)
8 proxy/mod.rs:63-68 Only intercepts single JSON-RPC requests; batch arrays bypass mempool. (existing)
9 params.rs:17-21 Batcher/prefund keys don't match PORTING_REFERENCE.md. (existing)
10 runner/mod.rs:259 Redundant head_block_hash reassignment after new_payload. (existing)
11 PORTING_REFERENCE.md Duplicate file at repo root. (existing)
12 load-tests/transfer.rs:31 Changing TransferPayload::default() to zero-value affects all load-test users, not just benchmark. (new)

Convention Issues

# File Issue
13 consensus/mod.rs:26,414-416 use statements inside function bodies. (existing)
14 metrics/mod.rs:52,104 use prometheus_parse::Value inside method bodies. (new)
15 output.rs:195 use std::io::Seek inside function body. (existing)
16 runner/mod.rs:386 use alloy_signer_local::PrivateKeySigner inside function body. (existing)
17 consensus/mod.rs:99 Bare .unwrap() on hardcoded URL parse — should be .expect(). (new)

Notes

  • The load-test change (moving install_signal_handler after funding) makes sense for benchmark integration — the runner reads "Accounts funded." from stdout to detect readiness.
  • The params.rs module is missing a //! doc comment, but it's not a mod.rs so this doesn't violate the specific CLAUDE.md rule.
  • The Cargo.toml dependency ordering doesn't follow the workspace's waterfall style, but this is a new crate so it can be cleaned up separately.
