Skip to content

Eiger pallet final#559

Open
illuzen wants to merge 13 commits into
mainfrom
illuzen/eiger-pallet-final
Open

Eiger pallet final#559
illuzen wants to merge 13 commits into
mainfrom
illuzen/eiger-pallet-final

Conversation

@illuzen
Copy link
Copy Markdown
Contributor

@illuzen illuzen commented May 8, 2026

Addressing some final issues:

REVERSIBLE-F-01 — set_high_security
No exit path has been added. recover_funds still does not clear HighSecurityAccounts or GuardianIndex. The team documented this permanence as intentional, so we understand that the risk is accepted.

Added documentation that recover_funds should be callable in perpetuity

REVERSIBLE-F-03 — orphaned holds
recover_funds now skips failed releases instead of aborting, which prevents new orphaned holds from being created. However, holds that were already detached from PendingTransfer metadata still have no repair path. No force-release extrinsic was added. One possible improvement would be to add a root-gated force_release_hold(account, asset, amount) extrinsic that can release holds directly.

This seems to only be possible due to migration, which we are skipping for testnet, as it would add code that wouldn't run on mainnet.

The following informational findings remain unaddressed: TREASURY-O-03, TREASURY-O-04, RUNTIME-O-01

Addressed the first two, the third was already addressed in a previous PR.


Note

High Risk
Touches core runtime pallets for multisig execution, high-security fund recovery, and block reward minting; logic/weight changes can affect funds movement, fee burning, and proposal execution constraints if misconfigured.

Overview
Strengthens runtime safety around multisig proposals/execution, high-security recovery, and block reward minting.

pallet-multisig now validates and decodes proposed calls up front, enforces a configurable MaxInnerCallWeight, recalculates/refunds weights based on actual call size and HS/non-HS path, and re-checks high-security whitelists and weight limits again at execute time (including after HS is enabled or limits change). It also revises proposal accounting/limits and benchmarking to reflect the updated behaviors.

pallet-reversible-transfers updates recover_funds to skip transfers whose hold release fails (emitting TransferRecoveryFailed) instead of aborting, preserving metadata for later manual cancellation; it also adds explicit docs that set_high_security is intentionally permanent and recovery should remain callable indefinitely.

pallet-mining-rewards adjusts end-of-block reward distribution to account for burned fees in supply calculations, introduces a minting “from” account for proof recording, and adds explicit fallback/error events when minting to the miner/treasury fails (including redirect-to-treasury and a critical TreasuryMintFailed signal).

Reviewed by Cursor Bugbot for commit 5866a1f. Bugbot is set up for automated code reviews on this repo. Configure here.

@n13
Copy link
Copy Markdown
Collaborator

n13 commented May 14, 2026

I have enough context now. Let me give you a focused review of PR #559 (Eiger pallet final).

Overview

Small (≈109 / -38), focused finalization PR responding to remaining Eiger audit items across five pallets. Most of the diff is doc improvements; the substantive logic changes are localized and defensible.

File Type of change
pallets/mining-rewards/src/lib.rs New TreasuryMintFailed event in two existing error paths
pallets/multisig/src/lib.rs Re-check inner call weight at execute time (belt-and-suspenders)
pallets/reversible-transfers/src/lib.rs Docs + defensive invariant check + remove 2 unused errors
pallets/scheduler/src/lib.rs ok_ordefensive_ok_or in do_cancel_retry
pallets/treasury/src/lib.rs Documentation only
Two test files Stale comment cleanup

What looks good

  • mining-rewards: TreasuryMintFailed { reward } is deposited in exactly the two spots where the existing log::error! already fired (catastrophic redirect failure and no-miner treasury mint failure). On-chain visibility for a fund-loss scenario that was previously only logged. Clean change, fail-loud compatible.
  • multisig execute re-check: The pre-dispatch weight annotation is WeightInfo::execute(MaxCallSize) + MaxInnerCallWeight. The new check at pallets/multisig/src/lib.rs:1071-1077 enforces current_call_weight <= MaxInnerCallWeight so the post-dispatch fallback info.actual_weight.unwrap_or(current_call_weight) is provably bounded by what was reserved. The scenario this guards — MaxInnerCallWeight lowered via runtime upgrade between propose and execute — is real and worth defending.
  • reversible-transfers recover_funds: The let-else continue + defensive! at pallets/reversible-transfers/src/lib.rs:572-579 is the right shape: storage inconsistency between PendingTransfersBySender and PendingTransfers is a true invariant, not a user error, and emitting defensive! then skipping (rather than aborting the whole call) preserves the "skip-failed, continue-others" property the PR is trying to maintain. Removing RecovererCannotBeSelf / CallDecodingFailed is safe — git grep across the PR branch shows no remaining references.
  • Docs: The added paragraphs on permanence of HS mode + repeat-callable recover_funds (pallets/reversible-transfers/src/lib.rs:360-364, 532-548) and the expanded treasury module docs are a meaningful improvement and directly justify the F-01 risk acceptance.

Issues to consider

1. Scheduler defensive_ok_or covers a user-input path (minor correctness)

	fn do_cancel_retry(
		origin: &T::PalletsOrigin,
		(when, index): TaskAddressOf<T>,
	) -> Result<(), DispatchError> {
		let agenda = Agenda::<T>::get(when);
		// When called via cancel_retry_named, the task address comes from Lookup, so
		// the Agenda entry should exist. Use defensive check for Lookup/Agenda sync.
		let scheduled = agenda
			.get(index as usize)
			.and_then(Option::as_ref)
			.defensive_ok_or(Error::<T>::NotFound)?;

The comment's premise only holds for one of the two callers. cancel_retry_named looks up the address from Lookup (premise valid) — but cancel_retry at pallets/scheduler/src/lib.rs:617 takes a TaskAddressOf<T> directly from the caller. ScheduleOrigin (root/fellowship) supplying a stale or wrong (when, index) is user error, not an invariant violation; defensive_ok_or will produce a misleading "INVARIANT VIOLATION" log line and panic in debug / try-runtime builds for what's a legitimate NotFound.

Suggested fix: either keep .ok_or(Error::<T>::NotFound)? here and move the defensive check inside cancel_retry_named (after Lookup hit, before calling do_cancel_retry), or split into two helpers. The on-chain result is unchanged either way; this is purely about not crying wolf.

2. No tests for the new failure-path events / re-check

  • TreasuryMintFailed, TransferRecoveryFailed, and the CallWeightExceedsLimit-at-execute branch are not exercised by tests (git grep finds them only in the implementing pallets). The multisig case is the most worth covering, since the explicit motivation is a runtime-upgrade scenario — a tiny mock with two different MaxInnerCallWeight values would pin the behavior down.

3. Stale comment in multisig (cosmetic)

			// Calculate actual weight: bookkeeping + inner call's actual weight
			// Use current_call_weight (computed at execute time) as fallback when post-dispatch
			// info is unavailable. This is more accurate than the stored weight since
			// dispatch_info could have changed via runtime upgrade.

If proposal.call_weight is no longer consulted on the execute path, consider noting (here or in ProposalData) that the stored call_weight is now only used at propose-time validation. Not blocking, just keeps reader expectations aligned.

On the audit findings the PR explicitly does not fully close

  • REVERSIBLE-F-01 (no HS exit): permanence is now justified in code docs as protection for future inbound funds — good narrative, but the absence of a force_disable_high_security extrinsic still implies any account that turns this on is permanently restricted. Accepted risk; docs make it visible to users.
  • REVERSIBLE-F-03 (pre-existing orphan holds): no force_release_hold. The PR correctly stops creating new orphans by skipping-not-aborting in recover_funds. Leftover orphans from prior versions remain unreachable — accepted since no testnet→mainnet migration is planned.

Verdict

Approve with two small asks:

  1. Narrow the defensive_ok_or in scheduler::do_cancel_retry to the cancel_retry_named path only.
  2. Add a minimal test for the multisig execute-time CallWeightExceedsLimit branch, since the whole justification for the change is a future runtime upgrade.

Everything else is doc/cleanup and looks fine. Nothing in here changes the funds-critical happy paths; the new code is purely defensive logging and clearer error surfaces.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Comment thread pallets/multisig/src/lib.rs
@illuzen illuzen requested a review from n13 May 15, 2026 07:15
@n13
Copy link
Copy Markdown
Collaborator

n13 commented May 21, 2026

Now I have everything needed. Here is my comprehensive review:

PR #559 Review — Security & Soundness

Overall this is a careful "audit-cleanup" PR — mostly docs, defensive logging, and an extra weight re-check. Most of it is solid hardening. But there's one critical issue and a few minor ones.


🔴 Critical — Missing storage migration for ProposalData

In pallets/multisig/src/lib.rs, the ProposalData struct's call_weight: Weight field is removed, and STORAGE_VERSION is bumped from 01:

#[derive(Encode, Decode, MaxEncodedLen, Clone, TypeInfo, RuntimeDebug, PartialEq, Eq)]
pub struct ProposalData<AccountId, Balance, BlockNumber, BoundedCall, BoundedApprovals> {
	/// Account that proposed this transaction
	pub proposer: AccountId,
	/// The encoded call to be executed
	pub call: BoundedCall,
	/// Expiry block number
	pub expiry: BlockNumber,
	/// List of accounts that have approved this proposal
	pub approvals: BoundedApprovals,
	/// Deposit held for this proposal (returned only when proposal is removed)
	pub deposit: Balance,
	/// Current status of the proposal
	pub status: ProposalStatus,
}
	/// The in-code storage version.
	///
	/// This establishes an explicit baseline for future storage migrations.
	/// Increment this and add a migration hook when storage layout changes.
	///
	/// Version history:
	/// - 0: Initial version
	/// - 1: Removed `call_weight` field from ProposalData (weight is recomputed at execute time)
	const STORAGE_VERSION: StorageVersion = StorageVersion::new(1);

But no OnRuntimeUpgrade migration is implemented, and the runtime's Executive has an empty migrations tuple:

/// Executive: handles dispatch to the various modules.
pub type Executive = frame_executive::Executive<
	Runtime,
	Block,
	frame_system::ChainContext<Runtime>,
	Runtime,
	AllPalletsWithSystem,
	(),
>;

What will happen on upgrade

Substrate's StorageMap::get swallows SCALE decode failures and returns None after logging an error (Corrupted state at ...):

pub fn get<T: Decode + Sized>(key: &[u8]) -> Option<T> {
    sp_io::storage::get(key).and_then(|val| {
        Decode::decode(&mut &val[..]).map(Some).unwrap_or_else(|e| {
            log::error!(target: "runtime::storage", "Corrupted state at ...", ...);
            None

So for every Proposals entry that was written under the old layout (which contains an extra 16 bytes of call_weight between call and expiry):

  1. Every read via Proposals::<T>::get(...) returns None.
  2. execute, approve, cancel, remove_expired, and claim_deposits all early-exit with ProposalNotFound.
  3. cleanup_expired_proposals_for_signer uses iter_prefix and silently filter_maps the broken entries away.
  4. The proposer's deposit stays reserved in pallet_balances forever — no extrinsic can release it because every release path requires decoding the proposal first.
  5. proposals_per_signer and proposal_nonce in MultisigData are never decremented/cleared either — proposers can be permanently pushed toward TooManySigners.

This is a silent, permanent fund loss for any in-flight proposal at the moment of upgrade — exactly the kind of behavior that violates the project's own "fail early, never silently" principle.

Why "testnet only" is not a defense for this change

The PR description's "we are skipping [migration] for testnet, as it would add code that wouldn't run on mainnet" justification is about REVERSIBLE-F-03 (orphaned holds), not about this storage change. The treasury multisig is configured in genesis on every preset (development, heisenberg, planck), and the treasury workflow is precisely where proposals get created and may be in-flight during a runtime upgrade. Any pending or expired-not-yet-claimed proposal at upgrade time is bricked.

Recommended fixes (pick one)

  • Preferred — write a v0→v1 migration in pallets/multisig/src/migrations/v1.rs that iterates Proposals, decodes as the old ProposalDataV0 (with call_weight), drops the field, and re-inserts. Then wire it into the runtime via frame_support::migrations::VersionedMigration and put it in the Migrations tuple of Executive.
  • If you're truly certain Proposals is empty on every chain being upgraded — at minimum add a try_runtime pre_upgrade check that asserts Proposals::iter_keys().next().is_none(), plus an OnRuntimeUpgrade::on_runtime_upgrade that calls STORAGE_VERSION.put::<Pallet<T>>() and Proposals::<T>::clear(u32::MAX, None) to wipe whatever might exist. Document why this is safe.
  • Defer the field removal until after a network reset, since the actual runtime savings of removing one Weight field per proposal is negligible.

A test that snapshots a v0 Proposals entry and exercises the migration end-to-end should accompany whichever path you choose.


🟡 Medium — Double Agenda::<T>::get(when) in cancel_retry_named

let (when, index) = Lookup::<T>::get(id).ok_or(Error::<T>::NotFound)?;
// Lookup succeeded, so Agenda should have the task. Defensive check for sync.
let agenda = Agenda::<T>::get(when);
agenda
    .get(index as usize)
    .and_then(Option::as_ref)
    .defensive_ok_or(Error::<T>::NotFound)?;
let task = (when, index);
Self::do_cancel_retry(origin.caller(), task)?;

do_cancel_retry immediately performs let agenda = Agenda::<T>::get(when); again internally, so the agenda vec is fetched (and decoded) twice per call. Not a correctness bug, but the per-block agenda can be large, and a redundant storage read on a privileged extrinsic is wasted weight that is not reflected in the WeightInfo.

Two clean options:

  1. Move the defensive check inside do_cancel_retry, right after its existing Agenda::<T>::get(when), e.g. .defensive_ok_or(...) instead of .ok_or(...). Then both cancel_retry and cancel_retry_named benefit and there's no duplicate read.
  2. Make do_cancel_retry accept a (Vec<Option<Scheduled>>, index) so the caller can hand it the already-fetched agenda.

Either avoids the double read while keeping the invariant signal.


🟡 Low — Failed scheduler cancellation in recover_funds can race the scheduler

// Release succeeded - now remove metadata and cancel scheduler
PendingTransfers::<T>::remove(tx_id);

if let Some(id) = Self::make_schedule_id(tx_id).ok() {
    if let Err(e) = T::Scheduler::cancel_named(id) {
        log::warn!(
            "Failed to cancel scheduled task for tx {:?}: {:?} (funds already released)",
            tx_id,
            e
        );
    }
}

num_cancelled = num_cancelled.saturating_add(1);
Self::deposit_event(Event::TransactionCancelled {
    who: who.clone(),
    tx_id: *tx_id,
});

This path is safe because do_execute_transfer re-reads PendingTransfers::<T>::get(tx_id) and returns PendingTxNotFound when the entry has already been removed (so funds can't double-spend). But the cancellation failure is logged at warn and the function still increments num_cancelled and emits TransactionCancelled — both of which now claim "this transfer was cancelled" while the scheduler is going to invoke a now-stale task at delay time.

Consider one of:

  • Emit a distinct event (ScheduledTaskCancellationFailed { tx_id }) so off-chain indexers can flag the dangling scheduler entry.
  • Promote the log::warn! to defensive! for cancel_named failures here, since by construction (we have a valid tx_id whose pending transfer existed a moment ago) this is an invariant violation.
  • Two make_schedule_id(tx_id).ok() — using if let Some(id) = ... .ok() here treats a "schedule id derivation failed" as silently skipping cancellation. make_schedule_id only fails on encoding overflow which is essentially impossible for T::Hash, but if it ever does fail you'll permanently leave a scheduled task in place. Worth a defensive!.

🟢 Things that look good

  • Execute-time weight re-check (pallets/multisig/src/lib.rs:1071-1078) correctly closes the window where a runtime upgrade lowers MaxInnerCallWeight between propose and execute, and the corresponding test execute_rejects_call_when_max_weight_lowered_after_propose exercises it via the new DynamicMaxInnerCallWeight thread-local. Nice.
  • CEI ordering in execute: proposal removal + deposit return happen before call.dispatch, with the inner call running in its own nested storage layer so its failure cannot revert the deposit return. Good.
  • actual_weight.unwrap_or(current_call_weight) is provably bounded by the reserved MaxInnerCallWeight because of the new pre-dispatch check. Weight accounting stays sound.
  • TreasuryMintFailed event in pallets/mining-rewards/src/lib.rs:262-292 adds on-chain visibility to what was previously only log::error!. Aligns with the "fail loudly" principle for a fund-loss scenario.
  • The let-else { defensive!(...); continue; } pattern at pallets/reversible-transfers/src/lib.rs:572-579 is the right shape — invariant violation, not user error, and post-loop retain self-heals the orphaned sender-index entry.
  • Narrowing the defensive check to cancel_retry_named (per the previous review) was the right call.

🟢 Nits

  • README drift: pallets/multisig/README.md:319 still shows call_weight: Weight as a field of ProposalData and lines 489/649/651 describe the old "stored call_weight" semantics. Worth a quick scrub now while it's fresh.
  • Stale comments: The propose block at pallets/multisig/src/lib.rs:585-592 calls let call_weight = decoded_call.get_dispatch_info().call_weight; — perfectly fine, but it's no longer stored. A one-liner clarifying "validated here at propose-time and recomputed at execute-time; not persisted" would help future readers.
  • Test coverage gap: TreasuryMintFailed (mining-rewards) and TransferRecoveryFailed (reversible-transfers) are new events surfaced by changes in this PR but neither is asserted by a test. Given mining-rewards is funds-critical, even a minimal mock test that forces Currency::mint_into to error would pin the behavior.

Summary

Severity Issue
Critical Removing ProposalData::call_weight + STORAGE_VERSION bump without a migration silently bricks any existing proposal and locks deposits forever
Medium cancel_retry_named reads Agenda twice
Low cancel_named failure in recover_funds still emits TransactionCancelled and only log::warn! — consider a distinct event or defensive!
Nit README, stale comment in propose, missing tests for the new failure-path events

The execute-time hardening, CEI ordering, and observability improvements are clean wins. The blocker before merge (or at least before a runtime upgrade that touches a chain with any existing multisig proposals) is the multisig storage migration. Happy to draft the v0→v1 migration if you'd like — it's straightforward.

@n13
Copy link
Copy Markdown
Collaborator

n13 commented May 21, 2026

Follow up on migration - we don't have any proposals... isn't that good enough?

Mostly yes — that removes the immediate "stuck deposits" scenario, but a few subtler concerns remain. Let me walk through them.

What zero proposals actually mitigates

If Proposals is empty on every chain you're upgrading at the moment the upgrade lands:

  • No SCALE decode attempts → no silent None returns
  • No bricked deposits, no inflated proposals_per_signer, no stuck proposal_nonce

So the fund-loss vector goes away. That's the critical part.

What's still a problem even with zero proposals

1. Race window between "we checked" and "upgrade applied"

The treasury multisig is in every preset (planck_treasury_account, heisenberg_treasury_account, dev). Any signer can call Multisig::propose between when you check storage and when the runtime upgrade is enacted. With governance latency (referenda enactment, sudo timing, etc.) that window can be hours to days. Even one proposal in flight = one bricked deposit.

The fix is to make the assumption enforced rather than assumed — see below.

2. In-code vs on-chain storage version will permanently desync

#[pallet::storage_version(...)] sets the in-code version. The on-chain pointer only moves when something calls STORAGE_VERSION.put::<Pallet<T>>(), which normally happens inside OnRuntimeUpgrade::on_runtime_upgrade. With no migration, after this PR ships:

  • In code: 1
  • On-chain: 0 (forever)

This bites you next time you add a v2 migration. The natural pattern is VersionedMigration<1, 2, ...> which gates on on-chain version being 1 — it won't fire, because on-chain is still 0. You'd then have to write a VersionedMigration<0, 2, ...> that handles "could be v0-layout or already-empty-because-v1-shipped" — annoying and error-prone.

3. It normalizes "bump STORAGE_VERSION without a migration"

This is a fragile precedent for a chain that explicitly lists "fail loudly" as a core principle. The next person who removes a field from ProposalData will look at this commit and conclude no migration is needed.

Minimum hygiene I'd ask for before merge

A trivial OnRuntimeUpgrade that:

  1. Asserts emptiness in pre_upgrade (try-runtime CI catches a violated assumption before it hits a real chain).
  2. Bumps the on-chain version so future migrations have a correct baseline.
  3. Defensively clears Proposals in case the assumption is wrong on a chain you didn't check — better to wipe one proposal you missed than to silently brick it.

Something like:

pub struct MigrateV0ToV1<T>(PhantomData<T>);

impl<T: Config> OnRuntimeUpgrade for MigrateV0ToV1<T> {
    fn on_runtime_upgrade() -> Weight {
        let on_chain = Pallet::<T>::on_chain_storage_version();
        if on_chain != 0 {
            return T::DbWeight::get().reads(1);
        }
        let cleared = Proposals::<T>::clear(u32::MAX, None);
        if cleared.unique > 0 {
            log::error!(
                target: "multisig",
                "v0->v1 migration cleared {} proposal(s); deposits remain reserved",
                cleared.unique,
            );
        }
        STORAGE_VERSION.put::<Pallet<T>>();
        T::DbWeight::get().reads_writes(
            cleared.unique as u64 + 1,
            cleared.unique as u64 + 1,
        )
    }

    #[cfg(feature = "try-runtime")]
    fn pre_upgrade() -> Result<Vec<u8>, sp_runtime::TryRuntimeError> {
        ensure!(
            Proposals::<T>::iter().next().is_none(),
            "v0->v1 assumption violated: Proposals is non-empty"
        );
        Ok(Vec::new())
    }
}

And wire it into Executive's migrations tuple (currently ()).

If clearing-on-mismatch feels too aggressive, swap the clear for a defensive! + leave the entries — but then at least the version bump is real and you'll loudly notice the bricked entries instead of silently shipping them.

One thing to double-check on your end

You said "zero proposals on testnet" — single chain or all of them? Worth a 1-minute polkadot.js query against each live network you intend to upgrade (planck, heisenberg, anything internal) of multisig.proposals.entries(). If you can paste the storage query results in the PR description, that closes the loop nicely for reviewers.

So: yes, the empty-state observation downgrades this from "critical, do not merge" to "merge-able with a small no-op migration + a try-runtime guard." The migration is ~30 lines, doesn't add code that won't run (it runs once, on the v0→v1 transition, and is a noop afterward), and keeps you out of the version-drift trap.

@n13
Copy link
Copy Markdown
Collaborator

n13 commented May 21, 2026

I think this looks good after we make sure we don't bork the testnet msig with this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants