feat: drop guards and cancellation wrappers to "fix" stuck pubkeys #1172
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
971-993: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Rollback currently tears down pre-existing subscriptions.
`setup_subscriptions()` treats every `Ok(())` as a newly created subscription, but `subscribe()` also returns `Ok(())` for pubkeys that were already watched and only got LRU-promoted. On a partial failure, this loop unsubscribes those long-lived watches too, which can drop unrelated updates. Track only subscriptions created by this call, or have `subscribe()` return `Created` vs `AlreadyWatching`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-chainlink/src/remote_account_provider/mod.rs` around lines 971 - 993, setup_subscriptions() currently treats every Ok(()) from subscribe() as a newly created watch and then unsubscribes all succeeded entries on partial failure, which tears down pre-existing subscriptions; change the behavior so only subscriptions actually created by this call are rolled back: either modify subscribe() to return a discriminated result (e.g., an enum like SubscribeResult::Created | SubscribeResult::AlreadyWatching) and treat only Created as eligible for unsubscribe in the error cleanup loop, or have setup_subscriptions() track which pubkeys were newly created (via a boolean flag returned or a separate created list) and call unsubscribe() only for those entries; adjust matching/code around unsubscribe and RemoteAccountProviderError handling accordingly.
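The discriminated result suggested above might look like the following minimal std-only sketch. The `u32` keys stand in for `Pubkey`, and `rollback_targets` is a hypothetical helper, not code from this PR:

```rust
// Hypothetical sketch: a discriminated subscribe result so the rollback
// path only tears down subscriptions this call actually created.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SubscribeResult {
    Created,
    AlreadyWatching, // pubkey was already watched; only LRU-promoted
}

// On partial failure, collect only the keys eligible for unsubscribe.
fn rollback_targets(results: &[(u32, SubscribeResult)]) -> Vec<u32> {
    results
        .iter()
        .filter(|(_, r)| *r == SubscribeResult::Created)
        .map(|(k, _)| *k)
        .collect()
}

fn main() {
    let results = vec![
        (1, SubscribeResult::Created),
        (2, SubscribeResult::AlreadyWatching),
        (3, SubscribeResult::Created),
    ];
    // Pre-existing watch (key 2) survives the rollback.
    assert_eq!(rollback_targets(&results), vec![1, 3]);
    println!("rollback targets: {:?}", rollback_targets(&results));
}
```

The same effect can be had without changing `subscribe()`'s signature by tracking newly created pubkeys in a separate list inside `setup_subscriptions()`.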
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 34-35: The 15s hardcoded PENDING_REQUEST_STALE_AFTER can evict
healthy in-flight fetches; change it so the dedup layer doesn't mark a live
fetch stale after a short fixed interval: replace the constant
PENDING_REQUEST_STALE_AFTER = Duration::from_secs(15) with a configurable
timeout (e.g., read from config/env or derived from the remote fetch timeout) or
increase it to a safe default (e.g., 2+ minutes), and update the dedup staleness
checks that reference PENDING_REQUEST_STALE_AFTER to use the new configurable
value (or to consult request progress/status before evicting) so live
fetch+clone operations aren't prematurely considered stale.
- Around line 1923-1941: The 5s timeout in the waiter loop is too short and
causes spurious ChainlinkError::Custom("timeout waiting...") failures; update
the timeout used in the tokio::time::timeout call (replace
Duration::from_secs(5)) to a much longer, configurable value (e.g.,
PENDING_REQUEST_TIMEOUT = Duration::from_secs(60) or higher) or remove the
timeout to await the receiver indefinitely; adjust the await_pending loop where
tokio::time::timeout(Duration::from_secs(5), receiver).await is invoked and
ensure the new constant or config is referenced so callers can tune it if needed
(keep handling of PendingRequestCompletion variants unchanged).
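Both findings above amount to "make the hardcoded duration tunable with a safe default". One way to sketch that, with the parameter source and the 60s default as assumptions (the real code would read e.g. a config field or an env var):

```rust
use std::time::Duration;

// Sketch: resolve a pending-request timeout from an optional override
// string, falling back to a default well above the remote retry budget.
fn pending_request_timeout(override_secs: Option<&str>) -> Duration {
    override_secs
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(Duration::from_secs(60)) // assumed safe default
}

fn main() {
    // Unset or unparsable overrides yield the default.
    assert_eq!(pending_request_timeout(None), Duration::from_secs(60));
    assert_eq!(pending_request_timeout(Some("bogus")), Duration::from_secs(60));
    // Callers can tune the budget without recompiling.
    assert_eq!(pending_request_timeout(Some("120")), Duration::from_secs(120));
    println!("timeout = {:?}", pending_request_timeout(None));
}
```

The resolved value would then replace both the `Duration::from_secs(15)` staleness cutoff and the `Duration::from_secs(5)` in the `tokio::time::timeout` call.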
In `@magicblock-chainlink/src/chainlink/fetch_cloner/pending_request_guard.rs`:
- Around line 13-17: PendingRequestCompletion::Success currently carries no
payload, which discards the owner's FetchAndCloneResult metadata; change the
enum so Success holds the owner's FetchAndCloneResult (e.g.,
Success(FetchAndCloneResult)), then update all sites that construct or match on
PendingRequestCompletion (in pending_request_guard.rs and any completion/notify
helpers) to pass through and propagate that FetchAndCloneResult, and update
fetch_and_clone_accounts_with_dedup() waiter-handling to extract and return the
carried FetchAndCloneResult instead of an empty result so waiters observe
not_found_on_chain / missing_delegation_record metadata.
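The payload-carrying variant could look like this sketch. The fields on `FetchAndCloneResult` are assumptions based on the names mentioned in the comment, not the actual struct:

```rust
// Sketch: Success carries the owner's result so waiters observe the
// same not_found_on_chain / missing_delegation_record metadata.
#[derive(Debug, Clone, PartialEq)]
struct FetchAndCloneResult {
    not_found_on_chain: Vec<String>,
    missing_delegation_record: Vec<String>,
}

#[derive(Debug)]
enum PendingRequestCompletion {
    Success(FetchAndCloneResult),
    Failed(String),
}

// What the waiter side of fetch_and_clone_accounts_with_dedup() would
// do: extract and return the carried result instead of an empty one.
fn waiter_result(c: PendingRequestCompletion) -> Result<FetchAndCloneResult, String> {
    match c {
        PendingRequestCompletion::Success(res) => Ok(res),
        PendingRequestCompletion::Failed(err) => Err(err),
    }
}

fn main() {
    let owner_result = FetchAndCloneResult {
        not_found_on_chain: vec!["SomePubkey".into()],
        missing_delegation_record: vec![],
    };
    let seen = waiter_result(PendingRequestCompletion::Success(owner_result.clone())).unwrap();
    // The waiter now sees exactly what the owner observed.
    assert_eq!(seen, owner_result);
    println!("waiter observed: {:?}", seen);
}
```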
In `@magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs`:
- Around line 3947-3963: The sleep is racy because it doesn't ensure task_waiter
registered as a waiter in fetch_cloner before aborting task_owner; replace the
fixed sleep with a short polling loop (with an overall timeout) that queries
fetch_cloner.pending_requests for the entry keyed by account_pubkey and verifies
the waiter count/state indicates the waiter is registered, then only call
task_owner.abort(); use symbols fetch_cloner, task_waiter,
fetch_and_clone_accounts_with_dedup, pending_requests and task_owner.abort in
the check and fail the test if the timeout elapses without observing the waiter.
In `@magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs`:
- Around line 411-418: The loop in wait_for_subscribe_attempts registers the
notified() future after checking subscribe_attempts, which can race; change the
pattern in wait_for_subscribe_attempts to create the waiter first (let notified
= self.subscribe_notify.notified()), then check
self.subscribe_attempts.load(AtomicOrdering::SeqCst) and only await the
previously created notified if the condition still isn't met; apply the same
change to the other similar loop that uses subscribe_attempts and
subscribe_notify so the waiter is always registered before re-checking the
condition.
In `@magicblock-chainlink/src/remote_account_provider/mod.rs`:
- Line 86: The stale cutoff is too short (const FETCHING_ACCOUNT_STALE_AFTER)
and can evict a still-active fetch() owner because fetch() can run ~24s (10 * 2s
timeouts + 9 * 400ms backoffs); increase FETCHING_ACCOUNT_STALE_AFTER to exceed
that retry budget (e.g., set to Duration::from_secs(30) or at least 25) so
in-flight owners in the fetching_accounts path are not prematurely considered
stale and evicted.
- Around line 237-245: The Drop implementation for FetchingAccountGuard
currently calls self.fetching_accounts.lock().unwrap(), which can panic if the
mutex is poisoned; change it to handle the Result from lock() safely (match or
map_or_else) so the drop path never panics: attempt to acquire the mutex with
self.fetching_accounts.lock() and on Ok(guard) remove the entry with
guard.remove(&self.pubkey), and on Err(poisoned) obtain the inner guard via
poisoned.into_inner() or otherwise handle the poisoned case (log the poison and
still attempt to remove the waiter) so the waiter notification path still runs;
reference FetchingAccountGuard::drop, fetching_accounts, and pubkey when
applying the fix.
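The panic-free drop path might look like this std-only sketch, with the map simplified to `HashMap<String, ()>` as a stand-in for the real entry type:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Sketch: a Drop impl that never panics on a poisoned mutex; it
// recovers the inner guard from the PoisonError instead of unwrapping.
struct FetchingAccountGuard {
    fetching_accounts: Arc<Mutex<HashMap<String, ()>>>,
    pubkey: String,
}

impl Drop for FetchingAccountGuard {
    fn drop(&mut self) {
        let mut map = match self.fetching_accounts.lock() {
            Ok(guard) => guard,
            // A poisoned lock still yields a usable guard; a real impl
            // would also log the poison before continuing.
            Err(poisoned) => poisoned.into_inner(),
        };
        map.remove(&self.pubkey); // waiter notification path still runs
    }
}

fn main() {
    let map = Arc::new(Mutex::new(HashMap::new()));
    map.lock().unwrap().insert("acct".to_string(), ());

    // Poison the mutex by panicking while holding the lock.
    let m2 = map.clone();
    let _ = std::thread::spawn(move || {
        let _g = m2.lock().unwrap();
        panic!("poison the lock");
    })
    .join();

    // Dropping the guard removes the entry despite the poisoned lock.
    drop(FetchingAccountGuard {
        fetching_accounts: map.clone(),
        pubkey: "acct".to_string(),
    });
    let inner = map.lock().unwrap_or_else(|p| p.into_inner());
    assert!(inner.is_empty());
    println!("drop path survived the poisoned mutex");
}
```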
In `@magicblock-chainlink/src/remote_account_provider/tests.rs`:
- Around line 40-60: The test drops the forward receiver `_forward_rx` before
returning from setup, which can cause sends on `forward_tx` to fail; modify the
setup/init function (where `forward_tx` and `_forward_rx` are created—e.g., in
`setup_provider()` or `init_remote_account_provider()`) to store the receiver on
`ProviderTestCtx` (add a field like `forward_rx` or `_fwd_rx`) and return that
context so the receiver lives for the full test lifetime; update
`ProviderTestCtx` and call sites to accept/ignore the stored receiver but keep
it alive.
- Around line 196-220: The test races because it sleeps a fixed 50ms before
aborting first_task_handle; instead, poll provider.fetching_accounts until the
entry for the pubkey has a non-empty waiters list to ensure second_task_handle
has registered as a waiter before aborting the owner. Locate the test code
around try_get_multi/second_task_handle and replace the fixed sleep with a
short-loop that reads provider.fetching_accounts (or its relevant map), breaks
when waiters.len() > 0 for that pubkey (with a timeout to avoid hangs), then
abort first_task_handle and proceed to _pubsub_client.release_subscribe().
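Both test fixes above replace a fixed sleep with a bounded polling loop. A generic std-only helper for that pattern, with the names and the 5ms poll interval as illustrative choices, could be:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

// Keep checking a condition until it holds or a deadline passes;
// returns whether the condition was eventually observed.
fn wait_until(timeout: Duration, mut cond: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if cond() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(5));
    }
}

fn main() {
    // Simulate another task registering as a waiter after a short delay.
    let waiters = Arc::new(AtomicUsize::new(0));
    let w = waiters.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        w.fetch_add(1, Ordering::SeqCst);
    });
    // Instead of a fixed 50ms sleep, poll until the waiter shows up.
    assert!(wait_until(Duration::from_secs(1), || waiters
        .load(Ordering::SeqCst)
        > 0));
    println!("waiter registered; safe to abort the owner task");
}
```

In the actual tests the closure would inspect `fetch_cloner.pending_requests` or `provider.fetching_accounts` for the pubkey's waiter entry, and the test would fail if `wait_until` returns false.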
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dc3c1b11-b8f0-472f-919e-9f1a8bc5e8a7
📒 Files selected for processing (7)
- magicblock-chainlink/src/chainlink/errors.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/pending_request_guard.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs
- magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs
- magicblock-chainlink/src/remote_account_provider/mod.rs
- magicblock-chainlink/src/remote_account_provider/tests.rs
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
803-890: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Only claimed pubkeys should subscribe and fetch.
Lines 874-890 still make every caller run `setup_subscriptions()` and `fetch()` for the full `pubkeys` list, even when this call only joined an existing `fetching_accounts` entry as a waiter. That means a waiter can win the race and resolve all receivers with its own `mark_empty_if_not_found`/`fetch_start_slot`/`program_ids`, or roll back a subscription it created even though another in-flight owner depends on it. Restrict owner side effects to the pubkeys inserted or stale-replaced by this call; waiter-only pubkeys should just await their oneshot result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-chainlink/src/remote_account_provider/mod.rs` around lines 803 - 890, The bug is that setup_subscriptions() and fetch() are being called for all pubkeys even when this call only joined as a waiter; change the loop around fetching.entry to track which pubkeys this call actually inserted or replaced (e.g., collect claimed_pubkeys and only push their (pubkey, receiver) into subscription_overrides and create FetchingAccountGuard for them), while for waiter-only pubkeys only register the receiver to be awaited (do not add to subscription_overrides or owner_guards); then call self.setup_subscriptions(&subscription_overrides).await? and self.fetch(claimed_pubkeys, ...) using only the claimed_pubkeys so side effects (setup_subscriptions, fetch, dismiss via owner_guards) are restricted to entries this call owns.
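The claimed-vs-waiter split described above can be sketched as follows. `String` keys stand in for `Pubkey`, and `partition_claims` is a hypothetical helper, not code from this PR:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Only pubkeys this call inserted get owner side effects
// (setup_subscriptions + fetch); the rest only await their oneshot.
fn partition_claims(
    fetching: &mut HashMap<String, ()>,
    pubkeys: &[String],
) -> (Vec<String>, Vec<String>) {
    let mut claimed = Vec::new(); // run subscribe/fetch for these
    let mut waiting = Vec::new(); // only await the shared completion
    for pk in pubkeys {
        match fetching.entry(pk.clone()) {
            Entry::Vacant(v) => {
                v.insert(());
                claimed.push(pk.clone());
            }
            Entry::Occupied(_) => waiting.push(pk.clone()),
        }
    }
    (claimed, waiting)
}

fn main() {
    let mut fetching = HashMap::new();
    fetching.insert("in-flight".to_string(), ());
    let keys = vec!["in-flight".to_string(), "fresh".to_string()];
    let (claimed, waiting) = partition_claims(&mut fetching, &keys);
    // Only the freshly inserted pubkey triggers owner side effects.
    assert_eq!(claimed, vec!["fresh".to_string()]);
    assert_eq!(waiting, vec!["in-flight".to_string()]);
    println!("owner side effects limited to: {:?}", claimed);
}
```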
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs`:
- Around line 294-344: The stale-entry replacement is unsafe because ownership
is only keyed by Pubkey; add a per-entry generation token to PendingRequestState
(e.g., a monotonic u64 or reuse created_at as a unique token) and store that
token in the PendingRequestGuard when you create Owner in claim_pending_request
(both the vacant and stale-replace paths). Update finish_pending_request (and
the PendingRequestGuard Drop) to check the map entry for the given pubkey and
only remove/complete it when the stored token matches the current map entry's
token; when evicting/replacing, increment/set a new token so old guards no
longer match. Apply the same token-checking pattern to the other path mentioned
(the code around 357-375) so removes only occur when the entry identity matches
the guard's token.
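The generation-token pattern suggested above can be sketched like this; the struct and method names are simplified stand-ins for `PendingRequestState`/`PendingRequestGuard`:

```rust
use std::collections::HashMap;

// Each claim gets a monotonic token; a guard may only remove the entry
// it actually owns, so a stale guard cannot evict a newer owner.
struct PendingEntry {
    generation: u64,
}

struct PendingMap {
    entries: HashMap<String, PendingEntry>,
    next_generation: u64,
}

impl PendingMap {
    fn new() -> Self {
        Self { entries: HashMap::new(), next_generation: 0 }
    }

    // Claim (or stale-replace) an entry; the returned token is what the
    // caller's guard must present later to finish it.
    fn claim(&mut self, pubkey: &str) -> u64 {
        self.next_generation += 1;
        let token = self.next_generation;
        self.entries
            .insert(pubkey.to_string(), PendingEntry { generation: token });
        token
    }

    // Remove only when the token matches the live entry's generation.
    fn finish(&mut self, pubkey: &str, token: u64) -> bool {
        match self.entries.get(pubkey) {
            Some(e) if e.generation == token => {
                self.entries.remove(pubkey);
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut map = PendingMap::new();
    let first = map.claim("acct");
    let second = map.claim("acct"); // stale-replace by a newer owner
    assert!(!map.finish("acct", first)); // old guard is a no-op
    assert!(map.finish("acct", second)); // current owner cleans up
    println!("stale guard could not evict the new owner");
}
```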
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 18a748ab-7401-4b28-8384-3bd34f6a80b6
📒 Files selected for processing (7)
- magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/pending_request_guard.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs
- magicblock-chainlink/src/chainlink/fetch_cloner/types.rs
- magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs
- magicblock-chainlink/src/remote_account_provider/mod.rs
- magicblock-chainlink/src/remote_account_provider/tests.rs
Add waiter reconciliation check to ensure accounts are in valid terminal state before accepting owner's result. If not, perform fresh fetch for that waiter pubkey only, making ownership/waiter race deterministic. Amp-Thread-ID: https://ampcode.com/threads/T-019de1e0-a654-777c-bdd6-264e5b7c7d53 Co-authored-by: Amp <amp@ampcode.com>
Adds two integration tests in magicblock-chainlink/tests/09_waiter_reconciliation_race.rs:
1. test_waiter_reconciliation_detects_valid_delegated_state() - Verifies that when an account is pre-populated in the bank as delegated at the current slot, multiple concurrent fetch requests all succeed with the valid terminal state accepted.
2. test_multiple_concurrent_requests_with_valid_delegated_state() - Verifies that three concurrent requests on the same pre-populated delegated account all succeed without unnecessary fresh fetches.
These tests exercise the full system-level behavior of waiter_reconciliation_check() from Step 1, demonstrating how concurrent tasks interact with the deduplication logic and how the reconciliation check ensures valid terminal states are accepted. Tests use the TestContext integration test harness with mocked RPC client, pubsub client, and accounts bank, following the pattern established by existing integration tests in the chainlink/tests directory.
Amp-Thread-ID: https://ampcode.com/threads/T-019de1ea-b5a6-742a-b14f-135943ef0b13 Co-authored-by: Amp <amp@ampcode.com>
GabrielePicco
left a comment
I do not think this should land in its current shape. The PR improves some abort paths, but the production stuck-pubkey failure mode is still mostly handled by passive stale timers. The main issue is that ownership is still tied to caller futures and unbounded awaits, rather than to an owned operation with a hard deadline and guaranteed completion.
The current 30s/60s/120s values are too high, but lowering them alone is not the right fix. A pending pubkey operation should be an owned task with a total budget, cancellation token, generation, and shared completion channel. The owner task should always remove the pending entry and notify waiters with Ok/Err/TimedOut. Stale eviction should be a diagnostic fallback, not a correctness mechanism.
I also think the dedup key is too weak: requests are deduped by pubkey even though mark_empty_if_not_found, min_context_slot/slot, (e.g. incoming request with higher min_context_slot) and fetch semantics affect the result (this may require a huge refactoring in the cloning, so I think we can accept pubkey as keys for now).
Instead of caller-owned pending entries plus stale timers, I believe a better approach is to:
- Insert `Pending { generation, deadline, waiters, cancel }`.
- Spawn exactly one owner task for that operation.
- Run the full lifecycle under `timeout(total_budget, ...)`.
- The owner task always removes `{pubkey, generation}` and sends a terminal result to every waiter.
- Waiters only await shared completion; their cancellation removes their waiter, not the owner.
- Manage subscriptions with per-pubkey state/refcount, not last-writer rollback tokens.
This design makes completion active and bounded by construction.
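A std-thread analogue of that design, purely as a sketch: one owner per pending operation runs under a total budget and always delivers a terminal result to every waiter. All names and durations are illustrative; the real code would use tokio tasks, a cancellation token, and `timeout(total_budget, ...)`:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
enum Terminal {
    Ok,
    TimedOut,
}

// Exactly one owner per pending operation; it always sends a terminal
// result to every registered waiter, so completion is active and
// bounded rather than depending on stale-entry timers.
fn spawn_owner(
    waiters: Vec<mpsc::Sender<Terminal>>,
    budget: Duration,
    simulated_work: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Run the "fetch" under a hard deadline.
        let result = if simulated_work <= budget {
            thread::sleep(simulated_work);
            Terminal::Ok
        } else {
            thread::sleep(budget);
            Terminal::TimedOut
        };
        // Removing the {pubkey, generation} entry is elided here; the
        // owner then notifies every waiter with the terminal result.
        for w in &waiters {
            let _ = w.send(result.clone());
        }
    })
}

fn main() {
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();
    spawn_owner(
        vec![tx1, tx2],
        Duration::from_millis(50),
        Duration::from_millis(10),
    )
    .join()
    .unwrap();
    assert_eq!(rx1.recv().unwrap(), Terminal::Ok);
    assert_eq!(rx2.recv().unwrap(), Terminal::Ok);

    // Over-budget work still yields a terminal result, never a stuck waiter.
    let (tx3, rx3) = mpsc::channel();
    spawn_owner(vec![tx3], Duration::from_millis(10), Duration::from_millis(100))
        .join()
        .unwrap();
    assert_eq!(rx3.recv().unwrap(), Terminal::TimedOut);
    println!("all waiters received terminal results");
}
```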
* master: chore(ci): cut integration test wall-clock (#1178)
Summary
Make repeated per-pubkey coordination cancellation-safe by adding guard-based cleanup and stale-entry recovery to both the outer `pending_requests` queue (FetchCloner) and the inner `fetching_accounts` queue (RemoteAccountProvider). This prevents pubkeys from getting stuck when owner tasks are cancelled during subscription setup or due to edge cases. This should fix the issue we saw where specific pubkeys time out when fetched via `getAccountInfo()`.
Details
This implementation adds the "smallest effective change first" cancellation-safety rollout, addressing the core race condition where dropped owner tasks leave stale entries in coordination maps that block subsequent operations on the same pubkey.
FetchCloner Cancellation Safety (fetch_cloner)
New Module: `pending_request_guard.rs`
- `PendingRequestGuard` for the outer dedup queue
- `PendingRequestCompletion::Failed`

Changes to `fetch_cloner/mod.rs`:
- `claim_pending_request()` now returns `PendingRequestClaim::Owner` or `PendingRequestClaim::Waiter`
- `finish_pending_request()` cleans up entries on successful completion
- `fetch_and_clone_accounts_with_dedup()` rewritten to:

RemoteAccountProvider Cancellation Safety (remote_account_provider)
Changes to `remote_account_provider/mod.rs`:
- `FetchingAccountGuard` now with explicit Drop cleanup
- `try_get_multi()` insertion loop now:
- stale `fetching_accounts` entries (>15s old)
- `AccountResolutionsFailed`

Observability
Both modules now log lifecycle events: