Use BestBlock for chain state serialization (and somewhat parallelize init) #4266
TheBlueMatt wants to merge 12 commits into lightningdevkit:main
Conversation
|
👋 I see @jkczyz was un-assigned. |
9fc766e to
4efd4f8
Compare
|
Oops, I forgot about this PR lol. Rebased, it should be ready for review! |
4efd4f8 to
f54a087
Compare
joostjager
left a comment
No major findings, although this part of the code is completely new to me, so I might miss things.
The PR could have been split in two: one with the reorg fix, and another with the perf optimizations. They are closely related, though.
A final comment is whether the name BestBlock is still descriptive enough.
| /// | ||
| /// These ensure we can find the fork point of a reorg if our block source no longer has the | ||
| /// previous best tip after a restart. | ||
| pub previous_blocks: [Option<BlockHash>; ANTI_REORG_DELAY as usize * 2], |
This const really can't be changed, because it would break serialization. Does that need to be noted here?
ANTI_REORG_DELAY*2 shows up in various locations, maybe make it a const too
Indeed. I noted this by simply dropping the constant reference in the serialization logic so that any constant change will be a compilation failure.
`ANTI_REORG_DELAY as usize * 2` is still present multiple times, though.
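A minimal sketch of the suggestion, using a stand-in value for `ANTI_REORG_DELAY` (the real constant lives in LDK's `channelmonitor` module) and a hypothetical `PREV_BLOCKHASH_COUNT` name: naming the repeated `ANTI_REORG_DELAY as usize * 2` expression once means every use stays in sync, and a change to the delay is a single edit that visibly affects the serialized form.

```rust
// Sketch only: `ANTI_REORG_DELAY` is a stand-in value here, and
// `PREV_BLOCKHASH_COUNT` is a hypothetical name for the shared constant.
const ANTI_REORG_DELAY: u32 = 6;

/// Length of `BestBlock::previous_blocks`. Changing this (or
/// `ANTI_REORG_DELAY`) changes the serialized form, so every user of the
/// array length should reference this one constant.
const PREV_BLOCKHASH_COUNT: usize = ANTI_REORG_DELAY as usize * 2;

struct BestBlock {
    previous_blocks: [Option<[u8; 32]>; PREV_BLOCKHASH_COUNT],
}

fn main() {
    let best = BestBlock { previous_blocks: [None; PREV_BLOCKHASH_COUNT] };
    // The field type and any serialization loop now share one length.
    assert_eq!(best.previous_blocks.len(), PREV_BLOCKHASH_COUNT);
}
```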
| for hash_opt in self { | ||
| match hash_opt { | ||
| Some(hash) => hash.write(w)?, | ||
| None => ([0u8; 32]).write(w)?, |
Is there a reason to not use the standard Option serialization and go with the magic all-zeroes instead?
Less read branching felt worth it; given something had to be implemented anyway, it seemed simpler.
I'd go for the standard solution here. Interested to hear 2nd reviewer opinion.
The "standard solution" here just being to read/write the Option<[u8; 32]> directly in the loop rather than matching? Not sure its worth more branching to save 5 LoC. In general, stream-based deserialization logic is super branch-intensive and generally kinda slow because of it.
I mean avoiding the magic value. Not a blocker, but I do think it is those small things that influence reader overhead.
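To make the trade-off concrete, here is a hedged sketch of the two encodings being debated; plain `Vec<u8>` buffers and bare `[u8; 32]` hashes stand in for LDK's actual `Writeable` machinery.

```rust
// The all-zero sentinel keeps every record exactly 32 bytes, so a reader
// can consume the whole array without per-entry branching...
fn write_sentinel(hash: &Option<[u8; 32]>, out: &mut Vec<u8>) {
    match hash {
        Some(h) => out.extend_from_slice(h),
        None => out.extend_from_slice(&[0u8; 32]), // magic "no hash" value
    }
}

// ...while the tag-prefixed Option form avoids the magic value, at the
// cost of a variable-length record and a branch on every read.
fn write_tagged(hash: &Option<[u8; 32]>, out: &mut Vec<u8>) {
    match hash {
        Some(h) => {
            out.push(1);
            out.extend_from_slice(h);
        },
        None => out.push(0),
    }
}

fn main() {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    write_sentinel(&None, &mut a);
    write_tagged(&None, &mut b);
    assert_eq!(a.len(), 32); // fixed size whether Some or None
    assert_eq!(b.len(), 1); // size depends on the value
}
```

The sentinel approach relies on the all-zero value never appearing as a real block hash, which is effectively certain for 32-byte double-SHA-256 block hashes.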
| #[cfg(not(c_bindings))] | ||
| use lightning::util::async_poll::MaybeSend; | ||
| use lightning::util::logger::Logger; | ||
| #[cfg(not(c_bindings))] |
| @@ -0,0 +1 @@ | |||
| ../../lightning/src/util/async_poll.rs No newline at end of file | |||
Does this symlinking work on Windows? I can't believe that it is necessary to use this hack. I think it can be called a hack?
AFAIU yes it does. Git used to default to not allowing it, but I think that has since changed. We already have other symlinks in the repo for our lockorder stuff and maybe crypto.
Saw that indeed. I don't think it is ideal, but apparently that's what rust wants you to do...
| .connect_blocks(common_ancestor, most_connected_blocks, &mut chain_poller) | ||
| .await | ||
| .map_err(|(e, _)| e)?; | ||
| while !most_connected_blocks.is_empty() { |
Isn't the parallel fetching useful in ChainNotifier.connect_blocks too?
Shouldn't be all that useful/important. At runtime we should only get a block every ~ten minutes so having two come in at once such that we need to parallel fetch should be pretty rare. Technically on startup we may not do a full sync and may just sync to a common tip rather than the current tip, making the first sync slower, but (a) no one does that today afaik and (b) connecting blocks marginally more slowly at runtime/immediately after startup isn't really a problem, and may even be a good thing by reserving CPU cycles for other tasks.
|
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
f54a087 to
6bd7c74
Compare
|
✅ Added second reviewer: @jkczyz |
|
🔔 1st Reminder Hey @jkczyz! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @jkczyz! This PR has been waiting for your review. |
|
🔔 3rd Reminder Hey @jkczyz! This PR has been waiting for your review. |
|
🔔 4th Reminder Hey @jkczyz! This PR has been waiting for your review. |
|
🔔 5th Reminder Hey @jkczyz! This PR has been waiting for your review. |
6bd7c74 to
cf994a3
Compare
|
Rebased and added an MSRV fix. @jkczyz if you don't have time to look at this PR can you request a different reviewer? |
a8beeae to
2454f0c
Compare
| .find_difference_from_best_block(best_header, old_best_block, &mut chain_poller) | ||
| .await?; | ||
| if difference.common_ancestor.block_hash != old_best_block.block_hash { | ||
| chain_notifier.disconnect_blocks(difference.common_ancestor); |
When find_difference_from_best_block resolved a fallback from previous_blocks (because the block source couldn't find old_best_block.block_hash), the common_ancestor could be the fallback block itself (if it's on the best chain). In that case, disconnect_blocks is called with a BestBlock::new(...) that has empty previous_blocks. When the listener (e.g., ChannelMonitor) stores this as its best_block, all chain history below the fork point is lost.
After the subsequent block_connected calls rebuild the history via advance(), the previous_blocks will only cover the fork_point onward. If the node is serialized at this point and restarted later with the block source again unable to find some blocks, the shallower history may reduce resilience.
This is a design limitation (the ValidatedBlockHeader at the fork point doesn't carry chain history), but worth noting.
a85b089 to
07801d9
Compare
07801d9 to
668d8f2
Compare
The deserialization of `ChannelMonitor`, `ChannelManager`, and `OutputSweeper` is implemented for a `(BlockHash, ...)` pair rather than on the object itself. This ensures developers are pushed to think about initial chain sync after deserialization and provides the latest chain sync state conveniently at deserialization-time. In the previous commit we started storing additional recent block hashes in `BestBlock` for use during initial sync to ensure we can handle reorgs while offline if the chain source loses the reorged-out blocks. Here, we move the deserialization routines to be on a `(BestBlock, ...)` pair instead of `(BlockHash, ...)`, providing access to those recent block hashes at deserialization-time.
In 403dc1a we converted the `Listen` disconnect semantics to only pass the fork point, rather than each block being disconnected. We did not, however, update the semantics of `lightning-block-sync`'s `Cache` to reduce patch size. Here we go ahead and do so, dropping `ChainDifference::disconnected_blocks` as well, as it is no longer needed.
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node which we were syncing from either resynced or was changed, the last block that we were synced as of might no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain. Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node. To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` was never built. Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state. In a previous commit, we moved deserialization of various structs to return the `BestBlock` rather than a `BlockHash`. Here we move to actually using it, taking a `BestBlock` in place of `BlockHash` to `init::synchronize_listeners` and walking the `previous_blocks` list to find the fork point rather than relying on the `Cache`.
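The fork-point walk this commit describes can be sketched roughly as follows, with `u64` stand-ins for block hashes and a simple slice standing in for the block source; all names here are hypothetical, not LDK's actual API.

```rust
// If the serialized tip hash is unknown to the block source, walk the
// stored `previous_blocks` list until a hash the source still knows is
// found; that block is usable as the fork point for replay.
fn find_fork_point(
    tip: u64, previous_blocks: &[Option<u64>], known_chain: &[u64],
) -> Option<u64> {
    if known_chain.contains(&tip) {
        return Some(tip); // no reorg while we were offline
    }
    for hash in previous_blocks.iter().flatten() {
        if known_chain.contains(hash) {
            return Some(*hash); // first serialized hash still on-chain
        }
    }
    None // reorg deeper than our stored history
}

fn main() {
    // Blocks 40 and 41 were reorged out and the source no longer has
    // them, but block 30 from our serialized history is still on-chain.
    let known_chain = [10, 20, 30];
    let prev = [Some(40), Some(30), None];
    assert_eq!(find_fork_point(41, &prev, &known_chain), Some(30));
    assert_eq!(find_fork_point(30, &prev, &known_chain), Some(30));
}
```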
In the previous commit, we moved to relying on `BestBlock::previous_blocks` to find the fork point in `lightning-block-sync`'s `init::synchronize_listeners`. Here we now drop the `Cache` parameter as we no longer rely on it. Because we now have no reason to want a persistent `Cache`, we remove the trait from the public interface. However, to keep disconnections reliable we return the `UnboundedCache` we built up during initial sync from `init::synchronize_listeners` which we expect developers to pass to `SpvClient::new`.
In the previous commit we moved to hard-coding `UnboundedCache` in the `lightning-block-sync` interface. This is great, except that it's an unbounded cache that can use arbitrary amounts of memory (though never really all that much - it's just headers that come in while we're running). Here we simply limit the size, and while we're at it give it a more generic `HeaderCache` name.
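A rough sketch of the bounding idea; the limit value, eviction policy, and types below are illustrative stand-ins, not the actual `HeaderCache`. Once the map exceeds a fixed size, the lowest-height entries are evicted first, since only recent headers are useful for reorg handling.

```rust
use std::collections::HashMap;

// Hypothetical size limit; the key is a u64 stand-in for a block hash
// and the value is the header's height.
const MAX_CACHED_HEADERS: usize = 1024;

struct HeaderCache(HashMap<u64, u32>);

impl HeaderCache {
    fn insert(&mut self, hash: u64, height: u32) {
        self.0.insert(hash, height);
        if self.0.len() > MAX_CACHED_HEADERS {
            // Evict the oldest (lowest-height) entries first; recent
            // headers are the ones needed to walk back through a reorg.
            let min = self.0.values().copied().min().unwrap();
            self.0.retain(|_, h| *h != min);
        }
    }
}

fn main() {
    let mut cache = HeaderCache(HashMap::new());
    for height in 0..4096u32 {
        cache.insert(height as u64, height);
    }
    assert!(cache.0.len() <= MAX_CACHED_HEADERS);
    assert!(cache.0.contains_key(&4095)); // newest header retained
}
```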
In the next commit we'll fetch blocks during initial connection in parallel, which requires a multi-future poller. Here we add a symlink to the existing `lightning` `async_poll.rs` file, making it available in `lightning-block-sync`
In `init::synchronize_listeners` we may end up spending a decent chunk of our time just fetching block data. Here we parallelize that step across up to 36 blocks at a time. On my node with bitcoind on localhost, the impact of this is somewhat muted by block deserialization being the bulk of the work, however a networked bitcoind would likely change that. Even still, fetching a batch of 36 blocks in parallel happens on my node in ~615 ms vs ~815ms in serial.
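The batching strategy can be illustrated with a self-contained sketch; plain threads and a dummy `fetch_block` stand in for the async poller and the real block source, and the batch size is only indicative.

```rust
use std::thread;

// Fetch blocks in fixed-size parallel batches, preserving ascending
// height order across batches. Batch size here is illustrative.
const MAX_BLOCKS_AT_ONCE: usize = 4;

fn fetch_block(height: u32) -> Vec<u8> {
    vec![height as u8] // stand-in for a (slow) network fetch
}

fn fetch_batch(heights: &[u32]) -> Vec<Vec<u8>> {
    heights
        .chunks(MAX_BLOCKS_AT_ONCE)
        .flat_map(|chunk| {
            // Kick off up to MAX_BLOCKS_AT_ONCE fetches concurrently...
            let handles: Vec<_> = chunk
                .iter()
                .map(|&h| thread::spawn(move || fetch_block(h)))
                .collect();
            // ...then collect results in submission order, so blocks stay
            // sorted by height for the listeners.
            handles
                .into_iter()
                .map(|h| h.join().unwrap())
                .collect::<Vec<_>>()
        })
        .collect()
}

fn main() {
    let blocks = fetch_batch(&[1, 2, 3, 4, 5, 6]);
    assert_eq!(blocks.len(), 6);
    assert_eq!(blocks[4], vec![5]); // ordering preserved across batches
}
```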
In `lightning-block-sync::init::synchronize_listeners`, we may have many listeners we want to do a chain diff on. When doing so, we should make sure we utilize our header cache, rather than querying our chain source for every header we need for each listener. Here we do so, inserting into the cache as we do chain diffs. On my node with a bitcoind on localhost, this brings the calculate-differences step of `init::synchronize_listeners` from ~500ms to under 150ms.
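The caching win can be seen in a toy sketch; hash maps and `u64` hashes stand in for the header cache and chain source, and the lookup counter is only there to show the effect.

```rust
use std::collections::HashMap;

// Consult a local cache before hitting the (slow) chain source,
// memoizing on a miss so later listeners' chain diffs reuse the header.
fn get_header(
    hash: u64, cache: &mut HashMap<u64, u32>, source: &HashMap<u64, u32>,
    source_lookups: &mut usize,
) -> Option<u32> {
    if let Some(height) = cache.get(&hash) {
        return Some(*height); // cache hit: no round-trip to the source
    }
    *source_lookups += 1; // count the expensive chain-source queries
    let height = *source.get(&hash)?;
    cache.insert(hash, height);
    Some(height)
}

fn main() {
    let source: HashMap<u64, u32> = [(7u64, 700u32)].into_iter().collect();
    let mut cache = HashMap::new();
    let mut lookups = 0usize;
    // Two listeners diffing through the same header: one source query.
    assert_eq!(get_header(7, &mut cache, &source, &mut lookups), Some(700));
    assert_eq!(get_header(7, &mut cache, &source, &mut lookups), Some(700));
    assert_eq!(lookups, 1);
}
```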
When `synchronize_listeners` runs, it returns a cache of the headers it needed when doing chain difference-finding. This allows us to ensure that when we start running normally we have all the recent headers in case we need them to reorg. Sadly, in some cases it was returning a mostly-empty cache. Because it was only being filled during block difference reconciliation it would only get a block around each listener's fork point. Worse, because we were calling `disconnect_blocks` with the cache the cache would assume we were reorging against the main chain and drop blocks we actually want. Instead, we avoid dropping blocks on `disconnect_blocks` calls and ensure we always add connected blocks to the cache.
668d8f2 to
d37321e
Compare
| while !most_connected_blocks.is_empty() { | ||
| #[cfg(not(test))] | ||
| const MAX_BLOCKS_AT_ONCE: usize = 6 * 6; // Six hours of blocks, 144MiB encoded | ||
| #[cfg(test)] | ||
| const MAX_BLOCKS_AT_ONCE: usize = 2; | ||
|
|
||
| let mut fetch_block_futures = | ||
| Vec::with_capacity(core::cmp::min(MAX_BLOCKS_AT_ONCE, most_connected_blocks.len())); | ||
| for header in most_connected_blocks.iter().rev().take(MAX_BLOCKS_AT_ONCE) { | ||
| let fetch_future = chain_poller.fetch_block(header); | ||
| fetch_block_futures | ||
| .push(ResultFuture::Pending(Box::pin(async move { (header, fetch_future.await) }))); | ||
| } | ||
| let results = MultiResultFuturePoller::new(fetch_block_futures).await.into_iter(); | ||
|
|
||
| const NO_BLOCK: Option<(u32, crate::poll::ValidatedBlock)> = None; | ||
| let mut fetched_blocks = [NO_BLOCK; MAX_BLOCKS_AT_ONCE]; | ||
| for ((header, block_res), result) in results.into_iter().zip(fetched_blocks.iter_mut()) { | ||
| let block = block_res?; | ||
| header_cache.block_connected(header.block_hash, *header); | ||
| *result = Some((header.height, block)); | ||
| } | ||
| debug_assert!(fetched_blocks.iter().take(most_connected_blocks.len()).all(|r| r.is_some())); | ||
| // TODO: When our MSRV is 1.82, use is_sorted_by_key | ||
| debug_assert!(fetched_blocks.windows(2).all(|blocks| { | ||
| if let (Some(a), Some(b)) = (&blocks[0], &blocks[1]) { | ||
| a.0 < b.0 | ||
| } else { | ||
| // Any non-None blocks have to come before any None entries | ||
| blocks[1].is_none() | ||
| } | ||
| })); | ||
|
|
||
| for (listener_height, listener) in chain_listeners_at_height.iter() { |
Bug: chain_listeners_at_height is never updated, which means the height > listener_height filtering on line 211 uses the original common ancestor height across ALL batches. This works correctly only because fetched_blocks is re-created each loop iteration with only the current batch's blocks. However, this leads to a subtle invariant: the ordering guarantees for Listen::block_connected (which asserts prev_blockhash == best_block.block_hash in ChannelManager) are maintained across batches only because:
- Each batch processes ascending-height blocks
- Between batches, the listener's internal best_block was updated by the previous batch's block_connected calls
- The first block of the next batch builds on the last block of the previous batch

If any of these invariants were violated (e.g., if fetched_blocks ever contained blocks out of order), the ChannelManager assertion would panic. This implicit coupling between batch ordering, listener internal state, and the never-updated listener_height is fragile. Consider adding a comment documenting this invariant.
| let bounded_fee_estimator = LowerBoundedFeeEstimator::new(fee_estimator); | ||
| self.transactions_confirmed(header, txdata, height, broadcaster, &bounded_fee_estimator, logger) |
Semantic change: Previously, block_connected unconditionally set self.best_block = BestBlock::new(block_hash, height) BEFORE calling transactions_confirmed. Now, best_block is only updated at the end of transactions_confirmed (lines 5687-5691), and only when height > self.best_block.height.
This means during block_connected, best_block is stale throughout the entire transaction processing phase (lines 5440-5685). I verified self.best_block is not directly read in that range so this appears safe, but the old code was more defensive — it ensured best_block was always current before processing.
Also: if block_connected is called with height <= self.best_block.height (shouldn't happen per Listen contract but could via direct API misuse), the new code silently skips the best_block update, whereas the old code always updated it.
| } | ||
|
|
||
| pub fn best_block_at_height(&self, height: usize) -> BestBlock { | ||
| let mut previous_blocks = [None; 12]; |
Nit: The array size 12 is hardcoded here, which will silently go out of sync if ANTI_REORG_DELAY ever changes. Consider using the constant:
| let mut previous_blocks = [None; 12]; | |
| let mut previous_blocks = [None; lightning::chain::channelmonitor::ANTI_REORG_DELAY as usize * 2]; |
Or import the constant and use it, to stay in sync with the BestBlock definition.
| let difference = chain_notifier | ||
| .find_difference_from_best_block(best_header, old_best_block, &mut chain_poller) | ||
| .await?; |
Missing test coverage for the previous_blocks fallback path. The find_difference_from_best_block method iterates prev_best_block.previous_blocks as a fallback when the tip block_hash can't be resolved. However, none of the tests in this file exercise this path — all test BestBlock objects are constructed via best_block_at_height() / best_block() with hashes that the mock blockchain can resolve directly.
Consider adding a test where the listener's BestBlock.block_hash is NOT available from the block source (e.g., a hash from a fork chain that was pruned), but previous_blocks contains a hash that IS on the current chain. This would verify the core reorg-resilience feature of this PR.
| fn blocks_disconnected(&mut self, fork_point: &ValidatedBlockHeader) { | ||
| self.0.retain(|_, block_info| block_info.height <= fork_point.height); |
The blocks_disconnected method removes all blocks at heights above the fork point. The old per-block block_disconnected only removed specific blocks by hash. This means if the cache contains headers from a stale fork (at heights above the fork point), they'll now be pruned. This is likely fine (stale fork headers shouldn't be in the cache for the ongoing SpvClient), but it's a behavioral change worth noting — the cache is now more aggressively pruned on disconnection.
Codecov Report

❌ Patch coverage report. Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4266      +/-  ##
=========================================
- Coverage   86.19%   86.18%    -0.01%
=========================================
  Files         161      162        +1
  Lines      107459   107693      +234
  Branches   107459   107693      +234
=========================================
+ Hits        92621    92813      +192
- Misses      12219    12257       +38
- Partials     2619     2623        +4
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
|
|
🔔 18th Reminder Hey @valentinewallace! This PR has been waiting for your review. |
On restart, LDK expects the chain to be replayed starting from
where it was when objects were last serialized. This is fine in the
normal case, but if there was a reorg and the node which we were
syncing from either resynced or was changed, the last block that we
were synced as of might no longer be available. As a result, it
becomes impossible to figure out where the fork point is, and thus
to replay the chain.
Luckily, changing the block source during a reorg isn't exactly
common, but we shouldn't end up with a bricked node.
To address this, `lightning-block-sync` allows the user to pass in a
`Cache` which can be used to cache recent blocks and thus allow for
reorg handling in this case. However, serialization for, and a
reasonable default implementation of, a `Cache` was never built.
Instead, here, we start taking a different approach. To avoid
developers having to persist yet another object, we move
`BestBlock` to storing some number of recent block hashes. This
allows us to find the fork point with just the serialized state.
In conjunction with 403dc1a (which
allows us to disconnect blocks without having the stored header),
this should allow us to replay chain state after a reorg even if
we no longer have access to the top few blocks of the old chain
tip.
While we only really need to store `ANTI_REORG_DELAY` blocks (as we
generally assume that any deeper reorg won't happen and thus we
don't guarantee we handle it correctly), it's nice to store a few
more to be able to handle more than a six block reorg. While other
parts of the codebase may not be entirely robust against such a
reorg if the transactions confirmed change out from under us, it's
entirely possible (and, indeed, common) for reorgs to contain
nearly identical transactions.
While we're here, we also parallelize a few parts of the initial sync and utilize the cache to speed it up. As such, this is based on #4147, which is based on #4175.