[WIP] withdraw: persist tx before state transitions#1105
[WIP] withdraw: persist tx before state transitions#1105starius wants to merge 1 commit intolightninglabs:masterfrom
Conversation
Persist finalized withdrawal tx pointers to all selected deposits before triggering state transitions. This avoids restart races where deposits can remain in WITHDRAWING without a recoverable finalized tx reference. Recovery now groups withdrawing deposits by tx hash and restores missing tx pointers when there is exactly one in-flight withdrawal cluster. Recovered pointers are written back so subsequent restarts remain stable. Add a regression test which exercises recoverWithdrawals end-to-end and verifies restart recovery for WITHDRAWING deposits missing FinalizedWithdrawalTx.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the withdrawal process by addressing potential data inconsistencies during system restarts. It ensures that critical transaction information is durably stored before any state changes occur, thereby preventing deposits from becoming unrecoverable. The improved recovery logic can now intelligently reconstruct withdrawal clusters, making the system more resilient to unexpected interruptions. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the withdrawal manager to improve recovery logic for deposits that are in the WITHDRAWING state but might be missing their FinalizedWithdrawalTx pointers after a restart. This includes extracting deposit grouping and missing transaction recovery logic into a new helper function, groupWithdrawingDepositsByTx, and centralizing the persistence of finalized withdrawal transactions into persistFinalizedWithdrawalTx. A new test case was added to validate this recovery mechanism. The review comments suggest optimizing the iteration logic within both groupWithdrawingDepositsByTx and persistFinalizedWithdrawalTx to enhance efficiency and readability by combining multiple loops into single passes.
| depositsByWithdrawalTx := make(map[chainhash.Hash][]*deposit.Deposit) | ||
| hash2tx := make(map[chainhash.Hash]*wire.MsgTx) | ||
|
|
||
| // Build an index of all known finalized withdrawal transactions. | ||
| for _, d := range withdrawingDeposits { | ||
| if d.FinalizedWithdrawalTx == nil { | ||
| continue | ||
| } | ||
|
|
||
| txid := d.FinalizedWithdrawalTx.TxHash() | ||
| hash2tx[txid] = d.FinalizedWithdrawalTx | ||
| } | ||
|
|
||
| // If exactly one tx hash is present, we can recover missing tx pointers | ||
| // from that single cluster. | ||
| var fallbackTx *wire.MsgTx | ||
| if len(hash2tx) == 1 { | ||
| for _, tx := range hash2tx { | ||
| fallbackTx = tx | ||
| } | ||
| } | ||
|
|
||
| for _, d := range withdrawingDeposits { | ||
| withdrawalTx := d.FinalizedWithdrawalTx | ||
| if withdrawalTx == nil { | ||
| if fallbackTx == nil { | ||
| log.Warnf("Skipping withdrawing deposit %v "+ | ||
| "during recovery: missing finalized "+ | ||
| "withdrawal tx", d.OutPoint) | ||
|
|
||
| continue | ||
| } | ||
|
|
||
| // Persist the recovered tx pointer so future restarts | ||
| // don't depend on in-memory fallback recovery. | ||
| d.Lock() | ||
| d.FinalizedWithdrawalTx = fallbackTx | ||
| d.Unlock() | ||
|
|
||
| err := m.cfg.DepositManager.UpdateDeposit(ctx, d) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("unable to "+ | ||
| "persist recovered finalized "+ | ||
| "withdrawal tx for deposit %v: %w", | ||
| d.OutPoint, err) | ||
| } | ||
|
|
||
| log.Warnf("Recovered missing finalized withdrawal tx "+ | ||
| "for deposit %v", d.OutPoint) | ||
|
|
||
| withdrawalTx = fallbackTx | ||
| } | ||
|
|
||
| txid := withdrawalTx.TxHash() | ||
| hash2tx[txid] = withdrawalTx | ||
| depositsByWithdrawalTx[txid] = append( | ||
| depositsByWithdrawalTx[txid], d, | ||
| ) | ||
| } |
There was a problem hiding this comment.
The current implementation iterates over withdrawingDeposits twice. This can be made more efficient and arguably clearer by partitioning the deposits into those with and without a finalized transaction in a single pass. This avoids redundant work and simplifies the logic.
depositsByWithdrawalTx := make(map[chainhash.Hash][]*deposit.Deposit)
hash2tx := make(map[chainhash.Hash]*wire.MsgTx)
var missingTxDeposits []*deposit.Deposit
// Partition deposits into those with and without a finalized tx, and
// group the ones that have a tx.
for _, d := range withdrawingDeposits {
if d.FinalizedWithdrawalTx == nil {
missingTxDeposits = append(missingTxDeposits, d)
continue
}
txid := d.FinalizedWithdrawalTx.TxHash()
hash2tx[txid] = d.FinalizedWithdrawalTx
depositsByWithdrawalTx[txid] = append(
depositsByWithdrawalTx[txid], d,
)
}
// If there are deposits with missing tx pointers and there is exactly
// one cluster of deposits with a known tx, we can recover.
if len(missingTxDeposits) > 0 && len(hash2tx) == 1 {
var fallbackTx *wire.MsgTx
var fallbackTxID chainhash.Hash
for txid, tx := range hash2tx {
fallbackTx = tx
fallbackTxID = txid
break
}
for _, d := range missingTxDeposits {
// Persist the recovered tx pointer so future restarts
// don't depend on in-memory fallback recovery.
d.Lock()
d.FinalizedWithdrawalTx = fallbackTx
d.Unlock()
err := m.cfg.DepositManager.UpdateDeposit(ctx, d)
if err != nil {
return nil, nil, fmt.Errorf("unable to "+
"persist recovered finalized "+
"withdrawal tx for deposit %v: %w",
d.OutPoint, err)
}
log.Warnf("Recovered missing finalized withdrawal tx "+
"for deposit %v", d.OutPoint)
// Add the recovered deposit to the cluster.
depositsByWithdrawalTx[fallbackTxID] = append(
depositsByWithdrawalTx[fallbackTxID], d,
)
}
} else if len(missingTxDeposits) > 0 {
// Log deposits that could not be recovered.
for _, d := range missingTxDeposits {
log.Warnf("Skipping withdrawing deposit %v "+
"during recovery: missing finalized "+
"withdrawal tx", d.OutPoint)
}
}| for _, d := range deposits { | ||
| d.Lock() | ||
| d.FinalizedWithdrawalTx = finalizedTx | ||
| d.Unlock() | ||
| } | ||
|
|
||
| for _, d := range deposits { | ||
| err := m.cfg.DepositManager.UpdateDeposit(ctx, d) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to update deposit %v: %w", | ||
| d.OutPoint, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
The function persistFinalizedWithdrawalTx currently iterates through the deposits slice twice: once to update the FinalizedWithdrawalTx in memory, and a second time to persist each deposit to the database. These two loops can be combined into a single loop for better efficiency and readability. In each iteration, you can update the in-memory object and then immediately call UpdateDeposit. This also improves consistency in case of a failure during persistence.
for _, d := range deposits {
d.Lock()
d.FinalizedWithdrawalTx = finalizedTx
d.Unlock()
err := m.cfg.DepositManager.UpdateDeposit(ctx, d)
if err != nil {
return fmt.Errorf("failed to update deposit %v: %w",
d.OutPoint, err)
}
}
Persist finalized withdrawal tx pointers to all selected deposits before triggering state transitions. This avoids restart races where deposits can remain in WITHDRAWING without a recoverable finalized tx reference.
Recovery now groups withdrawing deposits by tx hash and restores missing tx pointers when there is exactly one in-flight withdrawal cluster. Recovered pointers are written back so subsequent restarts remain stable.
Add a regression test which exercises recoverWithdrawals end-to-end and verifies restart recovery for WITHDRAWING deposits missing FinalizedWithdrawalTx.
Pull Request Checklist
release_notes.mdif your PR contains major features, breaking changes or bugfixes