Simplex reconfiguration framework - Part III (MSM implementation) #365
Conversation
yacovm commented on Apr 16, 2026
- Add block building to msm.go
- Add verification.go which contains logic for block verification
- Add tests that mimic Simplex flow (fake_node_test.go)
Force-pushed from 62fef9a to 7430e36
Addresses review comments: finalized blocks were a prefix of notarized blocks, so keeping two separate slices was redundant and error prone (e.g. tryFinalizeNextBlock panicked when the two went out of sync). Replaces both with a single blocks []blockState slice where each entry carries a finalized flag. Also removes the duplicate lookup in the GetBlock test fixture that would return finalized blocks as non-finalized. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
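The refactor described in the commit message could look roughly like this. This is a sketch, not the PR's actual code: only `blocks`, `blockState`, the finalized flag, and `tryFinalizeNextBlock` are named in the description; everything else here is an assumption.

```go
package main

import "fmt"

// blockState is a sketch of the unified entry: one slice holds all
// notarized blocks, and a flag marks which of them are finalized,
// instead of two parallel slices that could drift out of sync.
type blockState struct {
	seq       uint64
	finalized bool
}

type chainState struct {
	blocks []blockState // every notarized block, in sequence order
}

// tryFinalizeNextBlock marks the first non-finalized block as finalized.
// With a single slice there is no second structure to fall out of sync
// with, so the panic described in the commit message cannot occur.
func (c *chainState) tryFinalizeNextBlock() bool {
	for i := range c.blocks {
		if !c.blocks[i].finalized {
			c.blocks[i].finalized = true
			return true
		}
	}
	return false
}

func main() {
	c := &chainState{blocks: []blockState{{seq: 0}, {seq: 1}}}
	fmt.Println(c.tryFinalizeNextBlock()) // finalizes seq 0
	fmt.Println(c.blocks[0].finalized, c.blocks[1].finalized)
}
```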
```go
bs := make(blockStore)
bs[0] = &outerBlock{block: genesisBlock}

var testConfig testConfig
testConfig.blockStore = bs
testConfig.validatorSetRetriever.result = NodeBLSMappings{
```
Maybe we should have constructors for these helpers? It seems like that would make them more reusable and maintainable in the future.
Maybe we can do that in bulk after I re-introduce verification.
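A possible shape for such a constructor, following the review suggestion. The types below are simplified stand-ins for the fixture types in the diff (the real `blockStore`, `outerBlock`, `testConfig`, and `NodeBLSMappings` definitions live in the PR), and `newTestConfig` itself is hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the fixture types shown in the diff above.
type outerBlock struct{ id string }
type blockStore map[uint64]*outerBlock
type NodeBLSMappings map[string][]byte

type testConfig struct {
	blockStore blockStore
	mappings   NodeBLSMappings
}

// newTestConfig is a hypothetical constructor per the review comment:
// it centralizes fixture wiring (genesis block into a fresh store) so
// individual tests don't repeat the setup.
func newTestConfig(genesis *outerBlock, mappings NodeBLSMappings) testConfig {
	bs := make(blockStore)
	bs[0] = genesis
	return testConfig{blockStore: bs, mappings: mappings}
}

func main() {
	cfg := newTestConfig(&outerBlock{id: "genesis"}, NodeBLSMappings{})
	fmt.Println(cfg.blockStore[0].id)
}
```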
```go
type SignatureAggregator interface {
	AggregateSignatures(signatures ...[]byte) ([]byte, error)

	IsQuorum(approverWeights []uint64, totalWeight uint64) bool
```
I don't think we should redefine this interface in the MSM. Would it be better to use the interface defined in api.go?
I think that's better for two reasons. First, we would currently need to implement these SignatureAggregator interfaces with two different structs, because function names cannot be overloaded. More importantly, I think we should have just one implementation of IsQuorum. It's reasonable to assume the SignatureAggregator would have the weight of each node in a PoS setting, so we can just use the IsQuorum method defined in api.go:SignatureAggregator.
The problem is that the signature aggregator cannot know which P-chain block we're referring to, so when it gets an invocation from an epoch or from an MSM, it has no clue which P-chain height to look up against.
I agree that it's best for IsQuorum to be a single, unified API.
What do you think about doing the complete opposite: have the Epoch use IsQuorum(approverWeights []uint64, totalWeight uint64) bool and also add weights to the membership of the communication? This would simplify the implementation of IsQuorum on the avalanchego side.
To recap our quick sync offline: let's initialize SignatureAggregators with a set of node <-> weight mappings, and then use the newly initialized signature aggregator when calling IsQuorum in the MSM approval logic. This allows us to not worry about passing in weights and keeps the abstraction in AvalancheGo as is.
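The agreed direction could be sketched like this. All names here are assumptions, and the exact quorum rule (shown as a >2/3-stake threshold, a common BFT-style choice) is illustrative, not necessarily what the PR implements; the point is that the aggregator is constructed with the weights, so IsQuorum needs no P-chain context per call.

```go
package main

import "fmt"

type NodeID string

// weightedAggregator is initialized with node <-> weight mappings, so
// quorum checks need no extra per-call context.
type weightedAggregator struct {
	weights     map[NodeID]uint64
	totalWeight uint64
}

// NewSignatureAggregator (hypothetical name) captures the validator
// weights at construction time.
func NewSignatureAggregator(weights map[NodeID]uint64) *weightedAggregator {
	var total uint64
	for _, w := range weights {
		total += w
	}
	return &weightedAggregator{weights: weights, totalWeight: total}
}

// IsQuorum reports whether the approvers hold strictly more than 2/3 of
// the total stake (illustrative threshold, an assumption).
func (a *weightedAggregator) IsQuorum(approvers []NodeID) bool {
	var sum uint64
	for _, id := range approvers {
		sum += a.weights[id]
	}
	return 3*sum > 2*a.totalWeight
}

func main() {
	agg := NewSignatureAggregator(map[NodeID]uint64{"a": 10, "b": 10, "c": 10})
	fmt.Println(agg.IsQuorum([]NodeID{"a", "b", "c"})) // all three nodes approve
	fmt.Println(agg.IsQuorum([]NodeID{"a"}))           // a single node approves
}
```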
```go
seq := pmd.Seq

if seq == 0 {
	return fmt.Errorf("attempted to build a genesis inner block")
```
This is in verification, though, not block building.
Yeah, but the idea is "hey, someone tried to build a genesis block."
Also, it shouldn't say "inner": that's a copy-paste error I thought I had fixed.
I'll change it to something more reasonable, like "received a genesis block".
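The corrected check, as described, might read like this. The function name and surrounding shape are assumptions; only the error wording comes from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// verifySeq (hypothetical name) sketches the fixed check: in verification
// the error should say a genesis block was *received*, not built.
func verifySeq(seq uint64) error {
	if seq == 0 {
		return errors.New("received a genesis block")
	}
	return nil
}

func main() {
	fmt.Println(verifySeq(0)) // rejected: genesis blocks are never verified here
	fmt.Println(verifySeq(5)) // passes this check
}
```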
```go
}
simplexEpochInfo := constructSimplexZeroBlock(pChainHeight, newValidatorSet, prevVMBlockSeq)

return sm.buildBlockImpatiently(ctx, parentBlock, simplexMetadata, simplexBlacklist, simplexEpochInfo, pChainHeight)
```
🤔
The caller, BuildBlock, is only invoked when we are selected to build a block. We also do the "wait impatiently" thing here, but more often than not we will not actually have a block to build. I say this because when we start a chain, we will probably not begin issuing transactions right away.
Therefore, more often than not the first Simplex block will just be an empty one produced right after the simplex chain is created. This also means that most block "zero"s will be the same.
Because most zero blocks will be the same, I'm thinking we could hardcode the zeroth block and remove a lot of the branching caused by treating zeroth blocks so specially (no need for the build-block-zero method, the verify method becomes simple, and a lot of branching is removed).
Another possibility: why rush to build the zeroth block? Can't we just wait, like for normal blocks?
Maybe my logic is wrong somewhere, but I have a general feeling there is a much simpler solution for handling the zeroth case.
> Because most zero blocks will be the same, I'm thinking we could hardcode the zeroth block and remove a lot of the branching caused by treating zeroth blocks so specially.

We can't hardcode the zero block because it has metadata fields we cannot precompute, such as the ICM epoch and timestamp, the previous VM block seq, etc.

> Another possibility: why rush to build the zeroth block? Can't we just wait, like for normal blocks?

So we can easily expand the validator set. Essentially, if we spin up a single-node network, we can then add a node, then another, then another, all by interacting with the P-chain, and we will never need any user traffic.
If we needed user traffic to build the zero epoch, we'd be stuck.
```go
case stateBuildBlockNormalOp:
	return sm.buildBlockNormalOp(ctx, parentBlock, simplexMetadataBytes, simplexBlacklistBytes, prevBlockSeq)
case stateBuildCollectingApprovals:
	return sm.buildBlockCollectingApprovals(ctx, parentBlock, simplexMetadataBytes, simplexBlacklistBytes, prevBlockSeq)
```
What happens if we are collecting approvals for a validator set change v1, but the P-chain height has advanced to a new validator set v2? Am I correct to say the current code would first finish collecting approvals for v1 and then transition to v2 after all approvals are collected?
Would it be better to stop the transition to v1 when we notice v2? Curious about your thoughts.
Once we have the next P-chain reference height (we moved from normal block building to collecting approvals), we are essentially "locked" on that P-chain height for the epoch transition and cannot change it.

> Am I correct to say the current code would first finish collecting approvals for v1 and then transition to v2 after all approvals are collected?

Yes, in such a case we'll have two epoch changes, one after the other.

> Would it be better to stop the transition to v1 when we notice v2? Curious about your thoughts.

I did think about this "problem" as well when I designed the MSM, but I don't think it's a use case we should optimize for.
We might be able to solve it passively by taking the P-chain height from ICM epochs, because they advance their P-chain height slowly, once per time frame (I think 30 seconds or so), which implicitly batches P-chain changes.
I'll look at the code once I re-introduce ICM epoching and see if I can make it more robust.
Thanks for asking this 👍
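The "locked on a P-chain height" behavior described above could be sketched like this. The state names come from the diff; the lock field, method, and transition are assumptions made for illustration.

```go
package main

import "fmt"

type buildState int

const (
	stateBuildBlockNormalOp buildState = iota
	stateBuildCollectingApprovals
)

// msm sketches the locking behavior: once we leave normal operation and
// start collecting approvals, the epoch transition is pinned to the
// P-chain height observed at that moment, even if the P-chain advances
// to a newer validator set (v2) in the meantime.
type msm struct {
	state              buildState
	lockedPChainHeight uint64
}

// observePChainHeight (hypothetical) returns the height the transition
// is bound to: the first observation locks it; later ones are ignored
// until the current transition completes.
func (m *msm) observePChainHeight(h uint64) uint64 {
	if m.state == stateBuildBlockNormalOp {
		m.state = stateBuildCollectingApprovals
		m.lockedPChainHeight = h
	}
	return m.lockedPChainHeight
}

func main() {
	m := &msm{state: stateBuildBlockNormalOp}
	fmt.Println(m.observePChainHeight(100)) // locks on 100 (v1)
	fmt.Println(m.observePChainHeight(200)) // still 100: v2 waits its turn
}
```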
```go
NextEpochApprovals *NextEpochApprovals `canoto:"pointer,7"`
// SealingBlockSeq is the block sequence of the sealing block of the current epoch.
// It defines the validator set of the next epoch.
SealingBlockSeq uint64 `canoto:"uint,8"`
```
Is this only set when we are building Telocks?
Yes. It's set once we build the first Telock, and then we copy over the previous Telock's sealing block seq by induction.
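That propagation could be sketched as follows. Only the SealingBlockSeq field name comes from the diff; the struct, function, and "parent" plumbing are assumptions.

```go
package main

import "fmt"

// telockMetadata sketches the part of the block metadata relevant here:
// SealingBlockSeq is set on the first Telock of an epoch transition and
// copied forward from the parent Telock afterwards.
type telockMetadata struct {
	SealingBlockSeq uint64
}

// nextTelock (hypothetical) builds metadata for the next Telock: the
// first one records the sealing block's sequence; later ones inherit it.
func nextTelock(parent *telockMetadata, sealingSeq uint64) telockMetadata {
	if parent == nil {
		return telockMetadata{SealingBlockSeq: sealingSeq} // first Telock
	}
	return telockMetadata{SealingBlockSeq: parent.SealingBlockSeq} // inherit
}

func main() {
	first := nextTelock(nil, 42)
	second := nextTelock(&first, 999) // 999 ignored: inherited instead
	fmt.Println(first.SealingBlockSeq, second.SealingBlockSeq)
}
```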
```go
}

// TODO: This P-chain height should be taken from the ICM epoch
childBlock, err := sm.BlockBuilder.BuildBlock(ctx, sm.GetPChainHeight())
```
🤔 We are building a new block for the epoch, but a couple of things can change between epochs:
- Validator set ordering: we might have been the leader for seq x in epoch 1, but now that the epoch has changed, we may not be the leader for seq x in epoch 2.
- Our node may be kicked out of the validator set, so we shouldn't be building a block anyway.
> We are building a new block for the epoch, but a couple of things can change between epochs.

To be precise, we're building a block for a new epoch.

> Validator set ordering: we might have been the leader for seq x in epoch 1, but now that the epoch has changed, we may not be the leader for seq x in epoch 2. Our node may be kicked out of the validator set, so we shouldn't be building a block anyway.

A new epoch, a new Epoch instance, right?
Why does it matter which node is the leader in the current round? The MSM assumes nothing about the relative position of the node within the validator set. Eventually there will be a non-faulty node that becomes leader and executes this line. When a node executes this line, it will already have the new Epoch instance with the new validator set, right?
Keep in mind that the API of the MSM is very primitive: we can either BuildBlock or VerifyBlock. There is no assumption of any kind of continuation. We can only build a single block in any given round.
So, if the first time we execute this function we haven't sealed our epoch (finalized the sealing block), we are running under the context of the previous epoch. Otherwise, if the epoch has been sealed, this executes under the context of the new epoch, and therefore a node that has been evicted isn't expected to execute it (and even if it did, no one would accept such a block).
Does that make sense?
Ah, makes sense, thanks for the explanation 👍