We brainstormed with Max and Pieguy today the consensus for the beacon chain. The result of the conversation was the protocol that we are convinced works.
These are the original prerequisites for the consensus algorithm:
- The set of witnesses must be able to reach a consensus on what the content of the block is;
- The set of witnesses must be able to reach a consensus on who signs the block and sign it, with censoring out signatures being as hard as possible (since the reward is only paid if the witness signs the block);
- The blocks must be staggered out in time (e.g. approximately 1 minute apart from each other), ideally without reliance on the synchronized clocks;
- (1) and (2) assumes < ⅓ adversaries among the witnesses of each block, but since this is not a reasonable assumption, forks must be tolerated;
We still rely on Thresholded Proof of Stake to choose and rotate validators on the Beacon chain. The motivation there remains that:
- We want the minimum stake to participate in validation to be as low as possible (TPoS with our defaults enables everyone with 1/1M of the total stake to participate);
- Once a validator, the variance on number of produced blocks per period of time shall be small (TPoS effectively makes the variance 0), since high variance encourages pooling.
This write-up is the promised continuation that addresses Agreement of Illia’s post on TPoS.
Reaching consensus and signatures
For now let’s assume that we do rely on the clock.
The validators of each block launch an instance of TxFlow, on which the messages with transactions that will be included into the block are accumulated (in the simplest version the only kind of a transaction is a participant asking to become a validator). The witnesses run the full algorithm, agreeing on the epoch blocks, and posting kickout messages where applicable, but such epoch blocks are immediately discarded.
There are six special kinds of messages, separated into two groups: A, B, C, A’, B’ and C’.
Assuming we want to produce blocks roughly once a minute, 40 seconds into the time frame each witness posts on TxFlow exactly one message of type A. If we use a Schnorr signatures, such message also contains a commit.
Then witnesses continue posting messages regularly (possibly still accumulating transactions). Whenever a witness posts the first message that approves more than ⅔ of messages of type A, such a message has type B.
The very first representative message that results in an Epoch Block that approves more than ⅔ of messages of type B has type C. Thus there will only be one message of type C unless more than ⅓ of participants collude. The TxFlow subgraph of this message represents the content of the block.
The first message from every witness that approves the message of type C has type A’. Such message must contain the multisignature part.
Similarly to B and C, The first message from every witness that approves more than ⅔ of messages of type A’ has type B’, and the first representative message that results in an epoch block that approves more than ⅔ of B’ is C’.
The subgraph of C’ is sufficient to reconstruct the block. The content of the block is defined by what is reachable from the message of type C that is reachable from C’, and only the signatures reachable from C’ are used to create the multisignature.
First, I assume above that it is safe to accumulate fewer signature parts than commits used to create the collective commit. This needs further verification (but ultimately we can probably use BLS and skip the commit phase altogether). PieGuy also brought up this paper where they use a three-phase accumulation, I haven’t looked into it yet.
There’s a certain concern that ultimately it is a single witness that decides on the content of the block, and the set of signatures. Further analysis is needed, but the intuition why I find it acceptable is that for the message to have type C it needs to approve ⅔ of Bs, but the actual payload is what is reachable from messages of type A. The only malicious intent that a publisher of C can exercise is censor out some A, but assuming such A was not delayed significantly compared to others, and was approved by more than ⅓ of Bs, it becomes impossible to censor it out.
Why do we need Bs (and do we need more types in between)
The system would still be somewhat meaningful without Bs, i.e. if C was created once it observes ⅔ of As. But then under normal operation only ⅔ of signatures will be accumulated each time (so ⅓ of slowest nodes will not get their reward / won’t contribute to the security of the system), and censorship will be trivial.
Note that having just one extra round is a heuristic, and in practice having more than one might be desirable (e.g. > ⅔ As is B, > ⅔ Bs is C, > ⅔ Cs is D, and then the first representative message resulting in an epoch block with D is E). Each such extra round adds diminishing increase on how many people will end up signing the block, and a diminishing decrease on likelihood of censorship, but also increases time to reach the consensus.
Forks choice rule
Forks will naturally occur, since with TPoS the possibility of corrupting witnesses for one particular block is not impossible (for example see this blog post), so a meaningful fork choise rule is needed.
The fork choice rule is the largest total (possibly normalized) stake among all the signers on all the blocks. Normalized here would refer to dividing the stake by the total amount of tokens in circulation. Whether it shall be normalized or not can be figured out separately.
The analysis of this fork choice rule (and how to deal with history attacks) is beyond the scope of this post.
More than ⅓ adversaries
If there’s more than ⅓ of adversaries, a consensus on a block cannot be reached, since adversaries can just stall the production of blocks of type B (or any other type, or stall TxFlow in general).
The witnesses will need to resort to some other algorithm to create a block in such case, figuring this protocol out is future work.
Removing the dependency on time
The easiest way to remove the dependency on time is using VDFs.
Once any witness observes the previous block, they start computing a VDF based on the has of the block and their public key, with VDF calibrated to take approximately 40 seconds for a median participant. A participant can only post a message of type A either with the output of the VDF, or if it approves more than ⅓ (this constant can be moved up or down) of messages of type A with VDF outputs. A witness that posted the VDF output gets slightly higher reward than the one who didn’t. A witness that posted a message of type A without the VDF output can post the output at any point later, and get a reward for the VDF for as long as the message with the VDF output is approved by the message of type C.