paxos vs raft for leader election

Oliver Young picture Oliver Young · Aug 25, 2017 · Viewed 11.6k times · Source

After reading paxos and raft paper, I have following confusion: paxos paper only describe consensus on single log entry, which is equivalent the leader election part of the raft algorithm. What's the advantage of paxos's approach over the simple random timeout approach in raft's leader election?

Answer

simbo1905 picture simbo1905 · Sep 2, 2017

It is a common misconception that the original Paxos papers don't use a stable leader. In Paxos Made Simple on page 6 in the section entitled “The Implementation” Lamport wrote:

The algorithm chooses a leader, which plays the roles of the distinguished proposer and the distinguished learner.

This is simply achieved using the Phase 1 messaging of prepare and promises.

Then on pages 9 and 10 under the section “Implementing a State Machine” we have:

In normal operation, a single server is elected to be the leader, which acts as the distinguished proposer (the only one that tries to issue proposals) in all instances of the consensus algorithm.

Here it is using the term 'state machine' in the most generic sense covering the obvious cases such as a key value store or database server where we replicate a log of actions applied to the store.

People get confused about this because of the way Lamport proved Paxos which is now the way it is taught. Lamport proved the correctness of a class of applications known as Paxos by stripping it down to a mathematical model that can be reasoned about. He called this “The Single-Decree Synod” in the original paper The Part-Time Parliament:

Paxon religious leaders asked mathematicians to formulate a protocol for choosing the Synod’s decree. The protocol’s requirements and assumptions were essentially the same as those of the later Parliament except that instead of containing a sequence of decrees, a ledger would have at most one decree. The resulting Synod protocol is described here; the Parliamentary protocol is described in Section 3.

If you find that statement confusing don’t worry it is a bad joke; literally. A translation of this in my own words would be:

“In order to prove the correctness of the consensus algorithm for choosing a stream of commands we can first demonstrate the correctness of a mathematical model which chooses a single command. The mathematical model for selecting a single command can then be extended to the practical algorithm for selecting a stream of commands (Section 3) as long as the invariants of the single command mathematical model are not violated.” – simbo1905

In order to justify my interpretation we can look at Section 3 entitled “The Multi-Decree Parliament” which says:

Instead of passing just one decree, the Paxon Parliament had to pass a series of numbered decrees. As in the Synod protocol, a president was elected. Anyone who wanted a decree passed would inform the president, who would assign a number to the decree and attempt to pass it. Logically, the parliamentary protocol used a separate instance of the complete Synod protocol for each decree number. However, a single president was selected for all these instances, and he performed the first two steps of the protocol just once.

To labour the point both the original “The Part-Time Parliment” paper introducing Paxos as interesting to computer scientists because of its multi-degree algorithm; the parliament protocol. That and the clarification paper “Paxos Made Simple” both define Paxos as having a distinguished leader assigning sequence numbers to a stream of commands. Furthermore, the distinguished leader only sends “prepare” messages when it assumes leadership; after that in steady-state the distinguished leader streams only “accept” messages. He also says elsewhere in the paper to collapse the roles and have all servers run all three roles of the algorithm.

Where you ask "What's the advantage of Paxos's approach over the simple random timeout approach in raft's leader election?" I am not sure what approach you are referring to? With Paxos, you can just randomise the timeouts and issue Prepare messages. The Paxos Made Simple paper indicates that you are free to timeouts or some other faster mechanism long as you follow the protocol which will ensure safety:

The famous result of Fischer, Lynch, and Pat- terson 1 implies that a reliable algorithm for electing a proposer must use either randomness or real time—for example, by using timeouts. However, safety is ensured regardless of the success or failure of the election.

Randomised timeouts are very easy to code and are very understandable. Yet in the worse case, they can lead to a long delay in recovery. You don't like that you can use your own leader election mechanism. For example this one.