Distributed System: Leader Election

Theis F. Hinz picture Theis F. Hinz · Apr 17, 2013 · Viewed 13.5k times · Source

Im currently working on a Distributed System where we have to implement some kind of Leader Election. The problem is that we would like to avoid that all computers have to know each other - but only the leader. Is there a fast way where we can use for instance Broadcast to achieve what we want?

Or does we simply have to know at least one, to perform a good Leader Election?

It is assumable that all computers is on same subnet.

Thanks for your help.

Answer

Marc Brooker picture Marc Brooker · Apr 14, 2014

The problem is that we would like to avoid that all computers have to know each other - but only the leader.

Leader election is the problem of picking a single leader out of a set of potential leader candidates. Look at it as having two required properties: liveness and safety. Here, liveness would mean "most of the time, there is a leader", while safety would mean "there are either zero or one leaders". Let's consider how we would solve this safety property in your example, using broadcast.

Let's pick a simple (broken) algorithm, assuming every node has a unique ID. Each node broadcasts its ID and listens. When receiving a higher ID than its own, it stops participating. If it receives a lower ID than its own, it sends broadcasts its own again. Assuming a synchronous network, the last ID everybody receives is the leader's ID. Now, introduce a network partition. The protocol will happily continue on either side of the partition, and two leaders will be elected.

That's true of this broken protocol, but it's also true of all possible protocols. How do you tell the difference between nodes you can't communicate with and nodes that don't exist if you don't know (at least) how many nodes exist? So there's a first safety result: you need to know how many nodes exist, or you can't ensure there is only one leader.

Now, let's relax our safety constraint to be a probabilistic one: "there can be zero or more leaders, but most of the time there is one". That makes the problem tractable, and a widely-used solution is gossip (epidemic protocols). For example, see A Gossip-Style Failure Detection Service which discusses a variant of this exact problem. The paper mainly concerns itself with probabilistically correct failure detection and enumeration, but if you can do that you can do probabilistically correct leader election too.

As far as I can tell, you can't have safe non-probabilistic leader election in general networks without at least enumerating the participants.