Why do we need an 'arbiter' in MongoDB replication?

卢声远 Shengyuan Lu · Aug 13, 2013

Assume we set up a MongoDB replica set without an arbiter. If the primary becomes unavailable, the replica set will elect a secondary to be the new primary. So it seems to me there is already a kind of implicit arbiter, since the replica set elects a primary automatically.

So why do we need a dedicated arbiter node? Thanks!

Answer

Bruno Bronosky · Mar 29, 2017

I created a spreadsheet to better illustrate the effect of Arbiter nodes in a Replica Set.

[Image: spreadsheet illustrating the effect of Arbiter nodes in a Replica Set]

It basically comes down to these points:

  1. With an RS of 2 data nodes, losing 1 server brings you below your voting minimum (which is "greater than N/2"). An arbiter solves this.
  2. With an RS of an even number of data nodes, adding an Arbiter makes the vote count odd. That increases your fault tolerance by 1 without making it possible for a split to produce 2 voting clusters.
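  3. With an RS of an odd number of data nodes, adding an Arbiter makes the vote count even. That does not increase your fault tolerance, and it allows a split to create 2 isolated clusters with exactly N/2 votes each. Neither side then has "greater than N/2" votes, so neither can elect a primary.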
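
The arithmetic behind these points can be sketched in a few lines of Python. This is only an illustration of the "greater than N/2" rule, assuming every data node and the arbiter carry exactly one vote; the function names are made up for this example and are not part of any MongoDB API.

```python
from typing import Tuple


def majority(voters: int) -> int:
    """Smallest vote count that is strictly greater than voters / 2."""
    return voters // 2 + 1


def fault_tolerance(voters: int) -> int:
    """Number of voting members that can be lost while a majority remains."""
    return voters - majority(voters)


def arbiter_effect(data_nodes: int) -> Tuple[int, int]:
    """Fault tolerance without an arbiter and with one arbiter added.

    Assumes one vote per data node and one vote for the arbiter.
    """
    return fault_tolerance(data_nodes), fault_tolerance(data_nodes + 1)


if __name__ == "__main__":
    for n in range(2, 6):
        without, with_arb = arbiter_effect(n)
        print(f"{n} data nodes: tolerance {without} alone, {with_arb} with an arbiter")
```

Under these assumptions, 2 and 4 data nodes each gain one unit of fault tolerance from an arbiter, while 3 and 5 data nodes gain none.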

Elections are explained [in poor] detail here. That document states that an RS can have up to 50 members (an even number) but only 7 voting members. I emphasize "states" because it does not explain how that works. To me it seems that if a split leaves 4 members (all voting) on one side and 46 members (3 of them voting) on the other, you'd rather have the 46 elect a primary and the 4 become a read-only cluster. But that's exactly what "limited voting" prevents. In that situation the side with 4 of the 7 votes holds the majority, so you will actually have a 4 member cluster with a primary and a 46 member cluster that is read-only. Explaining how that makes sense is out of the scope of this question and beyond my knowledge.
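
The 4-versus-46 split can be checked with the same majority rule. The sketch below is a simplification: it only counts votes on each side of a partition and ignores everything else a real election takes into account. The `Side` class and `can_elect_primary` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """One side of a network partition (hypothetical model, not a MongoDB type)."""
    members: int  # total members reachable on this side
    votes: int    # how many of those members are voting members


def can_elect_primary(side: Side, total_votes: int) -> bool:
    """A side can elect a primary only with strictly more than half of all votes."""
    return side.votes > total_votes / 2


if __name__ == "__main__":
    total_votes = 7  # voting members in the whole replica set
    small = Side(members=4, votes=4)
    large = Side(members=46, votes=3)
    for name, side in (("small", small), ("large", large)):
        verdict = "can" if can_elect_primary(side, total_votes) else "cannot"
        print(f"{name} side ({side.members} members, {side.votes} votes) {verdict} elect a primary")
```

Member count plays no role in the check. Only the votes matter, which is why the 4 member side ends up with the primary.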