Why big companies use Mnesia instead of using Riak or CouchDB

securecurve picture securecurve · Apr 20, 2014 · Viewed 13.2k times · Source

I can see 2 big companies like Klarna and Whatsapp are using Mnesia as their in memory database (not sure how they persist data with Mnesia with 2GB limit). My question is: why companies like those, and may be more I don't know, use Mnesia instead of Riak or couchDB, both are Erlang, where both databases support faster in memory databases, better painless persistence, and much more features. Do I miss something here?

Answer

I GIVE CRAP ANSWERS picture I GIVE CRAP ANSWERS · Apr 20, 2014

You are missing a number of important points:

First of all, mnesia has no 2 gigabyte limit. It is limited on a 32bit architecture, but hardly any are present anymore for real work. And on 64bit, you are not limited to 2 gigabyte. I have seen databases on the order of several hundred gigabytes. The only problem is the initial start-up time for those.

Mnesia is built to handle:

  • Very low latency K/V lookup, not necessarily linearizible.
  • Proper transactions with linearizible changes (C in the CAP theorem). These are allowed to run at a much worse latency as they are expected to be relatively rare.
  • On-line schema change
  • Survival even if nodes fail in a cluster (where cluster is smallish, say 10-50 machines at most)

The design is such that you avoid a separate process since data is in the Erlang system already. You have QLC for datalog-like queries. And you have the ability to store any Erlang term.

Mnesia fares well if the above is what you need. Its limits are:

  • You can't get a machine with more than 2 terabytes of memory. And loading 2 teras from scratch is going to be slow.
  • Since it is a CP system and not an AP system, the loss of nodes requires manual intervention. You may not need transactions as well. You might also want to be able to seamlessly add more nodes to the system and so on. For this, Riak is a better choice.
  • It uses optimistic locking which gives trouble if many processes tries to access the same row in a transaction.

My normal goto-trick is to start out with Mnesia in Erlang-systems and then switch over to another system as the data size grows. If data sizes grows slowly, then you can keep everything in memory in Mnesia and get up and running extremely quickly.