What is the design & architecture behind Facebook's status update mechanism?

Ari53nN3o · Aug 16, 2011 · Viewed 9.2k times

I'm planning on creating a social network, and I don't think I quite understand how Facebook's status update module is designed. I'm hoping I can find some help here. At the algorithm and data-structure level, what is the most efficient way to build a status update mechanism in a social network?

A full table scan for all friends and then sorting their updates is very naive and costly. Do we use some sort of mechanism based on hashing or something else? Please let me know.

P.S. I'm not talking about their EdgeRank algorithm, just the basic status update. How do they find the updates and fetch them from the database?

Thanks in advance for the help!

Answer

Nick Zalutskiy · Aug 16, 2011

Here is a great presentation that answers your question. The specific answer comes up at around minute 55:40, but I suggest that you watch the entire presentation to understand how the solution fits into the entire architecture.

In short:

  1. A particular server (a "leaf") stores all feed items for a particular user, so the data for each of your friends lives entirely on one specific server.
  2. When you want to view your news feed, one of the aggregator servers sends a request to the leaf server of each of your friends and ranks the results. The aggregator knows which servers to query based on the user ID of each friend.
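The two steps above can be sketched roughly as follows. This is only a toy, in-process illustration, not Facebook's actual code: the names `Leaf`, `Aggregator`, `leaf_for` and `NUM_LEAVES` are hypothetical, the modulo sharding rule is an assumption, and "ranking" is reduced to newest-first ordering.

```python
import heapq
from collections import defaultdict

NUM_LEAVES = 4  # hypothetical number of leaf servers


def leaf_for(user_id: int) -> int:
    """Map a user ID to the leaf that owns that user's feed items (assumed rule)."""
    return user_id % NUM_LEAVES


class Leaf:
    """Stands in for one leaf server; holds feed items in memory."""

    def __init__(self):
        self.feeds = defaultdict(list)  # user_id -> [(timestamp, user_id, text)]

    def publish(self, user_id, timestamp, text):
        self.feeds[user_id].append((timestamp, user_id, text))

    def recent(self, user_ids, limit):
        # Pre-rank on the leaf so only the top items travel back to the aggregator.
        items = [item for uid in user_ids for item in self.feeds.get(uid, [])]
        return heapq.nlargest(limit, items)


class Aggregator:
    """Fans a feed request out to the relevant leaves and merges the results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def news_feed(self, friend_ids, limit=10):
        # Group friends by owning leaf, so each leaf is queried at most once.
        by_leaf = defaultdict(list)
        for fid in friend_ids:
            by_leaf[leaf_for(fid)].append(fid)
        candidates = []
        for index, ids in by_leaf.items():
            candidates.extend(self.leaves[index].recent(ids, limit))
        # Final ranking: here simply newest first.
        return heapq.nlargest(limit, candidates)
```

To try it, create `NUM_LEAVES` `Leaf` objects, publish each item to `leaves[leaf_for(user_id)]`, and call `Aggregator(leaves).news_feed(friend_ids)`. In a real deployment the per-leaf calls would be parallel network requests rather than a loop.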

This is terribly simplified, of course. It only works because all of it is cached in memory (memcached), the system is designed to minimize latency, some of the ranking is done on the leaf server that holds the friend's feed items, and so on.

For any of this to work at a reasonable speed, you really don't want to be hitting the database. Facebook uses MySQL mostly as a key-value store; JOINing tables is just impossible at their scale. They then put memcached servers in front of the databases and application servers.
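As a rough illustration of putting a cache in front of a key-value database, here is a minimal cache-aside sketch. Everything in it is an assumption made for the example: plain dicts stand in for both memcached and the MySQL-backed store, and `CachedStore`, its methods, and the `db_reads` counter are hypothetical names, not a real client API.

```python
class CachedStore:
    """Cache-aside wrapper: read from the cache first, fall back to the database."""

    def __init__(self, database):
        self.database = database  # dict standing in for a key-value table
        self.cache = {}           # dict standing in for memcached
        self.db_reads = 0         # counts how often the database was hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no database access
        self.db_reads += 1
        value = self.database.get(key)  # cache miss: single key lookup, no JOINs
        if value is not None:
            self.cache[key] = value     # populate the cache for later reads
        return value

    def set(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)       # invalidate so the next read is fresh
```

Repeated reads of the same key touch the database only once; a write invalidates the cached entry, so the next read goes back to the database and repopulates the cache.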

Having said that, don't worry about scaling problems until you have them (unless, of course, you are worrying about them for the fun of it). On day one, scaling is the least of your problems.