I'm planning on creating a social network and I don't think I quite understand how the status update module of facebook is designed. Hoping I can find some help here. At algorithmic and datastructure level, what is the most efficient way to create a status update mechanism in a social network?
A full table scan for all friends and then sorting their updates is very naive and costly. Do we use some sort of mechanism based on hashing or something else? Please let me know.
P.S: I'm not talking about their EdgeRank algorithm but the basic status update. How do they find and fetch them from the database?
Thanks in advance for the help!
Here is a great presentation that answers your question. The specific answer comes up at around minute 55:40, but I suggest that you watch the entire presentation to understand how the solution fits into the entire architecture.
In short:
This is terribly simplified, of course. This only works because all of it is memcached, the system is designed to minimize latency, some ranking is done at the leaf server that contains the friend's feed items, etc.
You really don't want to be hitting the database for any of this to work at a reasonable speed. FB use MySql mostly as a key-value store; JOINing tables is just impossible at their scale. Then they put memcache servers in front of the databases and application servers.
Having said that, don't worry about scaling problems until you have them (unless, of course, you are worrying about them for the fun of it.) On day one, scaling is the least of your problems.