What's the best manner of implementing a social activity stream?

mort · Oct 14, 2008 · Viewed 51.8k times

I'm interested in hearing your opinions on the best way to implement a social activity stream (Facebook is the most famous example). The problems and challenges involved are:

  • Different types of activities (posting, commenting ..)
  • Different types of objects (post, comment, photo ..)
  • 1-n users involved in different roles ("User x replied to User y's comment on User z's post")
  • Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)

... and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").

Any thoughts or pointers on patterns, papers, etc on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.

Although most of the issues are platform-agnostic, chances are I will end up implementing such a system in Ruby on Rails.

Answer

heyman · Oct 15, 2008

I have created such a system, and I took this approach:

Database table with the following columns: id, userId, type, data, time.

  • userId is the user who generated the activity
  • type is the type of the activity (e.g. wrote blog post, added photo, commented on user's photo)
  • data is a serialized object with meta-data for the activity where you can put in whatever you want

This limits the searches and lookups you can do in the feeds to users, time and activity types, but in a Facebook-type activity feed this isn't really limiting. And with the correct indices on the table, the lookups are fast.
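A minimal sketch of such a table and its lookups, assuming SQLite through Python's standard `sqlite3` module. The table name `activity`, the index name, and the helper `fetch_feed` are illustrative assumptions, not part of the original system.

```python
import sqlite3


def create_schema(conn):
    """Create the activity table with the five columns described above."""
    conn.execute(
        """
        CREATE TABLE activity (
            id     INTEGER PRIMARY KEY,
            userId INTEGER NOT NULL,   -- user who generated the activity
            type   TEXT    NOT NULL,   -- e.g. 'PHOTO'
            data   TEXT    NOT NULL,   -- serialized metadata
            time   TEXT    NOT NULL    -- 'YYYY-MM-DD HH:MM:SS', sorts as text
        )
        """
    )
    # Assumed index: feeds filter by user and order by time.
    conn.execute(
        "CREATE INDEX idx_activity_user_time ON activity (userId, time)"
    )


def fetch_feed(conn, user_ids, types=None, limit=200):
    """Return the newest activities for the given users, optionally by type."""
    if not user_ids:
        return []
    user_marks = ",".join("?" * len(user_ids))
    sql = (
        "SELECT id, userId, type, data, time FROM activity "
        f"WHERE userId IN ({user_marks})"
    )
    params = list(user_ids)
    if types:
        type_marks = ",".join("?" * len(types))
        sql += f" AND type IN ({type_marks})"
        params.extend(types)
    sql += " ORDER BY time DESC, id DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Every filter here touches only userId, type and time, which is the restriction described above: the serialized data column is never searched.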

With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:

{"id": 1, "userId": 1, "type": "PHOTO", "time": "2008-10-15 12:00:00", "data": {"photoId": 2089, "photoName": "A trip to the beach"}}

Although the name of the photo is certainly stored in another table containing the photos, and I could retrieve it from there, I duplicate the name in the metadata field, because you don't want to do any joins on other database tables if you want speed. And in order to display, say, 200 different events from 50 different users, you need speed.
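As a rough illustration of that duplication, the sketch below copies the photo name into the serialized metadata at write time, so a feed line can be rendered from the activity row alone. JSON as the serialization format and the helper names `build_photo_activity` and `photo_headline` are assumptions made for this example.

```python
import json


def build_photo_activity(user_id, photo, when):
    """Build an activity row, copying the photo name into the metadata."""
    data = {"photoId": photo["id"], "photoName": photo["name"]}
    return {
        "userId": user_id,
        "type": "PHOTO",
        "time": when,
        "data": json.dumps(data),  # stored as one serialized column
    }


def photo_headline(row):
    """Render a feed line from the row alone, with no lookup in the photos table."""
    data = json.loads(row["data"])
    return 'User {} added the photo "{}"'.format(row["userId"], data["photoName"])
```

The cost of this choice is that the copied name goes stale if the photo is later renamed, unless the activity rows are updated as well.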

Then I have classes that extend a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built into the rendering code as well, to keep complexity out of the database.
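One way the rendering classes and the grouping might look is sketched below. Only the FeedActivity base class is named in the answer; the subclasses, the `PHOTO_COMMENT` type, the `target` and `render_group` methods, and the assumption that comment metadata also carries the photo name are all hypothetical.

```python
import json
from itertools import groupby


class FeedActivity:
    """Base class: wraps one stored row and knows how to render it."""

    def __init__(self, row):
        self.user_id = row["userId"]
        self.time = row["time"]
        self.data = json.loads(row["data"])

    def target(self):
        """What the activity is about; used to group similar entries."""
        return None

    def render(self):
        raise NotImplementedError

    @classmethod
    def render_group(cls, activities):
        # Default: no merging, one line per activity.
        return [a.render() for a in activities]


class PhotoActivity(FeedActivity):
    def render(self):
        return 'User {} added the photo "{}"'.format(
            self.user_id, self.data["photoName"])


class PhotoCommentActivity(FeedActivity):
    def target(self):
        return self.data["photoId"]

    def render(self):
        return 'User {} commented on the photo "{}"'.format(
            self.user_id, self.data["photoName"])

    @classmethod
    def render_group(cls, activities):
        if len(activities) == 1:
            return [activities[0].render()]
        # Merge several comments on the same photo into one line.
        users = ", ".join(str(a.user_id) for a in activities)
        return ['Users {} commented on the photo "{}"'.format(
            users, activities[0].data["photoName"])]


# Hypothetical mapping from the stored type column to a renderer class.
ACTIVITY_CLASSES = {"PHOTO": PhotoActivity, "PHOTO_COMMENT": PhotoCommentActivity}


def render_feed(rows):
    """Turn stored rows into feed lines, merging adjacent similar activities."""
    activities = [ACTIVITY_CLASSES[row["type"]](row) for row in rows]
    lines = []
    # groupby only merges consecutive items that share a key.
    for (cls, _target), group in groupby(
            activities, key=lambda a: (type(a), a.target())):
        lines.extend(cls.render_group(list(group)))
    return lines
```

Because the grouping happens after the rows are fetched, the table stays a flat list of events and the merge rules can change without touching stored data.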