Achievements / Badges system

bluedaniel · Nov 16, 2009 · Viewed 7.5k times

I have been browsing this site for an answer, but I'm still a little unsure how to plan the database structure and implementation of a similar system.

With PHP and MySQL, it is clear that some achievements are earned immediately, when a specific action is taken (in SO's case: filling out all profile fields), although I know SO updates and assigns badges periodically, after a certain amount of time. With so many users and badges, wouldn't this create performance problems at scale (a high number of both users and badges)?

So I assume the database structure would be something as simple as:

Badges     |    Badges_User      |    User
----------------------------------------------
bd_id      |    bd_id            |  user_id
bd_name    |    user_id          |  etc
bd_desc    |    assigned(bool)   |  
           |    assigned_at      |

But, as some people have said, it would be better to take an incremental approach, so that a user with 1,000,000 forum posts won't slow any function down.

Would that then be another table for incremental badges, or just a 'progress' field in the badges_user table above?
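A minimal sketch of the 'progress field' option, assuming SQLite through Python's standard sqlite3 module as a stand-in for MySQL. The user_progress table, its post_count column, and the badge thresholds are hypothetical names for illustration. Each new post bumps a counter and checks a handful of thresholds, so the cost does not grow with the number of posts the user already has:

```python
import sqlite3

# Hypothetical post-count badges: bd_id -> number of posts required.
POST_BADGE_THRESHOLDS = {1: 1, 2: 100, 3: 1_000_000}


def create_tables(conn: sqlite3.Connection) -> None:
    """Create a per-user progress counter next to the earned-badge table."""
    conn.executescript(
        """
        CREATE TABLE user_progress (
            user_id    INTEGER PRIMARY KEY,
            post_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE badges_user (
            bd_id       INTEGER NOT NULL,
            user_id     INTEGER NOT NULL,
            assigned_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (user_id, bd_id)
        );
        """
    )


def record_post(conn: sqlite3.Connection, user_id: int) -> list[int]:
    """Increment the user's post counter and return the badge ids newly awarded.

    The work is proportional to the number of post badges, not to the
    number of posts the user has already made (no COUNT(*) over posts).
    """
    with conn:  # commit the counter and any awards together
        conn.execute(
            "INSERT OR IGNORE INTO user_progress (user_id) VALUES (?)", (user_id,)
        )
        conn.execute(
            "UPDATE user_progress SET post_count = post_count + 1 WHERE user_id = ?",
            (user_id,),
        )
        (count,) = conn.execute(
            "SELECT post_count FROM user_progress WHERE user_id = ?", (user_id,)
        ).fetchone()
        awarded = []
        for bd_id, needed in POST_BADGE_THRESHOLDS.items():
            if count >= needed:
                # OR IGNORE skips badges the user already holds (rowcount 0).
                cur = conn.execute(
                    "INSERT OR IGNORE INTO badges_user (bd_id, user_id) VALUES (?, ?)",
                    (bd_id, user_id),
                )
                if cur.rowcount == 1:
                    awarded.append(bd_id)
    return awarded


conn = sqlite3.connect(":memory:")
create_tables(conn)
print(record_post(conn, 42))  # first post: prints [1]
```

The trade-off is that the counter duplicates information that could be derived from a posts table, so in a real system it would need to be updated in the same transaction that inserts the post.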

Thanks for reading, and please focus on the scalability of the desired system (like SO: thousands of users and 20 to 40 badges).

EDIT: To iron out some confusion: I intended assigned_at as a Date/Time. The criteria for awarding each badge would be best placed inside prepared queries/functions for that badge, wouldn't they? (Better flexibility.)
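One way to sketch the per-badge criteria from the EDIT, assuming SQLite via Python's sqlite3 in place of MySQL: each badge's criteria live in code as a query returning qualifying user_ids, and a periodic job (similar to the delayed assignment the question describes for SO) runs them set-based. The BADGE_CRITERIA registry, badge id 10, and the about_me/location columns are all hypothetical:

```python
import sqlite3

# Hypothetical registry: badge id -> a query returning the user_ids that
# currently satisfy that badge's criteria. These strings are trusted
# constants written by developers, never user input.
BADGE_CRITERIA = {
    # e.g. a "filled out all profile fields" badge (columns are assumptions)
    10: "SELECT user_id FROM users WHERE about_me IS NOT NULL AND location IS NOT NULL",
}


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE users (
            user_id  INTEGER PRIMARY KEY,
            about_me TEXT,
            location TEXT
        );
        CREATE TABLE badges_user (
            bd_id       INTEGER NOT NULL,
            user_id     INTEGER NOT NULL,
            assigned_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (user_id, bd_id)
        );
        """
    )


def award_badges(conn: sqlite3.Connection) -> int:
    """Run every badge's criteria once and return how many awards were added.

    INSERT ... SELECT lets the database do the work in one statement per
    badge; OR IGNORE skips users who already hold the badge.
    """
    added = 0
    with conn:
        for bd_id, criteria_sql in BADGE_CRITERIA.items():
            cur = conn.execute(
                "INSERT OR IGNORE INTO badges_user (bd_id, user_id) "
                f"SELECT ?, user_id FROM ({criteria_sql})",
                (bd_id,),
            )
            added += cur.rowcount
    return added
```

Because each run only inserts rows that are missing, the job can be re-run as often as needed without granting a badge twice.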

Answer

just somebody · Nov 16, 2009

Regarding the sketch you included: get rid of the boolean column on badges_user. It makes no sense there, because that relation is defined by the predicate "user user_id earned badge bd_id at assigned_at". A row exists only when the badge has been earned, so a false value would contradict the row's own meaning.
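A minimal sketch of that relation, assuming SQLite via Python's sqlite3 in place of MySQL (the sample users and badge names are made up). With no boolean column, "user X has badge Y" becomes an existence check, and revoking a badge would be a DELETE rather than flipping a flag:

```python
import sqlite3

SCHEMA = """
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE badges (bd_id INTEGER PRIMARY KEY, bd_name TEXT NOT NULL, bd_desc TEXT);
-- Each row states: "user user_id earned badge bd_id at assigned_at".
-- There is no assigned flag; an unearned badge simply has no row.
CREATE TABLE badges_user (
    bd_id       INTEGER NOT NULL REFERENCES badges (bd_id),
    user_id     INTEGER NOT NULL REFERENCES users (user_id),
    assigned_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, bd_id)  -- assumes each badge is earned at most once per user
);
"""


def has_badge(conn: sqlite3.Connection, user_id: int, bd_id: int) -> bool:
    """'Has the user earned the badge?' is just 'does the row exist?'."""
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM badges_user WHERE user_id = ? AND bd_id = ?)",
        (user_id, bd_id),
    ).fetchone()
    return bool(found)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO badges (bd_id, bd_name) VALUES (?, ?)",
    [(1, "Profile complete"), (2, "First answer")],
)
conn.execute("INSERT INTO badges_user (bd_id, user_id) VALUES (1, 1)")
print(has_badge(conn, 1, 1), has_badge(conn, 1, 2))  # prints True False
```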

As for your overall question: first define the schema relationally, without regard for speed (that will rid you of half the potential performance problems, possibly in exchange for different ones). Index it properly (what is proper depends on your query patterns). Then, if it is slow, derive a faster (still relational) design from it; for example, you may need to precompute some aggregates.
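As an illustration of the indexing and precomputed-aggregate steps, a sketch assuming SQLite via Python's sqlite3. The awarded_count column, the index, and the trigger names are hypothetical, and which indexes are "proper" depends on the queries your pages actually run:

```python
import sqlite3

SCHEMA = """
CREATE TABLE badges (
    bd_id         INTEGER PRIMARY KEY,
    bd_name       TEXT    NOT NULL,
    awarded_count INTEGER NOT NULL DEFAULT 0  -- precomputed aggregate
);
CREATE TABLE badges_user (
    bd_id       INTEGER NOT NULL REFERENCES badges (bd_id),
    user_id     INTEGER NOT NULL,
    assigned_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, bd_id)  -- serves "which badges does user X have?"
);
-- Serves "who earned badge X, most recent first?" without a full scan.
CREATE INDEX badges_user_by_badge ON badges_user (bd_id, assigned_at);

-- Keep awarded_count in step with badges_user so a badge page reads one
-- row instead of running COUNT(*) over every award.
CREATE TRIGGER badges_user_ins AFTER INSERT ON badges_user BEGIN
    UPDATE badges SET awarded_count = awarded_count + 1 WHERE bd_id = NEW.bd_id;
END;
CREATE TRIGGER badges_user_del AFTER DELETE ON badges_user BEGIN
    UPDATE badges SET awarded_count = awarded_count - 1 WHERE bd_id = OLD.bd_id;
END;
"""


def awarded_count(conn: sqlite3.Connection, bd_id: int) -> int:
    """Read the precomputed count for one badge."""
    (n,) = conn.execute(
        "SELECT awarded_count FROM badges WHERE bd_id = ?", (bd_id,)
    ).fetchone()
    return n
```

Whether a trigger or application code maintains the aggregate is a separate design choice; either way badges_user remains the source of truth, and the counter can be rebuilt from it with a single GROUP BY if it ever drifts.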