Memcached, Locking and Race Conditions

Micael · Oct 22, 2009

We are trying to update memcached objects whenever we write to the database, so that we don't have to read them back from the database after inserts and updates.

For our forum post object, we have a ViewCount field containing the number of times the post has been viewed.

We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.

Any ideas on how to deal with this kind of issue? It seems that some sort of locking is needed, but how can it be done reliably across the servers in a farm?

Answer

Nathan · Oct 22, 2009

If you're dealing with data that doesn't need to be updated in real time (and to me the view count is one such case), you could give the objects you store in memcache an expiration time.

Once an object expires, the next read goes back to the database and fetches the current value; until then, the cached copy is left alone.

Of course, for new posts you may want the value refreshed more often, but you can code for that.
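A minimal sketch of the expiration approach, in Python. `TTLCache` is a hypothetical in-process stand-in for memcache (with a real client you would pass the expiration time along with the `set` call), and `ttl_for_post`, the 10- and 300-second TTLs, and the one-hour cutoff for "new" posts are all illustrative assumptions, not values from the answer.

```python
import time


class TTLCache:
    """Hypothetical in-process stand-in for memcache with per-item expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave exactly like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def ttl_for_post(age_seconds):
    # Assumption: posts younger than an hour refresh every 10 seconds,
    # older posts every 5 minutes.
    return 10 if age_seconds < 3600 else 300


def get_view_count(cache, post_id, post_age, load_from_db):
    key = f"post:{post_id}:views"
    count = cache.get(key)
    if count is None:
        # Miss or expired: read the current value from the database.
        count = load_from_db(post_id)
        cache.set(key, count, ttl_for_post(post_age))
    return count


# Demo with a fake clock so expiry can be shown without sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db = {42: 100}
print(get_view_count(cache, 42, 60, db.get))  # 100 (loaded from the DB)
db[42] = 150                                  # the database moves on
print(get_view_count(cache, 42, 60, db.get))  # 100 (cached copy, slightly stale)
now[0] = 11.0                                 # past the 10-second TTL
print(get_view_count(cache, 42, 60, db.get))  # 150 (reloaded after expiry)
```

The injectable clock exists only so the demo can move time forward; in production the default monotonic clock would be used.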

Memcache stores only one copy of your object, on one of its instances rather than on many of them, so I wouldn't worry about object locking or anything like that. That is for the database to handle, not your cache.
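The "one copy" point comes from how a client decides where a key lives: it hashes the key and picks exactly one server from its configured list. Below is a rough sketch of that idea; the server names are hypothetical and the modulo scheme is a simplification, since real client libraries each have their own hashing (often consistent hashing).

```python
import hashlib

# Hypothetical server list; in practice every web server is configured
# with the same list.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def server_for_key(key, servers=SERVERS):
    # Naive modulo hashing, for illustration only. Many clients use
    # consistent hashing instead so that adding a server remaps fewer keys.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Any web server computing this for the same key gets the same answer,
# so the post's cached object lives on exactly one memcache instance.
print(server_for_key("post:42"))
```

A single copy means every web server reads and writes the same item; it does not by itself stop two read-modify-write sequences from interleaving.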

Edit:

When you get and set a value from several different servers, memcache offers no guarantee that your data won't get clobbered.

From the memcache docs:

  • A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
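The lost update described above can be reproduced deterministically. This sketch uses `FakeMemcache`, a hypothetical single-process stand-in that mimics memcached's `gets`/`cas` commands (read a value together with a unique token, then store only if the token is unchanged). It is not a real client library, and its method names and return values are simplified. Two "web servers" are interleaved by hand: plain get-then-set loses a view, while a `cas` retry loop detects the conflict.

```python
import threading


class FakeMemcache:
    """Hypothetical in-process stand-in for memcached's get/set/gets/cas.
    Not a real client; return values are simplified."""

    def __init__(self):
        self._lock = threading.Lock()  # each command applies atomically, like on the server
        self._data = {}                # key -> (value, cas_token)
        self._next_token = 1

    def _store(self, key, value):
        self._data[key] = (value, self._next_token)
        self._next_token += 1

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            return None if entry is None else entry[0]

    def set(self, key, value):
        with self._lock:
            self._store(key, value)

    def gets(self, key):
        with self._lock:
            return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[1] != token:
                return False  # changed (or removed) since our gets
            self._store(key, value)
            return True


def increment_with_cas(mc, key, max_retries=100):
    # Read-modify-write that retries if another process got in first.
    for _ in range(max_retries):
        value, token = mc.gets(key)
        if value is None:
            raise KeyError(key)
        if mc.cas(key, value + 1, token):
            return value + 1
    raise RuntimeError(f"too much contention on {key}")


mc = FakeMemcache()

# Plain get/set: servers A and B both read 10, both write 11.
mc.set("post:42:views", 10)
a = mc.get("post:42:views")
b = mc.get("post:42:views")
mc.set("post:42:views", a + 1)
mc.set("post:42:views", b + 1)
print(mc.get("post:42:views"))  # 11: one of the two views was lost

# gets/cas: B's write is rejected because A changed the item first.
mc.set("post:42:views", 10)
va, ta = mc.gets("post:42:views")
vb, tb = mc.gets("post:42:views")
print(mc.cas("post:42:views", va + 1, ta))  # True
print(mc.cas("post:42:views", vb + 1, tb))  # False
print(increment_with_cas(mc, "post:42:views"))  # 12: B re-reads and retries
```

For a plain counter such as ViewCount, memcached also has an atomic `incr` command that avoids the read-modify-write sequence altogether; a CAS loop is the more general tool for values that are more than a number.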

Race conditions and stale data

One thing to keep in mind as you design your application to cache data is how to deal with race conditions and occasional stale data.

Say you cache the latest five comments for display in a sidebar in your application, and you decide that the data only needs to be refreshed once per minute. However, you forget that this sidebar is rendered 50 times per second! So once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.
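One commonly described way to blunt this stampede is to use memcached's `add` command (which stores a key only if it does not already exist) as a short-lived lock, so that only the process whose `add` succeeds runs the SQL query while the others back off. The single-threaded model below is a sketch under simplifying assumptions: `FakeCache` is a hypothetical stand-in, the simultaneous requests are modelled by splitting each request into phases by hand, and the key names are made up.

```python
class FakeCache:
    """Hypothetical stand-in with memcached-like get/set/add/delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Like memcached's add: only store if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def simulate_expiry(num_processes, use_lock):
    """Count SQL queries when num_processes requests hit an expired entry."""
    cache = FakeCache()
    queries = 0

    # Phase 1: every process checks the cache in the same instant; all miss.
    missed = [cache.get("sidebar:comments") is None for _ in range(num_processes)]

    # Phase 2: decide who goes to the database. With the lock, only the
    # process whose add succeeds does; the rest skip the query (a real app
    # might wait briefly or fall back to an older copy). On real memcached,
    # give the lock key a short expiration so a crashed process cannot hold
    # it forever; this fake has no expiry.
    if use_lock:
        will_query = [m and cache.add("sidebar:comments:lock", 1) for m in missed]
    else:
        will_query = missed

    # Phase 3: the chosen processes run the query and repopulate the cache.
    for q in will_query:
        if q:
            queries += 1
            cache.set("sidebar:comments", ["five", "latest", "comments"])
    cache.delete("sidebar:comments:lock")
    return queries


print(simulate_expiry(10, use_lock=False))  # 10 identical SQL queries
print(simulate_expiry(10, use_lock=True))   # 1 SQL query
```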

Worse yet, if multiple processes are updating the same data, the wrong one can end up updating the cache last. Then you have stale, outdated data floating about.
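To keep a slow writer from putting older data back into the cache, one option (not taken from the docs quoted here) is to store a version number from the database row alongside the value and overwrite only when the new version is higher, using `gets`/`cas` so that the check and the write cannot be interleaved with another process's write. `FakeMemcache`, `store_if_newer`, and the per-row version number are all hypothetical.

```python
class FakeMemcache:
    """Hypothetical in-process stand-in for memcached's gets/add/cas.
    Not a real client; return values are simplified."""

    def __init__(self):
        self._data = {}  # key -> (value, cas_token)
        self._token = 0

    def gets(self, key):
        return self._data.get(key, (None, None))

    def add(self, key, value):
        if key in self._data:
            return False
        self._token += 1
        self._data[key] = (value, self._token)
        return True

    def cas(self, key, value, token):
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        self._token += 1
        self._data[key] = (value, self._token)
        return True


def store_if_newer(mc, key, version, value, max_retries=10):
    """Cache (version, value) unless a newer version is already cached.

    Assumption: `version` comes from the database row, e.g. a hypothetical
    column incremented on every update."""
    for _ in range(max_retries):
        current, token = mc.gets(key)
        if current is None:
            if mc.add(key, (version, value)):
                return True
            continue  # another process added it first; re-read
        if current[0] >= version:
            return False  # cache already holds equal or newer data
        if mc.cas(key, (version, value), token):
            return True
    return False


mc = FakeMemcache()
print(store_if_newer(mc, "post:42", 2, {"views": 150}))  # True
print(store_if_newer(mc, "post:42", 1, {"views": 100}))  # False: older write arrived late
print(mc.gets("post:42")[0])  # (2, {'views': 150})
```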

One should be mindful of possible issues when populating or repopulating your cache. Remember that the process of checking memcached, fetching from SQL, and storing into memcached is not atomic at all!