What is a distributed cache?

Joey picture Joey · Mar 17, 2013 · Viewed 38.9k times · Source

I am confused about the concept of Distributed Cache. I kinda know what it is from google search. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. However, I do not really understand how it works or how it distribute the data.

For example, let's say we have Data 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and 2 cache servers A and B. If we use distributed cache, then one of possible solution is that Data 1, 3, 5, 7, 9 are stored in Cache Server A, and 2, 4, 6, 8, 10 are stored in cache Server B.

So is this correct or did I misunderstand it?

Second question is that I usually heard the word server node. What is it? In the above example, Server A is a server node, right?

Third question, if a server (let's say Server A) goes down, what can we do about that? I mean if my example above is correct, we cannot get the data 1, 3, 5, 7, 9 from cache when Server A is down, then what could Cache Server do in this case?

Answer

nirvana picture nirvana · Mar 17, 2013
  1. Yes, half the data on server a, and half on server b would be a distributed cache. There are many methods of distributing the data, though some sort of hashing of the keys seems to be most popular.

  2. The terms server and node are generally interchangeable. A node is generally a single unit of some collection, often called a cluster. A server is generally a single piece of hardware. In erlang, you can run multiple instances of the erlang runtime on a single server, and thus you'd have multiple erlang nodes... but generally you'd want to have one node per server for more optimum scheduling. (For non-distributed languages and platforms you have to manage your processes based on your needs.)

  3. If a server goes down, and it is a cache server, then the data would have to come from its original source. EG: A cache is usually a memory based database designed for quick retrieval. The data in the cache sticks around only so long as its being used regularly, and eventually will be purged. But for distributed systems where you need persistence, a common technique is to have multiple copies. EG: you have servers A, B, C, D, E, and F. For data 1, you would put it on A, and then a copy on B and C. Couchbase and Riak do this. For data 2, it could be on B, and then copies on C and D. This way if any one server goes down you still have two copies.