When to use a key-value store for web development?

Jacjoi · Aug 4, 2011 · Viewed 10.1k times

When would someone use a key-value (Redis, memcache, etc) store for web development? An actual use case would be most helpful.

My confusion is that a simple database seems so much more functional because, to my understanding, it can do everything a key-value store can do PLUS it also allows you to do filtering/querying. Meaning, to my understanding, you can NOT do a filter like:

select * from homes where price > 100000

with a key-value store.

Example

Let's pretend that StackOverflow uses a key-value store (memcache, redis, etc).

How would a key-value store benefit StackOverflow's hosting needs?

Answer

Kevin Cox · Apr 9, 2013

I can't answer the question of when to use a key-value (herein kv) data store, but I can show you some examples and answer your StackOverflow example.

With database access, most of what you need is a kv store. For example, a user logs in with the username "joe". So you look up "user:joe" in your database and retrieve his password (hashed, of course). Or maybe you have his password under "user:pass:joe"; it really doesn't matter. If it were Stack Overflow and you were rendering the page http://stackoverflow.com/questions/6935566/when-to-use-a-key-value-store-for-web-development, you would look up "question:6935566" and use that. It is simple to see how kv stores can solve most of your problems.
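The lookups described above can be sketched as follows. This is only an illustration: a plain Python dict stands in for the kv store, and the put/get helpers and the stored fields are made up for the example, not the API of Redis, memcache, or any other product.

```python
import json

# A dict stands in for the kv store (assumption for illustration only).
store = {}

def put(key, value):
    """Store a value as an opaque JSON string under the given key."""
    store[key] = json.dumps(value)

def get(key):
    """Fetch and decode a value, or return None if the key is missing."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Hypothetical records; the field names are invented for this sketch.
put("user:joe", {"password_hash": "<hash goes here>"})
put("question:6935566", {"title": "When to use a key-value store for web development?"})

user = get("user:joe")              # login: one lookup by key
question = get("question:6935566")  # page render: one lookup by key
missing = get("user:nobody")        # unknown keys simply come back empty
```

Every access here is a single get by a key the application can compute up front, which is the access pattern kv stores are built for.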

I would say that a kv store provides a subset of the functionality of a traditional RDBMS. That is not necessarily a loss: the design of a traditional RDBMS brings many scaling issues, and you generally lose features as you scale. kv stores don't come with these features, so they don't limit you. However, these features can often be built anyway, designed from the core to be scalable (because it becomes immediately obvious if they are not).

However, there are still things you can't do out of the box. For example, you mention querying. This is a pitfall of many kv stores, as they are generally agnostic of the value (not always true; Redis, for example, is not) and have no way of finding what you are looking for. Worse, they are not designed to do that quickly; they are just really quick at looking up by key.

One solution to this problem is to sort your keys lexicographically and allow range queries. This is essentially "give me everything between question:1 and question:5". Now that example is fairly useless, but there are many uses of range queries.
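One thing to watch with lexicographic ordering is that keys are compared as strings, not numbers. The sketch below uses a sorted Python list as a stand-in for the store's key space (the scan helper is hypothetical) and shows why numeric parts of a key are usually zero-padded.

```python
from bisect import bisect_left, bisect_right

def scan(sorted_keys, start, end):
    """Return keys with start <= key <= end from a lexicographically sorted list."""
    return sorted_keys[bisect_left(sorted_keys, start):bisect_right(sorted_keys, end)]

ids = [1, 2, 5, 10, 42]

# Unpadded ids: "question:10" sorts between "question:1" and "question:2".
plain = sorted(f"question:{i}" for i in ids)
# Zero-padded ids: string order now matches numeric order.
padded = sorted(f"question:{i:05d}" for i in ids)

plain_hits = scan(plain, "question:1", "question:5")
padded_hits = scan(padded, "question:00001", "question:00005")
```

With the unpadded keys the range also picks up questions 10 and 42; with padding it returns only questions 1, 2 and 5.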

You said you want all houses over $100 000. To be able to do this, you would create an index of houses by price. Say you had the following houses:

house:0 -> {"color":"blue","sold":false,"city":"Stackoverville","price":500000}
house:1 -> {"color":"red","sold":true,"city":"Toronto","price":150000}
house:2 -> {"color":"beige","sold":false,"city":"Toronto","price":40000}
house:3 -> {"color":"blue","sold":false,"city":"The Blogosphere","price":110000}

In SQL you would store each field in a column rather than having it all in one (in this case JSON) document, and you could SELECT * FROM houses WHERE price > 100000. This seems all fine and dandy, but if there isn't an index built, it requires looking at every house in your table and checking its price, which could be slow if you have a couple of million houses. So with a kv store you need an index as well. The main difference is that the SQL database would silently do the slow thing, whereas the kv store can't do it at all.

If you don't have range queries you would need to stick your index in a single document, which makes safely updating it a pain and means that you would have to download the whole index for every query, again, limiting scalability.

house:index:price -> [{"price":500000,"id":"0"},{"price":150000,"id":"1"},{"price":110000,"id":"3"},{"price":40000,"id":"2"}]
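To make the cost of the single-document index concrete, here is a rough sketch, again with a dict standing in for the store and with hypothetical helper names. Both the query and the update have to move the entire index.

```python
import json

# The whole index lives under one key (dict stands in for the kv store).
store = {
    "house:index:price": json.dumps([
        {"price": 500000, "id": "0"},
        {"price": 150000, "id": "1"},
        {"price": 110000, "id": "3"},
        {"price": 40000, "id": "2"},
    ])
}

def houses_over(min_price):
    """Fetch and decode the entire index on every query, then filter it."""
    index = json.loads(store["house:index:price"])
    return [entry["id"] for entry in index if entry["price"] > min_price]

def add_to_index(house_id, price):
    """Read-modify-write of the whole document; unsafe with concurrent writers."""
    index = json.loads(store["house:index:price"])
    index.append({"price": price, "id": house_id})
    index.sort(key=lambda e: e["price"], reverse=True)
    store["house:index:price"] = json.dumps(index)

expensive = houses_over(100000)
add_to_index("4", 140000)  # hypothetical new house
expensive_after = houses_over(100000)
```

If two clients run add_to_index at the same time, one update can overwrite the other, which is the "pain" of updating it safely.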

But if you have range queries (often called keyscans) you can create an index like this:

house:index:price:040000 -> 2
house:index:price:110000 -> 3
house:index:price:150000 -> 1
house:index:price:500000 -> 0

And then you could request the keys between house:index:price:100000 and house:index:price:: (the ':' character is the character after '9' in ASCII) and you would get [3,1,0], which is all the houses more expensive than $100 000 (they are also helpfully in order of price). Another nice thing about this is that they will likely be on one "partition" of your cluster, so this query will take about the same time as a single get (plus the tiny extra transfer overhead), or two gets if your range happens to cross a server boundary (but these can be done in parallel!).
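A minimal sketch of that keyscan, assuming the store keeps its keys in lexicographic order. Here a sorted Python list plus bisect plays that role; key_range and price_key are hypothetical helpers, not a real client API.

```python
from bisect import bisect_left

# The price index from above; values are house ids.
index = {
    "house:index:price:040000": "2",
    "house:index:price:110000": "3",
    "house:index:price:150000": "1",
    "house:index:price:500000": "0",
}
sorted_keys = sorted(index)  # a real store would maintain this ordering itself

def key_range(start, end):
    """Return values whose keys satisfy start <= key < end, in key order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return [index[k] for k in sorted_keys[lo:hi]]

def price_key(price):
    """Zero-pad the price to six digits so string order matches numeric order."""
    return f"house:index:price:{price:06d}"

# ':' is the ASCII character right after '9', so it bounds every 6-digit price.
over_100k = key_range(price_key(100000), "house:index:price::")
```

Note that the start of the range is inclusive, so a house priced at exactly 100000 would be returned too; starting at price_key(100001) would make the comparison strict.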

So that shows how to do queries in a kv store. You can query anything that can be ordered as a string (just about anything) and look it up very quickly. If you don't have range queries you will need to store your whole index under one key which sucks, but if you have range queries it is very nice, and very fast. Here is a more complex example.

I want unsold houses in Toronto that are more than $100 000. I simply have to design my index. (I added a couple of houses to make it more meaningful.) At first thought you might just build another index for every property, but you will quickly realize that this means you have to select every unsold house and download it from the database. (This is what I meant when I said scaling problems are immediately obvious.) The solution is to use a multi-index. Once built, you can select exactly the values you want.

house:index:sold:city:price:f~Fooville~000010:5        -> ""
house:index:sold:city:price:f~Stackoverville~500000:0  -> ""
house:index:sold:city:price:f~The Blogosphere~110000:3 -> ""
house:index:sold:city:price:f~Toronto~040000:2         -> ""
house:index:sold:city:price:f~Toronto~140000:4         -> ""
house:index:sold:city:price:t~Toronto~150000:1         -> ""

Now, unlike the last example, I put the id in the key. This allows two houses to have the same properties. I could have merged them in the value, but then adding and removing index entries becomes more difficult. I also chose to separate my data with a ~. This is because it sorts lexicographically after all of the letters, ensuring that the full name will be sorted and I don't have to pad every city to the same length. In a production system I would probably use the byte 255 or 0.
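Building such keys might look like the sketch below. The index_key helper and the trimmed-down house records are assumptions for illustration; only the fields that go into the index are included, and only houses whose index entries are unambiguous above are used.

```python
SEP = "~"  # sorts after all ASCII letters and digits

def index_key(house_id, house):
    """Build a sold/city/price index key; the id is appended to keep keys unique."""
    sold = "t" if house["sold"] else "f"
    price = f"{house['price']:06d}"  # zero-pad so string order matches numeric order
    return f"house:index:sold:city:price:{sold}{SEP}{house['city']}{SEP}{price}:{house_id}"

# A subset of the houses, reduced to the indexed fields.
houses = {
    1: {"sold": True, "city": "Toronto", "price": 150000},
    2: {"sold": False, "city": "Toronto", "price": 40000},
    4: {"sold": False, "city": "Toronto", "price": 140000},
    5: {"sold": False, "city": "Fooville", "price": 10},
}

# The value is empty: everything the query needs is in the key itself.
index = {index_key(hid, h): "" for hid, h in houses.items()}
keys_in_order = sorted(index)

# Two houses with identical properties still get distinct keys thanks to the id.
twin_a = index_key(6, houses[4])
twin_b = index_key(7, houses[4])
```

Because each house owns exactly one key per index, removing a house from the index is a single delete rather than an edit to a shared value.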

Now the range house:index:sold:city:price:f~Toronto~100000 - house:index:sold:city:price:f~Toronto~~ will select all houses that match the query. And the important thing to note is that the query scales linearly with the number of results. This does mean that you have to build an index for every set of properties that you want to index (although the index in our example also works for sold, and sold-city queries). This may seem like a lot of work, but in the end you realize that it is just that you are doing it, not your database. I'm sure we will begin to see libraries for this kind of thing coming out soon :D
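The range above, and the sold and sold-city queries the same index can serve, can be sketched like this. A sorted list of a few of the index keys stands in for the store, and scan and house_ids are hypothetical helpers.

```python
from bisect import bisect_left

PREFIX = "house:index:sold:city:price:"

# A few of the index keys, kept in lexicographic order.
sorted_keys = sorted([
    PREFIX + "f~Fooville~000010:5",
    PREFIX + "f~Toronto~040000:2",
    PREFIX + "f~Toronto~140000:4",
    PREFIX + "t~Toronto~150000:1",
])

def scan(start, end):
    """Return keys with start <= key < end; only matching keys are touched."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

def house_ids(keys):
    """The house id is whatever follows the last ':' in each key."""
    return [k.rsplit(":", 1)[1] for k in keys]

# The range from the text: unsold Toronto houses priced at 100000 or more.
unsold_toronto_100k_up = house_ids(scan(PREFIX + "f~Toronto~100000", PREFIX + "f~Toronto~~"))

# The same index answers a sold-city query (all unsold houses in Toronto)...
unsold_toronto = house_ids(scan(PREFIX + "f~Toronto~", PREFIX + "f~Toronto~~"))

# ...and a sold-only query (all unsold houses anywhere).
unsold = house_ids(scan(PREFIX + "f~", PREFIX + "f~~"))
```

The index cannot answer a city-only or price-only query, though, because those fields are not at the front of the key; that is why a separate index is needed for each set of properties.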

After stretching the topic a bit, I have shown:

  • Some uses of a kv store.
  • How to do queries in a kv store.

I think that you will find that kv stores are enough for many applications and can often provide better performance and availability than a traditional RDBMS. That being said, every app is different, and therefore it is impossible to answer the original question.