What NoSQL database (categories) support versioning?

Sridhar Sarnobat picture Sridhar Sarnobat · Apr 7, 2014 · Viewed 7.1k times · Source

I thought that regardless of whether a NoSQL aggregate store is a key-value, column-family or document database, it would support versioning of values. After a bit of Googling, I'm concluding that this assumption is wrong and that it just depends on the DBMS implementation. Is this true?

I know that Cassandra and BigTable support it (both column-family stores). It SEEMS that Hbase (column family) and Riak (Key-Value) do but Redis and Hadoop (Key-Value) do not. Mongo DB (document) doesCouchbase does but MongoDB does not (document stores). I don't see any pattern here. Is there a rule of thumb? (for example, "key value stores generally do not have versioning, while column-family and document databases do")

What I'm trying to do: I want to create a database of website screenshots from URL to PNG image. I'd rather use a key-value store since, versioning aside, it is the simplest solution that satisfies the problem. But when website changes or is decomissioned and I update my database I don't want to lose old images. Even if I select a key-value database that has versioning, I want to have the luxury to switch to a different key-value database without the constraint that many key-value DBs do not support versioning. So I'm trying to understand at what level of sophistication in the continuum of aggregate NoSQL databases does versioning become a feature implicit to the data model.

Answer

sleeplessnerd picture sleeplessnerd · Oct 27, 2014

You don't really need versioning support from the Key-Value store.

The only thing you really need from the data Store is an efficient scanning/range query feature.

This means the datastore can retrieve entries in lexicographical order.

Most KV-stores do, so this is easy.

This is how you do it:

  1. Create versioned keys.

    In case you cant hash the original name to a fixed length, prepend the length of the original key. then put in the hash of the key or the original key itself, and end with a fixed length encoded version number (so it is lexicographically ordered from high version to low by inverting the number against the max version).

  2. Query

    Do a range query from the maximum possible version up to version 0, but only retrieving exactly one key.

Done

If you dont need explicit versions, you can also use a timestamp, so you can insert without getting the last version.