Why are key-value pair NoSQL DBs faster than traditional relational DBs?

Ankur · Mar 1, 2010 · Viewed 13.3k times

It has been recommended to me that I investigate Key/Value pair data systems to replace a relational database I have been using.

What I am not quite understanding is how this improves the efficiency of queries. From what I understand, you throw away a lot of the information that would help make queries more efficient by turning your structured database into one big, long list of keys and values.

Have I missed the point completely?

Answer

Xorlev · Mar 1, 2010

The key advantage of a relational database is the ability to relate and index information. Most 'NoSQL' systems don't provide a relational algebra or a great query language.

What you need to ask yourself is, does switching make sense for my intended use case?

You have kind of missed the point. The point is, you sometimes don't have an index (in the way you do with a general relational DB, anyway). Even when you do have an index, the ability to relate data together is difficult, and that is exactly what relational databases excel at. NoSQL solutions have a number of novel structures which make many use cases trivially easy, e.g. Redis is a data-structure-oriented DB well suited to rapidly building anything with queues or its pub/sub architecture. MongoDB is a freeform document database which stores documents as JSON (BSON) and excels at rapid development. BigTable solutions are a little less structured than that, but expand the idea of a row to have families of columns — key-value pairs contained in each row, arranged efficiently on disk. You can build an inverted index on top of this with a technology like ElasticSearch.
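For example, the queue pattern mentioned above is just a couple of calls in Redis. A minimal sketch, assuming a local Redis server and the redis-py client; the key name `jobs` and the payload are made up for illustration:

```python
# Minimal sketch of Redis as a work queue, assuming a local Redis server
# and the redis-py client (pip install redis). Key/payload are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer: push a job onto the left of the "jobs" list.
r.lpush("jobs", "resize:image-42.png")

# Consumer: block until a job is available, then pop from the right (FIFO).
_queue, job = r.brpop("jobs")
print(job.decode())  # -> "resize:image-42.png"
```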

Not everything needs the consistency guarantees or disk layout of a traditional RDBMS. Another major use case of NoSQL is massive scalability: many solutions (e.g. BigTable-style systems — HBase/Cassandra) are designed to shard and scale horizontally easily (not so easy with SQL!). Cassandra in particular is designed to have no single point of failure (SPOF). Further, column-oriented datastores are meant to optimize disk throughput via sequential reads (and to reduce write amplification). That being said, unless you really need it, a traditional SQL server is generally good enough.
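Part of why sharding a key-value model is comparatively easy is that routing a lookup only requires hashing the key; there are no cross-shard joins to worry about. A toy sketch of the idea — real systems use consistent hashing, range partitioning, and replication, and the node names here are invented:

```python
# Rough illustration of hash-based sharding for a key-value store:
# hash the key and route it to one of N nodes. This is only the idea,
# not how any particular system implements it.
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

def node_for(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for("user:1001"))  # any client can compute the shard locally
```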

There are advantages and disadvantages. Personally, I use a mix of both. Use the right tool for the right job, which may end up being PostgreSQL or MySQL more often than not.

You can liken a basic key-value system to an SQL table with two columns: a unique key and a value. This is quite fast. There is no need to do any relations, correlations, or collation of data — just find the value and return it. This is an oversimplification; NoSQL databases have a lot of interesting functionality and applications beyond simple key-value stores.
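As a concrete illustration of that analogy, here is a sketch using Python's built-in sqlite3 module; the table and key names are made up:

```python
# The two-column analogy: a key-value GET is just a primary-key lookup
# with no joins, sorting, or collation involved.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("user:1001", '{"name": "Ankur"}'))

row = conn.execute("SELECT value FROM kv WHERE key = ?", ("user:1001",)).fetchone()
print(row[0])  # -> {"name": "Ankur"}
```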

I don't know if your scientific data is well suited to most NoSQL implementations; that depends on the data. If you look at HBase or Cassandra, it may well suit a scientist's needs (with proper row-key design — the timestamp must not come first; check out OpenTSDB). I know of many companies that store sensor readings in Cassandra by using a random-order partitioner and the UUID of the sensor to roll readings up into daily fat rows. New databases are created around specific use cases every day, so this answer may change. For specific use cases, you can reap huge rewards from using specific datastores, at the cost of flexibility and tooling.
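To make the "daily fat rows" idea concrete, here is a rough sketch of the layout in plain Python; the schema in the comment is my own assumption of what such a Cassandra table might look like, not something from the original answer:

```python
# Sketch of "daily fat rows": one wide row per (sensor, day), readings
# clustered by timestamp inside the row. A Cassandra-style schema for this
# might look like (assumption, for illustration only):
#   CREATE TABLE readings (
#     sensor_id uuid, day text, ts timestamp, value double,
#     PRIMARY KEY ((sensor_id, day), ts));
from collections import defaultdict
from datetime import datetime, timezone
from uuid import UUID

rows = defaultdict(dict)  # (sensor_uuid, day) -> {timestamp: value}

def record(sensor: UUID, ts: datetime, value: float) -> None:
    day = ts.date().isoformat()        # partition key: sensor + day bucket
    rows[(sensor, day)][ts] = value    # readings accumulate in the day's row

sensor = UUID("12345678-1234-5678-1234-567812345678")
record(sensor, datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc), 21.5)
print(rows[(sensor, "2010-03-01")])
```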