What scalability problems have you encountered using a NoSQL data store?

knorv picture knorv · Feb 17, 2010 · Viewed 17.9k times · Source

NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include:

  • Cassandra (tabular, written in Java, used by Cisco, WebEx, Digg, Facebook, IBM, Mahalo, Rackspace, Reddit and Twitter)
  • CouchDB (document, written in Erlang, used by BBC and Engine Yard)
  • Dynomite (key-value, written in Erlang, used by Powerset)
  • HBase (key-value, written in Java, used by Bing)
  • Hypertable (tabular, written in C++, used by Baidu)
  • Kai (key-value, written in Erlang)
  • MemcacheDB (key-value, written in C, used by Reddit)
  • MongoDB (document, written in C++, used by Electronic Arts, Github, NY Times and Sourceforge)
  • Neo4j (graph, written in Java, used by some Swedish universities)
  • Project Voldemort (key-value, written in Java, used by LinkedIn)
  • Redis (key-value, written in C, used by Craigslist, Engine Yard and Github)
  • Riak (key-value, written in Erlang, used by Comcast and Mochi Media)
  • Ringo (key-value, written in Erlang, used by Nokia)
  • Scalaris (key-value, written in Erlang, used by OnScale)
  • Terrastore (document, written in Java)
  • ThruDB (document, written in C++, used by JunkDepot.com)
  • Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site))

I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used.

Questions:

  • What scalability problems have you used NoSQL data stores to solve?
  • What NoSQL data store did you use?
  • What database did you use prior to switching to a NoSQL data store?

I'm looking for first-hand experiences, so please do not answer unless you have that.

Answer

Brian picture Brian · Feb 23, 2010

My current project actually.

Storing 18,000 objects in a normalised structure: 90,000 rows across 8 different tables. Took 1 minute to retrieve and map them to our Java object model, that's with everything correctly indexed etc.

Storing them as key/value pairs using a lightweight text representation: 1 table, 18,000 rows, 3 seconds to retrieve them all and reconstruct the Java objects.

In business terms: first option was not feasible. Second option means our app works.

Technology details: running on MySQL for both SQL and NoSQL! Sticking with MySQL for good transaction support, performance, and proven track record for not corrupting data, scaling fairly well, support for clustering etc.

Our data model in MySQL is now just key fields (integers) and the big "value" field: just a big TEXT field basically.

We did not go with any of the new players (CouchDB, Cassandra, MongoDB, etc) because although they each offer great features/performance in their own right, there were always drawbacks for our circumstances (e.g. missing/immature Java support).

Extra benefit of (ab)using MySQL - the bits of our model that do work relationally can be easily linked to our key/value store data.

Update: here's an example of how we represented text content, not our actual business domain (we don't work with "products") as my boss'd shoot me, but conveys the idea, including the recursive aspect (one entity, here a product, "containing" others). Hopefully it's clear how in a normalised structure this could be quite a few tables, e.g. joining a product to its range of flavours, which other products are contained, etc

Name=An Example Product
Type=CategoryAProduct
Colour=Blue
Size=Large
Flavours={nice,lovely,unpleasant,foul}
Contains=[
Name=Product2
Type=CategoryBProduct
Size=medium
Flavours={yuck}
------
Name=Product3
Type=CategoryCProduct
Size=Small
Flavours={sublime}
]