I am considering a proof of concept for handling a large volume of data (more than 10 GB) that requires at least 200 writes per second and about 50 reads per second of spatial data. The system is also growing. I am currently considering moving this data into a NoSQL, Bigtable-style database for performance reasons.
I have taken a closer look at MongoDB and Cassandra. As far as my reading goes:
MongoDB:
- Seems to have a write lock problem
- One of the posts on Stack Overflow suggested this database if there is no need for multiple servers
- Indexes are kept in memory, so performance is said to deteriorate as the indexes grow
- An advantage is that MongoDB has direct support for spatial data and indexing, along with features such as finding nearby locations
- The post "Cassandra Or MongoDB For Our Location Based Application" suggests MongoDB as the best choice
Cassandra:
- Seems to be the best among the related databases
- Seems to have great write as well as read performance
- Does not natively support spatial indexing but this can be extended via geohashing
I am actually leaning towards MongoDB because of its good documentation and direct support for spatial data. Has anybody had a bad experience using MongoDB for systems this large? I see a lot of posts about MongoDB, iostat, and performance.
If MongoDB is not suited, can someone give some pointers on geohashing with Cassandra? I saw http://code.google.com/p/geospatialweb/ for creating the hashes, but it is not clear to me how to query them.
I realize this is an older question, and I know this doesn't directly answer it, but depending on your queries, Cassandra may not be the best option. Getting your queries to work with indexing in MongoDB can be problematic as well (in my own experience). In my opinion, Mongo has a slight edge over Cassandra for heavy geo data and queries.
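To make the MongoDB side concrete, here is a rough sketch of what a "find nearby" query looks like. It assumes a MongoDB version with `2dsphere` indexes (2.4 or later) and documents that store a GeoJSON point in a field I am calling `location`; the helper names are made up for illustration. The functions only build plain dictionaries, so the snippet runs without a server.

```python
def point_document(name, lon, lat):
    """Build a document with a GeoJSON point.

    GeoJSON order is [longitude, latitude], not the other way round.
    The field name "location" is an assumption for this sketch.
    """
    return {"name": name,
            "location": {"type": "Point", "coordinates": [lon, lat]}}


def near_query(lon, lat, max_meters, field="location"):
    """Build a $near filter: points within max_meters, nearest first."""
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }


# With pymongo and a running server (not executed here), usage would be
# roughly:
#   coll.create_index([("location", "2dsphere")])
#   coll.insert_one(point_document("cafe", 10.40744, 57.64911))
#   nearby = list(coll.find(near_query(10.40744, 57.64911, 500)))
```

`$near` needs the geospatial index on the queried field, so it is worth checking that your other filters still play well with it.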
I'd also suggest looking into ElasticSearch, which, depending on the shape of your data and the types of queries you'll be making, may well be the best solution. When you posted your question it was likely less of an option than it is today, though.
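On the geohashing part of the question, a minimal sketch of the usual idea is below. A geohash interleaves longitude and latitude bits into a base-32 string, so nearby points tend to share a prefix. With Cassandra, one possible layout (an assumption on my part, not something from the linked project) is to use a short prefix as the partition key and query the cell containing the point plus its eight neighbours, since points close to a cell edge fall into adjacent cells. The function names and the default precision are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def covering_cells(lat, lon, precision=6):
    """Return the geohash cell containing the point plus its neighbours.

    Neighbours are found by shifting the point by one cell width/height
    and re-encoding. Latitude is clamped at the poles, longitude wraps.
    """
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    width = 360.0 / (1 << lon_bits)
    height = 180.0 / (1 << lat_bits)
    cells = set()
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            nlat = min(max(lat + dlat * height, -90.0), 90.0)
            nlon = (lon + dlon * width + 180.0) % 360.0 - 180.0
            cells.add(geohash_encode(nlat, nlon, precision))
    return cells
```

To query, you would compute `covering_cells(lat, lon)` for the search point, fetch the rows for each returned prefix, and then filter by exact distance on the client. Picking the precision is a trade-off: shorter prefixes mean fewer, larger partitions and more rows to filter out.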