I have been using HBase for the past six months, and I recently came across Amazon DynamoDB. Maintenance-wise, DynamoDB looks easier to handle since Amazon takes care of it. But I am not sure whether I should switch from HBase to DynamoDB.
I could not find a convincing reason to switch other than not having to maintain the cluster.
Can somebody share their thoughts on this?
Essentially, you have to look at your requirements. DynamoDB provides great scalability and performance with minimal maintenance effort and an attractive cost. However, Apache HBase is much more flexible in terms of what you can store (size- and data-type-wise).
Another very important point to evaluate is which data model, wide-column or key-value, better fits your use cases.
Apache HBase lets you use very flexible row key data types (row keys are arbitrary byte arrays), whereas DynamoDB only allows scalar types (String, Number or Binary) for the primary key attributes. DynamoDB, on the other hand, makes it very easy to create and maintain secondary indexes, something you have to do manually in Apache HBase.
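For illustration, here is a minimal pure-Python sketch of the index pattern you typically hand-roll in Apache HBase: a second "index table" whose row key is the indexed value, a separator byte and the main row key, kept in sync by the application on every put. It does not use the HBase client API; `ManualIndexedTable`, the `b"\x00"` separator and the dict-backed tables are hypothetical stand-ins. The point is the bookkeeping: removing the stale index entry and writing the new one are separate, non-atomic steps, which is what DynamoDB's built-in secondary indexes handle for you.

```python
from typing import Dict, List

SEP = b"\x00"  # hypothetical separator between indexed value and main row key


class ManualIndexedTable:
    """Toy model of an HBase table plus an application-maintained index table.

    Row keys are arbitrary bytes, as in HBase. Both "tables" are plain dicts;
    scans sort the keys to mimic HBase's lexicographically ordered rows.
    """

    def __init__(self, indexed_column: bytes) -> None:
        self.indexed_column = indexed_column
        self.main: Dict[bytes, Dict[bytes, bytes]] = {}
        self.index: Dict[bytes, bytes] = {}  # index row key -> main row key

    def put(self, row_key: bytes, columns: Dict[bytes, bytes]) -> None:
        old = self.main.get(row_key, {})
        old_value = old.get(self.indexed_column)
        # The application must delete the stale index entry itself...
        if old_value is not None:
            self.index.pop(old_value + SEP + row_key, None)
        merged = {**old, **columns}
        self.main[row_key] = merged
        new_value = merged.get(self.indexed_column)
        # ...and write the new one; these are separate, non-atomic writes.
        if new_value is not None:
            self.index[new_value + SEP + row_key] = row_key

    def find_by(self, value: bytes) -> List[bytes]:
        """Prefix scan over the index table, like an HBase scan with a row prefix."""
        prefix = value + SEP
        return [self.index[k] for k in sorted(self.index) if k.startswith(prefix)]
```

For example, after putting `b"user#1"` and `b"user#2"` with the same email and then changing the email of `b"user#1"`, `find_by` on the old address returns only `[b"user#2"]`.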
More information is available in the AWS whitepaper: http://d0.awsstatic.com/whitepapers/AWS_Comparing_the_Use_of_DynamoDB_and_HBase_for_NoSQL.pdf
Here is a summary of the key points:
Data Model
In summary, both Amazon DynamoDB and Apache HBase define data models that allow efficient storage of data to optimize query performance. Amazon DynamoDB imposes a restriction on its item size to allow efficient processing and reduce costs.
Apache HBase uses the concept of column families to provide data locality for more efficient read operations.
Amazon DynamoDB supports both scalar and multi-valued sets to accommodate a wide range of unstructured datasets. Similarly, Apache HBase stores its key/value pairs as arbitrary arrays of bytes, giving it the flexibility to store any data type.
Amazon DynamoDB supports built-in secondary indexes and automatically updates and synchronizes all indexes with their parent tables. With Apache HBase, you can implement and manage custom secondary indexes yourself.
From a data model perspective, you can choose Amazon DynamoDB if your item size is relatively small. Although Amazon DynamoDB provides a number of options to overcome item size restrictions, Apache HBase is better equipped to handle large complex payloads with minimal restrictions.
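To make the size restriction concrete: DynamoDB's documented item size limit is 400 KB, counting attribute names as well as values. The sketch below approximates an item's size for string and binary attributes only (DynamoDB has its own sizing rules for numbers, sets, lists and maps, which this ignores) and shows one common workaround: offloading a large attribute to external storage such as Amazon S3 and keeping only a reference in the item. `approx_item_size`, `fit_item` and the `store` callable are hypothetical names, not part of any AWS SDK.

```python
from typing import Callable, Dict, Union

# DynamoDB's documented maximum item size: 400 KB, names and values included.
MAX_ITEM_BYTES = 400 * 1024

Value = Union[str, bytes]


def approx_item_size(item: Dict[str, Value]) -> int:
    """Approximate size: UTF-8 length of names plus string/binary values.

    Only str and bytes values are handled; other DynamoDB types are sized
    with rules this sketch does not model.
    """
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        total += len(value.encode("utf-8")) if isinstance(value, str) else len(value)
    return total


def fit_item(item: Dict[str, Value], large_attr: str,
             store: Callable[[bytes], str]) -> Dict[str, Value]:
    """If the item is too big, offload one attribute and keep a pointer.

    `store` is a hypothetical callable (e.g. a wrapper around an S3 upload)
    that persists the bytes and returns a reference string.
    """
    if approx_item_size(item) <= MAX_ITEM_BYTES:
        return item
    payload = item[large_attr]
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    slim = {k: v for k, v in item.items() if k != large_attr}
    slim[large_attr + "_ref"] = store(data)
    return slim
```

Apache HBase does not force this kind of split, which is why it copes better with large payloads.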
Throughput Model
Although read and write requirements are specified at table creation time, Amazon DynamoDB lets you increase or decrease the provisioned throughput to accommodate load with no downtime.
In Apache HBase, the number of nodes in a cluster can be driven by the required throughput for reads and/or writes.
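On the DynamoDB side, sizing provisioned throughput is mostly arithmetic. In provisioned capacity mode, one read capacity unit (RCU) covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit (WCU) covers one write per second of an item up to 1 KB, with item sizes rounded up. The helpers below are a sketch of that calculation, assuming sizes are given in KB and are greater than zero; on-demand mode is billed differently and is not modeled. For Apache HBase there is no such per-request unit; capacity comes from the number of nodes in the cluster.

```python
import math


def read_capacity_units(reads_per_sec: int, item_kb: float,
                        strongly_consistent: bool) -> int:
    # Each read is charged in 4 KB chunks, rounded up.
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_sec: int, item_kb: float) -> int:
    # Each write is charged in 1 KB chunks, rounded up.
    return writes_per_sec * math.ceil(item_kb)
```

For example, 100 strongly consistent reads per second of 6 KB items need 200 RCU, but only 100 RCU if eventually consistent reads are acceptable; 50 writes per second of 2.5 KB items need 150 WCU.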
Consistency Model
Amazon DynamoDB lets you specify the desired consistency characteristics for each read request within an application. You can specify whether a read is eventually consistent or strongly consistent.
The eventual consistency option is the default in Amazon DynamoDB and maximizes the read throughput. However, an eventually consistent read might not always reflect the results of a recently completed write. Consistency across all copies of data is usually reached within a second.
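The difference between the two read modes can be pictured with a toy model: writes land on a leader copy immediately and reach the other copies a little later, a strongly consistent read consults the up-to-date copy, and an eventually consistent read may hit a copy that has not caught up yet. This is an illustration only; `ToyReplicatedStore` is hypothetical and does not describe DynamoDB's internal replication. In the real API the choice is made per request with the `ConsistentRead` parameter of `GetItem` (for example `ConsistentRead=True` in boto3's `get_item`).

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ToyReplicatedStore:
    """Toy model: one leader copy plus follower copies that lag behind."""

    def __init__(self, followers: int = 2) -> None:
        self.leader: Dict[str, str] = {}
        self.followers: List[Dict[str, str]] = [{} for _ in range(followers)]
        self.pending: Deque[Tuple[str, str]] = deque()
        self._next = 0  # round-robin choice of follower for eventual reads

    def put(self, key: str, value: str) -> None:
        self.leader[key] = value
        self.pending.append((key, value))  # propagated to followers later

    def replicate(self) -> None:
        """Apply every pending write to all followers (the "within a second" step)."""
        while self.pending:
            key, value = self.pending.popleft()
            for copy in self.followers:
                copy[key] = value

    def get(self, key: str, consistent_read: bool = False) -> Optional[str]:
        if consistent_read:
            return self.leader.get(key)  # always sees the latest write
        copy = self.followers[self._next % len(self.followers)]
        self._next += 1
        return copy.get(key)  # may be stale until replicate() runs
```

Right after `put("k", "v2")`, a default read can still return the old value while `get("k", consistent_read=True)` returns `"v2"`; once `replicate()` has run, both agree.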
Apache HBase reads and writes are strongly consistent. This means that all reads and writes to a single row in Apache HBase are atomic. Each concurrent reader and writer can make safe assumptions about the state of a row. Multi-versioning and time stamping in Apache HBase contribute to its strongly consistent model.
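Single-row atomicity is what makes read-modify-write operations such as counters safe in Apache HBase. The sketch below models it with one lock per row key, so concurrent increments on the same row never lose updates. It is a toy stand-in, not HBase code; `ToyRowStore` is hypothetical. In the actual HBase Java client the equivalent is an atomic increment such as `Table.incrementColumnValue`, and DynamoDB offers atomic counters through `UpdateItem` update expressions.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Dict


class ToyRowStore:
    """Toy model of HBase's per-row atomicity: one lock per row key."""

    def __init__(self) -> None:
        self.rows: DefaultDict[bytes, Dict[bytes, int]] = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)  # row key -> its lock
        self._guard = threading.Lock()  # protects creation of per-row locks

    def _lock_for(self, row: bytes):
        with self._guard:
            return self._locks[row]

    def increment(self, row: bytes, column: bytes, amount: int = 1) -> int:
        # The read-modify-write runs under the row's lock, so concurrent
        # callers on the same row never overwrite each other's updates.
        with self._lock_for(row):
            value = self.rows[row].get(column, 0) + amount
            self.rows[row][column] = value
            return value
```

With 8 threads each calling `increment(b"page#1", b"hits")` 1,000 times, the final count is exactly 8,000.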
Transaction Model
Neither Amazon DynamoDB nor Apache HBase supports multi-item/cross-row or cross-table transactions due to performance considerations (DynamoDB has since added the transactional `TransactWriteItems` and `TransactGetItems` APIs). However, both databases provide batch operations for reading and writing multiple items/rows across multiple tables with no transaction guarantees.
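Because batch operations carry no transaction guarantee, client code has to expect partial success. DynamoDB's `BatchWriteItem`, for example, accepts up to 25 put or delete requests per call and returns any requests it could not process as `UnprocessedItems`, which the caller is expected to resubmit. The sketch below shows that chunk-and-retry loop; `batch_write` and the `send_batch` callable are hypothetical stand-ins for the real SDK call, and a production version should back off exponentially between retries.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
MAX_BATCH = 25


def batch_write(items: Sequence[T],
                send_batch: Callable[[List[T]], List[T]],
                max_attempts: int = 5) -> List[T]:
    """Write items in chunks, retrying whatever comes back unprocessed.

    `send_batch` is a hypothetical callable that submits one chunk and
    returns the items that were NOT written (like UnprocessedItems).
    Returns the items still unwritten after all attempts.
    """
    failed: List[T] = []
    for start in range(0, len(items), MAX_BATCH):
        chunk = list(items[start:start + MAX_BATCH])
        for _ in range(max_attempts):
            if not chunk:
                break
            chunk = send_batch(chunk)  # only the leftovers are retried
        failed.extend(chunk)
    return failed
```

Items in a chunk that succeed stay written even when other items in the same chunk keep failing; nothing is rolled back.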
Table Operations
One key difference between the two databases is the flexible provisioned throughput model of Amazon DynamoDB. The ability to dial up capacity when you need it and dial it back down when you are done is useful for processing variable workloads with unpredictable peaks.
For workloads that need high update rates to perform data aggregations or maintain counters, Apache HBase is a good choice. This is because Apache HBase supports a multi-version concurrency control mechanism, which contributes to its strongly consistent reads and writes. Amazon DynamoDB gives you the flexibility to specify whether you want your read request to be eventually consistent or strongly consistent depending on your specific workload.
Source: http://d0.awsstatic.com/whitepapers/AWS_Comparing_the_Use_of_DynamoDB_and_HBase_for_NoSQL.pdf