I am currently working on a few projects that use MongoDB and Apache Cassandra, respectively. I also use Solr a lot, and I handle "lots" of data with them (approx. 1-2 TB). I heard of Greenplum and Vertica for the first time last week, and I am not really sure where to place them. They look like data warehouse (DWH) solutions to me, and I haven't really worked with a DWH. They also seem to cost a lot of money (e.g. $60k for 1 TB of storage in Greenplum). I am not currently handling petabytes of data and don't expect to, but products like Cassandra also seem to be able to handle this:
Cassandra is the acknowledged NoSQL leader when it comes to comfortably scaling to terabytes or petabytes of data.
So my question: Why should people use Greenplum & Co? Is there a huge advantage in comparison to these other products?
Thanks.
Cassandra, Greenplum and Vertica all handle huge amounts of data but in very different ways.
Some made-up use cases where each database has its strengths:
Use Cassandra for:
tweets.insert(key:user, data:blob);
tweets.get(key:user)
Use Greenplum for:
begin;
update account set balance = balance - 10 where account_id = 1;
update account set balance = balance + 10 where account_id = 2;
commit;
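To make the transactional case concrete, here is a minimal runnable sketch of the same transfer. It uses Python's built-in sqlite3 module purely as a stand-in so it can run anywhere; it is not Greenplum, and the `transfer` helper, the table layout and the starting balances are assumptions made up for illustration. The point is only that both updates succeed or fail together.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` atomically (hypothetical helper)."""
    # Used as a context manager, the connection commits on success
    # and rolls back if either statement raises.
    with conn:
        conn.execute(
            "update account set balance = balance - ? where account_id = ?",
            (amount, src),
        )
        conn.execute(
            "update account set balance = balance + ? where account_id = ?",
            (amount, dst),
        )

# SQLite in memory stands in for a real transactional database here.
conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_id integer primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 10)
balances = dict(conn.execute("select account_id, balance from account"))
print(balances)
```

With the made-up starting balances of 100 and 50, account 1 ends at 90 and account 2 at 60.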
Use Vertica for:
select sum(balance)
over (partition by region order by account rows unbounded preceding)
from transactions;
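The analytic query can be tried out in the same spirit. The sketch below runs the window function against an in-memory SQLite database, again only as a stand-in for Vertica, and assumes a SQLite build of 3.25 or newer (the first version with window functions). The sample rows and the `running_balance` column alias are invented for illustration.

```python
import sqlite3

# SQLite in memory stands in for a columnar analytic database here.
conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (region text, account integer, balance integer)")
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [("east", 1, 10), ("east", 2, 20), ("west", 3, 5), ("west", 4, 7)],  # made-up rows
)
conn.commit()

# Running total of balance within each region, ordered by account.
rows = conn.execute(
    """
    select region, account,
           sum(balance) over (partition by region
                              order by account
                              rows unbounded preceding) as running_balance
    from transactions
    order by region, account
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row carries the cumulative balance for its region up to and including that account, which is the kind of scan-and-aggregate work that analytic databases are built to do over very large tables.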