I need to be able to store small bits of data (approximately 50-75 bytes) for billions of records (~3 billion/month for a year).
The only requirement is fast inserts and fast lookups for all records with the same GUID and the ability to access the data store from .net.
I'm a SQL server guy and I think SQL Server can do this, but with all the talk about BigTable, CouchDB, and other nosql solutions, it's sounding more and more like an alternative to a traditional RDBS may be best due to optimizations for distributed queries and scaling. I tried cassandra and the .net libraries don't currently compile or are all subject to change (along with cassandra itself).
I've looked into many nosql data stores available, but can't find one that meets my needs as a robust production-ready platform.
If you had to store 36 billion small, flat records so that they're accessible from .net, what would choose and why?
Storing ~3.5TB of data and inserting about 1K/sec 24x7, and also querying at a rate not specified, it is possible with SQL Server, but there are more questions:
If you need all these requirements I highlighted, the load you propose is going to cost millions in hardware and licensing on an relational system, any system, no matter what gimmicks you try (sharding, partitioning etc). A nosql system would, by their very definition, not meet all these requirements.
So obviously you have already relaxed some of these requirements. There is a nice visual guide comparing the nosql offerings based on the 'pick 2 out of 3' paradigm at Visual Guide to NoSQL Systems:
After OP comment update
With SQL Server this would e straight forward implementation:
Partitioning and page compression each require an Enterprise Edition SQL Server, they will not work on Standard Edition and both are quite important to meet the requirements.
As a side note, if the records come from a front-end Web servers farm, I would put Express on each web server and instead of INSERT on the back end, I would SEND
the info to the back end, using a local connection/transaction on the Express co-located with the web server. This gives a much much better availability story to the solution.
So this is how I would do it in SQL Server. The good news is that the problems you'll face are well understood and solutions are known. that doesn't necessarily mean this is a better than what you could achieve with Cassandra, BigTable or Dynamo. I'll let someone more knowleageable in things no-sql-ish to argument their case.
Note that I never mentioned the programming model, .Net support and such. I honestly think they're irrelevant in large deployments. They make huge difference in the development process, but once deployed it doesn't matter how fast the development was, if the ORM overhead kills performance :)