What's the point of using Amazon SimpleDB?

Otigo picture Otigo · Mar 15, 2009 · Viewed 15.7k times · Source

I thought that I could use SimpleDB to take care of the most challenging area of my application (as far as scaling goes) - twitter-like comments, but with location on top - till the point when I sat down to actually start implementing it with SDB.

First thing, SDB has a 1000 bytes limitation per attribute value, which is not enough even for comments (probably need to break down longer values into multiple attributes).

Then, maximum domain size is 10GB. The promise was that you could scale up without worrying about database sharding etc., since SDB will not degrade with increasing loads of data. But if I understand correctly, with domains I would have exactly the same problem as with sharding, ie. at some point need to implement data records' distribution and queries across domains on application level.

Even for the simplest objects that I have in the whole application, ie. atomic user ratings, SDB is not an option, because it cannot calculate an average within the query (everything is string based). So to calculate average user rating for an object, I would have to load all records - 250 at a time - and calculate it on application level.

Am I missing something about SDB? Is 10GB really that much of a database to get over all SDB limitations? I was honestly enthusiastic about taking advantage of SDB, since I use S3 and EC2 already, but now I simply don't see a use case.

Answer

marcc picture marcc · May 5, 2009

I use SDB on a couple of large-ish applications. The 10 GB limit per domain does worry me, but we are gambling on Amazon allowing this to be extended if we need it. They have a request form on their site if you want more space.

As far as cross domain joins, don't think of SDB as a traditional database. During the migration of my data to SDB, I had to denormalize some of it so I could manually do the cross domain joins.

The 1000 byte per attribute limitation was tough to work around also. One of the applications I have is a blog service which stores posts and comments in the database. While porting it over to SDB, I ran into this limitation. I ended up storing the posts and comments as files in S3, and read that in my code. Since this server is on EC2, the traffic to S3 isn't costing anything extra.

Perhaps one of the other problems to watch out for is the eventual consistency model on SDB. You can't write data and then read it back with any guarantee that the newly written data will be returned to you. Eventually the data will be updated.

All of this said, I still love SDB. I don't regret switching to it. I moved from a SQL 2005 server. I think I had a lot more control with SQL, but once giving up that control, I have more flexibility. Not needing to pre-define the schema is awesome. With a strong and robust caching layer in your code, it's easy to make SDB more flexible.