I did some research on this but wasn't able to find any substantial answers, so I'm turning to StackOverflow.
How reliable is Amazon's S3 in terms of high availability and reliability? I realize there are SLAs for it, but what happens if an availability zone (AZ) or an entire AWS region goes down?
I checked Amazon's docs on how S3 is set up. When you create a bucket, it says: "When creating a bucket, you can choose a Region to optimize for latency, minimize costs, or address regulatory requirements."
Amazon also says this (source): "Data stored in any given Amazon S3 bucket is replicated across multiple datacenters in a geographical region."
So it does look like S3 data is spread across multiple AZs, but within a region.
What if a region goes down (this has happened before)? Is S3 unavailable then? If so, S3 isn't a reliable backup mechanism for restoring data when an AWS region goes down, is it?
S3 offers "4 nines" of availability, or 99.99%.
For backups, you would be looking at durability (the probability that a stored object is not lost). On that count, S3 offers "11 9's", or 99.999999999%.
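To make the "nines" concrete, here's a small Python sketch that converts an availability figure into the downtime it allows per year and shows the annual loss fraction implied by eleven nines of durability. It assumes a 365-day year, and the helper names are just for illustration.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years

def nines(n: int) -> Fraction:
    """Exact value of 'n nines', e.g. nines(4) == 0.9999."""
    return 1 - Fraction(1, 10**n)

def allowed_downtime_minutes(availability: Fraction) -> Fraction:
    """Minutes per year a service may be unavailable at the given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

# Availability: 99.99% still allows roughly 52.56 minutes of downtime a year.
print(float(allowed_downtime_minutes(nines(4))))  # 52.56

# Durability: eleven nines means an expected 1 in 10**11 objects lost per year.
print(1 - nines(11))  # 1/100000000000
```

Using `Fraction` instead of floats keeps the arithmetic exact, which matters when you're subtracting numbers this close to 1.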
Here's a blurb from the S3 FAQ:
Data Durability and Reliability
Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 Region. To help ensure durability, Amazon S3 PUT and COPY operations synchronously store your data across multiple facilities before returning SUCCESS. Once stored, Amazon S3 maintains the durability of your objects by quickly detecting and repairing any lost redundancy. Amazon S3 also regularly verifies the integrity of data stored using checksums. If corruption is detected, it is repaired using redundant data. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.
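The mechanism described above (write to several facilities before acknowledging, then periodically verify checksums and repair from a good copy) can be sketched as a toy model. This is a hypothetical illustration only, assuming plain full replicas in three in-memory "facilities"; it is not how S3 is actually implemented internally.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Entry = Tuple[bytes, str]  # (object bytes, checksum recorded at write time)

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ToyReplicatedStore:
    """Hypothetical model: each facility is a dict of key -> (bytes, checksum)."""

    def __init__(self, num_facilities: int = 3) -> None:
        self.facilities: List[Dict[str, Entry]] = [{} for _ in range(num_facilities)]

    def put(self, key: str, data: bytes) -> str:
        # Store in every facility synchronously before acknowledging the write.
        digest = checksum(data)
        for facility in self.facilities:
            facility[key] = (data, digest)
        return "SUCCESS"

    def _is_good(self, entry: Optional[Entry]) -> bool:
        return entry is not None and checksum(entry[0]) == entry[1]

    def scrub(self) -> int:
        """Verify every copy against its checksum; repair bad or missing copies.

        Returns the number of copies repaired.
        """
        repaired = 0
        keys = set().union(*(f.keys() for f in self.facilities))
        for key in keys:
            good = next((f[key] for f in self.facilities if self._is_good(f.get(key))), None)
            if good is None:
                continue  # every copy is lost or corrupt; nothing to repair from
            for facility in self.facilities:
                if not self._is_good(facility.get(key)):
                    facility[key] = good
                    repaired += 1
        return repaired

    def get(self, key: str) -> Optional[bytes]:
        # Return the first copy whose checksum still verifies.
        for facility in self.facilities:
            entry = facility.get(key)
            if self._is_good(entry):
                return entry[0]
        return None

store = ToyReplicatedStore(3)
store.put("photo.jpg", b"original bytes")
# Corrupt the copy in one facility and lose the copy in another entirely.
store.facilities[0]["photo.jpg"] = (b"bit-rotted", store.facilities[0]["photo.jpg"][1])
del store.facilities[1]["photo.jpg"]
print(store.scrub())  # 2
print(store.get("photo.jpg"))  # b'original bytes'
```

With three copies, the object survives the concurrent loss of two facilities as long as the scrub runs before the third copy also goes bad, which is the intuition behind "quickly detecting and repairing any lost redundancy".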
Amazon S3’s standard storage is:
- Backed with the Amazon S3 Service Level Agreement.
- Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.
- Designed to sustain the concurrent loss of data in two facilities.
As for Regions, you would have to implement a DIY replication strategy if you truly wanted cross-region failover. And no, an entire region has not failed yet, but I guess there's a first time for everything.
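A DIY replication strategy boils down to a one-way sync from a bucket in one region to a bucket in another. Here's a minimal sketch of that logic, assuming each bucket is modeled as an in-memory dict of key to bytes; against real S3 you would list keys with ListObjectsV2 and copy with CopyObject, and the `replicate` function here is hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-in for a bucket: object key -> object bytes.
Bucket = Dict[str, bytes]

def digest(data: bytes) -> str:
    # Compare digests rather than bodies, the way you'd compare S3 ETags
    # (often, but not always, an MD5 of the object) without downloading.
    return hashlib.md5(data).hexdigest()

def replicate(source: Bucket, replica: Bucket) -> List[str]:
    """Copy every object that is missing or different in the replica.

    Deletions are deliberately NOT propagated, so an accidental delete in the
    primary region doesn't also wipe the backup. Returns copied keys, sorted.
    """
    copied = []
    for key in sorted(source):
        data = source[key]
        if key not in replica or digest(replica[key]) != digest(data):
            replica[key] = data  # against real S3: a cross-region CopyObject
            copied.append(key)
    return copied

primary = {"a.txt": b"hello", "b.txt": b"v2"}
backup = {"b.txt": b"v1", "old.txt": b"kept"}
print(replicate(primary, backup))  # ['a.txt', 'b.txt']
print(replicate(primary, backup))  # []
```

Running it on a schedule gives you a copy in a second region that you can fail over to; the trade-off is that anything written since the last run isn't in the backup yet.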
Here's some more info on the topic:
Q: How durable is Amazon S3?
Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.
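The FAQ's example is straightforward expected-value arithmetic, which can be checked with a few lines of Python. The function names are illustrative; the only input taken from the FAQ is the eleven-nines durability figure.

```python
from fractions import Fraction

# Designed annual durability from the FAQ: eleven nines, i.e. 1 - 10**-11.
DURABILITY = 1 - Fraction(1, 10**11)

def expected_losses_per_year(num_objects: int) -> Fraction:
    """Expected number of objects lost per year across num_objects objects."""
    return num_objects * (1 - DURABILITY)

def years_per_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(num_objects)

print((1 - DURABILITY) * 100)  # 1/1000000000, i.e. 0.000000001% per year
print(years_per_loss(10_000))  # 10000000
```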
Q: How is Amazon S3 designed to achieve 99.999999999% durability?
Amazon S3 redundantly stores your objects on multiple devices across multiple facilities in an Amazon S3 Region. The service is designed to sustain concurrent device failures by quickly detecting and repairing any lost redundancy. When processing a request to store data, the service will redundantly store your object across multiple facilities before returning SUCCESS. Amazon S3 also regularly verifies the integrity of your data using checksums.