What is maximum Amazon S3 replication time on file upload?

amazon-web-services amazon-s3 replication eventual-consistency

Alex Shesterov · May 21, 2014 · Viewed 11.1k times · Source

Background

We use Amazon S3 in our project as a storage for files uploaded by clients.

For technical reasons, we upload a file to S3 with a temporary name, then process its contents and rename the file after it has been processed.

Problem

The 'rename' operation fails time after time with 404 (key not found) error, although the file being renamed had been uploaded successfully.

Amazon docs mention this problem:

Amazon S3 achieves high availability by replicating data across multiple servers within Amazon's data centers. If a PUT request is successful, your data is safely stored. However, information about the changes must replicate across Amazon S3, which can take some time, and so you might observe the following behaviors:

We implemented a kind of polling as workaround: retry the 'rename' operation until it succeeds.
The polling stops after 20 seconds.

This workaround works in most cases: the file gets replicated within few seconds.
But sometimes — very rarely — 20 seconds are not enough; the replication in S3 takes more time.

Questions

What is the maximum time you observed between a successful PUT operation and complete replication on Amazon S3?
Does Amazon S3 offer a way to 'bypass' replication? (Query 'master' directly?)

Answer

Update: this answer uses some older terminology, which i have left in place, for the most part. AWS has changed the friendly name of "US-Standard" to be more consistent with the naming of other regions, but its regional endpoint for IPv4 still has the unusual name s3-external-1.amazonaws.com.

The us-east-1 region of S3 has an IPv4/IPv6 "dual stack" endpoint that follows the standard convention of s3.dualstack.us-east-1.amazonaws.com and if you are IPv6 enabled, this endpoint seems operationally-equivalent to s3-external-1 as discussed below.

The documented references to geographic routing of requests for this region seem to have largely disappeared, without much comment, but anecdotal evidence suggests that the following information is still relevant to that region.

Q. Wasn’t there a US Standard region?

We renamed the US Standard Region to US East (Northern Virginia) Region to be consistent with AWS regional naming conventions.

— https://aws.amazon.com/s3/faqs/#regions

Buckets using the S3 Transfer Acceleration feature use a global-style endpoint of ${bucketname}.s3-accelerate.amazonaws.com and it is not yet evident how this endpoint behaves with regard to us-east-1 buckets and eventual consistency, though it stands to reason that other regions should not be affected by this feature, if enabled. This feature improves transfer throughput for users who are more distant from the bucket by routing requests to the same S3 endpoints but proxying through the AWS "Edge Network," the same system that powers CloudFront. It is, essentially, a self-configuring path through CloudFront but without caching enabled. The acceleration comes from optimized network stacks and keeping the traffic on the managed AWS network for much of its path across the Internet. As such, this feature should have no impact on consistency, if you enable and use it on a bucket... but, as I mentioned, how it interacts with us-east-1 buckets is not yet known.

The US-Standard (us-east-1) region is the oldest, and presumably largest, region of S3, and does play by some different rules than the other, newer regions.

An important and relevant difference is the consistency model.

Amazon S3 buckets in [all regions except US Standard] provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard region provide eventual consistency.

http://aws.amazon.com/s3/faqs/

This is why I assumed you were using US Standard. The behavior you described is consistent with that design constraint.

You should be able to verify that this doesn't happen with a test bucket in another region... but, because data transfer from EC2 to S3 within the same region is free and very low latency, using a bucket in a different region may not be practical.

There is another option that is worth trying, has to do with the inner-workings of US-Standard.

US Standard is in fact geographically-distributed between Virginia and Oregon, and requests to "s3.amazonaws.com" are selectively routed via DNS to one location or another. This routing is largely a black box, but Amazon has exposed a workaround.

You can force your requests to be routed only to Northern Virginia by changing your endpoint from "s3.amazonaws.com" to "s3-external-1.amazonaws.com" ...

http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

... this is speculation on my part, but your issue may be exacerbated by geographic routing of your requests, and forcing them to "s3-external-1" (which, to be clear, is still US-Standard), might improve or eliminate your issue.

Update: The advice above has officially risen above speculation, but I'll leave it for historical reference. About a year I wrote the above, Amazon indeed announced that US-Standard does offer read-after-write consistency on new object creation, but only when the s3-external-1 endpoint is used. They explain it as though it's a new behavior, and that may be the case... but it also may simply be a change in the behavior the platform officially supports. Either way:

Starting [2015-06-19], the US Standard Region now supports read-after-write consistency for new objects added to Amazon S3 using the Northern Virginia endpoint (s3-external-1.amazonaws.com). With this change, all Amazon S3 Regions now support read-after-write consistency. Read-after-write consistency allows you to retrieve objects immediately after creation in Amazon S3. Prior to this change, Amazon S3 buckets in the US Standard Region provided eventual consistency for newly created objects, which meant that some small set of objects might not have been available to read immediately after new object upload. These occasional delays could complicate data processing workflows where applications need to read objects immediately after creating the objects. Please note that in US Standard Region, this consistency change applies to the Northern Virginia endpoint (s3-external-1.amazonaws.com). Customers using the global endpoint (s3.amazonaws.com) should switch to using the Northern Virginia endpoint (s3-external-1.amazonaws.com) in order to leverage the benefits of this read-after-write consistency in the US Standard Region. _{[emphasis added]}

https://forums.aws.amazon.com/ann.jspa?annID=3112

If you are uploading a large number of files (hundreds per second), you might also be overwhelming S3's sharding mechanism. For very high numbers of uploads per second, it's important that your keys ("filenames") not be lexically sequential.

Depending on how Amazon handles DNS, you may also want to try another alternate variant of addressing your bucket if your code can handle it.

Buckets in US-Standard can be addressed either with http://mybucket.s3.amazonaws.com/key ... or http://s3.amazonaws.com/mybucket/key ... and the internal implementation of these two could, at least in theory, be different in a way that changes the behavior in a way that would be relevant to your issue.

What is maximum Amazon S3 replication time on file upload?

Background

Problem

Questions

Answer

Related questions