Upload large files to S3 with resume support

style-sheets · Apr 12, 2012

(I'm new to Amazon AWS/S3, so please bear with me)

My ultimate goal is to allow my users to upload files to S3 using their web browser. My requirements are:

  1. I must handle large files (2 GB+)
  2. I must support pause/resume with a progress indicator
  3. (Optional but desirable!) Ability to resume the upload if the connection temporarily drops out

My two-part question is:

  • I've read about S3 multipart upload, but it's not clear to me how I can implement pause/resume for web-browser-based uploads.

Is it even possible to do this for large files? If so how?

  • Should I upload files to EC2 and then move them to S3 once they're done? Can I (securely) upload files directly to S3 instead of going through a temporary web server?

If it's possible to upload directly to S3, how can I handle pause/resume?

PS. I'm using PHP 5.2+

Answer

Steffen Opel · Apr 12, 2012

Update 20150527

The AWS SDK for JavaScript (in the Browser), which has become available in the meantime, supports Amazon S3, including a ManagedUpload class that handles the multipart upload aspects of the use case at hand (see the preceding update for more on this). Accordingly, it might now be the best solution for your scenario. See e.g. Uploading a local file using the File API for a concise example that in turn uses the HTML5 File API; the introductory blog post Announcing the Amazon S3 Managed Uploader in the AWS SDK for JavaScript provides more details about this SDK feature.
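To make the idea behind such a managed uploader a bit more concrete, here is a minimal Python sketch (explicitly not the ManagedUpload API) that splits a payload into numbered parts, keeps up to four of them in flight and reports progress after each part finishes, which is what a progress indicator would be driven by. `upload_part` is a hypothetical callback standing in for the real HTTP PUT of one part; the MD5-based stub merely simulates an ETag.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def upload_in_parts(data: bytes, part_size: int,
                    upload_part: Callable[[int, bytes], str],
                    on_progress: Callable[[int, int], None],
                    max_workers: int = 4) -> dict[int, str]:
    """Send `data` as numbered parts, at most `max_workers` at a time.

    Returns the ETag of every part keyed by part number (1-based).
    """
    total = len(data)
    sent = 0
    etags: dict[int, str] = {}

    def send(part_number: int, start: int) -> tuple[int, str, int]:
        body = data[start:start + part_size]
        return part_number, upload_part(part_number, body), len(body)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send, number, start)
                   for number, start in enumerate(range(0, total, part_size), start=1)]
        # Results are collected on this (calling) thread, so `sent` needs no lock.
        for future in as_completed(futures):
            part_number, etag, size = future.result()
            etags[part_number] = etag
            sent += size
            on_progress(sent, total)
    return etags


# Hypothetical stand-in for the real HTTP PUT of one part.
def fake_upload_part(part_number: int, body: bytes) -> str:
    return '"' + hashlib.md5(body).hexdigest() + '"'


progress: list[tuple[int, int]] = []
etags = upload_in_parts(b"x" * 25, 10, fake_upload_part,
                        lambda sent, total: progress.append((sent, total)))
```

Keeping the ETags keyed by part number matters because parts may finish out of order, and the final completion step needs them sorted.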

Update 20120412

My initial answer apparently missed the main point, so to clarify:

If you want to do browser-based uploads via simple HTML forms, you are constrained to the POST Object operation, which adds an object to a specified bucket using HTML forms:

POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body. [...]

Here the upload is handled in a single operation, so it doesn't support pause/resume and limits you to the original maximum object size of 5 gigabytes (GB).
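For the single-request POST route, the browser form needs a policy signed by your server, so the secret key never reaches the client. The sketch below assumes the current Signature Version 4 scheme (which postdates this answer) and uses placeholder bucket, prefix and credentials; the real form would also carry the `file` field, which must come last. It only illustrates the signing steps and is not meant to replace an SDK's presigned-POST helper.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def post_form_fields(bucket: str, key_prefix: str, access_key: str, secret_key: str,
                     region: str, max_bytes: int, now: datetime) -> dict[str, str]:
    """Hidden fields for an HTML form that POSTs straight to the bucket (SigV4)."""
    date_stamp = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    policy = {
        "expiration": (now + timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
            # A single POST cannot carry more than 5 GB, so keep max_bytes at or below that.
            ["content-length-range", 0, max_bytes],
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # Derive the SigV4 signing key, then sign the base64-encoded policy with it.
    signing_key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for scope_part in (region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, scope_part)
    signature = hmac.new(signing_key, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "key": key_prefix + "${filename}",  # S3 substitutes the uploaded file name
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }


# Placeholder values; real credentials must stay on the server.
fields = post_form_fields("example-bucket", "uploads/", "AKIDEXAMPLE", "not-a-real-secret",
                          "us-east-1", 5 * 1024 ** 3, datetime.now(timezone.utc))
```

This answers the "securely" part of the question for small and medium files, but it does nothing for pause/resume, since the whole file still travels in one request.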

You can only overcome both limitations by using the REST API for multipart upload instead (see Using the REST API for Multipart Upload), which SDKs like the AWS SDK for PHP in turn use to implement this functionality.
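Roughly, that REST flow consists of an initiate call (`POST /key?uploads`, which returns an `UploadId`), one `PUT /key?partNumber=N&uploadId=...` per part (each response carries an `ETag`), and a final `POST /key?uploadId=...` that stitches the parts together; pausing simply means not sending the next part. The sketch below assumes you kept the ETags keyed by part number and only builds the XML body of that final completion request; request signing and transport are left out.

```python
import xml.etree.ElementTree as ET


def complete_multipart_body(etags: dict[int, str]) -> bytes:
    """XML body for the final POST /key?uploadId=... (Complete Multipart Upload).

    `etags` maps part number -> ETag exactly as returned by each part's PUT.
    S3 expects the parts listed in ascending part-number order.
    """
    root = ET.Element("CompleteMultipartUpload")
    for number in sorted(etags):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etags[number]
    return ET.tostring(root, encoding="unicode").encode("utf-8")


# ETags recorded while uploading, possibly out of order after a resume.
print(complete_multipart_body({2: '"b2"', 1: '"a1"'}).decode())
```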

This obviously requires a server (e.g. on EC2) to handle the operations initiated via the browser, which also makes it easy to apply S3 Bucket Policies and/or IAM Policies for access control.
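As an example of that access control, the server's IAM role could be restricted to multipart uploads under a single key prefix. The bucket name and prefix below are placeholders and the sketch only generates the policy document; to my knowledge `s3:PutObject` covers initiating, uploading parts and completing, whereas listing the parts of an upload and aborting it need their own actions.

```python
import json


def server_upload_policy(bucket: str, prefix: str) -> str:
    """IAM policy letting the upload server run multipart uploads under one prefix."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",                 # initiate, upload part, complete
                    "s3:ListMultipartUploadParts",  # find out what to resume
                    "s3:AbortMultipartUpload",      # clean up abandoned uploads
                ],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(server_upload_policy("example-bucket", "uploads/"))
```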

The one alternative might be to use a JavaScript library and perform this client side; see e.g. jQuery Upload Progress and AJAX file upload for an initial pointer. Unfortunately, there is no canonical JavaScript SDK for AWS available (aws-lib surprisingly doesn't even support S3 yet). Apparently some forks of knox have added multipart upload, see e.g. slakis's fork, though I haven't used either of these for the use case at hand.


Initial Answer

If it's possible to upload [large files] directly to S3, how can I handle pause/resume?

The AWS SDK for PHP supports uploading large files to Amazon S3 by means of the Low-Level PHP API for Multipart Upload:

The AWS SDK for PHP exposes a low-level API that closely resembles the Amazon S3 REST API for multipart upload (see Using the REST API for Multipart Upload). Use the low-level API when you need to pause and resume multipart uploads, vary part sizes during the upload, or do not know the size of the data in advance. Use the high-level API (see Using the High-Level PHP API for Multipart Upload) whenever you don't have these requirements. [emphasis mine]
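Resuming with the low-level API essentially comes down to bookkeeping: after a pause or a dropped connection you ask S3 which parts of the upload ID it already has (the List Parts operation reports each part's number and size) and send only the rest. A minimal Python sketch of that calculation, assuming a fixed part size; `remaining_parts` is a hypothetical helper, not part of the AWS SDK for PHP.

```python
def remaining_parts(total_size: int, part_size: int,
                    uploaded: dict[int, int]) -> list[tuple[int, int, int]]:
    """Return (part_number, start, end) byte ranges that still need uploading.

    `uploaded` maps part number -> size in bytes, e.g. as reported by List Parts.
    A part only counts as done if its recorded size is the expected one.
    """
    part_count = (total_size + part_size - 1) // part_size  # ceiling division
    todo = []
    for number in range(1, part_count + 1):
        start = (number - 1) * part_size
        end = min(start + part_size, total_size)
        if uploaded.get(number) != end - start:
            todo.append((number, start, end))
    return todo


# A 25-byte file in 10-byte parts; parts 1 and 3 survived the interruption.
print(remaining_parts(25, 10, {1: 10, 3: 5}))  # prints [(2, 10, 20)]
```

In a browser, each `(start, end)` pair would map onto a slice of the selected file, so only the missing byte ranges are sent again.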

Amazon S3 can handle objects from 1 byte all the way to 5 terabytes (TB); see the respective introductory post, Amazon S3 - Object Size Limit Now 5 TB:

[...] Now customers can store extremely large files as single objects, which greatly simplifies their storage experience. Amazon S3 does the bookkeeping behind the scenes for our customers, so you can now GET that large object just like you would any other Amazon S3 object.

In order to store larger objects you would use the new Multipart Upload API that I blogged about last month to upload the object in parts. [...]
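Because the number of parts per upload is capped (currently 10,000, with every part except the last between 5 MiB and 5 GiB), the part size has to grow with the object size. A small Python sketch of one way to pick it; the constants reflect the current S3 limits as I understand them, and `choose_part_size` and `part_count` are hypothetical helpers.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB          # applies to every part except the last
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * 1024 ** 4  # 5 TB


def choose_part_size(object_size: int, preferred: int = MIN_PART_SIZE) -> int:
    """Pick a part size that respects the per-part and part-count limits."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object_size must be between 1 byte and 5 TB")
    # Smallest part size that still fits the object into MAX_PARTS parts.
    needed = -(-object_size // MAX_PARTS)  # ceiling division
    return min(max(preferred, MIN_PART_SIZE, needed), MAX_PART_SIZE)


def part_count(object_size: int, part_size: int) -> int:
    return -(-object_size // part_size)  # ceiling division


size = choose_part_size(2 * GIB)
print(size // MIB, part_count(2 * GIB, size))  # prints 5 410
```

For 2 GiB files as in the question, the 5 MiB minimum already suffices (410 parts); only objects beyond roughly 50 GB force larger parts.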