Stream uploading file to S3 on Node.js using formidable and (knox or aws-sdk)

jbmusso picture jbmusso · Jun 26, 2013 · Viewed 15.3k times · Source

I'm trying to stream upload a file submitted via a form directly to an Amazon S3 bucket, using aws-sdk or knox. Form handling is done with formidable.

My question is: how do I properly use formidable with aws-sdk (or knox) using each of these libraries' latest features for handling streams?

I'm aware that this topic has already been asked here in different flavors, ie:

However, I believe the answers are a bit outdated and/or off topic (ie. CORS support, which I don't wish to use for now for various reasons) and/or, most importantly, make no reference to the latest features from either aws-sdk (see: https://github.com/aws/aws-sdk-js/issues/13#issuecomment-16085442) or knox (notably putStream() or its readableStream.pipe(req) variant, both explained in the doc).

After hours of struggling, I came to the conclusion that I needed some help (disclaimer: I'm quite a newbie with streams).

HTML form:

<form action="/uploadPicture" method="post" enctype="multipart/form-data">
  <input name="picture" type="file" accept="image/*">
  <input type="submit">
</form>

Express bodyParser middleware is configured this way:

app.use(express.bodyParser({defer: true}))

POST request handler:

uploadPicture = (req, res, next) ->
  form = new formidable.IncomingForm()
  form.parse(req)

  form.onPart = (part) ->
    if not part.filename
      # Let formidable handle all non-file parts (fields)
      form.handlePart(part)
    else
      handlePart(part, form.bytesExpected)

  handlePart = (part, fileSize) ->
    # aws-sdk version
    params =
      Bucket: "mybucket"
      Key: part.filename
      ContentLength: fileSize
      Body: part # passing stream object as body parameter

    awsS3client.putObject(params, (err, data) ->
      if err
        console.log err
      else
        console.log data
    )

However, I'm getting the following error:

{ [RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.]

message: 'Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.', code: 'RequestTimeout', name: 'RequestTimeout', statusCode: 400, retryable: false }

A knox version of handlePart() function tailored this way also miserably fails:

handlePart = (part, fileSize) ->
  headers =
    "Content-Length": fileSize
    "Content-Type": part.mime
  knoxS3client.putStream(part, part.filename, headers, (err, res) ->
    if err
      console.log err
    else
      console.log res
  )      

I also get a big res object with a 400 statusCode somewhere.

Region is configured to eu-west-1 in both case.

Additional notes:

node 0.10.12

latest formidable from npm (1.0.14)

latest aws-sdk from npm (1.3.1)

latest knox from npm (0.8.3)

Answer

jbmusso picture jbmusso · Jun 26, 2013

Well, according to the creator of Formidable, direct streaming to Amazon S3 is impossible :

The S3 API requires you to provide the size of new files when creating them. This information is not available for multipart/form-data files until they have been fully received. This means streaming is impossible.

Indeed, form.bytesExpected refers to the size of the whole form, and not the size of the single file.

The data must therefore either hit the memory or the disk on the server first before being uploaded to S3.