I recently refactored some of my code to load rows into a database using `LOAD DATA`, and it works great. However, for each record I also have to upload 2 files to S3, and that completely wipes out the speed gain. I was able to process 600+ of these documents per second, but they now trickle in at about 1 per second because of S3.
What are your workarounds for this? Looking at the API, I see that it is mostly RESTful, so I'm not sure what to do. Maybe I should just store all of this in the database. The text files are usually no more than 1.5 KB. (The other file we store is an XML representation of the text.)
I already cache these files in HTTP requests to my web server as they are used quite a lot.
By the way, our current implementation uses Java. I have not tried threads yet, but that might be an option.
Recommendations?
You can use the [putObjects][1] function of JetS3t to upload multiple files at once.
Alternatively you could use a background thread to upload to S3 from a queue, and add files to the queue from your code that loads the data into the database.
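A minimal sketch of the queue approach, using only the Java standard library. The `Uploader` interface and the `BackgroundUploader` class are hypothetical names made up for this illustration; `Uploader` stands in for whatever S3 client call you actually use, and the thread count is an assumption you would tune.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the real S3 call (JetS3t or any other client). */
interface Uploader {
    void upload(String key, byte[] data) throws Exception;
}

/** Drains a queue of pending uploads on background threads. */
final class BackgroundUploader {
    /** One pending upload. */
    private static final class Task {
        final String key;
        final byte[] data;
        Task(String key, byte[] data) { this.key = key; this.data = data; }
    }

    /** Sentinel that tells a worker to exit; compared by identity. */
    private static final Task STOP = new Task(null, null);

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger failures = new AtomicInteger();
    private final Thread[] workers;

    BackgroundUploader(Uploader uploader, int threads) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Task task = queue.take();   // blocks until work arrives
                        if (task == STOP) return;
                        try {
                            uploader.upload(task.key, task.data);
                        } catch (Exception e) {
                            // A real version would retry or log; this one only counts.
                            failures.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "s3-uploader-" + i);
            workers[i].start();
        }
    }

    /** Called from the code that loads the database; returns immediately. */
    void submit(String key, byte[] data) {
        queue.add(new Task(key, data));
    }

    /** Lets already-queued uploads finish, then stops the workers. */
    void shutdown() {
        for (int i = 0; i < workers.length; i++) queue.add(STOP);
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int failureCount() { return failures.get(); }
}
```

The loader calls `submit(...)` once per file and keeps going, then calls `shutdown()` at the end of the batch. The queue here is unbounded, so if the database load outruns S3 by a wide margin for a long time, a bounded queue (with `put` instead of `add`) would keep memory use in check.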
[1]: http://jets3t.s3.amazonaws.com/api/org/jets3t/service/multithread/S3ServiceMulti.html#putObjects(org.jets3t.service.model.S3Bucket, org.jets3t.service.model.S3Object[])