AWS: Ways of keeping cost down while backing up S3 files to Glacier?

I Z · Mar 5, 2013

As part of our project, we have created quite a bushy folder/file tree on S3, with the files taking up about 6 TB in total. We currently have no backup of this data, which is bad. We want to do periodic backups, and Glacier seems like the way to go.

The question is: what are the ways to keep the total cost of a backup down?

Most of our files are text, so we can compress them and upload whole ZIP archives. This will require processing (on EC2), so I am curious whether there is any rule of thumb for comparing the extra cost of running an EC2 instance for zipping against just uploading uncompressed files.

Also, we would have to pay for data transfer, so I am wondering whether there is any way of backing up other than (i) downloading each file from S3 to an instance and (ii) uploading it, raw or zipped, to Glacier.

Answer

Eric Hammond · Mar 5, 2013

I generally think of Glacier as an alternative storage to S3, not an additional storage. I.e., data would most often be stored either in S3 or Glacier, but rarely both.

If you trust S3's advertised eleven nines of durability, then you're not backing up because S3 itself is likely to lose the data.

You might want to back up the data because (like I do) you see your Amazon account as a single point of failure (e.g., credentials are compromised or Amazon blocks your account because they believe you are doing something abusive). However, in that case, Glacier is not a sufficient backup as it still falls under the Amazon umbrella.

I recommend backing up S3 data outside of Amazon if you are concerned about losing the data in S3 due to user error, compromised credentials, and the like.

I recommend using Glacier as a place to archive data for long-term, cheap storage when you know you're not going to need to access it much, if ever. Once things are transitioned to Glacier, you would then delete them from S3.

Amazon provides automatic archival from S3 to Glacier which works great, but beware of the extra costs if the average size of your files is small. Here's an article I wrote on that danger:

Cost of Transitioning S3 Objects to Glacier
http://alestic.com/2012/12/s3-glacier-costs
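
To make the small-file danger concrete, here is a rough sketch of the arithmetic in Python. It assumes the per-object overhead that Amazon documents for objects transitioned to Glacier (32 KB billed at the Glacier rate plus 8 KB billed at the S3 rate) and a per-1,000-request transition fee. The function name and every price in the example call are placeholders of mine, not current rates, so check the pricing page before trusting any output.

```python
KB = 1024
GB = 1024 ** 3

# Assumed per-object overhead for objects transitioned to Glacier.
GLACIER_OVERHEAD_BYTES = 32 * KB  # billed at the Glacier storage rate
S3_OVERHEAD_BYTES = 8 * KB        # billed at the S3 storage rate


def transition_overhead(num_objects, avg_object_bytes,
                        request_price_per_1000,
                        glacier_price_per_gb_month,
                        s3_price_per_gb_month):
    """Estimate the cost of transitioning num_objects objects to Glacier."""
    # One-time fee: one transition request per object.
    one_time_requests = num_objects / 1000 * request_price_per_1000

    # Ongoing storage: the data itself plus the per-object overhead.
    glacier_bytes = num_objects * (avg_object_bytes + GLACIER_OVERHEAD_BYTES)
    s3_bytes = num_objects * S3_OVERHEAD_BYTES
    monthly_storage = (glacier_bytes / GB * glacier_price_per_gb_month
                       + s3_bytes / GB * s3_price_per_gb_month)

    return {
        "one_time_requests": one_time_requests,
        "monthly_storage": monthly_storage,
        # How large the Glacier overhead is relative to the data itself.
        "glacier_overhead_ratio": GLACIER_OVERHEAD_BYTES / avg_object_bytes,
    }


if __name__ == "__main__":
    # Placeholder prices: 100 million objects averaging 64 KB each.
    print(transition_overhead(100_000_000, 64 * KB, 0.05, 0.01, 0.10))
```

The smaller the average object, the larger the overhead ratio and the request fee relative to the data being archived, which is why bundling small files first tends to pay off.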

If you still want to copy from S3 to Glacier, here are some points related to your questions:

  • You will presumably leave the data in Glacier for a long time, so compressing it is probably worth the short-term CPU usage. The exact trade-off depends on factors like the compressibility of your data, how long it takes to compress, and how often you need to perform the compression.

  • There is no charge for downloading data from S3 to an EC2 instance. There is no data transfer charge for uploading data into Glacier.

  • If you upload many small files to Glacier, the per-item upload charges can add up. You can save on cost by combining many small files into a single archive and uploading that.
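
The compression and bundling points above can be sketched together with Python's standard `tarfile` module. This is a minimal illustration, not a full backup tool: it assumes the files have already been copied from S3 to a local directory on the instance, and `bundle_directory` is a hypothetical helper name. It returns the raw and compressed sizes so you can measure how compressible your own data is before committing to the approach.

```python
import os
import tarfile


def bundle_directory(src_dir, archive_path):
    """Pack every file under src_dir into one gzip-compressed tar archive.

    archive_path should live outside src_dir, otherwise the archive
    would be picked up by the directory walk.
    Returns (file_count, raw_bytes, archive_bytes).
    """
    file_count = 0
    raw_bytes = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Store paths relative to src_dir so the tree can be
                # restored under any destination directory.
                tar.add(path, arcname=os.path.relpath(path, src_dir))
                file_count += 1
                raw_bytes += os.path.getsize(path)
    return file_count, raw_bytes, os.path.getsize(archive_path)
```

Each bundle is then uploaded as a single Glacier archive, so the per-item charge is paid once per bundle rather than once per file. Comparing `raw_bytes` with `archive_bytes` on a sample of your data gives a first estimate of the storage saving to weigh against the instance time spent compressing.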

Another S3 feature that can help protect against accidental loss through user error or attacks is S3 versioning with MFA (multi-factor authentication) Delete enabled. This prevents anybody from permanently deleting objects unless they have the credentials plus a physical device in your possession.
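
As a rough sketch of what turning this on looks like, the snippet below builds the parameters for S3's PutBucketVersioning call and passes them to boto3's `put_bucket_versioning`. Using boto3 is my assumption (the same request can be made with other tools), the helper names are hypothetical, and the bucket name and MFA values in the comment are placeholders. As far as I know, MFA Delete can only be enabled with the bucket owner's root credentials.

```python
def mfa_delete_request(bucket, mfa_serial, mfa_code):
    """Build keyword arguments for S3's PutBucketVersioning call."""
    return {
        "Bucket": bucket,
        # S3 expects the device serial number (or ARN) and the current
        # code from the device, separated by a single space.
        "MFA": f"{mfa_serial} {mfa_code}",
        "VersioningConfiguration": {
            "Status": "Enabled",
            "MFADelete": "Enabled",
        },
    }


def enable_mfa_delete(bucket, mfa_serial, mfa_code):
    """Send the request; needs boto3 and suitable AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    s3 = boto3.client("s3")
    return s3.put_bucket_versioning(
        **mfa_delete_request(bucket, mfa_serial, mfa_code)
    )


# Placeholder values for illustration only:
# enable_mfa_delete("example-bucket",
#                   "arn:aws:iam::123456789012:mfa/root-account-mfa-device",
#                   "123456")
```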