How stable is s3fs to mount an Amazon S3 bucket as a local directory

arod · May 29, 2012

How stable is s3fs for mounting an Amazon S3 bucket as a local directory in Linux? Is it recommended/stable for high-demand production environments?

Are there any better/similar solutions?

Update: Would it be better to use EBS and to mount it via NFS to all other AMIs?

Answer

reach4thelasers · May 30, 2012

There's a good article on s3fs here; after reading it, I resorted to an EBS share.

It highlights a few important considerations when using s3fs, mostly related to the inherent limitations of S3:

  • no file can be over 5GB
  • you can't partially update a file, so changing a single byte will re-upload the entire file
  • operations on many small files are very efficient (each is a separate S3 object after all), but operations on large files are very inefficient
  • though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this, so if you want to read just one byte of a 1GB file, you'll have to download the entire GB
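
For comparison, here is roughly what a partial read looks like when you talk to S3 directly. This is only a minimal sketch: it assumes you pass in a boto3 S3 client, and the function name, bucket and key are made up for illustration.

```python
def read_byte_range(client, bucket, key, start, end):
    """Fetch bytes start..end (inclusive) of an S3 object with a ranged GET.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # HTTP Range offsets are inclusive on both ends, so 0-0 is the first byte.
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{end}",
    )
    return response["Body"].read()


# Hypothetical usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   first_byte = read_byte_range(boto3.client("s3"), "my-bucket", "big/file.bin", 0, 0)
```

With a ranged request like this, only the requested bytes come back over the wire, which is the behaviour s3fs was not taking advantage of.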

Whether s3fs is a feasible option therefore depends on what you are storing. If you're storing, say, photos, where you want to write or read an entire file and never incrementally change it, then it's fine. Although one may ask: if you're doing this, then why not just use S3's API directly?

If you're talking about application data (say, database files or log files) where you want to make small incremental changes, then it's a definite no - S3 just doesn't work that way; you can't incrementally change a file.
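
To get a feel for how badly this scales, here is a rough back-of-the-envelope sketch. It assumes every append re-uploads the whole object (as described above); the function name and the sample sizes are made up purely for illustration.

```python
def bytes_uploaded_for_appends(line_size, num_lines):
    """Total bytes uploaded if every append re-uploads the whole object.

    After the k-th append the object holds k * line_size bytes, and all of
    them are sent again, so the total is line_size * (1 + 2 + ... + num_lines).
    """
    return line_size * num_lines * (num_lines + 1) // 2


# Illustrative numbers only: 10,000 log lines of 100 bytes each.
line_size = 100
num_lines = 10_000
final_size = line_size * num_lines
uploaded = bytes_uploaded_for_appends(line_size, num_lines)

print(f"final log size: {final_size:,} bytes")
print(f"bytes uploaded: {uploaded:,} bytes")
print(f"amplification:  {uploaded / final_size:.1f}x")
```

Under those assumptions a log that ends up at about 1 MB costs roughly 5 GB of uploads, because the total grows with the square of the number of appends.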

The article mentioned above does talk about a similar application - s3backer - which gets around the performance issues by implementing a virtual filesystem over S3. It has a few issues of its own, though:

  • high risk of data corruption, due to the delayed writes
  • too small block sizes (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB worth of storage with 4K blocks)
  • too large block sizes can add significant data transfer and storage fees.
  • memory usage can be prohibitive: by default it caches 1000 blocks.
    With the default 4K block size that's not an issue but most users
    will probably want to increase block size.
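
The $130 figure looks consistent with per-request charges for writing every block once. Here is a quick sketch of that arithmetic. Assumptions on my part: one PUT request per block, a price of $0.01 per 1,000 PUT requests (check current S3 pricing, it changes), and 1GB = 1024^3 bytes. The function name is just for illustration.

```python
def put_request_cost(total_bytes, block_size, price_per_1000_puts):
    """Cost of uploading every block once, one PUT request per block."""
    num_blocks = -(-total_bytes // block_size)  # ceiling division
    return num_blocks, num_blocks / 1000 * price_per_1000_puts


GIB = 1024 ** 3
ASSUMED_PRICE = 0.01  # USD per 1,000 PUTs; an assumption, not a quoted price

# Compare the 4K default with two larger, arbitrarily chosen block sizes.
for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    blocks, cost = put_request_cost(50 * GIB, block_size, ASSUMED_PRICE)
    print(f"{block_size // 1024:>5}K blocks: {blocks:>10,} PUTs, about ${cost:,.2f}")
```

With 4K blocks, 50GB is about 13.1 million blocks, which works out to roughly $131 in request charges under the assumed price. Larger blocks cut the request count, at the expense of the transfer and memory costs listed above.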

I resorted to an EBS-mounted drive shared from an EC2 instance. But you should know that, although it's the most performant option, an EBS-mounted NFS share has one big problem of its own: it's a single point of failure. If the machine that's sharing the EBS volume goes down, then you lose access on all machines which access the share.

This is a risk I was able to live with and was the option I chose in the end. I hope this helps.