So many options and so little time to test them all... I wonder if anyone has experience with distributed file systems for video streaming and storage/encoding.
I have a lot of huge video files (50 GB to 250 GB) that I need to store somewhere, encode to MP4, and stream from several Adobe FMS servers. The only way to handle all this is with a distributed file system, but now the question is: which one?
My research so far tells me:
- Lustre: mature, proven solution used by a lot of big companies; best with files larger than 10 GB; implemented as a kernel driver.
- Gluster: newer and less mature; FUSE-based, which means it is easy to install but may be slower due to FUSE overhead. Better at handling a large number of smaller (~1 GB) files.
- MogileFS: seems to be only for small (MB-sized) files and uses HTTP for access? A FUSE binding may be possible in the future.
So far Lustre seems the winner but I would like to hear real experiences for the particular application I have.
Hadoop, Red Hat GFS, Coda and Windows DFS also sound like options, so any experiences are welcome. If anyone has benchmarks, please share.
After some real experience, this is what I have learned:
- Lustre:
  - Performance: Amazingly fast! I can confirm that Lustre can serve a lot of streams and that encoding speed is not affected by accessing files via Lustre.
  - POSIX compatibility: Very good! No need to modify applications to use Lustre.
  - Replication, load balancing and failover: Very bad! For replication, load balancing and failover we need to rely on other software such as virtual IPs and DRBD.
  - Installation: The worst! Impossible to install for mere mortals. It requires a very specific combination of kernel, Lustre patches and tweaks to get it working, and the current Lustre patches usually work only with old kernels that are incompatible with new hardware/software.
- MogileFS:
  - Performance: Good for small files but not usable for medium to large files. This is mostly due to HTTP overhead, since all files are sent and received via HTTP requests that encode all data in base64, adding 33% overhead to each file.
  - POSIX compatibility: Nonexistent. All applications must be modified to use MogileFS, which renders it useless for streaming/encoding, since most streaming servers and encoding tools do not understand the MogileFS protocol.
  - Replication, failover and load balancing: Replication and failover work out of the box, and load balancing can be implemented in the application by accessing more than one tracker at a time.
  - Installation: Relatively easy; ready-to-use packages exist in most distributions. The only difficulty I found was setting up master-slave database replication to eliminate the single point of failure.
- Gluster:
  - Performance: Very bad for streaming. I cannot reach more than a few Mbps on a 10 Gbps network, and client and server CPU usage skyrockets on heavy writes. For encoding it works, because the CPU is saturated before the network and I/O.
  - POSIX compatibility: Almost compatible. The tools I use can access Gluster mounts as normal folders on disk, but in some edge cases things start causing problems. Check the Gluster mailing lists and you will see there are a lot of problems.
  - Replication, failover and load balancing: The best, if they actually worked! Gluster is very new and has a lot of bugs and performance problems.
  - Installation: Extremely easy. The management command line is amazing, and setting up replicated, striped and distributed volumes across several servers could not be any easier.
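The 33% figure in the MogileFS performance note is the standard base64 expansion: every 3 input bytes become 4 output characters. Below is a minimal Python sketch of that arithmetic, checked against the standard library's `base64` module. It only illustrates the cost of the encoding itself, assuming (as described above) that file data is base64-encoded in transit.

```python
import base64
import os


def base64_encoded_size(n: int) -> int:
    # Each 3-byte group becomes 4 ASCII characters; a trailing partial
    # group is padded with '=' up to a full 4 characters.
    return 4 * ((n + 2) // 3)


def overhead(n: int) -> float:
    """Fractional size increase from base64-encoding n bytes."""
    return base64_encoded_size(n) / n - 1


# Sanity-check the formula against the standard library encoder.
for n in (1, 2, 3, 1_000, 65_536):
    assert len(base64.b64encode(os.urandom(n))) == base64_encoded_size(n)

# Extra bytes on the wire for one of the 50 GB source files.
size = 50 * 10**9
print(f"{size / 10**9:.0f} GB -> {size * overhead(size) / 10**9:.1f} GB extra")
```

For the 50 GB to 250 GB files in question, that is roughly an extra 17 GB to 83 GB per transfer, before any HTTP framing.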
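Load balancing across several MogileFS trackers, as mentioned above, lives in the client. A minimal sketch of the pattern follows, assuming a hypothetical `query` callable that sends one request to a single tracker and raises `OSError` when it is unreachable. This is not the MogileFS client library API; the tracker addresses are also made up.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def query_any_tracker(trackers: Sequence[str], query: Callable[[str], T]) -> T:
    """Ask trackers in random order and return the first successful answer.

    `query` is a hypothetical callable that talks to a single tracker
    and raises OSError if that tracker cannot be reached.
    """
    order = list(trackers)
    random.shuffle(order)  # random order spreads requests across trackers
    last_error: Optional[Exception] = None
    for tracker in order:
        try:
            return query(tracker)
        except OSError as exc:  # tracker down: fail over to the next one
            last_error = exc
    raise RuntimeError("no tracker could be reached") from last_error


# Usage with a fake query in which one (hypothetical) tracker is down.
def fake_query(tracker: str) -> str:
    if tracker == "tracker1.example:7001":
        raise ConnectionRefusedError(tracker)
    return f"file paths from {tracker}"


print(query_any_tracker(["tracker1.example:7001", "tracker2.example:7001"], fake_query))
```

Shuffling per request gives rough load balancing for free. A real client might instead keep a health list and skip trackers that failed recently.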
Final conclusion:
Unfortunately, the conclusion is "No single silver bullet".
Currently we keep our media files on Gluster 3.2 in a replicated volume for storage and transcoding. As long as you don't have a lot of servers and avoid geo-replication and striped volumes, things work OK.
When we are going to stream the media files, we copy them to a Lustre volume that is replicated to a second Lustre volume via DRBD. The Wowza server then reads the media files from the Lustre volumes.
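A replicated volume like this is created with the `gluster` management CLI (`gluster volume create <name> replica <count> transport tcp <server>:<brick> ...`, then `gluster volume start <name>`). The Python sketch below only builds those command lines. The volume name `media`, the hosts `gfs1`/`gfs2` and the brick path `/bricks/media` are hypothetical. To execute the commands, pass each one to `subprocess.run(cmd, check=True)` on one of the Gluster servers; that step is left out here.

```python
import shlex
from typing import List, Sequence


def replicated_volume_commands(volume: str, servers: Sequence[str],
                               brick_dir: str) -> List[List[str]]:
    """Build gluster CLI commands for one replica set spanning `servers`."""
    bricks = [f"{server}:{brick_dir}" for server in servers]
    return [
        # replica count == number of bricks -> every server holds a full copy
        ["gluster", "volume", "create", volume,
         "replica", str(len(servers)), "transport", "tcp", *bricks],
        ["gluster", "volume", "start", volume],
    ]


# Hypothetical two-server setup; print the commands instead of running them.
for cmd in replicated_volume_commands("media", ["gfs1", "gfs2"], "/bricks/media"):
    print(shlex.join(cmd))
```

Setting the replica count equal to the number of servers gives a purely replicated volume. Distribution would need more bricks than the replica count, and the post advises against striped volumes.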
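The copy step from the Gluster volume to the Lustre volume can be scripted against the mounted paths, since both file systems look like ordinary directories. The sketch below makes three assumptions: hypothetical mount points, a SHA-256 check to confirm the copy, and a rename into place at the end. The rename is there so the streaming server never opens a half-copied file. DRBD replication happens below the file system and is not handled here.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256(path: Path, chunk: int = 8 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so huge files never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def publish_for_streaming(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir under a temporary name, verify it, then rename."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / src.name
    tmp = dest_dir / (src.name + ".partial")
    shutil.copyfile(src, tmp)
    if sha256(tmp) != sha256(src):
        tmp.unlink()
        raise IOError(f"checksum mismatch while copying {src}")
    os.replace(tmp, final)  # atomic rename within the same file system
    return final


# Hypothetical usage with made-up mount points:
# publish_for_streaming(Path("/mnt/gluster/media/talk.mp4"), Path("/mnt/lustre/streams"))
```

Hashing reads each file twice more, which is expensive at 250 GB. Comparing file sizes is a cheaper, weaker check if that cost matters.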
And finally we use MogileFS to serve the thumbnails in our web application servers.