Using a Filesystem (Not a Database!) for Schemaless Data - Best Practices

Évariste Galois · Nov 16, 2010

After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for storing and querying schemaless data.

Rather than just building a file system on top of MySQL, why not just save the data directly to the filesystem? Indexing needs to be figured out, but modern filesystems are very stable, have great features like replication, snapshot and backup facilities, and are flexible at storing schema-less data.

However, I can't find any examples of someone using a filesystem instead of a database.

Where can I find more resources on how to implement a schemaless (or "document-oriented") database as a layer on top of a filesystem? Is anyone using a modern filesystem as a schemaless database?

Answer

MJB · Apr 24, 2011

Yes, a filesystem can be viewed as a special case of a NoSQL-like database system. It has some advantages, and some limitations that should be considered in any design decision:

pros:

  • simple, intuitive
  • takes advantage of years of tuning and caching algorithms
  • easy backup, potentially easy clustering
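
To make the "simple, intuitive" point concrete, here is a minimal sketch of a store that keeps one JSON file per document. The class name `FileStore`, the `.json` naming scheme, and the method names are assumptions made for illustration; they do not come from the question or from any particular library.

```python
import json
import os
import tempfile


class FileStore:
    """One JSON file per document under a root directory (hypothetical layout)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are already safe filenames (no path separators).
        return os.path.join(self.root, key + ".json")

    def put(self, key, doc):
        # Write to a temp file in the same directory, then rename it over the
        # target, so readers never observe a half-written document.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(doc, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path(key))
        except BaseException:
            os.unlink(tmp)
            raise

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)

    def keys(self):
        # Listing the directory is the only "index" this sketch has.
        return sorted(n[:-5] for n in os.listdir(self.root) if n.endswith(".json"))
```

Anything beyond lookup by key (for example, finding documents by attribute) would need a separate index, which this sketch deliberately leaves out.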

things to think about:

  • richness of metadata - what types of data does it store, how does it let you query them, and can you have hierarchical or multivalued attributes?

  • speed of querying metadata - not all filesystems are well optimized for querying on anything other than size and dates.

  • inability to join queries (though that's pretty much common to NoSQL)

  • inefficient storage usage (unless the filesystem performs block suballocation, you'll typically use up 4-16K per item stored, regardless of its size)

  • may not have the kind of caching algorithm you want for its directory structure
  • tends to be less tunable, etc.
  • backup solutions may have trouble depending on how you store things (too deep, too many items per node, etc.), which might negate an obvious advantage of such a structure
  • locking on a LOCAL filesystem works pretty well if you call the right routines, but not necessarily on a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
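
A few of the points above (storage overhead per item, too many items per directory, and local locking) can be sketched in code. This is only an illustration under stated assumptions: it targets a Unix-like system (`fcntl` and `st_blocks` are not available on Windows), and the helper names `shard_path`, `allocated_bytes`, and `locked_append` are hypothetical.

```python
import fcntl
import hashlib
import os


def shard_path(root, key, levels=2, width=2):
    """Spread keys over nested directories so no single directory grows huge."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # e.g. root/ab/cd/abcd... : the first hex characters pick the subdirectories.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest)


def allocated_bytes(path):
    """Bytes allocated on disk for a file; st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


def locked_append(path, line):
    """Append one line while holding an exclusive advisory lock."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        # flock is advisory: it only protects against other processes that
        # also take the lock, and its behaviour on network filesystems varies.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Comparing `allocated_bytes(path)` with `os.path.getsize(path)` for a very small document is a quick way to see how much space the filesystem actually spends per item. The two-level, two-character fan-out is an arbitrary choice for the sketch; a real layout would be sized to the expected number of documents and to what the backup tooling handles well.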