FILESYSTEM vs SQLITE, while storing up to 10M files

Doori Bar · Sep 27, 2010

I would like to store up to 10M files on a 2TB storage unit. The only properties I need are the filenames and their contents (data).

The maximum file size is 100MB, and most files are smaller than 1MB. Files must be removable, and both write and read speed are a priority, while storage efficiency, recovery, and integrity mechanisms are not needed.

I thought about NTFS, but most of its features are not needed, cannot be disabled, and add overhead. A few of them are: creation date, modification date, attributes, the journal, and of course permissions.

Given that a filesystem comes with native features I do not need, would you suggest using SQLite for this requirement? Or is there an obvious disadvantage I should be aware of? (One would guess that removing files will be a complicated task?)

(SQLite will be used via the C API.)

My goal is to use a better-suited solution to gain performance. Thanks in advance - Doori Bar

Answer

Sjon · Dec 29, 2017

The official SQLite site includes a page documenting the performance benefits of storing small blobs in a database rather than as individual files, measured on various operating systems. For files of about 10 KiB, SQLite is approximately 35% faster. Quoting that page:

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Furthermore, a single SQLite database holding 10-kilobyte blobs uses about 20% less disk space than storing the blobs in individual files.

The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database. The size reduction arises from the fact that individual files are padded out to the next multiple of the filesystem block size, whereas the blobs are packed more tightly into an SQLite database.

The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0. You may expect future versions of SQLite to perform even better.
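The padding effect mentioned in the quote can be illustrated with a little arithmetic. This is only a sketch: the 4096-byte block size is an assumption (a common default, not something the quoted page states), `padded_size` is a hypothetical helper, and the calculation ignores filesystem metadata as well as SQLite's own per-page overhead.

```python
def padded_size(n_bytes: int, block_size: int = 4096) -> int:
    """Bytes a file of n_bytes occupies once rounded up to whole blocks.

    block_size=4096 is an assumed value; check your own filesystem.
    """
    blocks = -(-n_bytes // block_size)  # ceiling division
    return blocks * block_size


blob = 10 * 1024                 # a 10 KiB blob
on_disk = padded_size(blob)      # space allocated with 4 KiB blocks
padding = on_disk - blob         # bytes lost to rounding up

print(f"{blob} bytes of data occupy {on_disk} bytes on disk "
      f"({padding} bytes of padding)")
```

Under these assumptions a 10 KiB file occupies three 4 KiB blocks, so about a sixth of the allocated space is padding. The actual savings depend on the block size and on how the blob sizes are distributed.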

You may see different results with larger files. The SQLite site links to kvtest, which you can use to reproduce these measurements on your own hardware and operating system.
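For the requirement in the question (names plus contents, with removal), a minimal sketch of a blob store might look like the following. It uses Python's built-in sqlite3 module for brevity. The table name `files` and the helper names are made up for illustration. With the C API the same SQL statements would typically go through sqlite3_prepare_v2(), sqlite3_bind_blob(), sqlite3_step() and sqlite3_column_blob().

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with one table mapping name -> contents."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " data BLOB NOT NULL)"
    )
    return conn


def put(conn, name, data):
    """Store or overwrite a file; 'with conn' commits the transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (name, data),
        )


def get(conn, name):
    """Return the contents as bytes, or None if the name is unknown."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]


def remove(conn, name):
    """Delete a file; returns True if a row was actually removed."""
    with conn:
        cur = conn.execute("DELETE FROM files WHERE name = ?", (name,))
    return cur.rowcount > 0


store = open_store()                      # in-memory database for the demo
put(store, "thumb-001.png", b"\x89PNG" + b"\x00" * 1020)
print(len(get(store, "thumb-001.png")))   # size of the stored blob
print(remove(store, "thumb-001.png"))     # True: the row existed
print(get(store, "thumb-001.png"))        # None after removal
```

Removal is a single DELETE statement, so it is not complicated. Note, however, that by default the freed pages are kept for reuse inside the database file, and the file itself only shrinks after a VACUUM. Whether this layout actually beats the filesystem for files approaching 100MB is something to measure, for example with kvtest.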