I'm writing an app that needs to store lots of files up to approx 10 million.
They are presently named with a UUID and are going to be around 4MB each but always the same size. Reading and writing from/to these files will always be sequential.
2 main questions I am seeking answers for:
1) Which filesystem would be best for this. XFS or ext4? 2) Would it be necessary to store the files beneath subdirectories in order to reduce the numbers of files within a single directory?
For question 2, I note that people have attempted to discover the XFS limit for number of files you can store in a single directory and haven't found the limit which exceeds millions. They noted no performance problems. What about under ext4?
Googling around with people doing similar things, some people suggested storing the inode number as a link to the file instead of the filename for performance (this is in a database index. which I'm also using). However, I don't see a usable API for opening the file by inode number. That seemed to be more of a suggestion for improving performance under ext3 which I am not intending to use by the way.
What are the ext4 and XFS limits? What performance benefits are there from one over the other and could you see a reason to use ext4 over XFS in my case?
You should definitely store the files in subdirectories.
EXT4 and XFS both use efficient lookup methods for file names, but if you ever need to run tools over the directories such as ls
or find
you will be very glad to have the files in manageable chunks of 1,000 - 10,000 files.
The inode number thing is to improve the sequential access performance of the EXT filesystems. The metadata is stored in inodes and if you access these inodes out of order then the metadata accesses are randomized. By reading your files in inode order you make the metadata access sequential too.