How to gather disk usage on a storage system faster than just using "du"?

Tommy · Jun 16, 2014 · Viewed 9k times

I operate a Synology NAS device, and the unit holds data for over 600 users.

The users' backups are tax accounting data, so a single user's folder contains roughly 200,000 files.

I have to report backup disk usage to each user, but with so many directories and files, the du command takes too long to run.

Could someone suggest a faster way to check each user's disk usage?

Answer

woot · Jun 16, 2014

There is no magic. In order to gather disk usage, you'll have to traverse the file system. If you only need it at the file system level, that's easy (df -h, for example)... but it sounds like you want it at the directory level within a mount point.

You could perhaps run jobs in parallel on each directory. For example in bash:

# launch one background du per top-level directory
for D in */
do
    du -s "$D" &
done

# wait for all the background du jobs to finish
wait

But you are likely to be I/O bound, I think. Also, if you have a lot of top-level directories, this method might be... well... rather taxing, since it puts no cap on the number of concurrent processes.
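If you want to stay with plain bash, here is a minimal sketch of a bounded variant (assuming bash 4.3+ for wait -n; the cap of 4 is just a placeholder to tune against your hardware):

MAX_JOBS=4   # placeholder cap; tune for your disks
for D in */
do
    # if the cap is reached, wait for one running du to finish before starting another
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]
    do
        wait -n
    done
    du -s "$D" &
done

wait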

If you have GNU Parallel installed, you can do something like:

ls -d */ | parallel du -s 

...which would be much better. parallel has a lot of nice features, like grouping the output and capping the maximum number of processes, and you can pass parameters to tune it (although, like I mentioned earlier, you'll be I/O bound, so more processes is not necessarily better; fewer than the default may even work best).
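For instance, a variant that caps the number of concurrent du jobs and sorts the totals (the -j 4 value and the usage_report.txt name are just placeholders):

# run at most 4 du jobs at a time, then sort the per-directory totals, largest first
parallel -j 4 du -s ::: */ | sort -rn > usage_report.txt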

The only other thought I have on this is to perhaps use disk quotas, if per-user accounting is really the point of what you are trying to do. There is a good tutorial here if you want to read about it.
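As a rough sketch of what that looks like on a stock Linux box with the standard quota tools (Synology's DSM has its own quota UI, and the device and mount point below are only placeholders):

# /etc/fstab: enable user quota accounting on the data volume (example entry)
# /dev/vg1/volume1  /volume1  ext4  defaults,usrquota  0  2

mount -o remount /volume1    # pick up the usrquota mount option
quotacheck -cum /volume1     # scan once and build the quota files
quotaon /volume1             # start quota accounting
repquota /volume1            # instant per-user usage report, no directory walk needed

Once quotas are on, the kernel keeps the per-user totals up to date as files change, so the report is effectively free instead of a full traversal every time.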