Using tqdm on a for loop inside a function to check progress

Mayank Gautam · Mar 13, 2016 · Viewed 15.8k times

I'm iterating over a large group of files inside a directory tree using a for loop.

While doing so, I want to monitor the progress with a progress bar in the console, so I decided to use tqdm.

Currently, my code looks like this:

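import os
from time import sleep
from tqdm import tqdm

for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]  # name of the current directory
    for fname in fileList:
        pass  # ***** (per-file processing elided in the original post)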

Output:

Scanning Directory....
43it [00:23, 11.24 it/s]

My problem is that it is not showing a progress bar. I want to know how to use tqdm properly and get a better understanding of how it works. I'd also like to know whether there are any alternatives to tqdm that could be used here.

Answer

Benjamin Hodgson · Mar 13, 2016

You can't show a percentage complete unless you know what "complete" means.

While os.walk is running, it doesn't know how many files and folders it's going to end up iterating over: os.walk returns a generator, which has no __len__. Without a length, tqdm can only print a running count, the elapsed time and the rate, which is the output you're seeing. To count the items up front, os.walk would have to look all the way down the directory tree, enumerating all the files and folders. In other words, it would have to do all of its work twice in order to tell you how many items it's going to produce, which is inefficient.
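To make the "work twice" cost concrete, here is a minimal sketch of a two-pass walk. It assumes tqdm's total argument is used to supply the count; the helper names count_walk_steps and walk_with_progress are made up for illustration, and the tree is assumed not to change between the two passes.

```python
import os


def count_walk_steps(root):
    """First pass: count the (dirpath, dirnames, filenames) tuples os.walk yields."""
    return sum(1 for _ in os.walk(root))


def walk_with_progress(root):
    """Second pass: walk again, handing tqdm the total found by the first pass."""
    # tqdm is third-party; it is imported here so the counting helper
    # above can be used on its own.
    from tqdm import tqdm

    total = count_walk_steps(root)
    for entry in tqdm(os.walk(root), total=total):
        yield entry


# os.walk returns a generator, so there is no length for tqdm to read.
print(hasattr(os.walk("."), "__len__"))  # False
```

Looping over walk_with_progress(target_dir) instead of tqdm(os.walk(target_dir)) should give a bar with a percentage, at the price of traversing the whole tree once before any real work starts.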

If you're dead set on showing a progress bar, you could spool the data into an in-memory list: list(os.walk(target_dir)). I don't recommend this. If you're traversing a large directory tree, this could consume a lot of memory. Worse, if followlinks is True and you have a cyclic directory structure (with children linking to their parents), then it could end up looping forever until you run out of RAM.
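For completeness, the list-based version might look like the sketch below. It is only reasonable for small trees, for the memory reasons given above. The names spool_walk and process_tree are hypothetical, and counting files stands in for whatever the real loop body does.

```python
import os


def spool_walk(root):
    """Materialise the whole walk in memory so that len() is available."""
    # followlinks is left at its default (False) to avoid symlink cycles.
    return list(os.walk(root))


def process_tree(root):
    """Iterate over the spooled walk with a progress bar; returns the file count."""
    from tqdm import tqdm  # third-party; imported where it is used

    entries = spool_walk(root)
    file_count = 0
    # tqdm reads len(entries), so it can draw a bar with a percentage.
    for dir_path, subdirs, files in tqdm(entries):
        file_count += len(files)  # placeholder for the real per-file work
    return file_count
```

Because a list has a length, tqdm needs no total argument here.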