I have a script that's successfully doing a multiprocessing Pool set of tasks with a imap_unordered()
call:
p = multiprocessing.Pool()
rs = p.imap_unordered(do_work, xrange(num_tasks))
p.close() # No more work
p.join() # Wait for completion
However, my num_tasks
is around 250,000, and so the join()
locks the main thread for 10 seconds or so, and I'd like to be able to echo out to the command line incrementally to show the main process isn't locked. Something like:
p = multiprocessing.Pool()
rs = p.imap_unordered(do_work, xrange(num_tasks))
p.close() # No more work
while (True):
remaining = rs.tasks_remaining() # How many of the map call haven't been done yet?
if (remaining == 0): break # Jump out of while loop
print "Waiting for", remaining, "tasks to complete..."
time.sleep(2)
Is there a method for the result object or the pool itself that indicates the number of tasks remaining? I tried using a multiprocessing.Value
object as a counter (do_work
calls a counter.value += 1
action after doing its task), but the counter only gets to ~85% of the total value before stopping incrementing.
My personal favorite -- gives you a nice little progress bar and completion ETA while things run and commit in parallel.
from multiprocessing import Pool
import tqdm
pool = Pool(processes=8)
for _ in tqdm.tqdm(pool.imap_unordered(do_work, tasks), total=len(tasks)):
pass