How to see the progress of a Dask compute task?

ambigus9 · Feb 28, 2018

I would like to see a progress bar in a Jupyter notebook while I'm running a compute task with Dask. I'm counting all values of the "id" column in a large CSV file (over 4 GB). Any ideas?

import dask.dataframe as dd

df = dd.read_csv('data/train.csv')
df.id.count().compute()

Answer

MRocklin · Feb 28, 2018

If you're using the single machine scheduler then do this:

from dask.diagnostics import ProgressBar

# Register the bar globally so every later .compute() call shows progress
ProgressBar().register()

http://dask.pydata.org/en/latest/diagnostics-local.html
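If you would rather not register the bar globally, ProgressBar can also be used as a context manager. Here is a minimal, self-contained sketch of that pattern. The count_chunk helper and the in-memory chunks are hypothetical stand-ins for the CSV partitions in the question; with the real dataframe you would put df.id.count().compute() inside the with-block instead.

```python
import dask
from dask.diagnostics import ProgressBar


@dask.delayed
def count_chunk(chunk):
    # Hypothetical stand-in for counting the non-null ids in one partition.
    return sum(1 for value in chunk if value is not None)


# Hypothetical in-memory "partitions"; a real run would use dd.read_csv.
chunks = [[1, 2, None], [4, 5, 6], [None, 8]]
total = dask.delayed(sum)([count_chunk(c) for c in chunks])

# The bar is shown only for computations started inside this block.
with ProgressBar():
    result = total.compute()

print(result)
```

This only works with the local (single machine) schedulers; the bar is printed as text, so it shows up in a notebook cell's output as well as in a terminal.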

If you're using the distributed scheduler then do this:

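from dask.distributed import progress

# persist() starts the computation in the background and returns immediately
result = df.id.count().persist()
# Show a progress bar that tracks the persisted result
progress(result)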
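For completeness, here is a self-contained sketch of the distributed pattern, including creating a client. It assumes the distributed package is installed; the count_chunk helper and the sample chunks are hypothetical stand-ins for the dataframe in the question, and Client(processes=False) is chosen here only to keep everything inside one process.

```python
import dask
from dask.distributed import Client, progress


@dask.delayed
def count_chunk(chunk):
    # Hypothetical stand-in for counting the non-null ids in one partition.
    return sum(1 for value in chunk if value is not None)


# Hypothetical in-memory "partitions"; a real run would use dd.read_csv.
chunks = [[1, 2, None], [4, 5, 6], [None, 8]]
total = dask.delayed(sum)([count_chunk(c) for c in chunks])

# Assumption: a local, in-process cluster (scheduler and workers as threads).
client = Client(processes=False)

# persist() submits the work to the cluster and returns without blocking.
persisted = total.persist()

# Text bar in a console; a widget when run inside a Jupyter kernel.
progress(persisted)

# Fetch the finished value from the cluster.
result = persisted.compute()
print(result)

client.close()
```

In a notebook, the widget is displayed only if progress(...) is the last expression in the cell.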

Or just use the dashboard:

http://dask.pydata.org/en/latest/diagnostics-distributed.html