How should I get the shape of a dask dataframe?

user1559897 · May 15, 2018

Calling .shape on a dask dataframe gives me the following error:

AttributeError: 'DataFrame' object has no attribute 'shape'

How should I get the shape instead?

Answer

MRocklin · May 15, 2018

You can get the number of columns directly:

len(df.columns)  # this is fast

You can also call len on the dataframe itself, though beware that this will trigger a computation.

len(df)  # this requires a full scan of the data

Dask.dataframe doesn't know how many records are in your data without first reading through all of it.