Top "Dask" questions

Dask is a parallel computing and data analytics library for Python.

How to read a compressed (gz) CSV file into a dask Dataframe?

Is there a way to read a .csv file that is compressed via gz into a dask dataframe? I've tried …

python csv pandas dask
Writing Dask partitions into single file

New to dask, I have a 1GB CSV file; when I read it into a dask dataframe it creates around 50 partitions …

python dask
How should I get the shape of a dask dataframe?

Calling .shape gives me the following error: AttributeError: 'DataFrame' object has no attribute 'shape'. How should I get the …

python dask
Strategy for partitioning dask dataframes efficiently

The documentation for Dask talks about repartitioning to reduce overhead here. However, it seems to indicate you need some knowledge …

python optimization dataframe dask
Simple way to Dask concatenate (horizontal, axis=1, columns)

Action: Reading two CSV files (data.csv and label.csv) into a single dataframe. df = dd.read_csv(data_files, delimiter=…

python pandas dask
dask DataFrame equivalent of pandas DataFrame sort_values

What would be the equivalent of sort_values in pandas for a dask DataFrame? I am trying to scale some …

python dataframe sorting dask
dask: difference between client.persist and client.compute

I am confused about the difference between client.persist() and client.compute(). Both seem (in some cases) to …

python dask
How to use all the cpu cores using Dask?

I have a pandas series with more than 35000 rows. I want to use dask to make it more efficient. However, I …

dask dask-distributed dask-delayed
how to parallelize many (fuzzy) string comparisons using apply in Pandas?

I have the following problem: I have a dataframe master that contains sentences, such as master Out[8]: original 0 this is …

python pandas parallel-processing dask fuzzywuzzy
Merge a large Dask dataframe with a small Pandas dataframe

Following the example here: YouTube: Dask-Pandas Dataframe Join, I am attempting to merge a ~70GB Dask data frame with a ~24MB …

python pandas dask