PySpark 2.0 The size or shape of a DataFrame

Xi Liang · Sep 23, 2016 · Viewed 153.8k times

I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this.

In pandas I can do:

data.shape

Is there a similar function in PySpark? This is my current workaround, but I am looking for a more elegant one:

row_number = data.count()
column_number = len(data.dtypes)

Computing the number of columns this way is not ideal...

Answer

George Fisher · Aug 11, 2017

You can get its shape with:

print((df.count(), len(df.columns)))
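Note that the two halves of this tuple have very different costs: `df.count()` is an action, so it triggers a Spark job over the data, while `len(df.columns)` is free because the schema is already known to the driver. If you want a pandas-like `shape`, one option is a small helper (a sketch; the name `spark_shape` is my own, not a PySpark API):

```python
def spark_shape(df):
    """Return (n_rows, n_cols) for a PySpark DataFrame.

    Works with any object exposing PySpark's DataFrame interface:
    - df.count()   -> row count (an action: scans the data)
    - df.columns   -> list of column names (metadata only, no job)
    """
    return (df.count(), len(df.columns))
```

Usage, assuming an existing DataFrame `df`: `spark_shape(df)` returns a tuple such as `(153, 4)`. Because `count()` re-runs unless the DataFrame is cached, avoid calling this repeatedly on large uncached data.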