How to convert Spark RDD to pandas dataframe in ipython?

user2966197 · Jan 15, 2016 · Viewed 74.1k times

I have an RDD and I want to convert it to a pandas DataFrame. I know that to convert an RDD to a normal (Spark) DataFrame we can do

df = rdd1.toDF()

But I want to convert the RDD to a pandas DataFrame, not a normal DataFrame. How can I do it?

Answer

jezrael · Jan 15, 2016

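You can use the toPandas() function. It is a method of the Spark DataFrame rather than of the RDD, so convert the RDD with toDF() first and then call toPandas() on the result. From the documentation: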

Returns the contents of this DataFrame as Pandas pandas.DataFrame.

This is only available if Pandas is installed and available.

>>> df.toPandas()  
   age   name
0    2  Alice
1    5    Bob
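
Putting the two steps together, here is a minimal sketch of the whole conversion. It assumes Spark 2.0 or later, where a SparkSession is available, and the helper name rdd_to_pandas, the demo() function and the column names are only illustrative, not part of any Spark API. Keep in mind that toPandas() collects all rows onto the driver, so it is only suitable for data that fits in the driver's memory.

```python
def rdd_to_pandas(rdd, columns):
    """Convert an RDD of tuples into a pandas DataFrame.

    toPandas() is defined on Spark DataFrames, not on RDDs, so the RDD is
    first turned into a Spark DataFrame with toDF().
    (Hypothetical helper, written for illustration only.)
    """
    spark_df = rdd.toDF(columns)  # RDD -> Spark DataFrame with named columns
    return spark_df.toPandas()    # Spark DataFrame -> pandas DataFrame (on the driver)


def demo():
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: a local session is fine for the example; in a notebook a
    # session named `spark` may already exist and can be used instead.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("rdd-to-pandas")
        .getOrCreate()
    )
    try:
        rdd1 = spark.sparkContext.parallelize([(2, "Alice"), (5, "Bob")])
        pdf = rdd_to_pandas(rdd1, ["age", "name"])
        print(type(pdf))
        print(pdf)
    finally:
        spark.stop()
```

Calling demo() in an IPython session with PySpark and pandas installed should print a small pandas DataFrame with the age and name columns, similar to the output shown above.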