I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object that is a DataFrame
. I want to export this DataFrame
object (I have called it "table") to a csv file so I can manipulate it and plot the columns. How do I export the DataFrame
"table" to a csv file?
Thanks!
If data frame fits in a driver memory and you want to save to local files system you can convert Spark DataFrame to local Pandas DataFrame using toPandas
method and then simply use to_csv
:
df.toPandas().to_csv('mycsv.csv')
Otherwise you can use spark-csv:
Spark 1.3
df.save('mycsv.csv', 'com.databricks.spark.csv')
Spark 1.4+
df.write.format('com.databricks.spark.csv').save('mycsv.csv')
In Spark 2.0+ you can use csv
data source directly:
df.write.csv('mycsv.csv')