Spark - How to write a single csv file WITHOUT folder?

antonioACR1 · Apr 27, 2017

Suppose that df is a dataframe in Spark. One way to write df into a single CSV file is

df.coalesce(1).write.option("header", "true").csv("name.csv")

This writes the dataframe to a CSV file inside a folder called name.csv, but the actual CSV file inside it is named something like part-00000-af091215-57c0-45c4-a521-cd7d9afb5e54.csv.

I would like to know whether it is possible to avoid the folder name.csv and have the actual file named name.csv rather than part-00000-af091215-57c0-45c4-a521-cd7d9afb5e54.csv. The reason is that I need to write several CSV files that I will later read together in Python; my Python code relies on the actual CSV file names and also expects all the CSV files to sit in a single folder (not a folder of folders).

Any help is appreciated.

Answer

Paul Vbl · Sep 10, 2018

A possible solution is to convert the Spark dataframe to a pandas dataframe and save it as CSV:

df.toPandas().to_csv("<path>/<filename>", index=False)

Pass index=False so pandas does not write its row index as an extra column. Note that toPandas() collects the entire dataframe into driver memory, so this only works for dataframes that fit on a single machine.
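When the dataframe is too large to collect into pandas, another common workaround is to keep coalesce(1) and then move the single part file out of Spark's output folder afterwards. A minimal sketch using only Python's standard library (the function name and paths here are illustrative, and the "part-*.csv" pattern assumes Spark's default output naming on a local filesystem):

```python
import glob
import os
import shutil

def extract_single_csv(spark_output_dir: str, target_csv: str) -> None:
    """Move the lone part-*.csv file out of a Spark output folder,
    rename it to target_csv, and delete the now-empty folder."""
    # Spark names its output files part-<split>-<uuid>.csv
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(part_files) != 1:
        raise RuntimeError(f"expected exactly one part file, found {part_files}")
    shutil.move(part_files[0], target_csv)
    shutil.rmtree(spark_output_dir)

# Hypothetical usage, after writing to a temporary folder name:
#   df.coalesce(1).write.option("header", "true").csv("name.csv.tmp")
#   extract_single_csv("name.csv.tmp", "name.csv")
```

This runs as a post-processing step on the driver (or wherever the output folder lives), so it avoids pulling the data itself through pandas.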