How to save CSV with all fields quoted?

Arvind Kandaswamy · Apr 26, 2017

The code below does not wrap the fields in double quotes, even though the double quote is the default quote character. I also tried setting the quote option to # and to a single quote, with no success. Setting quoteMode to ALL or NON_NUMERIC did not change the output either.

s2d.coalesce(64).write
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .save(fname)

Are there any other options I can try? I am using spark-csv 2.11 on Spark 2.1.

Output it produces:

d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697

Output I am looking for:

"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697

Answer

Jacek Laskowski · Apr 27, 2017

tl;dr Enable the quoteAll option.

scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")

The above gives the following output:

$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"

That assumes the quote character is " (the default; see CSVOptions).

That, however, won't give you "Double quotes around all non-numeric characters", because quoteAll quotes numeric fields too (note the "5" above). Sorry.
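If quoting only the non-numeric fields really is required, one possible workaround is to build each output line by hand and write it as plain text. This is a minimal sketch and not something the CSV writer offers: quoteNonNumeric and its sep parameter are hypothetical names introduced here for illustration.

```scala
// Hypothetical helper (not part of Spark): build one CSV line,
// quoting every value except numbers.
def quoteNonNumeric(fields: Seq[Any], sep: String = ","): String =
  fields.map {
    case null      => ""                 // assumption: nulls become empty fields
    case n: Number => n.toString         // numeric values stay unquoted
    case other     =>
      // Escape embedded quotes by doubling them, then wrap in quotes.
      "\"" + other.toString.replace("\"", "\"\"") + "\""
  }.mkString(sep)

// Sample row taken from the question.
println(quoteNonNumeric(
  Seq("d4c354ef", "2017-03-14 16:31:33", "2017-03-14 16:31:46", 104617772177L, 340618697)))
```

In spark-shell, something like s2d.map(row => quoteNonNumeric(row.toSeq)).write.text(fname) could apply the helper to every row. Values are rendered with toString, so timestamp or date columns may need explicit formatting first.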

You can see all the options in CSVOptions, which serves as the source of the options for the CSV reader and writer.

p.s. com.databricks.spark.csv is currently a mere alias for the csv format. You can use both interchangeably, but the shorter csv is preferred.

p.s. Use option("header", false) (false as a Boolean, not a String); that will make your code slightly more type-safe.
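Putting these suggestions together, the writer from the question might look like the sketch below. It assumes Spark 2.1 or later on the classpath. The local session, the one-row stand-in for s2d, and the output directory name are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("quote-all-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for the question's s2d DataFrame (sample row from the question).
val s2d = Seq(
  ("d4c354ef", "2017-03-14 16:31:33", "2017-03-14 16:31:46", 104617772177L, 340618697L)
).toDF()

// Hypothetical output directory; Spark writes part files inside it.
val fname = "quoted-output"

s2d.coalesce(1).write
  .mode("overwrite")
  .option("header", false)   // Boolean rather than the String "false"
  .option("quoteAll", true)  // quote every field, numeric ones included
  .csv(fname)                // equivalent to .format("csv").save(fname)
```

As with the hello5.csv example, every field in the part files should come out quoted, including the two numeric columns.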