Spark DataFrame: How to specify schema when writing as Avro

erwaman · Feb 21, 2018

I want to write a DataFrame in Avro format using a provided Avro schema rather than Spark's auto-generated schema. How can I tell Spark to use my custom schema on write?

Answer

erwaman · Feb 21, 2018

After applying the patch in https://github.com/databricks/spark-avro/pull/222/, I was able to specify a schema on write as follows:

df.write.option("forceSchema", myCustomSchemaString).avro("/path/to/outputDir")
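For context, here is a minimal sketch of how that one-liner might fit into a complete job. It assumes a spark-avro build that includes the patch from the pull request above, since the "forceSchema" option comes from that patch. The sample DataFrame, the record name Person, the namespace com.example, and the field list are all hypothetical; the schema is assumed to line up with the DataFrame's column names and types, and how the patched writer treats nullable columns is not something I have verified.

```scala
import org.apache.spark.sql.SparkSession
// Brings the .avro(...) method into scope on DataFrameWriter.
import com.databricks.spark.avro._

object WriteAvroWithCustomSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-custom-schema")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; replace with your own DataFrame.
    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // Hypothetical Avro schema as a JSON string. Field names and types
    // are assumed to match the DataFrame columns above.
    val myCustomSchemaString =
      """{
        |  "type": "record",
        |  "name": "Person",
        |  "namespace": "com.example",
        |  "fields": [
        |    {"name": "name", "type": "string"},
        |    {"name": "age", "type": "int"}
        |  ]
        |}""".stripMargin

    // "forceSchema" is only recognized by a spark-avro build that
    // includes the patch referenced in the answer.
    df.write
      .option("forceSchema", myCustomSchemaString)
      .avro("/path/to/outputDir")

    spark.stop()
  }
}
```

If the schema lives in an .avsc file, reading that file into a string and passing it the same way should work equally well, since the option value is just the schema's JSON text.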