How to write dataset object to excel in spark java?

Question 1

How to write dataset object to excel in spark java?

apache-spark pyspark apache-spark-sql spark-excel

BHANUMATHI H M · Jun 24, 2017 · Viewed 8.3k times · Source

Answer

Answer

Looks like the library you chose, com.crealytics.spark.excel, does not have any code related to writing excel files. Underneath it uses Apache POI for reading Excel files, there are also few examples.

The good news are that CSV is a valid Excel file, and you may use spark-csv to write it. You need to change your code like this:

sourcePropertySet.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("D:\\resultset.csv");

Keep in mind that Spark makes 1 output file per partition, and you might want to do .repartition(1) to have exactly one result file.

Question 2

I Am reading excel file using com.crealytics.spark.excel package. Below is the code to read an excel file in spark java.

    Dataset<Row> SourcePropertSet = sqlContext.read()
               .format("com.crealytics.spark.excel")
               .option("location", "D:\\5Kto10K.xlsx")
               .option("useHeader", "true")
               .option("treatEmptyValuesAsNulls", "true")
               .option("inferSchema", "true")
               .option("addColorColumns", "false")
               .load("com.databricks.spark.csv");

But I tried with the same (com.crealytics.spark.excel) package to write dataset object to an excel file in spark java.

    SourcePropertSet.write()
          .format("com.crealytics.spark.excel")
          .option("useHeader", "true")
          .option("treatEmptyValuesAsNulls", "true")
          .option("inferSchema", "true")
          .option("addColorColumns", "false").save("D:\\resultset.xlsx");

But i am getting below error.

java.lang.RuntimeException: com.crealytics.spark.excel.DefaultSource does not allow create table as select.

And even I tried with org.zuinnote.spark.office.excel package also. below is the code for that.

    SourcePropertSet.write()
             .format("org.zuinnote.spark.office.excel")
             .option("write.locale.bcp47", "de") 
             .save("D:\\result");

i have added following dependencies in my pom.xml

<dependency>
              <groupId>com.github.zuinnote</groupId>
              <artifactId>hadoopoffice-fileformat</artifactId>
              <version>1.0.0</version>
          </dependency>
        <dependency>
            <groupId>com.github.zuinnote</groupId>
            <artifactId>spark-hadoopoffice-ds_2.11</artifactId>
            <version>1.0.3</version>
        </dependency>

But I am getting below error.

java.lang.IllegalAccessError: tried to access method org.zuinnote.hadoop.office.format.mapreduce.ExcelFileOutputFormat.getSuffix(Ljava/lang/String;)Ljava/lang/String; from class org.zuinnote.spark.office.excel.ExcelOutputWriterFactory

Please help me to write dataset object to an excel file in spark java.

How to write dataset object to excel in spark java?

Answer

Related questions