Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?

Question 1

Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?

scala apache-spark apache-spark-dataset apache-spark-encoders

clay · Jul 29, 2016 · Viewed 65.6k times · Source

Answer

Answer

Spark Datasets require Encoders for data type which is about to be stored. For common types (atomics, product types) there is a number of predefined encoders available but you have to import these first from SparkSession.implicits to make it work:

val sparkSession: SparkSession = ???
import sparkSession.implicits._
val dataset = sparkSession.createDataset(dataList)

Alternatively you can provide directly an explicit

import org.apache.spark.sql.{Encoder, Encoders}

val dataset = sparkSession.createDataset(dataList)(Encoders.product[SimpleTuple])

or implicit

implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
val dataset = sparkSession.createDataset(dataList)

Encoder for the stored type.

Note that Encoders also provide a number of predefined Encoders for atomic types, and Encoders for complex ones, can derived with ExpressionEncoder.

Further reading:

For custom objects which are not covered by built-in encoders see How to store custom objects in Dataset?
For Row objects you have to provide Encoder explicitly as shown in Encoder error while trying to map dataframe row to updated row
For debug cases, case class must be defined outside of the Main https://stackoverflow.com/a/34715827/3535853

Question 2

Spark 2.0 (final) with Scala 2.11.8. The following super simple code yields the compilation error Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder.
      master("local")
      .appName("example")
      .getOrCreate()

    val dataset = sparkSession.createDataset(dataList)
  }
}

Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?

Answer

Related questions