Spark Dataset select with TypedColumn

Jeremy · Jul 28, 2016 · Viewed 11.1k times

Looking at the select() method on the Spark Dataset, there are several generated overloads with signatures like:

(c1: TypedColumn[MyClass, U1], c2: TypedColumn[MyClass, U2] ....)

This seems to hint that I should be able to reference the members of MyClass directly and be type safe, but I'm not sure how...

ds.select("member") works, of course. It seems like ds.select(_.member) might also work somehow?

Answer

Sim · Jul 28, 2016

In the Scala DSL for select, there are many ways to identify a Column:

  • From a symbol: 'name
  • From a string: $"name" or col("name")
  • From an expression: expr("nvl(name, 'unknown') as renamed")
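As a rough sketch of the three forms listed above, the snippet below builds the same column in each way and selects them together. The object name ColumnForms, the sample rows, and the local SparkSession settings are assumptions made up for illustration; the symbol and $"..." forms rely on import spark.implicits._ being in scope.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

// Hypothetical helper object, used only for this illustration.
object ColumnForms {
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides the Symbol and $"..." conversions

    // Made-up sample data; the second row has a null name.
    val df = Seq[(String, Int)](("alice", 30), (null, 25)).toDF("name", "age")

    val bySymbol: Column = Symbol("name") // same as 'name on older Scala versions
    val byInterpolator: Column = $"name"
    val byFunction: Column = col("name")
    val byExpression: Column = expr("nvl(name, 'unknown') as renamed")

    df.select(bySymbol, byInterpolator, byFunction, byExpression)
  }

  def main(args: Array[String]): Unit = {
    // Local session settings are an assumption for running the sketch standalone.
    val spark = SparkSession.builder().master("local[*]").appName("column-forms").getOrCreate()
    build(spark).show()
    spark.stop()
  }
}
```

All four values are plain, untyped Column instances at this point, so the result of select is a DataFrame rather than a typed Dataset.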

To get a TypedColumn from a Column, use myCol.as[T]. This needs an implicit Encoder[T] in scope, which import spark.implicits._ provides for common types.

For example: ds.select(col("name").as[String])
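A slightly fuller sketch of the typed select, assuming a hypothetical case class Person standing in for MyClass (the class, the sample rows, and the object name TypedSelect are not from the question):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical stand-in for MyClass; defined at top level so an Encoder can be derived.
final case class Person(name: String, age: Int)

// Hypothetical helper object, used only for this illustration.
object TypedSelect {
  def run(spark: SparkSession): (Seq[String], Seq[(String, Int)], Seq[String]) = {
    import spark.implicits._ // Encoders for String, Int, tuples and case classes

    val ds: Dataset[Person] = Seq(Person("alice", 30), Person("bob", 25)).toDS()

    // One TypedColumn gives back a Dataset of that column's type.
    val names: Dataset[String] = ds.select(col("name").as[String])

    // Two TypedColumns give back a Dataset of pairs.
    val pairs: Dataset[(String, Int)] = ds.select($"name".as[String], $"age".as[Int])

    // Lambda alternative: the field access is checked by the compiler.
    val viaMap: Dataset[String] = ds.map(_.name)

    (names.collect().toSeq, pairs.collect().toSeq, viaMap.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Local session settings are an assumption for running the sketch standalone.
    val spark = SparkSession.builder().master("local[*]").appName("typed-select").getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```

Note that .as[String] only fixes the result type at compile time. The column name is still a string, so a misspelled name compiles and fails with an AnalysisException when the query is analyzed. If the goal is member access in the style of _.member, ds.map(_.name) is the form the compiler can check, at the cost of deserializing each row into a Person.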