Upacking a list to select multiple columns from a spark data frame

Ben picture Ben · Jan 22, 2016 · Viewed 67.3k times · Source

I have a spark data frame df. Is there a way of sub selecting a few columns using a list of these columns?

scala> df.columns
res0: Array[String] = Array("a", "b", "c", "d")

I know I can do something like df.select("b", "c"). But suppose I have a list containing a few column names val cols = List("b", "c"), is there a way to pass this to df.select? df.select(cols) throws an error. Something like df.select(*cols) as in python

Answer

Shagun Sodhani picture Shagun Sodhani · Jan 22, 2016

Use df.select(cols.head, cols.tail: _*)

Let me know if it works :)

Explanation from @Ben:

The key is the method signature of select:

select(col: String, cols: String*)

The cols:String* entry takes a variable number of arguments. :_* unpacks arguments so that they can be handled by this argument. Very similar to unpacking in python with *args. See here and here for other examples.