How do we concatenate two columns in an Apache Spark DataFrame? Is there any function in Spark SQL which we can use?
With raw SQL you can use CONCAT:
In Python:
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2)], ("k", "v"))
# Register the DataFrame so the SQL query can refer to it as "df"
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ', v) FROM df")
In Scala:
import sqlContext.implicits._
val df = sc.parallelize(Seq(("foo", 1), ("bar", 2))).toDF("k", "v")
// Register the DataFrame so the SQL query can refer to it as "df"
df.registerTempTable("df")
sqlContext.sql("SELECT CONCAT(k, ' ', v) FROM df")
Since Spark 1.5.0 you can use the concat function with the DataFrame API:
In Python:
from pyspark.sql.functions import concat, col, lit
df.select(concat(col("k"), lit(" "), col("v")))
In Scala:
import org.apache.spark.sql.functions.{concat, lit}
df.select(concat($"k", lit(" "), $"v"))
There is also the concat_ws function, which takes a string separator as its first argument.