Use of lit() in expr()

Kyunam picture Kyunam · Nov 8, 2018 · Viewed 7.5k times · Source

The line:

df.withColumn("test", expr("concat(lon, lat)")) 

works as expected but

df.withColumn("test", expr("concat(lon, lit(','), lat)"))

produces the following exception:

org.apache.spark.sql.AnalysisException: Undefined function: 'lit'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 12 at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1198)

Why? And what would be the workaround?

Answer

Shaido picture Shaido · Nov 8, 2018

The string argument to expr will be parsed as a SQL expression and used to construct a column. Since lit is not a valid SQL command this will give you an error. (lit is used in Spark to convert a literal value into a new column.)

To solve this, simply remove the lit part:

df.withColumn("test", expr("concat(lon, ',', lat)")) 

Or use the in-built Spark concat function directly without expr:

df.withColumn("test", concat($"lon", lit(","), $"lat"))

Since concat takes columns as arguments lit must be used here.