use length function in substring in spark

satish · Sep 21, 2017 · Viewed 36.6k times

I am trying to use the length function inside a substring function on a DataFrame column, but it gives an error:

val substrDF = testDF.withColumn("newcol", substring($"col", 1, length($"col")-1))

below is the error

 error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Int

I am using Spark 2.1.

Answer

pasha701 · Sep 22, 2017

The "expr" function can be used, since the SQL expression is parsed at runtime and has no trouble mixing string and integer column expressions:

import org.apache.spark.sql.functions.expr
import spark.implicits._  // spark is the SparkSession, as in spark-shell

val data = List("first", "second", "third")
val df = data.toDF("value")
val result = df.withColumn("cutted", expr("substring(value, 1, length(value)-1)"))
result.show(false)

output:

+------+------+
|value |cutted|
+------+------+
|first |firs  |
|second|secon |
|third |thir  |
+------+------+
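If you prefer to stay in the Column API rather than a SQL expression string, the Column.substr method (unlike the substring function) is overloaded to accept Column arguments for both position and length. A minimal sketch, assuming a spark-shell session where spark is the SparkSession:

import org.apache.spark.sql.functions.{length, lit}
import spark.implicits._

val df = List("first", "second", "third").toDF("value")

// Column.substr(startPos: Column, len: Column) takes Columns,
// so length($"value") - 1 is accepted here, avoiding the Int mismatch
val result2 = df.withColumn("cutted", $"value".substr(lit(1), length($"value") - 1))
result2.show(false)

This produces the same output as the expr version above.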