How to change case of whole column to lowercase?

Shreeharsha · Apr 19, 2017 · Viewed 49.9k times

I want to change the case of a whole column to lowercase in a Spark Dataset.

        Input
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|BRUSH & BROOM HAN...|
        |   XYZ|WHEEL BRUSH PARTS...|
        +------+--------------------+

        Desired Output
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+

I tried collectAsList() and toString(), but that is slow and complicated for a very large dataset.

I also found a method called 'lower', but I could not work out how to use it on a Dataset. Please suggest a simple or efficient way to do this. Thanks in advance.

Answer

Shreeharsha · Apr 19, 2017

I got it: use functions#lower (see the Javadoc).

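import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

        String columnName = "Category name";
        // Reusing the existing column name overwrites that column in place.
        src = src.withColumn(columnName, lower(col(columnName)));
        src.show();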

This replaces the old column with the lowercased one and leaves the rest of the Dataset unchanged.

        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+
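For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same withColumn/lower approach. The class name LowercaseColumnExample, the helper lowercaseColumn, the local[*] master and the two sample rows are all made up for illustration (they are not the data from the question), and it assumes the Spark SQL dependency is on the classpath.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class LowercaseColumnExample {

    // Hypothetical helper: returns a new Dataset in which the named column
    // is replaced by its lowercased values. Other columns are untouched.
    static Dataset<Row> lowercaseColumn(Dataset<Row> ds, String columnName) {
        return ds.withColumn(columnName, lower(col(columnName)));
    }

    public static void main(String[] args) {
        // Local session for experimentation only (assumption, not from the question).
        SparkSession spark = SparkSession.builder()
                .appName("LowercaseColumnExample")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("ItemID", DataTypes.StringType)
                .add("Category name", DataTypes.StringType);

        // Made-up sample rows standing in for the real data.
        List<Row> rows = Arrays.asList(
                RowFactory.create("ABC", "GARDEN TOOLS"),
                RowFactory.create("XYZ", "PAINT SUPPLIES"));

        Dataset<Row> src = spark.createDataFrame(rows, schema);

        // The transformation runs on the executors; nothing is collected to the driver.
        Dataset<Row> result = lowercaseColumn(src, "Category name");
        result.show(false);

        spark.stop();
    }
}
```

Unlike the collectAsList() attempt, this keeps the work inside Spark as a column expression, so it should scale with the Dataset rather than with driver memory.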