How to lower the case of column names of a data frame but not its values?

apache-spark apache-spark-sql apache-spark-dataset

user1870400 · Feb 8, 2018 · Viewed 13.1k times

How can I lowercase the column names of a DataFrame, but not its values, using raw Spark SQL and DataFrame methods?

Input DataFrame (imagine I have hundreds of these columns in uppercase):

NAME | COUNTRY | SRC        | CITY       | DEBIT
---------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Target DataFrame:

name | country | src        | city       | debit
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Answer

If you are using Scala, you can alias every column to its lower-case name in a single select:

import org.apache.spark.sql.functions._
df.select(df.columns.map(x => col(x).as(x.toLowerCase)): _*).show(false)

And if you are using PySpark, the equivalent is:

from pyspark.sql import functions as F
df.select([F.col(x).alias(x.lower()) for x in df.columns]).show()
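The question also asks about raw Spark SQL, which the snippets above do not cover. One possible approach is to generate the SELECT statement from the column list, so that hundreds of columns do not have to be typed by hand. The sketch below is plain Python that only builds the query string; the helper name `lowercase_select_sql` and the view name `people` are made up for illustration and are not part of any Spark API.

```python
def lowercase_select_sql(columns, view_name):
    """Build a SELECT that aliases each column to its lower-case name."""
    def quote(name):
        # Backticks quote identifiers in Spark SQL; a literal backtick is doubled.
        return "`" + name.replace("`", "``") + "`"

    select_list = ", ".join(
        f"{quote(c)} AS {quote(c.lower())}" for c in columns
    )
    return f"SELECT {select_list} FROM {quote(view_name)}"


# Column names taken from the example in the question; "people" is a hypothetical view name.
query = lowercase_select_sql(["NAME", "COUNTRY", "SRC", "CITY", "DEBIT"], "people")
print(query)
```

With a live session, you would presumably register the DataFrame first with `df.createOrReplaceTempView("people")`, build the query from `df.columns`, and then run it with `spark.sql(query)`. Only the column names change; the row values are passed through untouched.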
