Trim string column in PySpark dataframe

minh-hieu.pham · Feb 2, 2016

I'm a beginner with Python and Spark. After creating a DataFrame from a CSV file, I would like to know how to trim a column. I've tried:

df = df.withColumn("Product", df.Product.strip())

df is my DataFrame and Product is a column in my table.

But I always get the error:

Column object is not callable

Do you have any suggestions?
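Why the error appears: strip() is a method of Python strings, but df.Product is a pyspark.sql.Column, not a string. Column defines __getattr__ so that an unknown attribute such as strip is treated as a reference to a nested field and returns another Column; calling that Column is what raises "'Column' object is not callable". The class below is hypothetical (FakeColumn is not part of PySpark) and only imitates that lookup rule in plain Python, so the failure can be reproduced without Spark.

```python
class FakeColumn:
    """Hypothetical stand-in mimicking how pyspark.sql.Column turns
    unknown attribute names into nested-field references."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Dunder lookups still fail normally, as in PySpark's Column.
        if item.startswith("__"):
            raise AttributeError(item)
        # Any other attribute (e.g. "strip") becomes a new column
        # reference such as "Product.strip" instead of a method.
        return FakeColumn(f"{self.name}.{item}")

    def __repr__(self):
        return f"FakeColumn<{self.name}>"


product = FakeColumn("Product")
attr = product.strip  # no error yet: this is just another FakeColumn
print(attr)  # FakeColumn<Product.strip>

try:
    attr()  # calling the column is what fails
except TypeError as exc:
    message = str(exc)
print(message)  # 'FakeColumn' object is not callable
```

Because the attribute access itself succeeds, the mistake only surfaces when the result is called.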

Answer

Maniganda Prakash · Dec 27, 2017
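Python's str.strip() does not work on a Column; use Spark's built-in trim function instead:

from pyspark.sql.functions import col, trim

# trim() returns a new Column with leading and trailing spaces removed
df = df.withColumn("Product", trim(col("Product")))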