How to delete columns in pyspark dataframe

apache-spark apache-spark-sql pyspark

xjx0524 · Apr 13, 2015 · Viewed 208.3k times

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]

The joined DataFrame has two id: bigint columns, and I want to delete one of them. How can I do that?

Answer

Reading the Spark documentation, I found an easier solution.

Since Spark 1.4, DataFrame has a drop(col) method that can be used in PySpark.

You can use it in two ways, passing either the column name or the column itself:

df.drop('age').collect()
df.drop(df.age).collect()

Pyspark Documentation - Drop
