How to get the name of a DataFrame column in PySpark?

Kaushik Acharya · Sep 28, 2016

In pandas, this can be done with column.name.

But how do you do the same when it's a column of a Spark DataFrame?

e.g., the calling program has a Spark DataFrame, spark_df:

>>> spark_df.columns
['admit', 'gre', 'gpa', 'rank']

This program calls my function: my_function(spark_df['rank']). Inside my_function, I need the name of the column, i.e. 'rank'.

If it were a pandas DataFrame, inside my_function we could use

>>> pandas_df['rank'].name
'rank'

Answer

David · Sep 28, 2016

You can get the names from the schema by doing

spark_df.schema.names
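
For example, with a DataFrame matching the question (the construction here is a hypothetical sketch, just to make the snippet self-contained):

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> spark_df = spark.createDataFrame(
...     [(1, 760, 3.6, 1)], ['admit', 'gre', 'gpa', 'rank'])
>>> spark_df.schema.names
['admit', 'gre', 'gpa', 'rank']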

Printing the schema can also be useful for visualizing it:

spark_df.printSchema()
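
Note that the question asks for the name of a single Column object inside my_function, and PySpark's Column has no public name attribute. A commonly used workaround, sketched below, reads the wrapped JVM column through the internal _jc attribute; since this is not a public API, it may change between Spark versions:

>>> def my_function(col):
...     # col._jc is the underlying JVM Column; toString() returns
...     # the expression it represents (internal, version-dependent)
...     return col._jc.toString()
...
>>> my_function(spark_df['rank'])
'rank'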