My question is similar to this thread: Partitioning by multiple columns in Spark SQL
but I'm working in PySpark rather than Scala, and I want to pass my columns in as a list. I want to do something like this:
column_list = ["col1","col2"]
win_spec = Window.partitionBy(column_list)
I can get the following to work:
win_spec = Window.partitionBy(col("col1"))
This also works:
col_name = "col1"
win_spec = Window.partitionBy(col(col_name))
And this also works:
win_spec = Window.partitionBy([col("col1"), col("col2")])
Convert the column names to column expressions with a list comprehension, [col(x) for x in column_list]:
from pyspark.sql import Window
from pyspark.sql.functions import col

column_list = ["col1","col2"]
win_spec = Window.partitionBy([col(x) for x in column_list])
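For completeness, here is a minimal runnable sketch of the same idea applied end to end; the example DataFrame, the val ordering column, and the row_number call are illustrative assumptions, not part of the original question:

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data: two partition columns plus a value to order by.
df = spark.createDataFrame(
    [("a", "x", 1), ("a", "x", 2), ("b", "y", 3)],
    ["col1", "col2", "val"],
)

column_list = ["col1", "col2"]
win_spec = Window.partitionBy([col(x) for x in column_list]).orderBy("val")

# Number the rows within each (col1, col2) partition.
df.withColumn("rn", row_number().over(win_spec)).show()

Because the comprehension just maps names to Column objects, the same pattern works anywhere a list of columns is accepted, such as orderBy.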