I have the following Python/pandas command:
df.groupby('Column_Name').agg(lambda x: x.value_counts().max())
where I get the value counts for ALL columns in a DataFrameGroupBy object.
How do I do this in PySpark?
It's more or less the same:
spark_df.groupBy('column_name').count().orderBy('count')
In the groupBy you can pass multiple columns, separated by commas.
For example, groupBy('column_1', 'column_2')
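Here's a minimal, runnable sketch putting that together. The DataFrame, column names, and sample values are just assumptions for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data, only for illustration
spark_df = spark.createDataFrame(
    [('a', 'x'), ('a', 'y'), ('a', 'x'), ('b', 'y')],
    ['column_1', 'column_2'],
)

# Row count per group, sorted by that count
spark_df.groupBy('column_1').count().orderBy('count').show()

# Grouping by multiple columns works the same way
spark_df.groupBy('column_1', 'column_2').count().orderBy('count').show()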