rank() function usage in Spark SQL

Binu · Mar 6, 2017

I need some pointers on using rank().

I have extracted a column from a dataset and need to rank its values.

Dataset<Row> inputCol= inputDataset.apply("Colname");    
Dataset<Row>  DSColAwithIndex=inputDSAAcolonly.withColumn("df1Rank", rank());

DSColAwithIndex.show();

I can sort the column and then append an index column to get a rank, but I am curious to know the syntax and usage of rank().

Answer

mrsrinivas · Mar 6, 2017

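A window spec needs to be specified for rank(). It is a window function, so it has to be applied with .over(...) on a window that defines the ordering: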

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank

val w = Window.orderBy("date") // some spec; order by the column you want to rank on

val leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w))

Edit: Java version of the answer, as the OP is using Java

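import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.rank;

WindowSpec w = Window.orderBy(colName); // colName: name of the column to rank on
Dataset<Row> leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w));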
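
On the "sort and append an index" idea from the question: that gives every row a distinct number, whereas rank() gives tied values the same rank and then leaves a gap. Here is a minimal plain-Java sketch of that behaviour for an ascending order. It does not use Spark and is not how Spark implements it; the class name RankSketch and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RankSketch {
    // Returns the rank of each input value under ascending order.
    // Ties share a rank and the next distinct value skips ahead (1, 2, 2, 4),
    // which is where rank() differs from a plain running index.
    static int[] rank(List<Integer> values) {
        // Sort positions by value so the ranks can be mapped back to input order.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            order.add(i);
        }
        order.sort((a, b) -> Integer.compare(values.get(a), values.get(b)));

        int[] ranks = new int[values.size()];
        for (int pos = 0; pos < order.size(); pos++) {
            int idx = order.get(pos);
            if (pos > 0 && values.get(idx).equals(values.get(order.get(pos - 1)))) {
                // Same value as the previous sorted row: reuse its rank.
                ranks[idx] = ranks[order.get(pos - 1)];
            } else {
                // New value: rank is the 1-based position in sorted order.
                ranks[idx] = pos + 1;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] ranks = rank(Arrays.asList(30, 10, 20, 20));
        System.out.println(Arrays.toString(ranks)); // prints [4, 1, 2, 2]
    }
}
```

The two 20s both get rank 2 and the 30 gets rank 4, with no rank 3. If you want consecutive numbers instead, Spark also has dense_rank() (no gaps) and row_number() (unique per row), used with .over(w) in the same way.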