I need to rank rows by partition (or group), i.e. if my source table is:
NAME PRICE
---- -----
AAA 1.59
AAA 2.00
AAA 0.75
BBB 3.48
BBB 2.19
BBB 0.99
BBB 2.50
I would like to get target table:
RANK NAME PRICE
---- ---- -----
1 AAA 0.75
2 AAA 1.59
3 AAA 2.00
1 BBB 0.99
2 BBB 2.19
3 BBB 2.50
4 BBB 3.48
Normally I would use ROW_NUMBER() OVER
function, so in Apache Hive it would be:
select
row_number() over (partition by NAME order by PRICE) as RANK,
NAME,
PRICE
from
MY_TABLE
;
Unfortunately Cloudera Impala does not support (at the moment) ROW_NUMBER() OVER
function, so I'm looking for a workaround. Preferably not to use UDAF, as it will be politically difficult to convince to deploy it to the server.
Thank you for your help.
If you can't do it with a correlated subquery, you can still do this with a join:
select t1.name, t1.price,
coalesce(count(t2.name) + 1, 1)
from my_table t1 join
my_table t2
on t2.name = t1.name and
t2.price < t1.price
order by t1.name, t1.price;
Note that this doesn't exactly do row_number()
unless all the prices are distinct for a given name
. This formulation is actually equivalent to rank()
.
For row_number()
, you need a unique row identifier.
By the way, the following is equivalent to dense_rank()
:
select t1.name, t1.price,
coalesce(count(distinct t2.name) + 1, 1)
from my_table t1 join
my_table t2
on t2.name = t1.name and
t2.price < t1.price
order by t1.name, t1.price;