All:
I have a data frame like the follow.I know I can do a global rank order like this:
dt <- data.frame(
ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
Value = c(4,3,1,3,4,6,6,1,8,4)
);
> dt
ID Value
1 A1 4
2 A2 3
3 A4 1
4 A2 3
5 A1 4
6 A4 6
7 A3 6
8 A2 1
9 A1 8
10 A3 4
dt$Order <- rank(dt$Value,ties.method= "first")
> dt
ID Value Order
1 A1 4 5
2 A2 3 3
3 A4 1 1
4 A2 3 4
5 A1 4 6
6 A4 6 8
7 A3 6 9
8 A2 1 2
9 A1 8 10
10 A3 4 7
But how can I set a rank order for a particular ID instead of a global rank order. How can I get this done? In T-SQL, we can get this done as the following syntax:
RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )
Any idea?
Many options.
Using ddply
from the plyr package:
library(plyr)
ddply(dt,.(ID),transform,Order = rank(Value,ties.method = "first"))
ID Value Order
1 A1 4 1
2 A1 4 2
3 A1 8 3
4 A2 3 2
5 A2 3 3
6 A2 1 1
7 A3 6 2
8 A3 4 1
9 A4 1 1
10 A4 6 2
Or if performance is an issue (i.e. very large data) using the data.table package:
library(data.table)
DT <- data.table(dt,key = "ID")
DT[,transform(.SD,Order = rank(Value,ties.method = "first")),by = ID]
ID Value Order
[1,] A1 4 1
[2,] A1 4 2
[3,] A1 8 3
[4,] A2 3 2
[5,] A2 3 3
[6,] A2 1 1
[7,] A4 1 1
[8,] A4 6 2
[9,] A3 6 2
[10,] A3 4 1
or in all its gory detail a base R solution using split
lapply
do.call
and rbind
:
do.call(rbind,lapply(split(dt,dt$ID),transform,
Order = rank(Value,ties.method = "first")))