How to select the row with the maximum value in each group

Xinting WANG picture Xinting WANG · Jul 3, 2014 · Viewed 98.9k times · Source

In a dataset with multiple observations for each subject I want to take a subset with only the maximum data value for each record. For example, with a following dataset:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)

Subject 1, 2, and 3 have the biggest pt value of 5, 17, and 5 respectively.

How could I first find the biggest pt value for each subject, and then, put this observation in another data frame? The resulting data frame should only have the biggest pt values for each subject.

Answer

Arun picture Arun · Jul 3, 2014

Here's a data.table solution:

require(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.