I wish to (1) group data by one variable (State
), (2) within each group find the row of minimum value of another variable (Employees
), and (3) extract the entire row.
(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.
Here is a sample data set:
> data
State Company Employees
1 AK A 82
2 AK B 104
3 AK C 37
4 AK D 24
5 RI E 19
6 RI F 118
7 RI G 88
8 RI H 42
data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",
"Employees"), class = "data.frame", row.names = c(NA, -8L))
Calculate min
by group is easy, using aggregate
:
> aggregate(Employees ~ State, data, function(x) min(x))
State Employees
1 AK 24
2 RI 19
...or data.table
:
> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
State Employees
1: AK 24
2: RI 19
But how do I extract the entire row corresponding to these min
values, i.e. also including Company
in the result?
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
Slighly less elegant than using .SD
, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
Also, just replace the expression which.min(Employees)
with Employees == min(Employees)
, if your data set has multiple identical min values and you'd like to subset all of them.
See also Subset rows corresponding to max value by group using data.table.