I'm trying to get multiple summary statistics in R/S-PLUS grouped by categorical column in one shot. I found couple of functions, but all of them do one statistic per call, like aggregate()
.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)
mg <- aggregate(df$dt, by=df$group, FUN=mean)
mg <- aggregate(df$dt, by=df$group, FUN=sum)
What I'm looking for is to get multiple statistics for the same group like mean, min, max, std, ...etc in one call, is that doable?
tapply
I'll put in my two cents for tapply()
.
tapply(df$dt, df$group, summary)
You could write a custom function with the specific statistics you want or format the results:
tapply(df$dt, df$group,
function(x) format(summary(x), scientific = TRUE))
$A
Min. 1st Qu. Median Mean 3rd Qu. Max.
"5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01"
$B
Min. 1st Qu. Median Mean 3rd Qu. Max.
"6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01"
$C
Min. 1st Qu. Median Mean 3rd Qu. Max.
"6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01"
$D
Min. 1st Qu. Median Mean 3rd Qu. Max.
"5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"
data.table
The data.table
package offers a lot of helpful and fast tools for these types of operation:
library(data.table)
setDT(df)
> df[, as.list(summary(dt)), by = group]
group Min. 1st Qu. Median Mean 3rd Qu. Max.
1: A 59 59.75 61.0 61 62.25 63
2: B 63 64.25 65.5 66 66.75 71
3: C 66 67.25 68.0 68 68.00 71
4: D 56 59.75 61.5 61 63.00 64