I want to produce data frames containing summary statistics for each factor level, for multiple variables.
For example, if I have the following data frame:
Factor <- c("A","A","A","B","B","B")
Variable1 <- c(3,4,5,4,5,3)
Variable2 <- c(7,9,14,16,10,10)
mydf <- data.frame(Factor, Variable1, Variable2)
mydf
  Factor Variable1 Variable2
1      A         3         7
2      A         4         9
3      A         5        14
4      B         4        16
5      B         5        10
6      B         3        10
and I have the following function that I want to use to produce my summary stats:
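my.summary <- function(x, na.rm = TRUE) {
  n <- length(x)
  c(n = n, Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
    SeM = sd(x, na.rm = na.rm) / sqrt(n),  # standard error of the mean; SEM() is not in base R
    Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm))
}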
To apply this to each level of Factor for Variable1, I can use ddply from plyr:
library(plyr)
ddply(mydf, c("Factor"), function(x) my.summary(x$Variable1))
  Factor n Mean SD       SeM Median Min Max
1      A 3    4  1 0.5773503      4   3   5
2      B 3    4  1 0.5773503      4   3   5
Now I can do the same for Variable2:
ddply(mydf, c("Factor"), function(x) my.summary(x$Variable2))
That is easy enough with only two variables, but with many variables it becomes a pain. How can I produce a data frame of summary statistics for each variable/factor-level combination without having to adjust the code for every variable?
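A minimal base-R sketch, assuming every column other than Factor is a numeric variable to summarise: loop over the variable names with lapply() and build one data frame per variable, so adding variables needs no code changes. summ_stats is a hypothetical stand-in for my.summary, with the SEM computed inline as sd / sqrt(n).

```r
mydf <- data.frame(Factor = c("A", "A", "A", "B", "B", "B"),
                   Variable1 = c(3, 4, 5, 4, 5, 3),
                   Variable2 = c(7, 9, 14, 16, 10, 10))

# Hypothetical stand-in for my.summary; SEM computed inline as sd / sqrt(n)
summ_stats <- function(x) {
  n <- length(x)
  c(n = n, Mean = mean(x), SD = sd(x), SeM = sd(x) / sqrt(n),
    Median = median(x), Min = min(x), Max = max(x))
}

# Every column except the grouping column is treated as a variable
vars <- setdiff(names(mydf), "Factor")

# One data frame per variable: one row per factor level, one column per statistic
by_var <- lapply(setNames(vars, vars), function(v) {
  stats <- t(sapply(split(mydf[[v]], mydf$Factor), summ_stats))
  data.frame(Factor = rownames(stats), stats, row.names = NULL)
})

by_var$Variable2
```

The result is a named list, so by_var$Variable1, by_var$Variable2 and so on each hold the table that the per-variable ddply() call would have produced.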
I have tried using aggregate.data.frame, but it doesn't work with my.summary. It does work with summary, but it produces one big data frame.
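For comparison, a base-R sketch of the aggregate() route. With the formula interface, aggregate() accepts a function that returns a named vector and stores each variable's results as a matrix column, which may be the "one big data frame" effect described above. Splitting those matrix columns back out gives one data frame per variable. summ_stats is again a hypothetical stand-in for my.summary.

```r
mydf <- data.frame(Factor = c("A", "A", "A", "B", "B", "B"),
                   Variable1 = c(3, 4, 5, 4, 5, 3),
                   Variable2 = c(7, 9, 14, 16, 10, 10))

# Hypothetical stand-in for my.summary; SEM computed inline as sd / sqrt(n)
summ_stats <- function(x) {
  n <- length(x)
  c(n = n, Mean = mean(x), SD = sd(x), SeM = sd(x) / sqrt(n),
    Median = median(x), Min = min(x), Max = max(x))
}

# ". ~ Factor" summarises every other column by Factor; each result column
# is a matrix with one row per level and one column per statistic
agg <- aggregate(. ~ Factor, data = mydf, FUN = summ_stats)

# Pull each matrix column out into its own data frame
vars <- setdiff(names(agg), "Factor")
per_var <- lapply(setNames(vars, vars), function(v) {
  data.frame(Factor = agg$Factor, agg[[v]])
})

per_var$Variable1
```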
Thanks
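You could use summarise_each from dplyr (note that summarise_each() and funs() are deprecated in current dplyr releases, where across() is the replacement):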
library(dplyr)
mydf %>% group_by(Factor) %>%
summarise_each(funs(my.summary(.)))
For this to work, first modify your function to return a list:
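my.summary <- function(x, na.rm = TRUE) {
  # wrap the named vector in list() so each group/variable cell holds all the statistics
  list(c(n = length(x), Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
         Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm)))
}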
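A sketch of the same idea for dplyr 1.0.0 or later, using across() with a named list of functions instead of summarise_each()/funs(). It gives one row per factor level and one column per variable/statistic pair, named like Variable1_Mean by across()'s default naming. The anonymous SEM function is an assumption standing in for SEM(), and summary_fns is a hypothetical name.

```r
library(dplyr)

mydf <- data.frame(Factor = c("A", "A", "A", "B", "B", "B"),
                   Variable1 = c(3, 4, 5, 4, 5, 3),
                   Variable2 = c(7, 9, 14, 16, 10, 10))

# Named list of summary functions; the names become column-name suffixes
summary_fns <- list(n = length, Mean = mean, SD = sd,
                    SeM = function(x) sd(x) / sqrt(length(x)),  # assumed SEM definition
                    Median = median, Min = min, Max = max)

# Apply every function to every numeric column within each Factor level
res <- mydf %>%
  group_by(Factor) %>%
  summarise(across(where(is.numeric), summary_fns))

res
```

The wide layout keeps all variables in one table; select(res, Factor, starts_with("Variable1_")) recovers a single variable's statistics when a per-variable table is needed.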