Summary stats by factor level for multiple variables

Question 1

r summary

Rory Shaw · Nov 23, 2015 · Viewed 8.4k times · Source

Answer

You could use summarise_each from dplyr (in current dplyr releases, summarise_each() and funs() are deprecated in favour of across()):

library(dplyr)

mydf %>%
  group_by(Factor) %>%
  summarise_each(funs(my.summary(.)))

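This works after modifying your function to return a list, so that each group's statistics fit in a single list-column cell:

my.summary <- function(x, na.rm = TRUE) {
  # Wrap the named vector in a list; na.rm is now passed through to every statistic
  list(c(n = as.integer(length(x)),
         Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
         Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm)))
}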
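
If one data frame per variable is preferred over list columns, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the helper name summarise_by_factor is hypothetical, and my.summary here returns a plain named vector and leaves out SeM because SEM() is not defined in the post.

```r
# Example data from the question
mydf <- data.frame(Factor    = c("A", "A", "A", "B", "B", "B"),
                   Variable1 = c(3, 4, 5, 4, 5, 3),
                   Variable2 = c(7, 9, 14, 16, 10, 10))

# Summary function returning a plain named vector (SeM omitted, see above)
my.summary <- function(x, na.rm = TRUE) {
  c(n = length(x),
    Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm))
}

# Hypothetical helper: builds one summary data frame for a single variable
summarise_by_factor <- function(df, var, by = "Factor") {
  # One named vector of statistics per factor level
  stats <- lapply(split(df[[var]], df[[by]]), my.summary)
  # Stack the vectors into rows and put the level labels in the first column
  out <- data.frame(names(stats), do.call(rbind, stats), row.names = NULL)
  names(out)[1] <- by
  out
}

# Apply the helper to every column except the grouping factor
vars <- setdiff(names(mydf), "Factor")
summaries <- lapply(setNames(vars, vars), summarise_by_factor, df = mydf)

summaries$Variable2
```

The result is a named list, so summaries$Variable1 and summaries$Variable2 are ordinary data frames with one row per factor level, and adding more variables to mydf needs no change to the code.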

Question 2

I want to produce data frames containing summary statistics for each factor level, for multiple variables.

For example, if I have the following data frame:

Factor <- c("A","A","A","B","B","B")
Variable1 <- c(3,4,5,4,5,3)
Variable2 <- c(7,9,14,16,10,10)
mydf <- data.frame(Factor, Variable1, Variable2)
mydf
  Factor Variable1 Variable2
1      A         3         7
2      A         4         9
3      A         5        14
4      B         4        16
5      B         5        10
6      B         3        10

and I have the following function that I want to use to produce my summary stats:

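my.summary <- function(x, na.rm = TRUE) {
  # SEM() is not a base R function; the question does not include its definition
  c(n = as.integer(length(x)),
    Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm), SeM = SEM(x),
    Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm))
}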

To apply this to each factor level of Variable1, I can use ddply from plyr:

library(plyr)
ddply(mydf, c("Factor"), function(x) my.summary(x$Variable1))
  Factor n Mean SD       SeM Median Min Max
1      A 3    4  1 0.5773503      4   3   5
2      B 3    4  1 0.5773503      4   3   5
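
SEM() is not defined in the question. The SeM value shown above (0.5773503 for n = 3 and SD = 1) is consistent with the usual standard error of the mean, sd(x) / sqrt(n), so a minimal sketch of such a helper might look like the following. This is an assumption, not the asker's actual code.

```r
# Assumed definition: standard error of the mean, sd / sqrt(n)
SEM <- function(x, na.rm = TRUE) {
  # Drop missing values first so that n counts only observed values
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Level A of Variable1 from the example data
SEM(c(3, 4, 5))
```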

Now I can do the same for Variable2:

ddply(mydf, c("Factor"), function(x) my.summary(x$Variable2))

This is easy enough with just two variables, but with many variables it would be a pain. How can I produce a data frame of the summary stats for each variable and factor level without having to adjust the code for every variable?

I have tried using aggregate.data.frame but it doesn't work using my.summary. It works using summary but produces one big data frame.
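
The post does not say how aggregate failed. One common stumbling block is that, when FUN returns a vector, aggregate stores each variable's statistics as a matrix column. The sketch below assumes that is the issue, leaves out SeM because SEM() is not defined here, and flattens the matrix columns with do.call(data.frame, ...):

```r
# Example data from the question
mydf <- data.frame(Factor    = c("A", "A", "A", "B", "B", "B"),
                   Variable1 = c(3, 4, 5, 4, 5, 3),
                   Variable2 = c(7, 9, 14, 16, 10, 10))

# Summary function returning a named vector (SeM omitted, see above)
my.summary <- function(x, na.rm = TRUE) {
  c(n = length(x),
    Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm))
}

# Each summarised variable comes back as a matrix column, one column per statistic
agg <- aggregate(. ~ Factor, data = mydf, FUN = my.summary)

# Expand the matrix columns into ordinary columns named Variable.Statistic
flat <- do.call(data.frame, agg)
names(flat)
```

This still gives one wide data frame, but its columns are regular vectors, so the block for a single variable can be selected by name prefix if separate tables are needed.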

Thanks
