I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions.
Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and count of miles per gallon ("mpg") by the number of cylinders ("cyl"), I use the following code:
library(dplyr)
library(tidyr)
# load data
data("mtcars")
# Percentiles used in calculation
p <- c(.25,.5,.75)
# old dplyr solution
mtcars %>% group_by(cyl) %>%
do(data.frame(p=p, stats=quantile(.$mpg, probs=p),
n = length(.$mpg), avg = mean(.$mpg))) %>%
spread(p, stats) %>%
select(1, 4:6, 3, 2)
# note: the select and spread statements are just to get the data into
# the format in which I'd like to see it, but are not critical
Is there a way I can do this more cleanly with dplyr using some of the summary functions (n_tiles, percent_rank, etc.)? By cleanly, I mean without the "do" statement.
Thank you
In dplyr 1.0
, summarise
can return multiple values, allowing the following:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarise(quantile = scales::percent(c(0.25, 0.5, 0.75)),
mpg = quantile(mpg, c(0.25, 0.5, 0.75)))
Or, you can avoid a separate line to name the quantiles by going with enframe
:
mtcars %>%
group_by(cyl) %>%
summarise(enframe(quantile(mpg, c(0.25, 0.5, 0.75)), "quantile", "mpg"))
cyl quantile mpg <dbl> <chr> <dbl> 1 4 25% 22.8 2 4 50% 26 3 4 75% 30.4 4 6 25% 18.6 5 6 50% 19.7 6 6 75% 21 7 8 25% 14.4 8 8 50% 15.2 9 8 75% 16.2
Answer for previous versions of dplyr
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarise(x=list(enframe(quantile(mpg, probs=c(0.25,0.5,0.75)), "quantiles", "mpg"))) %>%
unnest(x)
cyl quantiles mpg 1 4 25% 22.80 2 4 50% 26.00 3 4 75% 30.40 4 6 25% 18.65 5 6 50% 19.70 6 6 75% 21.00 7 8 25% 14.40 8 8 50% 15.20 9 8 75% 16.25
This can be turned into a more general function using tidyeval:
q_by_group = function(data, value.col, ..., probs=seq(0,1,0.25)) {
groups=enquos(...)
data %>%
group_by(!!!groups) %>%
summarise(x = list(enframe(quantile({{value.col}}, probs=probs), "quantiles", "mpg"))) %>%
unnest(x)
}
q_by_group(mtcars, mpg)
q_by_group(mtcars, mpg, cyl)
q_by_group(mtcars, mpg, cyl, vs, probs=c(0.5,0.75))
q_by_group(iris, Petal.Width, Species)