Changing whisker definition in geom_boxplot

cswingle · Jan 22, 2011

I'm trying to use ggplot2 / geom_boxplot to produce a boxplot where the whiskers are defined as the 5th and 95th percentiles instead of 0.25 - 1.5 IQR / 0.75 + 1.5 IQR, and points beyond those new whiskers are plotted as outliers as usual. I can see that the geom_boxplot aesthetics include ymax / ymin, but it's not clear to me how to put values in there. It seems like:

stat_quantile(quantiles = c(0.05, 0.25, 0.5, 0.75, 0.95))

should be able to help, but I don't know how to map the results of this stat onto the appropriate geom_boxplot() aesthetics:

geom_boxplot(aes(ymin, lower, middle, upper, ymax))

I've seen other posts where people mention essentially building a boxplot-like object manually, but I'd rather keep the whole boxplot gestalt intact, just revising the meaning of two of the variables being drawn.

Answer

kohske · Jan 22, 2011

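stat_summary with geom = "boxplot" can do it. Write a summary function that returns the five values the boxplot geom expects, named after its aesthetics: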

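library(ggplot2)

# define the summary function: the whisker ends (ymin, ymax) are the
# 5th and 95th percentiles; the box and median are the usual quartiles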
f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# sample data
d <- data.frame(x=gl(2,50), y=rnorm(100))

# do it
ggplot(d, aes(x, y)) + stat_summary(fun.data = f, geom="boxplot")
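To see what stat_summary hands to the boxplot geom, it can help to call the summary function directly on a vector. A minimal sketch in base R, repeating f so the snippet runs on its own; the input 1:101 is just an illustrative choice that makes the percentiles land on whole numbers with R's default quantile type:

```r
# same summary function as above, repeated so this snippet is self-contained
f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# illustrative input (an assumption, not data from the question)
s <- f(1:101)
print(s)

# the element names are what the boxplot geom reads as aesthetics:
# ymin / ymax are the whisker ends, lower / upper the box, middle the median
names(s)
```

Because the names match the boxplot aesthetics, stat_summary(fun.data = f, geom = "boxplot") needs no further mapping.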

# example with outliers
# define outliers however you want; here, points beyond the 5th / 95th percentile whiskers
o <- function(x) {
  q <- quantile(x, probs = c(0.05, 0.95))
  subset(x, x < q[1] | q[2] < x)
}

# do it
ggplot(d, aes(x, y)) + 
  stat_summary(fun.data=f, geom="boxplot") + 
  stat_summary(fun = o, geom="point")  # fun.y is deprecated since ggplot2 3.3.0
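
If you would rather set the boxplot aesthetics yourself, as the question sketches, one possible route is to precompute the five statistics per group and draw them with stat = "identity". This is only a sketch under assumptions: the names five_num and stats are made up for illustration, the sample data mirrors the one above, and the plot is built only when ggplot2 is installed.

```r
# hypothetical helper: 5th/25th/50th/75th/95th percentiles of one group
five_num <- function(y) {
  setNames(quantile(y, probs = c(0.05, 0.25, 0.5, 0.75, 0.95)),
           c("ymin", "lower", "middle", "upper", "ymax"))
}

set.seed(1)  # reproducible sample data
d <- data.frame(x = gl(2, 50), y = rnorm(100))

# one row per group, one column per boxplot statistic
stats <- do.call(rbind, lapply(split(d$y, d$x), five_num))
stats <- data.frame(x = factor(rownames(stats)), stats)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # stat = "identity" tells geom_boxplot to use the supplied values as-is
  p <- ggplot(stats, aes(x = x)) +
    geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                     upper = upper, ymax = ymax),
                 stat = "identity")
  print(p)
}
```

This variant draws only the boxes and whiskers; outliers would still need a separate point layer built from the raw data.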