ggplot2 stat_summary mean_sdl not the same as mean +/- sd

biomiha · Jan 25, 2017 · Viewed 8.5k times

I am unsure why the error bars generated by the mean_sdl function (from Hmisc) in ggplot2 are significantly broader than the error bars I generate manually by plotting mean + sd and mean - sd. My code:

library(drc)
library(tidyverse)

test_dataset <- 
  structure(
    list(
      X = c(1e-10, 1e-08, 3e-08, 1e-07, 3e-07, 1e-06, 3e-06, 1e-05, 3e-05, 1e-04, 3e-04),
      AY1 = c(0, 11, 125, 190, 258, 322, 354, 348, NA, 412, NA),
      AY2 = c(3, 33, 141, 218, 289, 353, 359, 298, NA, 378, NA),
      AY3 = c(2, 25, 160, 196, 345, 328, 369, 372, NA, 399, NA),
      BY1 = c(3, NA, 11, 52, 80, 171, 289, 272, 359, 352, 389),
      BY2 = c(5, NA, 25, 55, 77, 195, 230, 333, 306, 320, 338),
      BY3 = c(4, NA, 28, 61, 44, 246, 243, 310, 297, 365, NA)
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-11L),
    .Names = c("X", "AY1", "AY2", "AY3", "BY1", "BY2", "BY3")
  )

test_dataset2 <- test_dataset %>% 
  rename(conc = X) %>% 
  gather(-conc, key = "measurement", value = "signal") %>% 
  separate(col = measurement, into = c("mAb", "rep"), sep = "Y")

plot_with_mean_sdl <- ggplot(test_dataset2, aes(x = conc, y = signal, col = mAb)) + 
  scale_x_log10() +
  stat_summary(fun.data = mean_se, 
               geom = "point",
               size = 2
               ) +
  # geom_errorbar(data = (test_dataset2 %>% group_by(mAb, conc) %>% 
  # summarise(AVG = mean(signal), SD = sd(signal)) %>% 
  # dplyr::filter(AVG != "NA") %>% 
  # mutate(top = AVG + SD, bottom = AVG - SD)), aes(x = conc, y = AVG, ymin = bottom, ymax = top)) + 
  stat_summary(fun.data = mean_sdl, geom = "errorbar") +
  stat_smooth(method = "drm",
              method.args = list(fct = L.4()),
              se = FALSE,
              n = 300
              )

plot_with_manual_errorbars <- ggplot(test_dataset2, aes(x = conc, y = signal, col = mAb)) + 
  scale_x_log10() +
  stat_summary(fun.data = mean_se, 
               geom = "point",
               size = 2
               ) +
  geom_errorbar(
    data = test_dataset2 %>%
      group_by(mAb, conc) %>%
      summarise(AVG = mean(signal), SD = sd(signal)) %>%
      dplyr::filter(!is.na(AVG)) %>%  # drop groups whose mean is NA
      mutate(top = AVG + SD, bottom = AVG - SD),
    aes(x = conc, y = AVG, ymin = bottom, ymax = top)
  ) +
  # stat_summary(fun.data = mean_sdl, geom = "errorbar") +
  stat_smooth(method = "drm",
              method.args = list(fct = L.4()),
              se = FALSE,
              n = 300
              )

I thought the smean.sdl function from the Hmisc package was supposed to compute the mean +/- a constant number of standard deviations from the mean. What am I not getting right?

Thanks.

Answer

Axeman · Jan 25, 2017

From ?smean.sdl (also linked from ?hmisc):

smean.sdl computes the mean plus or minus a constant times the standard deviation.

And:

smean.sdl(x, mult=2, na.rm=TRUE)

So the default is mult = 2: the error bars span the mean +/- 2 standard deviations, which makes them twice as wide as your manual mean +/- sd bars.
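To make the factor of two concrete, here is a small base R sketch. The helper mean_sd_band is hypothetical (it is not part of Hmisc or ggplot2) and only mirrors the behaviour described in ?smean.sdl; the three values are the mAb "A" replicates at conc = 3e-08 from the question's data.

```r
# Hypothetical helper (not part of Hmisc or ggplot2): mean -/+ mult * sd,
# mirroring the behaviour described in ?smean.sdl.
mean_sd_band <- function(x, mult = 2, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  c(mean = m, lower = m - mult * s, upper = m + mult * s)
}

# Three replicates of mAb "A" at conc = 3e-08, taken from the question's data
x <- c(125, 141, 160)

band_default <- mean_sd_band(x)            # mult = 2, the documented default
band_one_sd  <- mean_sd_band(x, mult = 1)  # what the manual error bars show

# Compare the widths of the two intervals
width_default <- unname(band_default["upper"] - band_default["lower"])
width_one_sd  <- unname(band_one_sd["upper"] - band_one_sd["lower"])
print(width_default / width_one_sd)
```

The ratio of the two widths is the ratio of the mult values, which is why the mean_sdl bars look so much broader than the manual ones.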

To get mean +/- 1 sd, pass fun.args = list(mult = 1) to stat_summary, as shown in the examples in ?stat_summary.
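A minimal sketch of that fix on a toy data frame, assuming ggplot2 and Hmisc are both installed (mean_sdl needs Hmisc at run time). The data frame toy is made up for illustration from two sets of replicates in the question's data; in the original plot only the mean_sdl layer would need to change.

```r
library(ggplot2)  # mean_sdl() additionally requires the Hmisc package

# Toy data for illustration: mAb "A" replicates at two concentrations
toy <- data.frame(
  group  = rep(c("low", "mid"), each = 3),
  signal = c(0, 3, 2, 125, 141, 160)
)

p <- ggplot(toy, aes(x = group, y = signal)) +
  # Group means as points, as in the question
  stat_summary(fun.data = mean_se, geom = "point", size = 2) +
  # mult = 1 overrides the default of 2, giving mean +/- 1 sd
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "errorbar",
               width = 0.2)

print(p)
```

With mult = 1 the error bars should coincide with the manually computed mean + sd and mean - sd.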