I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.
For example
library(stargazer)
stargazer(ToothGrowth, type = "text")
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 60 18.813 7.649 4.200 33.900
#> dose 60 1.167 0.629 0.500 2.000
#> -----------------------------------------
provides summery statistics for the continues variables in ToothGrowth
. I would like to split that summery by the categorical variable supp
, also in ToothGrowth
.
Two suggestions for desired outcome,
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> OJ len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> VC len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> --------------------------------------------------
#>
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> len
#> _by VC 30 16.963 8.266 4.200 33.900
#> _by VC 30 1.167 0.634 0.500 2.000
#> _tot 60 18.813 7.649 4.200 33.900
#>
#> dose
#> _by OJ 30 20.663 6.606 8.200 30.900
#> _by OJ 30 1.167 0.634 0.500 2.000
#> _tot 60 1.167 0.629 0.500 2.000
#> --------------------------------------------------
library(stargazer)
library(dplyr)
library(tidyr)
ToothGrowth %>%
group_by(supp) %>%
mutate(id = 1:n()) %>%
ungroup() %>%
gather(temp, val, len, dose) %>%
unite(temp1, supp, temp, sep = '_') %>%
spread(temp1, val) %>%
select(-id) %>%
as.data.frame() %>%
stargazer(type = 'text')
=========================================
Statistic N Mean St. Dev. Min Max
-----------------------------------------
OJ_dose 30 1.167 0.634 0.500 2.000
OJ_len 30 20.663 6.606 8.200 30.900
VC_dose 30 1.167 0.634 0.500 2.000
VC_len 30 16.963 8.266 4.200 33.900
-----------------------------------------
This gets rid of the problem mentioned by the OP in a comment to the original answer, "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer
was to create a new data frame that had variables for each group's observations using a gather()
, unite()
, spread()
strategy. The only trick to it is to avoid duplicate identifiers by creating unique identifiers by group and dropping that variable before calling stargazer()
.