I am running a linear regression on some variables in a data frame. I'd like to subset the data by a categorical variable, run the regression separately for each level of that variable, and then store the t-stats in a data frame. I'd like to do this without a loop if possible.
Here's a sample of what I'm trying to do:
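a <- c("a", "a", "a", "a", "a",
       "b", "b", "b", "b", "b",
       "c", "c", "c", "c", "c")
b <- c(0.1, 0.2, 0.3, 0.2, 0.3,
       0.1, 0.2, 0.3, 0.2, 0.3,
       0.1, 0.2, 0.3, 0.2, 0.3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
# data.frame() keeps b and c numeric; cbind() would coerce every column to character
data.frame(a, b, c)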
I can begin by running the following linear regression and pulling the t-statistic out very easily:
summary(lm(b~c))$coefficients[2,3]
However, I'd like to run the regression separately for each value of column a ("a", "b", or "c"). I'd like to then store the t-stats in a table that looks like this:
variable t-stat
a 0.9
b 2.4
c 1.1
Hope that makes sense. Please let me know if you have any suggestions!
Here is a solution using dplyr and tidy() from the broom package. tidy() converts various statistical model outputs (e.g. lm, glm, anova, etc.) into a tidy data frame.
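To see what tidy() returns before any grouping is involved, here is a minimal sketch on the pooled b and c vectors from the question. The names fit and fit_tidy are only for illustration.

```r
library(broom)

# Same b and c values as in the question
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)

fit <- lm(b ~ c)

# For an lm fit, tidy() returns one row per model term with the columns
# term, estimate, std.error, statistic and p.value
fit_tidy <- tidy(fit)
names(fit_tidy)

# The statistic column holds the t-statistics; row 2 is the slope for c,
# the same number as summary(fit)$coefficients[2, 3]
fit_tidy$statistic[2]
```

Because the result is an ordinary data frame, it can be combined with the usual dplyr verbs.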
library(broom)
library(dplyr)
data <- tibble(a, b, c)  # tibble() replaces the deprecated data_frame()
data %>%
group_by(a) %>%
do(tidy(lm(b ~ c, data = .))) %>%
select(variable = a, t_stat = statistic) %>%
slice(2)  # keep the second row of each group, i.e. the slope term for c
# variable t_stat
# 1 a 1.6124515
# 2 b -0.1369306
# 3 c 0.8000000
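In recent dplyr versions do() is marked as superseded. A possible equivalent, sketched here under the assumption that dplyr 0.8.1 or later is installed, uses group_modify() and picks the slope row by name rather than by position. The name result is only for illustration.

```r
library(broom)
library(dplyr)

# Same data as in the question
data <- tibble(
  a = rep(c("a", "b", "c"), each = 5),
  b = rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3),
  c = c(0.2, 0.1, 0.3, 0.2, 0.4,
        0.2, 0.5, 0.2, 0.1, 0.2,
        0.4, 0.2, 0.4, 0.6, 0.8)
)

result <- data %>%
  group_by(a) %>%
  # .x is the subset of rows for the current group
  group_modify(~ tidy(lm(b ~ c, data = .x))) %>%
  ungroup() %>%
  # select the slope term by name instead of relying on row order
  filter(term == "c") %>%
  select(variable = a, t_stat = statistic)

result
```

Filtering on term keeps working if the model formula later gains more predictors, whereas slice(2) depends on the slope being the second row.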
Or, to extract the t-statistics for both the intercept and the slope term:
data %>%
group_by(a) %>%
do(tidy(lm(b ~ c, data = .))) %>%
select(variable = a, term, t_stat = statistic)
# variable term t_stat
# 1 a (Intercept) 1.2366939
# 2 a c 1.6124515
# 3 b (Intercept) 2.6325081
# 4 b c -0.1369306
# 5 c (Intercept) 1.4572335
# 6 c c 0.8000000
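If you would rather not add any packages, the same split-apply-combine idea can be sketched in base R with split() and sapply(). This is an alternative to the dplyr approach above, not part of it, and the names df, t_stats and result are only for illustration.

```r
# Same data as in the question
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
df <- data.frame(a, b, c)

# split() gives one data frame per level of a; sapply() fits a model on
# each piece and pulls out the t value of the slope term
t_stats <- sapply(split(df, df$a), function(d) {
  summary(lm(b ~ c, data = d))$coefficients["c", "t value"]
})

# Assemble the table in the shape asked for in the question
result <- data.frame(variable = names(t_stats), t_stat = unname(t_stats))
result
```

sapply() still iterates internally, but there is no explicit for loop to write or maintain.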