Linear Regression and storing results in data frame

Trexion Kameha · Jan 19, 2015 · Viewed 21.7k times

I am running a linear regression on some variables in a data frame. I'd like to subset the data by a categorical variable, run the regression separately for each level of that variable, and store the resulting t-stats in a data frame. I'd like to do this without a loop if possible.

Here's a sample of what I'm trying to do:

  a <- c("a","a","a","a","a",
         "b","b","b","b","b",
         "c","c","c","c","c")
  b <- c(0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3)
  c <- c(0.2,0.1,0.3,0.2,0.4,
         0.2,0.5,0.2,0.1,0.2,
         0.4,0.2,0.4,0.6,0.8)
  # data.frame() keeps b and c numeric; cbind() would coerce every column to character
  df <- data.frame(a, b, c)

I can begin by running the following linear regression and pulling the t-statistic out very easily:

  summary(lm(b~c))$coefficients[2,3]

However, I'd like to run the regression separately for the rows where column a is "a", "b", or "c", and then store the t-stats in a table that looks like this:

variable t-stat
a        0.9
b        2.4
c        1.1

Hope that makes sense. Please let me know if you have any suggestions!

Answer

alex23lemm · Jan 19, 2015

Here is a solution using dplyr and tidy() from the broom package. tidy() converts the output of various statistical models (e.g. lm, glm, anova) into a tidy data frame with one row per model term.

library(broom)
library(dplyr)

data <- tibble(a, b, c)  # data_frame() is deprecated; dplyr re-exports tibble()

data %>% 
  group_by(a) %>%                        # one group per level of a
  do(tidy(lm(b ~ c, data = .))) %>%      # fit b ~ c within each group
  select(variable = a, t_stat = statistic) %>% 
  slice(2)                               # row 2 of each group is the slope term

#   variable     t_stat
# 1        a  1.6124515
# 2        b -0.1369306
# 3        c  0.8000000  
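
Note that slice(2) depends on the slope being the second row of each group's tidy() output. A minimal sketch of a variant that picks the slope by name instead, assuming a dplyr version that provides group_modify() (the result name slope_t is just illustrative):

```r
library(broom)
library(dplyr)

# Same toy data as in the question
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
data <- tibble(a, b, c)

slope_t <- data %>%
  group_by(a) %>%
  # .x is the subset of rows for one level of a
  group_modify(~ tidy(lm(b ~ c, data = .x))) %>%
  ungroup() %>%
  # keep the slope row by term name rather than by position
  filter(term == "c") %>%
  select(variable = a, t_stat = statistic)

slope_t
```

Filtering on term keeps working if the formula later gains more predictors, whereas a positional slice() would silently pick whichever term lands in row 2.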

Or, to extract the t-statistics for both the intercept and the slope term:

data %>% 
  group_by(a) %>% 
  do(tidy(lm(b ~ c, data = .))) %>% 
  select(variable = a, term, t_stat = statistic)

#   variable        term     t_stat
# 1        a (Intercept)  1.2366939
# 2        a           c  1.6124515
# 3        b (Intercept)  2.6325081
# 4        b           c -0.1369306
# 5        c (Intercept)  1.4572335
# 6        c           c  0.8000000
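
If adding packages is not an option, roughly the same table can be built in base R. This is only a sketch under the question's setup: split() and sapply() still iterate over the groups internally, so it avoids an explicit for loop rather than iteration itself, and the names df, slope_t and result are illustrative.

```r
# Same toy data as in the question
df <- data.frame(
  a = rep(c("a", "b", "c"), each = 5),
  b = rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3),
  c = c(0.2, 0.1, 0.3, 0.2, 0.4,
        0.2, 0.5, 0.2, 0.1, 0.2,
        0.4, 0.2, 0.4, 0.6, 0.8)
)

# Fit b ~ c within each level of a and pull the slope's t value by name
slope_t <- sapply(split(df, df$a), function(d) {
  summary(lm(b ~ c, data = d))$coefficients["c", "t value"]
})

# Collect the named vector into a two-column data frame
result <- data.frame(variable = names(slope_t),
                     t_stat = unname(slope_t))
result
```

On this data the t_stat column should match the slope values from the dplyr version (about 1.61, -0.14 and 0.80).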