In R linear model, get p-values for only the interaction coefficients

r lm
Jdub picture Jdub · Jun 27, 2012 · Viewed 7.2k times · Source

If I have a summary table for a linear model in R, how can I get the p-values associated with just the interaction estimates, or just the group intercepts, etc., without having to count the row numbers?

For example, with a model such as lm(y ~ x + group) with x as continuous and group as categorical, the summary table for the lm object has estimates for:

  1. an intercept
  2. x, the slope across all groups
  3. 5 within group differences from the overall intercept
  4. 5 within group differences from the overall slope.

I would like to figure out a way to get each of these as a group of p-values, even if the number of groups or the model formula change. Maybe there is information the summary table somehow uses to group together the rows?

Following is an example data set with two different models. The first model has four different sets of p-values I might want to get separately, while the second model has only two sets of p-values.

x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)

myData <- data.frame(x = x,
    y = c(groupA, groupB, groupC, groupD, groupE, groupF),
    group = rep(c("A","B","C","D","E","F"), each = length(x))
)

myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)

Answer

RoyalTS picture RoyalTS · Jun 27, 2012

You can access all coefficients and their associated statistics via summary()$coefficients, like so:

> summary(myMod1)$coefficients
                 Estimate  Std. Error      t value      Pr(>|t|)
(Intercept)  9.8598180335 0.207551769  47.50534335 1.882690e-203
x            0.5013049448 0.003568152 140.49427911  0.000000e+00
groupB       9.9833257879 0.293522526  34.01212819 5.343527e-141
groupC      20.0988336744 0.293522526  68.47458673 2.308586e-282
groupD      30.0671851583 0.293522526 102.43569906  0.000000e+00
groupE      39.8366758058 0.293522526 135.71931370  0.000000e+00
groupF      50.4780382104 0.293522526 171.97330259  0.000000e+00
x:groupB    -0.0001115097 0.005046129  -0.02209807  9.823772e-01
x:groupC     0.0004144536 0.005046129   0.08213297  9.345689e-01
x:groupD     0.0022577223 0.005046129   0.44741668  6.547390e-01
x:groupE     0.0024544207 0.005046129   0.48639675  6.268671e-01
x:groupF    -0.0052089956 0.005046129  -1.03227556  3.023674e-01

Of this you only want the p-values, i.e. the 4th column:

> summary(myMod1)$coefficients[,4]
  (Intercept)             x        groupB        groupC        groupD        groupE        groupF      x:groupB      x:groupC 
1.882690e-203  0.000000e+00 5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00  9.823772e-01  9.345689e-01 
     x:groupD      x:groupE      x:groupF 
 6.547390e-01  6.268671e-01  3.023674e-01 

Lastly, you only want the p-values of particular coefficients, either for the intercepts or for the interaction terms. One way of doing this is to match the coefficient names (names(summary(myMod1)$coefficients[,4])) to a RegEx via grepl() and using the logical vector that grepl returns as an index:

> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
       groupB        groupC        groupD        groupE        groupF 
5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00 
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
 x:groupB  x:groupC  x:groupD  x:groupE  x:groupF 
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674