If I have a summary table for a linear model in R, how can I get the p-values associated with just the interaction estimates, or just the group intercepts, etc., without having to count the row numbers?
For example, with a model such as lm(y ~ x + group)
with x
as continuous and group
as categorical, the summary table for the lm
object has estimates for:
I would like to figure out a way to get each of these as a group of p-values, even if the number of groups or the model formula change. Maybe there is information the summary table somehow uses to group together the rows?
Following is an example data set with two different models. The first model has four different sets of p-values I might want to get separately, while the second model has only two sets of p-values.
x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)
myData <- data.frame(x = x,
y = c(groupA, groupB, groupC, groupD, groupE, groupF),
group = rep(c("A","B","C","D","E","F"), each = length(x))
)
myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)
You can access all coefficients and their associated statistics via summary()$coefficients
, like so:
> summary(myMod1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.8598180335 0.207551769 47.50534335 1.882690e-203
x 0.5013049448 0.003568152 140.49427911 0.000000e+00
groupB 9.9833257879 0.293522526 34.01212819 5.343527e-141
groupC 20.0988336744 0.293522526 68.47458673 2.308586e-282
groupD 30.0671851583 0.293522526 102.43569906 0.000000e+00
groupE 39.8366758058 0.293522526 135.71931370 0.000000e+00
groupF 50.4780382104 0.293522526 171.97330259 0.000000e+00
x:groupB -0.0001115097 0.005046129 -0.02209807 9.823772e-01
x:groupC 0.0004144536 0.005046129 0.08213297 9.345689e-01
x:groupD 0.0022577223 0.005046129 0.44741668 6.547390e-01
x:groupE 0.0024544207 0.005046129 0.48639675 6.268671e-01
x:groupF -0.0052089956 0.005046129 -1.03227556 3.023674e-01
Of this you only want the p-values, i.e. the 4th column:
> summary(myMod1)$coefficients[,4]
(Intercept) x groupB groupC groupD groupE groupF x:groupB x:groupC
1.882690e-203 0.000000e+00 5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00 9.823772e-01 9.345689e-01
x:groupD x:groupE x:groupF
6.547390e-01 6.268671e-01 3.023674e-01
Lastly, you only want the p-values of particular coefficients, either for the intercepts or for the interaction terms. One way of doing this is to match the coefficient names (names(summary(myMod1)$coefficients[,4])
) to a RegEx via grepl()
and using the logical vector that grepl
returns as an index:
> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
groupB groupC groupD groupE groupF
5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
x:groupB x:groupC x:groupD x:groupE x:groupF
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674