I am fitting a regression model in R. After identifying the main predictors, I want to check the interaction effects between them. However, there are 13 predictors in total, which means thousands of possible combinations. If I do this:
lm.fit2=lm(medv~chas*dis*tax*black*rm*lstat*age*nox*zn*crim*rad*indus*ptratio,data=Boston)
summary(lm.fit2)
then an error occurs, because the model has more coefficients than observations and the residual degrees of freedom would be negative.
To make it run, I cut the formula down to five predictors:
lm.fit2=lm(medv~chas*dis*tax*black*rm,data=Boston)
summary(lm.fit2)
However, this still gives me far too many terms to inspect:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.082e+02 1.798e+02 -1.158 0.248
chas -2.585e+03 1.820e+03 -1.420 0.156
dis 2.545e+01 6.613e+01 0.385 0.701
tax 4.098e-01 3.021e-01 1.356 0.176
black 3.434e-01 4.622e-01 0.743 0.458
rm 4.234e+01 3.015e+01 1.405 0.161
chas:dis 8.677e+02 6.350e+02 1.367 0.172
chas:tax 6.656e+00 5.232e+00 1.272 0.204
dis:tax -7.457e-02 1.259e-01 -0.593 0.554
chas:black 6.931e+00 4.936e+00 1.404 0.161
dis:black -6.838e-02 1.688e-01 -0.405 0.686
tax:black -7.198e-04 7.791e-04 -0.924 0.356
chas:rm 3.295e+02 2.864e+02 1.150 0.251
dis:rm -5.586e+00 1.084e+01 -0.515 0.606
tax:rm -7.681e-02 5.049e-02 -1.521 0.129
black:rm -6.455e-02 7.744e-02 -0.833 0.405
chas:dis:tax -1.971e+00 2.520e+00 -0.782 0.435
chas:dis:black -2.280e+00 1.648e+00 -1.383 0.167
chas:tax:black -1.835e-02 1.370e-02 -1.339 0.181
dis:tax:black 1.878e-04 3.227e-04 0.582 0.561
chas:dis:rm -9.001e+01 1.018e+02 -0.884 0.377
chas:tax:rm -8.002e-01 8.687e-01 -0.921 0.357
dis:tax:rm 1.447e-02 2.063e-02 0.702 0.483
chas:black:rm -9.037e-01 7.670e-01 -1.178 0.239
dis:black:rm 1.414e-02 2.765e-02 0.511 0.609
tax:black:rm 1.318e-04 1.301e-04 1.013 0.312
chas:dis:tax:black 5.364e-03 6.461e-03 0.830 0.407
chas:dis:tax:rm 1.592e-01 4.289e-01 0.371 0.711
chas:dis:black:rm 2.436e-01 2.619e-01 0.930 0.353
chas:tax:black:rm 2.293e-03 2.250e-03 1.019 0.309
dis:tax:black:rm -3.452e-05 5.286e-05 -0.653 0.514
chas:dis:tax:black:rm -4.712e-04 1.098e-03 -0.429 0.668
If I include more predictors, deciding which terms matter will take much longer. Is there a quicker way to check the interaction effects?
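A quick count, done as plain arithmetic, shows why the full 13-way product cannot be fitted while the 5-way product and a two-way model can. This is a minimal sketch: it hardcodes the 13 predictor names from the formula above and assumes the 506 rows of the Boston data.

```r
# Predictor names copied from the 13-way formula in the question
preds <- c("chas", "dis", "tax", "black", "rm", "lstat", "age",
           "nox", "zn", "crim", "rad", "indus", "ptratio")
n_obs <- 506   # number of rows in the Boston data
k <- length(preds)

# Coefficients (including the intercept) generated by each formula
n_coef <- c(
  full_13_way = 2^k,                  # a*b*c*...: one coefficient per subset of predictors
  five_way    = 2^5,                  # chas*dis*tax*black*rm
  two_way     = 1 + k + choose(k, 2)  # (a + b + ...)^2: intercept, main effects, pairs
)
n_coef           # full_13_way 8192, five_way 32, two_way 92

# Residual degrees of freedom = observations minus coefficients
n_obs - n_coef   # -7686, 474, 414
```

The 32 coefficients of the five-way model match the 32 rows (intercept plus 31 terms) of the coefficient table above, and 414 is the residual df reported for the two-way model in the ANOVA table in the answer.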
Third- and higher-order interactions are generally weak and hard to interpret, so my suggestion is to look first at the main effects and the second-order interactions. In R formula syntax, ^2 means "all main effects plus all two-way interactions of the variables inside the enclosing parentheses". It does not square anything; use poly() if you want polynomial transforms. For the Boston data:
lm.fit2=lm(medv ~ (chas+dis+tax+black+rm+lstat+age+nox+zn+
crim+rad+indus+ptratio)^2,data=Boston)
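If you want to confirm what ^2 expands to before fitting anything, terms() will list the term labels a formula generates. A small sketch: a, b, c, x and y are placeholder names used only to show the formula algebra (terms() does not need the variables to exist), followed by the same check on the 13 Boston predictors.

```r
# ^2 on a parenthesised sum: all main effects plus all two-way interactions
attr(terms(y ~ (a + b + c)^2), "term.labels")
# "a" "b" "c" "a:b" "a:c" "b:c"

# * crosses everything, so it also adds the three-way term
attr(terms(y ~ a * b * c), "term.labels")
# "a" "b" "c" "a:b" "a:c" "b:c" "a:b:c"

# ^ is formula crossing, not arithmetic: x^2 on a single variable is just x
attr(terms(y ~ x^2), "term.labels")
# "x"

# Polynomial terms need poly() or I()
attr(terms(y ~ poly(x, 2)), "term.labels")   # "poly(x, 2)"
attr(terms(y ~ x + I(x^2)), "term.labels")   # "x" "I(x^2)"

# The same expansion for the 13 predictors in the model above
preds <- c("chas", "dis", "tax", "black", "rm", "lstat", "age",
           "nox", "zn", "crim", "rad", "indus", "ptratio")
f2 <- reformulate(sprintf("(%s)^2", paste(preds, collapse = " + ")),
                  response = "medv")
labels2 <- attr(terms(f2), "term.labels")
length(labels2)            # 91 = 13 main effects + 78 pairs
sum(grepl(":", labels2))   # 78
```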
> anova(lm.fit2)
Analysis of Variance Table
Response: medv
Df Sum Sq Mean Sq F value Pr(>F)
chas 1 1312.1 1312.1 161.3513 < 2.2e-16 ***
dis 1 3082.6 3082.6 379.0794 < 2.2e-16 ***
tax 1 6078.7 6078.7 747.5244 < 2.2e-16 ***
black 1 765.8 765.8 94.1700 < 2.2e-16 ***
rm 1 14071.3 14071.3 1730.3969 < 2.2e-16 ***
lstat 1 3819.5 3819.5 469.6923 < 2.2e-16 ***
age 1 112.9 112.9 13.8843 0.0002214 ***
nox 1 109.6 109.6 13.4719 0.0002738 ***
zn 1 687.4 687.4 84.5305 < 2.2e-16 ***
crim 1 106.2 106.2 13.0561 0.0003395 ***
rad 1 288.8 288.8 35.5125 5.426e-09 ***
indus 1 8.6 8.6 1.0541 0.3051555
ptratio 1 1194.2 1194.2 146.8594 < 2.2e-16 ***
chas:dis 1 78.6 78.6 9.6679 0.0020047 **
chas:tax 1 118.8 118.8 14.6093 0.0001526 ***
chas:black 1 50.4 50.4 6.2026 0.0131473 *
chas:rm 1 5.4 5.4 0.6604 0.4168819
chas:lstat 1 197.6 197.6 24.3037 1.193e-06 ***
chas:age 1 27.3 27.3 3.3584 0.0675818 .
chas:nox 1 220.8 220.8 27.1561 2.967e-07 ***
chas:zn 1 131.9 131.9 16.2178 6.717e-05 ***
chas:crim 1 311.2 311.2 38.2735 1.479e-09 ***
chas:rad 1 101.3 101.3 12.4601 0.0004624 ***
chas:indus 1 0.8 0.8 0.1022 0.7493299
chas:ptratio 1 38.1 38.1 4.6844 0.0310080 *
dis:tax 1 113.7 113.7 13.9797 0.0002108 ***
dis:black 1 20.7 20.7 2.5508 0.1110013
dis:rm 1 769.1 769.1 94.5817 < 2.2e-16 ***
dis:lstat 1 178.4 178.4 21.9372 3.826e-06 ***
dis:age 1 201.2 201.2 24.7456 9.607e-07 ***
dis:nox 1 33.1 33.1 4.0712 0.0442657 *
dis:zn 1 48.1 48.1 5.9169 0.0154195 *
dis:crim 1 45.2 45.2 5.5527 0.0189169 *
dis:rad 1 4.8 4.8 0.5956 0.4407156
dis:indus 1 138.1 138.1 16.9862 4.550e-05 ***
dis:ptratio 1 524.8 524.8 64.5419 9.940e-15 ***
tax:black 1 3.1 3.1 0.3829 0.5363790
tax:rm 1 1453.4 1453.4 178.7271 < 2.2e-16 ***
tax:lstat 1 541.5 541.5 66.5939 4.046e-15 ***
tax:age 1 49.6 49.6 6.1056 0.0138770 *
tax:nox 1 40.8 40.8 5.0143 0.0256685 *
tax:zn 1 24.8 24.8 3.0477 0.0815952 .
tax:crim 1 41.9 41.9 5.1507 0.0237503 *
tax:rad 1 2.1 2.1 0.2604 0.6100884
tax:indus 1 44.4 44.4 5.4549 0.0199899 *
tax:ptratio 1 7.8 7.8 0.9579 0.3282936
black:rm 1 10.4 10.4 1.2785 0.2588338
black:lstat 1 271.8 271.8 33.4254 1.460e-08 ***
black:age 1 102.1 102.1 12.5507 0.0004412 ***
black:nox 1 1.9 1.9 0.2348 0.6282474
black:zn 1 10.6 10.6 1.2994 0.2549878
black:crim 1 35.3 35.3 4.3402 0.0378360 *
black:rad 1 2.8 2.8 0.3503 0.5542756
black:indus 1 26.9 26.9 3.3045 0.0698112 .
black:ptratio 1 5.6 5.6 0.6843 0.4085852
rm:lstat 1 705.8 705.8 86.7990 < 2.2e-16 ***
rm:age 1 13.5 13.5 1.6563 0.1988248
rm:nox 1 1.2 1.2 0.1453 0.7032901
rm:zn 1 79.3 79.3 9.7566 0.0019124 **
rm:crim 1 37.8 37.8 4.6444 0.0317315 *
rm:rad 1 39.9 39.9 4.9089 0.0272627 *
rm:indus 1 106.6 106.6 13.1098 0.0003302 ***
rm:ptratio 1 39.9 39.9 4.9030 0.0273535 *
lstat:age 1 59.8 59.8 7.3553 0.0069650 **
lstat:nox 1 9.2 9.2 1.1301 0.2883636
lstat:zn 1 31.6 31.6 3.8823 0.0494631 *
lstat:crim 1 118.1 118.1 14.5196 0.0001598 ***
lstat:rad 1 11.8 11.8 1.4504 0.2291523
lstat:indus 1 1.5 1.5 0.1814 0.6703950
lstat:ptratio 1 12.3 12.3 1.5135 0.2192980
age:nox 1 16.4 16.4 2.0191 0.1560794
age:zn 1 0.7 0.7 0.0918 0.7620300
age:crim 1 1.2 1.2 0.1423 0.7061824
age:rad 1 56.0 56.0 6.8824 0.0090262 **
age:indus 1 4.7 4.7 0.5778 0.4476205
age:ptratio 1 28.5 28.5 3.5049 0.0618937 .
nox:zn 1 2.7 2.7 0.3290 0.5665511
nox:crim 1 35.2 35.2 4.3323 0.0380099 *
nox:rad 1 46.8 46.8 5.7587 0.0168484 *
nox:indus 1 85.3 85.3 10.4926 0.0012952 **
nox:ptratio 1 9.1 9.1 1.1189 0.2907682
zn:crim 1 23.0 23.0 2.8271 0.0934414 .
zn:rad 1 0.4 0.4 0.0551 0.8145504
zn:indus 1 0.0 0.0 0.0012 0.9725670
zn:ptratio 1 0.0 0.0 0.0017 0.9666820
crim:rad 1 46.2 46.2 5.6793 0.0176164 *
crim:indus 1 17.5 17.5 2.1490 0.1434186
crim:ptratio 1 8.9 8.9 1.0946 0.2960597
rad:indus 1 1.3 1.3 0.1654 0.6844432
rad:ptratio 1 3.1 3.1 0.3761 0.5400088
indus:ptratio 1 20.5 20.5 2.5250 0.1128196
Residuals 414 3366.6 8.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I know it's long, but with some effort you can see which of the combinations carry some information. The "stars" are very misleading, and you should not believe them at the 0.05 level: with 78 interaction tests, you would expect about four of them to fall below p = 0.05 by chance alone even if no interaction were real. You would be well advised to apply some theory before throwing every available variable at such a procedure.
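One way to take the stars less literally is to adjust the 78 interaction p-values for multiple testing before reading them. A minimal sketch, assuming the Boston data from the MASS package (the version with the black column used in the question) and Holm's method via p.adjust(); the choice of method is mine, not part of the answer. Note also that anova() on an lm fit uses sequential (Type I) sums of squares, so each row is tested given the terms listed above it, and the results change if you reorder the formula.

```r
library(MASS)   # provides the Boston data used in the question

preds <- c("chas", "dis", "tax", "black", "rm", "lstat", "age",
           "nox", "zn", "crim", "rad", "indus", "ptratio")
f2 <- reformulate(sprintf("(%s)^2", paste(preds, collapse = " + ")),
                  response = "medv")
fit2 <- lm(f2, data = Boston)

# Sequential ANOVA table; keep only the two-way interaction rows
tab <- anova(fit2)
p_raw <- setNames(tab[["Pr(>F)"]], rownames(tab))
p_int <- p_raw[grepl(":", names(p_raw))]

# How many interactions pass p < 0.05 unadjusted ...
sum(p_int < 0.05)

# ... and which survive a Holm adjustment across all 78 tests
p_holm <- p.adjust(p_int, method = "holm")
sort(p_holm[p_holm < 0.05])
```

Holm controls the family-wise error rate; p.adjust() also offers method = "BH" if you would rather control the false discovery rate while screening.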
Looking at this, however, I might move nox, zn, and crim "outside" the parentheses (keeping them as main effects only), while keeping the chas:rad and chas:tax interactions for further consideration. Since those two showed only modest strength in a reduced model (p = 0.01 for one and p = 0.95 for the other), I might consider dropping them as well. Remember that you would be testing a really large set of hypotheses, so p < 0.05 is an entirely unreasonable threshold. Also remember that these tests only capture linear relationships, and these interactions may be picking up behavior that is better described by spline transformations.
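Both suggestions can be checked with a couple of model comparisons. A sketch under stated assumptions: it reads "outside" as keeping nox, zn and crim as main effects only, uses the MASS Boston data, and, for the spline caveat, picks rm and lstat (the largest interaction in the table) with natural splines from the splines package at 3 df each; the pair and the df are arbitrary illustrative choices. Given how many tests are involved, treat the resulting p-values as rough guides.

```r
library(MASS)     # Boston data
library(splines)  # ns(): natural cubic spline basis

inner <- c("chas", "dis", "tax", "black", "rm", "lstat", "age",
           "rad", "indus", "ptratio")
outer <- c("nox", "zn", "crim")   # main effects only in the reduced model

# Full two-way model versus one where nox, zn, crim enter as main effects only
f_full <- reformulate(sprintf("(%s)^2", paste(c(inner, outer), collapse = " + ")),
                      response = "medv")
f_reduced <- reformulate(c(sprintf("(%s)^2", paste(inner, collapse = " + ")), outer),
                         response = "medv")
fit_full <- lm(f_full, data = Boston)
fit_reduced <- lm(f_reduced, data = Boston)

# Nested F-test of the 33 interactions involving nox, zn or crim
anova(fit_reduced, fit_full)

# Spline check: does rm:lstat still add much once rm and lstat enter nonlinearly?
fit_lin <- lm(medv ~ rm * lstat, data = Boston)
fit_spl <- lm(medv ~ ns(rm, df = 3) + ns(lstat, df = 3), data = Boston)
fit_spl_int <- lm(medv ~ ns(rm, df = 3) + ns(lstat, df = 3) + rm:lstat, data = Boston)
anova(fit_spl, fit_spl_int)
AIC(fit_lin, fit_spl, fit_spl_int)
```

If the rm:lstat term shrinks considerably once the spline terms are in, that is a hint it was standing in for curvature rather than a genuine interaction.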