How to check interaction effects for a lot of predictors in R

T.Joe · Oct 6, 2016 · Viewed 8k times

I am trying to fit a regression model in R. After identifying the main predictors, I want to check for interaction effects among them. However, there are 13 predictors in total, which means hundreds of possible combinations. If I do this:

lm.fit2 = lm(medv ~ chas*dis*tax*black*rm*lstat*age*nox*zn*crim*rad*indus*ptratio, data = Boston)
summary(lm.fit2)

then an error occurs because the model has far more terms than observations, so the residual degrees of freedom would be negative, which is not allowed.
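To see why the fully crossed model cannot be fitted, it helps to count its terms. This is only a back-of-the-envelope sketch: it assumes the 13 predictors named in the formula above and the 506 rows of the Boston data as shipped in the MASS package.

```r
# Count the terms implied by crossing all 13 predictors with `*`.
p     <- 13    # predictors in the formula above
n_obs <- 506   # rows in Boston (assumed: the MASS version of the data)

full_terms    <- 2^p - 1        # every main effect and interaction of any order
two_way_terms <- choose(p, 2)   # pairwise interactions only

c(full = full_terms, two_way = two_way_terms, observations = n_obs)
```

The full crossing asks for 8191 terms from 506 observations, whereas the 78 pairwise interactions plus 13 main effects still leave residual degrees of freedom.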

To make it work:

lm.fit2=lm(medv~chas*dis*tax*black*rm,data=Boston)
summary(lm.fit2)

However, this still gives me too many terms to sort through:

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)           -2.082e+02  1.798e+02  -1.158    0.248
chas                  -2.585e+03  1.820e+03  -1.420    0.156
dis                    2.545e+01  6.613e+01   0.385    0.701
tax                    4.098e-01  3.021e-01   1.356    0.176
black                  3.434e-01  4.622e-01   0.743    0.458
rm                     4.234e+01  3.015e+01   1.405    0.161
chas:dis               8.677e+02  6.350e+02   1.367    0.172
chas:tax               6.656e+00  5.232e+00   1.272    0.204
dis:tax               -7.457e-02  1.259e-01  -0.593    0.554
chas:black             6.931e+00  4.936e+00   1.404    0.161
dis:black             -6.838e-02  1.688e-01  -0.405    0.686
tax:black             -7.198e-04  7.791e-04  -0.924    0.356
chas:rm                3.295e+02  2.864e+02   1.150    0.251
dis:rm                -5.586e+00  1.084e+01  -0.515    0.606
tax:rm                -7.681e-02  5.049e-02  -1.521    0.129
black:rm              -6.455e-02  7.744e-02  -0.833    0.405
chas:dis:tax          -1.971e+00  2.520e+00  -0.782    0.435
chas:dis:black        -2.280e+00  1.648e+00  -1.383    0.167
chas:tax:black        -1.835e-02  1.370e-02  -1.339    0.181
dis:tax:black          1.878e-04  3.227e-04   0.582    0.561
chas:dis:rm           -9.001e+01  1.018e+02  -0.884    0.377
chas:tax:rm           -8.002e-01  8.687e-01  -0.921    0.357
dis:tax:rm             1.447e-02  2.063e-02   0.702    0.483
chas:black:rm         -9.037e-01  7.670e-01  -1.178    0.239
dis:black:rm           1.414e-02  2.765e-02   0.511    0.609
tax:black:rm           1.318e-04  1.301e-04   1.013    0.312
chas:dis:tax:black     5.364e-03  6.461e-03   0.830    0.407
chas:dis:tax:rm        1.592e-01  4.289e-01   0.371    0.711
chas:dis:black:rm      2.436e-01  2.619e-01   0.930    0.353
chas:tax:black:rm      2.293e-03  2.250e-03   1.019    0.309
dis:tax:black:rm      -3.452e-05  5.286e-05  -0.653    0.514
chas:dis:tax:black:rm -4.712e-04  1.098e-03  -0.429    0.668

So if I include more predictors, it will take even longer to decide which terms matter. Is there a way to check the interaction effects more quickly?

Answer

IRTFM · Oct 6, 2016

Generally, third- and higher-order interactions are weak and hard to interpret, so my suggestion is to look first at the main effects and second-order interactions. R formula syntax uses ^2 to mean "all two-way interactions of the variables inside the enclosing parentheses". Because ^ has this special meaning inside a formula, you should use poly to model polynomial transforms. The two-way model is:

lm.fit2 = lm(medv ~ (chas + dis + tax + black + rm + lstat + age + nox + zn +
                     crim + rad + indus + ptratio)^2, data = Boston)
anova(lm.fit2)
Analysis of Variance Table

Response: medv
               Df  Sum Sq Mean Sq   F value    Pr(>F)    
chas            1  1312.1  1312.1  161.3513 < 2.2e-16 ***
dis             1  3082.6  3082.6  379.0794 < 2.2e-16 ***
tax             1  6078.7  6078.7  747.5244 < 2.2e-16 ***
black           1   765.8   765.8   94.1700 < 2.2e-16 ***
rm              1 14071.3 14071.3 1730.3969 < 2.2e-16 ***
lstat           1  3819.5  3819.5  469.6923 < 2.2e-16 ***
age             1   112.9   112.9   13.8843 0.0002214 ***
nox             1   109.6   109.6   13.4719 0.0002738 ***
zn              1   687.4   687.4   84.5305 < 2.2e-16 ***
crim            1   106.2   106.2   13.0561 0.0003395 ***
rad             1   288.8   288.8   35.5125 5.426e-09 ***
indus           1     8.6     8.6    1.0541 0.3051555    
ptratio         1  1194.2  1194.2  146.8594 < 2.2e-16 ***
chas:dis        1    78.6    78.6    9.6679 0.0020047 ** 
chas:tax        1   118.8   118.8   14.6093 0.0001526 ***
chas:black      1    50.4    50.4    6.2026 0.0131473 *  
chas:rm         1     5.4     5.4    0.6604 0.4168819    
chas:lstat      1   197.6   197.6   24.3037 1.193e-06 ***
chas:age        1    27.3    27.3    3.3584 0.0675818 .  
chas:nox        1   220.8   220.8   27.1561 2.967e-07 ***
chas:zn         1   131.9   131.9   16.2178 6.717e-05 ***
chas:crim       1   311.2   311.2   38.2735 1.479e-09 ***
chas:rad        1   101.3   101.3   12.4601 0.0004624 ***
chas:indus      1     0.8     0.8    0.1022 0.7493299    
chas:ptratio    1    38.1    38.1    4.6844 0.0310080 *  
dis:tax         1   113.7   113.7   13.9797 0.0002108 ***
dis:black       1    20.7    20.7    2.5508 0.1110013    
dis:rm          1   769.1   769.1   94.5817 < 2.2e-16 ***
dis:lstat       1   178.4   178.4   21.9372 3.826e-06 ***
dis:age         1   201.2   201.2   24.7456 9.607e-07 ***
dis:nox         1    33.1    33.1    4.0712 0.0442657 *  
dis:zn          1    48.1    48.1    5.9169 0.0154195 *  
dis:crim        1    45.2    45.2    5.5527 0.0189169 *  
dis:rad         1     4.8     4.8    0.5956 0.4407156    
dis:indus       1   138.1   138.1   16.9862 4.550e-05 ***
dis:ptratio     1   524.8   524.8   64.5419 9.940e-15 ***
tax:black       1     3.1     3.1    0.3829 0.5363790    
tax:rm          1  1453.4  1453.4  178.7271 < 2.2e-16 ***
tax:lstat       1   541.5   541.5   66.5939 4.046e-15 ***
tax:age         1    49.6    49.6    6.1056 0.0138770 *  
tax:nox         1    40.8    40.8    5.0143 0.0256685 *  
tax:zn          1    24.8    24.8    3.0477 0.0815952 .  
tax:crim        1    41.9    41.9    5.1507 0.0237503 *  
tax:rad         1     2.1     2.1    0.2604 0.6100884    
tax:indus       1    44.4    44.4    5.4549 0.0199899 *  
tax:ptratio     1     7.8     7.8    0.9579 0.3282936    
black:rm        1    10.4    10.4    1.2785 0.2588338    
black:lstat     1   271.8   271.8   33.4254 1.460e-08 ***
black:age       1   102.1   102.1   12.5507 0.0004412 ***
black:nox       1     1.9     1.9    0.2348 0.6282474    
black:zn        1    10.6    10.6    1.2994 0.2549878    
black:crim      1    35.3    35.3    4.3402 0.0378360 *  
black:rad       1     2.8     2.8    0.3503 0.5542756    
black:indus     1    26.9    26.9    3.3045 0.0698112 .  
black:ptratio   1     5.6     5.6    0.6843 0.4085852    
rm:lstat        1   705.8   705.8   86.7990 < 2.2e-16 ***
rm:age          1    13.5    13.5    1.6563 0.1988248    
rm:nox          1     1.2     1.2    0.1453 0.7032901    
rm:zn           1    79.3    79.3    9.7566 0.0019124 ** 
rm:crim         1    37.8    37.8    4.6444 0.0317315 *  
rm:rad          1    39.9    39.9    4.9089 0.0272627 *  
rm:indus        1   106.6   106.6   13.1098 0.0003302 ***
rm:ptratio      1    39.9    39.9    4.9030 0.0273535 *  
lstat:age       1    59.8    59.8    7.3553 0.0069650 ** 
lstat:nox       1     9.2     9.2    1.1301 0.2883636    
lstat:zn        1    31.6    31.6    3.8823 0.0494631 *  
lstat:crim      1   118.1   118.1   14.5196 0.0001598 ***
lstat:rad       1    11.8    11.8    1.4504 0.2291523    
lstat:indus     1     1.5     1.5    0.1814 0.6703950    
lstat:ptratio   1    12.3    12.3    1.5135 0.2192980    
age:nox         1    16.4    16.4    2.0191 0.1560794    
age:zn          1     0.7     0.7    0.0918 0.7620300    
age:crim        1     1.2     1.2    0.1423 0.7061824    
age:rad         1    56.0    56.0    6.8824 0.0090262 ** 
age:indus       1     4.7     4.7    0.5778 0.4476205    
age:ptratio     1    28.5    28.5    3.5049 0.0618937 .  
nox:zn          1     2.7     2.7    0.3290 0.5665511    
nox:crim        1    35.2    35.2    4.3323 0.0380099 *  
nox:rad         1    46.8    46.8    5.7587 0.0168484 *  
nox:indus       1    85.3    85.3   10.4926 0.0012952 ** 
nox:ptratio     1     9.1     9.1    1.1189 0.2907682    
zn:crim         1    23.0    23.0    2.8271 0.0934414 .  
zn:rad          1     0.4     0.4    0.0551 0.8145504    
zn:indus        1     0.0     0.0    0.0012 0.9725670    
zn:ptratio      1     0.0     0.0    0.0017 0.9666820    
crim:rad        1    46.2    46.2    5.6793 0.0176164 *  
crim:indus      1    17.5    17.5    2.1490 0.1434186    
crim:ptratio    1     8.9     8.9    1.0946 0.2960597    
rad:indus       1     1.3     1.3    0.1654 0.6844432    
rad:ptratio     1     3.1     3.1    0.3761 0.5400088    
indus:ptratio   1    20.5    20.5    2.5250 0.1128196    
Residuals     414  3366.6     8.1                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
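If you want to check what the ^2 shorthand expands to before fitting anything, terms() will list the term labels for you. The sketch below uses made-up variable names (y, a, b, c) and a small helper, term_labels, that is not part of R; it also shows why ^ cannot be used for squaring inside a formula.

```r
# Expand formulas symbolically; no data or model fit is needed.
# y, a, b, c are hypothetical variable names used only for illustration.
term_labels <- function(f) attr(terms(f), "term.labels")

# Main effects first, then each pairwise interaction.
two_way <- term_labels(y ~ (a + b + c)^2)
two_way

# Inside a formula, ^ means crossing, so a^2 collapses to plain a.
term_labels(y ~ a^2)

# I() or poly() is needed to get an actual squared term.
term_labels(y ~ a + I(a^2))
term_labels(y ~ poly(a, 2))
```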

I know the table is long, but with some effort you can see which of the combinations carry some information. The "stars" are very misleading, and you should not believe them at the 0.05 level: with 91 terms tested, roughly four or five would be expected to fall below 0.05 by chance alone even if none had a real effect. You would be well advised to apply some theory before throwing all possible variables at such a procedure.
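One way to make the stars less misleading is to adjust the p-values for the number of tests. The sketch below is one possible approach rather than the only one: it assumes Boston comes from the MASS package, refits the two-way model under the name fit2, and applies Holm's correction with p.adjust.

```r
library(MASS)  # assumed source of the Boston data frame

fit2 <- lm(medv ~ (chas + dis + tax + black + rm + lstat + age + nox + zn +
                   crim + rad + indus + ptratio)^2, data = Boston)
tab <- anova(fit2)

# Keep the p-value of every model term; the Residuals row has none.
is_term <- rownames(tab) != "Residuals"
p_raw <- setNames(tab[["Pr(>F)"]][is_term], rownames(tab)[is_term])

# Holm's step-down correction controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

# Terms that still fall below 0.05 after adjusting for all tests.
sort(p_holm[p_holm < 0.05])
```

Holm is a conservative choice; method = "BH" would control the false discovery rate instead, which may be more appropriate for an exploratory screen like this one.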

Looking at the table, however, I might move nox, zn, and crim "outside" the parentheses, so that they enter as main effects only, while keeping the chas:rad and chas:tax interactions for further consideration. Since these showed only modest strength in a reduced model (p=0.01 for one and p=0.95 for the other), I might consider dropping them, too. Remember that you would be looking at a really large set of hypotheses, so a threshold of p < 0.05 is entirely unreasonable. Also remember that these terms test only linear relationships, and the interactions may be picking up behavior that is better described by spline transformations.
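A reduced model of this kind can be checked against the main-effects model with a nested F-test. This is only a sketch under assumptions: Boston is taken from the MASS package, the names fit_main and fit_reduced are mine, and the two added interactions are simply the ones mentioned above, not a recommended final model.

```r
library(MASS)  # assumed source of the Boston data frame

# Main effects only.
fit_main <- lm(medv ~ chas + dis + tax + black + rm + lstat + age + nox + zn +
                 crim + rad + indus + ptratio, data = Boston)

# Add just the two interactions kept for further consideration.
fit_reduced <- update(fit_main, . ~ . + chas:rad + chas:tax)

# F-test of whether the two extra terms jointly improve the fit.
anova(fit_main, fit_reduced)

# Per-term estimates and p-values for the added interactions.
summary(fit_reduced)$coefficients[c("chas:rad", "chas:tax"), ]
```

Testing a small, pre-chosen set of interactions this way keeps the number of hypotheses low, which is the point of applying some theory before fitting.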