Model matrix with all pairwise interactions between columns

encircled picture encircled · Mar 26, 2014 · Viewed 12.5k times · Source

Let's say that I have a numeric data matrix with columns w, x, y, z and I also want to add in the columns that are equivalent to w*x, w*y, w*z, x*y, x*z, y*z since I want my covariate matrix to include all pairwise interactions.

Is there a clean and effective way to do this?

Answer

Gavin Simpson picture Gavin Simpson · Mar 26, 2014

If you mean in a model formula, then the ^ operator does this.

## dummy data
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

The formula is

form <- Y ~ (x + y + z)^2

which gives (using model.matrix() - which is used internally by the standard model fitting functions)

model.matrix(form, data = dat)

R> form <- Y ~ (x + y + z)^2
R> form
Y ~ (x + y + z)^2
R> model.matrix(form, data = dat)
   (Intercept)        x        y        z       x:y       x:z      y:z
1            1  1.51178  0.91898  1.35868  1.389293  2.054026  1.24860
2            1  0.38984  0.78214 -0.10279  0.304911 -0.040071 -0.08039
3            1 -0.62124  0.07456  0.38767 -0.046323 -0.240837  0.02891
4            1 -2.21470 -1.98935 -0.05381  4.405817  0.119162  0.10704
5            1  1.12493  0.61983 -1.37706  0.697261 -1.549097 -0.85354
6            1 -0.04493 -0.05613 -0.41499  0.002522  0.018647  0.02329
7            1 -0.01619 -0.15580 -0.39429  0.002522  0.006384  0.06143
8            1  0.94384 -1.47075 -0.05931 -1.388149 -0.055982  0.08724
9            1  0.82122 -0.47815  1.10003 -0.392667  0.903364 -0.52598
10           1  0.59390  0.41794  0.76318  0.248216  0.453251  0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6

If you don't know how many variables you have, or it is tedious to write out all of them, use the . notation too

R> form <- Y ~ .^2
R> model.matrix(form, data = dat)
   (Intercept)        x        y        z       x:y       x:z      y:z
1            1  1.51178  0.91898  1.35868  1.389293  2.054026  1.24860
2            1  0.38984  0.78214 -0.10279  0.304911 -0.040071 -0.08039
3            1 -0.62124  0.07456  0.38767 -0.046323 -0.240837  0.02891
4            1 -2.21470 -1.98935 -0.05381  4.405817  0.119162  0.10704
5            1  1.12493  0.61983 -1.37706  0.697261 -1.549097 -0.85354
6            1 -0.04493 -0.05613 -0.41499  0.002522  0.018647  0.02329
7            1 -0.01619 -0.15580 -0.39429  0.002522  0.006384  0.06143
8            1  0.94384 -1.47075 -0.05931 -1.388149 -0.055982  0.08724
9            1  0.82122 -0.47815  1.10003 -0.392667  0.903364 -0.52598
10           1  0.59390  0.41794  0.76318  0.248216  0.453251  0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6

The "power" in the ^ operator, here 2, controls the order of interactions. With ^2 we get second order interactions of all pairs of variables considered by the ^ operator. If you want up to 3rd-order interactions, then use ^3.

R> form <- Y ~ .^3
R> head(model.matrix(form, data = dat))
  (Intercept)        x        y        z       x:y      x:z      y:z     x:y:z
1           1  1.51178  0.91898  1.35868  1.389293  2.05403  1.24860  1.887604
2           1  0.38984  0.78214 -0.10279  0.304911 -0.04007 -0.08039 -0.031341
3           1 -0.62124  0.07456  0.38767 -0.046323 -0.24084  0.02891 -0.017958
4           1 -2.21470 -1.98935 -0.05381  4.405817  0.11916  0.10704 -0.237055
5           1  1.12493  0.61983 -1.37706  0.697261 -1.54910 -0.85354 -0.960170
6           1 -0.04493 -0.05613 -0.41499  0.002522  0.01865  0.02329 -0.001047