How to fit a linear regression model with two principal components in R?

phpdash · Nov 26, 2009

Let's say I have a data matrix d:

pc = prcomp(d)

# pc1 and pc2 are the principal components  
pc1 = pc$rotation[,1] 
pc2 = pc$rotation[,2]

Then this should fit the linear regression model, right?

r = lm(y ~ pc1+pc2)

But then I get this error:

Error in model.frame.default(formula = y ~ pc1 + pc2, drop.unused.levels = TRUE) :
   unequal dimensions('pc1')

I guess there are packages out there that do this automatically, but shouldn't this work too?

Answer

Ben Bolker · Nov 27, 2009

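Short answer: you don't want pc$rotation. That is the rotation matrix (the loadings, with one row per original variable), not the matrix of rotated values (the scores, with one row per observation), so its columns cannot be lined up with y.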
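A minimal sketch of where the dimension error comes from, using made-up data (the matrix `d`, the response `y`, and the helper names `loadings_pc1` and `scores_pc1` are illustrative, not from the question): a column of `pc$rotation` has one entry per variable, while a column of `pc$x` has one entry per observation, and only the latter matches the length of `y`.

```r
# Hypothetical data: 100 observations of 3 variables, plus a response
set.seed(1)
d <- matrix(runif(300), nrow = 100, ncol = 3)
y <- rnorm(100)

pc <- prcomp(d)

# Loadings: one entry per original variable
loadings_pc1 <- pc$rotation[, 1]
length(loadings_pc1)   # 3

# Scores: one entry per observation
scores_pc1 <- pc$x[, 1]
length(scores_pc1)     # 100

# Regressing y (length 100) on a loadings vector (length 3) fails in model.frame()
res <- try(lm(y ~ loadings_pc1), silent = TRUE)
inherits(res, "try-error")  # TRUE

# The scores line up with y, so this fit works
fit <- lm(y ~ scores_pc1)
```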

Make up some data:

x1 = runif(100)
x2 = runif(100)
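y = rnorm(2+3*x1+4*x2)  # NB: rnorm() treats a vector 'n' as a length, so this is 100 N(0,1)
                        # draws unrelated to x1, x2; rnorm(100, mean = 2+3*x1+4*x2) would depend on them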
d = cbind(x1,x2)

pc = prcomp(d)
dim(pc$rotation)
## [1] 2 2

Oops: that is a 2 × 2 matrix (variables × components), not one row per observation. The "x" component is what we want. From ?prcomp:

x: if ‘retx’ is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the ‘rotation’ matrix) is returned.
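The documented relationship can be checked directly. This is a small sketch on made-up data of the same shape as the example (the names `centred` and `scores_by_hand` are illustrative): centring the columns of `d` and multiplying by `pc$rotation` should reproduce `pc$x`, since `prcomp()` centres but does not scale by default.

```r
# Made-up data of the same shape as the example
set.seed(42)
x1 <- runif(100)
x2 <- runif(100)
d <- cbind(x1, x2)

pc <- prcomp(d)   # defaults: center = TRUE, scale. = FALSE

# Centre each column, then rotate by the loadings matrix
centred <- scale(d, center = TRUE, scale = FALSE)
scores_by_hand <- centred %*% pc$rotation

# Same values as the scores prcomp() returns (dimnames dropped before comparing)
all.equal(unname(scores_by_hand), unname(pc$x))  # TRUE
```

If `prcomp()` were called with `scale. = TRUE`, the columns would also need to be divided by their standard deviations before the multiplication.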

dim(pc$x)
## [1] 100   2
lm(y~pc$x[,1]+pc$x[,2])
## 
## Call:
## lm(formula = y ~ pc$x[, 1] + pc$x[, 2])
##
## Coefficients:
## (Intercept)    pc$x[, 1]    pc$x[, 2]  
##     0.04942      0.14272     -0.13557
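A slightly tidier variant, sketched here under the assumption that the goal is a regression on the first two components: put the score columns in a data frame so the coefficients are named PC1 and PC2, and use `predict()` on the `prcomp` object to get scores for new rows. The data, the response with a real dependence on x1 and x2, and the name `new_d` are all made up for illustration. With only two input variables, two components span the same space as x1 and x2, so the fitted values match an ordinary `lm(y ~ x1 + x2)`; with more variables and fewer retained components they would not.

```r
# Made-up data; this time the response actually depends on x1 and x2
set.seed(123)
x1 <- runif(100)
x2 <- runif(100)
y  <- rnorm(100, mean = 2 + 3 * x1 + 4 * x2)
d  <- cbind(x1, x2)

pc <- prcomp(d)

# Scores for the first two components, as a data frame with columns PC1, PC2
scores <- as.data.frame(pc$x[, 1:2])
fit_pc <- lm(y ~ PC1 + PC2, data = scores)
coef(fit_pc)

# Two components of two variables span the same space as x1, x2,
# so the fitted values equal those of a regression on the raw variables
fit_raw <- lm(y ~ x1 + x2)
all.equal(unname(fitted(fit_pc)), unname(fitted(fit_raw)))  # TRUE

# Hypothetical new observations: project them with the same centring and
# rotation, then predict from the component regression
new_d <- cbind(x1 = c(0.2, 0.8), x2 = c(0.5, 0.1))
new_scores <- as.data.frame(predict(pc, newdata = new_d))
predict(fit_pc, newdata = new_scores)
```

For doing this automatically, the pls package's pcr() function fits principal component regression directly.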