Using lm and predict on data in matrices

pythonic metaphor · Mar 7, 2013

I use R only occasionally and never use data frames, which makes it hard to work out the correct use of predict. My data are in plain matrices, not data frames; call them a and b, which are N x p and M x p matrices respectively. I can run the regression lm(a[,1] ~ a[,-1]). I would like to use the resulting lm object to predict b[,1] from b[,-1]. My naive guess, predict(lm(a[,1] ~ a[,-1]), b[,-1]), doesn't work. What is the right syntax for getting a vector of predictions from the lm object?

Answer

cbeleites unhappy with SX · Mar 7, 2013

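You can store a whole matrix in one column of a data.frame by wrapping it in I (). predict then looks up the predictors in newdata under the same column name, here x: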

x <- a [, -1]
y <- a [,  1]
data <- data.frame (y = y, x = I (x))
str (data)
## 'data.frame':    10 obs. of  2 variables:
## $ y: num  0.818 0.767 -0.666 0.788 -0.489 ...
## $ x: AsIs [1:10, 1:9] 0.916274.... 0.386565.... 0.703230.... -2.64091.... 0.274617.... ...

model <- lm (y ~ x, data = data)
newdata <- data.frame (x = I (b [, -1]))
predict (model, newdata) 
##         1         2 
## -3.795722 -4.778784 
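
For anyone who wants to try this without their own data, here is a minimal self-contained sketch of the same approach. The matrices a and b are simulated stand-ins (the dimensions, the seed, and the names train, pred, and manual are my assumptions, not part of the original data), and the last lines cross-check predict against the linear predictor computed by hand:

```r
# Simulated stand-ins for the matrices in the question (assumed data).
set.seed(1)
p <- 4                                  # 1 response column + 3 predictor columns
a <- matrix(rnorm(20 * p), ncol = p)    # training matrix, N = 20
b <- matrix(rnorm(5 * p),  ncol = p)    # new matrix,      M = 5

# Wrap the predictor matrix in I() so it stays a single matrix column "x".
train <- data.frame(y = a[, 1], x = I(a[, -1]))
model <- lm(y ~ x, data = train)

# newdata needs a column with the same name, "x", and the same number
# of predictor columns as the training matrix.
newdata <- data.frame(x = I(b[, -1]))
pred <- predict(model, newdata)

# Cross-check: intercept plus b[, -1] times the fitted slopes.
manual <- drop(cbind(1, b[, -1]) %*% coef(model))

length(pred)                              # one prediction per row of b
all.equal(unname(pred), unname(manual))   # compares predict() with the manual result
```

The important part is that the column name in newdata matches the name used in the formula; the I () wrapper is only there to stop data.frame from splitting the matrix into separate columns.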

The paper about the pls package (Mevik, B.-H. and Wehrens, R., The pls Package: Principal Component and Partial Least Squares Regression in R, Journal of Statistical Software, 2007, 18, 1 - 24) explains this technique.

Another example, with a spectroscopic data set (quinine fluorescence), is in vignette ("flu") of my package hyperSpec.