Predict.glm not predicting missing values in response

generic_user · Apr 28, 2013

For some reason, when I fit glms (and lms too, it turns out), R does not return predictions for the rows where the response is missing. Here is an example:

y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = glm(y~x, family=binomial(link="logit"))
p = predict(m,na.action=na.pass)
length(p)

y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = lm(y~x)
p = predict(m)
length(p)

The length of p should be 100, but it's 50 in both cases. The weird thing is that I have other predict calls in the same script that do predict from missing data.

EDIT: It turns out that those other predicts were quite wrong -- I was doing imputed.value = rnorm(N,mean.from.predict,var.of.prediction.interval). This recycled the mean and sd vectors from the lm predict or glm predict functions when length(predict)<N, which was quite different from what I was seeking.

So my question is what about my example code is stopping glm and lm from predicting missing values?

Thanks!

Answer

Hong Ooi · Apr 28, 2013

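When glm (or lm) fits the model, it uses only the complete cases: under the default na.action (na.omit), rows with a missing y are dropped. Calling predict without newdata returns the fitted values for just the rows used in the fit, which is why you get 50. The na.action argument of predict only takes effect when newdata is supplied, and it governs missing values in the predictors, not the response. You can still get predictions for the cases where your y values are missing by constructing a data frame and passing it to predict.glm as newdata: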

predict(m, newdata=data.frame(y, x))
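
As a minimal sketch of the above, assuming the same simulated data as in the question (the seed and the names d, p_fit, p_all, m2 and p_pad are illustrative, not from the original post), the following compares predict with and without newdata. It also shows one alternative, fitting with na.action = na.exclude, which pads the omitted rows with NA instead of predicting them:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same setup as the question: 50 observed responses, 50 missing
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)
d <- data.frame(y, x)

m <- glm(y ~ x, family = binomial(link = "logit"), data = d)

# Without newdata, predict() returns fitted values only for the rows
# that were actually used in the fit (the 50 complete cases).
p_fit <- predict(m)
length(p_fit)      # 50

# With newdata, every row with a non-missing x gets a prediction,
# whether or not its y was observed.
p_all <- predict(m, newdata = d, type = "response")
length(p_all)      # 100

# Alternative: na.exclude keeps the original row positions and fills
# the dropped rows with NA (it does not predict them).
m2 <- glm(y ~ x, family = binomial(link = "logit"), data = d,
          na.action = na.exclude)
p_pad <- predict(m2)
length(p_pad)      # 100
sum(is.na(p_pad))  # 50
```

Only x is needed in newdata, so predict(m, newdata = data.frame(x)) should give the same result. Note that type = "response" returns probabilities; the default for predict.glm is the link (log-odds) scale.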