Prevent NA from being used in an lm regression

r lm
Robert Kubrick · Dec 9, 2011

I have a vector Y containing future returns and a vector X containing current returns. The last Y element is NA, because the last current return falls at the very end of the available series.

X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)

lm(Y ~ X + Other)

I want to make sure that the last element of each series is not included in the regression. I read the na.action documentation but I'm not clear if this is the default behaviour.

For cor(), is this the correct solution to exclude X[4] and Y[4] from the calculation?

cor(X, Y, use = "pairwise.complete.obs")

Answer

NPE · Dec 9, 2011

The factory-fresh default for lm is to disregard observations containing NA values. Since this could be overridden through the global na.action option (set with options()), you might want to explicitly set na.action to na.omit:

> summary(lm(Y ~ X + Other, na.action=na.omit))

Call:
lm(formula = Y ~ X + Other, na.action = na.omit)

[snip]

  (1 observation deleted due to missingness)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
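
To confirm programmatically which rows were dropped, rather than reading the summary output, something like the following sketch may help. It reuses the vectors from the question; the variable names fit, used_n and dropped are only illustrative.

```r
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)

# Current global setting; "na.omit" in a fresh session unless overridden
getOption("na.action")

# Pass na.action explicitly so the fit does not depend on the global option
fit <- lm(Y ~ X + Other, na.action = na.omit)

used_n  <- nobs(fit)                    # number of observations actually used
dropped <- as.integer(na.action(fit))   # row indices removed because of NA

used_n
dropped
```

If you would rather have the call fail than silently drop rows, na.action = na.fail makes lm stop with an error whenever the data contain NA.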

As to your second question, cor(X, Y, use = 'pairwise.complete.obs') is correct. Since there are only two variables, cor(X, Y, use = 'complete.obs') would produce the same result.
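
A quick way to check that the two use settings agree for this data is to compare them against a correlation computed on the first three elements only. This is just a sketch using the question's vectors; r_pairwise, r_complete and r_manual are illustrative names.

```r
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)

r_pairwise <- cor(X, Y, use = "pairwise.complete.obs")
r_complete <- cor(X, Y, use = "complete.obs")

# Manual reference: drop the fourth pair by hand
r_manual <- cor(X[1:3], Y[1:3])

# Both comparisons should be TRUE
isTRUE(all.equal(r_pairwise, r_complete))
isTRUE(all.equal(r_pairwise, r_manual))

# Without a use argument the NA propagates and the result is NA
is.na(cor(X, Y))
```

The two settings can differ once more than two columns are involved: complete.obs drops any row that has an NA in any column, while pairwise.complete.obs drops rows separately for each pair of columns.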