Receiving a "Variable Lengths Differ" error in predict

MattLH · Apr 3, 2015 · Viewed 16.6k times

I have been receiving the above message while trying to test the accuracy of my model. The plan was to predict the last 15 time points and compare them with the actual data to get error values, but for some reason I got the "Variable Lengths Differ" error message.

This uses the Johnson & Johnson data (data(jj)) from the astsa package. Here is the code and the relevant errors:

> ## set up JJ data and time because it's quarterly data
> X.all<-jj[1:84]
> t<-time(jj)
> 
> values<-length(t)-15
> ts<-t[1:values]
> tsq<-ts^2/factorial(2)
> X<-X.all[1:values]
> year.first<-values+1
> year.last<-length(t)
> ##setting t for 15 values using quarterly idea
> new<-data.frame(ts=t[year.first:year.last])
> X.true<-X.all[(values+1):length(t)]
> fit1<-lm(X~ts+tsq)
> Xhat<-predict(fit1,new,se.fit=TRUE)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'tsq')
In addition: Warning message:
'newdata' had 15 rows but variables found have 69 rows 

> X.hat<-round(Xhat$fit,2)
> error<-X.true-X.hat

Answer

Thomas · Apr 3, 2015

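The issue is that you're calling predict with a newdata argument that does not contain all of the variables used in your model: new contains only ts, not tsq. Because tsq is missing from new, predict falls back to the tsq in your workspace, which has 69 values rather than 15, hence the "variable lengths differ" error. You can solve this by: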

  1. Creating a data.frame new that contains both ts and tsq, OR
  2. A better solution is to define the squared term using I() notation in your model specification, like: lm(X ~ ts + I(ts^2/factorial(2))). I() makes R evaluate the expression arithmetically inside the formula, so the term is recomputed from ts whenever predict is given newdata, and you don't have to manually create power terms, etc. just to include them in your lm specification.
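Option 1 might look something like the sketch below. It assumes that base R's built-in JohnsonJohnson series can stand in for astsa's jj (so the example runs without the package), and the train data frame is an illustrative name that is not in the original code. The key point is only that new carries a tsq column built the same way as in the training data.

```r
# Assumption: base R's JohnsonJohnson series stands in for astsa's jj
X.all <- as.numeric(JohnsonJohnson)
t <- as.numeric(time(JohnsonJohnson))
values <- length(t) - 15

# Training data (hypothetical name 'train'): all but the last 15 quarters
train <- data.frame(X = X.all[1:values], ts = t[1:values])
train$tsq <- train$ts^2 / factorial(2)
fit1 <- lm(X ~ ts + tsq, data = train)

# newdata must contain every predictor in the model, so build tsq here too
ts.new <- t[(values + 1):length(t)]
new <- data.frame(ts = ts.new, tsq = ts.new^2 / factorial(2))

Xhat <- predict(fit1, new, se.fit = TRUE)
length(Xhat$fit)  # one prediction per row of 'new'
```

The drawback is that the transformation has to be repeated by hand, and kept identical, every time you predict.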

As an example, you could try this out with the iris dataset. With I() in the formula, predict works even though new contains only Sepal.Width:

fit1 <- lm(Sepal.Length ~ Sepal.Width + I(Sepal.Width^2/factorial(2)), data = iris)
new <- data.frame(Sepal.Width = seq(1,5,by = 0.25))
predict(fit1, new)

We can compare this to your approach, where the squared term is created outside the model, and observe the error you're encountering:

s2 <- I(iris$Sepal.Width^2/factorial(2))
fit1 <- lm(Sepal.Length ~ Sepal.Width + s2, data = iris)
new <- data.frame(Sepal.Width = seq(1,5,by = 0.25))
predict(fit1, new)
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#   variable lengths differ (found for 's2')
# In addition: Warning message:
# 'newdata' had 17 rows but variables found have 150 rows
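
Applied to the question's setup, the I() approach could be sketched as follows. This again assumes base R's JohnsonJohnson series as a stand-in for astsa's jj, and train is an illustrative name rather than something from the original code. With the squared term inside the formula, new only needs a ts column.

```r
# Assumption: base R's JohnsonJohnson series stands in for astsa's jj
X.all <- as.numeric(JohnsonJohnson)
t <- as.numeric(time(JohnsonJohnson))
values <- length(t) - 15

# Fit on all but the last 15 quarters; the squared term lives in the formula
train <- data.frame(X = X.all[1:values], ts = t[1:values])
fit1 <- lm(X ~ ts + I(ts^2 / factorial(2)), data = train)

# newdata needs only ts; predict recomputes the I() term from it
new <- data.frame(ts = t[(values + 1):length(t)])
Xhat <- predict(fit1, new, se.fit = TRUE)

# Compare predictions with the held-out values
X.true <- X.all[(values + 1):length(t)]
X.hat <- round(Xhat$fit, 2)
error <- X.true - X.hat
```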