Error in R - no applicable method for 'predict' applied to an object of class "formula"

Moritz B · Dec 12, 2016

I'd like to ask for help with the predict function. I want to get a fitted line for my data, analogous to abline(). I have used this approach before for a different system.

mod1<-glm(data$Lengthmm ~ data$qbH.yr.med, family=quasipoisson,
    subset = data$Age==1)

xv <- seq(min(data$qbH.yr.med), max(data$qbH.yr.med), 
    length.out = length(data$Lengthmm))                #    poisson regression

yv <- predict(mod1 ~ data$qbH.yr.med, family=quasipoisson, list(x = xv))

Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "formula"

typeof(mod1)
# [1] "list"
typeof(xv)
# [1] "double"
class(mod1)
# [1] "glm" "lm" 
class(xv)
# [1] "numeric"

I have no idea why it complains about a "formula", as none of my objects are of this class. I would be happy about any help or ideas regarding this error.

Answer

Ben Bolker · Dec 12, 2016

As others have commented, it's hard to see how this could ever have worked in the past. There are a few points here:

  • it's best practice to supply a data argument and use only the names of the variables (i.e. Lengthmm, not data$Lengthmm), especially if you want predict() and other post-fitting machinery to work
  • for predict() you should supply the fitted model object itself (not a formula) as the first argument and, optionally, a newdata data frame whose column names match the variable names in the model formula; a term written as data$qbH.yr.med is always evaluated against the original data frame, so newdata cannot replace it
  • it's a good idea not to name your data frame data (this masks the built-in R function data(), although it doesn't usually cause trouble)
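The question asks why the error mentions "formula" at all. A minimal sketch of the cause, using a small made-up data frame (dd and its values are hypothetical, chosen only to reproduce the message): the ~ operator turns mod1 ~ ... into a formula object without evaluating mod1, and predict() is an S3 generic that dispatches on the class of its first argument. Base R has no predict.formula method, hence "no applicable method".

```r
# Hypothetical data, only to reproduce the error message.
dd <- data.frame(Lengthmm = c(3, 5, 4, 8, 9, 12),
                 qbH.yr.med = c(0, 1, 1, 2, 3, 4))
mod1 <- glm(Lengthmm ~ qbH.yr.med, family = quasipoisson, data = dd)

# '~' builds a formula object; mod1 is captured unevaluated inside it.
f <- mod1 ~ dd$qbH.yr.med
class(f)
# [1] "formula"

# predict() dispatches on the class of its first argument, and there is
# no predict.formula method, so the call fails just as in the question.
msg <- tryCatch(predict(f), error = function(e) conditionMessage(e))
print(msg)

# Passing the fitted model itself dispatches to predict.glm() instead.
class(mod1)
# [1] "glm" "lm"
p <- predict(mod1)
```

So the class checks in the question are correct for mod1, but the object actually handed to predict() was the formula built around it.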

Making up a reproducible example:

set.seed(101)
dd <- data.frame(Lengthmm=1:10,qbH.yr.med=rpois(10,1),
                 Age=rep(1,10))

Fitting:

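# refer to columns by name and pass the data frame via 'data'
mod1 <- glm(Lengthmm ~ qbH.yr.med, family=quasipoisson,
            data=dd,
            subset = (Age==1))
# newdata needs a column with the same name as the predictor in the formula
xv <- with(dd,
         data.frame(qbH.yr.med=seq(min(qbH.yr.med), max(qbH.yr.med),
                        length.out = length(Lengthmm))))
# pass the fitted model (not a formula); predictions are on the
# link (log) scale unless type="response" is given
yv <- predict(mod1, newdata=xv)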
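Since the goal in the question is a fitted line drawn over the data, as with abline(), here is a sketch that recreates the made-up dd above so it runs on its own. Two points worth flagging: predict() on a glm returns values on the link scale (log, for quasipoisson) unless type = "response" is given, and with a log link the fitted curve is not straight on the response scale, so lines() is used rather than abline(). The 100-point grid is an arbitrary choice to get a smooth curve.

```r
set.seed(101)
dd <- data.frame(Lengthmm = 1:10, qbH.yr.med = rpois(10, 1),
                 Age = rep(1, 10))
mod1 <- glm(Lengthmm ~ qbH.yr.med, family = quasipoisson,
            data = dd, subset = (Age == 1))

# Grid of predictor values; the column name must match the model formula.
xv <- with(dd, data.frame(qbH.yr.med = seq(min(qbH.yr.med), max(qbH.yr.med),
                                           length.out = 100)))

# Default predictions are on the log (link) scale ...
yv_link <- predict(mod1, newdata = xv)
# ... type = "response" back-transforms them to the scale of Lengthmm.
yv <- predict(mod1, newdata = xv, type = "response")

# Scatterplot of the data with the fitted curve on top.
plot(Lengthmm ~ qbH.yr.med, data = dd)
lines(xv$qbH.yr.med, yv)
```

Plotting yv_link instead would draw the curve on the wrong (log) scale relative to the points.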

By the way, it seems a bit fishy to use family=quasipoisson for a response called Lengthmm - I would generally think that lengths would be continuous, and hence more likely to be Normal or log-Normal (or some other transformation of Normal) rather than Poisson-distributed or distributed with a variance proportional to their mean (i.e. "quasi-Poisson" ...)
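To make this concrete, a sketch of two of the alternatives mentioned, fitted to the same made-up dd: a log-Normal model via lm() on log(Lengthmm), and a Normal (Gaussian) GLM with a log link, which keeps the log-scale mean structure but assumes constant variance. Which one suits the real data is an assumption to check against the data, not something this sketch settles.

```r
set.seed(101)
dd <- data.frame(Lengthmm = 1:10, qbH.yr.med = rpois(10, 1),
                 Age = rep(1, 10))

# Log-Normal: ordinary least squares on log(Lengthmm).
mod_lnorm <- lm(log(Lengthmm) ~ qbH.yr.med, data = dd, subset = (Age == 1))

# Normal errors with a log link: constant variance instead of
# variance proportional to the mean (as in quasi-Poisson).
mod_gauss_log <- glm(Lengthmm ~ qbH.yr.med, family = gaussian(link = "log"),
                     data = dd, subset = (Age == 1))

xv <- data.frame(qbH.yr.med = seq(min(dd$qbH.yr.med), max(dd$qbH.yr.med),
                                  length.out = 100))

# exp() of the log-scale lm prediction estimates the median length, not the mean.
yv_lnorm <- exp(predict(mod_lnorm, newdata = xv))
yv_gauss <- predict(mod_gauss_log, newdata = xv, type = "response")
```

Either set of predictions can be drawn over the data with lines(), as for the quasi-Poisson fit.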