R predict() function returning wrong/too many values

user3658874 picture user3658874 · Sep 30, 2014 · Viewed 29k times · Source

I am trying to convert Absorbance (Abs) values to Concentration (ng/mL), based on an established linear model & standard curve. I planned to do this by using the predict() function. I am having trouble getting predict() to return the desired results. Here is a sample of my code:

Standards<-data.frame(ng_mL=c(0,0.4,1,4),
                      Abs550nm=c(1.7535,1.5896,1.4285,0.9362))
LM.2<-lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])
Abs<-c(1.7812,1.7309,1.3537,1.6757,1.7409,1.7875,1.7533,1.8169,1.753,1.6721,1.7036,1.6707,
       0.3903,0.3362,0.2886,0.281,0.3596,0.4122,0.218,0.2331,1.3292,1.2734)


predict(object=LM.2,
        newdata=data.frame(Concentration=Abs[1]))#using Abs[1] as an example, but I eventually want predictions for all values in Abs

Running that last lines gives this output:

> predict(object=LM.2,
+         newdata=data.frame(Concentration=Abs[1]))
         1          2          3          4 
 0.5338437  0.4731341  0.3820697 -0.0732525 
Warning message:
'newdata' had 1 row but variables found have 4 rows 

This does not seem to be the output I want. I am trying to get a single predicted Concentration value for each Absorbance (Abs) entry. It would be nice to be able to predict all of the entries at once and add them to an existing data frame, but I can't even get it to give me a single value correctly. I've read many threads on here, webpages found on Google, and all of the help files, and for the life of me I cannot understand what is going on with this function. Any help would be appreciated, thanks.

Answer

Gavin Simpson picture Gavin Simpson · Sep 30, 2014

You must have a variable in newdata that has the same name as that used in the model formula used to fit the model initially.

You have two errors:

  1. You don't use a variable in newdata with the same name as the covariate used to fit the model, and
  2. You make the problem much more difficult to resolve because you abuse the formula interface.

Don't fit your model like this:

mod <- lm(log(Standards[['Abs550nm']])~Standards[['ng_mL']])

fit your model like this

mod <- lm(log(Abs550nm) ~ ng_mL, data = standards)

Isn't that some much more readable?

To predict you would need a data frame with a variable ng_mL:

predict(mod, newdata = data.frame(ng_mL = c(0.5, 1.2)))

Now you may have a third error. You appear to be trying to predict with new values of Absorbance, but the way you fitted the model, Absorbance is the response variable. You would need to supply new values for ng_mL.

The behaviour you are seeing is what happens when R can't find a correctly-named variable in newdata; it returns the fitted values from the model or the predictions at the observed data.

This makes me think you have the formula back to front. Did you mean:

mod2 <- lm(ng_mL ~ log(Abs550nm), data = standards)

?? In which case, you'd need

predict(mod2, newdata = data.frame(Abs550nm = c(1.7812,1.7309)))

say. Note you don't need to include the log() bit in the name. R recognises that as a function and applies to the variable Abs550nm for you.

If the model really is log(Abs550nm) ~ ng_mL and you want to find values of ng_mL for new values of Abs550nm you'll need to invert the fitted model in some way.