Plotting predicted values from a lme model (with polynomials) in R

jjulip picture jjulip · Jan 31, 2015 · Viewed 7.1k times · Source

I am using linear mixed-effect model (run with the lme() function in the nlme package in R) that has one fixed effect, and one random intercept term (to account for different groups). The model is a cubic polynomial model specified as so (following advice given below):

   M1 = lme(dv ~ poly(iv,3), data=dat, random= ~1|group, method="REML")

Some example data only:

> dput(dat)
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"2"), class = "factor"), iv = c(24L, 100L, 110L, 115L, 116L, 
120L, 125L, 127L, 138L, 139L, 142L, 150L, 152L, 154L, 157L, 161L, 
168L, 177L, 181L, 189L, 190L, 198L, 200L, 213L, 216L, 225L, 254L, 
284L, 40L, 51L, 76L, 130L, 155L, 158L, 160L, 163L, 167L, 169L, 
170L, 177L, 185L, 190L, 203L, 206L, 208L, 219L, 223L, 233L, 238L, 
244L, 251L, 260L, 265L), dv = c(0L, 8L, 6L, 8L, 10L, 10L, 9L, 
11L, 12L, 15L, 16L, 19L, 13L, 10L, 17L, 22L, 18L, 22L, 25L, 20L, 
27L, 28L, 29L, 30L, 29L, 30L, 30L, 30L, 0L, 0L, 2L, 7L, 14L, 
12L, 17L, 10L, 14L, 13L, 16L, 15L, 17L, 21L, 25L, 20L, 26L, 27L, 
28L, 29L, 30L, 30L, 30L, 30L, 30L)), .Names = c("group", "iv", 
"dv"), row.names = c(NA, -53L), class = "data.frame")

I would now like to plot fitted values using the predict function (the values of iv are not continuous in the dataset so I would like to improve the appearance /smoothness of a fitted curve).

Using on-line examples on how to plot predicted values from a simple lme model (without polynomials) (see here: Extract prediction band from lme fit and http://glmm.wikidot.com/faq), I can plot predicted ‘population’ means for an lme without polynomials using the below code:

#model without polynomials
dat$group = factor(dat$group)
M2 = lme(dv ~ iv, data=dat, random= ~1|group, method="REML")

#1.create new data frame with new values for predictors (where groups aren't accounted for)
range(dat$iv)
new.dat = data.frame(iv = seq(from =24, to =284, by=1))

#2. predict the mean population response
new.dat$pred = predict(M2, newdata=new.dat, level=0)

#3. create a design matrix
Designmat <- model.matrix(eval(eval(M2$call$fixed)[-2]), new.dat[-ncol(new.dat)])

#4. get standard error and CI for predictions
predvar <- diag(Designmat %*% M2$varFix %*% t(Designmat)) 
new.dat$SE <- sqrt(predvar) 
new.dat$SE2 <- sqrt(predvar+M2$sigma^2)

# Create plot with different colours for grouping levels and plot predicted values for population mean
G1 = dat[dat$group==1, ]
G2 = dat[dat$group==2, ]

plot(G1$iv, G1$dv, xlab="iv", ylab="dv", ylim=c(0,30), xlim=c(0,350), pch=16, col=2)
points(G2$iv, G2$dv, xlab="", ylab="", ylim=c(0,30), xlim=c(0,350), pch=16, col=3)

F0 = new.dat$pred
I = order(new.dat$iv); eff = sort(new.dat$iv)
lines(eff, F0[I], lwd=2, type="l", ylab="", xlab="", col=1, xlim=c(0,30))
#lines(eff, F0[I] + 2 * new.dat$SE[I], lty = 2)
#lines(eff, F0[I] - 2 * new.dat$SE[I], lty = 2)

I would like expand this code to 1) plot the within-group predicted lines as well as the mean population values and 2) determine how the code can be adapted to plot predicted ‘population’ and ‘within-group’ curves for an lme with polynomials (i.e. model M1 above).

Obtaining group predictions: I can obtain one set of predicted values for groups using the below code, but I would like to plot a line for each group, as well as the population mean, and in the case of the example data I cannot see how predicted values for two group lines could be extracted?

new.dat = data.frame(iv = dat$iv, group=rep(c("1","2"),c(28,25)))
Pred = predict(M2, newdata=new.dat, level=0:1)

Also, this does not work if you want to predict a larger number of values than the number of original iv values (e.g. in cases where you have irregular data). The below obviously won’t work because of a differing number of rows, but I am struggling with the syntax.

new.dat = data.frame(iv = seq(from =24, to =284, by=1), group=rep(c("1","2"),c(28,25)))

For a polynomial model: I don't understand how one would incorporate poly(iv,3) into a new.dat data frame to feed into the predict function.

Any advice of how to achieve these two goals would be much appreciated as I have been trying to figure this out for a while with no joy (I would rather use base graphics than ggplot if possible). Thanks!

Answer

IRTFM picture IRTFM · Jan 31, 2015

Let me explain in more detail why I think your are jumping into non-linear terms too quickly and should be spending more time examining your data before considering polynomial terms:

First the more correct way to enter polynomial terms of 2nd and third order:

> M1 = lme(dv ~ poly(iv ,3), data=dat, random= ~1|group, method="REML")
> summary(M1)
Linear mixed-effects model fit by REML
 Data: dat 
       AIC      BIC    logLik
  245.4883 256.8393 -116.7442

Random effects:
 Formula: ~1 | group
        (Intercept) Residual
StdDev:    2.465855 2.435135

Fixed effects: dv ~ poly(iv, 3) 
                 Value Std.Error DF   t-value p-value
(Intercept)   18.14854  1.775524 48 10.221507  0.0000
poly(iv, 3)1  64.86375  2.476145 48 26.195452  0.0000
poly(iv, 3)2   2.76606  2.462331 48  1.123349  0.2669
poly(iv, 3)3 -13.90253  2.485106 48 -5.594339  0.0000
 Correlation: 
             (Intr) p(,3)1 p(,3)2
poly(iv, 3)1 -0.002              
poly(iv, 3)2 -0.002  0.027       
poly(iv, 3)3  0.002 -0.036 -0.030

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.6349301 -0.6172897  0.1653097  0.7076490  1.6581112 

Number of Observations: 53
Number of Groups: 2 

Now, why would the cubic term be significant when the quadratic term was not? Look at the data ... which should have been the first order of business rather than an after-thought:

library(lattice)
xyplot( dv ~ iv|group, dat)
png(); print(xyplot( dv ~ iv|group, dat) ); dev.off()

enter image description here

As becomes apparent with a simple plotting call their is a systematic cutoff at 30 (and possibly at 0 although the data is a bit sparse down there). So you would be attributing a ceiling effect imposed by constraints your measurement methods to some sort of non-linear term.