nlme error "Invalid formula for groups" although random effect specified

Michelle picture Michelle · Dec 29, 2011 · Viewed 16.2k times · Source

I have done some searching for this, but the mailing list posts I have found are associated with the person not specifying a random effect in nlme whereas I have done this. I also own the book Mixed Effect Models in S and S-Plus by Pinheiro and Bates, but can't work out my problem from the book.

I'm still working on my nutrient data analysis, and have now shifted onto real data. The data come from a population survey, and feature a repeated measures design as each respondent has two 24-hour intake recalls for the nutrient.

I have successfully fit a lme4 model to my data, and now I am trying to find out what happens if I use a nonlinear method instead. A snapshot of my data is below:

head(Male.Data)
   RespondentID Age SampleWeight  IntakeDay IntakeAmt AgeFactor BoxCoxXY
2        100020  12    0.4952835 Day1Intake 12145.852     9to13 15.61196
7        100419  14    0.3632839 Day1Intake  9591.953    14to18 15.01444
8        100459  11    0.4952835 Day1Intake  7838.713     9to13 14.51458
12       101138  15    1.3258785 Day1Intake 11113.266    14to18 15.38541
14       101214   6    2.1198688 Day1Intake  7150.133      4to8 14.29022
18       101389   5    2.1198688 Day1Intake  5091.528      4to8 13.47928

And the summary information about the data is:

str(Male.Data)
'data.frame':   4498 obs. of  7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age         : int  12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num  0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay   : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt   : num  12146 9592 7839 11113 7150 ...
$ AgeFactor   : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY    : num  15.6 15 14.5 15.4 14.3 ...

Using the lme4 package, I have successfully fit a linear mixed effects model using (the random effect is from the subjects and IntakeDay is the repeated measure factor associated with BoxCoxXY, which is a transform of IntakeAmt):

Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
        data = Male.Data, 
        weights = SampleWeight)

I have been trying to use the nlme package to look at fitting a nonlinear model to compare the two, but I cannot get my syntax to work. My initial problem was that there does not seem to be a relevant SelfStart model for my data, so I used geeglm to generate starting values (coefficients saved to a data frame called Male.nlme.start). But now I just get the error:

Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),  : 
Invalid formula for groups

I can't work out what I am doing wrong, the nlme syntax I am using is:

Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data, 
fixed = AgeFactor + IntakeDay ~ 1, 
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))

I have tried the analysis both with and without RespondentID being included in the overall model specification, and this seems to have no impact.

The reason I am trying to persevere with the nonlinear method is that the original analysis in SAS used a nonlinear approach. While my residuals etc look acceptably good from the lme analysis, I am curious to see what impact a nonlinear approach would have.

In case it is helpful, the traceback() results from the last analysis attempt, which includes RespondentID is:

5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), 
   sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, 
   fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~ 
       1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor + 
   IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[, 
   "Estimate"])) 

Can anyone suggest where I have gone wrong? I'm starting to wonder if either (1) there are too many factor levels for RespondentID to work in nlme or (2) the method will only work if I supply a start parameter for RespondentID, which seems nonsensical with the data I have as this is my subject identifier.

Update: to answer Ben, the SAS nlmixed model is a general log likelihood function for the fixed effects:

ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)

where:

Scale = dispersion value from geeglm and

Lambda.Value = lambda value associated with the maximum log likelihood output from an earlier boxcox() which was used to transform IntakeAmt to BoxCoxXY through the formula Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value

The random statement in the SAS code is:

random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID

so there are two error terms in the model and they are both being fit as random effects. The second square bracket represents the lower triangle of the random-effects variance matrix listed in row order, and is specified using SAS macro variables in the SAS syntax.

The summary of the model that I have been given is the normal one-line overview that shows matrix of covariates (BX) plus an error component, so it's not a lot of help here.

Second update: I realised that I had not removed the RespondentID levels associated with the female subjects as I factorised RespondentID over the entire data frame before I did the split into separate data frames, by gender, for analysis. I have repeated the nlme analysis after removing unused factor levels for RespondentID and I get the same error. The lmer results are the same - which is good to know. :)

Answer