I have done some searching for this, but the mailing list posts I have found are associated with the person not specifying a random effect in nlme
whereas I have done this. I also own the book Mixed Effect Models in S and S-Plus by Pinheiro and Bates, but can't work out my problem from the book.
I'm still working on my nutrient data analysis, and have now shifted onto real data. The data come from a population survey, and feature a repeated measures design as each respondent has two 24-hour intake recalls for the nutrient.
I have successfully fit a lme4 model to my data, and now I am trying to find out what happens if I use a nonlinear method instead. A snapshot of my data is below:
head(Male.Data)
RespondentID Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY
2 100020 12 0.4952835 Day1Intake 12145.852 9to13 15.61196
7 100419 14 0.3632839 Day1Intake 9591.953 14to18 15.01444
8 100459 11 0.4952835 Day1Intake 7838.713 9to13 14.51458
12 101138 15 1.3258785 Day1Intake 11113.266 14to18 15.38541
14 101214 6 2.1198688 Day1Intake 7150.133 4to8 14.29022
18 101389 5 2.1198688 Day1Intake 5091.528 4to8 13.47928
And the summary information about the data is:
str(Male.Data)
'data.frame': 4498 obs. of 7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
Using the lme4
package, I have successfully fit a linear mixed effects model using (the random effect is from the subjects and IntakeDay
is the repeated measure factor associated with BoxCoxXY
, which is a transform of IntakeAmt
):
Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
I have been trying to use the nlme
package to look at fitting a nonlinear model to compare the two, but I cannot get my syntax to work. My initial problem was that there does not seem to be a relevant SelfStart model for my data, so I used geeglm
to generate starting values (coefficients saved to a data frame called Male.nlme.start
). But now I just get the error:
Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), :
Invalid formula for groups
I can't work out what I am doing wrong, the nlme
syntax I am using is:
Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1,
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))
I have tried the analysis both with and without RespondentID
being included in the overall model specification, and this seems to have no impact.
The reason I am trying to persevere with the nonlinear method is that the original analysis in SAS used a nonlinear approach. While my residuals etc look acceptably good from the lme analysis, I am curious to see what impact a nonlinear approach would have.
In case it is helpful, the traceback()
results from the last analysis attempt, which includes RespondentID
is:
5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~
1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor +
IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[,
"Estimate"]))
Can anyone suggest where I have gone wrong? I'm starting to wonder if either (1) there are too many factor levels for RespondentID
to work in nlme
or (2) the method will only work if I supply a start parameter for RespondentID
, which seems nonsensical with the data I have as this is my subject identifier.
Update: to answer Ben, the SAS nlmixed
model is a general log likelihood function for the fixed effects:
ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)
where:
Scale
= dispersion value from geeglm
and
Lambda.Value
= lambda value associated with the maximum log likelihood output from an earlier boxcox()
which was used to transform IntakeAmt
to BoxCoxXY
through the formula Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value
The random
statement in the SAS code is:
random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID
so there are two error terms in the model and they are both being fit as random effects. The second square bracket represents the lower triangle of the random-effects variance matrix listed in row order, and is specified using SAS macro variables in the SAS syntax.
The summary of the model that I have been given is the normal one-line overview that shows matrix of covariates (BX) plus an error component, so it's not a lot of help here.
Second update: I realised that I had not removed the RespondentID levels associated with the female subjects as I factorised RespondentID over the entire data frame before I did the split into separate data frames, by gender, for analysis. I have repeated the nlme
analysis after removing unused factor levels for RespondentID and I get the same error. The lmer
results are the same - which is good to know. :)