So here's my problem. I have a data set in R that I need to run a mixed-effects model on. Here's the code:
data <- read.csv("D:/blahblah.csv")
analysis.data <- lmer(intdiff ~ stress_limit * word_position * follows + (1|speaker), data)
summary(analysis.data)
When I try to run the script, it returns the following error:
Error in mer_finalize(ans) : Downdated X'X is not positive definite, 15.
I have tracked the error down to the "follows" variable, because when I just use stress_limit and word_position the model runs fine. If it helps, the data in "follows" are only 3 strings: n or l, consonant, vowel. I've tried replacing the spaces with underscores, but with no success. Is there something about the internal workings of the lmer() function that is preventing the use of "follows" in this case? Any help would be great!
For more info: intdiff contains numeric values, stress_limit contains strings ("Stressed" or "Unstressed"), and word_position also contains strings ("Word Medial" or "Word Initial").
EDIT: Here is a data sample that reproduces the error:
structure(list(
  intdiff = c(11.45007951, 12.40144758, 13.47898367, 6.279497762, 18.19461897, 16.15539707),
  word_position = structure(c(2L, 2L, 2L, 1L, 1L, 1L),
    .Label = c("Word Initial", "Word Medial"), class = "factor"),
  follows = structure(c(4L, 4L, 4L, 1L, 2L, 4L),
    .Label = c("Consonant", "n or l", "Pause", "Vowel"), class = "factor"),
  stress_limit = structure(c(2L, 1L, 1L, 2L, 2L, 2L),
    .Label = c("Stressed", "Unstressed"), class = "factor"),
  speaker = structure(c(2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("f11r", "f13r", "f15a", "f16a", "m09a", "m10a", "m12r", "m14r"), class = "factor")
), .Names = c("intdiff", "word_position", "follows", "stress_limit", "speaker"),
row.names = c(NA, 6L), class = "data.frame")
I tried the lme() function as well, but it returned this error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
The code in my original post is the exact code I'm using, minus the library(lme4) call, so I'm not leaving out any information that I can think of.
My R version is 2.15.2
Hard to tell for sure without a reproducible example: see How to make a great R reproducible example?
But, guessing: these sorts of problems are generally due to collinearity in the design matrix. Centering your continuous predictors may help (though here intdiff is the response, and all of your predictors are factors). You can also explore the design matrix directly:
X <- model.matrix(~ stress_limit * word_position * follows, data)
and check collinearity between pairs of columns with cor(X). Unfortunately I don't have a suggestion off the top of my head for detecting multi-collinearity (i.e. not between pairs, but among combinations of more than 2 predictors), although you can look into the tools for computing variance inflation factors (e.g. library("sos"); findFn("VIF")).
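For instance, here's a minimal sketch of a rank check on that design matrix (assuming X has been built as above from your full data set; cor() will warn about constant columns if some factor levels never occur):
qr(X)$rank               # if this is less than ncol(X), some columns are linearly dependent
ncol(X)
round(cor(X[, -1]), 2)   # pairwise correlations among the non-intercept columns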
As a cross-check, lme should also be able to handle your model:
library(nlme)
lme(intdiff ~ stress_limit * word_position * follows,
random=~1|speaker, data=data)
When I run your test data in the development version of lme4 (available on GitHub), I get: Error in lmer(intdiff ~ stress_limit * word_position * follows + (1 | : rank of X = 5 < ncol(X) = 12. On the other hand, with an input data set this small (6 observations), there's no possible way you could fit 12 parameters, so it's a little harder to tell exactly where your problem is. Do all 12 combinations of your 3 variables actually occur in your full data? If some are missing, then you need to follow the advice given in the development version's help:
Unlike some simpler modeling frameworks such as ‘lm’ and ‘glm’ which automatically detect perfectly collinear predictor variables, ‘[gn]lmer’ cannot handle design matrices of less than full rank. For example, in cases of models with interactions that have unobserved combinations of levels, it is up to the user to define a new variable (for example creating ‘ab’ within the data from the results of ‘droplevels(interaction(a,b))’).
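A quick way to check for missing combinations (a sketch, assuming data is your full data frame) is to tabulate the three variables; any zero cells are combinations of levels that never occur together:
with(data, table(stress_limit, word_position, follows))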
In particular, you can fit this model as follows:
data <- transform(data,
                  allcomb = interaction(stress_limit, word_position, follows, drop=TRUE))
lme(intdiff ~ allcomb, random=~1|speaker, data=data)
This will give you a one-way ANOVA treating the unique combinations of levels that are actually present in the data as the categories. You'll have to figure out for yourself what they mean.
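To see exactly which combinations those are (assuming the transform() call above has been run):
levels(data$allcomb)
table(data$allcomb)   # number of observations in each observed combination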
The alternative is to reduce the number of interactions in the model until you get to a set that doesn't have any missing combinations; if you're lucky, (stress_limit+word_position+follows)^2 (all two-way interactions) will work, but you might have to reduce the model still further (e.g. stress_limit + word_position*follows).
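For example, a sketch of those reduced fits with lmer (keep the largest model that runs without complaint):
lmer(intdiff ~ (stress_limit + word_position + follows)^2 + (1|speaker), data)   # all two-way interactions
lmer(intdiff ~ stress_limit + word_position*follows + (1|speaker), data)         # a further reduction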
Another way to test this is to use lm() on your proposed models and check that there are no NA values in the estimated coefficients.
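For example (a minimal sketch; fit_check is just an illustrative name):
fit_check <- lm(intdiff ~ stress_limit * word_position * follows, data)
any(is.na(coef(fit_check)))                      # TRUE means the fixed-effect design is rank deficient
names(coef(fit_check))[is.na(coef(fit_check))]   # which interaction terms could not be estimated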
The main thing you will be losing in these ways is convenience/ease of interpretation, because the parameters for the missing combinations couldn't have been estimated from the data anyway ...