I encountered a problem in R when trying to run a simple linear model with a categorical variable as predictor. When running the model, R throws the error
`Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : `
The data, however, seem to be okay (data set attached below):
str(minimal)
'data.frame': 330 obs. of 2 variables:
$ swls : num 5.2 NaN 7 6 NaN NaN NaN NaN NaN NaN ...
$ exp.factor: Factor w/ 2 levels "erlebt","nicht erlebt": 1 1 1 1 2 2 2 2 NA 2 ...
There also seems to be enough variation in the data, so the similar threads I found do not apply here:
table(minimal$exp.factor)
erlebt nicht erlebt
148 163
However, lm() still refuses to work:
lm(swls ~ exp.factor, data = minimal)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
The really strange thing is that lm() works as expected with other factors (e.g., sex).
Any idea what is going wrong here?
The Dataset:
structure(list(swls = c(5.2, NaN, 7, 6, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
6.8, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 7, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, 5.8, 2.6, NaN, NaN, NaN, NaN, NaN,
NaN, 5.4, NaN, 6.4, NaN, NaN, NaN, NaN, NaN, NaN, 6.8, NaN, NaN,
NaN, NaN, NaN, NaN, 1.2, NaN, 6.2, 6.4, 5.2, NaN, 5.4, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 4.6, NaN, NaN,
NaN, NaN, 5.8, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5, NaN,
NaN, NaN, 6.8, NaN, NaN, NaN, 6, 7, NaN, NaN, NaN, NaN, 6, NaN,
6.4, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 6.8, NaN, NaN,
6.8, NaN, NaN, NaN, NaN, NaN, 5, 5.6, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, 3, NaN, NaN, NaN, NaN, NaN, NaN, 1.2, 4.2, NaN, 5.4,
NaN, NaN, NaN, NaN, NaN, NaN, 6.6, 5.8, NaN, NaN, NaN, 6.4, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2.8, NaN, 4, NaN, NaN, NaN,
6, 5, NaN, NaN, NaN, 4.4, NaN, 2, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2.2, NaN, NaN,
NaN, 7, NaN, NaN, NaN, 5.6, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, 6.2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
5, NaN, NaN, 5.2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, 5.8, NaN, NaN, 3.6, 5.6, NaN, NaN, 2.8, NaN, NaN, NaN, 6.2,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5.8, 6.2, NaN, NaN, 5, 6.2,
NaN, NaN, NaN, NaN, NaN, NaN, 4.8, NaN, NaN, NaN, NaN, 4.8, NaN,
NaN, NaN, NaN, NaN, NaN, 4.4, NaN, NaN, 3, 5.2, NaN, 3.8, NaN,
NaN, NaN, NaN, 3, NaN, NaN, NaN, NaN, 1.6, NaN, NaN, 6.6, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN), exp.factor = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, NA, 2L, 2L, NA, 2L, 2L, NA, 2L, 1L,
2L, 2L, 2L, NA, 2L, 1L, NA, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, NA, NA, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, NA, 1L, 1L, 1L, 2L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, NA, 1L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 2L, 1L, 2L, NA, 2L, 1L, NA, 1L, 1L, 1L, 2L, 2L, NA, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, NA, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
1L, NA, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 2L, 2L, 1L,
2L, NA, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L,
NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("erlebt", "nicht erlebt"
), class = "factor")), .Names = c("swls", "exp.factor"), row.names = c(7L,
9L, 19L, 36L, 50L, 63L, 67L, 75L, 84L, 85L, 94L, 100L, 109L,
122L, 128L, 135L, 137L, 145L, 156L, 158L, 163L, 182L, 188L, 198L,
204L, 213L, 221L, 240L, 251L, 254L, 258L, 261L, 271L, 284L, 286L,
295L, 308L, 309L, 313L, 319L, 334L, 340L, 344L, 351L, 354L, 365L,
372L, 382L, 385L, 391L, 398L, 404L, 427L, 431L, 435L, 438L, 441L,
452L, 468L, 469L, 474L, 483L, 486L, 493L, 502L, 508L, 513L, 519L,
524L, 528L, 537L, 543L, 546L, 557L, 572L, 578L, 591L, 606L, 611L,
613L, 624L, 633L, 640L, 642L, 651L, 662L, 667L, 672L, 673L, 696L,
703L, 709L, 718L, 722L, 732L, 735L, 747L, 749L, 753L, 770L, 780L,
787L, 799L, 801L, 812L, 818L, 825L, 838L, 864L, 874L, 887L, 896L,
897L, 906L, 909L, 920L, 923L, 929L, 944L, 959L, 964L, 973L, 978L,
986L, 991L, 996L, 1001L, 1008L, 1014L, 1017L, 1033L, 1040L, 1046L,
1067L, 1075L, 1085L, 1090L, 1100L, 1102L, 1113L, 1144L, 1145L,
1150L, 1155L, 1158L, 1165L, 1175L, 1180L, 1189L, 1196L, 1198L,
1203L, 1212L, 1217L, 1230L, 1235L, 1257L, 1264L, 1266L, 1279L,
1285L, 1290L, 1299L, 1308L, 1320L, 1331L, 1338L, 1345L, 1350L,
1366L, 1376L, 1381L, 1400L, 1403L, 1406L, 1409L, 1419L, 1424L,
1456L, 1462L, 1467L, 1469L, 1490L, 1499L, 1501L, 1509L, 1515L,
1518L, 1524L, 1531L, 1533L, 1538L, 1560L, 1571L, 1573L, 1578L,
1587L, 1600L, 1602L, 1624L, 1626L, 1631L, 1637L, 1646L, 1656L,
1661L, 1667L, 1677L, 1683L, 1692L, 1694L, 1699L, 1705L, 1712L,
1714L, 1726L, 1739L, 1741L, 1750L, 1763L, 1768L, 1780L, 1795L,
1811L, 1816L, 1821L, 1830L, 1864L, 1869L, 1883L, 1887L, 1891L,
1904L, 1914L, 1917L, 1928L, 1934L, 1939L, 1941L, 1948L, 1950L,
1961L, 1969L, 1975L, 1982L, 1992L, 1998L, 2004L, 2019L, 2025L,
2040L, 2041L, 2046L, 2051L, 2068L, 2083L, 2085L, 2090L, 2098L,
2103L, 2109L, 2116L, 2124L, 2131L, 2133L, 2138L, 2144L, 2154L,
2160L, 2161L, 2183L, 2188L, 2190L, 2203L, 2218L, 2221L, 2229L,
2234L, 2243L, 2252L, 2262L, 2265L, 2275L, 2280L, 2282L, 2286L,
2289L, 2299L, 2308L, 2309L, 2319L, 2332L, 2335L, 2350L, 2353L,
2360L, 2363L, 2366L, 2369L, 2376L, 2401L, 2406L, 2415L, 2426L,
2429L, 2436L, 2447L, 2453L, 2459L, 2476L, 2478L, 2486L, 2492L,
2499L, 2501L, 2511L, 2517L, 2522L, 2528L, 2541L, 2547L, 2552L,
2554L, 2557L, 2566L, 2580L, 2587L, 2594L, 2603L, 2608L), class = "data.frame")
And my sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreign_0.8-61 lavaan_0.5-15 plyr_1.8.1 xtable_1.7-3 Hmisc_3.14-4 Formula_1.1-1 survival_2.37-7 lattice_0.20-29 Lambda4_3.0 MBESS_3.3.3
loaded via a namespace (and not attached):
[1] cluster_1.15.2 latticeExtra_0.6-26 MASS_7.3-32 mnormt_1.4-7 pbivnorm_0.5-1 quadprog_1.5-5 RColorBrewer_1.0-5 Rcpp_0.11.1
[9] stats4_3.1.0 tools_3.1.0
The problem is that once the NA values are omitted from the data set, there aren't any "nicht erlebt" observations left:
summary(na.omit(minimal))
swls exp.factor
Min. :1.200 erlebt :64
1st Qu.:4.400 nicht erlebt: 0
Median :5.500
Mean :5.119
3rd Qu.:6.200
Max. :7.000
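So lm() is going to have trouble fitting a model to a factor with only one (remaining) level: by default it drops the rows with missing values, and then the unused factor levels, before it sets up the contrasts.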
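The same failure can be reproduced on a much smaller data frame. The sketch below uses made-up toy data (the names `toy`, `y`, and `g` are illustrative, not taken from the question) and assumes the default `options("na.action")` of `na.omit`:

```r
# Toy data (hypothetical): every non-missing response falls in level "a",
# so level "b" vanishes once the incomplete rows are dropped.
toy <- data.frame(
  y = c(1.5, 2.0, 3.5, NaN, NaN, NaN),
  g = factor(c("a", "a", "a", "b", "b", NA))
)

# Mimic what lm() sees: drop incomplete rows, then drop unused levels.
used <- droplevels(na.omit(toy))
nlevels(toy$g)   # levels declared on the full factor
nlevels(used$g)  # levels that survive listwise deletion

# The fit fails for the same reason as in the question;
# tryCatch() captures the error instead of stopping the script.
fit <- tryCatch(lm(y ~ g, data = toy), error = function(e) e)
inherits(fit, "error")
conditionMessage(fit)
```

Comparing `nlevels()` before and after `na.omit()` makes the lost level visible without having to read the full summary.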
You can also deduce this by looking at the cross-tabulation of exp.factor and is.na() of the response:
with(minimal,table(exp.factor,is.na(swls)))
exp.factor FALSE TRUE
erlebt 64 84
nicht erlebt 0 163
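The check above can be generalised to any formula by asking model.frame() which rows and levels would actually enter the fit. This is only a sketch: `levels_used` is a hypothetical helper name, and the toy data are made up for illustration.

```r
# Hypothetical helper: for each factor in the model frame, count the levels
# that remain after rows with missing values are removed.
levels_used <- function(formula, data) {
  mf <- model.frame(formula, data = data,
                    na.action = na.omit,
                    drop.unused.levels = TRUE)
  is_fac <- vapply(mf, is.factor, logical(1))
  vapply(mf[is_fac], nlevels, integer(1))
}

# Toy data (hypothetical): all observed responses belong to level "a".
toy <- data.frame(
  y = c(1.5, 2.0, 3.5, NaN, NaN, NaN),
  g = factor(c("a", "a", "a", "b", "b", NA))
)

res <- levels_used(y ~ g, toy)
res  # any factor reported with fewer than 2 levels will make lm() fail
```

With the data from the question, the call would be `levels_used(swls ~ exp.factor, minimal)`; a count below 2 for exp.factor points to the same problem as the cross-tabulation.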