Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) with an obviously correct dataset

r lm
user3588777 · Apr 30, 2014

I encountered a problem in R when trying to fit a simple linear model with a categorical variable as a predictor. When I run the model, R throws the error

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :

The data, however, seem to be okay (data set attached below):

str(minimal)
'data.frame':   330 obs. of  2 variables:
 $ swls      : num  5.2 NaN 7 6 NaN NaN NaN NaN NaN NaN ...
 $ exp.factor: Factor w/ 2 levels "erlebt","nicht erlebt": 1 1 1 1 2 2 2 2 NA 2 ...

There also seems to be enough variation in the data, so the similar threads I found do not apply here:

table(minimal$exp.factor)

      erlebt nicht erlebt 
         148          163 

However, lm() still refuses to work:

lm(swls ~ exp.factor, data = minimal)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

What is really strange is that lm() works as expected with other factors (e.g., sex).

Any ideas what is going wrong here?

The Dataset:

structure(list(swls = c(5.2, NaN, 7, 6, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
6.8, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 7, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, 5.8, 2.6, NaN, NaN, NaN, NaN, NaN, 
NaN, 5.4, NaN, 6.4, NaN, NaN, NaN, NaN, NaN, NaN, 6.8, NaN, NaN, 
NaN, NaN, NaN, NaN, 1.2, NaN, 6.2, 6.4, 5.2, NaN, 5.4, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 4.6, NaN, NaN, 
NaN, NaN, 5.8, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5, NaN, 
NaN, NaN, 6.8, NaN, NaN, NaN, 6, 7, NaN, NaN, NaN, NaN, 6, NaN, 
6.4, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 6.8, NaN, NaN, 
6.8, NaN, NaN, NaN, NaN, NaN, 5, 5.6, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, 3, NaN, NaN, NaN, NaN, NaN, NaN, 1.2, 4.2, NaN, 5.4, 
NaN, NaN, NaN, NaN, NaN, NaN, 6.6, 5.8, NaN, NaN, NaN, 6.4, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2.8, NaN, 4, NaN, NaN, NaN, 
6, 5, NaN, NaN, NaN, 4.4, NaN, 2, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 2.2, NaN, NaN, 
NaN, 7, NaN, NaN, NaN, 5.6, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, 6.2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
5, NaN, NaN, 5.2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, 5.8, NaN, NaN, 3.6, 5.6, NaN, NaN, 2.8, NaN, NaN, NaN, 6.2, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5.8, 6.2, NaN, NaN, 5, 6.2, 
NaN, NaN, NaN, NaN, NaN, NaN, 4.8, NaN, NaN, NaN, NaN, 4.8, NaN, 
NaN, NaN, NaN, NaN, NaN, 4.4, NaN, NaN, 3, 5.2, NaN, 3.8, NaN, 
NaN, NaN, NaN, 3, NaN, NaN, NaN, NaN, 1.6, NaN, NaN, 6.6, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN), exp.factor = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, NA, 2L, 2L, NA, 2L, 2L, NA, 2L, 1L, 
2L, 2L, 2L, NA, 2L, 1L, NA, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L, 1L, NA, NA, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 2L, 1L, 
1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, NA, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, NA, 1L, 2L, 
2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 
1L, 2L, 1L, 2L, NA, 2L, 1L, NA, 1L, 1L, 1L, 2L, 2L, NA, 1L, 2L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, NA, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
1L, NA, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, NA, 2L, 2L, 1L, 
2L, NA, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 
NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("erlebt", "nicht erlebt"
), class = "factor")), .Names = c("swls", "exp.factor"), row.names = c(7L, 
9L, 19L, 36L, 50L, 63L, 67L, 75L, 84L, 85L, 94L, 100L, 109L, 
122L, 128L, 135L, 137L, 145L, 156L, 158L, 163L, 182L, 188L, 198L, 
204L, 213L, 221L, 240L, 251L, 254L, 258L, 261L, 271L, 284L, 286L, 
295L, 308L, 309L, 313L, 319L, 334L, 340L, 344L, 351L, 354L, 365L, 
372L, 382L, 385L, 391L, 398L, 404L, 427L, 431L, 435L, 438L, 441L, 
452L, 468L, 469L, 474L, 483L, 486L, 493L, 502L, 508L, 513L, 519L, 
524L, 528L, 537L, 543L, 546L, 557L, 572L, 578L, 591L, 606L, 611L, 
613L, 624L, 633L, 640L, 642L, 651L, 662L, 667L, 672L, 673L, 696L, 
703L, 709L, 718L, 722L, 732L, 735L, 747L, 749L, 753L, 770L, 780L, 
787L, 799L, 801L, 812L, 818L, 825L, 838L, 864L, 874L, 887L, 896L, 
897L, 906L, 909L, 920L, 923L, 929L, 944L, 959L, 964L, 973L, 978L, 
986L, 991L, 996L, 1001L, 1008L, 1014L, 1017L, 1033L, 1040L, 1046L, 
1067L, 1075L, 1085L, 1090L, 1100L, 1102L, 1113L, 1144L, 1145L, 
1150L, 1155L, 1158L, 1165L, 1175L, 1180L, 1189L, 1196L, 1198L, 
1203L, 1212L, 1217L, 1230L, 1235L, 1257L, 1264L, 1266L, 1279L, 
1285L, 1290L, 1299L, 1308L, 1320L, 1331L, 1338L, 1345L, 1350L, 
1366L, 1376L, 1381L, 1400L, 1403L, 1406L, 1409L, 1419L, 1424L, 
1456L, 1462L, 1467L, 1469L, 1490L, 1499L, 1501L, 1509L, 1515L, 
1518L, 1524L, 1531L, 1533L, 1538L, 1560L, 1571L, 1573L, 1578L, 
1587L, 1600L, 1602L, 1624L, 1626L, 1631L, 1637L, 1646L, 1656L, 
1661L, 1667L, 1677L, 1683L, 1692L, 1694L, 1699L, 1705L, 1712L, 
1714L, 1726L, 1739L, 1741L, 1750L, 1763L, 1768L, 1780L, 1795L, 
1811L, 1816L, 1821L, 1830L, 1864L, 1869L, 1883L, 1887L, 1891L, 
1904L, 1914L, 1917L, 1928L, 1934L, 1939L, 1941L, 1948L, 1950L, 
1961L, 1969L, 1975L, 1982L, 1992L, 1998L, 2004L, 2019L, 2025L, 
2040L, 2041L, 2046L, 2051L, 2068L, 2083L, 2085L, 2090L, 2098L, 
2103L, 2109L, 2116L, 2124L, 2131L, 2133L, 2138L, 2144L, 2154L, 
2160L, 2161L, 2183L, 2188L, 2190L, 2203L, 2218L, 2221L, 2229L, 
2234L, 2243L, 2252L, 2262L, 2265L, 2275L, 2280L, 2282L, 2286L, 
2289L, 2299L, 2308L, 2309L, 2319L, 2332L, 2335L, 2350L, 2353L, 
2360L, 2363L, 2366L, 2369L, 2376L, 2401L, 2406L, 2415L, 2426L, 
2429L, 2436L, 2447L, 2453L, 2459L, 2476L, 2478L, 2486L, 2492L, 
2499L, 2501L, 2511L, 2517L, 2522L, 2528L, 2541L, 2547L, 2552L, 
2554L, 2557L, 2566L, 2580L, 2587L, 2594L, 2603L, 2608L), class = "data.frame")

And my sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] foreign_0.8-61  lavaan_0.5-15   plyr_1.8.1      xtable_1.7-3    Hmisc_3.14-4    Formula_1.1-1   survival_2.37-7 lattice_0.20-29 Lambda4_3.0     MBESS_3.3.3    

loaded via a namespace (and not attached):
 [1] cluster_1.15.2      latticeExtra_0.6-26 MASS_7.3-32         mnormt_1.4-7        pbivnorm_0.5-1      quadprog_1.5-5      RColorBrewer_1.0-5  Rcpp_0.11.1        
 [9] stats4_3.1.0        tools_3.1.0        

Answer

Ben Bolker · Apr 30, 2014

The problem is that once the NA values are omitted from the data set, there aren't any "nicht erlebt" observations left:

summary(na.omit(minimal))
      swls              exp.factor
 Min.   :1.200   erlebt      :64  
 1st Qu.:4.400   nicht erlebt: 0  
 Median :5.500                    
 Mean   :5.119                    
 3rd Qu.:6.200                    
 Max.   :7.000     

So lm() cannot fit the model: by default it drops the rows with missing values, which leaves exp.factor with only one (remaining) level, and contrasts cannot be built for a single-level factor.
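
The mechanism can be reproduced on a much smaller scale. The following is a minimal sketch using a made-up toy data frame (the names toy, y and g are illustrative and not part of the question), assuming the default na.action of na.omit is in effect:

```r
# Toy data (hypothetical): every "b" row has a missing response
toy <- data.frame(
  y = c(1.5, 2.0, 2.5, NaN, NaN, NaN),
  g = factor(c("a", "a", "a", "b", "b", "b"))
)

# Both levels are present in the raw data ...
table(toy$g)

# ... but only "a" survives once the incomplete rows are dropped
used <- droplevels(na.omit(toy))
nlevels(used$g)

# lm() drops the same rows before building contrasts, so it fails
fit <- try(lm(y ~ g, data = toy), silent = TRUE)
inherits(fit, "try-error")
```

Note that table() on the raw factor looks perfectly healthy here as well, which is why checking the factor alone does not reveal the problem.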

You can also deduce this by looking at the cross-tabulation of exp.factor and is.na() of the response ...

with(minimal,table(exp.factor,is.na(swls)))

exp.factor     FALSE TRUE
  erlebt          64   84
  nicht erlebt     0  163
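
The same check can be wrapped in a small helper for models with several factors. This is only a sketch: levels_used() is a hypothetical function written for illustration, not part of base R, and it assumes the default na.action (na.omit) that lm() would also use.

```r
# Hypothetical helper: count the factor levels that survive NA removal
levels_used <- function(formula, data) {
  # model.frame() applies the default na.action, as lm() does internally;
  # drop.unused.levels = TRUE discards levels with no remaining rows
  mf <- model.frame(formula, data = data, drop.unused.levels = TRUE)
  is_fac <- vapply(mf, is.factor, logical(1))
  vapply(mf[is_fac], nlevels, integer(1))
}

# Toy data (made up for illustration): all "b" rows lack a response
toy <- data.frame(
  y = c(1.5, 2.0, 2.5, NaN, NaN, NaN),
  g = factor(c("a", "a", "a", "b", "b", "b"))
)

res <- levels_used(y ~ g, toy)
res
```

Any factor reported with fewer than two levels will trigger the contrasts error. Called as levels_used(swls ~ exp.factor, minimal), it should report a single level for exp.factor, consistent with the summary above.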