When I try to define my linear model in R as follows:
lm1 <- lm(predictorvariable ~ x1+x2+x3, data=dataframe.df)
I get the following error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Is there any way to ignore this or fix it? Some of the variables are factors and some are not.
This error occurs when a predictor (a right-hand-side variable) is a factor or character variable that takes only one distinct value in the data being fitted.
Example: iris data in R
(model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
# Coefficients:
# (Intercept) Sepal.Width Speciesversicolor Speciesvirginica
# 2.2514 0.8036 1.4587 1.9468
Now, if your data consists of only one species:
(model1 <- lm(Sepal.Length ~ Sepal.Width + Species,
data=iris[iris$Species == "setosa", ]))
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
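The subset above still declares all three Species levels, but lm() drops unused levels when it builds the model frame, which leaves a one-level factor. A quick way to see this, as a small sketch (the name `setosa` is only an illustrative variable, not part of the original code):

```r
# Subset used in the failing call above
setosa <- iris[iris$Species == "setosa", ]

nlevels(setosa$Species)              # levels still declared on the factor
nlevels(droplevels(setosa$Species))  # levels that actually occur in the rows
table(setosa$Species)                # counts per declared level
```

The first call reports the declared levels, while the second reports only those present in the data, which is what matters for the contrasts.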
If the variable is numeric (Sepal.Width) but takes only a single value, say 3, then the model runs, but you get NA as the coefficient of that variable:
(model2 <-lm(Sepal.Length ~ Sepal.Width + Species,
data=iris[iris$Sepal.Width == 3, ]))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species,
# data = iris[iris$Sepal.Width == 3, ])
# Coefficients:
# (Intercept) Sepal.Width Speciesversicolor Speciesvirginica
# 4.700 NA 1.250 2.017
Solution: A predictor that takes only one value has no variation, so its effect cannot be estimated. You need to drop that variable, whether it is numeric, character, or factor.
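For the single-species example, dropping the variable could look like the following minimal sketch (`setosa` and `model3` are illustrative names, not from the original code):

```r
# Species is constant in this subset, so leave it out of the formula
setosa <- iris[iris$Species == "setosa", ]
model3 <- lm(Sepal.Length ~ Sepal.Width, data = setosa)
coef(model3)
```

The fit now runs without the contrasts error because every remaining predictor varies within the subset.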
Updated as per comments: Since the error only occurs with factor or character variables, you can focus on those and check whether each one has a single level (DROP) or more than one (NODROP).
To see whether each variable is a factor, use the following code:
(l <- sapply(iris, function(x) is.factor(x)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# FALSE FALSE FALSE FALSE TRUE
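Then you can get a data frame of the factor variables only (drop = FALSE keeps a data frame even when there is just one factor column):
m <- iris[, l, drop = FALSE]
Now find the number of levels that actually occur in each factor variable; if this is one, you need to drop that variable:
n <- sapply(m, function(x) nlevels(droplevels(x)))
ifelse(n == 1, "DROP", "NODROP")
Note: Unused levels are removed with droplevels() before counting, because a subset such as the setosa rows above still lists all three Species levels. Any factor left with a single level is a variable you have to drop.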
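The same check can be wrapped in a small helper that also covers character and numeric columns. This is only a sketch under stated assumptions: `single_valued()` is a hypothetical helper, not a base R function, and the response name is hard-coded for the iris example.

```r
# Hypothetical helper: names of columns with fewer than two distinct
# non-missing values (works for factor, character, and numeric columns).
single_valued <- function(df) {
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  names(df)[n_distinct < 2]
}

setosa   <- iris[iris$Species == "setosa", ]
response <- "Sepal.Length"

to_drop <- single_valued(setosa)                        # columns with no variation
keep    <- setdiff(names(setosa), c(response, to_drop))  # usable predictors

# Build the formula from the remaining predictors and fit
fit <- lm(reformulate(keep, response = response), data = setosa)
coef(fit)
```

Counting distinct observed values instead of declared levels means the helper catches a factor whose extra levels are simply unused in the subset.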