Linear model (lm) when dependent variable is a factor/categorical variable?

Tim_Utrecht · Mar 5, 2014

I want to do linear regression with the lm function. My dependent variable is a factor called AccountStatus:

1: 0 days in arrears, 2: 30-60 days in arrears, 3: 60-90 days in arrears, and 4: 90+ days in arrears (4 levels).

As independent variables I have several numeric variables: loan to value, debt to income, and interest rate.

Is it possible to do a linear regression with these variables? I looked on the internet and found something about dummy variables, but those examples were all for independent variables.

This did not work:

fit <- lm(factor(AccountStatus) ~ OriginalLoanToValue, data=mydata)
summary(fit)

Answer

Maxim.K · Mar 5, 2014

Linear regression cannot take a categorical dependent variable; the dependent variable has to be continuous. Since your AccountStatus variable has only four levels, it is not feasible to treat it as continuous. Before starting any statistical analysis, one should be aware of the measurement levels of one's variables.

What you can do is use multinomial logistic regression; see here for instance. Alternatively, you can recode AccountStatus as dichotomous and use simple logistic regression.
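As a rough sketch of both options, something like the following might work. Everything here is an assumption for illustration: the data are simulated (only the column names AccountStatus and OriginalLoanToValue come from the question; DebtToIncome, InterestRate and InArrears are made-up names), multinom() from the nnet package is just one of several ways to fit a multinomial model in R, and the "any arrears vs. none" split is an arbitrary choice of cut-off.

```r
# Synthetic data for illustration only; the values are simulated and
# carry no real-world meaning.
library(nnet)  # provides multinom(); nnet ships with standard R installs

set.seed(1)
n <- 400
mydata <- data.frame(
  OriginalLoanToValue = runif(n, 40, 120),
  DebtToIncome        = runif(n, 1, 8),   # hypothetical column name
  InterestRate        = runif(n, 2, 7)    # hypothetical column name
)

# Draw a status in 1..4; a higher loan-to-value shifts weight away from
# level 1 (no arrears) toward the arrears levels.
ltv <- (mydata$OriginalLoanToValue - 40) / 80  # rescaled to [0, 1]
status <- sapply(ltv, function(p) {
  sample(1:4, 1, prob = c(4 - 3 * p, 1 + p, 1 + p, 1 + p))
})
mydata$AccountStatus <- factor(status, levels = 1:4)

# Option 1: multinomial logistic regression on all four levels.
# Level "1" is the reference; one row of coefficients per other level.
fit_multi <- multinom(
  AccountStatus ~ OriginalLoanToValue + DebtToIncome + InterestRate,
  data = mydata, trace = FALSE
)
print(summary(fit_multi))
probs <- predict(fit_multi, type = "probs")  # per-level probabilities
print(head(probs))

# Option 2: recode to a dichotomous outcome (assumed cut-off: any
# arrears vs. none) and fit an ordinary binary logistic regression.
mydata$InArrears <- as.integer(mydata$AccountStatus != "1")
fit_bin <- glm(
  InArrears ~ OriginalLoanToValue + DebtToIncome + InterestRate,
  data = mydata, family = binomial
)
print(summary(fit_bin))
```

In the multinomial fit, each coefficient row describes the log-odds of one arrears level relative to level 1, so there are three sets of coefficients rather than one. The dichotomous version is easier to interpret but throws away the distinction between the arrears levels.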

Sorry to disappoint you, but this is an inherent restriction of linear regression and has nothing to do with R. If you want to learn more about which statistical technique is appropriate for different combinations of measurement levels of dependent and independent variables, I can wholeheartedly recommend this book.