statsmodels linear regression - patsy formula to include all predictors in model

Greg · Mar 13, 2014 · Viewed 9.7k times

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include all of my independent variables in the model:

# R code for fitting linear model
result = lm(y ~ ., data=DF)

I can't figure out how to do this with statsmodels using patsy formulas without explicitly adding all of my independent variables to the formula. Does patsy have an equivalent to R's .? I haven't had any luck finding it in the patsy documentation.

Answer

Sudeep Juvekar · Mar 13, 2014

I haven't found an equivalent of . in the patsy documentation either. But what patsy lacks in conciseness, Python makes up for with strong string manipulation. So you can build a formula involving all the predictor columns in DF using

# set difference on the column Index (the remaining names come back sorted)
all_columns = "+".join(DF.columns.difference(["y"]))

This gives x1+x2+x3 in your case. Finally, build the full formula string with y on the left-hand side and pass it to any formula-based fitting procedure:

import statsmodels.formula.api as smf

my_formula = "y~" + all_columns
result = smf.ols(formula=my_formula, data=DF).fit()
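
If you do this often, the string building can be wrapped in a small helper. The following is only a sketch: build_formula is a made-up name (not part of patsy or statsmodels), and it assumes the column labels are plain strings without double quotes or backslashes. Names that are not valid Python identifiers are wrapped in patsy's Q("...") quoting so the formula still parses.

```python
import keyword


def build_formula(columns, response="y"):
    """Return a patsy-style formula regressing `response` on every other column.

    Hypothetical helper for illustration; not part of patsy or statsmodels.
    """
    predictors = [str(c) for c in columns if c != response]
    if not predictors:
        raise ValueError("no predictor columns left after removing the response")

    def quote(name):
        # Plain identifiers can be used as-is; anything else (spaces, dots,
        # Python keywords, ...) is wrapped in patsy's Q("...") quoting.
        if name.isidentifier() and not keyword.iskeyword(name):
            return name
        return 'Q("{}")'.format(name)

    return "{} ~ {}".format(response, " + ".join(quote(p) for p in predictors))


# With a DataFrame you would pass DF.columns instead of a literal list.
formula = build_formula(["y", "x1", "x2", "x3"])
print(formula)
```

The resulting string can then be passed as the formula argument to statsmodels.formula.api.ols together with data=DF. Unlike Index.difference, this keeps the predictors in their original column order.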