Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include all of my independent variables in the model:
# R code for fitting linear model
result = lm(y ~ ., data=DF)
I can't figure out how to do this with statsmodels using patsy formulas without explicitly adding all of my independent variables to the formula. Does patsy have an equivalent to R's .? I haven't had any luck finding it in the patsy documentation.
I haven't found an equivalent of . in the patsy documentation either. But what patsy lacks in conciseness, Python makes up for with strong string manipulation. So you can build a formula involving all the variable columns in DF using:
# Index.difference drops "y"; recent pandas no longer supports DF.columns - ["y"]
all_columns = "+".join(DF.columns.difference(["y"]))
This gives x1+x2+x3 in your case. Finally, you can create a string formula with y as the response and pass it to any fitting procedure that accepts a formula:
import statsmodels.formula.api as smf
my_formula = "y ~ " + all_columns
result = smf.ols(formula=my_formula, data=DF).fit()
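If you need this in more than one place, the string building can be wrapped in a small helper. This is only a sketch: build_formula is a hypothetical name, not part of patsy or statsmodels, and it assumes every column name is a valid Python identifier (names with spaces or operators would need extra quoting). Unlike Index.difference, it keeps the columns in their original order.

```python
def build_formula(columns, response):
    """Return 'response ~ c1 + c2 + ...' using every column except `response`.

    `columns` can be any iterable of column names, e.g. DF.columns.
    Hypothetical helper for illustration; not a patsy or statsmodels API.
    """
    # Keep the original column order and drop only the response variable.
    predictors = [str(c) for c in columns if c != response]
    if not predictors:
        raise ValueError("no predictor columns left after removing the response")
    return response + " ~ " + " + ".join(predictors)


# Column names taken from the question; with a real dataframe, pass DF.columns.
formula = build_formula(["y", "x1", "x2", "x3"], "y")
print(formula)  # y ~ x1 + x2 + x3
```

The resulting string can then go wherever a patsy formula is accepted, for example smf.ols(formula=build_formula(DF.columns, "y"), data=DF).fit().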