I am trying to use Python to compute a multiple linear regression and the multiple correlation between a response array and a set of predictor arrays. I have seen the very simple example for computing a multiple linear regression, which is easy. But how do I compute the multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible.
Edit [clarification]: Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would also like to compute the multiple correlation coefficients for the predictors, in addition to the regression coefficients and the other regression parameters.
You could certainly do this with statsmodels and pandas. Something like this might get you started:
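import pandas
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: one row per subject, BMI is the response.
data = pandas.DataFrame([["A", 4, 0, 1, 27],
                         ["B", 7, 1, 1, 29],
                         ["C", 6, 1, 0, 23],
                         ["D", 2, 0, 0, 20],
                         ["etc.", 3, 0, 1, 21]],
                        columns=["ID", "score", "male", "age20", "BMI"])

# Pairwise correlations; drop the string ID column first,
# since recent pandas versions raise on non-numeric columns.
print(data.drop(columns="ID").corr())

# Multiple linear regression of BMI on the three predictors.
model = ols("BMI ~ score + male + age20", data=data).fit()
print(model.params)
print(model.summary())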
Have a look at the documentation:
http://statsmodels.sourceforge.net/devel/
Edit: I'm not familiar with the term "multiple correlation coefficient", but I believe it is just the square root of the R-squared of a multiple regression model, no?
print(model.rsquared ** 0.5)
print(model.rsquared_adj ** 0.5)
Is this what you're after?
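If you want to sanity-check that interpretation without statsmodels, here is a minimal sketch using only numpy (my choice of library for the check, not something from the question). It assumes the model includes an intercept, in which case the multiple correlation coefficient should equal both the square root of R-squared and the plain Pearson correlation between the observed response and the fitted values. The variable names (rows, fitted, multiple_r) are just illustrative.

```python
import numpy as np

# Same toy data as above (columns: score, male, age20, BMI).
rows = np.array([
    [4, 0, 1, 27],
    [7, 1, 1, 29],
    [6, 1, 0, 23],
    [2, 0, 0, 20],
    [3, 0, 1, 21],
], dtype=float)

# Design matrix: an intercept column plus the three predictors.
X = np.column_stack([np.ones(len(rows)), rows[:, :3]])
y = rows[:, 3]

# Ordinary least squares fit and fitted values.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# R-squared from the residual and total sums of squares.
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Multiple correlation: Pearson correlation between observed and fitted y.
multiple_r = np.corrcoef(y, fitted)[0, 1]

# With an intercept in the model, the two values should agree.
print(multiple_r, np.sqrt(r_squared))
```

With only five rows and four fitted parameters the value will be close to 1 almost regardless of the data, so treat this purely as a check of the identity, not as a meaningful estimate.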