What to use to do multiple correlation?

Pa_ · Nov 19, 2012 · Viewed 12.8k times

I am trying to use Python to compute a multiple linear regression and the multiple correlation between a response array and a set of predictor arrays. I found a very simple example for computing a multiple linear regression, which is easy. But how do I compute multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible.

Edit (clarification): Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would also like to compute the multiple correlation coefficients for the predictors, in addition to the regression coefficients and the other regression parameters.

Answer

jseabold · Nov 19, 2012

You could certainly do this with statsmodels and pandas. Something like this might get you started:

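import pandas
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: one row per subject, BMI is the response
data = pandas.DataFrame([["A", 4, 0, 1, 27],
                         ["B", 7, 1, 1, 29],
                         ["C", 6, 1, 0, 23],
                         ["D", 2, 0, 0, 20],
                         ["etc.", 3, 0, 1, 21]],
                        columns=["ID", "score", "male", "age20", "BMI"])
# Pairwise correlations of the numeric columns (the string ID column is dropped first)
print(data.drop(columns="ID").corr())

# Multiple linear regression of BMI on the three predictors
model = ols("BMI ~ score + male + age20", data=data).fit()
print(model.params)
print(model.summary())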

Have a look at the documentation:

http://statsmodels.sourceforge.net/devel/

http://pandas.pydata.org/

Edit: I'm not familiar with the term "multiple correlation coefficient", but I believe it is just the square root of the R-squared of a multiple regression model, no?

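print(model.rsquared ** 0.5)
# Adjusted R-squared can be negative in small samples, so its square root may not be a real number
print(model.rsquared_adj ** 0.5)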

Is this what you're after?
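If it helps to see where that number comes from, here is a minimal sketch using only NumPy, under the assumption that "multiple correlation coefficient" means R, the square root of the regression R-squared. It computes R two ways on the toy data above: from an ordinary least squares fit, and from the correlation matrix via R^2 = c' Rxx^-1 c, where c holds the correlations of each predictor with the response and Rxx is the correlation matrix of the predictors. The function names `multiple_correlation` and `multiple_correlation_from_corr` are made up for this illustration and are not part of statsmodels or pandas.

```python
import numpy as np

# Toy data copied from the example above (ID column left out).
score = np.array([4, 7, 6, 2, 3], dtype=float)
male = np.array([0, 1, 1, 0, 0], dtype=float)
age20 = np.array([1, 1, 0, 0, 1], dtype=float)
bmi = np.array([27, 29, 23, 20, 21], dtype=float)


def multiple_correlation(y, predictors):
    """Hypothetical helper: R as sqrt(R-squared) of an OLS fit with intercept."""
    # Design matrix: a column of ones for the intercept, then the predictors.
    A = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return np.sqrt(1.0 - ss_res / ss_tot)


def multiple_correlation_from_corr(y, predictors):
    """Hypothetical helper: R from the correlation matrix, R^2 = c' Rxx^-1 c."""
    corr = np.corrcoef(np.column_stack([y] + list(predictors)), rowvar=False)
    c = corr[0, 1:]     # correlations between the response and each predictor
    rxx = corr[1:, 1:]  # correlations among the predictors
    return np.sqrt(c @ np.linalg.solve(rxx, c))


r_fit = multiple_correlation(bmi, [score, male, age20])
r_corr = multiple_correlation_from_corr(bmi, [score, male, age20])
print(r_fit, r_corr)  # the two values should agree up to rounding error
```

With a single predictor both functions reduce to the absolute value of the ordinary Pearson correlation, which is a quick sanity check. If you meant something different by multiple correlation (for example, one coefficient per predictor), this sketch does not cover it.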