Access standardized residuals, Cook's distances, hat values (leverage), etc. easily in Python?

Jaynes01 · Sep 19, 2017 · Viewed 7.1k times

I am looking for influence statistics after fitting a linear regression. In R I can obtain them, for example, like this:

hatvalues(fitted_model) #hatvalues (leverage)
cooks.distance(fitted_model) #Cook's D values
rstandard(fitted_model) #standardized residuals
rstudent(fitted_model) #studentized residuals

etc.

How can I obtain the same statistics when using statsmodels in Python after fitting a model like this:

#import statsmodels
import statsmodels.api as sm

#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()

#Create a DataFrame with the studentized residuals and outlier-test p-values
results.outlier_test()
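
For reference, here is a minimal self-contained sketch of the same call on made-up data (the toy X, Y and coefficients below are purely illustrative), which may be easier to experiment with:

import numpy as np
import statsmodels.api as sm

#made-up data purely for illustration
np.random.seed(0)
X = sm.add_constant(np.random.normal(size=(50, 2)))  #add an intercept column
Y = X @ np.array([1.0, 2.0, -1.0]) + np.random.normal(size=50)

model = sm.OLS(Y, X)
results = model.fit()

#outlier_test() returns a DataFrame with the studentized residuals,
#unadjusted p-values and Bonferroni-corrected p-values, one row per observation
print(results.outlier_test().head())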

Edit: See answer below...

Answer

Scott McAllister · Apr 19, 2019

Although the accepted answer is correct, I found it helpful to access the statistics separately, as attributes of the influence object returned by statsmodels.regression.linear_model.OLSResults.get_influence, after fitting my model. That saved me from having to index into the summary_frame, since I was only interested in one of the statistics rather than all of them. So maybe this helps somebody else:

import statsmodels.api as sm

#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()

#create instance of influence
influence = results.get_influence()

#leverage (hat values)
leverage = influence.hat_matrix_diag

#Cook's D values (and p-values) as tuple of arrays
cooks_d = influence.cooks_distance

#standardized residuals
standardized_residuals = influence.resid_studentized_internal

#studentized residuals
studentized_residuals = influence.resid_studentized_external
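
If you do want all of the measures in one table (the summary_frame mentioned above), the same influence object can also build a single DataFrame. A minimal sketch, continuing from the fit above; the exact column names (e.g. cooks_d, hat_diag) can vary between statsmodels versions, so check frame.columns:

#all influence measures in one DataFrame, one row per observation
frame = influence.summary_frame()

#individual columns can then be pulled out by name (names assumed here)
cooks = frame["cooks_d"]
leverage_col = frame["hat_diag"]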