Python sklearn - how to calculate p-values

user1096808 · Mar 10, 2014 · Viewed 43.7k times

This is probably a simple question, but I am trying to calculate p-values for my features, using either classifiers for a classification problem or regressors for a regression problem. Could someone suggest the best method for each case and provide sample code? I just want to see the p-value for each feature, rather than keeping the k best features or a percentile of features as explained in the documentation.

Thank you

Answer

LinNotFound · Apr 12, 2019

You can use statsmodels, which reports a p-value for each coefficient in its model summary:

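import statsmodels.api as sm

# Fit a logistic regression; sm.Logit does not add an intercept column automatically
logit_model = sm.Logit(y_train, X_train)
result = logit_model.fit()
# The summary table lists each coefficient's p-value in the P>|z| column
print(result.summary())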

The output looks something like this:

                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:               406723
Model:                          Logit   Df Residuals:                   406710
Method:                           MLE   Df Model:                           12
Date:                Fri, 12 Apr 2019   Pseudo R-squ.:                0.001661
Time:                        16:48:45   Log-Likelihood:            -2.8145e+05
converged:                      False   LL-Null:                   -2.8192e+05
                                        LLR p-value:                8.758e-193
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0037      0.003     -1.078      0.281      -0.010       0.003