Controlling the threshold in Logistic Regression in Scikit Learn

London guy picture London guy · Feb 25, 2015 · Viewed 44.8k times · Source

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even turned the class_weight feature to auto.

I know that in Logistic Regression it should be possible to know what is the threshold value for a particular pair of classes.

Is it possible to know what the threshold value is in each of the One-vs-All classes the LogisticRegression() method designs?

I did not find anything in the documentation page.

Does it by default apply the 0.5 value as threshold for all the classes regardless of the parameter values?

Answer

jazib jamil picture jazib jamil · May 15, 2018

There is a little trick that I use, instead of using model.predict(test_data) use model.predict_proba(test_data). Then use a range of values for thresholds to analyze the effects on the prediction;

pred_proba_df = pd.DataFrame(model.predict_proba(x_test))
threshold_list = [0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,.7,.75,.8,.85,.9,.95,.99]
for i in threshold_list:
    print ('\n******** For i = {} ******'.format(i))
    Y_test_pred = pred_proba_df.applymap(lambda x: 1 if x>i else 0)
    test_accuracy = metrics.accuracy_score(Y_test.as_matrix().reshape(Y_test.as_matrix().size,1),
                                           Y_test_pred.iloc[:,1].as_matrix().reshape(Y_test_pred.iloc[:,1].as_matrix().size,1))
    print('Our testing accuracy is {}'.format(test_accuracy))

    print(confusion_matrix(Y_test.as_matrix().reshape(Y_test.as_matrix().size,1),
                           Y_test_pred.iloc[:,1].as_matrix().reshape(Y_test_pred.iloc[:,1].as_matrix().size,1)))

Best!