I am using the LogisticRegression()
method in scikit-learn
on a highly unbalanced data set. I have even turned the class_weight
feature to auto
.
I know that in Logistic Regression it should be possible to know what is the threshold value for a particular pair of classes.
Is it possible to know what the threshold value is in each of the One-vs-All classes the LogisticRegression()
method designs?
I did not find anything in the documentation page.
Does it by default apply the 0.5
value as threshold for all the classes regardless of the parameter values?
There is a little trick that I use, instead of using model.predict(test_data)
use model.predict_proba(test_data)
. Then use a range of values for thresholds to analyze the effects on the prediction;
pred_proba_df = pd.DataFrame(model.predict_proba(x_test))
threshold_list = [0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.55,0.6,0.65,.7,.75,.8,.85,.9,.95,.99]
for i in threshold_list:
print ('\n******** For i = {} ******'.format(i))
Y_test_pred = pred_proba_df.applymap(lambda x: 1 if x>i else 0)
test_accuracy = metrics.accuracy_score(Y_test.as_matrix().reshape(Y_test.as_matrix().size,1),
Y_test_pred.iloc[:,1].as_matrix().reshape(Y_test_pred.iloc[:,1].as_matrix().size,1))
print('Our testing accuracy is {}'.format(test_accuracy))
print(confusion_matrix(Y_test.as_matrix().reshape(Y_test.as_matrix().size,1),
Y_test_pred.iloc[:,1].as_matrix().reshape(Y_test_pred.iloc[:,1].as_matrix().size,1)))
Best!