Sklearn - How to predict probability for all target labels

Bert Carremans · Jul 15, 2016

I have a data set with a target variable that can have 7 different labels. Each sample in my training set has only one label for the target variable.

For each sample, I want to calculate the probability for each of the target labels. So my prediction would consist of 7 probabilities for each row.

On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.

I tried the following code, but this only gives me one classification per sample.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)  # returns a single predicted label per sample

Does anyone have some advice on this? Thanks!

Answer

Abhinav Arora · Jul 16, 2016

You can do that by removing the OneVsRestClassifier and calling the predict_proba method of the DecisionTreeClassifier directly, since a decision tree handles multiclass targets natively. You can do the following:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)  # one row per sample, one column per class

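This returns an array with one row per sample and one column per class, so each row holds a probability for each of your 7 possible labels. The columns follow the order of clf.classes_.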
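To make the shape of that output concrete, here is a minimal sketch on synthetic data. The dataset from make_classification is only a stand-in for your own X_train / X_test, and max_depth=5 is an illustrative choice, not something your problem requires.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 7-class data standing in for the real training set,
# with exactly one label per sample.
X, y = make_classification(
    n_samples=700,
    n_features=20,
    n_informative=10,
    n_classes=7,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: max_depth=5 is illustrative. A fully grown tree tends to end in
# pure leaves, so its predicted probabilities are mostly exactly 0 or 1.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)

# One row per test sample, one column per class.
proba = clf.predict_proba(X_test)

# Column j of proba corresponds to the label clf.classes_[j].
most_likely = clf.classes_[np.argmax(proba, axis=1)]

print(proba.shape)
print(clf.classes_)
print(proba[0])
```

Each row of proba sums to 1, and picking the column with the highest probability gives the same label that clf.predict would return. If you need smoother probabilities than a single tree gives, an ensemble such as RandomForestClassifier exposes the same predict_proba method.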

Hope that helps!