I wrote this code and wanted to obtain probabilities of classification.
from sklearn import svm
X = [[0, 0], [10, 10],[20,30],[30,30],[40, 30], [80,60], [80,50]]
y = [0, 1, 2, 3, 4, 5, 6]
clf = svm.SVC(probability=True)
clf.fit(X, y)
prob = clf.predict_proba([[10, 10]])
print(prob)
I obtained this output:
[[0.15376986 0.07691205 0.15388546 0.15389275 0.15386348 0.15383004 0.15384636]]
This is very strange, because I expected the probabilities to be
[0 1 0 0 0 0 0]
(Note that the sample being predicted is identical to the 2nd training sample.) Moreover, the probability obtained for that class is the lowest of all.
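You should disable probability and use decision_function instead, because there is no guarantee that predict_proba and predict return the same result. The probabilities come from Platt scaling, which is fitted with an internal cross-validation and gives meaningless results on very small datasets such as this one (one sample per class).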
You can read more about it in the documentation.
import numpy as np

clf.predict([[10, 10]])  # returns 1 as expected
prop = clf.decision_function([[10, 10]])
# returns [[ 4.91666667  6.5  3.91666667  2.91666667  1.91666667  0.91666667 -0.08333333]]
prediction = np.argmax(prop)  # returns 1
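For completeness, here is a minimal self-contained sketch of the same idea, using the data from the question. It assumes a recent scikit-learn where decision_function_shape="ovr" and break_ties are available; the variable names scores and predicted are only illustrative. With break_ties=True, predict is documented to resolve ties using the decision_function confidences, so the argmax of the scores should agree with it.

```python
import numpy as np
from sklearn import svm

# Training data from the question: one sample per class.
X = [[0, 0], [10, 10], [20, 30], [30, 30], [40, 30], [80, 60], [80, 50]]
y = [0, 1, 2, 3, 4, 5, 6]

# probability is left at its default (False); no Platt scaling is fitted.
# Assumption: a scikit-learn version that supports break_ties (0.22 or later).
clf = svm.SVC(decision_function_shape="ovr", break_ties=True)
clf.fit(X, y)

# One score per class for the query point; higher means more confident.
scores = clf.decision_function([[10, 10]])

# Map the index of the best score back to a class label via classes_.
predicted = clf.classes_[np.argmax(scores, axis=1)]

print(scores)
print(predicted, clf.predict([[10, 10]]))
```

Going through clf.classes_ rather than using the argmax index directly matters when the labels are not simply 0..n-1, as they happen to be here.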