XGBoost predictions output not binary?

orak · Jun 27, 2016 · Viewed 8.2k times

I'm trying to run a very simple example where XGBoost takes some data and does a binary classification. The documentation says that XGBoost outputs probabilities when the "binary:logistic" objective is used:

import numpy as np
import xgboost as xgb

data = np.random.rand(7, 10)
label = np.random.randint(2, size=7)
# print(data)
# print(label)

dtrain = xgb.DMatrix(data, label=label)
# Pass the parameters as a plain dict; "verbosity": 0 replaces the removed "silent" flag
param = {'max_depth': 2, 'eta': 1, 'verbosity': 0, 'objective': 'binary:logistic'}

bst = xgb.train(param, dtrain)

dtest = xgb.DMatrix(np.random.rand(4, 10))
ypred = bst.predict(dtest)

print(ypred)

The output is:

[ 0.31350434  0.31350434  0.31350434  0.31350434]

So what does this output mean? Does it mean that I have a 31% chance of getting 1?

How do I translate it to 0 or 1?

This question seems related, but I cannot get anything useful out of it.

Answer

Mentosovitz · Jun 18, 2018

To convert the probabilities into an outcome or class (0 or 1), you can use a threshold, as other answers suggest (it doesn't necessarily have to be 0.5). The real problem is finding a good decision boundary; you can see a good high-level explanation here.
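With `objective='binary:logistic'`, each value returned by `bst.predict` is the model's estimated probability that the label is 1 (the summed tree scores passed through the logistic function), so 0.3135 does mean roughly a 31% chance of class 1. Below is a minimal sketch of the thresholding idea, written in plain Python so it does not depend on XGBoost itself. `to_labels`, `accuracy` and `best_threshold` are hypothetical helper names, the validation numbers are made up, and maximizing accuracy is just one assumed criterion for choosing the boundary.

```python
# Sketch only: convert probabilities from a binary:logistic model into 0/1 labels,
# and pick a cut-off on held-out data. Helper names and data are illustrative.

def to_labels(probs, threshold=0.5):
    """Map each probability P(label == 1) to 1 if it reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_threshold(probs, y_true, candidates=None):
    """Return the candidate cut-off with the highest accuracy on held-out data.

    Accuracy is only one possible criterion; F1, a precision/recall trade-off,
    or explicit misclassification costs are common alternatives.
    Ties go to the smallest threshold, because max() keeps the first maximum.
    """
    if candidates is None:
        # Each observed probability is a meaningful place to put the boundary.
        candidates = sorted(set(probs))
    return max(candidates, key=lambda t: accuracy(y_true, to_labels(probs, t)))


# The four identical predictions from the question all fall below 0.5.
ypred = [0.31350434, 0.31350434, 0.31350434, 0.31350434]
print(to_labels(ypred))  # [0, 0, 0, 0]

# Made-up validation probabilities and true labels for choosing a threshold.
val_probs = [0.10, 0.35, 0.40, 0.62, 0.80, 0.30]
val_labels = [0, 1, 0, 1, 1, 0]
t = best_threshold(val_probs, val_labels)
print(t, accuracy(val_labels, to_labels(val_probs, t)))
```

If `ypred` is the NumPy array that `bst.predict` returns, the same 0.5 cut-off is simply `(ypred >= 0.5).astype(int)`. Choose the threshold on data the model was not trained on; otherwise the "best" boundary tends to be optimistic.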