I'm trying to run a very simple example where XGBoost takes some data and does a binary classification. The documentation says that XGBoost outputs probabilities when "binary:logistic" is used.
import numpy as np
import xgboost as xgb
data = np.random.rand(7,10)
label = np.random.randint(2,size=7)
#print data
#print label
dtrain = xgb.DMatrix(data, label=label)
param = {'bst:max_depth':2, 'bst:eta':1, 'silent':1, 'objective':'binary:logistic' }
plst = list(param.items())
bst = xgb.train(plst, dtrain)
dtest = xgb.DMatrix(np.random.rand(4, 10))
ypred = bst.predict(dtest)
print(ypred)
The output is:
[ 0.31350434 0.31350434 0.31350434 0.31350434]
So what does this output mean? Does it mean that each test row has a 31% chance of belonging to class 1?
How do I translate it to 0 or 1?
This question seems related, but I cannot get anything useful out of it.
To convert the probabilities into an outcome or a class (0 or 1), you can use a threshold, as suggested above (it doesn't necessarily have to be 0.5). The real problem is choosing the decision boundary; you can see a good high-level explanation here.
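As a rough illustration of the thresholding step, here is a minimal sketch. It assumes, as the documentation quoted in the question says, that each value returned by predict is the predicted probability of class 1. The ypred array is copied from the output in the question, and the 0.5 cut-off is only a placeholder that you would tune for your own problem.

```python
import numpy as np

# Predicted probabilities of class 1, copied from the output in the question.
ypred = np.array([0.31350434, 0.31350434, 0.31350434, 0.31350434])

# Hypothetical cut-off: 0.5 is a common default, not a required value.
threshold = 0.5

# Label a row 1 when its probability exceeds the threshold, otherwise 0.
labels = (ypred > threshold).astype(int)
print(labels)  # [0 0 0 0]
```

With a lower cut-off such as 0.3, the same four rows would all be labelled 1, which is why the choice of threshold matters more than the conversion itself.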