Difference between original xgboost (Learning API) and sklearn XGBClassifier (Scikit-Learn API)

ybdesire · Jun 21, 2016

I use the xgboost Scikit-Learn interface below to create and train model-1:

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
clf.fit(x_train, y_train, early_stopping_rounds=10, eval_metric="auc",
        eval_set=[(x_valid, y_valid)])

And an equivalent model-2 can be created with the original xgboost Learning API:

param = {}
param['objective'] = 'binary:logistic'
param['eval_metric'] = "auc"
num_rounds = 100
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgval = xgb.DMatrix(x_valid, label=y_valid)
watchlist = [(xgtrain, 'train'),(xgval, 'val')]
model = xgb.train(param, xgtrain, num_rounds, watchlist, early_stopping_rounds=10)

I think all the parameters are the same between model-1 and model-2, but the validation scores differ. Is there any difference between model-1 and model-2?

Answer

Du Phan · Aug 22, 2016

As I understand it, many default parameters differ between native xgb and its sklearn interface. For example: native xgb defaults to eta=0.3, while the sklearn wrapper defaults to learning_rate=0.1. You can see more about the default parameters of each implementation here:

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md

http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn