I am trying out multi-class classification with xgboost and I've built it using this code,
clf = xgb.XGBClassifier(max_depth=7, n_estimators=1000)
clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)
This gave me some good results. I've got log-loss below 0.7 for my case. But after looking through few pages I've found that we have to use another objective in XGBClassifier for multi-class problem. Here's what is recommended from those pages.
clf = xgb.XGBClassifier(max_depth=5, objective='multi:softprob', n_estimators=1000,
num_classes=9)
clf.fit(byte_train, y_train)
train1 = clf.predict_proba(train_data)
test1 = clf.predict_proba(test_data)
This code is also working but it's taking a lot of time to complete compared when to my first code.
Why is my first code also working for multi-class case? I have checked that it's default objective is binary:logistic used for binary classification but it worked really well for multi-class? Which one should I use if both are correct?
In fact, even if the default obj parameter of XGBClassifier
is binary:logistic
, it will internally judge the number of class of label y. When the class number is greater than 2, it will modify the obj parameter to multi:softmax
.
https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py
class XGBClassifier(XGBModel, XGBClassifierBase):
# pylint: disable=missing-docstring,invalid-name,too-many-instance-attributes
def __init__(self, objective="binary:logistic", **kwargs):
super().__init__(objective=objective, **kwargs)
def fit(self, X, y, sample_weight=None, base_margin=None,
eval_set=None, eval_metric=None,
early_stopping_rounds=None, verbose=True, xgb_model=None,
sample_weight_eval_set=None, callbacks=None):
# pylint: disable = attribute-defined-outside-init,arguments-differ
evals_result = {}
self.classes_ = np.unique(y)
self.n_classes_ = len(self.classes_)
xgb_options = self.get_xgb_params()
if callable(self.objective):
obj = _objective_decorator(self.objective)
# Use default value. Is it really not used ?
xgb_options["objective"] = "binary:logistic"
else:
obj = None
if self.n_classes_ > 2:
# Switch to using a multiclass objective in the underlying
# XGB instance
xgb_options['objective'] = 'multi:softprob'
xgb_options['num_class'] = self.n_classes_