XGBoost: sample_weight vs scale_pos_weight

mamafoku · Jan 3, 2018 · Viewed 8.9k times

I have a highly unbalanced dataset and am wondering where to account for the weights. I am trying to understand the difference between the scale_pos_weight argument of XGBClassifier and the sample_weight parameter of the fit method. I would appreciate an intuitive explanation of the difference between the two, whether they can be used simultaneously, and how to choose between them.

The documentation indicates that scale_pos_weight:

Control the balance of positive and negative weights ... A typical value to consider: sum(negative cases) / sum(positive cases)
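
For instance, I understand that recommended value can be computed directly from the training labels; a minimal sketch, assuming y_train holds 0/1 labels:

import numpy as np

# Documented recommendation: number of negative cases / number of positive cases.
# With roughly 14x more negatives than positives this comes out near 14,
# the value used in the first example below.
n_neg = np.sum(y_train == 0)
n_pos = np.sum(y_train == 1)
spw = n_neg / n_pos  # pass this as scale_pos_weight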

Example:

from xgboost import XGBClassifier
import xgboost as xgb
LR = 0.1
NumTrees = 1000
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, scale_pos_weight=14, learning_rate=LR,
                         n_estimators=NumTrees, max_depth=5,
                         objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train)

OR

from xgboost import XGBClassifier
import xgboost as xgb
LR = 0.1
NumTrees = 1000
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, learning_rate=LR, n_estimators=NumTrees,
                         max_depth=5, objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train,sample_weight=weights_train)

Answer

Milad Shahidi · Sep 5, 2018

The sample_weight parameter allows you to specify a different weight for each training example. The scale_pos_weight parameter lets you provide a weight for an entire class of examples ("positive" class).

These correspond to two different approaches to cost-sensitive learning. If you believe that the cost of misclassifying positive examples (missing a cancer patient) is the same for all positive examples, but higher than the cost of misclassifying negative ones (e.g. telling someone they have cancer when they actually don't), then you can specify a single weight for all positive examples via scale_pos_weight.

XGBoost treats labels = 1 as the "positive" class. This is evident from the following piece of code:

if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight

See this question.
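
To make that concrete: the quoted line multiplies the weight of every example whose label is 1, so setting scale_pos_weight should behave much like passing a per-example weight vector that assigns that value to positives and 1 to negatives. A minimal sketch (not the library's internals; it assumes the X_train and y_train from the question are defined):

import numpy as np
from xgboost import XGBClassifier

# One class-level weight for all positive examples...
clf_a = XGBClassifier(objective='binary:logistic', scale_pos_weight=14)
clf_a.fit(X_train, y_train)

# ...versus an explicit per-example weight vector: 14 for positives, 1 for negatives.
weights = np.where(y_train == 1, 14.0, 1.0)
clf_b = XGBClassifier(objective='binary:logistic')
clf_b.fit(X_train, y_train, sample_weight=weights)

The same line also suggests what happens if you use both at once: each positive example's sample_weight is further multiplied by scale_pos_weight, so the two combine multiplicatively.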

The other scenario is where you have example-dependent costs. One example is detecting fraudulent transactions. Not only is a false negative (missing a fraudulent transaction) more costly than a false positive (blocking a legitimate transaction), but the cost of a false negative is also proportional to the amount of money being stolen. So you want to give larger weights to positive (fraudulent) examples with larger amounts. In this case, you can use the sample_weight parameter to specify example-specific weights.
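
A rough sketch of that scenario, assuming a hypothetical amounts array holding each training transaction's value (normalizing by the mean amount is just one possible weighting choice):

import numpy as np
from xgboost import XGBClassifier

# Fraudulent (positive) examples get a weight proportional to the amount at
# stake; legitimate (negative) examples keep weight 1. 'amounts' is a
# hypothetical array aligned with X_train / y_train.
weights_train = np.where(y_train == 1, amounts / amounts.mean(), 1.0)

model = XGBClassifier(objective='binary:logistic')
model.fit(X_train, y_train, sample_weight=weights_train)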