How is the parameter "weight" (DMatrix) used in the gradient boosting procedure (xgboost)?

Olivier_s_j · Mar 14, 2016

In xgboost it is possible to set the parameter weight for a DMatrix. This is apparently a list of weights in which each value is a weight for the corresponding sample. I can't find any information on how these weights are actually used in the gradient boosting procedure. Are they related to eta?

For example, if I set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?
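For reference, a minimal sketch of how the weight parameter is attached to a DMatrix in the Python API (the data here is made up purely for illustration):

    import numpy as np
    import xgboost as xgb

    # made-up data purely for illustration
    X = np.random.rand(100, 4)
    y = np.random.randint(0, 2, size=100)
    w = np.full(100, 0.3)                      # one weight per training row

    # the weight list is attached to the DMatrix, one value per sample
    dtrain = xgb.DMatrix(X, label=y, weight=w)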

Answer

T. Scharf · Mar 17, 2016

xgboost allows for instance weighting during the construction of the DMatrix, as you noted. This weight is directly tied to the instance and travels with it throughout training. It is therefore included in the calculation of the gradients and hessians, and directly impacts the split points and the training of an xgboost model.
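As a rough sketch of where the weight enters (assuming the plain squared-error objective for simplicity; the function name here is made up), each instance's gradient and hessian contribution is scaled by its weight, along these lines:

    import numpy as np

    def weighted_squared_error_grad_hess(preds, labels, weights):
        # sketch of what a weighted objective effectively computes per instance:
        # the weight multiplies both the gradient and the hessian of that row,
        # so heavier rows pull harder on split decisions and leaf values
        grad = weights * (preds - labels)
        hess = weights * np.ones_like(labels)
        return grad, hess

    # tiny example: the second row has half the influence of the first
    preds = np.array([0.8, 0.8])
    labels = np.array([1.0, 1.0])
    weights = np.array([1.0, 0.5])
    print(weighted_squared_error_grad_hess(preds, labels, weights))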

see here and here

Instance Weight File

XGBoost supports providing each instance a weight to differentiate the importance of instances. For example, if we provide an instance weight file for the "train.txt" file in the example as below:

train.txt.weight

    1
    0.5
    0.5
    1
    0.5

This means that XGBoost will put more emphasis on the first and fourth instances, that is, the positive instances, while training. The configuration is similar to configuring the group information: if the instance file name is "xxx", XGBoost will check whether there is a file named "xxx.weight" in the same directory and, if there is, will use those weights while training the model.
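In the Python API, the equivalent of that weight file can also be set directly on the DMatrix; a sketch, assuming the "train.txt" file from the quoted example exists in the working directory:

    import numpy as np
    import xgboost as xgb

    # load the libsvm-format training file from the quoted example
    dtrain = xgb.DMatrix("train.txt")

    # same per-instance weights as in train.txt.weight
    dtrain.set_weight(np.array([1.0, 0.5, 0.5, 1.0, 0.5]))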

It is very different from eta

eta simply tells xgboost how much to blend the most recently trained tree into the ensemble. It is a measure of how greedy the ensemble should be at each iteration.
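As a toy illustration (the numbers are made up), eta only scales how much of each new tree's output is added to the ensemble prediction; it never touches individual rows:

    # hypothetical per-round outputs of three trees for a single sample
    eta = 0.3
    base_score = 0.5
    tree_outputs = [0.8, 0.4, 0.1]

    prediction = base_score
    for out in tree_outputs:
        prediction += eta * out   # each tree is blended in, shrunk by eta

    print(prediction)             # 0.5 + 0.3 * (0.8 + 0.4 + 0.1) = 0.89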

For example, if I set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?

  • A constant weight of 1 for all instances is the default, so changing that to a constant of .3 for all instances would still be equal weighting, and this shouldn't impact things too much. However, raising eta from .3 to 1 would make the training much more aggressive.
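A quick empirical check of the question above (a sketch with made-up data; exact numbers will vary with versions and seeds): a constant weight of 0.3 with eta = 1 generally does not behave the same as weight 1 with eta = 0.3.

    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(500, 5)
    y = (X[:, 0] + 0.1 * rng.randn(500) > 0.5).astype(int)

    d_weighted = xgb.DMatrix(X, label=y, weight=np.full(500, 0.3))  # weight 0.3 everywhere
    d_default = xgb.DMatrix(X, label=y)                             # default weight of 1

    params_a = {"eta": 1.0, "max_depth": 3, "objective": "binary:logistic"}
    params_b = {"eta": 0.3, "max_depth": 3, "objective": "binary:logistic"}

    model_a = xgb.train(params_a, d_weighted, num_boost_round=10)
    model_b = xgb.train(params_b, d_default, num_boost_round=10)

    # the predictions generally differ: eta shrinks every tree's contribution,
    # while a constant weight rescales gradients and hessians together and
    # (apart from regularization terms) largely cancels out
    print(np.abs(model_a.predict(d_weighted) - model_b.predict(d_default)).max())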