XGBoost - How to use "mae" as objective function?

Sam Qian · Jul 10, 2017 · Viewed 14k times

I know XGBoost needs the first and second derivatives of the objective, but has anybody else used "mae" as the objective function?

Answer

Little Bobby Tables · Jul 28, 2017

A little bit of theory first, sorry! You asked for the grad and hessian for MAE; however, the MAE is not continuously twice differentiable, so calculating the first and second derivatives becomes tricky. The absolute value has a "kink" at x = 0 which prevents the MAE from being continuously differentiable.

Moreover, the second derivative is zero at every point where it is defined at all. In XGBoost, the second derivative appears in the denominator of the leaf weights, so a hessian of zero leads to division errors and unstable leaf values.
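To make that concrete, here is a sketch of what a naive MAE objective would have to return (a hypothetical mae_obj, shown only to illustrate the problem, not something to actually use): the gradient is just the sign of the residual and the hessian is zero everywhere it exists.

import numpy as np

def mae_obj(preds, dtrain):
    """Naive MAE objective -- for illustration only, do not use."""
    d = preds - dtrain.get_label()
    grad = np.sign(d)        # derivative of |d|, undefined at d = 0
    hess = np.zeros_like(d)  # second derivative is zero wherever it is defined
    return grad, hess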

Given these complexities, our best bet is to try to approximate the MAE using some other, nicely behaved function. Let's take a look.

[Figure: some different loss functions that approximate the absolute value]

We can see above that there are several functions that approximate the absolute value. Clearly, for very small values, the Squared Error (MSE) is a fairly good approximation of the MAE. However, I assume that this is not sufficient for your use case.

Huber Loss is a well documented loss function. However, it is only once differentiable (its second derivative jumps at the transition point), so we cannot guarantee smooth derivatives. We can approximate it with the Pseudo-Huber function, which is smooth everywhere. It can be implemented in Python XGBoost as follows,

import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(x_train, label=y_train)
dtest = xgb.DMatrix(x_test, label=y_test)

param = {'max_depth': 5}
num_round = 10

def huber_approx_obj(preds, dtrain):
    d = preds - dtrain.get_label()  # remove .get_label() for the sklearn API
    h = 1  # h is the delta parameter of the Pseudo-Huber loss
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess

bst = xgb.train(param, dtrain, num_round, obj=huber_approx_obj)
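For reference, the grad and hess above come from the Pseudo-Huber loss itself, L(d) = h**2 * (sqrt(1 + (d/h)**2) - 1): its first derivative is d / sqrt(1 + (d/h)**2) and its second derivative is 1 / (1 + (d/h)**2)**1.5, which is exactly what the function returns (a standard result, restated here for clarity).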

Other functions can be used by replacing obj=huber_approx_obj with the corresponding objective.

Fair Loss is not well documented at all, but it seems to work rather well. The Fair loss function is:

L(x) = c * abs(x) - c**2 * log(abs(x) / c + 1)

It can be implemented as follows,

def fair_obj(preds, dtrain):
    """Fair loss: y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()
    c = 1  # Fair loss constant
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess

This code is taken and adapted from the second place solution in the Kaggle Allstate Challenge.

Log-Cosh Loss, L(x) = log(cosh(x)), is another smooth approximation of the MAE:

def log_cosh_obj(preds, dtrain):
    x = preds - dtrain.get_label()
    grad = np.tanh(x)             # d/dx log(cosh(x))
    hess = 1 / np.cosh(x) ** 2    # d^2/dx^2 log(cosh(x)) = sech(x)^2
    return grad, hess

Finally, you can create your own custom loss functions using the above functions as templates.
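As a rough sketch of how you might compare these surrogates (reusing the dtrain, dtest, param and num_round defined above), you can train with each custom objective while monitoring the true MAE on the test set via a custom evaluation metric. The mae_eval helper below is my own addition, passed through xgb.train's feval argument (newer XGBoost versions call this custom_metric).

import numpy as np
import xgboost as xgb

def mae_eval(preds, dtrain):
    """Report the actual MAE while training on a surrogate objective."""
    labels = dtrain.get_label()
    return 'mae', float(np.mean(np.abs(preds - labels)))

# assumes dtrain, dtest, param and num_round from the snippets above
for obj in (huber_approx_obj, fair_obj, log_cosh_obj):
    bst = xgb.train(param, dtrain, num_round, obj=obj,
                    evals=[(dtest, 'test')], feval=mae_eval)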