How to balance classification using DecisionTreeClassifier?

RoyaumeIX · May 30, 2016 · Viewed 12.5k times

I have a data set where the classes are unbalanced. The classes are either 0, 1 or 2.

How can I calculate the prediction error for each class and then re-balance weights accordingly in scikit-learn?

Answer

lejlot · May 30, 2016

If you want to fully balance the classes (that is, treat each class as equally important), you can simply pass class_weight='balanced' to DecisionTreeClassifier. As the docs state:

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
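As a rough illustration, the sketch below fits a tree with class_weight='balanced', checks that the weights match the quoted formula, and computes a per-class error rate from the confusion matrix. The synthetic data set, the 80/15/5 class split, and all variable names are assumptions made for this example and do not come from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced 3-class data (illustrative assumption, not the asker's data).
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.8, 0.15, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

classes = np.unique(y_train)

# Weights that the 'balanced' mode derives from the training labels.
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# The same weights computed by hand from the formula in the docs.
manual = len(y_train) / (len(classes) * np.bincount(y_train))

clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, clf.predict(X_test), labels=classes)

# Per-class error: fraction of each true class that was misclassified.
per_class_error = 1.0 - np.diag(cm) / cm.sum(axis=1)

for label, weight, err in zip(classes, balanced, per_class_error):
    print(f"class {label}: weight={weight:.3f}, error={err:.3f}")
```

If the balanced weights still leave one class with a high error, class_weight also accepts a dict mapping each class label to a weight, so the weights can be adjusted by hand and the per-class errors recomputed.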