Oversampling: SMOTE for binary and categorical data in Python

TTZ · Dec 5, 2017

I would like to apply SMOTE to an imbalanced dataset that contains binary, categorical, and continuous data. Is there a way to apply SMOTE to the binary and categorical features?

Answer

Azaf Tanveer · Jan 29, 2019

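According to the imbalanced-learn documentation, this is now possible with SMOTENC. SMOTE-NC (SMOTE for Nominal and Continuous features) handles datasets that mix categorical and continuous features. Binary columns can be treated as categorical by including their indices in categorical_features.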
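To make the mechanism concrete, here is a minimal pure-Python sketch of how SMOTE-NC, as described by Chawla et al. (2002), builds one synthetic minority sample. Continuous features are interpolated between a sample and a randomly chosen nearest neighbor, while each categorical (or binary) feature takes the most common value among the k nearest neighbors. When computing distances, every mismatched categorical feature adds the median of the continuous features' standard deviations. The function name smote_nc_sample, its parameters, and the toy data are hypothetical; this is an illustration of the idea, not imbalanced-learn's implementation, which differs in details.

```python
import math
import random
from collections import Counter
from statistics import median, pstdev


def smote_nc_sample(minority, cont_idx, cat_idx, k=3, rng=random):
    """Hypothetical sketch: generate one synthetic minority row, SMOTE-NC style."""
    # Median of the per-column standard deviations of the continuous features
    med = median(pstdev([row[j] for row in minority]) for j in cont_idx)

    def distance(a, b):
        # Euclidean distance over the continuous features...
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        # ...plus med**2 for every categorical feature that differs
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(d2)

    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    neighbors = sorted(others, key=lambda r: distance(base, r))[:k]
    partner = rng.choice(neighbors)
    gap = rng.random()  # random position in [0, 1) along the line to the partner

    new = list(base)
    for j in cont_idx:
        # Continuous features: interpolate between base and partner
        new[j] = base[j] + gap * (partner[j] - base[j])
    for j in cat_idx:
        # Categorical features: majority value among the k nearest neighbors
        new[j] = Counter(r[j] for r in neighbors).most_common(1)[0][0]
    return new


# Hypothetical minority rows: [continuous, binary, categorical]
minority = [
    [0.5, 1, "red"],
    [0.7, 1, "red"],
    [0.6, 0, "blue"],
    [0.9, 1, "red"],
]
sample = smote_nc_sample(minority, cont_idx=[0], cat_idx=[1, 2], k=2,
                         rng=random.Random(0))
print(sample)
```

Because categorical values are voted rather than interpolated, the binary column can never end up with a fractional value such as 0.4, which is what plain SMOTE would produce.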

Here is the example from the documentation:

from imblearn.over_sampling import SMOTENC

# X is the feature matrix and y the class labels; columns 0 and 2 of X are categorical
smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
X_resampled, y_resampled = smote_nc.fit_resample(X, y)