I'm resampling my data (multiclass) by using SMOTE.
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=1)
X_res, Y_res = sm.fit_resample(X_train, Y_train)
However, I'm getting this attribute error. Can anyone help?
Short answer
You need to upgrade scikit-learn to version 0.23.1.
Long answer
The newest version of imbalanced-learn, 0.7.0, seems to have an undocumented dependency on scikit-learn v0.23.1. If your scikit-learn is 0.22 or below, it raises AttributeError: 'SMOTE' object has no attribute '_validate_data'.
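To confirm that this is the cause, you can check which scikit-learn version is installed. The snippet below is only a rough sketch: version_tuple and needs_upgrade are hypothetical helper names, and the 0.23.1 threshold is taken from the answer above rather than from any official compatibility table.

```python
import re


def version_tuple(version):
    """Turn a string like '0.22.2.post1' into (0, 22, 2).

    Parsing stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed, required="0.23.1"):
    """Return True if the installed version is older than the required one."""
    return version_tuple(installed) < version_tuple(required)


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        print("scikit-learn", sklearn.__version__,
              "needs upgrade:", needs_upgrade(sklearn.__version__))
```

If this reports that an upgrade is needed, the AttributeError above is most likely the version mismatch described here.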
If you are using Anaconda, installing scikit-learn version 0.23.1 might be tricky. conda update scikit-learn might not update scikit-learn to version 0.23 or higher, because the newest scikit-learn version Conda has at this point in time is 0.22.1. If you try to install it using conda install scikit-learn=0.23.1 or pip install scikit-learn==0.23.1, you will get tons of compatibility checks and the installation might not be quick. Therefore, the easiest way to install scikit-learn version 0.23.1 in Anaconda is to create a new virtual environment with minimal packages, so that there are few or no conflicts. Then, in the new virtual environment, install scikit-learn version 0.23.1 followed by version 0.7.0 of imbalanced-learn:
conda create -n test python=3.7.6
conda activate test
pip install scikit-learn==0.23.1
pip install imbalanced-learn==0.7.0
Finally, you need to reinstall your IDE in the new virtual environment in order to use these packages.
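To check that your IDE is really running the interpreter from the new environment, something like the following may help. It is a sketch that assumes the environment is named test, as in the commands above, and that it lives in a directory of the same name (the usual layout for named Conda environments); running_in_env is a hypothetical helper.

```python
import os
import sys


def running_in_env(env_name, prefix=sys.prefix):
    """Return True if the interpreter prefix directory is named env_name."""
    return os.path.basename(os.path.normpath(prefix)) == env_name


# Path of the Python interpreter the IDE is actually using.
print("Interpreter:", sys.executable)
print("Running in 'test' environment:", running_in_env("test"))
```

Run it from inside the IDE. If the second line prints False, the IDE is probably still using the base environment's interpreter.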
However, once scikit-learn version 0.23.1 becomes available in Conda and there are no compatibility issues, you can install it in the base environment directly.