I am doing multiclass/multilabel text classification, and I am trying to get rid of the "ConvergenceWarning".
When I increased `max_iter` from the default to 4000, the warning disappeared. However, my model accuracy dropped from 78 to 75.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
import numpy as np

logreg = Pipeline([('vect', CountVectorizer()),
                   ('tfidf', TfidfTransformer()),
                   ('clf', LogisticRegression(n_jobs=1, C=1e5, solver='lbfgs',
                                              multi_class='ovr', random_state=0,
                                              class_weight='balanced')),
                   ])
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('Logistic Regression Accuracy %s' % accuracy_score(y_pred, y_test))

cv_score = cross_val_score(logreg, train_tfidf, y_train, cv=10, scoring='accuracy')
print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
Why does my accuracy drop when `max_iter=4000`? Is there any other way to fix *"ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations."*?
The data used in the question is missing, so it's not possible to reproduce the problem; what follows is an educated guess.
Some things to check:
1) Many estimators such as `LogisticRegression` like (not to say require) scaled data. Depending on your data, you may want to scale with `MaxAbsScaler`, `MinMaxScaler`, `StandardScaler` or `RobustScaler`. The optimal choice depends on the kind of problem you are trying to solve and on data properties such as sparsity and whether negative values are acceptable for the downstream estimator. Scaling the data usually speeds up convergence, so you may not even need to increase `max_iter` (see the first sketch after this list).
2) In my experience, solvers other than `"liblinear"` require more `max_iter` iterations to converge given the same input data.
3) I didn't see `max_iter` set anywhere in your code snippet. It currently defaults to `100` (sklearn 0.22).
4) I saw you set the regularization parameter `C=100000`. This drastically reduces the regularization, since `C` is the inverse of the regularization strength. It is expected to consume more iterations and may lead to overfitting the model (see the second sketch after this list).
5) I didn't expect a higher `max_iter` to get you lower accuracy. The solver may be diverging rather than converging. The data may not be scaled, the random state may not be fixed, or the tolerance `tol` (defaults to `1e-4`) may have become too high.
6) Check your `cross_val_score` cross-validation parameter `cv`. If I'm not wrong, the default behavior doesn't set the random state, which results in a variable mean accuracy (see the last sketch after this list).
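Here is a minimal sketch of point 1), assuming the same `CountVectorizer`/`TfidfTransformer` front end as in the question; `MaxAbsScaler` is just one possible choice, picked because it keeps the TF-IDF matrix sparse:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.preprocessing import MaxAbsScaler
from sklearn.linear_model import LogisticRegression

# Sketch: a scaling step inserted between the TF-IDF transform and the
# classifier. MaxAbsScaler rescales each feature to [-1, 1] without
# densifying the sparse matrix.
logreg = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('scale', MaxAbsScaler()),
    ('clf', LogisticRegression(solver='lbfgs', multi_class='ovr',
                               random_state=0, class_weight='balanced')),
])
```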
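For points 2) to 5), a sketch with illustrative values only (the exact numbers are assumptions, not recommendations): set `max_iter` explicitly, use a much milder `C` than `1e5`, keep the default `tol`, and then change one parameter at a time to see which one actually moves the accuracy:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    solver='lbfgs',            # or 'liblinear', which often converges in fewer iterations
    multi_class='ovr',
    C=1.0,                     # default regularization; C=1e5 nearly disables it
    max_iter=1000,             # set explicitly instead of relying on the default of 100
    tol=1e-4,                  # default tolerance
    random_state=0,
    class_weight='balanced',
)
```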
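And for point 6), a sketch that fixes the fold generation with `StratifiedKFold` so the mean CV accuracy is reproducible between runs (variable names follow the question, with the pipeline receiving the raw training texts):

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Shuffled, stratified folds with a fixed random_state give reproducible splits.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_score = cross_val_score(logreg, X_train, y_train, cv=skf, scoring='accuracy')
```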