ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

Amey Yadav · Nov 10, 2016 · Viewed 16.6k times

I applied Logistic Regression to the train set after splitting the data set into train and test sets, but I got the above error. I tried to work it out: when I print my response vector y_train in the console it shows integer values like 0 or 1, but when I write it to a file I find the values are floats like 0.0 and 1.0. If that's the problem, how can I overcome it?

lenreg = LogisticRegression()

print y_train[0:10]
y_train.to_csv(path='ytard.csv')

lenreg.fit(X_train, y_train)
y_pred = lenreg.predict(X_test)
print metrics.accuracy_score(y_test, y_pred)

The stack trace is as follows:

Traceback (most recent call last):
  File "/home/amey/prog/pd.py", line 82, in <module>
    lenreg.fit(X_train, y_train)
  File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1154, in fit
    self.max_iter, self.tol, self.random_state)
  File "/usr/lib/python2.7/dist-packages/sklearn/svm/base.py", line 885, in _fit_liblinear
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

Meanwhile, I've come across this link, which was unanswered. Is there a solution?
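For context, the error itself is easy to reproduce: LogisticRegression.fit raises exactly this ValueError whenever the label vector contains only a single class. Here is a minimal sketch (the data is made up purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# 10 made-up samples whose labels are all 0, i.e. only one class is present
X_train = np.random.rand(10, 3)
y_train = np.zeros(10)

clf = LogisticRegression()
clf.fit(X_train, y_train)  # raises: ValueError: This solver needs samples of at least 2 classes ...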

Answer

jeffery_the_wind · May 9, 2017

The problem here is that your y_train vector, for whatever reason, only contains zeros. It is actually not your fault, and it's kind of a bug (I think). The classifier needs 2 classes or else it throws this error.

It makes sense: if your y_train vector only has zeros (i.e. only one class), then the classifier doesn't really need to do any work, since every prediction would just be that one class.

In my opinion the classifier should still complete, just predict the one class (all zeros in this case) and throw a warning, but it doesn't. It throws the error instead.

A way to check for this condition is like this:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

lenreg = LogisticRegression()

print y_train[0:10]
y_train.to_csv('ytard.csv')

# np.sum(y_train) equals len(y_train) if every label is 1, and 0 if every label is 0
if np.sum(y_train) in [len(y_train), 0]:
    print "all one class"
    # do something else
else:
    # OK to proceed
    lenreg.fit(X_train, y_train)
    y_pred = lenreg.predict(X_test)
    print metrics.accuracy_score(y_test, y_pred)
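As a side note, the check above assumes the labels are exactly 0 and 1 (it sums the vector and compares against 0 or the length). A more general alternative, which is my own suggestion rather than part of the original answer, is to count the distinct labels with np.unique:

import numpy as np

# works for any label values, not just 0/1
if len(np.unique(y_train)) < 2:
    print("only one class present in y_train")
else:
    lenreg.fit(X_train, y_train)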

To overcome the problem more easily, I would recommend just including more samples in your test set, like 100 or 1000 instead of 10.
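If the single-class y_train comes from a small or unlucky train/test split, another option (this assumes the data was split with train_test_split, which the question doesn't show) is to pass stratify=y so that both classes are kept in each part of the split:

from sklearn.model_selection import train_test_split  # sklearn.cross_validation in very old versions

# X and y are the full feature matrix and label vector before splitting (hypothetical names);
# stratify=y preserves the class proportions in both parts, so y_train keeps both classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)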