Python error: Can't handle mix of multiclass and continuous-multioutput

lupejuares · Nov 19, 2016

I'm getting the error "Can't handle mix of multiclass and continuous-multioutput" when I try to compute the accuracy of my model. I've been trying to figure out what is going on for a while, but I have no idea what is wrong.

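import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Assumes `train` and `test` DataFrames are already loaded, with `Dates` parsed as datetime

# TRAINING data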
#Convert crime labels to numbers
df_crime = preprocessing.LabelEncoder()
crime = df_crime.fit_transform(train.Category)
#Get binarized weekdays, districts, and hours using dummy variables
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
#Build new array
train_data = pd.concat([hour, days, district], axis=1)
train_data['crime']=crime
#train_data.head()

#Repeat for test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)

hour = test.Dates.dt.hour
hour = pd.get_dummies(hour) 

test_data = pd.concat([hour, days, district], axis=1)

features = ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
 'Wednesday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
 'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']

training, testing = train_test_split(train_data, train_size=.60) 

# BernoulliNB
# fit on the training split, evaluate on the held-out testing split
model_B = BernoulliNB()
model_B.fit(training[features], training['crime'])
predicted2 = np.array(model_B.predict_proba(testing[features]))
log_loss(testing['crime'], predicted2)

score_b = accuracy_score(testing['crime'], predicted2)
print(score_b)


ValueError                                Traceback (most recent call last)
<ipython-input-27-7d9db3ef89cc> in <module>()
----> 1 score_b = accuracy_score(testing['crime'], predicted2)
      2 
      3 print(score_b)

C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    170 
    171     # Compute accuracy for each possible representation
--> 172     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    173     if y_type.startswith('multilabel'):
    174         differing_labels = count_nonzero(y_true - y_pred, axis=1)

C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     80     if len(y_type) > 1:
     81         raise ValueError("Can't handle mix of {0} and {1}"
---> 82                          "".format(type_true, type_pred))
     83 
     84     # We can't have more than one value on y_type => The set is no more needed

ValueError: Can't handle mix of multiclass and continuous-multioutput
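
A minimal reproduction of the same error, using small hypothetical arrays in place of testing['crime'] (integer labels) and predicted2 (a predict_proba-style probability matrix). It uses sklearn.utils.multiclass.type_of_target to show how scikit-learn classifies each input; the exact wording of the error message differs slightly between scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical integer class labels, like testing['crime'] after LabelEncoder
y_true = np.array([0, 2, 1, 2])

# Hypothetical (n_samples, n_classes) probability matrix, like predict_proba output
y_proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
])

print(type_of_target(y_true))   # multiclass
print(type_of_target(y_proba))  # continuous-multioutput

# accuracy_score refuses to compare two different target types
error_message = None
try:
    accuracy_score(y_true, y_proba)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```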

Answer

Mikhail Korobov · Nov 20, 2016

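predicted2 is an array of class probabilities (the result of .predict_proba(X)), with one column per class, but accuracy_score expects predicted class labels (the result of .predict(X)). accuracy_score first checks that y_true and y_pred describe the same kind of target: testing['crime'] is a 1-D array of integer labels ("multiclass"), whereas a 2-D array of floats is treated as "continuous-multioutput", hence the error. So this should work: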

predicted3 = model_B.predict(testing[features])
accuracy_score(testing['crime'], predicted3)
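
A self-contained sketch of the difference between the two outputs, assuming hypothetical synthetic data (random 0/1 features and four integer-encoded classes) in place of the question's Kaggle DataFrames. It shows that predict_proba returns one column per class while predict returns one label per row, and that log_loss is the metric that consumes probabilities (which is why the log_loss line in the question did not fail).

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Hypothetical synthetic data standing in for the question's DataFrames:
# 200 rows of 0/1 features (like the one-hot day/district columns)
# and 4 integer-encoded classes (like the LabelEncoder output).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 17))
y = rng.integers(0, 4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0
)

model = BernoulliNB()
model.fit(X_train, y_train)

# Probabilities: shape (n_test, n_classes), one column per class
proba = model.predict_proba(X_test)
# Labels: shape (n_test,), one class label per row
labels = model.predict(X_test)

# log_loss scores probabilities; labels= ties the columns to model.classes_
loss = log_loss(y_test, proba, labels=model.classes_)
# accuracy_score compares label against label
acc = accuracy_score(y_test, labels)

print(proba.shape, labels.shape)
print(loss, acc)
```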

Calling both predict and predict_proba is not ideal, though: it runs prediction twice, which is inefficient, and the two results may not match if prediction is non-deterministic for some reason. It is better to derive the labels from the probabilities you already have:

accuracy_score(testing['crime'], predicted2.argmax(axis=1))
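
One caveat on argmax: it returns a column position, and the columns of predict_proba follow the order of model.classes_. With LabelEncoder labels 0..K-1 the positions coincide with the labels only if every class appears in the training split; if some category is missing from `training`, the positions shift. Indexing model.classes_ with the argmax avoids this. The sketch below uses hypothetical non-contiguous labels (3, 7, 11) to make the mismatch obvious.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

# Hypothetical data whose class labels are NOT 0..n_classes-1
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(150, 6))
y = rng.choice([3, 7, 11], size=150)

model = BernoulliNB().fit(X, y)
proba = model.predict_proba(X)

# Column positions 0, 1 or 2, not class labels
column_index = proba.argmax(axis=1)
# Map positions back to the actual labels 3, 7 or 11
labels = model.classes_[column_index]

naive = accuracy_score(y, column_index)  # compares positions with labels: always 0.0 here
acc = accuracy_score(y, labels)          # compares labels with labels

print(model.classes_)
print(np.array_equal(labels, model.predict(X)))
print(naive, acc)
```

Mapping through model.classes_ gives the same labels that predict would return, without a second prediction pass.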