I'm getting the error "Can't handle mix of multiclass and continuous-multioutput" when I try to compute the accuracy of my model. I've been trying to figure out what is going on for a while, but I have no idea what is wrong.
# Imports used below (train_test_split is in sklearn.model_selection from scikit-learn 0.18 on)
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
# TRAINING data
#Convert crime labels to numbers
df_crime = preprocessing.LabelEncoder()
crime = df_crime.fit_transform(train.Category)
#Get binarized weekdays, districts, and hours using dummy variables
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
#Build new array
train_data = pd.concat([hour, days, district], axis=1)
train_data['crime']=crime
#train_data.head()
#Repeat for test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)
hour = test.Dates.dt.hour
hour = pd.get_dummies(hour)
test_data = pd.concat([hour, days, district], axis=1)
features = ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
'Wednesday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
training, testing = train_test_split(train_data, train_size=.60)
# BernoulliNB
# Fit on the 60% split and evaluate on the held-out part of the training data
model_B = BernoulliNB()
model_B.fit(training[features], training['crime'])
predicted2 = np.array(model_B.predict_proba(testing[features]))
log_loss(testing['crime'], predicted2)
score_b = accuracy_score(testing['crime'], predicted2)
print(score_b)
ValueError Traceback (most recent call last)
<ipython-input-27-7d9db3ef89cc> in <module>()
----> 1 score_b = accuracy_score(testing['crime'], predicted2)
2
3 print(score_b)
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
170
171 # Compute accuracy for each possible representation
--> 172 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
173 if y_type.startswith('multilabel'):
174 differing_labels = count_nonzero(y_true - y_pred, axis=1)
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
83
84 # We can't have more than one value on y_type => The set is no more needed
ValueError: Can't handle mix of multiclass and continuous-multioutput
predicted2 is an array of class probabilities (the result of .predict_proba(X)); accuracy_score takes only the top classes (the result of .predict(X)). This means the following should work:
predicted3 = model_B.predict(testing[features])
accuracy_score(testing['crime'], predicted3)
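To see where the two names in the error message come from, here is a minimal sketch on synthetic stand-in data (the arrays X and y are made up for illustration and are not the crime dataset). It uses type_of_target from sklearn.utils.multiclass to classify a target array; whether accuracy_score relies on exactly this call internally depends on the scikit-learn version, so treat that as an assumption.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data (an assumption, not the crime dataset):
# 60 samples, 5 binary features, 3 integer class labels.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(60, 5))
y = np.arange(60) % 3

model = BernoulliNB().fit(X, y)
labels = model.predict(X)        # 1-D: one class label per sample
probas = model.predict_proba(X)  # 2-D: one probability per class per sample

print(labels.shape, probas.shape)
print(type_of_target(y))       # integer labels with 3 distinct values
print(type_of_target(probas))  # 2-D array of non-integer floats
```

The true labels are reported as multiclass and the probability matrix as continuous-multioutput, which is the mix the ValueError complains about.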
However, calling both predict and predict_proba is not a great idea: it is inefficient, and you can get non-matching scores if prediction is non-deterministic for some reason. It is better to reuse the probabilities you already have and take the most likely class for each row:
accuracy_score(testing['crime'], predicted2.argmax(axis=1))
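One caveat: argmax returns column positions in the predict_proba output, and those columns follow the order of the fitted model's classes_ attribute. With LabelEncoder output (0 to n-1) the positions match the labels as long as every class appears in the training split. If that might not hold, mapping through classes_ is safer. A minimal sketch, assuming synthetic data with labels 10, 20, 30 chosen only to make the difference visible:

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data (an assumption, not the crime dataset).
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(90, 6))
y = np.array([10, 20, 30] * 30)  # labels are deliberately not 0..n-1

X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

model = BernoulliNB().fit(X_train, y_train)
probas = model.predict_proba(X_test)  # computed once, reused for both metrics

# argmax gives column positions; classes_ turns them back into labels.
predicted_labels = model.classes_[probas.argmax(axis=1)]

print(log_loss(y_test, probas, labels=model.classes_))
print(accuracy_score(y_test, predicted_labels))
```

Both metrics are computed from the same probas array, so the model is only asked to predict once.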