ValueError: multiclass format is not supported

Question 1

python pandas machine-learning scikit-learn training-data

Sudhansu Kumar · Oct 17, 2019 · Viewed 8.3k times · Source

Answer

It seems the task you are trying to solve is regression: predicting the sale price. However, with 'objective': 'binary' you are training a classification model, which assigns a class to every input.
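
As a minimal sketch of treating the problem as regression, the snippet below pairs a parameter dictionary that uses LightGBM's 'regression' objective and 'rmse' metric with a plain-Python RMSE function. The predicted prices are made-up numbers for illustration only, and the dictionary is not passed to LightGBM here, so the sketch runs without the library installed.

```python
import math

# Parameters one might pass to lgb.train for a regression task
# (an assumption about the intended fix; unused in this sketch).
regression_params = {
    "num_leaves": 64,
    "objective": "regression",
    "metric": "rmse",
    "seed": 7,
}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three sale prices from y_valid in the question.
actual_prices = [200624, 133000, 110000]
# Hypothetical model outputs, invented for this example.
predicted_prices = [190000, 140000, 115000]

score = rmse(actual_prices, predicted_prices)
print(f"Validation RMSE: {score:.2f}")
```

An error metric such as RMSE is expressed in the same unit as the target (here, the sale price), which makes it a natural replacement for AUC on this data.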

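ROC-AUC is meant for classification problems in which the model outputs the probability that an input belongs to a class. The error arises because scikit-learn infers the target type from y_valid: the sale prices take many distinct integer values, so they are treated as multi-class labels, which the roc_auc_score in your traceback rejects. For genuine multi-class classification, you can compute the score for each class independently.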
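
The traceback in the question shows that roc_auc_score rejects any target whose inferred type is not "binary" or "multilabel-indicator". The function below is a simplified, hypothetical stand-in for that inference on 1-D targets (it is not scikit-learn's implementation), meant only to show why a column of integer prices ends up labelled "multiclass".

```python
def infer_target_type(labels):
    """Roughly classify a 1-D target the way the traceback suggests.

    Simplified rule set (an assumption for illustration):
    - any non-integer value   -> "continuous"
    - at most 2 unique values -> "binary"
    - otherwise               -> "multiclass"
    """
    values = set(labels)
    if any(float(v) != int(v) for v in values):
        return "continuous"
    return "binary" if len(values) <= 2 else "multiclass"


# A few sale prices taken from y_valid in the question.
prices = [200624, 133000, 110000, 192000, 88000]

print(infer_target_type(prices))           # many distinct integers
print(infer_target_type([0, 1, 1, 0]))     # two distinct labels
print(infer_target_type([0.5, 1.25, 3.0])) # real-valued target
```

Casting the predictions to int64, as tried in the question, does not help because the check is applied to the true labels, not to the predictions.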

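Moreover, ROC-AUC should be computed from probabilities or scores, not from the discrete class labels that a scikit-learn style predict method returns. ROC-AUC measures how often a positive example receives a higher score than a negative one, so it needs examples of both classes and is not defined for a single example. Suppose a binary classifier assigns a probability of 0.7 to an example whose true class is False: the probability still records that the model was only moderately confident, whereas a hard prediction of True discards that information. When every prediction collapses to the same value, as in the all-1.0 output in the question, no ranking is left and the score won't tell you much.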
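
To illustrate the difference between scores and hard labels, here is a small plain-Python sketch that computes ROC-AUC from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name pairwise_auc and the toy data are assumptions made for this example; in practice you would call metrics.roc_auc_score on binary labels.

```python
def pairwise_auc(y_true, scores):
    """ROC-AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
probabilities = [0.1, 0.2, 0.3, 0.4]

# Thresholding at 0.5 turns every probability into the same hard label.
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

print(pairwise_auc(y_true, probabilities))  # ranking is preserved
print(pairwise_auc(y_true, hard_labels))    # ranking is lost, all ties
```

With the probabilities, every positive outranks every negative; after thresholding, all pairs are ties, so the metric can no longer distinguish the model from a constant predictor.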

Question 2

When I call metrics.roc_auc_score, I get ValueError: multiclass format is not supported.

import lightgbm as lgb
from sklearn import metrics
def train_model(train, valid):

    dtrain = lgb.Dataset(train, label=y_train)
    dvalid = lgb.Dataset(valid, label=y_valid)

    param = {'num_leaves': 64, 'objective': 'binary', 
             'metric': 'auc', 'seed': 7}
    print("Training model!")
    bst = lgb.train(param, dtrain, num_boost_round=1000, valid_sets=[dvalid], 
                    early_stopping_rounds=10, verbose_eval=False)

    valid_pred = bst.predict(valid)
    print('Valid_pred: ')
    print(valid_pred)
    print('y_valid:')
    print(y_valid)
    valid_score = metrics.roc_auc_score(y_valid, valid_pred)
    print(f"Validation AUC score: {valid_score:.4f}")
    return bst

bst = train_model(X_train_final, X_valid_final)

valid_pred and y_valid are:

Training model!
Valid_pred: 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.]
y_valid:
Id
530     200624
492     133000
460     110000
280     192000
656      88000
         ...  
327     324000
441     555000
1388    136000
1324     82500
62      101000
Name: SalePrice, Length: 292, dtype: int64

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-80-df034caf8c9b> in <module>
----> 1 bst = train_model(X_train_final, X_valid_final)

<ipython-input-79-483a6fb5ab9b> in train_model(train, valid)
     17     print('y_valid:')
     18     print(y_valid)
---> 19     valid_score = metrics.roc_auc_score(y_valid, valid_pred)
     20     print(f"Validation AUC score: {valid_score:.4f}")
     21     return bst

/opt/conda/lib/python3.6/site-packages/sklearn/metrics/ranking.py in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr)
    353     return _average_binary_score(
    354         _binary_roc_auc_score, y_true, y_score, average,
--> 355         sample_weight=sample_weight)
    356 
    357 

/opt/conda/lib/python3.6/site-packages/sklearn/metrics/base.py in _average_binary_score(binary_metric, y_true, y_score, average, sample_weight)
     71     y_type = type_of_target(y_true)
     72     if y_type not in ("binary", "multilabel-indicator"):
---> 73         raise ValueError("{0} format is not supported".format(y_type))
     74 
     75     if y_type == "binary":

ValueError: multiclass format is not supported

I tried valid_pred = pd.Series(bst.predict(valid)).astype(np.int64), and I also tried removing 'objective': 'binary', but neither worked.

I still cannot figure out what the issue is.
