ValueError: feature_names mismatch: in xgboost in the predict() function

Sujay S Kumar · Feb 20, 2017

I have trained an XGBRegressor model. When I use the trained model to predict on a new input, the predict() function throws a feature_names mismatch error, although the input feature vector has the same structure as the training data.

Also, to build the feature vector in the same structure as the training data, I am doing a lot of inefficient processing, such as adding new empty columns (when the data does not exist) and then rearranging the columns to match the training structure. Is there a better and cleaner way of formatting the input so that it matches the training structure?

Answer

Athar · Jun 4, 2019

This error occurs when the order of the column names at scoring time differs from the order used when the model was built.

I used the following steps to overcome this error.

First, load the pickled model:

import pickle
model = pickle.load(open("saved_model_file", "rb"))

Extract the column names in the order in which they were used for training:

cols_when_model_builds = model.get_booster().feature_names

Reorder the pandas DataFrame to match that order:

pd_dataframe = pd_dataframe[cols_when_model_builds]
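
The plain column selection above raises a KeyError if the new input lacks any training column. As a possible way to handle the "adding empty columns and rearranging" part of the question in one step, here is a minimal sketch using pandas' DataFrame.reindex. The helper name align_features, the column names, and the sample data are hypothetical, and the hard-coded list stands in for model.get_booster().feature_names. Whether missing columns should be filled with 0 or left as NaN is an assumption that depends on how the training data was encoded.

```python
import pandas as pd


def align_features(df, feature_names, fill_value=float("nan")):
    """Return a new DataFrame with exactly the columns in feature_names, in that order.

    Columns absent from df are created and filled with fill_value;
    columns not listed in feature_names are dropped.
    """
    return df.reindex(columns=feature_names, fill_value=fill_value)


# Hypothetical training-time column order. In practice this would be
# model.get_booster().feature_names from the loaded model.
cols_when_model_builds = ["age", "income", "city_london", "city_paris"]

# Hypothetical new input: different column order, one training column
# missing (city_london), and one extra column (row_id).
new_input = pd.DataFrame(
    {"income": [52000.0], "city_paris": [1], "age": [31], "row_id": [7]}
)

# Fill with 0 here on the assumption that the missing column is a
# one-hot indicator; use the NaN default to mark values as missing instead.
aligned = align_features(new_input, cols_when_model_builds, fill_value=0)
print(list(aligned.columns))
```

The aligned frame can then be passed to model.predict() in place of the original input. Dropping extra columns silently may hide upstream bugs, so it can be worth comparing set(new_input.columns) with set(cols_when_model_builds) and logging any differences before aligning.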