How to find the feature names of the coefficients using scikit-learn linear regression?

amehta · Jan 7, 2016
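from sklearn import linear_model

# train_data is assumed to be a pandas DataFrame with these columns plus 'price'
# training the models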
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)

If I change the order of the features, the coefficients are still printed in the same order, so I would like to know which coefficient belongs to which feature.

Answer

Robin Spiess · Jan 18, 2016

The trick is that coef_ is ordered like the columns you passed to fit(), so right after you have trained your model you know which coefficient belongs to which feature:

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient paired with its feature name. (Tested with a pandas DataFrame.)
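If pandas is already in use, a Series indexed by feature name is often handier than a list of tuples, because it can be sorted and looked up by name. A minimal sketch, assuming synthetic data in place of the question's train_data (the data and the true coefficients 3.0, -2.0 and 0.5 are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data standing in for train_data (not from the question).
rng = np.random.default_rng(0)
features = ["sqft_living", "bathrooms", "bedrooms"]
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=features)
# price is an exact linear combination, so the fitted coefficients are easy to check
y = 3.0 * X["sqft_living"] - 2.0 * X["bathrooms"] + 0.5 * X["bedrooms"]

model = LinearRegression()
model.fit(X[features], y)

# coef_ follows the column order of the data passed to fit()
coefs = pd.Series(model.coef_, index=features)

# largest effect first (sort_values(key=...) needs pandas >= 1.1)
print(coefs.sort_values(key=abs, ascending=False))
```

Sorting by absolute value only says something about relative importance when the features are on comparable scales; otherwise the ranking mostly reflects the units of each column.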

If you want to reuse the coefficients later, you can also put them in a dictionary:

# map each feature name to its fitted coefficient
coef_dict = {}
for coef, feat in zip(model_1.coef_, model_1_features):
    coef_dict[feat] = coef

(You can check this yourself by training two models on the same features in a different order, as you described: once each coefficient is paired with its feature name, the mappings match.)
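A minimal sketch of that check, using hypothetical synthetic data and a helper coef_by_name that is introduced here purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data with known true coefficients.
rng = np.random.default_rng(42)
order_a = ["sqft_living", "bathrooms", "bedrooms"]
order_b = ["bedrooms", "sqft_living", "bathrooms"]  # same features, shuffled
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=order_a)
noise = rng.normal(scale=0.1, size=200)
y = 2.0 * X["sqft_living"] + 1.0 * X["bathrooms"] - 3.0 * X["bedrooms"] + noise


def coef_by_name(columns):
    """Fit on the given column order and return {feature: coefficient}."""
    model = LinearRegression().fit(X[columns], y)
    return dict(zip(columns, model.coef_))


coefs_a = coef_by_name(order_a)
coefs_b = coef_by_name(order_b)

# The raw coef_ arrays come out in different orders, but once each
# coefficient is paired with its name the two mappings agree.
for name in order_a:
    print(name, coefs_a[name], coefs_b[name])
```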