Interpreting the DecisionTreeRegressor score?

wasabi · Sep 10, 2017 · Viewed 10.8k times

I am trying to evaluate the relevance of features, and I am using DecisionTreeRegressor() to do so.

The related part of the code is presented below:

# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = data.drop(['Frozen'], axis = 1)

# TODO: Split the data into training and testing sets(0.25) using the given feature as the target
# TODO: Set a random state.

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)

# TODO: Create a decision tree regressor and fit it to the training set

from sklearn.tree import DecisionTreeRegressor


regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)

# TODO: Report the score of the prediction using the testing set

from sklearn.model_selection import cross_val_score


#score = cross_val_score(regressor, X_test, y_test)
score = regressor.score(X_test, y_test)

print(score)  # works in both Python 2 and Python 3

When I run the code, it prints the following score:

-0.649574327334

The documentation for the score function gives the following explanation:

Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).

I have not grasped the whole concept yet, so this explanation is not very helpful to me. For instance, I do not understand why the score can be negative or what exactly a negative value indicates (if something is squared, I would expect it to be positive).


What does this score indicate, and why can it be negative?

If you know of any introductory article on this, that would be helpful as well!

Answer

Longyu Zhao · Jan 30, 2018

By its definition (https://en.wikipedia.org/wiki/Coefficient_of_determination), R^2 can be negative: this happens when the model fits the data worse than a horizontal line at the mean of the target values. Basically

R^2 = 1 - SS_res/SS_tot

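where SS_res (the residual sum of squares) and SS_tot (the total sum of squares) are both non-negative. Whenever SS_res > SS_tot, R^2 is negative. Despite the notation, R^2 in this definition is not obtained by squaring a quantity, so nothing forces it to be positive. Look at this answer as well: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative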
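To make the formula concrete, here is a minimal sketch in plain Python that computes R^2 by hand for three sets of predictions. The helper name `r_squared` and the toy numbers are made up for illustration; they are not taken from the question's data set. The sketch assumes the score is computed as 1 - SS_res/SS_tot, as described above.

```python
def r_squared(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared errors of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy target values (illustrative only); their mean is 3.0 and SS_tot is 10
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r_squared(y_true, y_true)                     # SS_res = 0
baseline = r_squared(y_true, [3.0] * len(y_true))       # always predict the mean
worse = r_squared(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])    # predictions reversed

print(perfect)   # 1.0
print(baseline)  # 0.0
print(worse)     # -3.0
```

Read this way, a score such as -0.65 on the test set suggests that the tree's predictions for the held-out samples have a larger squared error than simply predicting the mean of those samples would.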