I am trying to evaluate a relevance of features and I am using DecisionTreeRegressor()
The related part of the code is presented below:
# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = data.drop(['Frozen'], axis = 1)
# TODO: Split the data into training and testing sets(0.25) using the given feature as the target
# TODO: Set a random state.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)
# TODO: Create a decision tree regressor and fit it to the training set
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)
# TODO: Report the score of the prediction using the testing set
from sklearn.model_selection import cross_val_score
#score = cross_val_score(regressor, X_test, y_test)
score = regressor.score(X_test, y_test)
print score # python 2.x
When I run the print
function, it returns the given score:
-0.649574327334
You can find the score function implementatioin and some explanation below here and below:
Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).
I could not grasp the whole concept yet, so this explanation is not very helpful for me. For instance I could not understand why score could be negative and what exactly it indicates (if something is squared, I would expect it can only be positive).
What does this score indicates and why can it be negative?
If you know any article (for starters) it might be helpful as well!
R^2
can be negative from its definition (https://en.wikipedia.org/wiki/Coefficient_of_determination) if the model fits the data worse than a horizontal line. Basically
R^2 = 1 - SS_res/SS_tot
and SS_res
and SS_tot
are always positive. If SS_res >> SS_tot
, you have a negative R^2
. Look at this answer as well: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative