Im trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()". I'm classifying documents so I am also pushing tf-idf vectorizer to the pipeline. Here is the code:
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, confusion_matrix
from sklearn.pipeline import Pipeline
#Classifier Pipeline
pipeline = Pipeline([
('tfidf', TfidfVectorizer()),
('classifier', RandomForestClassifier())
])
# Params for classifier
params = {"max_depth": [3, None],
"max_features": [1, 3, 10],
"min_samples_split": [1, 3, 10],
"min_samples_leaf": [1, 3, 10],
# "bootstrap": [True, False],
"criterion": ["gini", "entropy"]}
# Grid Search Execute
rf_grid = GridSearchCV(estimator=pipeline , param_grid=params) #cv=10
rf_detector = rf_grid.fit(X_train, Y_train)
print(rf_grid.grid_scores_)
I can't figure out why the error is showing. The same btw is occurring when I run a decision tree with GridSearchCV. (Scikit-learn 0.17)
You have to assign the parameters to the named step in the pipeline. In your case classifier
. Try prepending classifier__
to the parameter name. Sample pipeline
params = {"classifier__max_depth": [3, None],
"classifier__max_features": [1, 3, 10],
"classifier__min_samples_split": [1, 3, 10],
"classifier__min_samples_leaf": [1, 3, 10],
# "bootstrap": [True, False],
"classifier__criterion": ["gini", "entropy"]}