I would like to use the xgboost cv function to find the best parameters for my training data set, but I am confused by the API. How do I find the best parameter? Is this similar to the sklearn grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2,4,6]) was determined to be optimal?
from sklearn.datasets import load_iris
import xgboost as xgb
iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth":[2,4,6]}
xgb.cv(x_parameters, DTrain)
...
Out[6]:
   test-rmse-mean  test-rmse-std  train-rmse-mean  train-rmse-std
0        0.888435       0.059403         0.888052        0.022942
1        0.854170       0.053118         0.851958        0.017982
2        0.837200       0.046986         0.833532        0.015613
3        0.829001       0.041960         0.824270        0.014501
4        0.825132       0.038176         0.819654        0.013975
5        0.823357       0.035454         0.817363        0.013722
6        0.822580       0.033540         0.816229        0.013598
7        0.822265       0.032209         0.815667        0.013538
8        0.822158       0.031287         0.815390        0.013508
9        0.822140       0.030647         0.815252        0.013494
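Sklearn's GridSearchCV should be the way to go if you are looking for parameter tuning. You just need to pass the xgb classifier (for example xgboost's XGBClassifier wrapper) to GridSearchCV as the estimator, together with the grid of values to try, and then pick the combination with the best CV score.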
Here is a nice tutorial that might help you get started with parameter tuning: http://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/