Understanding Python xgboost cv

kilojoules · Dec 26, 2015 · Viewed 31.8k times

I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameter? Is this similar to the sklearn grid_search cross-validation function? How can I find which of the values for the max_depth parameter ([2, 4, 6]) was determined to be optimal?

from sklearn.datasets import load_iris
import xgboost as xgb
iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth":[2,4,6]}
xgb.cv(x_parameters, DTrain)
...
Out[6]: 
   test-rmse-mean  test-rmse-std  train-rmse-mean  train-rmse-std
0        0.888435       0.059403         0.888052        0.022942
1        0.854170       0.053118         0.851958        0.017982
2        0.837200       0.046986         0.833532        0.015613
3        0.829001       0.041960         0.824270        0.014501
4        0.825132       0.038176         0.819654        0.013975
5        0.823357       0.035454         0.817363        0.013722
6        0.822580       0.033540         0.816229        0.013598
7        0.822265       0.032209         0.815667        0.013538
8        0.822158       0.031287         0.815390        0.013508
9        0.822140       0.030647         0.815252        0.013494

Answer

Deepish · Apr 14, 2016

Sklearn's GridSearchCV is the way to go if you are looking for parameter tuning: xgb.cv only runs cross-validation for one fixed set of parameters, it does not search over a grid for you. Just pass the XGBoost classifier to GridSearchCV and pick the parameters with the best CV score.
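A minimal sketch of that approach (an assumption on my part, not code from the original answer), reusing the iris data and the max_depth grid from the question. It uses the modern sklearn.model_selection import; at the time of the question the class lived in sklearn.grid_search:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
import xgboost as xgb

iris = load_iris()

# Search over the same max_depth values as in the question
param_grid = {"max_depth": [2, 4, 6]}

# Wrap the sklearn-compatible XGBoost estimator in GridSearchCV (5-fold CV)
grid = GridSearchCV(xgb.XGBClassifier(), param_grid, cv=5)
grid.fit(iris.data, iris.target)

# Best parameter setting found and its mean cross-validated score
print(grid.best_params_)
print(grid.best_score_)

grid.best_params_ tells you directly which of the candidate max_depth values won, which is exactly what xgb.cv cannot report.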

Here is a nice tutorial that might help you get started with parameter tuning: http://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/