How to customize sklearn cross validation iterator by indices?

tangy · Nov 24, 2014 · Viewed 9.6k times

Similar to "Custom cross validation split sklearn", I want to define my own splits for GridSearchCV, for which I need to customize the built-in cross-validation iterator.

I want to pass my own set of train-test indices for cross-validation to GridSearchCV instead of letting the iterator determine them for me. I went through the available CV iterators on the sklearn documentation page but couldn't find one that does this.

For example, I want to implement something like this. The data has 9 samples. For 2-fold CV, I create my own set of training-testing indices:

>>> train_indices = [[1,3,5,7,9],[2,4,6,8]]
>>> test_indices = [[2,4,6,8],[1,3,5,7,9]]
                 1st fold^    2nd fold^
>>> custom_cv = sklearn.cross_validation.customcv(train_indices,test_indices)
>>> clf = GridSearchCV(X,y,params,cv=custom_cv)

What can be used to work like customcv?

Answer

eickenberg · Nov 24, 2014

Actually, cross-validation iterators are just that: iterators. Each iteration yields a (train, test) tuple of indices for one fold, so this should work for you:

custom_cv = zip(train_indices, test_indices)
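As a minimal sketch of how this fits together end to end, the example below passes hand-made splits to GridSearchCV. It assumes a recent scikit-learn, where GridSearchCV lives in sklearn.model_selection; the toy data, the LogisticRegression estimator, and the parameter grid are made up for illustration. Note that the indices are zero-based: with 9 samples the valid indices are 0 to 8, so the 1-to-9 lists from the question would run out of bounds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data (illustrative only): 9 samples, 2 features, 2 classes.
rng = np.random.RandomState(0)
y = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0])
X = rng.randn(9, 2) + 2.0 * y[:, None]

# Zero-based sample indices, one inner list per fold.
train_indices = [[0, 2, 4, 6, 8], [1, 3, 5, 7]]
test_indices = [[1, 3, 5, 7], [0, 2, 4, 6, 8]]

# A list of (train, test) pairs; building a list (rather than a bare zip)
# keeps the splits reusable if they are iterated more than once.
custom_cv = [
    (np.array(train), np.array(test))
    for train, test in zip(train_indices, test_indices)
]

# The estimator and grid are placeholders; the point is the cv argument.
search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=custom_cv)
search.fit(X, y)
print(search.n_splits_)
```

The data is passed to fit, not to the GridSearchCV constructor, which takes the estimator and the parameter grid.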

Also, for the specific case you mention, you can do:

import numpy as np
from sklearn.cross_validation import LeaveOneLabelOut
labels = np.arange(0, 10) % 2  # alternating labels: 0, 1, 0, 1, ...
cv = LeaveOneLabelOut(labels)  # each fold holds out all samples sharing one label

Observe that list(cv) yields:

[(array([1, 3, 5, 7, 9]), array([0, 2, 4, 6, 8])),
 (array([0, 2, 4, 6, 8]), array([1, 3, 5, 7, 9]))]
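On newer scikit-learn releases the sklearn.cross_validation module is no longer available, and LeaveOneLabelOut was renamed to LeaveOneGroupOut in sklearn.model_selection. The following is a sketch of the same even/odd split under that assumption; the placeholder X is hypothetical and only supplies the number of samples.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Alternating group labels 0, 1, 0, 1, ... for 10 samples.
groups = np.arange(0, 10) % 2

# Placeholder features: only the number of rows matters for splitting.
X = np.zeros((10, 1))

# The splitter no longer takes the labels in its constructor;
# they are passed to split() as the groups argument.
splits = list(LeaveOneGroupOut().split(X, groups=groups))

for train, test in splits:
    print(train, test)
```

The resulting splits list can be passed directly as the cv argument of GridSearchCV, or the LeaveOneGroupOut() object itself can be used as cv with groups supplied to fit.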