After using cross_validation.KFold(n, n_folds=folds) I would like to access the training and testing indexes of a single fold, instead of going through all the folds.
So let's take the example code:
import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)
>>> print(kf)
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
random_state=None)
>>> for train_index, test_index in kf:
...     print("TRAIN:", train_index, "TEST:", test_index)
I would like to access the first fold in kf like this (instead of for loop):
train_index, test_index = kf[0]
This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"
What I want as output:
>>> train_index, test_index = kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]
Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html
How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?
You are on the right track. All you need to do now is:
kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]
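kf behaves like a generator: it is iterable, but it doesn't compute a train-test split until that split is requested, which is why it cannot be indexed directly. This keeps memory usage down, as you are not storing items you don't need. Making a list of the KFold object forces it to compute all the splits up front, and the resulting list can then be indexed.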
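To illustrate the lazy behaviour described above, here is a minimal pure-Python sketch. The function kfold_indices is a hypothetical stand-in written for this example, not part of scikit-learn; it only mimics how an unshuffled k-fold split could be produced on demand, and it returns plain lists rather than NumPy arrays.

```python
def kfold_indices(n, n_folds):
    """Hypothetical stand-in for an unshuffled KFold: lazily yield (train, test) index lists."""
    # Assumption: the first n % n_folds folds receive one extra sample.
    fold_sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test  # execution pauses here until the next fold is requested
        start = stop


folds = kfold_indices(4, n_folds=2)
# next() runs the generator only as far as the first yield,
# so the second fold is never computed.
train_index, test_index = next(folds)
print("TRAIN:", train_index, "TEST:", test_index)
# prints: TRAIN: [2, 3] TEST: [0, 1]
```

With an iterable like this, next() is enough when only the first fold is needed, whereas list() computes and stores every fold.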
Here are two great SO questions that explain what generators are: one and two
Edit Nov 2018
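The API has changed: the sklearn.cross_validation module was deprecated in sklearn 0.18 and removed in 0.20, and KFold now lives in sklearn.model_selection. An updated example (for py3.6):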
from sklearn.model_selection import KFold
import numpy as np
kf = KFold(n_splits=2)
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
train_index, test_index = next(kf.split(X))
In [12]: train_index
Out[12]: array([2, 3])
In [13]: test_index
Out[13]: array([0, 1])
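If you need a fold other than the first, one possible approach (a sketch, assuming the sklearn.model_selection API shown above and unshuffled splits) is to skip ahead with itertools.islice. The variable name fold is just an illustrative choice.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

fold = 1  # zero-based position of the fold we want
# islice skips the first `fold` splits; next() then takes the one we asked for.
train_index, test_index = next(islice(kf.split(X), fold, None))
print("TRAIN:", train_index, "TEST:", test_index)
# prints: TRAIN: [0 1] TEST: [2 3]
```

The skipped splits are still computed and then discarded, but nothing beyond the requested fold is generated or stored.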