I've looked at a few tutorials to crack into Keras for deep learning using convolutional neural networks. In the tutorials (and in Keras' official documentation), the MNIST dataset is loaded like so:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
However, no explanation is offered as to why we have two tuples of data. My question is: what are x_train and y_train, and how do they differ from their x_test and y_test counterparts?
The training set is a subset of the data set used to train a model.
x_train is the training data set. y_train is the set of labels for all the data in x_train.
The test set is a subset of the data set that you use to test your model after the model has gone through initial vetting by the validation set.
x_test is the test data set. y_test is the set of labels for all the data in x_test.
The validation set is a subset of the data set (separate from the training set) that you use to adjust hyperparameters.
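To make the three roles concrete, here is a minimal sketch in plain Python. It does not call Keras: the lists are toy stand-ins for the arrays returned by mnist.load_data(), and split_train_validation is a hypothetical helper written for this illustration, not a Keras function. It carves a validation set out of the training pairs while keeping each sample aligned with its label.

```python
import random


def split_train_validation(x, y, val_fraction=0.2, seed=0):
    """Hold out a fraction of the (x, y) pairs as a validation set.

    Hypothetical helper for illustration only; not part of Keras.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")

    # Shuffle the indices so the held-out samples do not depend on ordering.
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)

    n_val = int(len(x) * val_fraction)
    val_idx, fit_idx = indices[:n_val], indices[n_val:]

    # Index x and y with the same positions so samples and labels stay paired.
    x_fit = [x[i] for i in fit_idx]
    y_fit = [y[i] for i in fit_idx]
    x_val = [x[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    return (x_fit, y_fit), (x_val, y_val)


# Toy stand-ins (assumption): each "image" is a short list of numbers and
# each label is the class that image belongs to.
x_train = [[i, i + 1, i + 2] for i in range(10)]
y_train = list(range(10))

# The test set stays untouched until the final evaluation.
x_test = [[100, 101, 102], [200, 201, 202]]
y_test = [0, 1]

(x_fit, y_fit), (x_val, y_val) = split_train_validation(
    x_train, y_train, val_fraction=0.2
)
print(len(x_fit), len(x_val), len(x_test))
```

In this sketch the model would be fitted on x_fit and y_fit, hyperparameters would be adjusted by checking x_val and y_val, and x_test and y_test would be used only once at the end. Keras can also make the hold-out for you: the fit method of a model accepts a validation_split argument that reserves a fraction of the training data for validation.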
I've made a Deep Learning with Keras playlist on YouTube. It covers the basics of getting started with Keras, and a couple of the videos demo how to organize images into train/valid/test sets, as well as how to get Keras to create a validation set for you. Seeing this implementation may help you get a firmer grasp on how these different data sets are used in practice.