What is the difference between x_train and x_test in Keras?

Kenny Worden · Sep 29, 2017

I've looked at a few tutorials to crack into Keras for deep learning using Convolutional Neural Networks. In those tutorials (and in Keras's official documentation), the MNIST dataset is loaded like so:

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

However, no explanation is offered as to why we have two tuples of data. My question is: what are x_train and y_train and how do they differ from their x_test and y_test counterparts?

Answer

blackHoleDetector · Sep 29, 2017

The training set is a subset of the data set used to train a model.

  • x_train is the training inputs (for MNIST, the images of handwritten digits).
  • y_train is the set of labels for the data in x_train (for MNIST, the digit each image shows).

The test set is a separate subset of the data set, not used during training, that you use to evaluate your model after it has gone through initial vetting by the validation set.

  • x_test is the test inputs.
  • y_test is the set of labels for the data in x_test.
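To make the pairing concrete, here is a minimal sketch that uses synthetic NumPy arrays in place of the real download. The shapes mirror what mnist.load_data() returns (60,000 training and 10,000 test images of 28x28 pixels, each with one integer label), but the sizes are scaled down, the pixel values are random, and the make_fake_split helper is hypothetical, not part of Keras.

```python
import numpy as np

def make_fake_split(n_samples, seed):
    """Return (x, y) stand-ins shaped like one MNIST tuple (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random "images": n_samples arrays of 28x28 pixel intensities in 0..255.
    x = rng.integers(0, 256, size=(n_samples, 28, 28)).astype(np.uint8)
    # Random "labels": one digit class in 0..9 per image.
    y = rng.integers(0, 10, size=(n_samples,)).astype(np.uint8)
    return x, y

# Same unpacking pattern as mnist.load_data(), with fake data.
(x_train, y_train), (x_test, y_test) = (
    make_fake_split(600, seed=0),
    make_fake_split(100, seed=1),
)

# Row i of x_* is one image, and y_*[i] is the label for that same image.
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

The point is only that each x array and its y array have the same first dimension: one label per example.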

The validation set is a subset of the data set (separate from the training set) that you use to adjust hyperparameters.

  • The example you listed doesn't mention the validation set: mnist.load_data() returns only the training and test sets.
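One common way to get a validation set is to hold out part of the training data yourself. The following is a rough sketch under assumptions: the arrays are synthetic stand-ins rather than real MNIST data, and the 10% hold-out fraction is an arbitrary choice for illustration.

```python
import numpy as np

# Synthetic stand-ins for the training tuple (assumption: not the real MNIST data).
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(600, 28, 28))
y_train = rng.integers(0, 10, size=(600,))

# Shuffle the indices once so images and labels stay aligned, then hold out 10%.
indices = rng.permutation(len(x_train))
n_val = len(x_train) // 10
val_idx, train_idx = indices[:n_val], indices[n_val:]

x_val, y_val = x_train[val_idx], y_train[val_idx]
x_train_small, y_train_small = x_train[train_idx], y_train[train_idx]

print(x_train_small.shape, x_val.shape)
```

Keras can also handle this at fit time: model.fit accepts either validation_data=(x_val, y_val) or a validation_split fraction that holds out part of the training data for you.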

I've made a Deep Learning with Keras playlist on YouTube. It contains the basics for getting started with Keras, and a couple of the videos demo how to organize images into train/valid/test sets, as well as how to get Keras to create a validation set for you. Seeing this implementation may help you get a firmer grasp on how these different data sets are used in practice.