What is the difference between steps and epochs in TensorFlow?

Yang picture Yang · Jul 13, 2016 · Viewed 87.9k times · Source

In most of the models, there is a steps parameter indicating the number of steps to run over data. But yet I see in most practical usage, we also execute the fit function N epochs.

What is the difference between running 1000 steps with 1 epoch and running 100 steps with 10 epoch? Which one is better in practice? Any logic changes between consecutive epochs? Data shuffling?

Answer

MarvMind picture MarvMind · Jun 7, 2017

A training step is one gradient update. In one step batch_size many examples are processed.

An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10 an epoch consists of 2,000 images / (10 images / step) = 200 steps.

If you choose our training image randomly (and independent) in each step, you normally do not call it epoch. [This is where my answer differs from the previous one. Also see my comment.]