I am new to deep learning and Keras and one of the improvement I try to make to my model training process is to make use of Keras's keras.callbacks.EarlyStopping
callback function.
Based on the output from training my model, does it seem reasonable to use the following parameters for EarlyStopping
?
EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=0, mode='auto')
Also, why does it appear to be stopped sooner than it should if it was to wait for 5 consecutive epochs where the difference in val_loss
is lesser than a min_delta
of 0.0001?
Output while training LSTM model (without EarlyStop)
Runs all 100 epochs
Epoch 1/100
10200/10200 [==============================] - 133s 12ms/step - loss: 1.1236 - val_loss: 0.6431
Epoch 2/100
10200/10200 [==============================] - 141s 13ms/step - loss: 0.2783 - val_loss: 0.0301
Epoch 3/100
10200/10200 [==============================] - 143s 13ms/step - loss: 0.1131 - val_loss: 0.1716
Epoch 4/100
10200/10200 [==============================] - 145s 13ms/step - loss: 0.0586 - val_loss: 0.3671
Epoch 5/100
10200/10200 [==============================] - 146s 13ms/step - loss: 0.0785 - val_loss: 0.0038
Epoch 6/100
10200/10200 [==============================] - 146s 13ms/step - loss: 0.0549 - val_loss: 0.0041
Epoch 7/100
10200/10200 [==============================] - 147s 13ms/step - loss: 4.7482e-04 - val_loss: 8.9437e-05
Epoch 8/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.5181e-05 - val_loss: 4.7367e-06
Epoch 9/100
10200/10200 [==============================] - 149s 14ms/step - loss: 9.1632e-07 - val_loss: 3.6576e-07
Epoch 10/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.4117e-07 - val_loss: 1.6058e-07
Epoch 11/100
10200/10200 [==============================] - 152s 14ms/step - loss: 1.2024e-07 - val_loss: 1.2804e-07
Epoch 12/100
10200/10200 [==============================] - 150s 14ms/step - loss: 0.0151 - val_loss: 0.4181
Epoch 13/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0701 - val_loss: 0.0057
Epoch 14/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0332 - val_loss: 5.0014e-04
Epoch 15/100
10200/10200 [==============================] - 147s 14ms/step - loss: 0.0367 - val_loss: 0.0020
Epoch 16/100
10200/10200 [==============================] - 151s 14ms/step - loss: 0.0040 - val_loss: 0.0739
Epoch 17/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0282 - val_loss: 6.4996e-05
Epoch 18/100
10200/10200 [==============================] - 147s 13ms/step - loss: 0.0346 - val_loss: 1.6545e-04
Epoch 19/100
10200/10200 [==============================] - 147s 14ms/step - loss: 4.6678e-05 - val_loss: 6.8101e-06
Epoch 20/100
10200/10200 [==============================] - 148s 14ms/step - loss: 1.7270e-06 - val_loss: 6.7108e-07
Epoch 21/100
10200/10200 [==============================] - 147s 14ms/step - loss: 2.4334e-07 - val_loss: 1.5736e-07
Epoch 22/100
10200/10200 [==============================] - 147s 14ms/step - loss: 0.0416 - val_loss: 0.0547
Epoch 23/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0413 - val_loss: 0.0145
Epoch 24/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0045 - val_loss: 1.1096e-04
Epoch 25/100
10200/10200 [==============================] - 149s 14ms/step - loss: 0.0218 - val_loss: 0.0083
Epoch 26/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0029 - val_loss: 5.0954e-05
Epoch 27/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0316 - val_loss: 0.0035
Epoch 28/100
10200/10200 [==============================] - 148s 14ms/step - loss: 0.0032 - val_loss: 0.2343
Epoch 29/100
10200/10200 [==============================] - 149s 14ms/step - loss: 0.0299 - val_loss: 0.0021
Epoch 30/100
10200/10200 [==============================] - 150s 14ms/step - loss: 0.0171 - val_loss: 9.3622e-04
Epoch 31/100
10200/10200 [==============================] - 149s 14ms/step - loss: 0.0167 - val_loss: 0.0023
Epoch 32/100
10200/10200 [==============================] - 148s 14ms/step - loss: 7.3654e-04 - val_loss: 4.1998e-05
Epoch 33/100
10200/10200 [==============================] - 149s 14ms/step - loss: 7.3300e-06 - val_loss: 1.9043e-06
Epoch 34/100
10200/10200 [==============================] - 148s 14ms/step - loss: 6.6648e-07 - val_loss: 2.3814e-07
Epoch 35/100
10200/10200 [==============================] - 147s 14ms/step - loss: 1.5611e-07 - val_loss: 1.3155e-07
Epoch 36/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.2159e-07 - val_loss: 1.2398e-07
Epoch 37/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1940e-07 - val_loss: 1.1977e-07
Epoch 38/100
10200/10200 [==============================] - 150s 14ms/step - loss: 1.1939e-07 - val_loss: 1.1935e-07
Epoch 39/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1935e-07
Epoch 40/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1935e-07
Epoch 41/100
10200/10200 [==============================] - 150s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 42/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 43/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 44/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 45/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 46/100
10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 47/100
10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Epoch 48/100
10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
Output with EarlyStop
Stops (too early?) after 11 epoches
10200/10200 [==============================] - 134s 12ms/step - loss: 1.2733 - val_loss: 0.9022
Epoch 2/100
10200/10200 [==============================] - 144s 13ms/step - loss: 0.5429 - val_loss: 0.4093
Epoch 3/100
10200/10200 [==============================] - 144s 13ms/step - loss: 0.1644 - val_loss: 0.0552
Epoch 4/100
10200/10200 [==============================] - 144s 13ms/step - loss: 0.0263 - val_loss: 0.9872
Epoch 5/100
10200/10200 [==============================] - 145s 13ms/step - loss: 0.1297 - val_loss: 0.1175
Epoch 6/100
10200/10200 [==============================] - 146s 13ms/step - loss: 0.0287 - val_loss: 0.0136
Epoch 7/100
10200/10200 [==============================] - 145s 13ms/step - loss: 0.0718 - val_loss: 0.0270
Epoch 8/100
10200/10200 [==============================] - 145s 13ms/step - loss: 0.0272 - val_loss: 0.0530
Epoch 9/100
10200/10200 [==============================] - 150s 14ms/step - loss: 3.3879e-04 - val_loss: 0.0575
Epoch 10/100
10200/10200 [==============================] - 146s 13ms/step - loss: 1.6789e-05 - val_loss: 0.0766
Epoch 11/100
10200/10200 [==============================] - 149s 14ms/step - loss: 1.4124e-06 - val_loss: 0.0981
Training stops early here.
EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='min')
Tried setting min_delta
to 0. Why is it stopping even though val_loss
increased from 0.0011 to 0.1045?
10200/10200 [==============================] - 140s 13ms/step - loss: 1.1938 - val_loss: 0.5941
Epoch 2/100
10200/10200 [==============================] - 150s 14ms/step - loss: 0.3307 - val_loss: 0.0989
Epoch 3/100
10200/10200 [==============================] - 151s 14ms/step - loss: 0.0946 - val_loss: 0.0213
Epoch 4/100
10200/10200 [==============================] - 149s 14ms/step - loss: 0.0521 - val_loss: 0.0011
Epoch 5/100
10200/10200 [==============================] - 150s 14ms/step - loss: 0.0793 - val_loss: 0.0313
Epoch 6/100
10200/10200 [==============================] - 154s 14ms/step - loss: 0.0367 - val_loss: 0.0369
Epoch 7/100
10200/10200 [==============================] - 154s 14ms/step - loss: 0.0323 - val_loss: 0.0014
Epoch 8/100
10200/10200 [==============================] - 153s 14ms/step - loss: 0.0408 - val_loss: 0.0011
Epoch 9/100
10200/10200 [==============================] - 154s 14ms/step - loss: 0.0379 - val_loss: 0.1045
Training stops early here.
The role of two parameters is clear from keras documentation.
min_delta : minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement.
patience : number of epochs with no improvement after which training will be stopped.
Actually there is no standard value for these parameters. You need to analyse the participants(dataset,environment,model-type) of the training process to decide their values.
(1). patience
patience
. And vice-versa for a
good & clear dataset.checkpoint files
after
specific number of epochs with a low value of patience
. And then
use checkpoints to further improve as required. Analyse similarly for other model types.patience
. And
may try larger value with GPU.(2). min_delta
0
works pretty well in many cases.