Early Stopping
In the previous lesson, we learned how epochs, batch size, and learning rate control the training process.
Now we address a very practical problem in real-world deep learning: knowing when to stop training.
This is where early stopping becomes one of the most important training techniques.
Why Training Too Long Is Dangerous
At the beginning of training, a model learns useful patterns. Both training and validation accuracy improve.
After a certain point, the model may start memorizing training data instead of learning general rules.
This leads to the classic symptom of overfitting:
Training accuracy keeps increasing while validation accuracy starts to decrease.
Early stopping helps us stop training at the right moment, before overfitting sets in.
How Early Stopping Works
Early stopping continuously monitors a validation metric such as validation loss or validation accuracy.
If the monitored metric does not improve for a certain number of epochs, training is automatically stopped.
This number of epochs is called patience.
Early stopping allows the model to train long enough to learn meaningful patterns but prevents unnecessary overtraining.
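The logic itself is just a little bookkeeping: remember the best value seen so far, count the epochs without improvement, and stop once the count reaches the patience. Below is a framework-free sketch with made-up validation losses, purely for illustration:

# Hypothetical validation losses, used only to demonstrate the logic.
val_losses = [0.90, 0.72, 0.61, 0.58, 0.59, 0.60, 0.62, 0.61]

patience = 3
best_loss = float("inf")
epochs_without_improvement = 0

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss = loss
        epochs_without_improvement = 0    # improvement: reset the counter
    else:
        epochs_without_improvement += 1   # no improvement this epoch

    if epochs_without_improvement >= patience:
        print(f"Stopping early at epoch {epoch}, best val_loss = {best_loss}")
        break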
Real-World Analogy
Imagine preparing for an exam.
At first, studying improves your understanding. After a point, additional hours cause fatigue and reduce performance.
A good teacher knows when to stop. Early stopping plays the same role for neural networks.
Implementing Early Stopping
Modern deep learning frameworks provide built-in support for early stopping.
Below is an example using TensorFlow/Keras.
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=3,                  # allow 3 epochs without improvement
    restore_best_weights=True    # roll back to the weights of the best epoch
)
This configuration means:
If validation loss does not improve for 3 consecutive epochs, training will stop and the best model weights will be restored.
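Keras also lets you tune what counts as an improvement. The sketch below uses the standard min_delta and mode arguments of EarlyStopping; the specific values are illustrative, not recommendations.

from tensorflow.keras.callbacks import EarlyStopping

# Illustrative settings: only a drop of at least 0.001 in validation loss
# counts as progress, and "better" explicitly means "lower".
early_stop = EarlyStopping(
    monitor="val_loss",
    min_delta=0.001,              # ignore improvements smaller than this
    patience=3,
    mode="min",                   # for a loss, lower is better
    restore_best_weights=True
)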
Using Early Stopping During Training
model.fit(
    X_train,
    y_train,
    epochs=50,                # an upper limit, not a target
    batch_size=32,
    validation_split=0.2,     # hold out 20% of the training data for validation
    callbacks=[early_stop]
)
Even though we set epochs to 50, training may stop much earlier.
This makes training efficient and prevents overfitting without manual monitoring.
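If you want to confirm how many epochs actually ran, you can inspect the return value of model.fit and the callback itself. A minimal sketch, assuming the same model, X_train, and y_train as above:

history = model.fit(
    X_train,
    y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop]
)

# history.history holds one entry per epoch that actually ran,
# which may be far fewer than 50.
print("Epochs run:", len(history.history["val_loss"]))

# stopped_epoch records the epoch at which the callback halted training
# (it stays 0 if early stopping never triggered).
print("Stopped at epoch:", early_stop.stopped_epoch)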
What Metric Should We Monitor?
In most classification problems, validation loss is preferred because it reflects both confidence and correctness.
Validation accuracy can be misleading, especially when the classes are imbalanced.
That is why professional systems usually monitor validation loss.
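If you do decide to monitor accuracy instead, remember that for accuracy higher is better, so the direction of improvement flips. A minimal sketch, assuming the model was compiled with an "accuracy" metric so Keras logs val_accuracy:

from tensorflow.keras.callbacks import EarlyStopping

# Monitoring accuracy instead of loss: "improvement" now means "increase".
acc_stop = EarlyStopping(
    monitor="val_accuracy",
    mode="max",                  # explicit, although Keras can usually infer this
    patience=3,
    restore_best_weights=True
)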
Common Mistakes
A very small patience value may stop training too early.
A very large patience value lets training continue long past the best epoch, which reduces the benefit of early stopping.
Choosing patience is a balance between stability and responsiveness.
Mini Practice
If validation loss improves very slowly, should patience be increased or decreased? Think before answering.
Exercises
Exercise 1:
Why do we restore the best weights in early stopping?
Exercise 2:
What happens if patience is set too high?
Quick Quiz
Q1. Does early stopping reduce training time?
Q2. Which metric is commonly monitored?
In the next lesson, we will explore vanishing and exploding gradients and understand why deep networks become difficult to train as they grow deeper.