DL Lesson 21 – Early Stopping

Early Stopping

In the previous lesson, we learned how epochs, batch size, and learning rate control the training process.

Now we turn to a practical problem that every real-world deep learning system faces: knowing when to stop training.

This is where early stopping becomes one of the most important training techniques.


Why Training Too Long Is Dangerous

At the beginning of training, a model learns useful patterns. Both training and validation accuracy improve.

After a certain point, the model may start memorizing training data instead of learning general rules.

This leads to a situation where:

Training accuracy keeps increasing, but validation accuracy starts decreasing.

Early stopping helps us stop training at the right moment — before overfitting begins.


How Early Stopping Works

Early stopping continuously monitors a validation metric such as validation loss or validation accuracy.

If the monitored metric does not improve for a certain number of epochs, training is automatically stopped.

This number of epochs is called patience.

Early stopping allows the model to train long enough to learn meaningful patterns but prevents unnecessary overtraining.
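
Before looking at the built-in support in a framework, it helps to see the same idea as a plain Python loop. The sketch below is purely illustrative: the validation losses are made-up numbers, and the loop only shows how the patience counter behaves.

# Made-up validation losses for ten epochs: the loss improves at first,
# then plateaus and starts creeping back up.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.49, 0.49, 0.50, 0.51, 0.52, 0.53]

patience = 3                        # epochs we tolerate without improvement
best_loss = float("inf")
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best_loss:
        best_loss = val_loss        # new best result: reset the counter
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1

    print(f"Epoch {epoch}: val_loss={val_loss:.2f}, "
          f"no improvement for {epochs_without_improvement} epoch(s)")

    if epochs_without_improvement >= patience:
        print(f"Early stopping triggered at epoch {epoch}")
        break

With patience set to 3, this loop stops at epoch 8, even though ten epochs were available.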


Real-World Analogy

Imagine preparing for an exam.

At first, studying improves your understanding. After a point, additional hours cause fatigue and reduce performance.

A good teacher knows when to stop. Early stopping plays the same role for neural networks.


Implementing Early Stopping

Modern deep learning frameworks provide built-in support for early stopping.

Below is an example using TensorFlow/Keras.

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 3 consecutive epochs,
# and roll the model back to the weights from its best epoch.
early_stop = EarlyStopping(
    monitor="val_loss",            # metric checked after every epoch
    patience=3,                    # epochs to wait without improvement
    restore_best_weights=True      # keep the best weights, not the last ones
)

This configuration means:

If validation loss does not improve for 3 consecutive epochs, training will stop and the best model weights will be restored.


Using Early Stopping During Training

model.fit(
    X_train,
    y_train,
    epochs=50,                # upper limit; early stopping may end training sooner
    batch_size=32,
    validation_split=0.2,     # hold out 20% of the training data for validation
    callbacks=[early_stop]    # attach the early-stopping callback
)

Even though we set epochs to 50, training may stop much earlier.

This makes training efficient and prevents overfitting without manual monitoring.
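
To see how far training actually went, you can capture the object returned by model.fit. The sketch below repeats the call above and then reads two values: the number of epochs recorded in the history, and the stopped_epoch attribute of the callback (both are part of the Keras API, but the exact numbers depend on your data).

# Same call as above, but keeping the History object that fit() returns
history = model.fit(
    X_train,
    y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop]
)

# Number of epochs that actually ran (may be fewer than 50)
print("Epochs run:", len(history.history["val_loss"]))

# Epoch at which early stopping was triggered (0 means it never fired)
print("Stopped at epoch:", early_stop.stopped_epoch)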


What Metric Should We Monitor?

In most classification problems, validation loss is preferred because it reflects both confidence and correctness.

Validation accuracy is sometimes misleading, especially when class imbalance exists.

That is why professional systems usually monitor validation loss.
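
If you do want to monitor validation accuracy instead, tell the callback that larger values are better. A minimal sketch, assuming the model was compiled with metrics=["accuracy"] so that val_accuracy is reported:

from tensorflow.keras.callbacks import EarlyStopping

# Watch validation accuracy; mode="max" means larger values count as
# improvement (for val_loss the direction would be "min").
early_stop_acc = EarlyStopping(
    monitor="val_accuracy",
    mode="max",
    patience=3,
    restore_best_weights=True
)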


Common Mistakes

A very small patience value may stop training too early.

A very large patience value lets training run long past the best epoch, which reduces the benefit of early stopping.

Choosing patience is a balance between stability and responsiveness.
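
Besides patience, the Keras callback also accepts a min_delta argument that defines how large a change must be to count as an improvement. A hedged sketch of a more conservative configuration (the variable name and values are only illustrative):

from tensorflow.keras.callbacks import EarlyStopping

# Wait longer before stopping, and ignore tiny fluctuations: the validation
# loss must drop by at least 0.001 to reset the patience counter.
early_stop_tuned = EarlyStopping(
    monitor="val_loss",
    patience=5,
    min_delta=0.001,
    restore_best_weights=True
)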


Mini Practice

If validation loss improves very slowly, should patience be increased or decreased? Think before answering.


Exercises

Exercise 1:
Why do we restore the best weights in early stopping?

Because the weights from the final epoch are not necessarily the best ones; the best validation score may have been reached several epochs earlier.

Exercise 2:
What happens if patience is set too high?

Training may continue for many unproductive epochs, wasting time, and if the best weights are not restored, the final model may already be overfit.

Quick Quiz

Q1. Does early stopping reduce training time?

Yes. It stops training once improvements stop.

Q2. Which metric is commonly monitored?

Validation loss.

In the next lesson, we will explore vanishing and exploding gradients and understand why deep networks become difficult to train as they grow deeper.