Dropout
In the previous lesson, we learned how regularization controls model complexity by limiting how large weights can grow.
In this lesson, we study a technique that takes a very different approach to the same problem — Dropout.
Dropout is one of the most powerful and widely used regularization techniques in deep learning.
What Is Dropout?
Dropout is a technique where, during training, a random subset of neurons is temporarily switched off: their outputs are set to zero for that step.
This means that on each training step, the network behaves like a slightly different, smaller network.
The key idea is simple: do not let neurons depend too much on each other.
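To make the mechanism concrete, here is a minimal NumPy sketch of what a single dropout step does to one layer's activations. This is only an illustration of the idea, not the Keras implementation, and the activation values are made up.

import numpy as np

rng = np.random.default_rng()
activations = np.array([0.8, 1.5, 0.3, 2.1, 0.9])  # made-up outputs of one layer
rate = 0.5                                          # probability of dropping each neuron

# Each neuron is kept with probability (1 - rate); dropped neurons output 0 for this step.
mask = rng.random(activations.shape) >= rate
print(activations * mask)  # a different subset is zeroed on every training step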
Why Dropout Works
When a network trains normally, some neurons become highly specialized and rely on specific other neurons.
This co-dependence increases the risk of overfitting.
Dropout breaks this dependency by forcing neurons to learn independently useful representations.
As a result, the final model becomes more robust.
Important Training vs Inference Difference
Dropout is applied only during training.
During inference (prediction), all neurons are active.
This is very important:
The network learns under difficult conditions, but performs at full capacity when making predictions.
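The short example below illustrates this behavior with a Keras Dropout layer; the input values are arbitrary. Keras uses "inverted dropout": surviving activations are scaled up by 1/(1 - rate) during training, so no extra rescaling is needed at inference.

import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 6), dtype="float32")

print(drop(x, training=True))   # roughly half the entries are zeroed, the rest scaled to 2.0
print(drop(x, training=False))  # unchanged: dropout is inactive at inference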
Dropout Probability Explained
Dropout uses a probability value, commonly called the rate.
If the dropout rate is 0.5, each neuron has a 50% chance of being dropped on any given training step, so about half of the layer's neurons are zeroed on average.
Typical values range from 0.2 to 0.5.
Code Example: Dropout Layer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation="relu"),
    Dropout(0.5),   # drops 50% of the first hidden layer's outputs during training
    Dense(64, activation="relu"),
    Dropout(0.3),   # drops 30% of the second hidden layer's outputs during training
    Dense(1)
])
Here, the Dropout layer after the first hidden layer zeroes 50% of that layer's outputs during training, and the one after the second hidden layer zeroes 30%.
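To see the model in use, it can be compiled and trained as usual: dropout is applied automatically inside fit() and switched off inside predict(). The data below is random placeholder data, used only so the example runs end to end.

import numpy as np

# Placeholder data: 1000 samples with 20 features and a single regression target.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)  # dropout active here
preds = model.predict(X[:5])                                    # dropout inactive here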
Why Different Dropout Rates?
Earlier layers often learn more general patterns, so they can tolerate higher dropout.
Deeper layers capture more specific patterns, so lower dropout is usually safer.
There is no perfect value — dropout is always tuned experimentally.
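As a rough illustration of that tuning process, one option is to train the same architecture with a few candidate rates and compare validation loss. The candidate values, data, and training settings below are placeholders, not recommendations.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

X = np.random.rand(1000, 20).astype("float32")   # placeholder data
y = np.random.rand(1000, 1).astype("float32")

def build_model(rate):
    # Same architecture as above, with the dropout rate as a tunable parameter.
    return Sequential([
        Dense(128, activation="relu"),
        Dropout(rate),
        Dense(64, activation="relu"),
        Dropout(rate),
        Dense(1),
    ])

for rate in [0.2, 0.3, 0.5]:
    model = build_model(rate)
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
    print(rate, history.history["val_loss"][-1])  # compare validation loss per rate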
Dropout Is Not Noise
Dropout may look like random noise, but it has a structured purpose.
Each neuron is forced to work without knowing which other neurons will be present.
This encourages redundancy and stability.
Common Mistakes with Dropout
Dropout should not be applied everywhere.
Using dropout on very small networks or with very little data can remove too much capacity, slowing learning and even leading to underfitting.
Dropout is most effective in large deep networks.
Exercises
Exercise 1:
Why does dropout reduce overfitting?
Exercise 2:
Is dropout active during prediction?
Quick Quiz
Q1. What does a dropout rate of 0.4 mean?
Q2. Why is dropout considered a regularization technique?
In the next lesson, we will explore Batch Normalization, a technique that improves training stability and allows deeper networks to train faster.