Regularization
In the previous lesson, we identified overfitting as one of the biggest challenges in deep learning.
In this lesson, we learn how deep learning models are controlled so that they learn meaningful patterns instead of memorizing noise.
This control mechanism is called regularization.
What Is Regularization?
Regularization is a collection of techniques used to prevent a neural network from becoming too complex.
Instead of allowing the model to freely adjust its weights, regularization gently restricts how large or aggressive those weights can become.
The goal is not to reduce accuracy, but to improve generalization.
Why Deep Learning Needs Regularization
Deep neural networks often contain millions of parameters.
With enough capacity, a model can memorize the training data perfectly without understanding the true relationship between inputs and outputs.
Regularization forces the model to learn simpler, more robust patterns that perform well on unseen data.
Intuition Behind Regularization
Imagine drawing a curve through data points.
A very complex curve may pass through every point perfectly, but small changes in new data will cause large prediction errors.
A smoother curve may not fit every point exactly, but it behaves more reliably.
Regularization encourages smoother solutions.
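As a sketch of this intuition (using numpy and synthetic noisy data, not anything from the lesson itself), we can fit two polynomials of different complexity to the same points and compare how wildly each curve swings between them:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)  # noisy samples

# A degree-9 polynomial passes (almost) exactly through all 10 points...
complex_fit = Polynomial.fit(x, y, deg=9)
# ...while a degree-3 polynomial gives a smoother, more stable curve.
simple_fit = Polynomial.fit(x, y, deg=3)

x_new = np.linspace(0, 1, 100)
# The complex curve typically swings much more between the training points,
# which is exactly the instability regularization tries to avoid.
complex_range = np.ptp(complex_fit(x_new))
simple_range = np.ptp(simple_fit(x_new))
print(complex_range, simple_range)
```

The high-degree fit achieves near-zero training error, yet its large oscillations make its predictions on new inputs far less reliable than the smoother fit.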
L2 Regularization (Weight Decay)
L2 regularization penalizes large weights by adding their squared values to the loss function.
This discourages the model from relying heavily on any single feature.
Mathematically, the loss becomes:
Loss = Original Loss + λ × (sum of squared weights)
Here, λ (lambda) controls how strong the regularization is.
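To make the formula concrete, here is a minimal sketch (plain numpy, with made-up weights and an illustrative λ) of how the L2 penalty is added to the original loss:

```python
import numpy as np

def l2_penalized_loss(original_loss, weights, lam):
    """Add lambda times the sum of squared weights to the loss."""
    return original_loss + lam * np.sum(np.square(weights))

weights = np.array([0.5, -1.0, 2.0])  # example weights (made up)
# sum of squared weights = 0.25 + 1.0 + 4.0 = 5.25
# penalized loss = 1.0 + 0.01 * 5.25 = 1.0525
print(l2_penalized_loss(1.0, weights, lam=0.01))
```

Note how a larger λ would scale the 5.25 penalty up, pushing the optimizer harder toward smaller weights, while λ = 0 recovers the original loss.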
Code Example: L2 Regularization
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

model = Sequential([
    # Each layer adds 0.01 * (sum of its squared kernel weights) to the training loss
    Dense(128, activation="relu", kernel_regularizer=l2(0.01)),
    Dense(64, activation="relu", kernel_regularizer=l2(0.01)),
    Dense(1)
])
The regularizer quietly influences training without changing how predictions are made.
L1 Regularization
L1 regularization penalizes the absolute values of weights.
This encourages many weights to become exactly zero, effectively performing feature selection.
L1 is useful when we suspect only a few features are truly important.
Code Example: L1 Regularization
from tensorflow.keras.regularizers import l1

model = Sequential([
    # Each layer adds 0.001 * (sum of absolute kernel weights) to the training loss
    Dense(128, activation="relu", kernel_regularizer=l1(0.001)),
    Dense(64, activation="relu", kernel_regularizer=l1(0.001)),
    Dense(1)
])
Compared to L2, L1 creates sparser models.
L1 vs L2 (Practical Difference)
L2 regularization spreads learning across many features, keeping weights small but non-zero.
L1 regularization aggressively removes weak features by driving weights to zero.
In practice, L2 is more commonly used in deep learning, while L1 is used when interpretability matters.
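One way to see this difference (a numpy sketch of the penalty gradients, not Keras internals) is to compare how each penalty pulls on weights of different sizes. The L2 gradient shrinks along with the weight, while the L1 gradient stays constant, which is what pushes small weights all the way to zero:

```python
import numpy as np

lam = 0.01
w = np.array([1.0, 0.1, 0.001])  # example weights of decreasing size

# Gradient of the L2 penalty, lam * w**2, is 2 * lam * w:
# the pull weakens as w approaches zero, so weights stay small but non-zero.
l2_grad = 2 * lam * w

# Gradient of the L1 penalty, lam * |w|, is lam * sign(w):
# a constant pull regardless of magnitude, so tiny weights get driven to zero.
l1_grad = lam * np.sign(w)

print(l2_grad)  # shrinks with w
print(l1_grad)  # same magnitude for every weight
```

This constant pull is why L1-trained models end up sparse, while L2-trained models keep many small non-zero weights.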
Regularization Is Not a Hack
Regularization is not used to fix bad data or poor design.
It is a principled way to guide learning when model capacity exceeds what the available data can support.
Almost every successful deep learning system uses regularization in some form.
Exercises
Exercise 1:
Why does regularization reduce overfitting?
Exercise 2:
Which regularization encourages sparse models?
Quick Quiz
Q1. What does the lambda parameter control?
Q2. Does regularization change inference behavior?
In the next lesson, we will study Dropout, one of the most powerful and practical regularization techniques used in modern deep learning.