DL Lesson 11 – Regularization | Dataplexa

Regularization

In the previous lesson, we identified one of the biggest challenges in deep learning: overfitting.

In this lesson, we learn how to control deep learning models so that they learn meaningful patterns instead of memorizing noise.

This control mechanism is called regularization.


What Is Regularization?

Regularization is a collection of techniques used to prevent a neural network from becoming too complex.

Instead of allowing the model to freely adjust its weights, regularization gently restricts how large or aggressive those weights can become.

The goal is not to reduce accuracy, but to improve generalization.


Why Deep Learning Needs Regularization

Deep neural networks often contain millions of parameters.

With enough capacity, a model can memorize the training data perfectly without understanding the true relationship between inputs and outputs.

Regularization forces the model to learn simpler, more robust patterns that perform well on unseen data.


Intuition Behind Regularization

Imagine drawing a curve through data points.

A very complex curve may pass through every point perfectly, but small changes in new data will cause large prediction errors.

A smoother curve may not fit every point exactly, but it behaves more reliably.

Regularization encourages smoother solutions.


L2 Regularization (Weight Decay)

L2 regularization penalizes large weights by adding their squared values to the loss function.

This discourages the model from relying heavily on any single feature.

Mathematically, the loss becomes:

Loss = Original Loss + λ × (sum of squared weights)

Here, λ (lambda) controls how strong the regularization is.
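To make the formula concrete, here is a small NumPy sketch that computes the penalized loss by hand. The weight values and the original loss are made up for illustration:

```python
import numpy as np

# Hypothetical weights from two layers (illustrative values only)
weights = [np.array([0.5, -1.2, 0.3]), np.array([0.8, -0.4])]

lam = 0.01            # lambda: regularization strength
original_loss = 2.0   # assumed task loss, e.g. a mean squared error

# L2 penalty: lambda times the sum of all squared weights
l2_penalty = lam * sum(np.sum(w ** 2) for w in weights)
total_loss = original_loss + l2_penalty

print(total_loss)  # slightly larger than the original loss
```

Notice that larger weights contribute quadratically to the penalty, so the optimizer is pushed toward many small weights rather than a few large ones.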


Code Example: L2 Regularization

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(128, activation="relu", kernel_regularizer=l2(0.01)),
    Dense(64, activation="relu", kernel_regularizer=l2(0.01)),
    Dense(1)
])

The regularizer quietly influences training without changing how predictions are made.


L1 Regularization

L1 regularization penalizes the absolute values of weights.

This encourages many weights to become exactly zero, effectively performing feature selection.

L1 is useful when we suspect only a few features are truly important.


Code Example: L1 Regularization

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l1

model = Sequential([
    Dense(128, activation="relu", kernel_regularizer=l1(0.001)),
    Dense(64, activation="relu", kernel_regularizer=l1(0.001)),
    Dense(1)
])

Compared to L2, L1 creates sparser models.


L1 vs L2 (Practical Difference)

L2 regularization spreads learning across many features, keeping weights small but non-zero.

L1 regularization aggressively removes weak features by driving weights to zero.

In practice, L2 is more commonly used in deep learning, while L1 is used when interpretability matters.


Regularization Is Not a Hack

Regularization is not used to fix bad data or poor design.

It is a principled way to guide learning when model capacity exceeds the amount of available training data.

Almost every successful deep learning system uses regularization in some form.


Exercises

Exercise 1:
Why does regularization reduce overfitting?

It limits model complexity by keeping weights small, so the model cannot fit noise in the training data as easily.

Exercise 2:
Which regularization encourages sparse models?

L1 regularization.

Quick Quiz

Q1. What does the lambda parameter control?

The strength of regularization.

Q2. Does regularization change inference behavior?

No, it only affects training.

In the next lesson, we will study Dropout, one of the most powerful and practical regularization techniques used in modern deep learning.