DL Lesson 30 – Gradient Checking

Gradient Checking

In the previous lesson, we built a neural network from scratch and discussed how gradients guide learning.

Now comes a very important question:

How do we know our gradients are correct?

This is where gradient checking becomes essential.


Why Gradient Errors Are Dangerous

Backpropagation relies on calculus.

A small mistake in a derivative calculation can silently break learning.

The model may:

• Train extremely slowly
• Never converge
• Appear to behave randomly

And you may never know why.


What Is Gradient Checking?

Gradient checking is a debugging technique.

It compares:

• Analytical gradients (from backpropagation)
• Numerical gradients (from approximation)

If both match closely, your implementation is correct.


Numerical Gradient Intuition

Instead of deriving the gradient with calculus, we estimate it by nudging the weight slightly and measuring how the loss changes.

Think of standing on a hill.

You move slightly forward, measure height change, and estimate slope.


Numerical Gradient Formula

We approximate the gradient using:

(f(w + ε) − f(w − ε)) / (2ε)

Where ε is a very small number such as 1e-5 (0.00001).
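
As a reusable sketch of this formula (the helper name numerical_gradient below is our own choice, not a library function), the centered difference can be wrapped in a small Python function:

def numerical_gradient(f, w, epsilon=1e-5):
    # Centered-difference estimate of df/dw at w
    return (f(w + epsilon) - f(w - epsilon)) / (2 * epsilon)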


Step 1: Define a Simple Loss Function

We start with a simple squared error loss.

def loss(w):
    x = 0.5            # single input value
    y = 1              # target output
    prediction = w * x
    return (prediction - y) ** 2   # squared error

This keeps calculations clear and focused.
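
As a quick sanity check, loss(0.4) (the weight used in the next step) evaluates to (0.4 · 0.5 − 1)² = 0.64.

print(loss(0.4))   # (0.4 * 0.5 - 1) ** 2 = 0.64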


Step 2: Compute Numerical Gradient

We approximate the gradient numerically.

epsilon = 1e-5
w = 0.4

# Centered-difference estimate of the gradient at w
grad_approx = (loss(w + epsilon) - loss(w - epsilon)) / (2 * epsilon)
print(grad_approx)

This gives us the slope estimate.


Step 3: Compute Analytical Gradient

Now compute the gradient manually.

x = 0.5
y = 1
prediction = w * x

# d/dw of (w * x - y) ** 2  is  2 * (w * x - y) * x
grad_analytic = 2 * (prediction - y) * x
print(grad_analytic)

This gradient comes from calculus.
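
For the values above it works out to 2 · (0.2 − 1) · 0.5 = −0.8, and the numerical estimate from Step 2 should land very close to the same number.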


Step 4: Compare the Gradients

We compare both values.

If they are very close, the implementation is correct.

Small numerical differences are normal, but large differences indicate bugs.
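
One common way to do the comparison is a relative difference, so the check does not depend on the overall scale of the gradients. This is a sketch using the grad_approx and grad_analytic values computed above; the 1e-7 threshold is a typical rule of thumb, not a hard standard.

# Relative difference between the two gradients
diff = abs(grad_approx - grad_analytic) / max(abs(grad_approx) + abs(grad_analytic), 1e-12)

if diff < 1e-7:
    print("Gradient check passed:", diff)
else:
    print("Gradients disagree, likely a bug:", diff)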


When Should You Use Gradient Checking?

Gradient checking is slow.

So it is used only:

• During development
• For small models
• For debugging

It is NEVER used during real training.
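
To see where the cost comes from, here is a minimal sketch (our own illustration, not a standard API) that checks every entry of a weight vector. Each parameter needs two extra loss evaluations, so the work grows with the number of parameters.

import numpy as np

def check_gradients(loss_fn, grad_fn, w, epsilon=1e-5):
    # Compare analytical gradients to numerical estimates, one parameter at a time
    grad_analytic = grad_fn(w)
    grad_numeric = np.zeros_like(w)
    for i in range(w.size):                       # two forward passes per parameter
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += epsilon
        w_minus[i] -= epsilon
        grad_numeric[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * epsilon)
    return np.max(np.abs(grad_numeric - grad_analytic))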


Why Frameworks Rarely Need This

Libraries like TensorFlow and PyTorch compute gradients automatically, and their implementations are well tested.

But when you:

• Write custom loss functions
• Implement research ideas
• Modify backprop logic

gradient checking becomes essential.
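
For example, PyTorch ships a built-in checker, torch.autograd.gradcheck, which performs the same comparison automatically. The sketch below assumes PyTorch is installed and uses double precision, which gradcheck expects; the custom_loss function is just a stand-in for your own operation.

import torch

def custom_loss(w):
    # A stand-in for a custom operation whose backward pass we want to verify
    x = torch.tensor(0.5, dtype=torch.double)
    y = torch.tensor(1.0, dtype=torch.double)
    return (w * x - y) ** 2

w = torch.tensor(0.4, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(custom_loss, (w,)))   # True if analytic and numeric gradients match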


Real-World Insight

Many research implementations fail not because the idea is bad, but because the gradients are wrong.

Gradient checking saves weeks of confusion.


Mini Practice

Try changing:

• Weight values
• Epsilon size

Observe how numerical gradients behave.
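
As a starting point, here is a small sketch that sweeps a few epsilon values against the loss defined earlier; the specific values are arbitrary choices for experimentation.

w = 0.4

# The analytical gradient at w = 0.4 is 2 * (0.2 - 1) * 0.5 = -0.8
for epsilon in (1e-1, 1e-3, 1e-5, 1e-7):
    grad_approx = (loss(w + epsilon) - loss(w - epsilon)) / (2 * epsilon)
    print(epsilon, grad_approx)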


Exercises

Exercise 1:
Why should epsilon be very small?

The smaller ε is, the closer the estimate is to the true derivative, although making it too small can introduce floating-point rounding errors.

Exercise 2:
Why is gradient checking slow?

It requires two extra forward passes for every single parameter, which becomes very expensive for large models.

Quick Quiz

Q1. What does gradient checking compare?

Numerical gradients and analytical gradients.

Q2. Should gradient checking be used during training?

No, it is only for debugging.

In the next lesson, we move into a new phase of Deep Learning: Convolutional Neural Networks (CNNs), and explore why they work so well with images.