DL Lesson 40 – ResNet | Dataplexa

ResNet (Residual Networks)

In the previous lesson, we studied the VGG architecture and saw how increasing depth can improve learning capacity.

However, very deep networks introduced a serious and unexpected problem.

This lesson explains that problem and how ResNet solved it.


The Problem with Very Deep Networks

Intuitively, adding more layers should make a network stronger.

But in practice, researchers observed something strange:

As networks became very deep, training accuracy started to degrade, not improve.

This was not overfitting, because the accuracy dropped on the training data itself. It was an optimization problem.


Why Depth Became a Problem

The main issue was vanishing gradients.

During backpropagation, gradients become smaller as they travel through many layers.

Eventually, earlier layers receive almost no learning signal.

This makes deep networks hard or impossible to train.
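To get a feel for the effect, here is a toy numeric sketch (an illustration, not a real network): assume each layer scales the gradient by a factor of roughly 0.8 during backpropagation, and see what reaches the earliest layers as depth grows.

# Toy illustration of vanishing gradients: the learning signal shrinks
# multiplicatively with depth.
per_layer_factor = 0.8

for depth in [5, 20, 50, 100]:
    gradient_scale = per_layer_factor ** depth
    print(f"depth={depth:3d}  gradient scale ~ {gradient_scale:.2e}")

At depth 100 the factor is on the order of 10^-10, so the early layers receive essentially no learning signal.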


Another Hidden Problem: Degradation

Even when vanishing gradients were reduced using better activations and normalization, a new problem appeared.

Deeper networks performed worse than shallower ones, even on training data.

This is called the degradation problem.

More layers actually hurt performance.


The Key Insight Behind ResNet

Researchers asked a simple question:

What if the extra layers do nothing useful? Could they at least pass the input through unchanged?

Ideally, the extra layers should be able to learn an identity function, meaning they leave the input unchanged. In that case, a deeper network should never perform worse than a shallower one.

But in practice, plain stacked layers found it surprisingly difficult to learn identity mappings.


Residual Learning Concept

Instead of learning the full mapping directly, ResNet learns the residual.

Mathematically:

Instead of learning:

H(x)

The network learns:

F(x) = H(x) − x

And then reconstructs:

H(x) = F(x) + x
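A minimal numeric sketch makes the point. Here F is a made-up residual branch that outputs zeros, standing in for a block whose weights have been pushed toward zero:

import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Hypothetical residual branch: if its weights are near zero, F(x) is near zero ...
def F(x):
    return np.zeros_like(x)

# ... and the block output collapses to the identity mapping H(x) = x.
H = F(x) + x
print(H)   # [1. 2. 3.], identical to the input

So to behave like an identity, a residual block only has to learn "do nothing" (output zero), which is much easier than learning to copy the input through a stack of weights.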


What Is a Residual Connection?

A residual connection simply adds the input directly to the output of a block.

This is often called a skip connection.

It allows gradients to flow directly through the network.
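A small sketch with tf.GradientTape shows this concretely. F here is just a stand-in function with a tiny contribution, not a real layer; the point is that the skip path adds a direct term to the gradient:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

# Stand-in residual branch with a tiny contribution
def F(x):
    return 1e-3 * tf.sin(x)

with tf.GradientTape() as tape:
    y = tf.reduce_sum(F(x) + x)   # output of a block with a skip connection

grad = tape.gradient(y, x)
print(grad.numpy())   # close to [1. 1. 1.]: the skip path carries the gradient through

Even if the residual branch contributes almost nothing, the gradient stays close to 1 instead of vanishing.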


Residual Block Structure

A basic residual block works like this:

Input → Conv → ReLU → Conv → Add input (skip connection) → ReLU

If the convolutions learn nothing useful (their output is close to zero), the block simply passes the input through and behaves like an identity function.


Why Residual Connections Work

Residual connections:

Preserve gradient flow
Prevent degradation
Make optimization easier
Allow very deep networks

With ResNet, networks with 50, 101, or even 152 layers became trainable.


ResNet in Code (Conceptual)

Below is a simplified residual block using Keras.

from tensorflow.keras.layers import Conv2D, Add, ReLU

def residual_block(x, filters):
    # Keep a reference to the input so it can be added back later
    shortcut = x

    # Two 3x3 convolutions form the residual function F(x)
    x = Conv2D(filters, (3, 3), padding="same")(x)
    x = ReLU()(x)

    x = Conv2D(filters, (3, 3), padding="same")(x)

    # Skip connection: add the original input, giving H(x) = F(x) + x
    # (filters must match the number of input channels for the addition to work)
    x = Add()([x, shortcut])
    x = ReLU()(x)

    return x
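For example, a few of these blocks can be chained into a small model. The input shape and filter count below are just illustrative choices; the initial convolution projects the image to 64 channels so the skip additions line up:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D

inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding="same")(inputs)   # project to 64 channels first
x = residual_block(x, 64)
x = residual_block(x, 64)
model = Model(inputs, x)
model.summary()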

This small idea changed deep learning forever.


Different ResNet Variants

Popular versions include:

ResNet-18
ResNet-34
ResNet-50
ResNet-101
ResNet-152

The number indicates the depth of the network, counted as the number of weighted layers (convolutional and fully connected).
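These variants ship with Keras, so as a quick illustration, a pre-trained ResNet-50 can be loaded in a couple of lines (the ImageNet weights download automatically on first use):

from tensorflow.keras.applications import ResNet50

# Load ResNet-50 with ImageNet weights
model = ResNet50(weights="imagenet")
model.summary()   # prints the full 50-layer architecture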


Bottleneck Blocks

For very deep models, ResNet introduced bottleneck blocks.

They use 1×1 convolutions to reduce the number of channels before the expensive 3×3 convolution and then restore them afterwards, keeping computation manageable.

This makes extremely deep networks feasible without an explosion in computational cost.
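Below is a sketch of a bottleneck block in the same style as the earlier example. It omits batch normalization (which the real ResNet uses) and assumes the input already has 4 * filters channels so the final addition lines up:

from tensorflow.keras.layers import Conv2D, Add, ReLU

def bottleneck_block(x, filters):
    shortcut = x

    x = Conv2D(filters, (1, 1))(x)                   # 1x1 conv: reduce channels
    x = ReLU()(x)

    x = Conv2D(filters, (3, 3), padding="same")(x)   # 3x3 conv on the smaller representation
    x = ReLU()(x)

    x = Conv2D(4 * filters, (1, 1))(x)               # 1x1 conv: restore channels
    x = Add()([x, shortcut])
    x = ReLU()(x)

    return x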


Real-World Analogy

Think of learning as studying from notes.

Residual connections are like keeping the original notes while adding improvements.

If new notes are useless, you still have the original knowledge.


Why ResNet Changed Everything

Most modern architectures — DenseNet, EfficientNet, Transformers — are inspired by residual connections.

ResNet proved that depth is powerful when designed correctly.


Mini Practice

Think carefully:

Why does learning F(x) make optimization easier than learning H(x) directly?


Exercises

Exercise 1:
What problem does ResNet primarily solve?

The degradation problem in very deep networks.

Exercise 2:
What is the role of skip connections?

They allow gradients and information to flow directly across layers.

Quick Quiz

Q1. What does ResNet learn instead of H(x)?

The residual function F(x).

Q2. Why are very deep networks hard to train without ResNet?

Because gradients vanish and optimization becomes unstable.

In the next lesson, we will explore Transfer Learning and see how pre-trained ResNet models are reused in real-world applications.