ResNet (Residual Networks)
In the previous lesson, we studied the VGG architecture and learned how increasing depth improves learning capacity.
However, very deep networks introduced a serious and unexpected problem.
This lesson explains that problem and how ResNet solved it.
The Problem with Very Deep Networks
Intuitively, adding more layers should make a network stronger.
But in practice, researchers observed something strange:
As networks became very deep, training accuracy started to degrade, not improve.
This was not overfitting. It was a training problem.
Why Depth Became a Problem
The main issue was vanishing gradients.
During backpropagation, gradients become smaller as they travel through many layers.
Eventually, earlier layers receive almost no learning signal.
This makes deep networks hard or impossible to train.
Another Hidden Problem: Degradation
Even when vanishing gradients were reduced using better activations and normalization, a new problem appeared.
Deeper networks performed worse than shallower ones, even on training data.
This is called the degradation problem.
More layers actually hurt performance.
The Key Insight Behind ResNet
Researchers asked a simple question:
What if deeper layers do nothing useful?
Ideally, extra layers should learn an identity function — meaning they should not change the input.
But learning identity mappings turned out to be difficult.
Residual Learning Concept
Instead of learning the full mapping directly, ResNet learns the residual.
Mathematically:
Instead of learning:
H(x)
The network learns:
F(x) = H(x) − x
And then reconstructs:
H(x) = F(x) + x
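To see why this formulation helps, here is a minimal numeric sketch in plain NumPy (an illustration, not part of ResNet itself): if the residual branch F outputs zeros, the block output is exactly the input.

import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Hypothetical residual branch that has learned nothing useful:
def F(x):
    return np.zeros_like(x)

H = F(x) + x   # block output H(x) = F(x) + x
print(H)       # [1. 2. 3.] -- identical to the input, an identity mapping

Pushing F toward zero is far easier for gradient descent than forcing a stack of nonlinear layers to reproduce its input exactly.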
What Is a Residual Connection?
A residual connection simply adds the input directly to the output of a block.
This is often called a skip connection.
It allows gradients to flow directly through the network.
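As a small illustration (a sketch, not actual ResNet training code), TensorFlow's GradientTape shows that the derivative of y = F(x) + x always contains a direct path back to the input, even when the residual branch contributes nothing:

import tensorflow as tf

x = tf.constant(2.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    f = 0.0 * x        # stand-in residual branch that outputs nothing useful
    y = f + x          # skip connection: output = F(x) + x
print(tape.gradient(y, x).numpy())  # 1.0 -- the gradient reaches x through the skip path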
Residual Block Structure
A basic residual block works like this:
Input → Conv → ReLU → Conv → Add Input → ReLU
If the convolutions learn nothing useful and their output stays near zero, the addition simply passes the input through, so the block behaves like an identity function.
Why Residual Connections Work
Residual connections:
- Preserve gradient flow
- Prevent degradation
- Make optimization easier
- Allow very deep networks
With ResNet, networks with 50, 101, or even 152 layers became trainable.
ResNet in Code (Conceptual)
Below is a simplified residual block using Keras.
from tensorflow.keras.layers import Conv2D, Add, ReLU
def residual_block(x, filters):
    # Keep a reference to the block input (the skip / shortcut path).
    # Assumes x already has `filters` channels so the shapes match for the addition.
    shortcut = x

    # Two 3x3 convolutions form the residual branch F(x).
    x = Conv2D(filters, (3, 3), padding="same")(x)
    x = ReLU()(x)
    x = Conv2D(filters, (3, 3), padding="same")(x)

    # Add the original input back: output = F(x) + x.
    x = Add()([x, shortcut])
    x = ReLU()(x)
    return x
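As a usage sketch (hypothetical input shape and layer choices, not the original ResNet configuration), the block can be stacked inside a small Keras functional model:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, Dense

inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding="same")(inputs)  # bring the input to 64 channels
x = residual_block(x, 64)                       # channel counts match, so the add works
x = residual_block(x, 64)
x = GlobalAveragePooling2D()(x)
outputs = Dense(10, activation="softmax")(x)

model = Model(inputs, outputs)
model.summary()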
This small idea changed deep learning forever.
Different ResNet Variants
Popular versions include:
- ResNet-18
- ResNet-34
- ResNet-50
- ResNet-101
- ResNet-152
The number indicates how many weight layers (convolutional and fully connected) the network contains.
Bottleneck Blocks
For very deep models, ResNet introduced bottleneck blocks.
They use 1×1 convolutions to reduce and restore dimensions, making computation efficient.
This allows extremely deep networks without exploding cost.
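Here is a minimal sketch of such a bottleneck block in the same Keras style (simplified: real ResNet-50 blocks also use batch normalization and a projection shortcut when shapes change):

from tensorflow.keras.layers import Conv2D, Add, ReLU

def bottleneck_block(x, filters):
    # Assumes x already has filters * 4 channels so the identity shortcut matches.
    shortcut = x
    x = Conv2D(filters, (1, 1), padding="same")(x)       # 1x1: reduce channels
    x = ReLU()(x)
    x = Conv2D(filters, (3, 3), padding="same")(x)       # 3x3: spatial processing
    x = ReLU()(x)
    x = Conv2D(filters * 4, (1, 1), padding="same")(x)   # 1x1: restore channels
    x = Add()([x, shortcut])
    x = ReLU()(x)
    return x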
Real-World Analogy
Think of learning as studying from notes.
Residual connections are like keeping the original notes while adding improvements.
If new notes are useless, you still have the original knowledge.
Why ResNet Changed Everything
Most modern architectures — DenseNet, EfficientNet, Transformers — are inspired by residual connections.
ResNet proved that depth is powerful when designed correctly.
Mini Practice
Think carefully:
Why does learning F(x) make optimization easier than learning H(x) directly?
Exercises
Exercise 1:
What problem does ResNet primarily solve?
Exercise 2:
What is the role of skip connections?
Quick Quiz
Q1. What does ResNet learn instead of H(x)?
Q2. Why are very deep networks hard to train without ResNet?
In the next lesson, we will explore Transfer Learning and see how pre-trained ResNet models are reused in real-world applications.