Convolution Operation
In the previous lesson, we understood why Convolutional Neural Networks are essential for image-based learning.
Now it is time to understand the core mathematical operation that makes CNNs powerful — convolution.
This lesson explains convolution slowly, logically, and practically.
What Is Convolution?
Convolution is a mathematical operation used to extract meaningful patterns from data.
In CNNs, convolution helps detect important visual features such as edges, curves, and textures.
Instead of treating each raw pixel independently, the model learns from patterns formed by groups of neighboring pixels.
The Filter (Kernel)
A convolution operation uses a small matrix called a filter or kernel.
Common filter sizes include:
• 3 × 3
• 5 × 5
• 7 × 7
These filters slide over the image, multiplying their values element-wise with the pixels they cover and summing the result.
How Convolution Works (Conceptually)
At each position:
• The filter overlaps part of the image
• Corresponding values are multiplied
• The results are summed
• A single output value is produced
This value represents how strongly that pattern appears in the image region.
Simple Numerical Example
Consider a small 3 × 3 image region and a 3 × 3 filter.
Image region:
[1 2 3
4 5 6
7 8 9]
Filter:
[1 0 -1
1 0 -1
1 0 -1]
We multiply corresponding values and sum the results.
(1×1) + (2×0) + (3×-1)
+ (4×1) + (5×0) + (6×-1)
+ (7×1) + (8×0) + (9×-1)
= (1 + 4 + 7) - (3 + 6 + 9)
= 12 - 18
= -6
This computation highlights vertical edge patterns: the filter compares the left column with the right column, so it responds strongly wherever intensity changes in the horizontal direction.
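As a quick check, the same arithmetic can be reproduced in a few lines of NumPy (a minimal sketch; the variable names are just for illustration).

import numpy as np

# The 3 × 3 image region and the vertical-edge filter from above
region = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Element-wise multiplication followed by a sum gives one output value
print(np.sum(region * kernel))  # -6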
Why Filters Detect Edges
Filters are designed to emphasize differences between neighboring pixels.
Edges represent sharp changes in intensity, so convolution naturally detects them.
Early CNN layers mainly learn:
• Horizontal edges
• Vertical edges
• Simple curves
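To see this in action, here is a minimal NumPy sketch (the tiny synthetic image is made up purely for illustration) that slides the vertical-edge filter across an image whose left half is dark and whose right half is bright. The response is non-zero only where the intensity jumps.

import numpy as np

# A 6 × 6 synthetic image: left half dark (0), right half bright (9)
image = np.zeros((6, 6), dtype=int)
image[:, 3:] = 9

# The vertical-edge filter from the earlier example
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Slide the filter over every 3 × 3 region (stride 1, no padding)
h, w = image.shape
k = kernel.shape[0]
feature_map = np.zeros((h - k + 1, w - k + 1), dtype=int)
for i in range(h - k + 1):
    for j in range(w - k + 1):
        feature_map[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)

print(feature_map)
# Values that are large in magnitude appear only where the 3 × 3 window straddles the edge.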
Multiple Filters, Multiple Features
A CNN does not use just one filter.
It uses dozens or hundreds of filters in a single layer.
Each filter learns a different pattern:
• One detects edges
• Another detects textures
• Another detects shapes
Each filter produces its own feature map.
Convolution in Code (Basic Example)
Let us see how convolution is defined in a deep learning framework.
from tensorflow.keras.layers import Conv2D

conv_layer = Conv2D(
    filters=32,
    kernel_size=(3, 3),
    activation='relu'
)
Here:
• 32 filters are learned
• Each filter is 3 × 3
• ReLU adds non-linearity
You do not manually design filters — the network learns them automatically.
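As a quick sanity check (a small sketch; the 28 × 28 grayscale input shape is only an assumption for illustration), you can pass a batch of random images through the layer and inspect the output shape.

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

conv_layer = Conv2D(filters=32, kernel_size=(3, 3), activation='relu')

# One fake 28 × 28 grayscale image: (batch, height, width, channels)
dummy_images = tf.random.normal([1, 28, 28, 1])

feature_maps = conv_layer(dummy_images)
print(feature_maps.shape)  # (1, 26, 26, 32): one 26 × 26 map per filter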
Stride: How the Filter Moves
Stride controls how far the filter moves at each step.
A stride of 1 means:
The filter moves one pixel at a time.
Larger strides reduce output size and computation.
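For example (a minimal sketch, reusing the assumed 28 × 28 dummy input), moving from stride 1 to stride 2 roughly halves each spatial dimension of the output.

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

dummy_images = tf.random.normal([1, 28, 28, 1])

stride_1 = Conv2D(filters=8, kernel_size=(3, 3), strides=1)(dummy_images)
stride_2 = Conv2D(filters=8, kernel_size=(3, 3), strides=2)(dummy_images)

print(stride_1.shape)  # (1, 26, 26, 8)
print(stride_2.shape)  # (1, 13, 13, 8)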
Padding: Preserving Image Size
Padding adds extra pixels around the image border.
This helps:
• Preserve spatial dimensions
• Avoid losing information near the image borders
Common padding types:
• Valid: no padding, so the output shrinks
• Same: enough zero-padding that the output size equals the input size (with stride 1)
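The difference is easy to see with the same dummy input as before (again just a sketch with an assumed 28 × 28 image).

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

dummy_images = tf.random.normal([1, 28, 28, 1])

valid = Conv2D(filters=8, kernel_size=(3, 3), padding='valid')(dummy_images)
same = Conv2D(filters=8, kernel_size=(3, 3), padding='same')(dummy_images)

print(valid.shape)  # (1, 26, 26, 8): no padding, the map shrinks
print(same.shape)   # (1, 28, 28, 8): zero-padding preserves the size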
Why Convolution Is Efficient
Convolution drastically reduces the number of parameters.
Instead of millions of weights, CNNs reuse the same filter weights across the image.
This makes CNNs:
• Faster
• More memory-efficient
• Better at generalization
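To put rough numbers on this (a sketch that assumes a 28 × 28 grayscale input), compare the parameter count of the 3 × 3 convolutional layer with a fully connected layer that produces an output of similar size.

import tensorflow as tf
from tensorflow.keras import layers

# Convolution: 32 filters of size 3 × 3 on a single-channel input
conv_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3)),
])
print(conv_model.count_params())  # 320 (3*3*1*32 weights + 32 biases)

# A dense layer producing an output of comparable size (26 * 26 * 32 values)
dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    layers.Dense(26 * 26 * 32),
])
print(dense_model.count_params())  # roughly 17 million parameters vs. a few hundred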
Mini Practice
Think about this:
Why would using a 3 × 3 filter be better than a 10 × 10 filter in early layers?
Exercises
Exercise 1:
What is the purpose of a convolution filter?
Exercise 2:
Why are multiple filters used in a CNN layer?
Quick Quiz
Q1. What does stride control?
Q2. Are the filter values defined manually, or learned by the network during training?
In the next lesson, we will explore how CNNs reduce spatial dimensions using pooling layers, and why pooling improves robustness.