DL Lesson 32 – Convolution Operation | Dataplexa

Convolution Operation

In the previous lesson, we understood why Convolutional Neural Networks are essential for image-based learning.

Now it is time to understand the core mathematical operation that makes CNNs powerful — convolution.

This lesson explains convolution slowly, logically, and practically.


What Is Convolution?

Convolution is a mathematical operation used to extract meaningful patterns from data.

In CNNs, convolution helps detect important visual features such as edges, curves, and textures.

Instead of learning from raw pixels, the model learns from patterns inside pixels.


The Filter (Kernel)

A convolution operation uses a small matrix called a filter or kernel.

Common filter sizes include:

• 3 × 3
• 5 × 5
• 7 × 7

These filters slide over the image, multiplying each overlapping region element-wise and summing the result.


How Convolution Works (Conceptually)

At each position:

• The filter overlaps part of the image
• Corresponding values are multiplied
• The results are summed
• A single output value is produced

This value represents how strongly that pattern appears in the image region.


Simple Numerical Example

Consider a small 3 × 3 image region and a 3 × 3 filter.

Image region:
[1  2  3
 4  5  6
 7  8  9]

Filter:
[1  0 -1
 1  0 -1
 1  0 -1]

We multiply corresponding values and sum the results:

(1×1) + (2×0) + (3×-1)
+ (4×1) + (5×0) + (6×-1)
+ (7×1) + (8×0) + (9×-1)
= -6

This filter weights the left column positively and the right column negatively, so the computation highlights vertical edge patterns.
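The same multiply-and-sum can be checked with a small sketch (NumPy is an assumption here; any array library works):

```python
import numpy as np

# Image region and filter from the example above
region = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Element-wise multiply, then sum -> one output value
output = np.sum(region * kernel)
print(output)  # -6
```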


Why Filters Detect Edges

Filters are designed to emphasize differences between neighboring pixels.

Edges represent sharp changes in intensity, so convolution naturally detects them.

Early CNN layers mainly learn:

• Horizontal edges
• Vertical edges
• Simple curves
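To see why this works, here is a sketch (again assuming NumPy) that slides the vertical-edge filter from the earlier example across a tiny image containing one sharp vertical edge. The response is zero where intensity is flat and large where it changes:

```python
import numpy as np

# A 3x5 image: bright left region, dark right region (sharp vertical edge)
image = np.array([[9, 9, 9, 0, 0],
                  [9, 9, 9, 0, 0],
                  [9, 9, 9, 0, 0]])

# The vertical-edge filter from the worked example
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Slide the 3x3 filter over every valid position (stride 1, no padding)
out_h = image.shape[0] - 2
out_w = image.shape[1] - 2
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map)  # flat region -> 0, positions touching the edge -> 27
```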


Multiple Filters, Multiple Features

A CNN does not use just one filter.

It uses dozens or hundreds of filters in a single layer.

Each filter learns a different pattern:

• One detects edges
• Another detects textures
• Another detects shapes

Each filter produces its own feature map.


Convolution in Code (Basic Example)

Let us see how convolution is defined in a deep learning framework.

from tensorflow.keras.layers import Conv2D

conv_layer = Conv2D(
    filters=32,
    kernel_size=(3,3),
    activation='relu'
)

Here:

• 32 filters are learned
• Each filter is 3 × 3
• ReLU adds non-linearity

You do not manually design filters — the network learns them automatically.
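As a quick sketch of how this layer behaves (assuming TensorFlow is installed; the input is random illustrative data, not a real image):

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# A batch of one 28x28 grayscale image (illustrative random values)
images = tf.random.normal((1, 28, 28, 1))

conv_layer = Conv2D(filters=32, kernel_size=(3, 3), activation='relu')
feature_maps = conv_layer(images)

# One 26x26 feature map per filter (28 - 3 + 1 = 26 with no padding)
print(feature_maps.shape)          # (1, 26, 26, 32)

# Only 3*3*1*32 weights + 32 biases = 320 learnable parameters
print(conv_layer.count_params())   # 320
```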


Stride: How the Filter Moves

Stride controls how far the filter moves at each step.

A stride of 1 means:

The filter moves one pixel at a time.

Larger strides reduce output size and computation.
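The effect of stride on output size follows the standard formula: for an n-wide input, f-wide filter, and stride s with no padding, the output width is floor((n - f) / s) + 1. A minimal helper (the function name is our own):

```python
def conv_output_size(n, f, s):
    """Output width for an n-wide input, f-wide filter, stride s, no padding."""
    return (n - f) // s + 1

print(conv_output_size(28, 3, 1))  # 26
print(conv_output_size(28, 3, 2))  # 13 -- doubling the stride roughly halves the output
```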


Padding: Preserving Image Size

Padding adds extra pixels around the image border.

This helps:

• Preserve spatial dimensions
• Avoid losing edge information

Common padding types:

• Valid (no padding)
• Same (output size = input size)
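The two padding modes can be compared with a small sketch (the helper below is our own; "same" here follows the common convention of padding so that a stride-1 output matches the input size):

```python
def output_size(n, f, s=1, padding='valid'):
    """Output width for an n-wide input and f-wide filter."""
    if padding == 'valid':
        # No padding: the filter must fit entirely inside the image
        return (n - f) // s + 1
    elif padding == 'same':
        # Enough border padding that output = ceil(n / s); equals n when s = 1
        return -(-n // s)

print(output_size(28, 3, padding='valid'))  # 26 -- two rows/columns are lost
print(output_size(28, 3, padding='same'))   # 28 -- input size is preserved
```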


Why Convolution Is Efficient

Convolution drastically reduces the number of parameters.

Instead of millions of weights, CNNs reuse the same filter weights across the image.

This makes CNNs:

• Faster
• More memory-efficient
• Better at generalization
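The savings from weight sharing are easy to quantify. As a hypothetical comparison, suppose a 28 × 28 grayscale input produces a 26 × 26 × 32 output, first with a fully connected layer and then with a 3 × 3 convolution using 32 filters:

```python
# Hypothetical sizes: 28x28 grayscale input, 26x26x32 output
inputs = 28 * 28                           # 784 input pixels
outputs = 26 * 26 * 32                     # 21632 output values

# Fully connected: every output connects to every input pixel, plus biases
dense_params = inputs * outputs + outputs  # 16,981,120 parameters

# Convolution: 32 shared 3x3 filters (1 input channel), plus 32 biases
conv_params = 3 * 3 * 1 * 32 + 32          # 320 parameters

print(dense_params)  # 16981120
print(conv_params)   # 320
```

The convolutional layer reaches the same output shape with tens of thousands of times fewer parameters, because every filter's weights are reused at every image position.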


Mini Practice

Think about this:

Why would using a 3 × 3 filter be better than a 10 × 10 filter in early layers?


Exercises

Exercise 1:
What is the purpose of a convolution filter?

To detect specific patterns such as edges or textures in an image.

Exercise 2:
Why are multiple filters used in a CNN layer?

Each filter learns a different visual feature.

Quick Quiz

Q1. What does stride control?

How far the filter moves across the image at each step.

Q2. Does the CNN manually define filter values?

No, the network learns filter values during training.

In the next lesson, we will explore how CNNs reduce spatial dimensions using pooling layers, and why pooling improves robustness.