Convolutions and Filters
In the previous lesson, you learned that CNNs work by learning features automatically. Now we zoom into the most important operation behind that learning: convolution.
If you truly understand convolutions, CNNs will never feel confusing again.
What Is a Convolution?
A convolution is an operation where a small matrix called a filter (or kernel) slides across an image to extract useful patterns.
Instead of looking at the entire image at once, the network looks at small local regions.
This is how CNNs detect edges, corners, textures, and shapes.
What Is a Filter (Kernel)?
A filter is a small grid of numbers, usually of size:
- 3 × 3
- 5 × 5
- 7 × 7
Each filter is designed to respond strongly to a particular pattern.
Examples of patterns:
- Vertical edges
- Horizontal edges
- Diagonal edges
- Textures
In CNNs, these filters are learned automatically, not manually defined.
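Although CNNs learn their filter values during training, a hand-crafted filter makes the idea concrete. The sketch below builds a classic vertical-edge filter (positive weights on the left, negative on the right) and applies it to one tiny patch that contains a bright-to-dark vertical edge; the filter values and the patch are illustrative, not taken from any trained network:

```python
import numpy as np

# A hand-crafted 3x3 vertical-edge filter. Trained CNNs learn values
# like these automatically; this one is written by hand for illustration.
vertical_edge = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

# A tiny image patch with a sharp vertical edge: bright left, dark right.
patch = np.array([
    [9, 9, 0],
    [9, 9, 0],
    [9, 9, 0],
])

# Multiply corresponding values and sum: a strong positive response
# means "a vertical edge is here".
response = np.sum(vertical_edge * patch)
print(response)  # 27
```

On a flat patch (all pixels equal), the same filter would output 0, which is exactly what "responds strongly to a particular pattern" means.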
How Convolution Works (Conceptually)
The process follows these steps:
- Place the filter on top of a small image region
- Multiply corresponding values
- Sum the results
- Move the filter to the next position
Each position produces a single number. All these numbers together form a new image called a feature map.
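The four steps above can be written directly as two nested loops. This is a minimal sketch (stride 1, no padding), not an optimized implementation; real libraries use much faster routines, but the arithmetic is the same:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return
    the feature map. Minimal teaching sketch, not optimized."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h, out_w = ih - kh + 1, iw - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]            # small local region
            feature_map[i, j] = np.sum(region * kernel)   # multiply, then sum
    return feature_map

image = np.arange(16).reshape(4, 4).astype(float)
kernel = np.ones((3, 3)) / 9.0   # simple 3x3 averaging filter
result = convolve2d(image, kernel)
print(result.shape)  # (2, 2): each position produced one number
```

Each `(i, j)` position produces a single number, and together those numbers form the feature map.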
Why Local Regions Matter
Objects are made of local patterns.
For example:
- An eye is part of a face
- A wheel is part of a car
- Edges form shapes
CNNs exploit this idea by learning locally, then combining those local patterns at deeper layers.
Stride – How the Filter Moves
Stride defines how many pixels the filter moves at each step.
- Stride = 1 → move one pixel at a time
- Stride = 2 → move two pixels at a time, skipping every other position
Larger stride means:
- Smaller output feature map
- Less computation
Stride controls the balance between detail and efficiency.
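The detail-versus-efficiency trade-off follows from the standard output-size formula for a convolution without padding, output = (n - k) / stride + 1 (rounded down), sketched here for a 7-pixel input and a 3-pixel filter:

```python
def output_size(n, k, stride):
    """Output width for an n-pixel input and k-pixel filter, no padding."""
    return (n - k) // stride + 1

print(output_size(7, 3, 1))  # 5 positions: every pixel shift is visited
print(output_size(7, 3, 2))  # 3 positions: roughly half the output, half the work
```

Doubling the stride roughly halves each output dimension, which quarters the computation for a 2D feature map.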
Padding – Handling Image Borders
A filter cannot slide past the image borders, so the output shrinks and border pixels contribute to fewer positions than interior pixels.
Padding solves this by adding extra pixels (usually zeros) around the image.
- Valid padding: no padding
- Same padding: enough zeros are added so the output size equals the input size (at stride 1)
Padding helps preserve spatial dimensions.
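A quick way to see "same" padding at work is to pad a small image with zeros and check the sizes; the sketch below uses NumPy's `np.pad` with a 5×5 image and a 3×3 filter:

```python
import numpy as np

image = np.ones((5, 5))
k = 3
pad = (k - 1) // 2   # "same" padding width for an odd-sized filter at stride 1

# Surround the image with a one-pixel border of zeros.
padded = np.pad(image, pad, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)

# A valid (no further padding) convolution on the padded image:
# 7 - 3 + 1 = 5, so the output size equals the original input size.
print(padded.shape[0] - k + 1)  # 5
```

Without padding, the same convolution would shrink the 5×5 image to 3×3.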
Feature Maps – The Output of Convolution
Each filter produces one feature map.
If a convolution layer has:
- 32 filters → 32 feature maps
- 64 filters → 64 feature maps
Each map highlights where a specific pattern appears in the image.
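The bookkeeping here is just shapes: one feature map per filter, stacked along a channel axis. A sketch with assumed sizes (a 28×28 image and 32 random 3×3 filters, chosen only for illustration):

```python
import numpy as np

image = np.random.randn(28, 28)
num_filters = 32
k = 3
out = image.shape[0] - k + 1   # 26 with no padding and stride 1

# One (26, 26) feature map per filter; the layer output stacks all of them.
feature_maps = np.zeros((num_filters, out, out))
print(feature_maps.shape)  # (32, 26, 26)
```

With 64 filters the output would instead have shape `(64, 26, 26)`: more filters means more patterns detected, at the cost of more computation.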
Do Filters Learn Automatically?
Yes.
CNNs start with random filter values. During training, the network:
- Sees many images
- Adjusts filter weights
- Keeps patterns that help prediction
This is why CNNs adapt well to different tasks.
Multiple Filters = Rich Understanding
One filter learns one type of pattern.
Many filters together learn:
- Edges
- Textures
- Object parts
This diversity makes CNNs powerful.
Is This Coding or Theory?
This lesson is conceptual with visual reasoning.
You are building a mental model of:
- What filters do
- Why convolution works
- How CNNs extract information
Actual convolution code comes next.
Practice Questions
Q1. What is the purpose of a convolution filter?
Q2. What happens when stride increases?
Q3. Why is padding useful?
Homework / Thinking Exercise
- Imagine sliding a 3×3 window over a photo
- Think about what patterns that window can capture
- Relate it to edges and textures
This thinking directly maps to real CNN behavior.
Quick Recap
- Convolution extracts local features
- Filters slide across images
- Stride controls movement
- Padding preserves size
- Feature maps represent learned patterns
Next lesson: Pooling Layers — how CNNs reduce size without losing meaning.