Convolutions and Filters
In the previous lesson, you learned that CNNs work by learning features automatically. Now we zoom into the most important operation behind that learning: convolution.
If you truly understand convolutions, CNNs will never feel confusing again.
What Is a Convolution?
A convolution is an operation where a small matrix called a filter (or kernel) slides across an image to extract useful patterns.
Instead of looking at the entire image at once, the network looks at small local regions.
This is how CNNs detect edges, corners, textures, and shapes.
What Is a Filter (Kernel)?
A filter is a small grid of numbers, usually of size:
- 3 × 3
- 5 × 5
- 7 × 7
Each filter is designed to respond strongly to a particular pattern.
Examples of patterns:
- Vertical edges
- Horizontal edges
- Diagonal edges
- Textures
In CNNs, these filters are learned automatically, not manually defined.
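Although CNNs learn their filter values during training, a hand-crafted filter makes the idea concrete. The sketch below builds a classic vertical-edge filter (positive weights on the left, negative on the right) and applies it to one tiny patch that contains a bright-to-dark vertical edge; the filter values and the patch are illustrative, not taken from any trained network:

```python
import numpy as np

# A hand-crafted 3x3 vertical-edge filter. Trained CNNs learn values
# like these automatically; this one is written by hand for illustration.
vertical_edge = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

# A tiny image patch with a sharp vertical edge: bright left, dark right.
patch = np.array([
    [9, 9, 0],
    [9, 9, 0],
    [9, 9, 0],
])

# Multiply corresponding values and sum: a strong positive response
# means "a vertical edge is here".
response = np.sum(vertical_edge * patch)
print(response)  # 27
```

On a flat patch (all pixels equal), the same filter would output 0, which is exactly what "responds strongly to a particular pattern" means.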
How Convolution Works (Conceptually)
The process follows these steps:
- Place the filter on top of a small image region
- Multiply corresponding values
- Sum the results
- Move the filter to the next position
Each position produces a single number. All these numbers together form a new image called a feature map.
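The four steps above can be written directly as two nested loops. This is a minimal sketch (stride 1, no padding), not an optimized implementation; real libraries use much faster routines, but the arithmetic is the same:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return
    the feature map. Minimal teaching sketch, not optimized."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h, out_w = ih - kh + 1, iw - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]            # small local region
            feature_map[i, j] = np.sum(region * kernel)   # multiply, then sum
    return feature_map

image = np.arange(16).reshape(4, 4).astype(float)
kernel = np.ones((3, 3)) / 9.0   # simple 3x3 averaging filter
result = convolve2d(image, kernel)
print(result.shape)  # (2, 2): each position produced one number
```

Each `(i, j)` position produces a single number, and together those numbers form the feature map.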
Why Local Regions Matter
Objects are made of local patterns.
For example:
- An eye is part of a face
- A wheel is part of a car
- Edges form shapes
CNNs exploit this idea by learning locally, then combining those local patterns at deeper layers.
Stride – How the Filter Moves
Stride defines how many pixels the filter moves at each step.
- Stride = 1 → move one pixel at a time
- Stride = 2 → move two pixels at a time, skipping every other position
Larger stride means:
- Smaller output feature map
- Less computation
Stride controls the balance between detail and efficiency.
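The detail-versus-efficiency trade-off follows from the standard output-size formula for a convolution without padding, output = (n - k) / stride + 1 (rounded down), sketched here for a 7-pixel input and a 3-pixel filter:

```python
def output_size(n, k, stride):
    """Output width for an n-pixel input and k-pixel filter, no padding."""
    return (n - k) // stride + 1

print(output_size(7, 3, 1))  # 5 positions: every pixel shift is visited
print(output_size(7, 3, 2))  # 3 positions: roughly half the output, half the work
```

Doubling the stride roughly halves each output dimension, which quarters the computation for a 2D feature map.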
Padding – Handling Image Borders
A filter cannot slide past the image borders, so the output shrinks and border pixels contribute to fewer positions than interior pixels.
Padding solves this by adding extra pixels (usually zeros) around the image.
- Valid padding: no padding
- Same padding: enough zeros are added so the output size equals the input size (at stride 1)
Padding helps preserve spatial dimensions.
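A quick way to see "same" padding at work is to pad a small image with zeros and check the sizes; the sketch below uses NumPy's `np.pad` with a 5×5 image and a 3×3 filter:

```python
import numpy as np

image = np.ones((5, 5))
k = 3
pad = (k - 1) // 2   # "same" padding width for an odd-sized filter at stride 1

# Surround the image with a one-pixel border of zeros.
padded = np.pad(image, pad, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)

# A valid (no further padding) convolution on the padded image:
# 7 - 3 + 1 = 5, so the output size equals the original input size.
print(padded.shape[0] - k + 1)  # 5
```

Without padding, the same convolution would shrink the 5×5 image to 3×3.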
Feature Maps – The Output of Convolution
Each filter produces one feature map.
If a convolution layer has:
- 32 filters → 32 feature maps
- 64 filters → 64 feature maps
Each map highlights where a specific pattern appears in the image.
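The bookkeeping here is just shapes: one feature map per filter, stacked along a channel axis. A sketch with assumed sizes (a 28×28 image and 32 random 3×3 filters, chosen only for illustration):

```python
import numpy as np

image = np.random.randn(28, 28)
num_filters = 32
k = 3
out = image.shape[0] - k + 1   # 26 with no padding and stride 1

# One (26, 26) feature map per filter; the layer output stacks all of them.
feature_maps = np.zeros((num_filters, out, out))
print(feature_maps.shape)  # (32, 26, 26)
```

With 64 filters the output would instead have shape `(64, 26, 26)`: more filters means more patterns detected, at the cost of more computation.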
Do Filters Learn Automatically?
Yes.
CNNs start with random filter values. During training, the network:
- Sees many images
- Adjusts filter weights
- Keeps patterns that help prediction
This is why CNNs adapt well to different tasks.
Multiple Filters = Rich Understanding
One filter learns one type of pattern.
Many filters together learn:
- Edges
- Textures
- Object parts
This diversity makes CNNs powerful.
Is This Coding or Theory?
This lesson is conceptual with visual reasoning.
You are building a mental model of:
- What filters do
- Why convolution works
- How CNNs extract information
Actual convolution code comes next.
Practice Questions
Q1. What is the purpose of a convolution filter?
Q2. What happens when stride increases?
Q3. Why is padding useful?
Homework / Thinking Exercise
- Imagine sliding a 3×3 window over a photo
- Think about what patterns that window can capture
- Relate it to edges and textures
This thinking directly maps to real CNN behavior.
Quick Recap
- Convolution extracts local features
- Filters slide across images
- Stride controls movement
- Padding preserves size
- Feature maps represent learned patterns
Next lesson: Pooling Layers — how CNNs reduce size without losing meaning.