Computer Vision Lesson 27 – Convolutions | Dataplexa

Convolutions and Filters

In the previous lesson, you learned that CNNs work by learning features automatically. Now we zoom into the most important operation behind that learning: convolution.

If you truly understand convolutions, CNNs will never feel confusing again.


What Is a Convolution?

A convolution is an operation where a small matrix called a filter (or kernel) slides across an image to extract useful patterns.

Instead of looking at the entire image at once, the network looks at small local regions.

This is how CNNs detect edges, corners, textures, and shapes.


What Is a Filter (Kernel)?

A filter is a small grid of numbers, usually of size:

  • 3 × 3
  • 5 × 5
  • 7 × 7

Each filter is designed to respond strongly to a particular pattern.

Examples of patterns:

  • Vertical edges
  • Horizontal edges
  • Diagonal edges
  • Textures

In CNNs, these filters are learned automatically, not manually defined.
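To make "responding to a pattern" concrete, here is a classic hand-crafted 3×3 vertical-edge filter (a Sobel kernel). In a CNN the values would be learned rather than written by hand, but the idea is the same; the patch values below are illustrative:

```python
import numpy as np

# Classic Sobel kernel: responds strongly to vertical edges
# (large left-to-right changes in intensity).
sobel_vertical = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
])

# A tiny image patch with a sharp vertical edge: dark left, bright right.
patch = np.array([
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
])

# Element-wise multiply, then sum: one convolution step.
response = np.sum(sobel_vertical * patch)
print(response)  # 36 — a strong response to the vertical edge
```

On a flat patch (all values equal) the same filter sums to zero, which is exactly what "responding to a particular pattern" means.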


How Convolution Works (Conceptually)

The process follows these steps:

  • Place the filter on top of a small image region
  • Multiply corresponding values
  • Sum the results
  • Move the filter to the next position

Each position produces a single number. All these numbers together form a new image called a feature map.
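The four steps above can be sketched in a few lines of NumPy. This is a minimal valid convolution (stride 1, no padding); the names `convolve2d`, `image`, and `kernel` are illustrative, not a library API:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding)
    and return the resulting feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h, out_w = ih - kh + 1, iw - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]            # local region under the filter
            feature_map[i, j] = np.sum(region * kernel)   # multiply and sum
    return feature_map

image = np.arange(16).reshape(4, 4).astype(float)
kernel = np.ones((3, 3))  # simple summing filter
print(convolve2d(image, kernel).shape)  # (2, 2)
```

Each `(i, j)` position yields one number, and the grid of those numbers is the feature map.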


Why Local Regions Matter

Objects are made of local patterns.

For example:

  • An eye is part of a face
  • A wheel is part of a car
  • Edges form shapes

CNNs exploit this idea by learning locally, then combining those local patterns at deeper layers.


Stride – How the Filter Moves

Stride defines how many pixels the filter moves at each step.

  • Stride = 1 → the filter moves one pixel at a time
  • Stride = 2 → the filter moves two pixels at a time, visiting every other position

Larger stride means:

  • Smaller output feature map
  • Less computation

Stride controls the balance between detail and efficiency.
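The trade-off has a simple formula: for input size n, filter size k, and stride s, the output size is floor((n − k) / s) + 1. A small sketch (the function name is illustrative):

```python
def conv_output_size(n, k, stride):
    """Output size of a valid convolution: floor((n - k) / stride) + 1."""
    return (n - k) // stride + 1

print(conv_output_size(7, 3, 1))  # 5
print(conv_output_size(7, 3, 2))  # 3 — larger stride, smaller output
```

Doubling the stride roughly halves each output dimension, which quarters the number of positions the filter must evaluate.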


Padding – Handling Image Borders

When a filter slides across an image, border pixels fall under fewer filter positions than central pixels, so information near the edges contributes less to the output, and the output shrinks.

Padding solves this by adding extra pixels (usually zeros) around the image.

  • Valid padding: no extra pixels are added, so the output is smaller than the input
  • Same padding: enough zeros are added so that the output size equals the input size (at stride 1)

Padding helps preserve spatial dimensions.
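Here is a minimal sketch of zero padding with NumPy. For a 3×3 filter at stride 1, padding by (k − 1) / 2 = 1 pixel on each side gives "same" behavior:

```python
import numpy as np

image = np.ones((5, 5))

# Pad one pixel of zeros on every side: (k - 1) // 2 = 1 for a 3x3 filter.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(image.shape)   # (5, 5)
print(padded.shape)  # (7, 7)
# A 3x3 valid convolution on the padded image gives 7 - 3 + 1 = 5,
# so the output matches the input size — that is "same" padding.
```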


Feature Maps – The Output of Convolution

Each filter produces one feature map.

If a convolution layer has:

  • 32 filters → 32 feature maps
  • 64 filters → 64 feature maps

Each map highlights where a specific pattern appears in the image.
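The relationship between filter count and feature-map count can be checked with a small sketch. Applying a bank of 32 random 3×3 filters to a 28×28 image (illustrative sizes) yields 32 feature maps:

```python
import numpy as np

rng = np.random.default_rng(0)
num_filters = 32
filters = rng.standard_normal((num_filters, 3, 3))  # one 3x3 kernel per filter
image = rng.standard_normal((28, 28))

# Each filter produces one feature map; stack them along a new axis.
feature_maps = np.stack([
    np.array([[np.sum(image[i:i + 3, j:j + 3] * f)
               for j in range(26)]
              for i in range(26)])
    for f in filters
])
print(feature_maps.shape)  # (32, 26, 26): 32 maps, each 28 - 3 + 1 = 26 per side
```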


Do Filters Learn Automatically?

Yes.

CNNs start with random filter values. During training, the network:

  • Sees many images
  • Adjusts filter weights
  • Keeps patterns that help prediction

This is why CNNs adapt well to different tasks.
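The learning loop above can be illustrated in miniature. The sketch below (a toy regression, not a full CNN) starts from a random filter and uses gradient descent to recover a hidden edge-detecting filter from example patches and their "correct" responses; all names and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
true_filter = np.array([[-1., 0., 1.]] * 3)    # the pattern we want to discover
learned = rng.standard_normal((3, 3))          # start from random filter values

patches = rng.standard_normal((200, 3, 3))     # many small image regions
targets = np.sum(patches * true_filter, axis=(1, 2))  # desired responses

lr = 0.1
for _ in range(500):
    preds = np.sum(patches * learned, axis=(1, 2))
    error = preds - targets
    # Gradient of the mean squared error with respect to the filter weights.
    grad = np.mean(error[:, None, None] * patches, axis=0)
    learned -= lr * grad  # adjust weights to reduce the error

print(np.allclose(learned, true_filter, atol=1e-2))  # True
```

The random initial filter converges to the edge detector purely because that filter explains the data — the same pressure that shapes real CNN filters during training.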


Multiple Filters = Rich Understanding

One filter learns one type of pattern.

Many filters together learn:

  • Edges
  • Textures
  • Object parts

This diversity makes CNNs powerful.


Is This Coding or Theory?

This lesson is conceptual with visual reasoning.

You are building a mental model of:

  • What filters do
  • Why convolution works
  • How CNNs extract information

Actual convolution code comes next.


Practice Questions

Q1. What is the purpose of a convolution filter?

To detect specific local patterns in an image.

Q2. What happens when stride increases?

The output feature map becomes smaller and the amount of computation decreases.

Q3. Why is padding useful?

It prevents loss of information near image borders.

Homework / Thinking Exercise

  • Imagine sliding a 3×3 window over a photo
  • Think about what patterns that window can capture
  • Relate it to edges and textures

This thinking directly maps to real CNN behavior.


Quick Recap

  • Convolution extracts local features
  • Filters slide across images
  • Stride controls movement
  • Padding preserves size
  • Feature maps represent learned patterns

Next lesson: Pooling Layers — how CNNs reduce size without losing meaning.