Computer Vision Lesson 40 – YOLO Basics | Dataplexa

YOLO Basics – You Only Look Once

In the previous lesson, you learned that object detection models can be divided into two-stage and one-stage detectors.

YOLO belongs to the one-stage family and completely changed how real-time object detection works.

This lesson explains what YOLO is, how it works internally, and why it is so fast, using clear intuition instead of equations.

What Is YOLO?

YOLO stands for You Only Look Once.

The name itself explains the idea:

The model looks at the image only once
It predicts all objects in a single forward pass

Unlike two-stage detectors, YOLO does not:

Generate region proposals
Run a classifier separately on each candidate region

Everything happens together, end-to-end.

Why YOLO Was Revolutionary

Before YOLO, the most accurate detectors, such as the R-CNN family, were too slow for real-time use.

YOLO introduced three major advantages:

Real-time speed
Simple architecture
Global context: the network sees the whole image when making each prediction

This made object detection practical for:

Live video
Autonomous vehicles
Surveillance cameras
Drones and robots

The Core Idea Behind YOLO

YOLO treats object detection as a single regression problem.

Instead of asking:

“Is there an object here?”
“What about here?”

YOLO asks:

“Given this image, directly predict all bounding boxes and class probabilities.”

Image Grid Concept

YOLO divides the image into an S × S grid of equal-sized cells (the original YOLO used a 7 × 7 grid).

Each grid cell is responsible for detecting objects whose center lies inside that cell.

For every grid cell, YOLO predicts:

Bounding box coordinates
Object confidence score
Class probabilities
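The responsibility rule above can be sketched in a few lines of Python. This is a minimal illustration, not YOLO's actual code: the function name `responsible_cell` is hypothetical, and the default grid size S = 7 is the value from the original YOLO paper.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the grid cell that contains an object's center.

    cx, cy are the center coordinates in pixels; S is the grid size
    (7 is assumed here, as in the original YOLO).
    """
    # Scale the center into grid units, then truncate to a cell index.
    col = min(int(cx / img_w * S), S - 1)  # clamp so cx == img_w stays in range
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# A 448x448 image with an object centered at pixel (300, 100).
print(responsible_cell(300, 100, 448, 448))  # -> (1, 4)
```

Only the cell at row 1, column 4 is asked to predict this object, no matter how many other cells the object's box overlaps.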

What Does a YOLO Cell Predict?

Each cell predicts:

x, y: center of bounding box
w, h: width and height
Confidence: how likely the box contains an object and how well the box fits it
Class scores: what object it is

All predictions are made simultaneously.
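To make these numbers concrete, here is a hedged sketch of how one cell's box prediction could be turned back into pixel coordinates. It assumes the original YOLO encoding, where x and y are offsets within the cell (0 to 1) and w and h are fractions of the whole image; later versions encode boxes differently. The function name `decode_box` is hypothetical.

```python
def decode_box(row, col, x, y, w, h, img_w, img_h, S=7):
    """Convert one cell's (x, y, w, h) prediction into pixel corner coordinates.

    Assumes the original YOLO encoding:
      x, y : box center as an offset inside cell (row, col), each in [0, 1]
      w, h : box width and height as fractions of the full image
    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    # Center: cell index plus offset, scaled from grid units to pixels.
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    # Size: fractions of the image, scaled to pixels.
    bw = w * img_w
    bh = h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Cell (row=1, col=4) predicts a box centered in the middle of the cell.
box = decode_box(1, 4, 0.5, 0.5, 0.25, 0.5, 448, 448)
print(box)
```

Note that a decoded box may extend past the image border, so real pipelines usually clip the corners to the image size afterwards.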

Single Forward Pass Explained Simply

YOLO runs the image through a CNN once.

That single pass outputs:

All boxes
All classes
All confidences

No second stage. No repeated scanning.

That is why YOLO is fast.
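The size of that single output can be worked out directly. The sketch below assumes the settings of the original YOLO (a 7 × 7 grid, 2 boxes per cell, 20 classes). The ordering of values inside each cell's vector is an illustrative assumption, since real implementations order them differently, and `split_cell` is a hypothetical helper.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (original YOLO settings)

# Each box carries 5 numbers (x, y, w, h, confidence); class scores are shared per cell.
values_per_cell = B * 5 + C
total_outputs = S * S * values_per_cell
print(values_per_cell, total_outputs)  # -> 30 1470


def split_cell(vec, B=2, C=20):
    """Split one cell's flat output vector into boxes and class probabilities.

    Assumed layout (for illustration only): B groups of
    [x, y, w, h, confidence] followed by C class probabilities.
    """
    if len(vec) != B * 5 + C:
        raise ValueError("unexpected vector length")
    boxes = [vec[i * 5:(i + 1) * 5] for i in range(B)]
    class_probs = vec[B * 5:]
    return boxes, class_probs


boxes, class_probs = split_cell(list(range(30)))
print(len(boxes), len(class_probs))  # -> 2 20
```

Every one of these 1470 numbers comes out of the same forward pass, which is what lets the detector skip a second stage.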

YOLO vs Two-Stage Detectors

Aspect	YOLO	Two-Stage Models
Detection stages	One	Two (proposals, then classification)
Speed	Very fast	Slower
Architecture	Simple, end-to-end	Complex pipeline
Real-time use	Yes	Usually no

Confidence Score in YOLO

The confidence score reflects two things:

Is there an object?
How accurate is the bounding box?

Low-confidence boxes are filtered out during post-processing.
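A minimal sketch of this filtering step follows. It assumes the original YOLO scoring rule, where each box's confidence is multiplied by the cell's class probabilities to get a class-specific score. The 0.25 threshold and the name `filter_detections` are illustrative assumptions, not fixed values from any library.

```python
def filter_detections(detections, threshold=0.25):
    """Keep detections whose best class-specific score reaches the threshold.

    detections: list of (box, confidence, class_probs) tuples.
    Returns a list of (box, class_index, score) tuples.
    The threshold value is an illustrative choice, not a fixed standard.
    """
    kept = []
    for box, confidence, class_probs in detections:
        # Pick the most likely class for this box.
        best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
        # Class-specific score = box confidence * class probability.
        score = confidence * class_probs[best_class]
        if score >= threshold:
            kept.append((box, best_class, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, [0.1, 0.8]),   # confident box, clearly class 1
    ((5, 5, 20, 20), 0.2, [0.6, 0.4]),   # weak box, dropped
]
for box, cls, score in filter_detections(detections):
    print(box, cls, round(score, 2))  # -> (0, 0, 10, 10) 1 0.72
```

Raising the threshold gives fewer false alarms but more missed objects, so it is usually tuned per application.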

Non-Maximum Suppression (NMS)

YOLO often predicts multiple boxes for the same object.

Non-Maximum Suppression solves this by:

Keeping the box with the highest confidence
Removing lower-confidence boxes that overlap it heavily, measured by Intersection over Union (IoU)

This produces clean final detections.
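The procedure can be sketched as a simple greedy loop. This is a plain-Python illustration rather than an optimized implementation. The 0.5 IoU threshold is a common but assumed default, and boxes are assumed to be (x_min, y_min, x_max, y_max) tuples. In practice NMS is usually run separately for each class.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression; returns indices of the boxes to keep."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is treated as a duplicate and removed, while the distant third box survives.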

Strengths of YOLO

Extremely fast inference
End-to-end training
Global image context
Well-suited for real-time video

Limitations of Early YOLO Versions

Early YOLO models struggled with:

Small objects
Closely packed objects
Precise localization

Modern YOLO versions have improved these significantly.

Where YOLO Is Commonly Used

Traffic monitoring
Security cameras
Retail analytics
Sports analytics
Robotics vision systems

Practice Questions

Q1. Why is YOLO faster than two-stage detectors?

Because YOLO predicts all objects in a single forward pass without region proposals.

Q2. What does “You Only Look Once” mean?

The model processes the image once to detect all objects at the same time.

Q3. What role does Non-Maximum Suppression play?

It removes duplicate overlapping boxes and keeps the most confident detection.

Mini Assignment

Think about a live camera system.

Would speed or accuracy matter more?
Why would YOLO be a good choice?

This kind of reasoning is expected in interviews.

Quick Recap

YOLO is a one-stage detector
It predicts all objects in one pass
Uses grid-based prediction
Extremely fast and practical
Foundation for modern real-time detection

Next lesson: YOLOv5 & YOLOv8 – Architecture and Improvements.
