Computer Vision Lesson 40 – YOLO Basics | Dataplexa

YOLO Basics – You Only Look Once

In the previous lesson, you learned that object detection models can be divided into two-stage and one-stage detectors.

YOLO belongs to the one-stage family and completely changed how real-time object detection works.

This lesson explains what YOLO is, how it works internally, and why it is so fast, using clear intuition instead of equations.


What Is YOLO?

YOLO stands for You Only Look Once.

The name itself explains the idea:

  • The model looks at the image only once
  • It predicts all objects in a single forward pass

Unlike two-stage detectors, YOLO does not:

  • Generate region proposals
  • Run a separate classification step for every proposed region

Everything happens together, end-to-end.


Why YOLO Was Revolutionary

Before YOLO, the most accurate detectors (such as the R-CNN family) were too slow for real-time video.

YOLO introduced three major advantages:

  • Real-time speed
  • Simple architecture
  • Global understanding of the image

This made object detection practical for:

  • Live video
  • Autonomous vehicles
  • Surveillance cameras
  • Drones and robots

The Core Idea Behind YOLO

YOLO treats object detection as a single regression problem.

Instead of asking:

  • “Is there an object here?”
  • “What about here?”

YOLO asks:

“Given this image, directly predict all bounding boxes and class probabilities.”


Image Grid Concept

YOLO divides the image into an S × S grid of equal-sized cells (7 × 7 in the original paper).

Each grid cell is responsible for detecting objects whose center lies inside that cell.

For every grid cell, YOLO predicts:

  • Bounding box coordinates
  • Object confidence score
  • Class probabilities
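The responsibility rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a YOLOv1-style setup: a 7 × 7 grid, boxes given as pixel corners (x_min, y_min, x_max, y_max), and the encoding from the original paper, where the box center is stored relative to its cell and the width and height relative to the whole image. The function name `encode_target` is hypothetical.

```python
def encode_target(box, img_w, img_h, S=7):
    """Find the cell responsible for a box and encode its regression target.

    box is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    Returns ((row, col), (x, y, w, h)) where x, y are the center's offsets
    inside the cell (0..1) and w, h are fractions of the image size.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w   # box center, normalized to 0..1
    cy = (y_min + y_max) / 2 / img_h
    col = min(int(cx * S), S - 1)      # clamp so a center on the far edge stays in the grid
    row = min(int(cy * S), S - 1)
    x = cx * S - col                   # position of the center inside its cell
    y = cy * S - row
    w = (x_max - x_min) / img_w        # size relative to the whole image
    h = (y_max - y_min) / img_h
    return (row, col), (x, y, w, h)


cell, target = encode_target((100, 200, 200, 300), img_w=448, img_h=448)
print(cell)  # (3, 2): row 3, column 2 of the 7 x 7 grid
```

Only this one cell is trained to predict the box; every other cell is trained to report "no object" for it.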

What Does a YOLO Cell Predict?

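Each cell predicts a small, fixed number of boxes (two in the original YOLO). For each box it outputs:

  • x, y: center of the bounding box, relative to the bounds of the cell
  • w, h: width and height, relative to the whole image
  • Confidence: how likely it is that the box contains an object, and how well the box fits it

The cell also predicts one set of class scores (what the object is), shared by all of its boxes.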

All predictions are made simultaneously.
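In the original YOLO, each cell outputs B boxes of five numbers each plus C class probabilities, so the network's output is an S × S × (B·5 + C) tensor: 7 × 7 × 30 for S = 7, B = 2 and the 20 PASCAL VOC classes. The sketch below splits one cell's vector into its parts. The ordering inside the vector (boxes first, then classes) is an assumption for illustration; real implementations pack the tensor differently. `split_cell` is a hypothetical name.

```python
def split_cell(vector, B=2, C=20):
    """Split one cell's prediction vector into B boxes and C class probabilities.

    Assumed layout: B groups of (x, y, w, h, confidence), followed by
    C class probabilities shared by all boxes of the cell.
    """
    if len(vector) != B * 5 + C:
        raise ValueError(f"expected {B * 5 + C} values, got {len(vector)}")
    boxes = [tuple(vector[i * 5:(i + 1) * 5]) for i in range(B)]
    class_probs = list(vector[B * 5:])
    return boxes, class_probs


S, B, C = 7, 2, 20
print("values per cell:", B * 5 + C)              # values per cell: 30
print("output tensor shape:", (S, S, B * 5 + C))  # output tensor shape: (7, 7, 30)
```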


Single Forward Pass Explained Simply

YOLO runs the image through a CNN once.

That single pass outputs:

  • All boxes
  • All classes
  • All confidences

No second stage. No repeated scanning.

That is why YOLO is fast.
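Because every box comes out of the same output, turning it into detections is a single loop over the grid, with no second network call. The sketch below assumes, for simplicity, one box per cell stored as (x, y, w, h, confidence, class_probs) using the YOLOv1 coordinate convention (center relative to the cell, size relative to the image). The function `decode_grid` and this nested-list layout are hypothetical; real YOLO outputs are packed tensors.

```python
def decode_grid(grid, img_w, img_h):
    """Turn a whole S x S prediction grid into a flat list of detections.

    grid[row][col] is assumed to hold (x, y, w, h, confidence, class_probs)
    for one box. Returns (x_min, y_min, x_max, y_max, confidence,
    class_id, class_prob) tuples in pixel coordinates.
    """
    S = len(grid)  # assumes a square grid
    detections = []
    for row in range(S):
        for col in range(S):
            x, y, w, h, conf, class_probs = grid[row][col]
            cx = (col + x) / S * img_w   # cell-relative center -> image pixels
            cy = (row + y) / S * img_h
            bw, bh = w * img_w, h * img_h
            class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
            detections.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2,
                               conf, class_id, class_probs[class_id]))
    return detections
```

Every cell produces a candidate, including empty ones; the confidence filtering and Non-Maximum Suppression steps clean the list afterwards.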


YOLO vs Two-Stage Detectors

Aspect             | YOLO               | Two-Stage Models
Detection passes   | One                | Multiple
Speed              | Very fast          | Slower
Architecture       | Simple, end-to-end | Complex pipeline
Real-time use      | Yes                | Usually no

Confidence Score in YOLO

The confidence score reflects two things:

  • Is there an object?
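  • How accurate is the bounding box?

In the original YOLO paper, the training target for confidence is Pr(Object) × IoU, where IoU (Intersection over Union) measures how much the predicted box overlaps the ground-truth box. The score is therefore close to zero both for empty cells and for badly placed boxes.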

Low-confidence boxes are filtered out during post-processing.
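At test time, the original YOLO paper multiplies each box's confidence by the class probabilities to get a class-specific score, and boxes with low scores are dropped. Below is a minimal sketch of that filtering step. The function name `filter_detections`, the (box, confidence, class_probs) input format, and the 0.25 threshold are illustrative assumptions, not a fixed API or rule.

```python
def filter_detections(detections, threshold=0.25):
    """Keep detections whose class-specific score clears a threshold.

    Each detection is (box, confidence, class_probs). The class-specific
    score is confidence * Pr(class | object) for the most likely class.
    """
    kept = []
    for box, confidence, class_probs in detections:
        class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
        score = confidence * class_probs[class_id]
        if score >= threshold:
            kept.append((box, class_id, score))
    return kept
```

A box with confidence 0.9 and a top class probability of 0.8 scores 0.72 and survives; a box with confidence 0.3 and a top class probability of 0.5 scores 0.15 and is removed.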


Non-Maximum Suppression (NMS)

YOLO often predicts multiple boxes for the same object.

Non-Maximum Suppression solves this by:

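  • Keeping the box with the highest confidence
  • Removing lower-confidence boxes that overlap it too much (Intersection over Union, or IoU, above a chosen threshold)
  • Repeating with the remaining boxes until none are left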

This produces clean final detections.
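A minimal greedy NMS sketch in Python. It assumes boxes given as (x_min, y_min, x_max, y_max) with one score each, measures overlap with IoU, and uses an illustrative IoU threshold of 0.5. For simplicity it ignores class labels; a common variant runs the same procedure separately for each class. The helper names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression; returns indices of the boxes to keep."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # [0, 2]: box 1 duplicates box 0 (IoU ≈ 0.82) and is removed
```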


Strengths of YOLO

  • Extremely fast inference
  • End-to-end training
  • Global image context
  • Well-suited for real-time video

Limitations of Early YOLO Versions

Early YOLO models struggled with:

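  • Small objects, especially ones that appear in groups (such as flocks of birds)
  • Closely packed objects, since each cell predicts only a few boxes and a single set of class probabilities
  • Precise localization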

Later YOLO versions have reduced these weaknesses significantly, for example with anchor boxes (YOLOv2) and predictions at multiple scales (YOLOv3).


Where YOLO Is Commonly Used

  • Traffic monitoring
  • Security cameras
  • Retail analytics
  • Sports analytics
  • Robotics vision systems

Practice Questions

Q1. Why is YOLO faster than two-stage detectors?

Because YOLO predicts all objects in a single forward pass without region proposals.

Q2. What does “You Only Look Once” mean?

The model processes the image once to detect all objects at the same time.

Q3. What role does Non-Maximum Suppression play?

It removes duplicate overlapping boxes and keeps the most confident detection.

Mini Assignment

Think about a live camera system.

  • Would speed or accuracy matter more?
  • Why would YOLO be a good choice?

This kind of reasoning is expected in interviews.


Quick Recap

  • YOLO is a one-stage detector
  • It predicts all objects in one pass
  • Uses grid-based prediction
  • Extremely fast and practical
  • Foundation for modern real-time detection

Next lesson: YOLOv5 & YOLOv8 – Architecture and Improvements.