Computer Vision Lesson 39 – Detection Models | Dataplexa

Object Detection Models – An Overview

Until now, we focused on models that answer a simple question: “What is present in the image?”

Object Detection answers a harder and more useful question: “Which objects are present, and where is each one located?”

This lesson builds a strong foundation for understanding YOLO, SSD, Faster R-CNN, and modern detection systems.


What Is Object Detection?

Object Detection is a Computer Vision task that combines classification and localization.

For every object in an image, the model must:

  • Identify the object class (car, person, dog, etc.)
  • Draw a bounding box around the object
  • Assign a confidence score to the prediction

Unlike image classification, which assigns one label to the whole image, detection must handle several objects in the same image.
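
As a minimal sketch, the three outputs listed above can be pictured as one small record per detected object. The `Detection` class, its field names, and the sample values are made up for illustration; real libraries use their own output formats.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted object: what it is, where it is, and how sure the model is."""
    label: str          # object class, e.g. "dog"
    box: tuple          # assumed here to be (x_min, y_min, x_max, y_max) in pixels
    confidence: float   # score between 0 and 1


# Made-up output for a single image that contains two objects.
detections = [
    Detection("dog", (48.0, 120.0, 210.0, 330.0), 0.92),
    Detection("person", (230.0, 40.0, 380.0, 400.0), 0.88),
]


def labels_in(dets):
    """Collect the class labels of all detections in an image."""
    return [d.label for d in dets]


print(labels_in(detections))
```

A classifier would return a single label for this image; a detector returns one record per object.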


How Object Detection Is Different from Image Classification

Task                  Output                   Example
Image Classification  Single label             “This image contains a dog”
Object Detection      Multiple labels + boxes  “Dog here, person there”

Detection models must understand both what and where.


Key Components of Object Detection

Every object detection model predicts three things:

  • Bounding Box: location of the object
  • Class Label: what the object is
  • Confidence Score: how sure the model is

Bounding boxes are usually represented as:

  • (x, y, width, height), where (x, y) is the box's top-left corner or its center, depending on the convention
  • or (x_min, y_min, x_max, y_max)
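
Converting between the two representations is simple arithmetic. The sketch below assumes that (x, y) is the top-left corner of the box; some formats use the box center instead, in which case the formulas differ. The function names are illustrative.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max).

    Assumes (x, y) is the top-left corner of the box.
    """
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) back to (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


corners = xywh_to_xyxy((10, 20, 30, 40))   # (10, 20, 40, 60)
original = xyxy_to_xywh(corners)           # (10, 20, 30, 40)
```

Mixing up the two formats is a common source of boxes that look shifted or stretched, so it helps to check which one a dataset or model expects.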

Bounding Boxes Explained Simply

A bounding box is a rectangle drawn tightly around an object.

Good detection models:

  • Place boxes accurately
  • Avoid overlapping duplicate boxes
  • Ignore background noise

Poor boxes reduce real-world usability.
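
One common way to remove overlapping duplicate boxes is non-maximum suppression (NMS), which this lesson does not cover in detail. The sketch below is a simplified, class-agnostic version: it keeps the highest-scoring box and drops any other box that overlaps it too much, with overlap measured by Intersection over Union (IoU). The threshold of 0.5 and the sample boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap a box we already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-identical boxes on one object, plus one box on a separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box is dropped because it almost completely overlaps the first, higher-scoring one. Practical implementations usually apply this step separately for each class.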


Traditional vs Deep Learning Detection

Earlier detection methods used handcrafted features. Modern systems rely on deep learning.

Approach       Technique             Limitations
Traditional    HOG + Sliding Window  Slow, less accurate
Deep Learning  CNN-based detectors   Needs more data
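
To see why the sliding-window approach is slow, it helps to count how many windows a classifier would have to score. The sketch below only enumerates window positions at a single scale; the image size, window size, and stride are assumptions chosen for illustration.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x_min, y_min, x_max, y_max) for every window position."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# A 640x480 image scanned with a 64x128 window that moves 8 pixels at a time.
windows = list(sliding_windows(640, 480, 64, 128, stride=8))
num_windows = len(windows)
print(num_windows)  # 3285
```

Each of these windows needs its own feature extraction and classification, and the count grows further when the scan is repeated at several scales to find objects of different sizes.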

Most widely used detectors are built on CNN backbones, although transformer-based detectors such as DETR are now common as well.


Two Main Categories of Detection Models

Object detection models fall into two broad categories.


1. Two-Stage Detectors (Accuracy First)

These models work in two steps:

  • Step 1: Propose candidate object regions
  • Step 2: Classify and refine bounding boxes

Examples:

  • R-CNN
  • Fast R-CNN
  • Faster R-CNN

Strengths:

  • High accuracy
  • Precise localization

Weakness:

  • Slower inference speed

2. One-Stage Detectors (Speed First)

These models detect objects in a single forward pass.

They directly predict:

  • Bounding boxes
  • Class probabilities
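
As a rough sketch of what "directly predict" means, the code below turns raw per-location outputs into final detections. The row layout, the `CLASS_NAMES` list, and the score threshold are simplifying assumptions; real one-stage detectors encode boxes differently (for example as offsets relative to grid cells or anchors) and usually follow this step with duplicate removal.

```python
CLASS_NAMES = ["car", "person", "dog"]  # hypothetical label set


def decode_predictions(raw, score_threshold=0.5):
    """Turn raw per-location outputs into (label, box, confidence) tuples.

    Each row of `raw` is assumed to be
    [x_min, y_min, x_max, y_max, p_class_0, p_class_1, ...].
    """
    results = []
    for row in raw:
        box, class_probs = tuple(row[:4]), row[4:]
        # Pick the most likely class for this location.
        best = max(range(len(class_probs)), key=lambda k: class_probs[k])
        # Discard low-confidence locations, which are mostly background.
        if class_probs[best] >= score_threshold:
            results.append((CLASS_NAMES[best], box, class_probs[best]))
    return results


# Made-up outputs for three locations; the last one is background.
raw_output = [
    [48, 120, 210, 330, 0.03, 0.05, 0.92],
    [230, 40, 380, 400, 0.02, 0.88, 0.10],
    [5, 5, 20, 20, 0.20, 0.15, 0.10],
]
decoded = decode_predictions(raw_output)
print([d[0] for d in decoded])  # ['dog', 'person']
```

Because boxes and class probabilities come out of the same forward pass, no separate region-proposal step is needed.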

Examples:

  • YOLO family
  • SSD
  • RetinaNet

Strengths:

  • Very fast
  • Real-time capable

Weakness:

  • Slightly lower accuracy (historically)

Accuracy vs Speed Trade-off

There is usually a trade-off between accuracy and speed.

Two-stage models → Higher accuracy, slower speed
One-stage models → Faster speed, historically slightly lower accuracy

Modern YOLO models have reduced this gap significantly.


Why Object Detection Is Hard

Detection is challenging because:

  • Objects vary in size and shape
  • Multiple objects overlap
  • Lighting and background vary
  • Small objects are difficult to detect

Good detectors learn scale, context, and spatial relationships.


Where Object Detection Is Used

  • Autonomous driving
  • Surveillance systems
  • Retail analytics
  • Medical imaging
  • Robotics and drones

Many systems that must “see and react” rely on detection.


Evaluation Metrics for Detection

Classification accuracy alone is not enough, because a detection must also be judged on how well its box matches the object.

Common metrics:

  • Intersection over Union (IoU)
  • Precision and Recall
  • mAP (mean Average Precision)

We will study these in detail in later lessons.
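
As a preview, here is a minimal sketch of IoU and of precision and recall for a single image. It assumes boxes in (x_min, y_min, x_max, y_max) format, ignores class labels and confidence ranking, and counts a prediction as correct when its IoU with an unmatched ground-truth box is at least 0.5. These are simplifications for illustration; mAP is not computed here.

```python
def iou(box_a, box_b):
    """Intersection over Union: overlap area divided by combined area."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted boxes to ground-truth boxes."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box can match only once
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Two boxes of equal size, shifted by half their width: IoU = 5000 / 15000.
half_overlap = iou((0, 0, 100, 100), (50, 0, 150, 100))

ground_truth = [(0, 0, 100, 100), (200, 200, 300, 300)]
predictions = [(10, 10, 110, 110), (400, 400, 450, 450)]
precision, recall = precision_recall(predictions, ground_truth)
print(precision, recall)  # 0.5 0.5
```

In this example one of the two predictions matches an object, so precision is 0.5, and one of the two real objects is found, so recall is 0.5.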


Practice Questions

Q1. What makes object detection harder than image classification?

Because the model must identify both the class and the precise location of every object, and an image often contains several objects.

Q2. Name one two-stage and one one-stage detector.

Two-stage: Faster R-CNN. One-stage: YOLO.

Q3. Why are one-stage detectors preferred for real-time systems?

Because they perform detection in a single pass, making them faster.

Mini Assignment

Think about a real-world problem and decide:

  • Is accuracy more important or speed?
  • Would you choose a one-stage or two-stage model?

This decision-making skill is critical in interviews and projects.


Quick Recap

  • Object detection finds what and where
  • Bounding boxes localize objects
  • Two-stage models prioritize accuracy
  • One-stage models prioritize speed
  • Detection powers many real-world systems

Next lesson: YOLO Basics – How Real-Time Detection Works.