Computer Vision Lesson 44 – Instance Segmentation | Dataplexa

Instance Segmentation – Separating Individual Objects

So far, you have learned how semantic segmentation labels every pixel. But it has one important limitation.

It does not distinguish between different instances of the same class.

Instance segmentation solves this problem by answering a more precise question:

Which pixel belongs to which object?


Why Semantic Segmentation Is Not Enough

Imagine an image with five people.

Semantic segmentation will label all of them as:

“person”

But it will not tell you:

  • Which pixels belong to person 1
  • Which pixels belong to person 2

For many real-world problems, this is not sufficient.


What Is Instance Segmentation?

Instance segmentation assigns each detected object:

  • A class label
  • A unique object identity
  • A pixel-accurate mask

Each object is treated as a separate instance, even if multiple objects belong to the same class.


Semantic vs Instance Segmentation

Aspect                        | Semantic Segmentation | Instance Segmentation
Class labels                  | Yes                   | Yes
Object identity               | No                    | Yes
Separates same-class objects  | No                    | Yes
Mask per object               | No                    | Yes
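
The difference in the table can be made concrete with a toy example. The sketch below is only an illustration, not a model: the two grids are hand-written, and the helper names `unique_nonzero` and `binary_mask` are made up for this lesson. It shows that a semantic map of two touching people contains a single class value, while an instance map of the same scene carries one id per person.

```python
# Toy 4x6 scene: two "people" standing side by side and touching.
# Semantic map: 0 = background, 1 = person (class id). Values are made up.
semantic_map = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Instance map for the same scene: 0 = background, 1 and 2 are object ids.
instance_map = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]


def unique_nonzero(grid):
    """Return the sorted non-background values found in a 2D grid."""
    return sorted({v for row in grid for v in row if v != 0})


def binary_mask(grid, object_id):
    """Extract one object's mask: 1 where the id matches, else 0."""
    return [[1 if v == object_id else 0 for v in row] for row in grid]


classes_present = unique_nonzero(semantic_map)  # one class: "person"
instance_ids = unique_nonzero(instance_map)     # two separate objects
person_2_mask = binary_mask(instance_map, 2)    # pixels of the second person only

print("classes in semantic map:", classes_present)
print("object ids in instance map:", instance_ids)
```

From the semantic map alone there is no way to tell whether the 12 person pixels belong to one person or to several. The instance map answers that directly.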

How Instance Segmentation Thinks

Instance segmentation combines ideas from:

  • Object detection
  • Semantic segmentation

Conceptually, it works in three steps:

  • Find objects (bounding boxes)
  • Classify each object
  • Create a pixel-level mask for each object

This makes it more complex than semantic segmentation.


Why Instance Segmentation Is Harder

Instance segmentation must cope with:

  • Overlapping objects
  • Objects touching each other
  • Different sizes and shapes

The model must understand:

  • What is foreground vs background
  • Where one object ends and another begins
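
To illustrate what "where one object ends and another begins" means at the pixel level, here is a minimal sketch. It assumes an instance map is already available (hand-written here) and uses a hypothetical helper, `touching_pairs`, to list which objects share a border. A real model has to predict these borders; this code only inspects them.

```python
def touching_pairs(instance_map):
    """Return the set of (id_a, id_b) pairs whose pixels are 4-neighbours.

    0 is treated as background and never forms a pair.
    """
    height = len(instance_map)
    width = len(instance_map[0])
    pairs = set()
    for y in range(height):
        for x in range(width):
            a = instance_map[y][x]
            if a == 0:
                continue
            # Checking only the right and lower neighbour covers every
            # adjacent pixel pair exactly once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    b = instance_map[ny][nx]
                    if b != 0 and b != a:
                        pairs.add((min(a, b), max(a, b)))
    return pairs


# Objects 1 and 2 touch; object 3 is separated from them by background.
scene = [
    [1, 1, 2, 2, 0, 3],
    [1, 1, 2, 2, 0, 3],
]

print(touching_pairs(scene))
```

In a semantic map, objects 1 and 2 would merge into one blob if they share a class, and the border between them would be lost.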

Real-World Example

Consider a street scene:

  • 10 cars
  • 5 pedestrians
  • 2 bicycles

Semantic segmentation gives:

  • Car pixels
  • Person pixels
  • Bicycle pixels

Instance segmentation gives:

  • Car #1, Car #2, …
  • Person #1, Person #2, …
  • Bicycle #1, Bicycle #2

Why Instance Segmentation Matters

Many applications require object-level understanding:

  • Autonomous driving
  • Robotics manipulation
  • Medical image analysis
  • Video tracking

Without per-object masks, these systems cannot reliably tell touching or overlapping objects apart.


Instance Segmentation vs Object Detection

Object detection draws a bounding box around each object.

Instance segmentation goes further and outlines each object pixel by pixel.

Feature            | Object Detection   | Instance Segmentation
Bounding boxes     | Yes                | Yes
Pixel-level masks  | No                 | Yes
Object separation  | Coarse (box-level) | Precise (pixel-level)
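
The last row of the table can be checked with a small experiment. The sketch below is an illustration under simple assumptions: masks are hand-written 0/1 grids, boxes use inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`, and the helper names are invented for this lesson. Two thin diagonal objects cross the same region: their boxes are identical, yet their masks share no pixel.

```python
def bounding_box(mask):
    """Tightest inclusive box (x_min, y_min, x_max, y_max) around a non-empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def mask_iou(m1, m2):
    """Intersection over union of two binary masks of equal size."""
    inter = sum(a & b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    union = sum(a | b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return inter / union if union else 0.0


# Object A lies on the main diagonal, object B on the anti-diagonal.
mask_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
mask_b = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

boxes_overlap = box_iou(bounding_box(mask_a), bounding_box(mask_b))
masks_overlap = mask_iou(mask_a, mask_b)
print("box IoU:", boxes_overlap, "mask IoU:", masks_overlap)
```

Judged by boxes alone, the two objects look like the same detection. The masks show they are two separate objects.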

Popular Models for Instance Segmentation

Several architectures have been proposed:

  • Mask R-CNN
  • YOLACT
  • Models implemented in Detectron2 (a framework rather than a single architecture)

Among these, Mask R-CNN became the most influential.

That is why the next lesson focuses entirely on it.


How Output Looks Conceptually

For each detected object, the output of instance segmentation includes:

  • Bounding box
  • Class label
  • Binary mask for each object

Each object has its own mask, independent of others.
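
One way to picture this output is as a list of records, one per object. The sketch below is a hypothetical container, not the format of any particular library: the `Instance` class, the `instances_from_id_map` helper, and the toy id map are all assumptions made for illustration. Boxes use inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    label: str   # class label, e.g. "car"
    box: tuple   # (x_min, y_min, x_max, y_max), inclusive pixel coordinates
    mask: list   # 2D list of 0/1, same size as the image


def instances_from_id_map(id_map, id_to_label):
    """Turn a toy instance-id map into a list of per-object records."""
    instances = []
    for obj_id in sorted({v for row in id_map for v in row if v != 0}):
        mask = [[1 if v == obj_id else 0 for v in row] for row in id_map]
        ys = [y for y, row in enumerate(mask) if any(row)]
        xs = [x for row in mask for x, v in enumerate(row) if v]
        box = (min(xs), min(ys), max(xs), max(ys))
        instances.append(Instance(id_to_label[obj_id], box, mask))
    return instances


# Hand-written toy scene: two cars (ids 1 and 2) and one person (id 3).
id_map = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 3, 3],
]
instances = instances_from_id_map(id_map, {1: "car", 2: "car", 3: "person"})

# Counting objects per class is only possible because each object is separate.
counts = Counter(inst.label for inst in instances)
for inst in instances:
    print(inst.label, inst.box)
print(dict(counts))
```

Because every record carries its own mask, downstream code can count, measure, or track each object without touching the others.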


Common Mistakes Beginners Make

  • Confusing instance segmentation with semantic segmentation
  • Thinking bounding boxes are enough
  • Ignoring overlapping objects

Understanding this distinction is critical for interviews and real projects.


Practice Questions

Q1. Why can’t semantic segmentation separate multiple people in an image?

Because it assigns the same class label to all pixels without object identity.

Q2. What additional information does instance segmentation provide?

It provides a separate pixel-level mask for each individual object.

Q3. Which task combines object detection and segmentation?

Instance segmentation.

Mini Assignment

Think of a supermarket shelf image.

  • Why is instance segmentation better than detection?
  • Why is semantic segmentation insufficient?

Answer conceptually.


Quick Recap

  • Semantic segmentation labels pixels by class
  • Instance segmentation separates objects of the same class
  • Each object gets its own mask
  • Used in autonomous driving, robotics, medicine
  • Mask R-CNN is the most influential model for this task

Next lesson: Mask R-CNN – Instance Segmentation in Practice.