ML Lesson 11 – Overfitting & Underfitting | Dataplexa

Overfitting and Underfitting

In the previous lesson, we learned how probability helps models handle uncertainty. Now we answer a very common and very important question: why does a model sometimes perform well on training data but fail in real life?

The answer lies in two concepts: underfitting and overfitting.


What Is Underfitting?

Underfitting happens when a model is too simple to capture the pattern in the data.

Such a model does not learn enough. It performs poorly on both training data and unseen data.

Think of it like this:

Trying to explain house prices using only one feature, when many factors influence the price.


Real-World Example of Underfitting

Imagine predicting house purchase decisions using only customer age.

Age alone cannot explain income, location, or house size. The model misses important information.

As a result, predictions are inaccurate everywhere.
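To make this concrete, here is a minimal sketch (not from the lesson's dataset) using NumPy: the true relationship is quadratic, but we fit a straight line, which is too simple. The exact numbers are illustrative; the point is that the error is large on training and test data alike.

```python
import numpy as np

rng = np.random.default_rng(0)

# True relationship is quadratic, but we fit a straight line (too simple).
x_train = np.linspace(0, 10, 30)
y_train = x_train ** 2 + rng.normal(0, 1, size=x_train.size)
x_test = np.linspace(0.5, 9.5, 10)
y_test = x_test ** 2 + rng.normal(0, 1, size=x_test.size)

# Underfit: a degree-1 (linear) model cannot bend to follow the curve.
line = np.poly1d(np.polyfit(x_train, y_train, deg=1))

train_mse = float(np.mean((line(x_train) - y_train) ** 2))
test_mse = float(np.mean((line(x_test) - y_test) ** 2))

print(f"train MSE: {train_mse:.1f}, test MSE: {test_mse:.1f}")
# Both errors stay large: the model misses the pattern everywhere.
```

Notice that collecting more data would not fix this: the model family itself is too limited.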


What Is Overfitting?

Overfitting happens when a model learns the training data too well.

It memorizes noise, small fluctuations, and exceptions instead of learning the true pattern.

Such a model performs extremely well on training data but poorly on new, unseen data.


Real-World Example of Overfitting

Imagine a student who memorizes answers instead of understanding concepts.

They score very high in practice tests but fail when questions are slightly changed.

That is exactly how an overfitted model behaves.
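We can sketch the "memorizing student" in code (again with illustrative NumPy data, not the lesson's dataset): a degree-9 polynomial has enough parameters to pass through all 10 noisy training points exactly, so its training error is essentially zero while its test error is much larger.

```python
import numpy as np

rng = np.random.default_rng(42)

# The underlying pattern is a simple line, plus noise.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.2, size=x_test.size)

# Overfit: a degree-9 polynomial can memorize all 10 training points,
# including their noise.
overfit = np.poly1d(np.polyfit(x_train, y_train, deg=9))

train_mse = float(np.mean((overfit(x_train) - y_train) ** 2))
test_mse = float(np.mean((overfit(x_test) - y_test) ** 2))

print(f"train MSE: {train_mse:.6f}, test MSE: {test_mse:.6f}")
# Near-zero training error, noticeably worse test error.
```

The model "scored perfectly on the practice test" (training data) but stumbles on slightly shifted questions (test points between the memorized ones).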


Using Our Dataset to Understand the Problem

We continue using the same dataset throughout the ML module:

Dataplexa ML Housing & Customer Dataset

If we build a very complex model with too many rules, it may perfectly classify the training data but fail on new customers.

If we build a very simple model, it may miss important relationships altogether.


Training Error vs Testing Error

Underfitting:

High training error
High testing error

Overfitting:

Low training error
High testing error

The goal of ML is to find the balance between the two.
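A small experiment makes the balance visible. In this sketch (synthetic NumPy data, with a quadratic ground truth), we compare a model that is too simple (degree 1), about right (degree 2), and too complex (degree 15):

```python
import numpy as np

rng = np.random.default_rng(7)

# Ground truth is quadratic; we try models that are too simple,
# just right, and too complex.
x_train = np.linspace(-3, 3, 40)
y_train = x_train ** 2 + rng.normal(0, 1, size=x_train.size)
x_test = np.linspace(-2.9, 2.9, 40)
y_test = x_test ** 2 + rng.normal(0, 1, size=x_test.size)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

errors = {}
for deg in (1, 2, 15):
    model = np.poly1d(np.polyfit(x_train, y_train, deg))
    errors[deg] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {deg:2d}: train MSE {errors[deg][0]:.2f}, "
          f"test MSE {errors[deg][1]:.2f}")
```

The degree-1 model shows the underfitting signature (high error on both sets), while the well-matched degree-2 model keeps both errors low.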


Why Overfitting Is Dangerous

An overfitted model looks impressive during development.

But once deployed, it makes unreliable predictions in real-world conditions.

This leads to poor business decisions and loss of trust.


Why Underfitting Is Also a Problem

An underfitted model is too weak to be useful.

It fails to learn meaningful patterns, making it ineffective even on simple tasks.


How We Control Overfitting and Underfitting

Later in this course, we will learn techniques such as:

Cross-validation
Regularization
Feature selection
Proper model complexity

These techniques help models generalize well.
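As a small preview of regularization (covered properly later), here is a hedged sketch of ridge regression with NumPy. Ridge adds a penalty on the size of the coefficients, which discourages the wild, cancelling weights an overly flexible model uses to memorize noise. The data and penalty strength `lam` here are illustrative choices, not part of the lesson's dataset.

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy linear data expanded into a degree-9 polynomial basis,
# which gives the model far more flexibility than the data warrants.
x = np.linspace(0, 1, 15)
y = 2 * x + rng.normal(0, 0.2, size=x.size)
X = np.vander(x, 10, increasing=True)

# Unregularized least squares: free to use large, cancelling coefficients.
w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression penalizes lam * ||w||^2, shrinking the coefficients:
#   w = (X^T X + lam * I)^(-1) X^T y
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("plain coefficient norm:", round(float(np.linalg.norm(w_plain)), 2))
print("ridge coefficient norm:", round(float(np.linalg.norm(w_ridge)), 2))
```

The regularized model has smaller coefficients, so its predictions vary more smoothly between training points, which is exactly the kind of constraint that helps a model generalize.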


Mini Practice

Think about our dataset.

Ask yourself:

Would a model using only one feature underfit?
Would a model with too many parameters overfit?


Exercises

Exercise 1:
What is underfitting?

Underfitting occurs when a model is too simple to capture patterns in the data.

Exercise 2:
What is overfitting?

Overfitting occurs when a model memorizes training data instead of learning general patterns.

Exercise 3:
Why is overfitting dangerous?

Because the model performs poorly on new, unseen data.

Quick Quiz

Q1. Can a model have low training error and high test error?

Yes. That is a sign of overfitting.

Q2. Is underfitting solved by adding more data alone?

Not always. Model complexity and feature quality also matter.

In the next lesson, we will learn about Train/Test Split, which is the first practical tool used to detect overfitting and underfitting.
