Overfitting and Underfitting
In the previous lesson, we learned how probability helps models handle uncertainty. Now we answer a very common and very important question: why does a model sometimes perform well in training but fail in real life?
The answer lies in two concepts: underfitting and overfitting.
What Is Underfitting?
Underfitting happens when a model is too simple to capture the pattern in the data.
Such a model does not learn enough. It performs poorly on both training data and unseen data.
Think of it like this:
Trying to explain house prices using only one feature, when many factors influence the price.
Real-World Example of Underfitting
Imagine predicting house purchase decisions using only customer age.
Age alone cannot explain income, location, or house size. The model misses important information.
As a result, predictions are inaccurate everywhere.
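As a rough sketch of this scenario, the snippet below uses made-up numbers (not the Dataplexa dataset itself): it fits a price model on customer age alone, while the price actually depends on house size. Because age carries no information about price, the model explains almost none of the variance.

```python
import numpy as np

# Hypothetical data for illustration: price is driven by house size,
# but our underfitting model only gets to see customer age.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)
size = rng.uniform(50, 250, n)              # the feature that actually matters
price = 1000 * size + rng.normal(0, 5000, n)

# Fit price from age alone (a straight line: one weight plus an intercept).
coeffs = np.polyfit(age, price, deg=1)
pred = np.polyval(coeffs, age)

# R^2 measures how much variance the model explains; here it stays near zero,
# even on the data the model was trained on.
ss_res = np.sum((price - pred) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 using age only: {r2:.3f}")
```

The telltale sign of underfitting is visible even without a test set: the model cannot explain its own training data.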
What Is Overfitting?
Overfitting happens when a model learns the training data too well.
It memorizes noise, small fluctuations, and exceptions instead of learning the true pattern.
Such a model performs extremely well on training data but poorly on new, unseen data.
Real-World Example of Overfitting
Imagine a student who memorizes answers instead of understanding concepts.
They score very high in practice tests but fail when questions are slightly changed.
That is exactly how an overfitted model behaves.
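The memorizing student can be sketched in code. In this synthetic example (an assumption for illustration, not our dataset), a degree-9 polynomial is forced through 10 noisy training points: it reproduces the training answers almost exactly, but its error jumps on new points drawn from the same underlying pattern.

```python
import numpy as np

rng = np.random.default_rng(1)

# The true relationship is a straight line; the noise plays the role of
# "exceptions" that the overfitted model memorizes.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(0, 0.2, 50)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Degree 9 through 10 points: enough flexibility to memorize every point.
overfit = np.polyfit(x_train, y_train, deg=9)
# Degree 1: only enough flexibility to learn the underlying trend.
simple = np.polyfit(x_train, y_train, deg=1)

print("train MSE (deg 9):", mse(overfit, x_train, y_train))  # near zero
print("test  MSE (deg 9):", mse(overfit, x_test, y_test))
print("test  MSE (deg 1):", mse(simple, x_test, y_test))
```

The degree-9 model's training error is essentially zero, yet its test error is far larger, just like the student whose practice scores do not survive reworded questions.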
Using Our Dataset to Understand the Problem
We continue using the same dataset throughout the ML module:
Dataplexa ML Housing & Customer Dataset
If we build a very complex model with too many rules, it may perfectly classify the training data but fail on new customers.
If we build a very simple model, it may miss important relationships altogether.
Training Error vs Testing Error
Underfitting:
High training error
High testing error

Overfitting:
Low training error
High testing error
The goal of ML is to find the balance between the two.
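The error pattern above can be reproduced in a few lines. This is a sketch on synthetic data (an assumption, not the Dataplexa dataset): the same curved pattern is fitted with a too-simple model and a too-complex one, and the train/test errors fall into exactly the two patterns listed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data: a curved pattern plus noise,
# split into a training half and a testing half.
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 60)
x_tr, y_tr, x_te, y_te = x[:30], y[:30], x[30:], y[30:]

def train_test_mse(deg):
    """Fit a degree-`deg` polynomial on the training half, score both halves."""
    c = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    return tr, te

tr1, te1 = train_test_mse(1)     # too simple: high train AND test error
tr12, te12 = train_test_mse(12)  # too complex: low train error, higher test error
print(f"degree  1: train MSE {tr1:.3f}, test MSE {te1:.3f}")
print(f"degree 12: train MSE {tr12:.3f}, test MSE {te12:.3f}")
```

Degree 1 underfits (both errors high); degree 12 overfits (training error low, testing error noticeably higher). A degree in between would sit closest to the balance point.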
Why Overfitting Is Dangerous
An overfitted model looks impressive during development.
But once deployed, it makes unreliable predictions in real-world conditions.
This leads to poor business decisions and loss of trust.
Why Underfitting Is Also a Problem
An underfitted model is too weak to be useful.
It fails to learn meaningful patterns, making it ineffective even on simple tasks.
How We Control Overfitting and Underfitting
Later in this course, we will learn techniques such as:
Cross-validation
Regularization
Feature selection
Proper model complexity
These techniques help models generalize well.
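As a small preview of the first technique, here is a from-scratch sketch of k-fold cross-validation on synthetic data (an illustrative assumption, not our dataset): each candidate model complexity is scored only on data it did not train on, and the complexity with the best held-out error is selected.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.15, 100)

def cv_mse(deg, k=5):
    """Mean held-out MSE of a degree-`deg` polynomial over k folds."""
    folds = np.array_split(np.arange(len(x)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        c = np.polyfit(x[train], y[train], deg)
        scores.append(np.mean((np.polyval(c, x[test]) - y[test]) ** 2))
    return float(np.mean(scores))

# Score every candidate complexity and keep the one that generalizes best.
scores = {d: cv_mse(d) for d in range(1, 12)}
best = min(scores, key=scores.get)
print("CV-selected degree:", best)
```

Cross-validation rejects degree 1 (underfits every fold) and very high degrees (overfit each fold), settling on a moderate complexity. Regularization and feature selection attack the same problem from different angles: penalizing large coefficients and dropping uninformative inputs, respectively.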
Mini Practice
Think about our dataset.
Ask yourself:
Would a model using only one feature underfit?
Would a model with too many parameters overfit?
Exercises
Exercise 1:
What is underfitting?
Exercise 2:
What is overfitting?
Exercise 3:
Why is overfitting dangerous?
Quick Quiz
Q1. Can a model have low training error and high test error?
Q2. Is underfitting solved by adding more data alone?
In the next lesson, we will learn about Train/Test Split, which is the first practical tool used to detect overfitting and underfitting.