Bias–Variance Tradeoff
In the previous lesson, we learned how cross-validation helps us evaluate models more reliably.
Now we come to one of the most important ideas in Machine Learning: the Bias–Variance Tradeoff.
This concept explains why some models underfit, why others overfit, and how to choose the right level of model complexity.
What Is Bias?
Bias refers to errors caused by overly simple assumptions in a model.
A high-bias model does not learn enough from the data. It assumes the data follows a very simple pattern, even when reality is more complex.
As a result, the model performs poorly on both training data and unseen data.
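To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available) in which a straight-line model is forced onto curved synthetic data; the synthetic data simply stands in for our housing dataset.

```python
# A minimal sketch of high bias: a straight-line model forced onto
# curved data performs poorly on BOTH the training and test sets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1, size=200)  # nonlinear ground truth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)  # assumes a straight line: too simple

print("train R^2:", model.score(X_tr, y_tr))  # low even on training data
print("test  R^2:", model.score(X_te, y_te))  # similarly low: classic underfitting
```

Notice that the training score is already poor: a high-bias model cannot even fit the data it has seen.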
Real-World Example of High Bias
Suppose we predict house purchase decisions using only one rule:
“If income is high, then purchase; otherwise, no purchase.”
This ignores many other important factors like location, house size, and customer preferences.
The model is too simple — it has high bias.
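We can write this rule directly as code, which makes its limitations obvious. The income threshold below is a made-up illustrative number, not a value taken from the dataset:

```python
# The one-rule model written out. The 50_000 threshold is an arbitrary
# illustrative value, not a number from the actual dataset.
def predict_purchase(income: float) -> bool:
    # High bias: location, house size, and preferences are all ignored.
    return income > 50_000
```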
What Is Variance?
Variance refers to how sensitive a model is to changes in the training data.
A high-variance model learns too much detail, including noise and random fluctuations.
It performs very well on training data but poorly on unseen data.
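One way to see variance directly is to train the same flexible model on two random resamples of the same data and compare their predictions. A minimal sketch, assuming scikit-learn and synthetic data:

```python
# A minimal sketch of variance as sensitivity: two fully grown trees,
# fit to two random resamples of the SAME data, can disagree noticeably
# on the same query point.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(100, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.3, size=100)

x_query = np.array([[1.5]])  # one fixed point to predict
for i in (1, 2):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
    tree = DecisionTreeRegressor()               # unlimited depth: high variance
    tree.fit(X[idx], y[idx])
    print(f"resample {i}: prediction at x=1.5 ->", tree.predict(x_query)[0])
```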
Real-World Example of High Variance
Imagine a model that creates extremely specific rules for each customer in the dataset.
It memorizes training examples instead of learning general patterns.
Such a model fails when a new customer behaves slightly differently.
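A minimal sketch of this memorization effect, again assuming scikit-learn, with synthetic data standing in for customer records: a fully grown decision tree scores almost perfectly on training data but drops on held-out data.

```python
# A minimal sketch of memorization: an unpruned decision tree scores
# (near-)perfectly on its training data but drops on held-out data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))                        # five noisy "customer" features
y = (X[:, 0] + rng.normal(0, 1.0, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier().fit(X_tr, y_tr)      # grows until leaves are pure

print("train accuracy:", tree.score(X_tr, y_tr))     # typically 1.0: memorized
print("test  accuracy:", tree.score(X_te, y_te))     # noticeably lower: overfit
```

The large gap between training and test accuracy is the signature of high variance.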
Bias and Variance Together
Bias and variance pull in opposite directions: reducing one usually increases the other.
High bias → underfitting
High variance → overfitting
The goal of Machine Learning is to find a balance where both bias and variance are reasonably low.
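For squared-error loss, this balance can be stated precisely: the expected prediction error at a point splits into squared bias, variance, and irreducible noise (error that no model can remove).

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  \;=\; \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{too simple}}
  \;+\; \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{too flexible}}
  \;+\; \underbrace{\sigma^2}_{\text{irreducible noise}}
```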
Using Our Dataset
We continue using the same dataset throughout this ML module:
Dataplexa ML Housing & Customer Dataset
A simple model may ignore useful features (high bias).
A very complex model may memorize customer behavior (high variance).
Choosing the right complexity allows the model to generalize well to new customers.
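Cross-validation from the previous lesson gives us a practical way to pick that complexity. A minimal sketch, assuming scikit-learn, that sweeps tree depth on synthetic data standing in for our dataset:

```python
# A minimal sketch of choosing complexity with cross-validation: sweep
# tree depth (the complexity knob) and keep the depth with the best
# average validation score.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(0, 3, size=(200, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.3, size=200)

for depth in (1, 3, 10, None):                 # from very shallow to unlimited
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    mean_r2 = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV R^2 = {mean_r2:.3f}")
# Expect poor scores at depth 1 (high bias) and at None (high variance),
# with an intermediate depth doing best.
```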
Visual Intuition (Conceptual)
Think of shooting arrows at a target.
High bias: All arrows miss the target in the same direction.
High variance: Arrows are scattered all over the place.
Low bias and low variance: Arrows are tightly clustered near the center.
How We Control the Tradeoff
In later lessons, we will learn practical techniques to manage this balance:
Choosing the right model
Using cross-validation
Regularization techniques
Feature selection
More training data
These tools help us reduce overfitting without oversimplifying the model.
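As a small preview of regularization (covered properly in a later lesson), here is a minimal sketch, assuming scikit-learn: increasing Ridge's alpha shrinks the coefficients, deliberately accepting a little bias in exchange for lower variance.

```python
# A preview sketch of regularization: a larger Ridge alpha shrinks the
# coefficients, trading a little bias for noticeably lower variance.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
base = rng.normal(size=(50, 1))
# Five nearly identical (highly correlated) copies of one feature:
X = np.hstack([base + rng.normal(0, 0.01, size=(50, 1)) for _ in range(5)])
y = base[:, 0] + rng.normal(0, 0.1, size=50)

for alpha in (0.01, 1.0, 100.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: coefficient spread = {coefs.std():.3f}")
# Larger alpha -> smaller, more stable coefficients -> lower variance.
```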
Why This Concept Is Critical
Many ML failures happen not because of bad algorithms, but because of poor bias–variance balance.
Understanding this concept allows you to diagnose problems and improve models intelligently.
Mini Practice
Think about our dataset.
Ask yourself:
What happens if we remove many features?
What happens if we add too many complex rules?
Exercises
Exercise 1:
What does high bias indicate?
Exercise 2:
What does high variance indicate?
Exercise 3:
Why is balancing bias and variance important?
Quick Quiz
Q1. Can a model have both high bias and high variance?
Q2. Is the bias–variance tradeoff solved automatically by the learning algorithm?
In the next lesson, we will learn Evaluation Metrics, which help us measure how good our models actually are.