AI Lesson 44 – Overfitting & Underfitting | Dataplexa

Overfitting and Underfitting

Overfitting and underfitting describe two common problems that occur when training machine learning models. Both lead to poor performance, but for opposite reasons.

A good model should learn meaningful patterns from data and generalize well to unseen examples. When this balance is lost, overfitting or underfitting occurs.

What Is Underfitting?

Underfitting happens when a model is too simple to capture the underlying pattern in the data. It fails to learn important relationships and performs poorly on both training and test data.

  • Model is too simple
  • High bias
  • Poor performance everywhere

What Is Overfitting?

Overfitting happens when a model learns the training data too well, including noise and random fluctuations. It performs very well on training data but poorly on new, unseen data.

  • Model is too complex
  • High variance
  • Great training accuracy, poor test accuracy

Real-World Connection

Imagine preparing for an exam by memorizing answers instead of understanding concepts. You may score well on practice questions but fail when questions change. This is exactly how overfitting works.

On the other hand, skimming only headlines without studying details is like underfitting — you never learn enough to succeed.

Visual Intuition

Underfitting looks like a straight line trying to fit curved data. Overfitting looks like a wildly zigzag curve trying to touch every data point.
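
A quick way to build this intuition is to fit both kinds of model to the same curved data and plot them. The sketch below is illustrative rather than part of the lesson's code: the noisy y = x² data, the random seed, and the polynomial degrees are assumptions chosen to make the two shapes visible.

import numpy as np
import matplotlib.pyplot as plt

# Curved data: y = x**2 plus a little noise (assumed for illustration)
rng = np.random.default_rng(0)
x = np.linspace(1, 5, 20)
y = x**2 + rng.normal(0, 1.0, size=x.shape)

xs = np.linspace(1, 5, 200)
line = np.poly1d(np.polyfit(x, y, 1))     # underfit: a straight line
wiggly = np.poly1d(np.polyfit(x, y, 10))  # overfit: chases every point

plt.scatter(x, y, label="data")
plt.plot(xs, line(xs), label="degree 1 (underfit)")
plt.plot(xs, wiggly(xs), label="degree 10 (overfit)")
plt.legend()
plt.show()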

Underfitting Example (Linear Model)


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Training data follows y = x**2, a quadratic relationship
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])

# A straight line has no way to bend with the curve
model = LinearRegression()
model.fit(X, y)

predictions = model.predict(X)
print(mean_squared_error(y, predictions))
  
2.8

A linear model cannot capture the quadratic relationship (y = x²) in the data, so it underfits: the error stays high even on the data it was trained on.
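
Printing the line's predictions next to the targets makes the systematic error visible. The values in the comments follow from the closed-form least-squares fit for this data, which is the line y = 6x - 7:

# Best-fit line for this data: y = 6x - 7
print(predictions)  # [-1.  5. 11. 17. 23.]
print(y)            # [ 1  4  9 16 25]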

Overfitting Example (High Degree Polynomial)


from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Reuses X, y, LinearRegression and mean_squared_error from the example above.
# Degree 10 gives far more flexibility than 5 data points can justify.
poly_model = make_pipeline(
    PolynomialFeatures(degree=10),
    LinearRegression()
)

poly_model.fit(X, y)
predictions = poly_model.predict(X)

print(mean_squared_error(y, predictions))
  
0.0 (up to tiny floating-point error)

The model fits the training data perfectly but will likely fail on new inputs, as the quick check below shows.
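
To make the failure concrete, we can ask both fitted models about an input outside the training set. The test point x = 6 (true value 36, since y = x²) is an assumption added for illustration; the exact number the degree-10 model returns depends on the solver, but it is typically far from 36.

# Unseen input: x = 6, true y = 36
X_new = [[6]]
print("linear model:", model.predict(X_new))          # around 29 (6*6 - 7)
print("degree-10 model:", poly_model.predict(X_new))  # typically far from 36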

Bias vs Variance

  • High bias: Underfitting
  • High variance: Overfitting
  • Goal is to find the right balance, as the sketch below shows
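
For this dataset the right balance is easy to find: the true relationship is exactly quadratic, so a degree-2 polynomial has just enough capacity. This minimal sketch reuses X, y, and the imports from the examples above; it also doubles as an illustration of the underfitting fixes listed next, since it adds features and model capacity.

# Degree 2 matches the true pattern: low bias and low variance
balanced_model = make_pipeline(
    PolynomialFeatures(degree=2),
    LinearRegression()
)
balanced_model.fit(X, y)
print(mean_squared_error(y, balanced_model.predict(X)))  # ~0.0
print(balanced_model.predict([[6]]))                     # close to the true 36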

How to Fix Underfitting

  • Use a more complex model
  • Add more features
  • Train longer

How to Fix Overfitting

  • Collect more data
  • Reduce model complexity
  • Apply regularization (see the sketch after this list)
  • Use cross-validation
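
As one concrete example of the regularization bullet, scikit-learn's Ridge adds an L2 penalty that shrinks the large coefficients behind the zigzag fit. This is a sketch under assumptions: alpha=1.0, the StandardScaler step, and include_bias=False are illustrative choices, not the lesson's prescription.

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Same over-flexible degree-10 features, but the L2 penalty (alpha)
# discourages huge coefficients instead of letting them chase noise.
ridge_model = make_pipeline(
    PolynomialFeatures(degree=10, include_bias=False),
    StandardScaler(),   # keeps the penalty comparable across powers of x
    Ridge(alpha=1.0)
)
ridge_model.fit(X, y)
print(mean_squared_error(y, ridge_model.predict(X)))  # small but not exactly 0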

Practice Questions

Practice 1: When a model is too simple, it is said to be __________.



Practice 2: When a model memorizes the training data, it is said to be __________.



Practice 3: Underfitting is associated with high __________.



Quick Quiz

Quiz 1: Overfitting is associated with high __________.





Quiz 2: A good model should maximize __________.





Quiz 3: Which technique helps detect overfitting?





Coming up next: Cross-Validation — evaluating models more reliably.