Overfitting vs Underfitting
In machine learning, building a model is not just about achieving high accuracy. A good model must perform well on both training data and unseen real-world data. This balance is where overfitting and underfitting come into play.
Understanding these two concepts helps you build reliable, production-ready AI systems instead of models that fail outside the notebook.
What Is Underfitting?
Underfitting happens when a model is too simple to capture patterns in the data. The model fails to learn important relationships and performs poorly on both training and test datasets.
- Model is too simple
- High bias
- Poor training accuracy
- Poor test accuracy
Real-World Underfitting Example
Imagine predicting house prices using only the number of bedrooms. Important factors like location, size, and condition are ignored. The predictions will be inaccurate even on known data.
What Is Overfitting?
Overfitting occurs when a model learns the training data too well, including noise and random fluctuations. It performs very well on training data but poorly on new, unseen data.
- Model is too complex
- Low bias but high variance
- Very high training accuracy
- Low test accuracy
Real-World Overfitting Example
Suppose a student memorizes answers instead of understanding the concepts. They score high on practice tests but fail the actual exam. This is exactly how an overfit model behaves.
Visual Understanding
Underfitting misses the trend completely. Overfitting follows every tiny fluctuation. A good model captures the true pattern without noise.
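To make this concrete, here is a minimal plotting sketch; it assumes NumPy and matplotlib are available, and the sample data and polynomial degrees are invented for illustration. A degree-1 line misses the curve, a very high degree chases the noise, and a moderate degree tracks the underlying trend.
import numpy as np
import matplotlib.pyplot as plt
# Noisy samples around a smooth true pattern
rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 20)
ys = np.sin(2 * np.pi * xs) + rng.normal(0, 0.2, xs.size)
# Fit and draw three polynomials of increasing complexity
x_fine = np.linspace(0, 1, 200)
for degree, label in [(1, "underfit"), (5, "good fit"), (15, "overfit")]:
    coeffs = np.polyfit(xs, ys, degree)
    plt.plot(x_fine, np.polyval(coeffs, x_fine), label=f"degree {degree} ({label})")
plt.scatter(xs, ys, color="black", label="noisy samples")
plt.legend()
plt.show()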
Code Example: Demonstrating Overfitting
Let’s see how increasing model complexity can cause overfitting using polynomial regression.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Five training points that lie exactly on y = x**2
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25])
# Expand the single input feature into powers up to x**4
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
# Fit ordinary least squares on the expanded features
model = LinearRegression()
model.fit(X_poly, y)
# Evaluate on the same data the model was trained on
predictions = model.predict(X_poly)
print(mean_squared_error(y, predictions))
The printed training error is essentially zero: with five data points and five polynomial coefficients, the model can pass exactly through every training point. Near-zero training error on such a small dataset is a classic warning sign of overfitting.
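Because these toy targets lie exactly on y = x**2, even the degree-4 fit happens to track the true curve; the memorization problem becomes visible once the targets contain noise. A minimal sketch, reusing X_poly and y from above and adding hypothetical measurement noise:
# Add noise to the targets, then refit the same degree-4 model
rng = np.random.default_rng(42)
y_noisy = y + rng.normal(0, 1.0, y.shape)  # hypothetical noise, for illustration
model_noisy = LinearRegression()
model_noisy.fit(X_poly, y_noisy)
# Training error is still ~0: the model has memorized the noise...
print(mean_squared_error(y_noisy, model_noisy.predict(X_poly)))
# ...so its predictions now deviate from the true y = x**2 values
print(mean_squared_error(y, model_noisy.predict(X_poly)))
That gap, near-perfect recall of the training targets alongside poor agreement with the underlying pattern, is exactly what overfitting means.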
Code Example: Underfitting Case
Fitting a plain straight line to the same quadratic data causes underfitting.
# Reuse X and y from the example above; fit a straight line
model_simple = LinearRegression()
model_simple.fit(X, y)
pred_simple = model_simple.predict(X)
print(mean_squared_error(y, pred_simple))
The error here is 2.8, far from zero: a straight line cannot bend to follow the quadratic pattern, so the model misses the true relationship even on the data it was trained on.
How to Detect Overfitting and Underfitting
- Compare training and test performance (see the sketch after this list)
- Use validation data
- Monitor learning curves
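Here is a minimal sketch of the first two checks, on synthetic data invented for illustration: fit polynomials of several degrees, then compare the error on the training split against the error on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Synthetic data: a quadratic trend plus noise
rng = np.random.default_rng(0)
X_all = rng.uniform(0, 5, size=(40, 1))
y_all = X_all.ravel() ** 2 + rng.normal(0, 2.0, 40)
# Hold out a quarter of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0)
for degree in (1, 2, 12):
    features = PolynomialFeatures(degree=degree)
    reg = LinearRegression().fit(features.fit_transform(X_train), y_train)
    train_err = mean_squared_error(y_train, reg.predict(features.transform(X_train)))
    test_err = mean_squared_error(y_test, reg.predict(features.transform(X_test)))
    print(f"degree {degree}: train MSE {train_err:.2f}, test MSE {test_err:.2f}")
Both errors high suggests underfitting (degree 1); low training error with much higher test error suggests overfitting (degree 12); both low and close together is the sweet spot (degree 2).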
How to Fix Underfitting
- Increase model complexity
- Add more features (see the sketch after this list)
- Reduce regularization
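Continuing the small example from earlier (the same X and y arrays), the first two fixes can be as simple as adding a squared feature so the model can represent the curve:
# A degree-2 expansion gives the linear model an x**2 feature to work with
poly2 = PolynomialFeatures(degree=2)
X_poly2 = poly2.fit_transform(X)
model_better = LinearRegression()
model_better.fit(X_poly2, y)
# Error drops to ~0, and this time because the model family
# matches the true quadratic relationship, not because it memorized
print(mean_squared_error(y, model_better.predict(X_poly2)))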
How to Fix Overfitting
- Use more training data
- Apply regularization (see the sketch after this list)
- Reduce model complexity
- Use cross-validation
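As a sketch of two of these fixes, regularization and cross-validation, the pipeline below keeps the overly flexible degree-12 features but shrinks their weights with Ridge regularization, then scores the result on five held-out folds. It reuses the synthetic X_all and y_all from the detection sketch above.
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
# Degree-12 features, scaled, then an L2 penalty on the weights
regularized = make_pipeline(
    PolynomialFeatures(degree=12),
    StandardScaler(),
    Ridge(alpha=1.0))
# Average held-out MSE across 5 folds; compare it against the
# unregularized degree-12 test error from the detection sketch
scores = cross_val_score(regularized, X_all, y_all, cv=5,
                         scoring="neg_mean_squared_error")
print(-scores.mean())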
Practice Questions
Practice 1: What do we call a model that performs poorly on both training and test data?
Practice 2: A model with high training accuracy but low test accuracy is called?
Practice 3: Overfitting is associated with high ________.
Quick Quiz
Quiz 1: A model memorizes training data but fails on new data. What is this called?
Quiz 2: Underfitting is mainly caused by high ________.
Quiz 3: Which technique helps detect overfitting?
Coming up next: Cross Validation — building models that generalize well.