Bias and Variance
In the previous lesson, we learned how weight initialization affects signal flow and training stability.
Now we step into one of the most important ideas that explains why some models fail, even when training code looks correct.
This idea is called the bias–variance tradeoff.
What Is Bias in Deep Learning?
Bias reflects how strongly a model's built-in assumptions constrain what it can learn from the data.
A high-bias model is too simple. It cannot capture the real patterns present in the data.
Such a model performs poorly even on the training data.
This situation is known as underfitting.
In deep learning, high bias usually occurs when:
The network is too shallow, has too few neurons, or uses overly restrictive assumptions.
What Is Variance in Deep Learning?
Variance reflects how sensitive a model is to small changes in the training data.
A high-variance model learns the training data too well, including noise and random fluctuations.
It performs very well on training data but poorly on unseen data.
This situation is known as overfitting.
In deep learning, high variance usually occurs when:
The network is very deep, has many parameters, or is trained for too long without regularization.
Why Bias and Variance Matter Together
Bias and variance are not independent.
Reducing bias often increases variance, and reducing variance often increases bias.
This creates a tradeoff.
The goal of deep learning is not to minimize bias or variance alone, but to balance both.
A well-trained model lies in the middle — complex enough to learn patterns, but simple enough to generalize.
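For squared-error problems, this tradeoff has a classical formalization: expected test error decomposes into Bias² + Variance + Irreducible noise, so what matters is the sum, not either term alone.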
Bias–Variance Through a Real-World Lens
Imagine teaching mathematics to students.
If you explain only basic formulas, students cannot solve real problems. This is high bias.
If you teach extremely advanced techniques immediately, students memorize solutions without understanding. This is high variance.
Effective teaching balances structure and flexibility. Deep learning works the same way.
Bias and Variance in Neural Networks
In neural networks, bias and variance are influenced by:
Model depth, number of neurons, activation functions, training duration, and regularization methods.
For example, a shallow network with few neurons typically has high bias and low variance.
A very deep network with many parameters tends to have low bias but high variance, as the sketch below makes concrete.
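Here is a minimal sketch of that contrast; the input dimension (10 features) and the layer widths are illustrative assumptions, not values prescribed by this lesson.

from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

# Illustrative only: input dimension and layer widths are assumed values
shallow = Sequential([Input(shape=(10,)), Dense(8, activation="relu"), Dense(1)])

deep = Sequential([
    Input(shape=(10,)),
    Dense(128, activation="relu"),
    Dense(128, activation="relu"),
    Dense(128, activation="relu"),
    Dense(1),
])

# Comparing parameter counts makes the capacity gap concrete
print("shallow params:", shallow.count_params())
print("deep params:", deep.count_params())

The deep model can represent far richer functions, which lowers bias but raises the risk of fitting noise.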
How Training Curves Reveal Bias and Variance
Training and validation loss curves are powerful diagnostic tools.
If both training and validation loss are high, the model has high bias.
If training loss is low but validation loss is high, the model has high variance.
Deep learning practitioners rely heavily on these curves to guide model design.
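As a rough sketch of this diagnosis in code, assume `model` is an already-compiled Keras model and that `x_train`, `y_train`, `x_val`, and `y_val` are your own data splits; the loss thresholds below are illustrative, not standard values.

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    verbose=0,
)

final_train_loss = history.history["loss"][-1]
final_val_loss = history.history["val_loss"][-1]

# Rough reading of the curves, following the rules above
if final_train_loss > 0.5 and final_val_loss > 0.5:      # threshold is illustrative
    print("Both losses high -> likely high bias (underfitting)")
elif final_val_loss > 2 * final_train_loss:              # gap factor is illustrative
    print("Validation loss far above training loss -> likely high variance (overfitting)")
else:
    print("Losses low and close together -> reasonable balance")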
Controlling Bias and Variance
Bias can be reduced by:
Increasing model capacity, adding layers, or using better feature representations.
Variance can be reduced by:
Regularization, dropout, early stopping, and more data.
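As a preview, here is a minimal sketch that combines L2 regularization, dropout, and early stopping; the layer size, dropout rate, penalty strength, and patience are illustrative choices, not recommendations.

from tensorflow.keras import Input, Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

# Illustrative values throughout; tune them for your own task
model = Sequential([
    Input(shape=(10,)),
    Dense(128, activation="relu", kernel_regularizer=l2(1e-4)),  # weight penalty shrinks large weights
    Dropout(0.3),                                                 # randomly silences units during training
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop training once validation loss stops improving
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, callbacks=[early_stop])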
In upcoming lessons, we will study these techniques in detail and apply them systematically.
Simple Code Illustration
In practice, we control model complexity directly in code.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Higher-bias model: a single narrow hidden layer limits capacity
small_model = Sequential([Dense(16, activation="relu"), Dense(1)])

# Lower-bias, higher-variance model: a much wider hidden layer adds capacity
large_model = Sequential([Dense(256, activation="relu"), Dense(1)])
This single design choice can drastically change model behavior.
Exercises
Exercise 1:
What happens when a model has high bias?
Exercise 2:
Why does increasing model complexity increase variance?
Quick Quiz
Q1. What is the main goal of the bias–variance tradeoff?
Q2. Which usually increases variance: deeper models or shallower models?
In the next lesson, we will connect bias and variance directly to overfitting and underfitting and introduce concrete techniques to control them during training.