Loss Functions
In the previous lesson, we learned how neural networks make predictions using forward propagation and improve themselves using backpropagation.
But there is a critical question we have not answered yet:
How does the network know whether its prediction is good or bad?
The answer lies in loss functions.
What Is a Loss Function? 🎯
A loss function is a mathematical formula that measures how far the model’s prediction is from the actual correct answer.
It converts prediction quality into a single number.
👉 Smaller loss = better model
👉 Larger loss = worse model
Neural networks do not understand words like “good” or “bad”. They only understand numbers — and loss functions provide exactly that.
Real-World Intuition
Imagine you are throwing darts at a dartboard 🎯.
The distance between where the dart lands and the center of the board is the loss.
Your goal is not to throw once perfectly, but to keep adjusting your throws so the distance gets smaller every time.
That adjustment process is learning.
Why Loss Functions Matter So Much
Loss functions directly control how the network learns.
If the loss function is poorly chosen, even a powerful neural network will learn the wrong thing.
Different problems require different loss functions. A classifier and a price predictor should not be punished in the same way.
Common Types of Loss Functions
Let’s understand the most important loss functions used in Deep Learning.
1️⃣ Mean Squared Error (MSE)
Mean Squared Error is commonly used for regression problems (where outputs are continuous numbers).
It averages the squared differences between actual and predicted values across all examples.
Loss = (1/n) × Σ (Actual - Predicted)²
Squaring ensures:
• Negative errors don’t cancel positive ones
• Larger mistakes are punished more
📌 Example: House price prediction. Being off by $50,000 hurts much more than being off by $5,000.
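The intuition above can be checked directly with a few lines of NumPy. The house prices below are made-up numbers, just for illustration:

```python
import numpy as np

# Hypothetical house prices in dollars (illustrative values only)
actual = np.array([300_000.0, 450_000.0, 200_000.0])
predicted = np.array([305_000.0, 400_000.0, 205_000.0])

# Mean Squared Error: average of the squared differences
mse = np.mean((actual - predicted) ** 2)
```

Notice how the single $50,000 miss dominates the result: its squared error is 100 times larger than each $5,000 miss.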
2️⃣ Mean Absolute Error (MAE)
MAE measures the average absolute difference between predicted and actual values.
Loss = (1/n) × Σ |Actual - Predicted|
Unlike MSE, MAE treats all errors equally.
This makes MAE more robust to outliers, but less sensitive to large mistakes.
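A small comparison makes the outlier effect concrete. In this sketch (with invented numbers), the model is perfect on three points and badly wrong on one:

```python
import numpy as np

actual = np.array([10.0, 12.0, 11.0, 100.0])    # the last value is an outlier
predicted = np.array([10.0, 12.0, 11.0, 10.0])  # model misses only the outlier

mse = np.mean((actual - predicted) ** 2)   # squared error blows up: 90^2 / 4
mae = np.mean(np.abs(actual - predicted))  # absolute error stays modest: 90 / 4
```

The single 90-unit miss produces an MSE of 2025 but an MAE of only 22.5, which is why MAE is often preferred when outliers should not dominate training.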
3️⃣ Binary Cross-Entropy
Binary Cross-Entropy is used for binary classification problems — yes/no, true/false, spam/not spam.
It measures how much probability the model assigns to the correct class.
Loss = -[y*log(p) + (1-y)*log(1-p)]
If the model is confidently wrong, the loss becomes very large.
This strongly pushes the network to correct its mistakes.
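A minimal implementation shows this asymmetry. The function below is a straightforward translation of the formula above (the clipping is a common numerical safeguard, not part of the math itself):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # Clip p away from exactly 0 and 1 to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# True label is 1 (e.g. "spam")
confident_right = binary_cross_entropy(1, 0.9)  # small loss, about 0.105
confident_wrong = binary_cross_entropy(1, 0.1)  # about 2.303, over 20x larger
```

Predicting 0.9 for the true class costs almost nothing, while predicting 0.1 costs more than twenty times as much, which is exactly the pressure that pushes the network to fix confident mistakes first.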
4️⃣ Categorical Cross-Entropy
When there are more than two classes (such as image classification), we use categorical cross-entropy.
It compares the predicted probability distribution with the true distribution.
This loss function powers most modern image and language models.
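For a one-hot true label, categorical cross-entropy reduces to the negative log of the probability assigned to the correct class. A sketch with a hypothetical three-class image example:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot, so only the true class's predicted probability
    # contributes to the sum
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

y_true = np.array([0.0, 1.0, 0.0])     # true class: "dog" (made-up labels)
y_pred = np.array([0.2, 0.7, 0.1])     # model's predicted distribution
loss = categorical_cross_entropy(y_true, y_pred)  # -log(0.7), about 0.357
```

If the model had put 0.99 on "dog" instead of 0.7, the loss would shrink toward zero; putting 0.01 there would make it explode, just as in the binary case.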
Loss Functions in Real Training Code 🧠
In practice, loss functions are provided by deep learning libraries.
```python
# Example (conceptual, Keras-style API)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
```
Behind the scenes, this loss value drives backpropagation and weight updates.
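To see how a loss value drives weight updates, here is a stripped-down sketch of gradient descent on a single weight with a squared-error loss (a toy model, not how a real framework is implemented):

```python
# Toy model: predict y = w * x, with one data point (x=2, true y=6)
x, y, w = 2.0, 6.0, 0.0
lr = 0.1  # learning rate

for _ in range(50):
    pred = w * x
    loss = (y - pred) ** 2       # squared-error loss for this point
    grad = -2 * (y - pred) * x   # derivative of the loss w.r.t. w
    w -= lr * grad               # weight update driven by the loss

# w converges toward 3.0, where the loss reaches zero
```

This loop is exactly what "loss drives backpropagation" means in miniature: the gradient of the loss tells each weight which direction to move, and the loss shrinking toward zero is the learning.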
Mini Practice 🤔
If you are predicting medical diagnoses, which is more dangerous:
• Being slightly wrong many times
• Being extremely wrong once
Think how loss functions influence this behavior.
Exercises
Exercise 1:
What is the main purpose of a loss function?
Exercise 2:
Why does MSE punish large errors more?
Exercise 3:
Which loss function is best for binary classification?
Quick Quiz ⚡
Q1. What happens if loss is zero?
Q2. Does changing loss function change learning behavior?
In the next lesson, we will explore gradient descent variants and see how loss values are minimized efficiently during training.