Loss Functions
In the previous lesson, we learned how neural networks make predictions using forward propagation and improve themselves using backpropagation.
But there is a critical question we have not answered yet:
How does the network know whether its prediction is good or bad?
The answer lies in loss functions.
What Is a Loss Function? 🎯
A loss function is a mathematical formula that measures how far the model’s prediction is from the actual correct answer.
It converts prediction quality into a single number.
👉 Smaller loss = better model
👉 Larger loss = worse model
Neural networks do not understand words like “good” or “bad”. They only understand numbers — and loss functions provide exactly that.
Real-World Intuition
Imagine you are throwing darts at a dartboard 🎯.
The distance between where the dart lands and the center of the board is the loss.
Your goal is not to throw once perfectly, but to keep adjusting your throws so the distance gets smaller every time.
That adjustment process is learning.
Why Loss Functions Matter So Much
Loss functions directly control how the network learns.
If the loss function is poorly chosen, even a powerful neural network will learn the wrong thing.
Different problems require different loss functions. A classifier and a price predictor should not be punished in the same way.
Common Types of Loss Functions
Let’s understand the most important loss functions used in Deep Learning.
1️⃣ Mean Squared Error (MSE)
Mean Squared Error is commonly used for regression problems (where outputs are continuous numbers).
It averages the squared differences between actual and predicted values across all examples.
Loss = (1/n) × Σ (Actual - Predicted)²
Squaring ensures:
• Negative errors don’t cancel positive ones
• Larger mistakes are punished more
📌 Example: House price prediction. Being off by $50,000 hurts much more than being off by $5,000.
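The intuition above can be checked directly with a few lines of NumPy. The house prices below are made-up numbers, just for illustration:

```python
import numpy as np

# Hypothetical house prices in dollars (illustrative values only)
actual = np.array([300_000.0, 450_000.0, 200_000.0])
predicted = np.array([305_000.0, 400_000.0, 205_000.0])

# Mean Squared Error: average of the squared differences
mse = np.mean((actual - predicted) ** 2)
```

Notice how the single $50,000 miss dominates the result: its squared error is 100 times larger than each $5,000 miss.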
2️⃣ Mean Absolute Error (MAE)
MAE measures the average absolute difference between predicted and actual values.
Loss = (1/n) × Σ |Actual - Predicted|
Unlike MSE, MAE treats all errors equally.
This makes MAE more robust to outliers, but less sensitive to large mistakes.
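A small comparison makes the outlier effect concrete. In this sketch (with invented numbers), the model is perfect on three points and badly wrong on one:

```python
import numpy as np

actual = np.array([10.0, 12.0, 11.0, 100.0])    # the last value is an outlier
predicted = np.array([10.0, 12.0, 11.0, 10.0])  # model misses only the outlier

mse = np.mean((actual - predicted) ** 2)   # squared error blows up: 90^2 / 4
mae = np.mean(np.abs(actual - predicted))  # absolute error stays modest: 90 / 4
```

The single 90-unit miss produces an MSE of 2025 but an MAE of only 22.5, which is why MAE is often preferred when outliers should not dominate training.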
3️⃣ Binary Cross-Entropy
Binary Cross-Entropy is used for binary classification problems — yes/no, true/false, spam/not spam.
It measures how much probability the model assigns to the correct class.
Loss = -[y*log(p) + (1-y)*log(1-p)]
If the model is confidently wrong, the loss becomes very large.
This strongly pushes the network to correct its mistakes.
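A minimal implementation shows this asymmetry. The function below is a straightforward translation of the formula above (the clipping is a common numerical safeguard, not part of the math itself):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # Clip p away from exactly 0 and 1 to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# True label is 1 (e.g. "spam")
confident_right = binary_cross_entropy(1, 0.9)  # small loss, about 0.105
confident_wrong = binary_cross_entropy(1, 0.1)  # about 2.303, over 20x larger
```

Predicting 0.9 for the true class costs almost nothing, while predicting 0.1 costs more than twenty times as much, which is exactly the pressure that pushes the network to fix confident mistakes first.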
4️⃣ Categorical Cross-Entropy
When there are more than two classes (such as image classification), we use categorical cross-entropy.
It compares the predicted probability distribution with the true distribution.
This loss function powers most modern image and language models.
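For a one-hot true label, categorical cross-entropy reduces to the negative log of the probability assigned to the correct class. A sketch with a hypothetical three-class image example:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot, so only the true class's predicted probability
    # contributes to the sum
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

y_true = np.array([0.0, 1.0, 0.0])     # true class: "dog" (made-up labels)
y_pred = np.array([0.2, 0.7, 0.1])     # model's predicted distribution
loss = categorical_cross_entropy(y_true, y_pred)  # -log(0.7), about 0.357
```

If the model had put 0.99 on "dog" instead of 0.7, the loss would shrink toward zero; putting 0.01 there would make it explode, just as in the binary case.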
Loss Functions in Real Training Code 🧠
In practice, loss functions are provided by deep learning libraries.
```python
# Example (conceptual, Keras-style API)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
```
Behind the scenes, this loss value drives backpropagation and weight updates.
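To see how a loss value drives weight updates, here is a stripped-down sketch of gradient descent on a single weight with a squared-error loss (a toy model, not how a real framework is implemented):

```python
# Toy model: predict y = w * x, with one data point (x=2, true y=6)
x, y, w = 2.0, 6.0, 0.0
lr = 0.1  # learning rate

for _ in range(50):
    pred = w * x
    loss = (y - pred) ** 2       # squared-error loss for this point
    grad = -2 * (y - pred) * x   # derivative of the loss w.r.t. w
    w -= lr * grad               # weight update driven by the loss

# w converges toward 3.0, where the loss reaches zero
```

This loop is exactly what "loss drives backpropagation" means in miniature: the gradient of the loss tells each weight which direction to move, and the loss shrinking toward zero is the learning.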
Mini Practice 🤔
If you are predicting medical diagnoses, which is more dangerous:
• Being slightly wrong many times
• Being extremely wrong once
Think how loss functions influence this behavior.
Exercises
Exercise 1:
What is the main purpose of a loss function?
Exercise 2:
Why does MSE punish large errors more?
Exercise 3:
Which loss function is best for binary classification?
Quick Quiz ⚡
Q1. What happens if loss is zero?
Q2. Does changing loss function change learning behavior?
In the next lesson, we will explore gradient descent variants and see how loss values are minimized efficiently during training.