Optimizers
In the previous lesson, we learned about loss functions and how they measure how wrong a model’s predictions are.
Now we answer an important question: how does the model actually reduce this loss?
The answer lies in optimizers, which control how model parameters are updated during training.
What Is an Optimizer?
An optimizer is an algorithm that adjusts model weights to minimize the loss function.
After the loss is computed, the optimizer decides how much, and in which direction, each weight should change.
Without an optimizer, a model cannot learn.
How Optimizers Work (Intuition)
Imagine standing on a mountain and trying to reach the lowest point.
Each step you take should move you downhill.
The loss function describes the landscape, and the optimizer decides how to take each step.
Too big a step can overshoot. Too small a step can be very slow.
Learning Rate
The learning rate controls how large each weight update is.
A very high learning rate can make training unstable.
A very low learning rate can make training extremely slow.
Choosing the right learning rate is one of the most important decisions in training a model.
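To see why, here is a minimal sketch that minimizes the one-dimensional function f(w) = w*w, whose gradient is 2*w. The starting point and the three learning rates are purely illustrative choices meant to show overshooting, slow progress, and reasonable convergence.

# Minimal sketch: effect of the learning rate on minimizing f(w) = w**2
def gradient_descent(lr, steps=20, w=5.0):
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2 at the current point
        w = w - lr * grad     # the basic update rule: step against the gradient
    return w

for lr in (1.1, 0.01, 0.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")

With lr=1.1 the value diverges (overshooting), with lr=0.01 it barely moves (too slow), and with lr=0.1 it gets close to the minimum at zero.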
Gradient Descent
Gradient Descent is the most basic optimizer.
It computes the gradient of the loss with respect to each weight and moves weights in the opposite direction.
This works well, but because every update uses the entire dataset, it can be slow for large datasets.
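As a sketch of that idea, the snippet below runs batch gradient descent for a simple one-weight linear regression in NumPy. The synthetic data, learning rate, and number of steps are hypothetical choices; the key point is that each update uses the full dataset.

import numpy as np

# Hypothetical synthetic data: y is roughly 3 * x plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3 * X + rng.normal(scale=0.1, size=100)

w = 0.0                       # single weight to learn
learning_rate = 0.1
for _ in range(50):
    pred = w * X
    grad = np.mean(2 * (pred - y) * X)   # gradient of the MSE over the FULL dataset
    w -= learning_rate * grad            # move against the gradient
print(w)                                 # should approach 3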
Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent updates weights using a single example, or a small mini-batch, at a time rather than the full dataset.
This makes each update much cheaper, and the noise in the updates can help the model escape shallow local minima.
SGD is widely used in large-scale machine learning.
from sklearn.linear_model import SGDClassifier

# Linear classifier trained with stochastic gradient descent
model = SGDClassifier(
    loss="log_loss",          # logistic-regression loss, enables probability outputs
    learning_rate="optimal"   # scikit-learn's built-in learning-rate schedule
)
model.fit(X_train, y_train)   # X_train, y_train come from your prepared dataset
Adam Optimizer
Adam is one of the most popular optimizers used in deep learning.
It adapts the learning rate for each parameter automatically, using running averages of past gradients and their squares.
This often makes Adam converge faster and more stably than basic gradient descent.
Because of its reliability, Adam is often the default choice.
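Scikit-learn's linear SGDClassifier does not expose Adam, but its neural-network classifier does, so the sketch below uses MLPClassifier as an illustration. The parameter values shown are simply its defaults, and X_train and y_train are assumed to exist as in the earlier example.

from sklearn.neural_network import MLPClassifier

# Small neural network trained with the Adam optimizer
model = MLPClassifier(
    solver="adam",             # Adam is also MLPClassifier's default solver
    learning_rate_init=0.001,  # initial step size; Adam then adapts it per parameter
    max_iter=200,
)
model.fit(X_train, y_train)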
Optimizers and Our Dataset
On the Dataplexa ML dataset, different optimizers may converge at different speeds.
Some may reach good performance quickly, while others take more iterations.
Choosing the right optimizer improves both training time and final accuracy.
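One way to sketch such a comparison is shown below: the same network is trained with plain SGD and with Adam, and the number of iterations each needed is reported. Since the Dataplexa loading code is not shown in this lesson, X_train and y_train are assumed to already hold its features and labels.

from sklearn.neural_network import MLPClassifier

# Compare how quickly two optimizers converge on the same data
for solver in ("sgd", "adam"):
    model = MLPClassifier(solver=solver, max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    print(solver, "iterations:", model.n_iter_, "final loss:", round(model.loss_, 4))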
When to Change Optimizers
Optimizers are not one-size-fits-all.
If training is unstable, the optimizer or learning rate may need adjustment.
Monitoring loss curves helps decide when changes are required.
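One simple way to do this, sketched below, is to plot the loss_curve_ attribute that MLPClassifier records during training; matplotlib and a fitted model from the previous examples are assumed.

import matplotlib.pyplot as plt

# Inspect how the training loss evolved across iterations
plt.plot(model.loss_curve_)      # loss recorded at each iteration by MLPClassifier
plt.xlabel("Iteration")
plt.ylabel("Training loss")
plt.show()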
Mini Practice
Train the same model using different learning rates and observe how fast loss decreases.
This experiment builds intuition about optimizer behavior.
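A possible starting point for the experiment is sketched below: it fixes the learning-rate schedule to a constant value, tries a few hypothetical rates, and compares the resulting training loss. X_train and y_train are assumed as before.

from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss

for lr in (0.0001, 0.001, 0.01, 0.1):     # hypothetical learning rates to compare
    model = SGDClassifier(loss="log_loss", learning_rate="constant",
                          eta0=lr, max_iter=1000, random_state=0)
    model.fit(X_train, y_train)
    loss = log_loss(y_train, model.predict_proba(X_train))
    print(f"learning rate {lr}: training log loss {loss:.4f}")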
Exercises
Exercise 1:
Why is the learning rate important?
Exercise 2:
Why is Adam popular in deep learning?
Quick Quiz
Q1. Can a bad optimizer prevent a model from learning?
In the next lesson, we explore Transfer Learning, which allows models to reuse knowledge from previously trained systems.