Gradient Descent
In the previous lesson, we explored how hashing is applied in real-world systems such as authentication, caching, and databases.
Now we turn to one of the most important optimization algorithms: Gradient Descent.
Gradient Descent is the backbone of modern Machine Learning, Deep Learning, and many optimization problems in algorithms.
What Is Gradient Descent?
Gradient Descent is an algorithm used to minimize a function.
In simple terms, it helps us find the lowest point of a curve or surface.
That lowest point usually represents:
- Minimum error
- Lowest cost
- Best solution
Intuition: Walking Down a Hill
Imagine you are standing on a mountain at night and want to reach the lowest point in the valley.
You cannot see the entire path.
So you take small steps in the direction where the slope goes down.
That is exactly how Gradient Descent works.
The Mathematical Idea (Simple)
Every function has a slope.
The slope tells us:
- How steep the curve is
- Which direction to move
Gradient Descent repeatedly moves opposite to the slope until the function value stops decreasing.
Basic Gradient Descent Formula
The update rule is:
new_value = old_value - learning_rate * gradient
Each part has a meaning:
- Gradient → direction of steepest increase
- Learning rate → step size
- Minus sign → move downhill
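The update rule above can be sketched as a tiny Python function. The names `theta`, `grad_fn`, and `lr` are illustrative choices for this example:

```python
def gradient_descent_step(theta, grad_fn, lr):
    """One update: move opposite to the gradient, scaled by the learning rate."""
    return theta - lr * grad_fn(theta)

# One step on f(x) = x**2, whose gradient is 2*x
theta = 4.0
theta = gradient_descent_step(theta, lambda x: 2 * x, lr=0.1)
print(theta)  # 4.0 - 0.1 * 8.0, i.e. about 3.2
```

Note that the function does not need to know anything about f itself, only its gradient; this is why the same update rule works for so many different problems.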
Why Learning Rate Matters
The learning rate controls how big each step is.
If it is too small, learning becomes very slow.
If it is too large, the algorithm may overshoot and never converge.
# Too small
learning_rate = 0.0001
# Reasonable
learning_rate = 0.01
# Too large
learning_rate = 1.0
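To see these effects concretely, here is a small sketch that runs the same update with each of the three rates on f(x) = x², whose gradient is 2x (the helper name `run` and the step count are chosen just for this demonstration):

```python
def run(lr, steps=1000, x=10.0):
    """Run gradient descent on f(x) = x**2 starting from x = 10."""
    for _ in range(steps):
        x = x - lr * (2 * x)  # gradient of x**2 is 2*x
    return x

print(run(0.0001))  # too small: after 1000 steps, x is still far from 0
print(run(0.01))    # reasonable: x is essentially 0
print(run(1.0))     # too large: each step maps x to -x, so it oscillates forever
```

With lr = 1.0 the update becomes x - 2x = -x, so the iterate bounces between +10 and -10 and never converges, which is exactly the overshooting problem described above.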
Simple Example: Minimizing a Function
Consider this function:
f(x) = x²
Its minimum value occurs at x = 0.
Let us apply Gradient Descent.
x = 10
learning_rate = 0.1
for i in range(10):
    gradient = 2 * x  # derivative of x**2 is 2x
    x = x - learning_rate * gradient
    print(x)
With each iteration, x moves closer to zero.
Where Gradient Descent Is Used
Gradient Descent is used everywhere:
- Linear Regression
- Logistic Regression
- Neural Networks
- Deep Learning
Without Gradient Descent, modern AI systems would not exist.
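As a small illustration of the first item on that list, here is a minimal sketch of gradient descent fitting a one-parameter linear model y = w * x to toy data. The data, variable names, and hyperparameters are made up for this example:

```python
# Toy data generated from y = 3 * x, so the best w is 3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0    # initial guess
lr = 0.01  # learning rate
for _ in range(500):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w = w - lr * grad
print(w)  # approaches 3.0
```

Real libraries use the same idea with many parameters at once; the gradient simply becomes a vector with one entry per parameter.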
Real-World Example
Think of training a recommendation system.
The system makes predictions, calculates error, and Gradient Descent adjusts parameters to reduce that error step by step.
This process repeats millions of times.
Common Problems with Gradient Descent
Gradient Descent is powerful but not perfect.
- Can get stuck in local minima
- Sensitive to learning rate
- Slow for large datasets
Later lessons will improve on this using advanced variants.
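The local-minimum problem can be demonstrated with a quick sketch. The non-convex function f(x) = x⁴ − 3x² + x is chosen only for this example; the same algorithm reaches different answers from different starting points:

```python
def minimize(x, lr=0.01, steps=2000):
    """Gradient descent on f(x) = x**4 - 3*x**2 + x."""
    for _ in range(steps):
        grad = 4 * x**3 - 6 * x + 1  # derivative of f
        x = x - lr * grad
    return x

print(minimize(-2.0))  # settles near the global minimum, around x = -1.30
print(minimize(2.0))   # stuck in a local minimum, around x = 1.13
```

Plain gradient descent only follows the slope beneath its feet, so whichever valley the starting point rolls into is where it stays.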
Exercises
Exercise 1:
What happens if the learning rate is too large?
Exercise 2:
Why do we subtract the gradient?
Exercise 3:
What does Gradient Descent try to minimize?
Quick Quiz
Q1. What does the gradient represent?
Q2. Why is Gradient Descent important?
In the next lesson, we will extend this idea and learn Stochastic and Mini-Batch Gradient Descent, which solve scalability issues.