Epochs, Batch Size, and Learning Rate
In the previous lesson, we learned how hyperparameters influence the behavior of a deep learning model.
In this lesson, we focus on three hyperparameters that directly control how training progresses over time: epochs, batch size, and learning rate.
Understanding the relationship between these three is essential for building stable and efficient deep learning models.
What Is an Epoch?
An epoch is one complete pass of the entire training dataset through the neural network.
If your dataset contains 10,000 samples, one epoch means the model has seen all 10,000 samples once.
Training usually requires multiple epochs because the model cannot learn all patterns in a single pass.
However, too many epochs can cause the model to memorize the data instead of learning general patterns.
What Is Batch Size?
Batch size determines how many samples are processed before the model updates its weights.
If batch size is 32, the model updates its weights after seeing every 32 samples.
Smaller batch sizes introduce more noise but often improve generalization. Larger batch sizes are more stable but may lead to poorer generalization.
This is why batch size selection is both a performance and a generalization decision.
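To make the trade-off concrete, the batch size also fixes how many weight updates happen per epoch. The sample count below is illustrative (echoing the earlier 10,000-sample example), not taken from the dataset:

```python
import math

# Illustrative numbers: 10,000 samples, as in the epoch example above.
n_samples = 10_000

for batch_size in (32, 128):
    # One weight update per batch; the last batch may be smaller.
    updates_per_epoch = math.ceil(n_samples / batch_size)
    print(batch_size, updates_per_epoch)  # 32 -> 313, 128 -> 79
```

Quadrupling the batch size cuts the number of updates per epoch by roughly a factor of four, which is one reason larger batches often need more epochs or a higher learning rate.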
What Is Learning Rate?
Learning rate controls how large each update step is when adjusting weights.
A high learning rate takes large steps toward the minimum, which can overshoot it and make the loss unstable.
A low learning rate makes very small updates, which can slow training significantly.
Choosing a good learning rate is often the single most impactful hyperparameter decision, frequently more important than the choice of architecture itself.
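The effect of step size can be seen without any neural network at all. The sketch below runs plain gradient descent on the toy function f(w) = w², whose gradient is 2w:

```python
# Minimal gradient-descent sketch on f(w) = w**2 (gradient: 2*w),
# showing how the learning rate scales each update.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # update rule: w_new = w - lr * df/dw
    return w

print(descend(0.01))  # small steps: still far from the minimum at 0
print(descend(0.4))   # moderate steps: converges quickly toward 0
print(descend(1.1))   # too large: each step overshoots, |w| grows, training diverges
```

The same three regimes appear when training real networks, just in many dimensions at once.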
How These Three Work Together
Epochs determine how long the model trains. Batch size determines how often updates happen. Learning rate determines how big each update is.
Changing one often requires adjusting the others. For example, increasing batch size often allows a slightly higher learning rate.
This interaction is why deep learning training is more about balance than fixed rules.
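One widely used heuristic for this interaction is the linear scaling rule: when the batch size grows by some factor, grow the learning rate by the same factor. Treat it as a starting point to validate empirically, not a guarantee:

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    # Linear scaling rule: learning rate grows in proportion to batch size.
    return base_lr * (new_batch_size / base_batch_size)

# Moving from batch size 32 to 128 suggests scaling the learning rate 4x.
print(scaled_lr(0.001, 32, 128))  # 0.004
```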
Using Our Dataset
We will continue using the same dataset throughout this deep learning module.
Download link (if you have not already):
⬇ Download Deep Learning Practice Dataset
Loading Data for Training
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv("dataplexa_deep_learning_master_dataset.csv")
X = df.drop("target", axis=1)
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
At this stage, our focus is not on perfect preprocessing, but on understanding training behavior.
Training With Different Epoch Values
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
Dense(1, activation="sigmoid")
])
model.compile(
optimizer="adam",
loss="binary_crossentropy",
metrics=["accuracy"]
)
history = model.fit(
X_train,
y_train,
epochs=10,
batch_size=32,
validation_split=0.2
)
If training accuracy keeps increasing while validation accuracy decreases, it is a sign of overfitting.
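That pattern can be detected programmatically from the history object returned by fit. The accuracy values below are made up for illustration, not real training output:

```python
# Sketch of an overfitting check on a Keras-style history dict.
# These numbers are fabricated for illustration only.
history_dict = {
    "accuracy":     [0.70, 0.80, 0.88, 0.93, 0.97],
    "val_accuracy": [0.68, 0.75, 0.78, 0.77, 0.74],
}

def overfit_epoch(hist):
    """Return the first epoch index (0-based) where validation accuracy
    drops while training accuracy still rises, else None."""
    acc, val = hist["accuracy"], hist["val_accuracy"]
    for i in range(1, len(acc)):
        if acc[i] > acc[i - 1] and val[i] < val[i - 1]:
            return i
    return None

print(overfit_epoch(history_dict))  # 3
```

With a real run, you would pass history.history to the same function.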
Effect of Batch Size
Let us change only the batch size and observe the behavior. Because model.fit continues training from the current weights, we first re-create and recompile the model so that both runs start from the same point.
model = Sequential([
Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
Dense(1, activation="sigmoid")
])
model.compile(
optimizer="adam",
loss="binary_crossentropy",
metrics=["accuracy"]
)
history = model.fit(
X_train,
y_train,
epochs=10,
batch_size=128,
validation_split=0.2
)
Larger batch sizes produce smoother learning curves but may converge to less optimal solutions.
Effect of Learning Rate
Now we set the learning rate explicitly by passing an optimizer object instead of the "adam" string, then train again.
from tensorflow.keras.optimizers import Adam
optimizer = Adam(learning_rate=0.01)
model.compile(
optimizer=optimizer,
loss="binary_crossentropy",
metrics=["accuracy"]
)
history = model.fit(
X_train,
y_train,
epochs=10,
batch_size=32,
validation_split=0.2
)
Note that 0.01 is ten times Adam's default of 0.001. If the loss becomes unstable or oscillates between epochs, the learning rate is likely too high.
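What "oscillates" means can be made concrete with a small heuristic that counts direction changes in the loss curve. The loss values below are made up for illustration, not real training logs:

```python
# Fabricated loss curves for illustration only.
stable   = [0.9, 0.7, 0.55, 0.45, 0.40, 0.37]
unstable = [0.9, 0.5, 1.3, 0.6, 1.8, 0.9]

def oscillates(losses):
    """Count sign flips in step-to-step loss differences; frequent
    flips suggest the learning rate is too high."""
    diffs = [b - a for a, b in zip(losses, losses[1:])]
    flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return flips > len(diffs) // 2

print(oscillates(stable))    # False
print(oscillates(unstable))  # True
```

A few upward blips are normal, especially with small batches; it is sustained back-and-forth movement of the loss that points to an oversized learning rate.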
Real-World Insight
In industry, engineers rarely train once and accept the result.
They experiment with epochs, batch size, and learning rate together until the model trains smoothly and generalizes well.
This experimentation process is a core deep learning skill.
Mini Practice
If your model trains very fast but produces poor validation accuracy, which hyperparameter would you investigate first?
Exercises
Exercise 1:
What happens if we train for too many epochs?
Exercise 2:
Why do smaller batch sizes sometimes generalize better?
Quick Quiz
Q1. Does increasing batch size reduce the number of updates?
Q2. Which hyperparameter directly controls step size?
In the next lesson, we will introduce early stopping and learn how to automatically prevent overtraining using validation signals.