DL Lesson 20 – Epochs, Batches & LR | Dataplexa

Epochs, Batch Size, and Learning Rate

In the previous lesson, we learned how hyperparameters influence the behavior of a deep learning model.

In this lesson, we focus on three hyperparameters that directly control how training progresses over time: epochs, batch size, and learning rate.

Understanding the relationship between these three is essential for building stable and efficient deep learning models.


What Is an Epoch?

An epoch means one complete pass of the entire dataset through the neural network.

If your dataset contains 10,000 samples, one epoch means the model has seen all 10,000 samples once.

Training usually requires multiple epochs because the model cannot learn all patterns in a single pass.

However, too many epochs can cause the model to memorize the data instead of learning general patterns.


What Is Batch Size?

Batch size determines how many samples are processed before the model updates its weights.

If batch size is 32, the model updates its weights after seeing every 32 samples.

Smaller batch sizes introduce more noise but often improve generalization. Larger batch sizes are more stable but may lead to poorer generalization.

This is why batch size selection is both a performance and a generalization decision.
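To make this concrete, here is a small plain-Python sketch (the 10,000-sample figure is the one from the epoch example above) showing how batch size sets the number of weight updates per epoch:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n = 10_000  # dataset size from the epoch example

for bs in (32, 128, 512):
    print(bs, updates_per_epoch(n, bs))
# batch_size=32  -> 313 updates per epoch
# batch_size=128 -> 79 updates per epoch
# batch_size=512 -> 20 updates per epoch
```

Quadrupling the batch size cuts the number of updates per epoch to roughly a quarter, which is exactly the trade-off described above: fewer, smoother updates versus more, noisier ones.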


What Is Learning Rate?

Learning rate controls how large each update step is when adjusting weights.

A high learning rate makes large jumps toward the minimum, which can cause instability.

A low learning rate makes very small updates, which can slow training significantly.

Choosing a suitable learning rate often matters more than the exact choice of network architecture.
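You can see both failure modes on a toy problem. The sketch below (plain Python, not Keras) runs gradient descent on f(w) = w², whose gradient is 2w; for this function any step size above 1.0 makes the iterates grow instead of shrink:

```python
def gradient_descent(lr, w=1.0, steps=20):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        w = w - lr * 2 * w  # standard gradient descent update
    return w

print(abs(gradient_descent(lr=0.1)))  # small steps: w shrinks toward 0
print(abs(gradient_descent(lr=1.5)))  # too large: w overshoots and blows up
```

With lr=0.1 each step multiplies w by 0.8, so it decays toward the minimum; with lr=1.5 each step multiplies w by -2, so the updates oscillate and diverge. This is the same instability you will later see as an oscillating loss curve.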


How These Three Work Together

Epochs determine how long the model trains. Batch size determines how often updates happen. Learning rate determines how big each update is.

Changing one often requires adjusting the others. For example, increasing batch size often allows a slightly higher learning rate.
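One common heuristic for this interaction is the linear scaling rule: when you multiply the batch size by some factor, multiply the learning rate by roughly the same factor. It is a rule of thumb for a starting point, not a guarantee. A minimal sketch:

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: grow the learning rate in proportion to batch size."""
    return base_lr * new_batch / base_batch

# Starting from Adam's default of 0.001 at batch size 32,
# a batch size of 128 suggests trying a learning rate around 0.004.
print(scaled_lr(0.001, 32, 128))
```

Treat the scaled value as the first candidate to try, then adjust based on how the loss curve behaves.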

This interaction is why deep learning training is more about balance than fixed rules.


Using Our Dataset

We will continue using the same dataset throughout this deep learning module.

Download link (if you have not already):

⬇ Download Deep Learning Practice Dataset


Loading Data for Training

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the practice dataset
df = pd.read_csv("dataplexa_deep_learning_master_dataset.csv")

# Separate the features from the binary target column
X = df.drop("target", axis=1)
y = df["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

At this stage, our focus is not on perfect preprocessing, but on understanding training behavior.
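As a quick sanity check on the split above, `test_size=0.2` sends roughly 20% of the rows to the test set. The plain-Python sketch below approximates the arithmetic (the 10,000-row count is illustrative, not the real dataset's size):

```python
def split_sizes(n_rows, test_size=0.2):
    """Approximate row counts produced by a train/test split."""
    n_test = round(n_rows * test_size)
    return n_rows - n_test, n_test

print(split_sizes(10_000))  # (8000, 2000): 80% train, 20% test
```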


Training With Different Epoch Values

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A small binary classifier: one hidden layer, sigmoid output
model = Sequential([
    Dense(64, activation="relu", input_shape=(X_train.shape[1],)),
    Dense(1, activation="sigmoid")
])

model.compile(
    optimizer="adam",            # Adam with its default learning rate (0.001)
    loss="binary_crossentropy",  # standard loss for binary targets
    metrics=["accuracy"]
)

history = model.fit(
    X_train,
    y_train,
    epochs=10,            # ten full passes over the training data
    batch_size=32,        # weights update after every 32 samples
    validation_split=0.2  # hold out 20% of the training data for validation
)

If training accuracy keeps increasing while validation accuracy decreases, it is a sign of overfitting.
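You can check for this pattern programmatically: `history.history` maps metric names to per-epoch lists. The helper below is a sketch; the window size and the example curves are illustrative, not from a real run:

```python
def looks_overfit(train_acc, val_acc, window=3):
    """Flag runs where training accuracy rose while validation
    accuracy fell over the last `window` epochs."""
    t, v = train_acc[-window:], val_acc[-window:]
    return t[-1] > t[0] and v[-1] < v[0]

# Illustrative accuracy curves, not real training output
train_acc = [0.70, 0.78, 0.84, 0.90, 0.95]
val_acc   = [0.68, 0.74, 0.76, 0.73, 0.70]
print(looks_overfit(train_acc, val_acc))  # True: training up, validation down
```

With a real run you would pass `history.history["accuracy"]` and `history.history["val_accuracy"]` instead of the hand-written lists.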


Effect of Batch Size

Let us change only the batch size and observe the behavior. Keep in mind that calling model.fit again continues from the current weights, so for a fair comparison you should rebuild and recompile the model before this run.

history = model.fit(
    X_train,
    y_train,
    epochs=10,
    batch_size=128,  # fewer, smoother updates per epoch than batch_size=32
    validation_split=0.2
)

Larger batch sizes produce smoother learning curves but may converge to less optimal solutions.


Effect of Learning Rate

Now we set the learning rate explicitly instead of relying on Adam's default of 0.001, then train again so we can observe the effect.

from tensorflow.keras.optimizers import Adam

# Ten times larger than Adam's default learning rate of 0.001
optimizer = Adam(learning_rate=0.01)

model.compile(
    optimizer=optimizer,
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

history = model.fit(
    X_train,
    y_train,
    epochs=10,
    batch_size=32,
    validation_split=0.2
)

If the loss becomes unstable or oscillates, the learning rate is likely too high.
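One rough way to spot that oscillation in code is to count up/down reversals in the epoch-to-epoch loss differences. This is a heuristic sketch, and the loss values below are made up for illustration:

```python
def loss_oscillates(losses, max_sign_flips=2):
    """Heuristic: many up/down reversals in the loss curve
    suggest an unstable (too-high) learning rate."""
    diffs = [b - a for a, b in zip(losses, losses[1:])]
    flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return flips > max_sign_flips

smooth   = [0.90, 0.70, 0.55, 0.45, 0.40, 0.37]  # steadily decreasing
unstable = [0.90, 0.50, 0.95, 0.40, 1.10, 0.35]  # bouncing up and down

print(loss_oscillates(smooth))    # False
print(loss_oscillates(unstable))  # True
```

With a real run you would pass `history.history["loss"]`; if the function returns True, lowering the learning rate is usually the first fix to try.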


Real-World Insight

In industry, engineers rarely train once and accept the result.

They experiment with epochs, batch size, and learning rate together until the model trains smoothly and generalizes well.

This experimentation process is a core deep learning skill.


Mini Practice

If your model trains very fast but produces poor validation accuracy, which hyperparameter would you investigate first?


Exercises

Exercise 1:
What happens if we train for too many epochs?

The model may overfit and lose generalization ability.

Exercise 2:
Why do smaller batch sizes sometimes generalize better?

Because noisy updates help the model avoid sharp minima.

Quick Quiz

Q1. Does increasing batch size reduce the number of updates?

Yes. Larger batches mean fewer weight updates per epoch.

Q2. Which hyperparameter directly controls step size?

Learning rate.

In the next lesson, we will introduce early stopping and learn how to automatically prevent overtraining using validation signals.