ML Lesson 31 – Hyperparameter Tuning | Dataplexa

Hyperparameter Tuning

In the previous lesson, we learned how to engineer powerful features. We reshaped raw data into meaningful signals that machine learning models can understand more easily.

Now we take the next step. Even with good features, models do not perform their best by default. They need careful configuration. This is where Hyperparameter Tuning comes in.


What Are Hyperparameters?

Every machine learning model has settings that control how it learns. These settings are called hyperparameters.

Hyperparameters are not learned from data. They are chosen before training begins. Examples include the learning rate, the number of trees in an ensemble, the maximum tree depth, and the regularization strength.

Think of hyperparameters as the knobs on a machine. If they are set poorly, the model underperforms. If they are set well, performance improves significantly.
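As a quick sketch of what these knobs look like in practice: in scikit-learn, hyperparameters are passed as constructor arguments before the model ever sees any data, and get_params() lists every knob a model exposes.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters are set when the model is created, before training.
log_reg = LogisticRegression(C=1.0, max_iter=1000)  # regularization strength
forest = RandomForestClassifier(
    n_estimators=200,  # number of trees
    max_depth=5,       # maximum tree depth
)

# get_params() lists every hyperparameter a model exposes
print(sorted(log_reg.get_params().keys()))
print(sorted(forest.get_params().keys()))
```

Calling get_params() on any estimator is a useful way to discover which hyperparameters are available to tune.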


Why Hyperparameter Tuning Is Important

A model with default settings rarely gives optimal results.

Poor hyperparameter choices can cause underfitting, overfitting, or unstable predictions.

Hyperparameter tuning allows us to balance bias and variance and extract the best possible performance from our engineered features.


Hyperparameters vs Model Parameters

It is important to understand the difference.

Model parameters are learned from data during training. Examples include weights and coefficients.

Hyperparameters control how the learning process happens. They guide the training process itself.
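A minimal sketch of the distinction, using a small synthetic dataset for illustration: C is a hyperparameter we set before training, while the coefficients in coef_ are parameters the model learns from the data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression(C=0.5, max_iter=1000)  # C: a hyperparameter, set by us
model.fit(X, y)

print("Hyperparameter C:", model.C)           # fixed before training
print("Learned coefficients:", model.coef_)   # parameters fitted from the data
```

No matter how long we train, C never changes; the coefficients, by contrast, exist only after fit() has seen the data.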


Our Dataset and Model Context

We continue using the same dataset:

Dataplexa ML Housing & Customer Dataset

Our task remains loan approval prediction.

We will demonstrate hyperparameter tuning using a Logistic Regression model.


Training a Baseline Model

Before tuning, we first train a baseline model with default hyperparameters.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Baseline accuracy:", accuracy_score(y_test, y_pred))

This gives us a reference point. We now know how the model performs without tuning.


Tuning Regularization Strength

Logistic Regression uses regularization to prevent overfitting.

The hyperparameter C controls the strength of regularization. In scikit-learn, C is the inverse of the regularization strength: smaller values of C mean stronger regularization, and larger values mean weaker regularization.

model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Tuned accuracy:", accuracy_score(y_test, y_pred))

Changing this single hyperparameter can significantly impact model accuracy.


Real-World Perspective

In production systems, hyperparameter tuning is not optional.

Banks, e-commerce platforms, and healthcare systems tune models carefully to reduce errors and improve reliability.

Well-tuned simple models often outperform poorly tuned complex models.


Mini Practice

Change the value of C to 1, 10, and 50.

Observe how accuracy changes. Think about why stronger or weaker regularization affects performance.
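The practice above can be sketched as a simple loop. Since the Dataplexa CSV is not bundled here, a synthetic stand-in dataset is used for illustration; with the real dataset you would reuse the X_train, X_test, y_train, y_test splits from the baseline section.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the Dataplexa dataset (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train one model per candidate value of C and compare test accuracy
results = {}
for C in [0.1, 1, 10, 50]:
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)
    results[C] = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C:>5}: accuracy={results[C]:.3f}")
```

Trying each value by hand like this quickly becomes tedious, which is exactly the problem the next lesson's systematic search addresses.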


Exercises

Exercise 1:
Why are hyperparameters not learned from data?

Because they control the learning process itself and must be set before training.

Exercise 2:
Can hyperparameter tuning reduce overfitting?

Yes. Proper tuning balances model complexity and generalization.

Quick Quiz

Q1. Does higher model complexity always mean better performance?

No. Poor hyperparameter choices can degrade performance.

In the next lesson, we will learn Grid Search, a systematic way to find optimal hyperparameters.