Hyperparameter Tuning
In the previous lesson, we learned how to engineer powerful features. We reshaped raw data into meaningful signals that machine learning models can understand more easily.
Now we take the next step. Even with good features, models do not perform their best by default. They need careful configuration. This is where Hyperparameter Tuning comes in.
What Are Hyperparameters?
Every machine learning model has settings that control how it learns. These settings are called hyperparameters.
Hyperparameters are not learned from data. They are chosen before training begins. Examples include the learning rate, the number of trees, the maximum tree depth, and the regularization strength.
Think of hyperparameters as the knobs on a machine. If they are set poorly, the model underperforms. If they are set well, performance improves significantly.
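As a concrete sketch, hyperparameters are passed to a model's constructor before it ever sees any data. The values below are illustrative, not tuned recommendations:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters are fixed at construction time, before training begins.
# These values are illustrative, not recommendations.
log_reg = LogisticRegression(C=1.0, max_iter=1000)  # C: regularization strength
forest = RandomForestClassifier(
    n_estimators=100,  # number of trees in the forest
    max_depth=5,       # maximum depth of each tree
)
```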
Why Hyperparameter Tuning Is Important
A model with default settings rarely gives optimal results.
Poor hyperparameter choices can cause underfitting, overfitting, or unstable predictions.
Hyperparameter tuning allows us to balance bias and variance and extract the best possible performance from our engineered features.
Hyperparameters vs Model Parameters
It is important to understand the difference.
Model parameters are learned from data during training. Examples include weights and coefficients.
Hyperparameters control how the learning process happens. They guide the training process itself.
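To make the distinction concrete, here is a minimal hand-rolled sketch with toy data (not the lesson's dataset): the learning rate and number of epochs are hyperparameters we fix up front, while the weight `w` is a parameter the training loop learns.

```python
def fit(xs, ys, learning_rate=0.1, epochs=100):
    """learning_rate and epochs are hyperparameters: chosen before training.
    w is a model parameter: learned from the data during training."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of mean squared error for the model y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]   # toy data with true relationship y = 2x
w = fit(xs, ys)     # w converges toward 2.0
```

Notice that we never set `w` ourselves; the data determines it. But the learning rate had to be chosen by us, and a poor choice would make training slow or unstable.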
Our Dataset and Model Context
We continue using the same dataset:
Dataplexa ML Housing & Customer Dataset
Our task remains loan approval prediction.
We will demonstrate hyperparameter tuning using a Logistic Regression model.
Training a Baseline Model
Before tuning, we first train a baseline model with default hyperparameters.
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the dataset and separate features from the target column
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")
X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline model: default hyperparameters (max_iter raised so training converges)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```
This gives us a reference point. We now know how the model performs without tuning.
Tuning Regularization Strength
Logistic Regression uses regularization to prevent overfitting.
The hyperparameter C controls regularization strength. In scikit-learn, C is the inverse of the regularization strength: smaller values of C mean stronger regularization, and larger values mean weaker regularization.
```python
# Stronger regularization than the default (C=1.0)
model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```
Changing this single hyperparameter can significantly impact model accuracy.
Real-World Perspective
In production systems, hyperparameter tuning is not optional.
Banks, e-commerce platforms, and healthcare systems tune models carefully to reduce errors and improve reliability.
Well-tuned simple models often outperform poorly tuned complex models.
Mini Practice
Change the value of C to 1, 10, and 50.
Observe how accuracy changes. Think about why stronger or weaker regularization affects performance.
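The practice above can be sketched as a simple loop. Here `make_classification` stands in for the Dataplexa dataset, so the exact accuracies will differ from yours:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the real dataset (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Sweep C from strong regularization (0.1) to weak (50)
for C in [0.1, 1, 10, 50]:
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C:<4} accuracy={acc:.3f}")
```

Running the loop makes the trade-off visible: very small C can underfit by shrinking the coefficients too aggressively, while very large C can overfit by barely constraining them.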
Exercises
Exercise 1:
Why are hyperparameters not learned from data?
Exercise 2:
Can hyperparameter tuning reduce overfitting?
Quick Quiz
Q1. Does higher model complexity always mean better performance?
In the next lesson, we will learn Grid Search, a systematic way to find optimal hyperparameters.