Grid Search
In the previous lesson, we learned that hyperparameters control how a model learns and that choosing good values is critical for performance.
We manually changed one hyperparameter and observed how model accuracy changed. While this approach works for learning, it is not practical for real projects.
This lesson introduces Grid Search, a systematic and reliable method to find the best hyperparameter values.
What Is Grid Search?
Grid Search is an automated technique that exhaustively evaluates every combination of values from a predefined set of hyperparameter values.
Instead of guessing, we define a grid of values and let the algorithm evaluate each combination.
The combination that produces the best performance is selected as the optimal configuration.
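Conceptually, Grid Search is just the Cartesian product of the value lists. A minimal sketch (the grid values below are illustrative, not yet the lesson's grid) shows how the candidate configurations are enumerated:

```python
from itertools import product

# Illustrative grid: two hyperparameters with 3 and 1 candidate values
grid = {"C": [0.1, 1, 10], "penalty": ["l2"]}

# Grid Search evaluates the Cartesian product of all value lists
keys = list(grid)
combinations = [dict(zip(keys, values)) for values in product(*grid.values())]
print(len(combinations))  # 3 * 1 = 3 candidate configurations
```

Each dictionary in `combinations` is one model configuration that would be trained and scored.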
Why Grid Search Is Important
Real-world machine learning systems must be consistent and reproducible.
Grid Search removes personal bias from tuning and ensures that every candidate configuration is evaluated fairly.
This is especially important in regulated industries such as banking and healthcare.
Our Dataset Context
We continue working with the same dataset:
Dataplexa ML Housing & Customer Dataset
The task remains loan approval prediction.
We will again use Logistic Regression to demonstrate Grid Search clearly.
Preparing the Training Data
We split the dataset into training and testing sets, exactly as we have done before.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")
X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
Defining the Hyperparameter Grid
Before running Grid Search, we must define which hyperparameters to test and which values to try.
Here we tune the regularization strength C and the penalty type, and fix the solver to one that supports the chosen penalty.
from sklearn.model_selection import GridSearchCV
param_grid = {
    "C": [0.01, 0.1, 1, 10, 50],
    "penalty": ["l2"],
    "solver": ["liblinear"]
}
This grid represents all combinations of the specified values: five values of C, one penalty, and one solver, giving five candidate configurations in total.
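scikit-learn can enumerate these combinations for us with ParameterGrid, which is the same expansion GridSearchCV performs internally. A quick check of the lesson's grid:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "C": [0.01, 0.1, 1, 10, 50],
    "penalty": ["l2"],
    "solver": ["liblinear"]
}

# Expand the grid into its individual configurations
combos = list(ParameterGrid(param_grid))
print(len(combos))  # 5 * 1 * 1 = 5 combinations
```

Listing the combinations before running a search is a cheap way to verify the grid is the size you expect.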
Running Grid Search
We now run Grid Search with cross-validation.
Each hyperparameter combination is evaluated on several train/validation folds, which makes the comparison more stable than a single split would be.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy"
)
grid.fit(X_train, y_train)
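After fitting, the full record of the search lives in grid.cv_results_, with one entry per combination. The self-contained sketch below uses a synthetic dataset from make_classification as a stand-in, since the lesson's CSV is not bundled here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the lesson's dataset (an assumption for this sketch)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1], "penalty": ["l2"], "solver": ["liblinear"]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

# cv_results_ stores one row per combination: the parameters plus the
# mean cross-validated accuracy across the 5 folds
for params, mean in zip(grid.cv_results_["params"],
                        grid.cv_results_["mean_test_score"]):
    print(params, round(mean, 3))
```

Inspecting every row, not just the winner, shows how sensitive the model actually is to each hyperparameter.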
Best Hyperparameters Found
After training completes, Grid Search reveals the best configuration.
grid.best_params_
These values represent the most effective hyperparameter combination for our dataset and model.
Evaluating the Tuned Model
We now test the tuned model on unseen data.
best_model = grid.best_estimator_
best_model.score(X_test, y_test)
This score is often higher than the baseline model from Lesson 31, although tuning does not guarantee an improvement on every dataset.
Real-World Perspective
In production systems, Grid Search is often used during model development.
However, because it tests every combination, it can become computationally expensive when the grid is large.
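The cost grows multiplicatively: the number of model fits is the product of the value counts, times the number of cross-validation folds. A quick back-of-the-envelope calculation (the grid sizes here are hypothetical) makes this concrete:

```python
from math import prod

# Hypothetical larger grid: number of candidate values per hyperparameter
grid_sizes = {"C": 10, "penalty": 2, "solver": 3}
cv_folds = 5

combinations = prod(grid_sizes.values())  # 10 * 2 * 3 = 60 combinations
total_fits = combinations * cv_folds      # 60 * 5 = 300 model fits
print(combinations, total_fits)
```

Adding just one more hyperparameter with five values would multiply both numbers by five, which is why large grids quickly become impractical.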
This limitation leads us naturally to the next technique: Random Search.
Mini Practice
Add more values to the C parameter and observe how training time changes.
Notice that better performance comes at the cost of higher computation.
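One way to carry out this practice task is to time the search for grids of different sizes. The sketch below uses synthetic data (an assumption, since the lesson's CSV is not available here) and compares a 2-value grid against a 6-value grid:

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the lesson's dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Compare a small grid of C values against a larger one
for c_values in ([0.1, 1], [0.01, 0.1, 1, 10, 50, 100]):
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": c_values, "solver": ["liblinear"]},
        cv=5,
    )
    start = time.perf_counter()
    grid.fit(X, y)
    elapsed = time.perf_counter() - start
    print(len(c_values), "values of C ->", round(elapsed, 3), "seconds")
```

The larger grid fits three times as many models, so its wall-clock time grows roughly in proportion.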
Exercises
Exercise 1:
Why does Grid Search use cross-validation?
Exercise 2:
What is the main disadvantage of Grid Search?
Quick Quiz
Q1. Does Grid Search guarantee the best possible model?
In the next lesson, we will study Random Search, a more efficient alternative to Grid Search for large hyperparameter spaces.