ML Lesson 32 – Grid Search | Dataplexa

Grid Search

In the previous lesson, we learned that hyperparameters control how a model learns and that choosing good values is critical for performance.

We manually changed one hyperparameter and observed how model accuracy changed. While this approach works for learning, it is not practical for real projects.

This lesson introduces Grid Search, a systematic and reliable method to find the best hyperparameter values.


What Is Grid Search?

Grid Search is an automated technique that tries all possible combinations of a predefined set of hyperparameter values.

Instead of guessing, we define a grid of values and let the algorithm evaluate each combination.

The combination that produces the best performance is selected as the optimal configuration.
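The idea can be sketched in plain Python: enumerate every combination in a small grid and keep the best-scoring one. The `evaluate` function below is a made-up stand-in for "train a model and measure its validation accuracy", not a real scikit-learn call.

```python
from itertools import product

# A toy grid: two hyperparameters with a few candidate values each.
grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty": ["l1", "l2"],
}

def evaluate(params):
    # Hypothetical scoring function standing in for "train + validate a model".
    return 1.0 - abs(params["C"] - 1) * 0.01 - (0.02 if params["penalty"] == "l1" else 0.0)

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):          # every combination in the grid
    params = dict(zip(grid.keys(), values))
    score = evaluate(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params)
```

This is exactly what `GridSearchCV` automates for us, with cross-validation added on top.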


Why Grid Search Is Important

Real-world machine learning systems must be consistent and reproducible.

Grid Search removes personal bias from tuning and ensures that every candidate configuration is evaluated fairly.

This is especially important in regulated industries such as banking and healthcare.


Our Dataset Context

We continue working with the same dataset:

Dataplexa ML Housing & Customer Dataset

The task remains loan approval prediction.

We will again use Logistic Regression to demonstrate Grid Search clearly.


Preparing the Training Data

We split the dataset into training and testing sets, exactly as we have done before.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Defining the Hyperparameter Grid

Before running Grid Search, we must define which hyperparameters to test and which values to try.

Here we tune C (the inverse of regularization strength in scikit-learn, so smaller values mean stronger regularization) and the penalty type.

from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.01, 0.1, 1, 10, 50],
    "penalty": ["l2"],
    "solver": ["liblinear"]
}

This grid represents all combinations of the specified values. Since penalty and solver each have only one candidate, it yields five configurations in total.
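We can verify the combination count directly with the standard library, using the same grid as above:

```python
from itertools import product

param_grid = {
    "C": [0.01, 0.1, 1, 10, 50],
    "penalty": ["l2"],
    "solver": ["liblinear"],
}

# Grid Search evaluates the Cartesian product of all value lists.
n_combinations = len(list(product(*param_grid.values())))
print(n_combinations)  # 5 * 1 * 1 = 5
```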


Running Grid Search

We now run Grid Search with cross-validation.

Each hyperparameter combination is evaluated multiple times to ensure stability.

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy"
)

grid.fit(X_train, y_train)

Best Hyperparameters Found

After training completes, Grid Search reveals the best configuration.

print(grid.best_params_)

These values represent the most effective hyperparameter combination for our dataset and model.
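If you want to see more than just the winner, `cv_results_` holds the mean cross-validated score of every combination. Below is a minimal, self-contained sketch on synthetic data (the lesson itself uses the Dataplexa CSV):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the lesson's dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

param_grid = {"C": [0.01, 0.1, 1], "penalty": ["l2"], "solver": ["liblinear"]}

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

# One mean score per combination, in grid order.
for params, mean in zip(grid.cv_results_["params"],
                        grid.cv_results_["mean_test_score"]):
    print(params, round(mean, 3))

print(grid.best_params_)
```

Inspecting all scores, not only the best one, helps you see whether the winning configuration is a clear favorite or only marginally better than its neighbors.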


Evaluating the Tuned Model

We now test the tuned model on unseen data.

best_model = grid.best_estimator_

print(best_model.score(X_test, y_test))

This score is often higher than that of the untuned baseline model from Lesson 31, although tuning does not guarantee an improvement on every dataset.


Real-World Perspective

In production systems, Grid Search is often used during model development.

However, because it tests every combination, it can become computationally expensive when the grid is large.

This limitation leads us naturally to the next technique: Random Search.


Mini Practice

Add more values to the C parameter and observe how training time changes.

Notice that better performance comes at the cost of higher computation.
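One simple way to measure that cost is to wrap the search in a wall-clock timer. The timing helper below is generic Python, and `fake_grid_search` is a hypothetical workload standing in for a real `grid.fit` call:

```python
import time

def timed(fn):
    # Simple wall-clock timer around any callable.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def fake_grid_search(n_candidates):
    # Stand-in workload: pretend each candidate costs a fixed amount of work.
    total = 0
    for _ in range(n_candidates):
        total += sum(i * i for i in range(10_000))
    return total

_, t_small = timed(lambda: fake_grid_search(5))
_, t_large = timed(lambda: fake_grid_search(50))
print(t_small < t_large)  # ten times the candidates, noticeably more time
```

In practice, replace the fake workload with `timed(lambda: grid.fit(X_train, y_train))` to compare grids of different sizes.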


Exercises

Exercise 1:
Why does Grid Search use cross-validation?

To ensure that performance is stable and not dependent on a single data split.

Exercise 2:
What is the main disadvantage of Grid Search?

It becomes computationally expensive as the number of parameters grows.

Quick Quiz

Q1. Does Grid Search guarantee the best possible model?

It guarantees only the best model within the defined parameter grid; values you did not include in the grid will never be tried.

In the next lesson, we will study Random Search, a more efficient alternative to Grid Search for large hyperparameter spaces.