ML Lesson 33 – Random Search | Dataplexa

Random Search

In the previous lesson, we used Grid Search to find the best hyperparameters. Grid Search tested every possible combination inside a fixed grid.

While Grid Search is reliable, it becomes very slow when the number of hyperparameters increases.

In real-world machine learning projects, teams often work with large models and limited computing resources.

This is where Random Search becomes extremely useful.


What Is Random Search?

Random Search is a hyperparameter tuning technique that randomly selects combinations from a defined parameter space.

Instead of trying every possible combination, it samples a fixed number of random configurations.

Surprisingly, Random Search often finds models as good as, or better than, those found by Grid Search, in a fraction of the time.


Why Random Search Works Well

In many machine learning models, only a few hyperparameters have a large impact on performance.

Random Search explores the parameter space more efficiently by focusing on diverse combinations instead of exhaustively testing everything.

This makes it a preferred choice for complex models such as Random Forest, XGBoost, and Neural Networks.
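To make the efficiency argument concrete, here is a minimal, self-contained sketch comparing how many candidate combinations each approach evaluates. It uses a synthetic dataset from make_classification as a stand-in for the lesson's dataset (an assumption for illustration only):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data; the lesson's CSV is not needed for this comparison
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Grid Search evaluates every combination: 4 values of C x 2 penalties = 8
grid = GridSearchCV(
    LogisticRegression(max_iter=1000, solver="liblinear"),
    {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]},
    cv=3,
)
grid.fit(X, y)

# Random Search samples a fixed number of candidates from the space
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000, solver="liblinear"),
    {"C": loguniform(1e-3, 1e2), "penalty": ["l1", "l2"]},
    n_iter=5,  # only 5 candidates, no matter how wide the space is
    cv=3,
    random_state=42,
)
rand.fit(X, y)

print(len(grid.cv_results_["params"]))  # 8 candidates
print(len(rand.cv_results_["params"]))  # 5 candidates
```

The candidate count, and therefore the training time, of Random Search is fixed by n_iter, while Grid Search grows multiplicatively with every hyperparameter you add.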


Dataset Reminder

We continue using the same dataset:

Dataplexa ML Housing & Customer Dataset

The problem remains loan approval prediction.

This consistency helps you understand how tuning techniques affect the same data.


Preparing the Data

The data preparation steps remain unchanged. This is intentional.

A good ML workflow separates data preparation from tuning.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Defining the Parameter Distribution

Unlike Grid Search, Random Search can sample from continuous distributions and ranges as well as from fixed lists of values. Here we start with a simple discrete list.

This allows the algorithm to explore more values efficiently.

from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "C": [0.01, 0.1, 1, 10, 50, 100],
    "penalty": ["l2"],
    "solver": ["liblinear"]
}
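The dictionary above samples C from a fixed list. Random Search can also draw from a continuous distribution, which is where it truly differs from Grid Search. A sketch using scipy.stats.loguniform (scipy is installed alongside scikit-learn):

```python
from scipy.stats import loguniform

# Sample C from a log-uniform distribution over [0.01, 100] instead of a
# fixed list, so every draw can be a value Grid Search would never test.
param_dist_continuous = {
    "C": loguniform(0.01, 100),
    "penalty": ["l2"],
    "solver": ["liblinear"],
}

# Each .rvs() call draws one candidate value for C
print(param_dist_continuous["C"].rvs(random_state=42))
```

This dictionary can be passed to RandomizedSearchCV exactly like the list-based one.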

Running Random Search

We specify how many random combinations we want to test.

This gives us direct control over training time.

random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=6,  # param_dist above contains only 6 distinct combinations
    cv=5,
    scoring="accuracy",
    random_state=42
)

random_search.fit(X_train, y_train)

Best Parameters Found

After training completes, Random Search reports the best combination it discovered.

print(random_search.best_params_)

These parameters may differ from Grid Search results, but performance is often similar.
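If you want to see more than the single best combination, every sampled candidate and its cross-validated score is stored in cv_results_. A self-contained sketch, using synthetic make_classification data in place of the lesson's CSV:

```python
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(0.01, 100)},
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X, y)

# Every sampled candidate with its cross-validated score, best ranked first
results = pd.DataFrame(search.cv_results_)
print(results[["param_C", "mean_test_score", "rank_test_score"]]
      .sort_values("rank_test_score"))
```

Inspecting the full table shows how sensitive the score is to each sampled value, which helps you decide whether a wider search is worth the time.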


Evaluating the Tuned Model

We now evaluate the best model on unseen test data.

best_model = random_search.best_estimator_

print(best_model.score(X_test, y_test))

In practice, a comparable score is reached with far less computation than Grid Search would require.


Real-World Example

Imagine training a credit scoring model with millions of customers.

Trying every hyperparameter combination would be impractical.

Random Search allows teams to build strong models quickly and iterate faster.


Mini Practice

Increase n_iter, widening the parameter space (for example, by adding more values of C) so there are more combinations to sample, and observe how model accuracy changes.

Notice the balance between performance and training time.
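One way to run this experiment is to time the search for a few values of n_iter. A sketch on synthetic data (make_classification stands in for the lesson's CSV):

```python
import time
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

scores = {}
for n_iter in [5, 20, 50]:
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": loguniform(0.001, 100)},
        n_iter=n_iter,  # more samples -> more fits -> more training time
        cv=3,
        random_state=42,
    )
    start = time.perf_counter()
    search.fit(X, y)
    elapsed = time.perf_counter() - start
    scores[n_iter] = search.best_score_
    print(f"n_iter={n_iter:2d}  best_cv_score={search.best_score_:.4f}  "
          f"time={elapsed:.2f}s")
```

Typically the score improves quickly at first and then flattens, while the time keeps growing linearly with n_iter.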


Exercises

Exercise 1:
Why is Random Search faster than Grid Search?

Because it evaluates only a limited number of randomly chosen parameter combinations.

Exercise 2:
When would you prefer Grid Search instead?

When the parameter space is small and exhaustive evaluation is affordable.

Quick Quiz

Q1. Does Random Search guarantee the best possible model?

No. It finds a good model within the sampled parameter space.

In the next lesson, we will learn how to combine preprocessing, training, and tuning into a single workflow using ML Pipelines.