Random Search
In the previous lesson, we used Grid Search to find the best hyperparameters. Grid Search tested every possible combination inside a fixed grid.
While Grid Search is reliable, it becomes very slow when the number of hyperparameters increases.
In real-world machine learning projects, teams often work with large models and limited computing resources.
This is where Random Search becomes extremely useful.
What Is Random Search?
Random Search is a hyperparameter tuning technique that randomly selects combinations from a defined parameter space.
Instead of trying every possible combination, it samples a fixed number of random configurations.
Surprisingly, Random Search often finds comparable or even better models in much less time than Grid Search.
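To make the sampling idea concrete, scikit-learn's ParameterSampler (the same machinery RandomizedSearchCV uses internally) can show how random configurations are drawn. The parameter space below is a small illustrative sketch, not the lesson's dataset-specific settings:

```python
from sklearn.model_selection import ParameterSampler

# A small illustrative space: 4 x 2 = 8 possible combinations
param_space = {"C": [0.01, 0.1, 1, 10], "solver": ["liblinear", "lbfgs"]}

# Draw only 3 random configurations instead of testing all 8
samples = list(ParameterSampler(param_space, n_iter=3, random_state=0))
for config in samples:
    print(config)
```

Each draw is a complete configuration, so the budget (here 3) directly caps how many models get trained.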
Why Random Search Works Well
In many machine learning models, only a few hyperparameters have a large impact on performance.
Random Search explores the parameter space more efficiently by focusing on diverse combinations instead of exhaustively testing everything.
This makes it a preferred choice for complex models such as Random Forest, XGBoost, and Neural Networks.
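A quick numeric sketch of this idea (an illustration, not taken from the lesson): with a budget of 9 trials, a 3x3 grid tries only 3 distinct values of each hyperparameter, while 9 random draws can explore up to 9 distinct values along the one dimension that actually matters.

```python
import numpy as np

rng = np.random.default_rng(42)
budget = 9

# Grid search over a 3x3 grid: the "important" parameter
# is only ever tested at 3 distinct values
grid_values = np.repeat(np.linspace(0.1, 1.0, 3), 3)

# Random search: every trial draws a fresh value of the important parameter
random_values = rng.uniform(0.1, 1.0, size=budget)

print(len(set(grid_values)))    # 3 distinct values explored
print(len(set(random_values)))  # 9 distinct values explored
```

With the same budget, the random strategy probes far more values of the parameter that drives performance.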
Dataset Reminder
We continue using the same dataset:
Dataplexa ML Housing & Customer Dataset
The problem remains loan approval prediction.
This consistency helps you understand how tuning techniques affect the same data.
Preparing the Data
The data preparation steps remain unchanged. This is intentional.
A good ML workflow separates data preparation from tuning.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
Defining the Parameter Distribution
Unlike Grid Search, Random Search can draw values from distributions or continuous ranges rather than a fixed list of candidates.
This lets it explore many more candidate values within the same budget.
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

param_dist = {
    "C": loguniform(1e-2, 1e2),  # continuous log-uniform range instead of a fixed list
    "penalty": ["l2"],
    "solver": ["liblinear"]
}
```
Running Random Search
We specify how many random combinations we want to test.
This gives us direct control over training time.
```python
random_search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42
)

random_search.fit(X_train, y_train)
```
Best Parameters Found
After training completes, Random Search reports the best combination it discovered.
```python
random_search.best_params_
```
These parameters may differ from Grid Search results, but performance is often similar.
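Beyond best_params_, a fitted RandomizedSearchCV also exposes best_score_ (the mean cross-validated score of the best configuration) and cv_results_ (per-configuration details). A minimal sketch on synthetic data, since the lesson's CSV is assumed to be a local file:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the lesson's dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)           # the sampled configuration that scored best
print(round(search.best_score_, 3))  # its mean cross-validation accuracy
print(len(search.cv_results_["mean_test_score"]))  # one entry per sampled config
```

Inspecting cv_results_ is useful when you want to see how sensitive the model is to each hyperparameter, not just the single winner.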
Evaluating the Tuned Model
We now evaluate the best model on unseen test data.
```python
best_model = random_search.best_estimator_
best_model.score(X_test, y_test)
```
In practice, Random Search typically reaches a comparable score with far less computation than Grid Search.
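You can see the computational difference for yourself with a small timing sketch on synthetic data (the exact numbers will vary by machine; the point is the fit counts, 45 versus 15):

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
grid = {"C": [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100]}

# Grid Search: 9 candidates x 5 folds = 45 model fits
t0 = time.perf_counter()
GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=5).fit(X, y)
grid_time = time.perf_counter() - t0

# Random Search: 3 candidates x 5 folds = 15 model fits
t0 = time.perf_counter()
RandomizedSearchCV(LogisticRegression(max_iter=1000), grid, n_iter=3,
                   cv=5, random_state=42).fit(X, y)
random_time = time.perf_counter() - t0

print(f"grid: {grid_time:.2f}s, random: {random_time:.2f}s")
```

Random Search does a fixed fraction of the work, and that fraction is under your control via n_iter.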
Real-World Example
Imagine training a credit scoring model with millions of customers.
Trying every hyperparameter combination would be impractical.
Random Search allows teams to build strong models quickly and iterate faster.
Mini Practice
Increase n_iter and observe how model accuracy changes.
Notice the balance between performance and training time.
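The practice above can be sketched as a simple loop. This version uses synthetic data in place of the lesson's CSV so it runs standalone:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the lesson's dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Larger n_iter = more configurations tried = more training time
for n_iter in [5, 10, 20]:
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        param_distributions={"C": loguniform(1e-2, 1e2)},
        n_iter=n_iter,
        cv=5,
        scoring="accuracy",
        random_state=42,
    )
    search.fit(X_train, y_train)
    print(n_iter, round(search.score(X_test, y_test), 3))
```

Test accuracy usually improves quickly at first and then flattens out, while training time keeps growing linearly with n_iter.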
Exercises
Exercise 1: Why is Random Search faster than Grid Search?
Exercise 2: When would you prefer Grid Search instead?
Quick Quiz
Q1. Does Random Search guarantee the best possible model?
In the next lesson, we will learn how to combine preprocessing, training, and tuning into a single workflow using ML Pipelines.