AI Lesson 49 – Random Forest | Dataplexa

Random Forest

Random Forest is an ensemble machine learning algorithm that builds multiple Decision Trees and combines their results to make a final prediction. Instead of relying on one tree, it aggregates the decisions of many trees, which makes the model more accurate and stable.

If a single decision tree is like asking one expert for advice, a random forest is like asking a group of experts and choosing the most common answer.

Real-World Connection

Imagine a company hiring a candidate. Instead of trusting one interviewer, they ask multiple interviewers to evaluate the candidate. If most interviewers agree, the decision becomes more reliable. Random Forest works the same way by combining predictions from many trees.

Why Random Forest Is Better Than a Single Decision Tree

Decision Trees are powerful but tend to overfit the training data. As the short comparison after this list shows, Random Forest reduces overfitting by:

  • Training many trees on different data samples
  • Using random subsets of features for each split
  • Averaging results (regression) or voting (classification)
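
To make the contrast concrete, here is a minimal sketch that cross-validates an unconstrained single tree against a 100-tree forest on the iris dataset (the same dataset used later in this lesson). The exact scores will vary; on a small, clean dataset like iris the gap is modest, and it typically grows on noisier, higher-dimensional data.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A single unconstrained tree can fit each training fold too closely
tree = DecisionTreeClassifier(random_state=42)

# A forest of 100 trees averages away much of that variance
forest = RandomForestClassifier(n_estimators=100, random_state=42)

print("Single tree:", cross_val_score(tree, X, y, cv=5).mean())
print("Forest     :", cross_val_score(forest, X, y, cv=5).mean())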

How Random Forest Works

The algorithm follows these steps, sketched in code after the list:

  • Create multiple bootstrap samples from the dataset
  • Train a decision tree on each sample
  • Each tree makes its own prediction
  • Final output is decided by majority vote or average
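
Here is a simplified from-scratch sketch of these four steps for classification, using scikit-learn decision trees as the building blocks. A real Random Forest additionally picks a random subset of features at each split, which this sketch leaves out.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

# Steps 1-2: train each tree on its own bootstrap sample
trees = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 3: every tree predicts independently for one sample
sample = X[:1]
votes = [int(t.predict(sample)[0]) for t in trees]

# Step 4: the majority vote decides the final class
print(max(set(votes), key=votes.count))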

Random Forest for Classification


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    max_depth=5,        # limit how deep each tree can grow
    random_state=42     # for reproducible results
)

model.fit(X, y)

# Accuracy measured on the training data itself
print(model.score(X, y))

0.99

Here, 100 decision trees are created. Each tree learns slightly different patterns, and the majority vote across them produces a more accurate result. Keep in mind that the score of about 0.99 is measured on the training data itself, so it is an optimistic estimate of real-world performance.

Understanding the Code

The n_estimators parameter defines how many trees are built. More trees usually improve performance but increase computation. The max_depth parameter controls how deep each tree can grow, helping to prevent overfitting.
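
As a rough illustration of that trade-off, the sketch below cross-validates forests of increasing size on the iris data. The exact numbers will differ from dataset to dataset, but the typical pattern is that scores climb quickly and then flatten as trees are added.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Compare forests with different numbers of trees
for n in (1, 10, 100):
    clf = RandomForestClassifier(n_estimators=n, max_depth=5, random_state=42)
    print(n, "trees:", round(cross_val_score(clf, X, y, cv=5).mean(), 3))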

Random Forest for Regression


from sklearn.ensemble import RandomForestRegressor
import numpy as np

# A simple linear pattern: y = 2x
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = RandomForestRegressor(
    n_estimators=50,   # 50 trees, averaged for the final prediction
    random_state=42
)

model.fit(X, y)
print(model.predict([[6]]))

[9.8]

The prediction is the average of the outputs of all 50 trees, which makes regression results smoother and more stable. Note that the model predicts about 9.8 rather than 12 (the value the pattern y = 2x would suggest): tree-based models cannot extrapolate beyond the range of target values seen in training.
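
You can inspect this averaging directly: a fitted forest exposes its individual trees through the estimators_ attribute, and the mean of their predictions matches the forest's output. Continuing from the regression example above:

# Each of the 50 fitted trees is available in model.estimators_
tree_preds = [tree.predict([[6]])[0] for tree in model.estimators_]

print(tree_preds[:5])                     # a few individual tree outputs
print(sum(tree_preds) / len(tree_preds))  # their average equals model.predict([[6]])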

Feature Importance in Random Forest

One powerful advantage of Random Forest is that it can report how much each feature contributed to its predictions, through the feature_importances_ attribute. The snippet below refits the iris classifier and pairs each score with its feature name.


import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Refit the iris classifier from the classification example
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X, y)

# Pair each importance score with its feature name
importance = pd.Series(model.feature_importances_, index=load_iris().feature_names)
print(importance.sort_values(ascending=False))

Feature importance helps data scientists understand which inputs influence predictions the most.
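
One common next step is to use these scores for feature selection. Below is a minimal sketch with scikit-learn's SelectFromModel, assuming model and X still refer to the fitted classifier and iris data from the snippet above; by default it keeps only features whose importance is at least the mean importance.

from sklearn.feature_selection import SelectFromModel

# Wrap the already-fitted forest (prefit=True skips refitting)
selector = SelectFromModel(model, prefit=True)
X_reduced = selector.transform(X)

print(X.shape, "->", X_reduced.shape)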

Advantages of Random Forest

  • High accuracy on a wide range of problems
  • Much less prone to overfitting than a single decision tree
  • Works well with large datasets
  • Handles noisy and incomplete data better than many simpler models

Limitations of Random Forest

  • Slower to train and predict than a single tree
  • Less interpretable than a single decision tree
  • Requires more memory, since all trees are stored

Practice Questions

Practice 1: Random Forest belongs to which type of learning method?

Practice 2: Which parameter controls the number of trees?

Practice 3: Random Forest mainly helps reduce which problem?

Quick Quiz

Quiz 1: How does Random Forest make final classification decisions?

Quiz 2: Which output helps explain model behavior?

Quiz 3: Random Forest can be used for which types of tasks?

Coming up next: Gradient Boosting — learning from mistakes to build stronger models.