Random Forest
Random Forest is an ensemble machine learning algorithm that builds multiple decision trees and combines their results into a final prediction. Instead of relying on a single tree, it aggregates the decisions of many trees, which makes the model more accurate and stable.
This lesson explains why random forest was created, how it works internally, how it improves decision trees, and how to implement it using code.
Why Random Forest Was Introduced
Decision trees are easy to understand, but they suffer from a major problem: overfitting. A small change in the training data can produce a very different tree and lead to poor predictions on new data.
Random forest solves this by creating many trees and letting them vote.
- One tree may be wrong
- Many trees together are usually right
What Is a Random Forest?
A random forest is a collection of decision trees trained on different subsets of the data. Each tree makes a prediction, and the final output is based on majority voting (classification) or averaging (regression).
This approach reduces variance and improves generalization.
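To make the two modes concrete, here is a minimal sketch using scikit-learn on made-up toy data (the feature values and targets below are illustrative, not from the lesson): a classifier resolves the trees by majority vote, while a regressor averages their outputs.

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy data: two features per sample (values chosen only for illustration)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y_class = [0, 0, 0, 1, 1, 1]             # class labels -> majority voting
y_reg = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5]   # continuous targets -> averaging

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_reg)

print(clf.predict([[4, 4]]))  # most common class among the 10 trees
print(reg.predict([[4, 4]]))  # mean of the 10 trees' predictions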
Real-World Connection
Think of a medical diagnosis scenario. Instead of relying on one doctor, multiple doctors review the case independently, and the final diagnosis follows the majority opinion.
Random forest works in the same way — each tree acts like an independent expert.
How Random Forest Works
Random forest introduces randomness in two ways:
- Each tree is trained on a random sample of the data, drawn with replacement (a bootstrap sample)
- Each split considers only a random subset of the features
This randomness ensures that trees are different from each other, which improves the final prediction.
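These two sources of randomness correspond directly to scikit-learn parameters; the sketch below shows where each one is configured (the values shown are common choices, not prescriptions).

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    bootstrap=True,       # train each tree on a random sample drawn with replacement
    max_features="sqrt",  # consider a random subset of features at each split
    random_state=42,
)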
Simple Random Forest Example
The following example demonstrates how a random forest classifier works using Python.
from sklearn.ensemble import RandomForestClassifier
# Sample data: each row is [age, income]
X = [[25, 50000], [30, 60000], [45, 80000], [35, 65000], [50, 90000]]
y = [0, 0, 1, 0, 1]  # 0 = Reject, 1 = Approve
# Create model
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Train model
model.fit(X, y)
# Predict for a new applicant: age 40, income 70000
prediction = model.predict([[40, 70000]])
print(prediction)
The model predicts approval by combining decisions from multiple trees rather than relying on a single rule.
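To inspect the vote behind that prediction, you can query the fitted model from the example above. predict_proba averages the class-probability estimates of the individual trees (with fully grown trees this is effectively the vote share), and estimators_ exposes each tree; both are standard scikit-learn attributes.

# Averaged probability estimate for each class: [Reject, Approve]
print(model.predict_proba([[40, 70000]]))

# Count the individual trees voting for class 1 (Approve)
approve_votes = sum(1 for tree in model.estimators_
                    if tree.predict([[40, 70000]])[0] == 1)
print(approve_votes, "of", len(model.estimators_), "trees vote Approve")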
How Random Forest Makes Better Decisions
Each tree in the forest looks at the data slightly differently. Some trees focus more on income, others on age, and others on different combinations.
The final decision is based on the most common outcome among all trees.
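Continuing the example above, the fitted model's feature_importances_ attribute (standard in scikit-learn) reports how much each feature contributed to the trees' splits, averaged across the forest. The feature names below follow the lesson's data layout of [age, income].

# Average contribution of each feature to the trees' splits
for name, importance in zip(["age", "income"], model.feature_importances_):
    print(f"{name}: {importance:.2f}")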
Advantages of Random Forest
- High accuracy compared to single trees
- Less prone to overfitting
- Handles large datasets well
- Can tolerate missing values in some implementations
Limitations of Random Forest
- Less interpretable than a single decision tree
- Requires more computation
- Model size can be large
Practice Questions
Practice 1: Which type of learning approach does random forest belong to?
Practice 2: What is a random forest made up of?
Practice 3: Which problem does random forest mainly reduce?
Quick Quiz
Quiz 1: How does random forest make final predictions?
Quiz 2: What makes trees in a random forest different?
Quiz 3: Which types of tasks can random forest be used for?
Coming up next: Gradient Boosting — building models sequentially to correct previous mistakes.