Random Forest
Random Forest is an ensemble machine learning algorithm that builds multiple decision trees and combines their results into a final prediction. Instead of relying on a single tree, it aggregates the decisions of many trees, which makes the model more accurate and stable.
This lesson explains why random forest was created, how it works internally, how it improves decision trees, and how to implement it using code.
Why Random Forest Was Introduced
Decision trees are easy to understand, but they suffer from a major problem: overfitting. A small change in the training data can produce a very different tree and lead to poor predictions on new data.
Random forest solves this by creating many trees and letting them vote.
- One tree may be wrong
- Many trees together are usually right
What Is a Random Forest?
A random forest is a collection of decision trees trained on different subsets of the data. Each tree makes a prediction, and the final output is based on majority voting (classification) or averaging (regression).
This approach reduces variance and improves generalization.
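To make the two modes concrete, here is a minimal sketch using scikit-learn on made-up toy data (the feature values and targets below are illustrative, not from the lesson): a classifier resolves the trees by majority vote, while a regressor averages their outputs.

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy data: two features per sample (values chosen only for illustration)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y_class = [0, 0, 0, 1, 1, 1]             # class labels -> majority voting
y_reg = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5]   # continuous targets -> averaging

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_reg)

print(clf.predict([[4, 4]]))  # most common class among the 10 trees
print(reg.predict([[4, 4]]))  # mean of the 10 trees' predictions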
Real-World Connection
Think of a medical diagnosis scenario. Instead of relying on one doctor, multiple doctors review the case independently, and the final diagnosis follows the majority opinion.
Random forest works in the same way — each tree acts like an independent expert.
How Random Forest Works
Random forest introduces randomness in two ways:
- Each tree is trained on a random sample of the data, drawn with replacement (a bootstrap sample)
- Each split considers only a random subset of the features
This randomness ensures that trees are different from each other, which improves the final prediction.
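These two sources of randomness correspond directly to scikit-learn parameters; the sketch below shows where each one is configured (the values shown are common choices, not prescriptions).

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    bootstrap=True,       # train each tree on a random sample drawn with replacement
    max_features="sqrt",  # consider a random subset of features at each split
    random_state=42,
)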
Simple Random Forest Example
The following example demonstrates how a random forest classifier works using Python.
from sklearn.ensemble import RandomForestClassifier
# Sample data: each row is [age, income]
X = [[25, 50000], [30, 60000], [45, 80000], [35, 65000], [50, 90000]]
y = [0, 0, 1, 0, 1]  # 0 = Reject, 1 = Approve
# Create model
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Train model
model.fit(X, y)
# Predict for a new applicant: age 40, income 70000
prediction = model.predict([[40, 70000]])
print(prediction)
The model predicts approval by combining decisions from multiple trees rather than relying on a single rule.
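To inspect the vote behind that prediction, you can query the fitted model from the example above. predict_proba averages the class-probability estimates of the individual trees (with fully grown trees this is effectively the vote share), and estimators_ exposes each tree; both are standard scikit-learn attributes.

# Averaged probability estimate for each class: [Reject, Approve]
print(model.predict_proba([[40, 70000]]))

# Count the individual trees voting for class 1 (Approve)
approve_votes = sum(1 for tree in model.estimators_
                    if tree.predict([[40, 70000]])[0] == 1)
print(approve_votes, "of", len(model.estimators_), "trees vote Approve")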
How Random Forest Makes Better Decisions
Each tree in the forest looks at the data slightly differently. Some trees focus more on income, others on age, and others on different combinations.
The final decision is based on the most common outcome among all trees.
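Continuing the example above, the fitted model's feature_importances_ attribute (standard in scikit-learn) reports how much each feature contributed to the trees' splits, averaged across the forest. The feature names below follow the lesson's data layout of [age, income].

# Average contribution of each feature to the trees' splits
for name, importance in zip(["age", "income"], model.feature_importances_):
    print(f"{name}: {importance:.2f}")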
Advantages of Random Forest
- High accuracy compared to single trees
- Less prone to overfitting
- Handles large datasets well
- Can tolerate missing values in some implementations
Limitations of Random Forest
- Less interpretable than a single decision tree
- Requires more computation
- Model size can be large
Practice Questions
Practice 1: Which type of learning approach does random forest belong to?
Practice 2: What is a random forest made up of?
Practice 3: Which problem does random forest mainly reduce?
Quick Quiz
Quiz 1: How does random forest make final predictions?
Quiz 2: What makes trees in a random forest different?
Quiz 3: Which types of tasks can random forest be used for?
Coming up next: Gradient Boosting — building models sequentially to correct previous mistakes.