AI Lesson 49 – Random Forest | Dataplexa

Random Forest

Random Forest is an ensemble machine learning algorithm that builds multiple Decision Trees and combines their results to make a final prediction. Instead of relying on one tree, it aggregates the decisions of many trees, which makes the model more accurate and stable.

If a single decision tree is like asking one expert for advice, a random forest is like asking a group of experts and choosing the most common answer.

Real-World Connection

Imagine a company hiring a candidate. Instead of trusting one interviewer, they ask multiple interviewers to evaluate the candidate. If most interviewers agree, the decision becomes more reliable. Random Forest works the same way by combining predictions from many trees.

Why Random Forest Is Better Than a Single Decision Tree

Decision Trees are powerful but tend to overfit the training data. As the short comparison after this list shows, Random Forest reduces overfitting by:

  • Training many trees on different data samples
  • Using random subsets of features for each split
  • Averaging results (regression) or voting (classification)
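
To make the contrast concrete, here is a minimal sketch that cross-validates an unconstrained single tree against a 100-tree forest on the iris dataset (the same dataset used later in this lesson). The exact scores will vary; on a small, clean dataset like iris the gap is modest, and it typically grows on noisier, higher-dimensional data.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A single unconstrained tree can fit each training fold too closely
tree = DecisionTreeClassifier(random_state=42)

# A forest of 100 trees averages away much of that variance
forest = RandomForestClassifier(n_estimators=100, random_state=42)

print("Single tree:", cross_val_score(tree, X, y, cv=5).mean())
print("Forest     :", cross_val_score(forest, X, y, cv=5).mean())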

How Random Forest Works

The algorithm follows these steps, sketched in code after the list:

  • Create multiple bootstrap samples from the dataset
  • Train a decision tree on each sample
  • Each tree makes its own prediction
  • Final output is decided by majority vote or average
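
Here is a simplified from-scratch sketch of these four steps for classification, using scikit-learn decision trees as the building blocks. A real Random Forest additionally picks a random subset of features at each split, which this sketch leaves out.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

# Steps 1-2: train each tree on its own bootstrap sample
trees = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Step 3: every tree predicts independently for one sample
sample = X[:1]
votes = [int(t.predict(sample)[0]) for t in trees]

# Step 4: the majority vote decides the final class
print(max(set(votes), key=votes.count))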

Random Forest for Classification


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    max_depth=5,        # limit how deep each tree can grow
    random_state=42     # for reproducible results
)

model.fit(X, y)

# Accuracy measured on the training data itself
print(model.score(X, y))

0.99

Here, 100 decision trees are created. Each tree learns slightly different patterns, and the majority vote across them produces a more accurate result. Keep in mind that the score of about 0.99 is measured on the training data itself, so it is an optimistic estimate of real-world performance.

Understanding the Code

The n_estimators parameter defines how many trees are built. More trees usually improve performance but increase computation. The max_depth parameter controls how deep each tree can grow, helping to prevent overfitting.
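
As a rough illustration of that trade-off, the sketch below cross-validates forests of increasing size on the iris data. The exact numbers will differ from dataset to dataset, but the typical pattern is that scores climb quickly and then flatten as trees are added.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Compare forests with different numbers of trees
for n in (1, 10, 100):
    clf = RandomForestClassifier(n_estimators=n, max_depth=5, random_state=42)
    print(n, "trees:", round(cross_val_score(clf, X, y, cv=5).mean(), 3))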

Random Forest for Regression


from sklearn.ensemble import RandomForestRegressor
import numpy as np

# A simple linear pattern: y = 2x
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = RandomForestRegressor(
    n_estimators=50,   # 50 trees, averaged for the final prediction
    random_state=42
)

model.fit(X, y)
print(model.predict([[6]]))

[9.8]

The prediction is the average of the outputs of all 50 trees, which makes regression results smoother and more stable. Note that the model predicts about 9.8 rather than 12 (the value the pattern y = 2x would suggest): tree-based models cannot extrapolate beyond the range of target values seen in training.
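
You can inspect this averaging directly: a fitted forest exposes its individual trees through the estimators_ attribute, and the mean of their predictions matches the forest's output. Continuing from the regression example above:

# Each of the 50 fitted trees is available in model.estimators_
tree_preds = [tree.predict([[6]])[0] for tree in model.estimators_]

print(tree_preds[:5])                     # a few individual tree outputs
print(sum(tree_preds) / len(tree_preds))  # their average equals model.predict([[6]])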

Feature Importance in Random Forest

One powerful advantage of Random Forest is that it can report how much each feature contributed to its predictions, through the feature_importances_ attribute. The snippet below refits the iris classifier and pairs each score with its feature name.


import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Refit the iris classifier from the classification example
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X, y)

# Pair each importance score with its feature name
importance = pd.Series(model.feature_importances_, index=load_iris().feature_names)
print(importance.sort_values(ascending=False))

Feature importance helps data scientists understand which inputs influence predictions the most.
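
One common next step is to use these scores for feature selection. Below is a minimal sketch with scikit-learn's SelectFromModel, assuming model and X still refer to the fitted classifier and iris data from the snippet above; by default it keeps only features whose importance is at least the mean importance.

from sklearn.feature_selection import SelectFromModel

# Wrap the already-fitted forest (prefit=True skips refitting)
selector = SelectFromModel(model, prefit=True)
X_reduced = selector.transform(X)

print(X.shape, "->", X_reduced.shape)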

Advantages of Random Forest

  • High accuracy on a wide range of problems
  • Much less prone to overfitting than a single decision tree
  • Works well with large datasets
  • Handles noisy and incomplete data better than many simpler models

Limitations of Random Forest

  • Slower to train and predict than a single tree
  • Less interpretable than a single decision tree
  • Requires more memory, since all trees are stored

Practice Questions

Practice 1: Random Forest belongs to which type of learning method?

Practice 2: Which parameter controls the number of trees?

Practice 3: Random Forest mainly helps reduce which problem?

Quick Quiz

Quiz 1: How does Random Forest make final classification decisions?

Quiz 2: Which output helps explain model behavior?

Quiz 3: Random Forest can be used for which types of tasks?

Coming up next: Gradient Boosting — learning from mistakes to build stronger models.