Decision Trees
A Decision Tree is a supervised machine learning algorithm that makes predictions by repeatedly splitting data into smaller and smaller groups based on feature conditions. The result is a tree-shaped structure in which each decision leads to another until a final outcome is reached.
Decision Trees are easy to understand, interpret, and visualize, which makes them very popular in real-world applications.
Real-World Connection
Think about how you decide to buy a phone. You may first ask: Is it within budget? If yes, does it have a good camera? If yes, does it support fast charging? Each question leads you closer to a decision. This step-by-step questioning is exactly how a decision tree works.
How a Decision Tree Works
A Decision Tree splits data based on features. At each split, it chooses the feature and threshold that best separate the data into meaningful groups. Three kinds of nodes make up the tree (sketched in code after this list):
- Root Node: the first split, applied to the entire dataset
- Decision Nodes: internal conditions that route samples down the tree
- Leaf Nodes: terminal nodes that give the final prediction
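To make the node roles concrete, here is a hand-written sketch of the phone-buying tree from earlier. The feature names and thresholds are hypothetical, chosen purely for illustration; a real decision tree learns such conditions from data.

# A hand-coded decision tree for the phone example.
# All features and thresholds here are hypothetical illustrations.
def should_buy_phone(price, camera_mp, fast_charging):
    if price <= 500:              # root node: the first split
        if camera_mp >= 48:       # decision node: internal condition
            if fast_charging:     # decision node
                return "Buy"      # leaf node: final prediction
            return "Maybe later"  # leaf node
        return "Don't buy"        # leaf node
    return "Don't buy"            # leaf node

print(should_buy_phone(450, 64, True))  # -> Buy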
Key Concepts in Decision Trees
- Splitting: dividing data into subsets based on feature values
- Impurity: how mixed the labels within a node are
- Depth: the number of levels from the root to the deepest leaf
Gini Impurity and Entropy
To decide the best split, decision trees use measures like Gini Impurity or Entropy. These metrics tell the tree how pure or mixed a node is.
Lower impurity means better separation.
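Both measures are computed from a node's class proportions p_i: Gini = 1 - sum(p_i^2) and Entropy = -sum(p_i * log2(p_i)). The helper functions below are an illustrative sketch of these formulas, not part of scikit-learn's API.

import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: negative sum of p * log2(p) over the class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

pure = [0, 0, 0, 0]    # a single class: impurity is 0
mixed = [0, 0, 1, 1]   # evenly mixed: impurity is at its maximum
print(gini(pure), gini(mixed))        # 0.0 0.5
print(entropy(pure), entropy(mixed))  # -0.0 1.0

A pure node (all samples from one class) scores 0 on both measures, so the tree prefers splits that push its children toward purity.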
Simple Decision Tree Example
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load the Iris dataset: 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)

# Limit the depth to 3 to keep the tree small and interpretable.
model = DecisionTreeClassifier(max_depth=3)
model.fit(X, y)

# Accuracy on the training data (optimistic; use a held-out set in practice).
print(model.score(X, y))
The model learns decision rules that classify flower species based on the input features. Because the rules are explicit, they can also be printed and read directly, as shown below.
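scikit-learn's export_text utility prints the learned rules as nested if-else conditions. The snippet below refits the same model so it runs on its own:

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

data = load_iris()
model = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)

# Print the learned if-else rules using the original feature names.
print(export_text(model, feature_names=list(data.feature_names)))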
Decision Tree for Regression
Decision Trees can also be used for regression problems where the output is a numeric value.
from sklearn.tree import DecisionTreeRegressor
import numpy as np

# Toy data: y is the square of x.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])

# A shallow tree with at most 2 levels of splits.
model = DecisionTreeRegressor(max_depth=2)
model.fit(X, y)

# Predict for an input outside the training range.
print(model.predict([[6]]))
The tree predicts a value by averaging the training targets that fall into each leaf. This also means a regression tree cannot extrapolate: for x = 6 it returns the value stored in the leaf that x = 6 falls into (25.0 here), not the true square 36.
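This step-like behavior is easy to see by predicting over a grid of inputs: every input routed to the same leaf receives the same constant value. The grid below is an illustrative choice.

from sklearn.tree import DecisionTreeRegressor
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])
model = DecisionTreeRegressor(max_depth=2).fit(X, y)

# Predictions are piecewise constant: one flat "step" per leaf,
# including for inputs outside the training range.
grid = np.arange(0.5, 7.5, 0.5).reshape(-1, 1)
for x_val, pred in zip(grid.ravel(), model.predict(grid)):
    print(f"x = {x_val:.1f} -> prediction = {pred:.1f}")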
Advantages of Decision Trees
- Easy to interpret
- No feature scaling required
- Works with numerical and categorical data
Limitations of Decision Trees
- Prone to overfitting when grown deep (see the sketch after this list)
- Unstable: small changes in the training data can produce a very different tree
- Can grow very large without limits on depth or leaf size
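Overfitting is usually controlled with pruning-style parameters such as max_depth, min_samples_leaf, or ccp_alpha in scikit-learn. The sketch below compares an unconstrained tree with a constrained one on a held-out split; the specific parameter values are illustrative, not recommendations.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# An unconstrained tree keeps splitting until every leaf is pure.
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Depth and leaf-size limits stop the tree from memorizing noise.
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42).fit(X_train, y_train)

print("full tree   - train:", full.score(X_train, y_train),
      "test:", full.score(X_test, y_test))
print("pruned tree - train:", pruned.score(X_train, y_train),
      "test:", pruned.score(X_test, y_test))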
Practice Questions
Practice 1: Which algorithm makes decisions using if-else conditions?
Practice 2: What node gives the final prediction?
Practice 3: Which metric measures impurity?
Quick Quiz
Quiz 1: Decision Trees can be used for?
Quiz 2: A major drawback of Decision Trees is?
Quiz 3: The first split in a decision tree is called?
Coming up next: Random Forest — combining multiple decision trees for better performance.