Decision Trees
Welcome to one of the most intuitive and powerful algorithms in Machine Learning — Decision Trees.
In the previous lesson, we learned Logistic Regression, which makes decisions using probabilities and mathematical equations.
Decision Trees work very differently. They think more like a human decision process.
What Is a Decision Tree?
A Decision Tree is a supervised machine learning algorithm used for both classification and regression.
It makes predictions by asking a sequence of questions and following a path based on answers.
Each question splits the data into smaller groups, until a final decision is reached.
Real-World Example
Think about loan approval again.
A bank officer may think like this:
If income > 50,000 → check credit score
    If credit score > 700 → approve loan
    Else → reject loan
This exact thinking structure is what a decision tree learns automatically.
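The officer's rule of thumb can be sketched as a small Python function. The thresholds (50,000 and 700) come from the example above; the function name and return values are just for illustration:

```python
def loan_decision(income, credit_score):
    """Mimic the bank officer's rule of thumb from the example above."""
    if income > 50_000:
        if credit_score > 700:
            return "approve"
        return "reject"
    return "reject"

print(loan_decision(60_000, 750))  # approve
print(loan_decision(60_000, 650))  # reject
print(loan_decision(40_000, 750))  # reject
```

A decision tree learns exactly this kind of nested if/else structure, except it discovers the features and thresholds from the data instead of having them hand-written.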
How a Decision Tree Is Structured
A decision tree has four main parts:
- Root Node – the first question
- Decision Nodes – internal questions
- Branches – outcomes of questions
- Leaf Nodes – final predictions
The model keeps splitting data until it reaches a stopping condition.
Using Our Dataset
We continue using the same dataset:
Dataplexa ML Housing & Customer Dataset
Target variable:
loan_approved (0 = No, 1 = Yes)
Decision Trees are excellent for this type of binary classification.
Preparing the Data
We follow the same preprocessing pattern you already know.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")
X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
Training a Decision Tree
Now we train the decision tree classifier.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
The model automatically learns which feature to split on and at which threshold. By default, the tree keeps growing until its leaves are pure; how deep it is allowed to grow is controlled by hyperparameters such as max_depth.
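If you want to see the splits a trained tree learned, scikit-learn can print the tree as text with export_text. A minimal sketch on a tiny made-up dataset (the feature names and values here are invented for illustration, not taken from the lesson's dataset):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny invented dataset: [income_k, credit_score] -> loan_approved
X = [[60, 750], [65, 720], [62, 650], [40, 780], [35, 600], [70, 710]]
y = [1, 1, 0, 0, 0, 1]

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# export_text prints one line per node: either a split condition or a leaf class
print(export_text(tree, feature_names=["income_k", "credit_score"]))
```

The printed rules read exactly like the bank officer's reasoning: a chain of threshold questions ending in a class at each leaf.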
Making Predictions
Prediction is straightforward.
y_pred = model.predict(X_test)
y_pred[:10]
Evaluating the Model
Decision Trees are evaluated using classification metrics.
from sklearn.metrics import accuracy_score, classification_report
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
How Decision Trees Choose Splits
Decision trees use mathematical criteria to decide the best split.
Common criteria:
- Gini Impurity – measures how mixed the classes are
- Entropy – measures disorder in the data
The goal is to create pure groups where most samples belong to one class.
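Gini impurity is simple enough to compute by hand: for a group with class proportions p_i, Gini = 1 − Σ p_i². A pure group scores 0, and a 50/50 mix of two classes scores 0.5, the worst case for binary labels. A quick sketch:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the class proportions in the group."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity([1, 1, 1, 1]))  # 0.0  (pure group)
print(gini_impurity([0, 0, 1, 1]))  # 0.5  (maximally mixed, two classes)
```

At every node, the tree tries each candidate split and keeps the one that lowers the weighted impurity of the resulting groups the most.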
Why Decision Trees Are Powerful
- They are easy to understand
- They handle non-linear relationships
- They require little data preprocessing
- They work well with mixed data types
Limitations of Decision Trees
- They can overfit the training data
- They are sensitive to small changes in the data
- A single tree often does not generalize well on its own
This is why ensemble methods like Random Forest exist.
Mini Practice
Imagine an e-commerce website deciding:
- If customer visits > 10
- If average spend > 5,000
- If cart abandonment is low
Then recommend premium membership.
This logic fits naturally into a decision tree.
Exercises
Exercise 1:
What type of problems can decision trees solve?
Exercise 2:
What is the role of a leaf node?
Exercise 3:
Why do decision trees overfit easily?
Quick Quiz
Q1. Do decision trees need feature scaling?
Q2. What happens if a tree grows too deep?
In the next lesson, we build on this idea and learn Random Forest — a powerful ensemble of decision trees.