ML Lesson 18 – Decision Trees | Dataplexa

Decision Trees

Welcome to one of the most intuitive and powerful algorithms in Machine Learning — Decision Trees.

In the previous lesson, we learned Logistic Regression, which makes decisions using probabilities and mathematical equations.

Decision Trees work very differently. They think more like a human decision process.


What Is a Decision Tree?

A Decision Tree is a supervised machine learning algorithm used for both:

Classification and Regression

It makes predictions by asking a sequence of questions and following a path based on answers.

Each question splits the data into smaller groups, until a final decision is reached.


Real-World Example

Think about loan approval again.

A bank officer may think like this:

If income > 50,000 → check credit score
    If credit score > 700 → approve loan
    Else → reject loan

This exact thinking structure is what a decision tree learns automatically.
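The officer's rules above can be sketched as plain Python. The thresholds come from the example; the function name and the assumption that low income means rejection are illustrative.

```python
def approve_loan(income, credit_score):
    """Mirror the bank officer's nested rules from the example above."""
    if income > 50_000:
        if credit_score > 700:
            return "approve"
        return "reject"
    # Income at or below 50,000: assumed to be rejected in this sketch
    return "reject"

print(approve_loan(60_000, 750))  # approve
print(approve_loan(60_000, 650))  # reject
```

A decision tree learns exactly this kind of nested if/else structure from data, instead of having a human write the thresholds by hand.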


How a Decision Tree Is Structured

A decision tree has four main parts:

Root Node – the first question
Decision Nodes – internal questions
Branches – outcomes of questions
Leaf Nodes – final predictions

The model keeps splitting data until it reaches a stopping condition.


Using Our Dataset

We continue using the same dataset:

Dataplexa ML Housing & Customer Dataset

Target variable:

loan_approved (0 = No, 1 = Yes)

Decision Trees are excellent for this type of binary classification.


Preparing the Data

We follow the same preprocessing pattern you already know.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Training a Decision Tree

Now we train the decision tree classifier.

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

The model automatically learns which feature to split on and which threshold to split at. By default it keeps splitting until every leaf is pure, so in practice the depth is usually controlled with hyperparameters such as max_depth.
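A minimal sketch of capping tree growth and inspecting the learned rules with scikit-learn's export_text. The tiny income/credit-score dataset here is made up purely for illustration, so the exact splits it produces are not meaningful:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up (income, credit_score) rows; 1 = loan approved
X = [[30_000, 650], [80_000, 720], [55_000, 710], [60_000, 600],
     [90_000, 780], [40_000, 690], [70_000, 705], [35_000, 560]]
y = [0, 1, 1, 0, 1, 0, 1, 0]

# max_depth=2 caps growth so the tree cannot memorize every row
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(X, y)

# export_text prints the learned if/else rules in readable form
print(export_text(model, feature_names=["income", "credit_score"]))
```

Printing the rules this way is a quick sanity check that the tree's logic matches your intuition about the data.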


Making Predictions

Prediction is straightforward.

y_pred = model.predict(X_test)
y_pred[:10]

Evaluating the Model

Decision Trees are evaluated using classification metrics.

from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

print(classification_report(y_test, y_pred))

How Decision Trees Choose Splits

Decision trees use mathematical criteria to decide the best split.

Common criteria:

Gini Impurity – measures how mixed the classes are
Entropy – measures disorder in the data

The goal is to create pure groups where most samples belong to one class.
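Gini impurity is simple enough to compute by hand: for class proportions p_i it is 1 − Σ p_i². A small sketch, with the function name chosen here for illustration:

```python
def gini(labels):
    """Gini impurity of a group of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    impurity = 1.0
    for cls in set(labels):
        p = labels.count(cls) / n
        impurity -= p ** 2
    return impurity

print(gini([1, 1, 1, 1]))  # 0.0   -> pure group, nothing mixed
print(gini([0, 1, 0, 1]))  # 0.5   -> 50/50 split, maximally mixed
print(gini([0, 1, 1, 1]))  # 0.375 -> mostly one class
```

At each node, the tree tries candidate splits and keeps the one that reduces impurity the most, which is how it drives groups toward purity.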


Why Decision Trees Are Powerful

They are easy to understand
They handle non-linear relationships
They require little data preprocessing
They work well with mixed data types


Limitations of Decision Trees

They can overfit the training data
They are sensitive to small data changes
They do not generalize well alone

This is why ensemble methods like Random Forest exist.
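You can see the overfitting limitation directly by comparing an unconstrained tree with a depth-limited one. This sketch uses scikit-learn's make_classification to generate noisy synthetic data (only so the example is self-contained; our CSV works the same way):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with deliberate label noise
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unconstrained tree: grows until it memorizes the training set
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Depth-limited tree: forced to learn only broad patterns
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(
    X_train, y_train
)

print("deep tree    train:", deep.score(X_train, y_train),
      " test:", deep.score(X_test, y_test))
print("shallow tree train:", shallow.score(X_train, y_train),
      " test:", shallow.score(X_test, y_test))
```

The deep tree reaches perfect training accuracy on this noisy data, while its test accuracy typically lags behind the shallow tree's, which is exactly the gap ensemble methods like Random Forest are designed to close.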


Mini Practice

Imagine an e-commerce website deciding:

If customer visits > 10
If average spend > 5,000
If cart abandonment is low

Then recommend premium membership.

This logic fits naturally into a decision tree.
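As a quick sketch, the three conditions above become nested checks. The function name and the exact threshold for "low" cart abandonment (under 20% here) are assumptions for illustration:

```python
def recommend_premium(visits, avg_spend, cart_abandonment_rate):
    """Recommend premium membership using the three nested rules above."""
    if visits > 10:
        if avg_spend > 5_000:
            if cart_abandonment_rate < 0.2:  # "low" assumed as < 20%
                return True
    return False

print(recommend_premium(15, 6_000, 0.1))  # True
print(recommend_premium(15, 6_000, 0.5))  # False
```

Each condition is one decision node, each outcome is a branch, and the True/False results are the leaf nodes.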


Exercises

Exercise 1:
What type of problems can decision trees solve?

Decision trees can solve both classification and regression problems.

Exercise 2:
What is the role of a leaf node?

A leaf node provides the final prediction or class label.

Exercise 3:
Why do decision trees overfit easily?

Because they can grow too deep and memorize training data.

Quick Quiz

Q1. Do decision trees need feature scaling?

No. Each split compares a single feature against a threshold, so rescaling a feature does not change which splits the tree learns.

Q2. What happens if a tree grows too deep?

It may overfit and perform poorly on unseen data.

In the next lesson, we build on this idea and learn Random Forest — a powerful ensemble of decision trees.