ML Lesson 10 – Probability for ML| Dataplexa

Probability for Machine Learning

In the previous lesson, we learned how machines represent data using vectors and matrices. Now we answer another core question: how does a machine handle uncertainty?

Real-world data is never perfect. Customers behave unpredictably, markets fluctuate, and outcomes are uncertain. Probability is the language machines use to reason under uncertainty.


Why Probability Matters in ML

Machine Learning is not about absolute answers. It is about estimating how likely something is to happen.

Examples:

Will a customer purchase a house?
Is this email spam or not?
Will a user click this advertisement?

All these are probability questions.


Our Dataset and Uncertainty

We continue using the same dataset throughout the ML module:

Dataplexa ML Housing & Customer Dataset

In this dataset, the target column purchase_decision is not fully determined by the other columns.

Two customers with similar income may still make different decisions. That uncertainty is exactly what probability is designed to model.


Basic Probability Intuition

Probability measures how likely an event is to occur. Its value always lies between 0 and 1.

0 means impossible. 1 means certain.

For example:

A probability of 0.7 means there is a 70% chance of occurrence.
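One simple way to see where these numbers come from: when outcomes can be counted, a probability is the share of outcomes in which the event happens. The sketch below uses hypothetical counts (7 of 10) and a hypothetical helper, probability(), to show that the result always lands between 0 and 1, and that the chance of the event not happening is 1 minus the chance that it does.

```python
def probability(favorable: int, total: int) -> float:
    """Hypothetical helper: share of outcomes in which the event occurs."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= favorable <= total:
        raise ValueError("favorable must be between 0 and total")
    return favorable / total

p_event = probability(7, 10)   # 7 of 10 hypothetical outcomes
p_not_event = 1 - p_event      # complement: the event does not occur

print(p_event, round(p_not_event, 10))  # 0.7 0.3
print(f"{p_event:.0%} chance")          # 70% chance
```

At the extremes, probability(0, 10) is 0 (the event never happened) and probability(10, 10) is 1 (it always happened).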


Probability in Classification Models

In many classification problems, models do not predict a class directly. They first estimate the probability of each class.

For example:

Probability that the customer will purchase = 0.82
Probability that the customer will not purchase = 0.18

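The probabilities of all classes sum to 1, and the class with the higher probability is selected as the prediction. In a two-class problem like this one, that is the same as predicting "purchase" whenever its probability exceeds 0.5.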
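As a rough sketch, the selection step looks like this in plain Python. The probabilities from the example above are hard-coded rather than produced by a real model; libraries such as scikit-learn expose per-class probabilities through a classifier's predict_proba method, but here we simply assume they are already available.

```python
# Class probabilities for one customer, copied from the example above.
probs = {"purchase": 0.82, "no_purchase": 0.18}

# Probabilities over all classes should sum to 1 (allowing for float rounding).
assert abs(sum(probs.values()) - 1.0) < 1e-9

# Rule 1: pick the class with the highest probability.
predicted = max(probs, key=probs.get)

# Rule 2 (binary case): predict "purchase" when its probability exceeds 0.5.
threshold = 0.5
threshold_prediction = "purchase" if probs["purchase"] > threshold else "no_purchase"

print(predicted, threshold_prediction)  # purchase purchase
```

The 0.5 threshold is only the default; moving it up or down trades missed buyers against false alarms.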


Real-World Example

Think of a weather forecast.

If the weather app says there is a 60% chance of rain, it is not saying rain will definitely happen.

It is expressing uncertainty based on patterns in historical data.

ML models work in the same way.


Estimating Probability from Data

We can estimate probabilities by observing frequencies in data.

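import pandas as pd

# Load the lesson dataset (assumes the CSV file is in the working directory).
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# purchase_decision is 0/1, so its mean is the fraction of customers who purchased.
purchase_rate = df["purchase_decision"].mean()
print(purchase_rate)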

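Because purchase_decision is encoded as 1 (purchase) or 0 (no purchase), its mean is the fraction of customers who purchased. That fraction is our estimate of the overall probability of purchase in this dataset.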
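Since the lesson's CSV file may not be at hand, the sketch below builds a small hypothetical DataFrame with the same purchase_decision column (assumed to hold 1 for a purchase and 0 otherwise) and checks that taking the mean is the same as counting purchases and dividing by the number of customers.

```python
import pandas as pd

# Hypothetical stand-in for the dataset: 1 = purchased, 0 = did not purchase.
df = pd.DataFrame({"purchase_decision": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]})

purchases = int(df["purchase_decision"].sum())  # number of 1s
total = len(df)                                 # number of customers

rate_by_counting = purchases / total            # relative frequency
rate_by_mean = df["purchase_decision"].mean()   # mean of a 0/1 column

print(purchases, total, rate_by_counting, rate_by_mean)
```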


Conditional Probability (Intuition)

Conditional probability answers questions like:

What is the probability of purchase given high income?

This concept is essential for models like Naive Bayes.

# "High income" = above the median; the mean below estimates P(purchase | high income).
high_income = df[df["income"] > df["income"].median()]
print(high_income["purchase_decision"].mean())

Comparing this value with the overall purchase_rate shows how the estimated probability changes once we condition on income.
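Conditional probability has a precise definition: P(A | B) = P(A and B) / P(B), the probability of A counted only among the cases where B holds. The sketch below uses a hypothetical eight-row DataFrame (not the real dataset) to check that filtering rows, as in the code above, gives the same number as this formula. It also applies Bayes' rule, P(A | B) = P(B | A) * P(A) / P(B), the identity that Naive Bayes is built on.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset): income in arbitrary units, 1 = purchased.
df = pd.DataFrame({
    "income":            [30, 45, 50, 60, 75, 80, 95, 120],
    "purchase_decision": [0,  0,  1,  0,  1,  1,  1,  0],
})

high = df["income"] > df["income"].median()  # event B: above-median income
bought = df["purchase_decision"] == 1        # event A: purchase

# 1) Filtering: purchase rate among high-income rows only.
p_a_given_b_filter = df.loc[high, "purchase_decision"].mean()

# 2) Definition: P(A | B) = P(A and B) / P(B), each estimated as a frequency.
p_a_and_b = (bought & high).mean()
p_b = high.mean()
p_a_given_b_formula = p_a_and_b / p_b

# 3) Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B).
p_a = bought.mean()                # unconditional purchase rate
p_b_given_a = high[bought].mean()  # share of buyers who have high income
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a, p_a_given_b_filter, p_a_given_b_formula, p_a_given_b_bayes)
```

In this toy data the overall purchase rate is 0.5, while all three routes give 0.75 for high-income customers.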


Why Models Output Probabilities

Probabilities allow flexible decision-making.

For example:

Only target customers with purchase probability above 0.8
Send discounts to users with probability between 0.4 and 0.7

A model that outputs only a yes/no label cannot support such tiered strategies; they rely on having a probability for each customer.
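The two strategies above can be sketched as a small function. The action names are hypothetical, and the thresholds are taken literally from the text, so probabilities between 0.7 and 0.8 (and below 0.4) fall through to "no_action"; a real campaign would decide how to handle those ranges.

```python
def choose_action(p_purchase: float) -> str:
    """Map a predicted purchase probability to a hypothetical marketing action."""
    if not 0.0 <= p_purchase <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_purchase > 0.8:
        return "target"          # strong buyers
    if 0.4 <= p_purchase <= 0.7:
        return "send_discount"   # undecided customers who may respond to an offer
    return "no_action"           # everything else, including the 0.7-0.8 gap

for p in [0.95, 0.8, 0.75, 0.55, 0.4, 0.2]:
    print(p, choose_action(p))
```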


Probability vs Deterministic Rules

Traditional programming uses fixed rules:

If income > X, then approve loan.

ML uses probability:

Based on many factors, the probability of approval is 73%.

This lets ML combine many factors and express how confident it is, instead of forcing every case through a single hard cutoff.
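To make the contrast concrete, here is a toy comparison. The fixed rule uses one hypothetical income cutoff; the probabilistic version is a hand-written logistic model whose bias and weights are invented purely for illustration (a real model would learn them from data). Two applicants with the same income get the same answer from the rule, but different probabilities once debt is taken into account.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rule_based_approval(income: float, cutoff: float = 50_000) -> bool:
    """Fixed rule: approve whenever income exceeds one hypothetical cutoff X."""
    return income > cutoff

def approval_probability(income: float, debt_ratio: float, years_employed: float) -> float:
    """Toy logistic model combining several factors into a probability.

    The bias and weights are hypothetical, chosen only for illustration.
    """
    score = (-3.0
             + 0.00005 * income        # higher income raises the score
             - 4.0 * debt_ratio        # more debt relative to income lowers it
             + 0.3 * years_employed)   # longer employment raises it
    return sigmoid(score)

# Two hypothetical applicants with the same income but different debt levels.
p_low_debt = approval_probability(60_000, debt_ratio=0.1, years_employed=5)
p_high_debt = approval_probability(60_000, debt_ratio=0.6, years_employed=5)

print("rule:", rule_based_approval(60_000))  # identical for both applicants
print(round(p_low_debt, 2), round(p_high_debt, 2))
```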


Mini Practice

Think about our dataset.

Ask yourself:

Is the purchase decision ever 100% certain?
Can the probability of purchase change based on income or house size?
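One way to explore the second question: split customers into income bands and compute the purchase rate inside each band. The DataFrame below is hypothetical; with the real dataset you would reuse the df loaded earlier, and the same idea applies to other numeric columns such as house size.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "income":            [30, 45, 50, 60, 75, 80, 95, 120],
    "purchase_decision": [0,  0,  1,  0,  1,  1,  1,  0],
})

# Split customers into two equal-sized income bands, then estimate
# P(purchase | band) as the purchase rate inside each band.
df["income_band"] = pd.qcut(df["income"], q=2, labels=["lower", "upper"])
rates = df.groupby("income_band", observed=True)["purchase_decision"].mean()
print(rates)
```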


Exercises

Exercise 1:
Why does ML rely on probability instead of certainty?

Because real-world data contains uncertainty and noise, making absolute rules unreliable.

Exercise 2:
What does a probability of 0.9 represent?

It represents a 90% likelihood that an event will occur.

Exercise 3:
Why are probabilities useful for business decisions?

They allow prioritization and risk-based decision-making instead of binary choices.

Quick Quiz

Q1. Do ML models predict certainty?

No. They predict probabilities.

Q2. Is probability used only in classification?

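No. Regression models also rely on probabilistic assumptions. For example, fitting linear regression by least squares is equivalent to maximum likelihood estimation under the assumption of normally distributed errors.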

In the next lesson, we will explore Overfitting and Underfitting, which explains why models sometimes perform well on training data but fail in real-world scenarios.