ML Lesson 17 – Logistic Regression | Dataplexa

Logistic Regression

Welcome to your first classification algorithm.

In Lesson 16, we used Linear Regression to predict continuous values such as house prices.

Now we move to a different type of problem: classification.


What Is Logistic Regression?

Logistic Regression is a supervised machine learning algorithm used to predict categories, not numbers.

Typical outputs are:

Yes / No
True / False
0 / 1

Even though its name contains the word “regression”, it is actually a classification algorithm.


Real-World Example

Think about loan approval.

Based on customer income, age, credit score, and spending behavior, a bank wants to decide:

Approve loan (1) or Reject loan (0)

Logistic regression is perfect for this kind of decision.


How Logistic Regression Thinks

Instead of predicting a value directly, logistic regression predicts a probability.

Example:

Probability of loan approval = 0.82

If probability ≥ 0.5 → class = 1
If probability < 0.5 → class = 0

This probability comes from a function called the Sigmoid function.


The Sigmoid Function (Intuition)

The sigmoid function converts any number into a value between 0 and 1.

Large positive numbers → close to 1
Large negative numbers → close to 0

This makes it ideal for classification.


Using Our Dataset

We continue using the same dataset:

Dataplexa ML Housing & Customer Dataset

In this lesson, we assume a binary target column such as:

loan_approved (0 = No, 1 = Yes)

If your dataset uses a different binary column, the steps remain identical.


Preparing Data

We separate features and target, then split data into training and testing sets.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Training the Logistic Regression Model

Now we train the classification model.

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

The model learns how each feature affects the probability of approval.
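To see what "learns how each feature affects the probability" means concretely, you can inspect the fitted weights. The lesson's CSV is not reproduced here, so this sketch uses synthetic stand-in data from `make_classification`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the lesson's dataset: 200 rows, 4 features.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# One weight per feature: a positive weight pushes the probability toward
# class 1, a negative weight pushes it toward class 0.
print(model.coef_)       # shape (1, 4) for a binary problem
print(model.intercept_)
```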


Making Predictions

We can predict class labels directly:

y_pred = model.predict(X_test)
y_pred[:10]

Or predict probabilities:

y_prob = model.predict_proba(X_test)
y_prob[:5]
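`predict_proba` returns one column per class: column 0 holds P(class 0) and column 1 holds P(class 1). Applying your own threshold to column 1 reproduces what `predict` does internally. The sketch below uses synthetic stand-in data, since the lesson's CSV is not available here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)      # column 0: P(class 0), column 1: P(class 1)
default = model.predict(X)          # applies the 0.5 threshold internally
custom = (proba[:, 1] >= 0.5).astype(int)

# At a 0.5 threshold the two approaches agree.
print(np.array_equal(default, custom))
```

Raising the threshold (say to 0.7) makes the model more conservative about predicting class 1, which is useful when false positives are costly.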

Evaluating Logistic Regression

For classification, we use different metrics than regression.

Common metrics include:

Accuracy
Precision
Recall
F1-score
Confusion Matrix

from sklearn.metrics import accuracy_score, classification_report

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(accuracy)
print(report)

Understanding the Output

Accuracy measures the fraction of all predictions that were correct.

Precision measures how reliable the positive predictions are: of everything predicted as class 1, how much actually was class 1.

Recall measures how many of the actual positives were found.

F1-score is the harmonic mean of precision and recall, balancing the two.
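These four metrics can all be checked by hand on a tiny made-up example:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Toy labels: 4 actual positives, 4 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# Rows = actual class, columns = predicted class.
print(confusion_matrix(y_true, y_pred))   # [[2 2]
                                          #  [1 3]]
print(accuracy_score(y_true, y_pred))     # (2 + 3) / 8 = 0.625
print(precision_score(y_true, y_pred))    # 3 / (3 + 2) = 0.6
print(recall_score(y_true, y_pred))       # 3 / (3 + 1) = 0.75
print(f1_score(y_true, y_pred))           # 2 * 0.6 * 0.75 / 1.35 ≈ 0.667
```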


Where Logistic Regression Works Best

Binary classification problems
Clear linear boundaries
Probability-based decisions


Where It Struggles

Complex nonlinear relationships
Highly imbalanced datasets
Multiple interacting features


Mini Practice

Think about real systems:

Spam detection
Disease diagnosis
Customer churn prediction

All of these can start with logistic regression.


Exercises

Exercise 1:
What type of output does logistic regression predict?

It predicts class labels based on probabilities (usually 0 or 1).

Exercise 2:
Why do we use the sigmoid function?

Because it converts any value into a probability between 0 and 1.

Exercise 3:
Is logistic regression a regression or classification algorithm?

It is a classification algorithm, despite its name.

Quick Quiz

Q1. Can logistic regression handle multi-class problems?

Yes, using techniques like one-vs-rest, which trains one binary model per class.
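As a hedged sketch of one-vs-rest, scikit-learn's `OneVsRestClassifier` can wrap the same `LogisticRegression` class used in this lesson (synthetic three-class data stands in for a real dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic three-class toy problem.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)

# One binary logistic model per class; the class with the highest score wins.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(ovr.predict(X[:5]))
print(len(ovr.estimators_))  # 3 underlying binary models, one per class
```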

Q2. Does logistic regression output probabilities?

Yes. Probabilities are converted into class labels using a threshold.

In the next lesson, we move to a powerful tree-based model: Decision Trees.