Logistic Regression
Welcome to your first classification algorithm.
In Lesson 16, we used Linear Regression to predict continuous values such as house prices.
Now we move to a different type of problem: classification.
What Is Logistic Regression?
Logistic Regression is a supervised machine learning algorithm used to predict categories, not numbers.
Typical outputs are:
Yes / No
True / False
0 / 1
Even though its name contains the word “regression”, it is actually a classification algorithm.
Real-World Example
Think about loan approval.
Based on customer income, age, credit score, and spending behavior, a bank wants to decide:
Approve loan (1) or Reject loan (0)
Logistic regression is perfect for this kind of decision.
How Logistic Regression Thinks
Instead of predicting a value directly, logistic regression predicts a probability.
Example:
Probability of loan approval = 0.82
If probability ≥ 0.5 → class = 1
If probability < 0.5 → class = 0
This probability comes from a function called the Sigmoid function.
The Sigmoid Function (Intuition)
The sigmoid function converts any number into a value between 0 and 1.
Large positive numbers → close to 1
Large negative numbers → close to 0
This makes it ideal for classification.
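The intuition above can be sketched in a few lines of NumPy (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Large positive inputs approach 1, large negative inputs approach 0,
# and 0 maps exactly to 0.5 — the default decision threshold
print(sigmoid(np.array([-6.0, 0.0, 6.0])))
```

Note that the 0.5 cutoff corresponds to an input of exactly 0: positive inputs land in class 1, negative inputs in class 0.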
Using Our Dataset
We continue using the same dataset:
Dataplexa ML Housing & Customer Dataset
In this lesson, we assume a binary target column such as:
loan_approved (0 = No, 1 = Yes)
If your dataset uses a different binary column, the steps remain identical.
Preparing Data
We separate features and target, then split data into training and testing sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# Features: every column except the binary target
X = df.drop("loan_approved", axis=1)
y = df["loan_approved"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
Training the Logistic Regression Model
Now we train the classification model.
# max_iter is raised so the solver has enough iterations to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
The model learns how each feature affects the probability of approval.
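Once fitted, those learned effects can be inspected through the model's coefficients. A self-contained sketch on synthetic data (the two features here are stand-ins, not columns from the Dataplexa dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the loan data: 200 rows, 2 features, binary target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# A positive coefficient pushes the probability toward class 1,
# a negative one toward class 0
print(model.coef_, model.intercept_)
```

The sign and magnitude of each coefficient tell you the direction and strength of that feature's influence on the approval probability.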
Making Predictions
We can predict class labels directly:
y_pred = model.predict(X_test)
y_pred[:10]
Or predict probabilities:
y_prob = model.predict_proba(X_test)
y_prob[:5]
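`predict_proba` returns one column per class; column 1 holds P(class = 1). A sketch of turning those probabilities into labels with a custom cutoff, on synthetic data (the 0.3 threshold is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary dataset standing in for the loan data
X, y = make_classification(n_samples=200, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba_class1 = model.predict_proba(X)[:, 1]       # P(y = 1) for each row

# Default behaviour: threshold at 0.5, same as model.predict(X)
default_pred = (proba_class1 >= 0.5).astype(int)

# A lower cutoff flags class 1 more eagerly (useful when misses are costly)
custom_pred = (proba_class1 >= 0.3).astype(int)
```

Being able to move the threshold is one practical reason to work with probabilities rather than hard labels.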
Evaluating Logistic Regression
For classification, we use different metrics than regression.
Common metrics include:
Accuracy
Precision
Recall
F1-score
Confusion Matrix
from sklearn.metrics import accuracy_score, classification_report
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
accuracy
print(report)
Understanding the Output
Accuracy tells how many predictions were correct.
Precision tells how reliable positive predictions are.
Recall tells how many actual positives were found.
F1-score balances precision and recall.
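These definitions can be computed by hand from a confusion matrix. A sketch with made-up labels and predictions, checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up labels and predictions, just to walk through the arithmetic
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat  = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # how reliable positive predictions are
recall = tp / (tp + fn)      # how many actual positives were found
f1 = 2 * precision * recall / (precision + recall)

# scikit-learn's metric functions agree with the hand computation
print(precision, precision_score(y_true, y_hat))
print(recall, recall_score(y_true, y_hat))
print(f1, f1_score(y_true, y_hat))
```

Working through one small example like this makes the classification_report output much easier to read.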
Where Logistic Regression Works Best
Binary classification problems
Clear linear boundaries
Probability-based decisions
Where It Struggles
Complex nonlinear relationships
Highly imbalanced datasets
Multiple interacting features
Mini Practice
Think about real systems:
Spam detection
Disease diagnosis
Customer churn prediction
All of these can start with logistic regression.
Exercises
Exercise 1:
What type of output does logistic regression predict?
Exercise 2:
Why do we use the sigmoid function?
Exercise 3:
Is logistic regression a regression or classification algorithm?
Quick Quiz
Q1. Can logistic regression handle multi-class problems?
Q2. Does logistic regression output probabilities?
In the next lesson, we move to a powerful tree-based model: Decision Trees.