ML Lesson 16 – Linear Regression

Linear Regression

Congratulations — this is an important milestone.

Up to Lesson 15, we focused on foundations: data, math, probability, evaluation, and model behavior.

Now we officially begin the Machine Learning algorithms themselves, starting with the most fundamental one: Linear Regression.


What Is Linear Regression?

Linear Regression is a supervised machine learning algorithm used to predict a continuous value.

It learns a relationship between input features and a numerical output.

In simple words:

It fits a straight line that best represents the data.


Real-World Intuition

Think about predicting house prices.

As house size increases, price usually increases as well.

Linear regression tries to capture this relationship mathematically.

Even when the relationship is not perfect, it finds the line that fits the data best, meaning the one that minimizes the overall prediction error.


Mathematical Idea (Without Fear)

Linear regression follows this idea:

Prediction = (weight × feature) + bias

With multiple features, the idea becomes:

Prediction = w₁x₁ + w₂x₂ + w₃x₃ + … + b

You do NOT need to compute this manually. Libraries do it for you.
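
Still, a tiny worked example helps the formula sink in. The weights and feature values below are made up purely for illustration:

# Hypothetical weights and features (illustrative values only)
w = [120.0, 35.0]   # w1: price per square metre, w2: price per room (made up)
b = 10_000.0        # bias term
x = [85.0, 3.0]     # x1: size in square metres, x2: number of rooms

prediction = w[0] * x[0] + w[1] * x[1] + b
print(prediction)   # 120*85 + 35*3 + 10000 = 20305.0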


Using Our Dataset

We continue using the same dataset introduced earlier:

Dataplexa ML Housing & Customer Dataset

In this lesson, we assume a target such as house_price or a similar numerical column.

(If your dataset uses a different numeric target, the logic remains exactly the same.)


Preparing Data for Linear Regression

We separate features and target, then split data into training and testing sets.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# Features: every column except the target
# (LinearRegression expects numeric features, so encode any categorical columns first)
X = df.drop("house_price", axis=1)
y = df["house_price"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Training the Linear Regression Model

Now we train the model using training data.

model = LinearRegression()
model.fit(X_train, y_train)

At this stage, the model has learned weights and bias internally.
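
If you are curious, scikit-learn exposes these learned values directly:

model.intercept_   # the learned bias term b
model.coef_        # one learned weight per feature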


Making Predictions

Once trained, the model can predict values for unseen data.

y_pred = model.predict(X_test)
y_pred[:5]

Each value represents a predicted house price.
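
A quick way to sanity-check the model is to line predictions up against the actual values, shown here for the first five test rows:

comparison = pd.DataFrame({
    "actual": y_test.iloc[:5].values,
    "predicted": y_pred[:5],
})
comparison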


Understanding Coefficients

Each feature has a coefficient (weight).

A larger coefficient magnitude means that feature has more influence on the prediction (assuming the features are on comparable scales).

coefficients = pd.Series(model.coef_, index=X.columns)
coefficients

This helps us interpret the model.
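
To rank features by the size of their influence, you can sort the coefficients by absolute value:

coefficients.abs().sort_values(ascending=False)   # largest influence first (sign ignored)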


Evaluating Linear Regression

For regression problems, we do not use accuracy.

Common metrics include:

- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R² Score

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mae, mse, r2

Interpreting the Results

Lower MAE means predictions are closer to actual values.

R² tells us how much variance in the target is explained by the model.

An R² close to 1 means the model explains most of the variance; an R² near 0 means it explains almost none.
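
To see what MAE measures, you can reproduce it by hand from the individual errors:

import numpy as np

errors = np.abs(y_test - y_pred)   # absolute error for each test row
errors.mean()                      # matches mean_absolute_error(y_test, y_pred)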


Strengths of Linear Regression

- Simple and fast
- Easy to interpret
- Works well for linear relationships


Limitations

- Cannot capture complex nonlinear patterns
- Sensitive to outliers (see the sketch below)
- Requires its underlying assumptions (such as a roughly linear relationship) to hold
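
Here is a tiny synthetic illustration of the outlier problem (all numbers made up). A single extreme value pulls the fitted slope from 1 all the way up to 10:

import numpy as np
from sklearn.linear_model import LinearRegression

X_small = np.array([[1], [2], [3], [4], [5]])
y_clean = np.array([1, 2, 3, 4, 5])
y_outlier = np.array([1, 2, 3, 4, 50])   # one extreme value

slope_clean = LinearRegression().fit(X_small, y_clean).coef_[0]
slope_outlier = LinearRegression().fit(X_small, y_outlier).coef_[0]
slope_clean, slope_outlier   # 1.0 vs 10.0: the outlier drags the line upward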


Mini Practice

Think about our dataset.

Ask yourself:

Which feature should influence price the most?

Would adding irrelevant features reduce performance?


Exercises

Exercise 1:
What type of problem does linear regression solve?

Linear regression solves regression problems with continuous numerical outputs.

Exercise 2:
What does a coefficient represent?

It represents the influence of a feature on the predicted value.

Exercise 3:
Why don’t we use accuracy for regression?

Because regression predicts continuous values, not class labels.

Quick Quiz

Q1. Can linear regression handle nonlinear patterns?

No. It assumes a linear relationship.

Q2. Is linear regression interpretable?

Yes. Its coefficients show how each feature affects the prediction.

In the next lesson, we will extend this idea to Logistic Regression, which is used for classification problems.