Linear Regression
Congratulations — this is an important milestone.
Up to Lesson 15, we focused on foundations: data, math, probability, evaluation, and model behavior.
Now we officially start Machine Learning algorithms with the most fundamental one: Linear Regression.
What Is Linear Regression?
Linear Regression is a supervised machine learning algorithm used to predict a continuous value.
It learns a relationship between input features and a numerical output.
In simple words:
It fits a straight line that best represents the data.
Real-World Intuition
Think about predicting house prices.
As house size increases, price usually increases as well.
Linear regression tries to capture this relationship mathematically.
Even when the relationship is not perfect, it finds the best possible line.
Mathematical Idea (Without Fear)
Linear regression follows this idea:
Prediction = (weight × feature) + bias
With multiple features, the idea becomes:
Prediction = w₁x₁ + w₂x₂ + w₃x₃ + … + b
You do NOT need to compute this manually. Libraries do it for you.
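To make the formula concrete, here is a tiny sketch with made-up numbers (the weight, bias, and feature value are purely illustrative):
# Hypothetical values, only to illustrate Prediction = (weight × feature) + bias
weight = 150      # made-up price increase per square meter
bias = 50_000     # made-up base price
feature = 120     # made-up house size in square meters
prediction = weight * feature + bias
print(prediction)  # 68000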
Using Our Dataset
We continue using the same dataset introduced earlier:
Dataplexa ML Housing & Customer Dataset
In this lesson, we assume a target such as house_price or a similar numerical column.
(If your dataset uses a different numeric target, the logic remains exactly the same.)
Preparing Data for Linear Regression
We separate features and target, then split data into training and testing sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load the dataset (this assumes the remaining columns are numeric features)
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")
# Separate the features (X) from the target (y)
X = df.drop("house_price", axis=1)
y = df["house_price"]
# Hold out 20% of the rows for testing; random_state keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
Training the Linear Regression Model
Now we train the model using training data.
# Create the model and learn the weights and bias from the training data
model = LinearRegression()
model.fit(X_train, y_train)
At this stage, the model has learned weights and bias internally.
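If you are curious, scikit-learn exposes what was learned: the bias lives in model.intercept_ and the weights in model.coef_. A quick peek, assuming the training code above has run:
# The learned bias (intercept) and one learned weight per feature
print(model.intercept_)
print(model.coef_)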
Making Predictions
Once trained, the model can predict values for unseen data.
# Predict prices for the held-out test rows
y_pred = model.predict(X_test)
# Inspect the first five predictions
y_pred[:5]
Each value represents a predicted house price.
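A quick sanity check is to place the predictions next to the actual prices. Here is a small sketch, assuming y_test and y_pred from the code above:
# Compare the first five actual prices with the model's predictions
comparison = pd.DataFrame({"actual": y_test[:5].values, "predicted": y_pred[:5]})
print(comparison)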
Understanding Coefficients
Each feature has a coefficient (weight).
A coefficient with a larger absolute value means that feature has more influence on the prediction, provided the features are on comparable scales.
# Pair each learned weight with the name of its feature
coefficients = pd.Series(model.coef_, index=X.columns)
coefficients
This helps us interpret the model.
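Because coefficients can be positive or negative, one common trick is to rank them by absolute value to see which features carry the most weight. A sketch (keeping in mind that feature scale affects magnitude):
# Rank features by the absolute size of their weights, keeping the sign
coefficients.reindex(coefficients.abs().sort_values(ascending=False).index)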
Evaluating Linear Regression
For regression problems, we do not use accuracy, because continuous predictions almost never match the true values exactly.
Common metrics include:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R² Score
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Average absolute difference between predicted and actual prices
mae = mean_absolute_error(y_test, y_pred)
# Average squared difference; penalizes large errors more heavily
mse = mean_squared_error(y_test, y_pred)
# Fraction of the target's variance the model explains
r2 = r2_score(y_test, y_pred)
mae, mse, r2
Interpreting the Results
Lower MAE means predictions are closer to actual values.
R² tells us how much variance in the target is explained by the model.
An R² close to 1 means the model explains most of the variance in the target; an R² near 0 means it explains very little.
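To demystify R², we can compute it by hand from its definition: 1 minus the ratio of squared prediction errors to the target's total variance. A minimal sketch, assuming y_test and y_pred from above:
import numpy as np
# R² = 1 − (sum of squared residuals / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(r2_manual)  # should match r2_score(y_test, y_pred)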
Strengths of Linear Regression
Simple and fast
Easy to interpret
Works well for linear relationships
Limitations
Cannot capture complex nonlinear patterns
Sensitive to outliers (see the sketch below)
Requires its assumptions, such as a roughly linear relationship, to hold
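To see the outlier sensitivity concretely, here is a small synthetic sketch (made-up data, not from our dataset): adding one extreme point noticeably shifts the fitted slope, because squared error amplifies large deviations.
import numpy as np
from sklearn.linear_model import LinearRegression
# Clean data lying exactly on y = 2x
X_demo = np.arange(10).reshape(-1, 1)
y_clean = 2 * X_demo.ravel()
slope_clean = LinearRegression().fit(X_demo, y_clean).coef_[0]
# Same data with one extreme outlier appended at x = 10
X_out = np.vstack([X_demo, [[10]]])
y_out = np.append(y_clean, 100)   # the trend would predict 20 here
slope_out = LinearRegression().fit(X_out, y_out).coef_[0]
print(slope_clean, slope_out)  # the slope moves well away from 2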
Mini Practice
Think about our dataset.
Ask yourself:
Which feature should influence price the most?
Would adding irrelevant features reduce performance?
Exercises
Exercise 1:
What type of problem does linear regression solve?
Exercise 2:
What does a coefficient represent?
Exercise 3:
Why don’t we use accuracy for regression?
Quick Quiz
Q1. Can linear regression handle nonlinear patterns?
Q2. Is linear regression interpretable?
In the next lesson, we will extend this idea to Logistic Regression, which is used for classification problems.