ML Lesson 9 – Linear Algebra (Basics) | Dataplexa

Linear Algebra (Basics) for Machine Learning

So far, we prepared and understood our data using visualization and statistics. Now we answer a very important question: how does a machine actually learn from this data?

The answer is linear algebra. Machine Learning models do not see rows and columns the way humans do. They see vectors, matrices, and mathematical transformations.


Why Linear Algebra Is Essential in ML

Every dataset is internally converted into numbers arranged in a matrix. Every model learns by performing operations on that matrix.

When a regression model predicts a price or a classifier predicts yes/no, it is performing matrix multiplication behind the scenes.

Understanding the basics removes fear and confusion when models become complex.
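To make this concrete, here is a minimal sketch of how a linear model turns a row of features into a prediction. The numbers and weights below are made up for illustration, not taken from the dataset:

```python
import numpy as np

# One observation: [income, house_size] -- made-up numbers
features = np.array([50_000.0, 120.0])

# Learned weights: how much each feature contributes to the prediction
weights = np.array([0.002, 1.5])

# A linear prediction is just features multiplied by weights and summed
prediction = features @ weights
print(prediction)  # 50_000*0.002 + 120*1.5 = 100 + 180 = 280.0
```

Every linear model, no matter how large the dataset, reduces to this multiply-and-sum pattern.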


Our Dataset as a Matrix

We continue using the same dataset introduced in Lesson 4:

Dataplexa ML Housing & Customer Dataset

Each row represents one observation (one customer or one house). Each column represents one feature.

import pandas as pd
import numpy as np

# Load the dataset introduced in Lesson 4
df = pd.read_csv("dataplexa_ml_housing_customer_dataset.csv")

# Features (everything except the target) and the target column
X = df.drop("purchase_decision", axis=1)
y = df["purchase_decision"]

X.shape, y.shape

Here, X is a matrix of features and y is a vector of targets.


What Is a Vector in ML?

A vector is a list of numbers. In ML, a single data point is represented as a vector.

For example, one customer can be represented as:

# First row of the feature matrix as a NumPy array
sample_vector = X.iloc[0].values
sample_vector

This vector contains information like income, house size, and other features.

The model learns by assigning importance (weights) to each element of this vector.


Understanding Matrices

When we stack many vectors together, we get a matrix. That matrix represents the entire dataset.

Most ML algorithms operate on matrices, not individual values.

# The underlying NumPy matrix behind the DataFrame
matrix_X = X.values
matrix_X[:5]

This matrix is the mathematical form of our dataset.
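Because the dataset is a matrix, a model can score every row in a single operation instead of looping over observations one by one. A toy sketch (the 3×2 matrix and the weights below are invented for illustration):

```python
import numpy as np

# Three observations, two features each (toy numbers)
X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

w = np.array([10.0, 1.0])  # toy weights, one per feature

# One matrix-vector product yields one prediction per row
preds = X_toy @ w
print(preds)  # [12. 34. 56.]
```

This is why "operating on matrices" matters: the computer performs all three predictions at once.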


Dot Product — The Core of Prediction

At the heart of linear models is the dot product.

A prediction is made by multiplying input features with learned weights and summing them.

# Random weights stand in for learned weights (one per feature)
weights = np.random.rand(X.shape[1])

# Prediction = sum of feature values multiplied by their weights
prediction = np.dot(sample_vector, weights)
prediction

This simple operation is what drives linear regression and logistic regression.
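To demystify what `np.dot` computes, the same number can be produced with a plain loop: multiply each feature by its weight and add everything up. A quick check with illustrative arrays (not from the dataset):

```python
import numpy as np

v = np.array([2.0, 3.0, 4.0])
w = np.array([0.5, 1.0, 0.25])

# Library version
dot_np = np.dot(v, w)

# Manual version: sum of element-wise products
dot_manual = sum(vi * wi for vi, wi in zip(v, w))

print(dot_np, dot_manual)  # both 5.0
```

The library version is the same arithmetic, just executed far faster in optimized C code.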


Real-World Meaning

Think of weights as importance scores.

If house size has a higher weight than age, the model considers size more important than age when predicting price.

Linear algebra is how machines combine information logically.
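The "importance score" intuition can be shown with named features. In this made-up example, house size carries a much larger weight than age, so changing size moves the prediction far more than changing age by the same amount:

```python
import numpy as np

# Toy weights: [house_size, age] -- size matters more than age
weights = np.array([3.0, 0.5])
house = np.array([100.0, 20.0])  # size=100, age=20

base = house @ weights  # 300 + 10 = 310

# Increase each feature by 10 units, one at a time
bigger_house = house + np.array([10.0, 0.0])
older_house = house + np.array([0.0, 10.0])

print(bigger_house @ weights - base)  # +30.0: size shift moves the prediction a lot
print(older_house @ weights - base)   # +5.0: age shift moves it much less
```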


Matrix Operations in Training

During training, models repeatedly:

1. Compare predictions with actual values.
2. Measure the error.
3. Adjust the weights using gradients.

All these steps are matrix operations performed efficiently by computers.
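The training steps above can be sketched in a few lines of NumPy gradient descent. Everything here is illustrative: the data is generated from known weights, and the learning rate is an arbitrary choice for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # toy feature matrix
true_w = np.array([2.0, -1.0])  # "true" weights used to generate targets
y = X @ true_w                  # toy targets, no noise

w = np.zeros(2)                 # start with all-zero weights
lr = 0.1                        # learning rate (toy value)

for _ in range(200):
    preds = X @ w               # 1. predict (matrix-vector product)
    error = preds - y           # 2. measure the error
    grad = X.T @ error / len(y) # 3. gradient of the squared error
    w -= lr * grad              # 4. adjust the weights

print(w)  # close to [2.0, -1.0]
```

Note that every step in the loop is a matrix or vector operation, exactly as the text describes.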


Why You Don’t Manually Do the Math

You do not manually compute matrices for large datasets. Libraries like NumPy and scikit-learn do it for you.

But understanding what they do internally helps you debug, optimize, and trust your models.
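In practice, a single library call replaces the hand-written loop. For example, NumPy's least-squares solver recovers the same kind of weights in one line (the data below is again a toy example generated from known weights):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0])  # toy targets generated from known weights

# np.linalg.lstsq solves for the best-fit weights directly
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # approximately [2.0, -1.0]
```

scikit-learn's `LinearRegression` does essentially the same job behind its `fit` method.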


Mini Practice

Look at a single row in the dataset.

Ask yourself:

How many features does this vector have?
How would changing one value affect the prediction?


Exercises

Exercise 1:
Why is a dataset represented as a matrix?

Because models operate on multiple data points and features simultaneously using matrix operations.

Exercise 2:
What does a weight represent in a model?

It represents the importance of a feature in making predictions.

Exercise 3:
Why is dot product important?

It combines feature values with weights to generate predictions.

Quick Quiz

Q1. Do ML models understand columns and rows like humans?

No. They understand numerical vectors and matrices.

Q2. Is linear algebra only used in linear models?

No. All ML and deep learning models rely on linear algebra.

In the next lesson, we move from vectors and matrices into Probability for Machine Learning, which explains uncertainty and decision-making.