AI Lesson 35 – Dimensionality Reduction (PCA, LDA) | Dataplexa

Dimensionality Reduction (PCA & LDA)

Dimensionality Reduction is the process of reducing the number of input features while preserving as much important information as possible. It is widely used to simplify models, improve performance, and make data easier to visualize.

As datasets grow, so does the number of features. Beyond a certain point, extra features tend to hurt rather than help: models train more slowly, pick up noise, and become harder to interpret. Dimensionality Reduction addresses this problem.

Why Do We Need Dimensionality Reduction?

High-dimensional data creates several challenges:

  • Slower model training
  • Higher memory usage
  • Overfitting due to noise
  • Difficult visualization and interpretation

Dimensionality Reduction helps by keeping only the most informative parts of the data.

Real-World Example

Imagine judging a student based on 100 test scores. Many tests may measure similar skills. Instead of evaluating all 100 scores, you summarize them into key abilities like math, logic, and communication.

This summarization is exactly what dimensionality reduction does to data.

Types of Dimensionality Reduction

There are two main approaches:

  • Feature Selection: Choosing the most important existing features
  • Feature Extraction: Creating new features that summarize existing ones

PCA and LDA are popular feature extraction techniques.
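
To make the two approaches concrete, here is a minimal sketch on the Iris dataset (assuming scikit-learn is installed). SelectKBest stands in for feature selection, keeping two of the original columns, while PCA stands in for feature extraction, creating two new columns that summarize all four.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Load data
X, y = load_iris(return_X_y=True)

# Feature selection: keep the 2 most informative original features
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new features that summarize all 4 originals
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)

(150, 2) (150, 2)

Both results have two columns, but the selected features are a subset of the originals, while the extracted features are new combinations of them.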

Principal Component Analysis (PCA)

PCA is an unsupervised technique that transforms data into a new coordinate system where:

  • The first component captures the most variance
  • Each subsequent component captures as much of the remaining variance as possible
  • Components are uncorrelated with one another

PCA focuses on preserving as much variance (information) as possible; it does not use class labels.

PCA Example


from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Apply PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
  
(150, 2)

The original dataset had 4 features. PCA reduced it to 2 while preserving most of the variance.
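
How much variance is actually preserved? The fitted PCA object reports this through its explained_variance_ratio_ attribute. Continuing from the example above:

# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)

# Total fraction of variance retained by the 2 components
print(pca.explained_variance_ratio_.sum())

For the Iris data, the first two components typically retain well over 90% of the total variance, so very little information is lost.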

Understanding PCA Output

PCA does not care about class labels. It only tries to capture directions where data varies the most.

This makes PCA excellent for visualization and noise reduction.
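
To illustrate the visualization use case, the two components computed above can be plotted directly. This sketch assumes matplotlib is available; the class labels are used only to color the points, not to compute the projection.

import matplotlib.pyplot as plt

# Scatter plot of the 2D projection, colored by class for readability
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("Iris projected onto 2 principal components")
plt.show()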

Linear Discriminant Analysis (LDA)

LDA is a supervised dimensionality reduction technique. Unlike PCA, it uses class labels to maximize separation between classes.

LDA focuses on finding feature combinations that best distinguish different categories.

LDA Example


from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# Apply LDA
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)
  
(150, 2)

LDA reduces features while maximizing class separation, making it useful for classification tasks.
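
In practice, LDA is usually fitted on the training data only, and the reduced features are then passed to a classifier. The following is a quick sketch of that workflow; LogisticRegression is just an illustrative choice and any classifier could replace it.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split first so the test data never influences the LDA fit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit LDA on the training set, then apply the same projection to the test set
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Train and evaluate a classifier on the 2 LDA features
clf = LogisticRegression(max_iter=200)
clf.fit(X_train_lda, y_train)
print(clf.score(X_test_lda, y_test))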

PCA vs LDA

  • PCA: Unsupervised; maximizes variance; typically used for visualization and noise reduction
  • LDA: Supervised; maximizes class separation; typically used to improve classification (see the comparison sketch after this list)
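
The practical difference can be seen by training the same classifier on each type of reduced features. The sketch below wraps each reduction in a pipeline so it is refit on every cross-validation fold; the exact scores will vary, and the point is only to show how the comparison is set up.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Classifier on 2 PCA components (labels ignored during reduction)
pca_pipe = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=200))
print(cross_val_score(pca_pipe, X, y, cv=5).mean())

# Classifier on 2 LDA components (labels used during reduction)
lda_pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=2), LogisticRegression(max_iter=200))
print(cross_val_score(lda_pipe, X, y, cv=5).mean())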

When Should You Use Dimensionality Reduction?

  • Before clustering or classification (see the pipeline sketch after this list)
  • When features are highly correlated
  • When training time is high
  • When visualizing high-dimensional data
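
As an example of the first point above, a typical workflow is to standardize the features, reduce them with PCA, and only then run a clustering algorithm. This is a sketch that assumes k-means with 3 clusters as the downstream task; the individual steps can be swapped freely.

from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize -> reduce to 2 dimensions -> cluster
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)

labels = pipeline.fit_predict(X)
print(labels[:10])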

Practice Questions

Practice 1: What is the main goal of dimensionality reduction?



Practice 2: PCA belongs to which learning type?



Practice 3: What does LDA try to maximize?



Quick Quiz

Quiz 1: Which technique ignores class labels?





Quiz 2: Which technique is supervised?





Quiz 3: What does dimensionality reduction primarily help reduce?





Coming up next: Feature Engineering — transforming raw data into meaningful features for AI models.