Principal Component Analysis (PCA)
In the previous lesson, you learned how Factor Analysis identifies hidden underlying factors.
Principal Component Analysis (PCA) also reduces many variables to fewer components, but with a different objective.
PCA focuses on data compression and variance preservation, not on uncovering latent constructs.
Why PCA Is Used
Modern datasets often contain many correlated variables.
This creates problems such as:
- Redundancy in information
- Difficulty in visualization
- Multicollinearity in regression
PCA transforms the original variables into new variables called principal components. Keeping only the first few components gives a smaller set that retains most of the variance in the original data.
Core Idea Behind PCA
Each principal component is:
- A linear combination of original variables
- Uncorrelated with other components
- Ordered by the amount of variance explained
The first component explains the maximum possible variance; the second explains the maximum remaining variance while staying uncorrelated with the first; and so on.
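For the special case of two standardized variables with correlation r, the principal components have a closed form: PC1 = (z1 + z2)/√2 with variance 1 + r, and PC2 = (z1 - z2)/√2 with variance 1 - r. The minimal sketch below checks the three properties listed above on made-up scores (`math_scores` and `science_scores` are hypothetical illustration data). It uses this two-variable shortcut rather than a general eigen-solver and assumes r > 0.

```python
from statistics import mean, stdev

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def standardize(x):
    """Convert raw scores to z-scores (mean 0, sample SD 1)."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

# Hypothetical scores for six students (illustration only).
math_scores = [55, 62, 70, 74, 81, 90]
science_scores = [58, 60, 72, 70, 85, 88]

z1 = standardize(math_scores)
z2 = standardize(science_scores)
r = sample_cov(z1, z2)  # covariance of z-scores equals the correlation

# Two-variable closed form (assumes r > 0; otherwise the two components swap roles).
# Each component is a linear combination of the original standardized variables.
w = 1 / 2 ** 0.5
pc1 = [w * a + w * b for a, b in zip(z1, z2)]
pc2 = [w * a - w * b for a, b in zip(z1, z2)]

print(f"r = {r:.3f}")
print(f"var(PC1) = {sample_cov(pc1, pc1):.3f}  (1 + r = {1 + r:.3f})")
print(f"var(PC2) = {sample_cov(pc2, pc2):.3f}  (1 - r = {1 - r:.3f})")
print(f"cov(PC1, PC2) = {sample_cov(pc1, pc2):.3f}")
```

The covariance between the two components comes out as zero (up to rounding), and PC1 carries more variance than PC2, matching the ordering described above.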
PCA vs Factor Analysis
| Aspect | PCA | Factor Analysis |
|---|---|---|
| Main goal | Variance preservation | Latent structure identification |
| Focus | Data reduction | Underlying factors |
| Error modeling | Analyzes total variance; does not separate out error | Separates common & unique variance |
In practice, PCA is often used as a preprocessing step, while factor analysis is used for theory building.
Eigenvalues and Variance Explained
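In PCA, each component has an eigenvalue, which equals the amount of variance that component explains. When PCA is run on the correlation matrix (the SPSS default), every standardized variable has a variance of 1, so the eigenvalues sum to the number of variables, and a component's percentage of variance explained is its eigenvalue divided by the number of variables, times 100.
A common rule of thumb (the Kaiser criterion):
- Eigenvalue > 1 → retain the component
The rationale is that a component with an eigenvalue below 1 explains less variance than a single standardized original variable.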
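The sketch below turns a list of eigenvalues into the percentages SPSS reports and counts how many components the eigenvalue-greater-than-1 rule keeps. The eigenvalues are an assumption for illustration: four variables whose pairwise correlations all equal 0.6 have eigenvalues 1 + 3(0.6) = 2.8 and 1 - 0.6 = 0.4 (the latter three times).

```python
def variance_table(eigenvalues):
    """Return (component, eigenvalue, % of variance, cumulative %) rows.

    Percentages are relative to the sum of the eigenvalues, which equals
    the number of variables when PCA is run on a correlation matrix.
    """
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        pct = 100 * ev / total
        cumulative += pct
        rows.append((k, ev, pct, cumulative))
    return rows

def kaiser_retain(eigenvalues, cutoff=1.0):
    """Count components whose eigenvalue exceeds the cutoff (Kaiser rule)."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

# Hypothetical: 4 variables, all pairwise correlations equal to 0.6.
eigs = [2.8, 0.4, 0.4, 0.4]
for k, ev, pct, cum in variance_table(eigs):
    print(f"Component {k}: eigenvalue {ev:.2f}, {pct:.1f}% of variance, {cum:.1f}% cumulative")
print("Components retained (eigenvalue > 1):", kaiser_retain(eigs))
```

Here the first component alone accounts for 70% of the variance, and the rule keeps one component.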
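SPSS can also produce a scree plot, which graphs each component's eigenvalue against its component number. Components before the point where the curve levels off (the "elbow") are usually retained.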
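As a rough, text-only stand-in for the SPSS chart, the sketch below prints one bar per eigenvalue and flags those above 1. The eigenvalues are hypothetical, and the bar scale (`width` characters per unit of eigenvalue) is an arbitrary choice.

```python
def text_scree(eigenvalues, width=10):
    """Return one text line per component, with a bar proportional to its eigenvalue."""
    lines = []
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        bar = "#" * round(ev * width)
        flag = "  (> 1)" if ev > 1 else ""
        lines.append(f"PC{k} {ev:5.2f} |{bar}{flag}")
    return lines

# Hypothetical eigenvalues: the curve drops steeply, then flattens after PC2.
for line in text_scree([2.1, 1.3, 0.4, 0.2]):
    print(line)
```

In this made-up case the bars shrink sharply from PC1 to PC3 and then level off, so both the elbow and the eigenvalue rule point to two components.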
Example Scenario
Suppose we collect data on:
- Math score
- Science score
- English score
- Logic score
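These scores tend to be correlated: students who do well in one subject often do well in others. PCA can reduce them to one or two components; if all four scores load strongly on the first component, it can be interpreted as overall academic performance.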
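Behind the menus, PCA on these four scores amounts to finding the eigenvalues and eigenvectors of their 4 × 4 correlation matrix. The dependency-free sketch below uses the classical Jacobi rotation method for symmetric matrices, a textbook eigen-solver that is not necessarily what SPSS uses internally. The correlation matrix `R` is hypothetical, invented for illustration; with real data it would be computed from the scores.

```python
import math

def matmul(x, y):
    """Plain-Python matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def jacobi_eigen(a, tol=1e-18, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, vectors) sorted by descending eigenvalue;
    vectors[k] is the unit eigenvector paired with eigenvalues[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that makes the rotated a[p][q] zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                j = [[float(i == k) for k in range(n)] for i in range(n)]
                j[p][p] = j[q][q] = c
                j[p][q], j[q][p] = s, -s
                a = matmul(matmul(transpose(j), a), j)  # similarity transform keeps eigenvalues
                v = matmul(v, j)                        # columns of v converge to eigenvectors
    pairs = sorted(((a[k][k], [v[i][k] for i in range(n)]) for k in range(n)),
                   key=lambda t: t[0], reverse=True)
    return [ev for ev, _ in pairs], [vec for _, vec in pairs]

# Hypothetical correlation matrix for Math, Science, English, Logic.
names = ["Math", "Science", "English", "Logic"]
R = [
    [1.00, 0.70, 0.40, 0.65],
    [0.70, 1.00, 0.35, 0.60],
    [0.40, 0.35, 1.00, 0.30],
    [0.65, 0.60, 0.30, 1.00],
]

eigenvalues, vectors = jacobi_eigen(R)
print("Eigenvalues:", [round(ev, 3) for ev in eigenvalues])
print("Sum of eigenvalues:", round(sum(eigenvalues), 6))  # equals 4, the number of variables
first = vectors[0]
if sum(first) < 0:  # an eigenvector's sign is arbitrary; flip for readability
    first = [-x for x in first]
print("PC1 weights:", dict(zip(names, (round(x, 3) for x in first))))
```

The eigenvalues always sum to the number of variables (the trace of a correlation matrix), which is a quick sanity check on any PCA output.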
Running PCA in SPSS (Menu)
To perform PCA in SPSS:
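- Go to Analyze → Dimension Reduction → Factor
- Move the variables into the Variables box
- Click Extraction and set Method to Principal components
- Under Extract, choose Based on Eigenvalue with Eigenvalues greater than 1
- Under Display, check Scree plot, then click Continue
- Click OK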
SPSS Syntax for PCA
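* Principal components on the correlation matrix, eigenvalues > 1, with a scree plot.
FACTOR
  /VARIABLES Math Science English Logic
  /MISSING LISTWISE
  /ANALYSIS Math Science English Logic
  /PRINT INITIAL EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE.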
Interpreting PCA Output
When interpreting PCA results:
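- Look at the eigenvalues (Total Variance Explained table)
- Check the percentage of variance explained by each component and cumulatively
- Review the component loadings (Component Matrix)
Loadings with larger absolute values indicate a stronger contribution of a variable to a component. In an unrotated PCA of the correlation matrix, each loading is the correlation between the variable and the component.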
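In an unrotated PCA of a correlation matrix, a loading is the eigenvector weight multiplied by the square root of the component's eigenvalue, and squaring and summing the loadings on a component recovers its eigenvalue. The sketch below checks this for a hypothetical case of four variables whose pairwise correlations all equal 0.6, where the first eigenvalue is 1 + 3(0.6) = 2.8 and its unit eigenvector has every weight equal to 0.5.

```python
import math

def loadings(eigenvalue, eigenvector):
    """Unrotated PCA loadings: eigenvector weights scaled by sqrt(eigenvalue)."""
    return [w * math.sqrt(eigenvalue) for w in eigenvector]

# Hypothetical: 4 variables with every pairwise correlation equal to 0.6.
# Closed form: first eigenvalue = 1 + 3 * 0.6 = 2.8, unit eigenvector = [0.5] * 4.
names = ["Math", "Science", "English", "Logic"]
lam1 = 2.8
w1 = [0.5, 0.5, 0.5, 0.5]

l1 = loadings(lam1, w1)
for name, value in zip(names, l1):
    print(f"{name:8s} loading on PC1: {value:.3f}")

# The sum of squared loadings on a component equals its eigenvalue.
print("Sum of squared loadings:", round(sum(x * x for x in l1), 6))
```

Each variable loads about 0.837 on PC1 here, a strong loading, so all four variables contribute equally to that component.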
Common Mistakes
Typical mistakes include:
- Confusing PCA with Factor Analysis
- Keeping too many components
- Ignoring the scree plot and relying on the eigenvalue rule alone
PCA is a mathematical technique, not a theory-driven model.
Quiz 1
What is the main goal of PCA?
To preserve maximum variance with fewer components.
Quiz 2
What does an eigenvalue represent?
Amount of variance explained by a component.
Quiz 3
Are PCA components correlated?
No.
Quiz 4
Which SPSS menu is used for PCA?
Analyze → Dimension Reduction → Factor.
Quiz 5
Does PCA identify latent constructs?
No.
Mini Practice
Use a dataset with multiple correlated variables.
Apply PCA and:
- Decide the number of components to retain
- Report the variance explained by each retained component and in total
Use the eigenvalues and the scree plot to justify your choice of components.
What’s Next
In the next lesson, you will learn about Reliability and Validity, which ensure measurement quality in research and analytics.