Normalization
As databases grow, poor design can lead to data duplication, inconsistencies, and maintenance problems.
Normalization is the process of organizing data to reduce redundancy and improve data integrity.
What is Normalization?
Normalization is a set of rules (called normal forms) used to design efficient and reliable databases.
The main goals of normalization are:
- Eliminate duplicate data
- Ensure data consistency
- Improve data integrity
- Make databases easier to maintain
Why Normalization Matters
Without normalization, databases suffer from:
- Update anomalies
- Insert anomalies
- Delete anomalies
Normalization prevents these issues.
Unnormalized Example
Consider an unnormalized table:
+----+----------+-----------------------+ | id | name | courses | +----+----------+-----------------------+ | 1 | Alice | SQL, Python, Excel | | 2 | Bob | SQL, Power BI | +----+----------+-----------------------+
Problems:
- Multiple values in one column
- Difficult to search and update
- Not scalable
First Normal Form (1NF)
A table is in First Normal Form (1NF) if:
- Each column contains atomic (indivisible) values
- No repeating groups or multi-valued attributes
Converted to 1NF:
+----+----------+--------+ | id | name | course | +----+----------+--------+ | 1 | Alice | SQL | | 1 | Alice | Python | | 1 | Alice | Excel | | 2 | Bob | SQL | | 2 | Bob | Power BI | +----+----------+--------+
Second Normal Form (2NF)
A table is in Second Normal Form (2NF) if:
- It is already in 1NF
- All non-key columns depend on the full primary key
Problem in the 1NF table:
- Student name depends only on id
- Course depends on id + course relationship
Split into separate tables:
Students +----+----------+ | id | name | +----+----------+ Enrollments +----+--------+ | id | course | +----+--------+
Third Normal Form (3NF)
A table is in Third Normal Form (3NF) if:
- It is already in 2NF
- No transitive dependencies exist
Example of transitive dependency:
+----+----------+-----------+-----------+ | id | name | dept_id | dept_name | +----+----------+-----------+-----------+
Here:
- dept_name depends on dept_id
- Not directly on student id
Fix by separating:
Students +----+----------+---------+ Departments +---------+-----------+
Normalization Summary
| Normal Form | Rule |
|---|---|
| 1NF | Atomic values, no repeating groups |
| 2NF | Full dependency on primary key |
| 3NF | No transitive dependencies |
Normalization vs Performance
Highly normalized databases:
- Reduce redundancy
- Increase consistency
But may require:
- More JOINs
- Careful indexing
In practice, a balance is maintained.
Common Beginner Mistakes
- Over-normalizing every table
- Ignoring query performance
- Not defining primary keys
- Mixing unrelated data
When Should You Normalize?
Normalize when:
- Designing a new database
- Handling relational data
- Data integrity is critical
What’s Next?
In the next lesson, we will explore Constraints, which enforce rules on data in SQL tables.