DataBase Design Lesson 18 – What is Normalization | Dataplexa
Database Design · Lesson 18

What is Normalization

Transform messy, redundant data into clean, organized tables that eliminate anomalies and waste

The Problem with Bad Data Structure

Picture this: BookVault starts simple. One giant table holds everything. Customer names, book titles, author details, order information — all crammed together. Sounds efficient, right?

Wrong. This approach creates data anomalies — expensive mistakes that corrupt your database. Amazon learned this the hard way in their early days. Netflix too. Every major tech company has horror stories about denormalized data causing millions in losses.

Before — Unnormalized BookVault Data

Customer Email Book Title Author Name Author Nationality Price
Sarah Chen sarah@email.com Clean Code Robert Martin American $45.00
Sarah Chen sarah@email.com Design Patterns Gang of Four Various $52.00
Mike Johnson mike@email.com Clean Code Robert Martin American $45.00

See the problems? Sarah Chen's email appears twice. Robert Martin's nationality is duplicated. What happens when Sarah changes her email address? You must update multiple rows. Miss one? Your data becomes inconsistent.

Update Anomaly Example

Robert Martin moves to Canada. You update his nationality in row 1 but forget row 3. Now your database says he's both American AND Canadian. Which is correct? Nobody knows.

What is Normalization?

Normalization is the process of organizing data to eliminate redundancy and anomalies. Think of it as Marie Kondo for databases — every piece of data has one specific place where it belongs.

The process follows mathematical rules called normal forms. Each form builds on the previous one. Most databases need to reach Third Normal Form (3NF) to avoid major problems.

1

Identify Problems

Find redundant data and anomalies

2

Apply Normal Forms

Follow 1NF → 2NF → 3NF rules

3

Create Separate Tables

Split data into focused, single-purpose tables

4

Connect with Keys

Use foreign keys to maintain relationships

Functional Dependencies Drive the Process

Before normalizing, you must understand functional dependencies. This sounds complex but it's simple: if knowing value A lets you determine value B, then B depends on A.

In BookVault: knowing a customer_id determines the customer's email and city. Knowing a book_id determines the book's title and price.

Good Dependencies

customer_id → email
book_id → title
order_id → order_date

Problematic Dependencies

email → customer_id
title → book_id
Mixed dependencies in one table

The Three Types of Data Anomalies

Unnormalized databases suffer from three specific problems. Understanding these helps you recognize when normalization is needed.

Update Anomalies

When the same fact appears in multiple places, updating becomes dangerous. Change an author's nationality in one row but miss another? Your database now contains conflicting information. No way to know which version is correct.

-- This update only fixes ONE row - others remain incorrect
-- Problem: author nationality stored in multiple places
UPDATE unnormalized_orders 
SET author_nationality = 'Canadian'
WHERE author_name = 'Robert Martin' 
  AND customer_email = 'sarah@email.com';
1 row affected

-- But Robert Martin still shows as 'American' in other rows!
-- Database now has conflicting data

What just happened?

The UPDATE statement only modified one row containing Robert Martin's data. Other rows with the same author still show the old nationality. Your database now contains contradictory information about the same person.

Insert Anomalies

Want to add a new author who hasn't written any books yet? Impossible in our unnormalized table. The table requires book information for every row. You can't store author details without fake book data.

-- Trying to add a new author without books
-- This fails because book_title cannot be NULL
INSERT INTO unnormalized_orders (
  author_name, 
  author_nationality,
  book_title,
  customer_email
) VALUES (
  'Jane Developer',
  'Canadian', 
  NULL,  -- This causes an error!
  NULL
);
ERROR: Column 'book_title' cannot be null
ERROR: Column 'customer_email' cannot be null

-- Can't add author without complete order information

What just happened?

The database rejected our INSERT because the table structure forces us to provide book and customer data whenever we add an author. This prevents storing authors who haven't published books yet — a serious business limitation.

Delete Anomalies

The most dangerous problem. Delete a customer's last order and you accidentally erase the author's information forever. The author data wasn't meant to be deleted — it just lived in the wrong place.

-- Customer Sarah cancels her last order
-- This deletes ALL information about Robert Martin!
DELETE FROM unnormalized_orders 
WHERE customer_email = 'sarah@email.com' 
  AND book_title = 'Clean Code';
1 row deleted

-- Robert Martin's nationality data is now LOST forever
-- unless it exists in other rows

Critical Data Loss

Deleting Sarah's order accidentally removed Robert Martin's author details. If this was the only row containing his nationality, that information is permanently lost. This is why major e-commerce sites spend millions on data recovery systems.

The Normalized Solution

Normalization fixes these problems by separating concerns. Each table holds one type of entity. Dependencies become clear. Anomalies disappear.

After — Normalized BookVault Schema

CUSTOMERS
Attributes
PK customer_id
first_name
last_name
email
city
AUTHORS
Attributes
PK author_id
first_name
last_name
nationality
BOOKS
Attributes
PK book_id
FK author_id
title
price

Now each fact appears exactly once. Update Robert Martin's nationality? Change one row in the authors table. Every book by him automatically reflects the new data. One source of truth.

-- Now we can update author nationality in ONE place
-- All books by this author automatically show the new data
UPDATE authors 
SET nationality = 'Canadian'
WHERE first_name = 'Robert' 
  AND last_name = 'Martin';
1 row affected

-- Perfect! Now ALL of Robert Martin's books 
-- automatically show 'Canadian' nationality

Data Insight

Spotify's music database contains over 70 million songs. Without proper normalization, updating a single artist's name would require changing millions of rows. With normalization, it's just one update in the artists table.

When NOT to Normalize

Normalization isn't always the answer. Some scenarios benefit from denormalization — intentionally keeping redundant data for performance.

Normalize When

• Heavy write operations
• Data consistency critical
• Storage space expensive
• OLTP systems

Consider Denormalizing

• Read-heavy reporting
• Complex join queries slow
• Data warehouse systems
• Analytics dashboards

Netflix keeps movie ratings denormalized in their recommendation engine. The same rating appears in multiple tables. Why? Because calculating personalized recommendations for 200 million users requires blazing-fast reads. They accept redundancy for speed.

Start Normalized, Denormalize Later

Always begin with a fully normalized design. Add strategic denormalization only when performance metrics prove it's needed. Premature denormalization causes more problems than it solves.

The Normal Forms Hierarchy

Normalization follows a progression. Each normal form builds on the previous one. Most business applications need Third Normal Form (3NF) for reliable operation.

1

First Normal Form

Atomic values only
No repeating groups

2

Second Normal Form

Remove partial
dependencies

3

Third Normal Form

Remove transitive
dependencies

BC

Boyce-Codd NF

Stricter version
of 3NF

Each step removes specific types of dependencies and anomalies. The next lessons dive deep into each normal form with BookVault examples.

Think of normalization as progressive decluttering. 1NF removes the obvious mess. 2NF eliminates subtle redundancies. 3NF creates a clean, maintainable structure that prevents most data problems.

Quiz

1. BookVault stores author nationality in every order row. When Robert Martin moves from USA to Canada, you must update his nationality in 15 different rows. What type of anomaly is this?


2. What is the main goal of database normalization?


3. When might BookVault consider keeping some denormalized data (intentional redundancy)?


Up Next

First Normal Form (1NF)

Transform BookVault's messy data into atomic values and eliminate repeating groups using the first normalization rule.