DataBase Design Lesson 20 – Second Normal Form(2NF) | Dataplexa
NORMALIZATION · Lesson 20

Second Normal Form (2NF)

Master partial dependencies and transform BookVault tables from 1NF to clean 2NF structure using functional dependency analysis.

What Makes Second Normal Form Special

First Normal Form gets your data atomic. Second Normal Form tackles a deeper problem: partial dependencies. When part of your composite key determines other columns, you're storing redundant data everywhere.

Think about Netflix's movie ratings table. If they stored (user_id, movie_id, rating, user_name, movie_title), what happens? User "Sarah Chen" appears in thousands of rows. Change her name once? Update thousands of records. That's a partial dependency nightmare.

Second Normal Form demands: "If a table has a composite primary key, every non-key column must depend on the entire key, not just part of it." Simple rule. Massive impact on data integrity.

2NF Definition

A table is in Second Normal Form when it meets 1NF requirements AND has no partial dependencies. Every non-key attribute must be functionally dependent on the complete primary key.

Understanding Partial Dependencies

Before fixing BookVault's 2NF violations, you need to spot them. A partial dependency occurs when a non-key column depends on only part of a composite primary key.

Here's BookVault's problematic order_items table from our 1NF lesson. Notice the composite key (order_id, book_id) and what depends on what:

Before — 1NF But Not 2NF

order_id book_id quantity unit_price book_title order_date
1001 201 2 24.99 The Great Gatsby 2024-01-15
1001 205 1 19.99 To Kill a Mockingbird 2024-01-15
1002 201 1 24.99 The Great Gatsby 2024-01-16

See the problem? book_title depends only on book_id, not the complete key (order_id, book_id). Same with order_date — it only depends on order_id.

Result? "The Great Gatsby" appears twice. Order date "2024-01-15" repeats. Every book in every order carries redundant information. Database bloat and update anomalies guaranteed.

Functional Dependency Analysis

Map out exactly what determines what in the problematic table. This reveals where to split:

Full Key Dependencies

quantity ← (order_id, book_id)

unit_price ← (order_id, book_id)

Partial Dependencies

book_title ← book_id only

order_date ← order_id only

The red dependencies violate 2NF. Time to decompose this table into proper 2NF structure.

BookVault 2NF Transformation

Fix partial dependencies by creating separate tables. Each table holds attributes that depend on the same key. No more redundancy, no more update anomalies.

Split the problematic table into three clean 2NF tables:

After — 2NF Structure

order_items (core relationship)

order_id book_id quantity unit_price
1001 201 2 24.99
1001 205 1 19.99

books (book details)

book_id title price
201 The Great Gatsby 24.99
205 To Kill a Mockingbird 19.99

orders (order details)

order_id customer_id order_date
1001 501 2024-01-15
1002 502 2024-01-16

Beautiful. Each table now holds attributes that depend on its complete primary key. Book title "The Great Gatsby" appears once. Order date "2024-01-15" appears once. Zero redundancy, maximum integrity.

Creating the 2NF Schema

Transform BookVault's structure into clean 2NF tables. Each table serves a single purpose with no partial dependencies:

-- BookVault 2NF: Orders table (order-level attributes only)
-- No partial dependencies - all columns depend on complete key
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2) NOT NULL
);
Table 'orders' created successfully.
- Primary key: order_id
- Foreign key opportunity: customer_id → customers(customer_id)
- All non-key columns depend on complete primary key

What just happened?

Created orders table where every column (customer_id, order_date, status, total_amount) depends on the complete primary key (order_id). No partial dependencies possible with single-column key.

-- BookVault 2NF: Books table (book-level attributes only)
-- Book details separate from order relationships
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author_id INT NOT NULL,
    category_id INT NOT NULL,
    price DECIMAL(8,2) NOT NULL,
    stock_qty INT DEFAULT 0,
    published_year INT
);
Table 'books' created successfully.
- Primary key: book_id
- Book attributes stored once, referenced by foreign keys
- No redundancy across order_items relationships
-- BookVault 2NF: Order items (pure relationship table)
-- Composite key with attributes depending on complete key
CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT NOT NULL,
    book_id INT NOT NULL,
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(8,2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    UNIQUE (order_id, book_id)
);
Table 'order_items' created successfully.
- Primary key: order_item_id (surrogate key)
- Composite unique constraint: (order_id, book_id)
- Both quantity and unit_price depend on specific order-book combination

What just happened?

Created order_items with surrogate primary key but logical composite key (order_id, book_id). Quantity and unit_price depend on the complete logical key — no partial dependencies. Try querying: SELECT * FROM order_items JOIN books USING (book_id);

Testing 2NF Compliance

Verify your 2NF design by testing queries and updates. If you can modify book information once and see changes everywhere, you've achieved 2NF success.

-- Test 2NF: Insert sample data into normalized tables
-- Book information stored once, referenced by foreign keys
INSERT INTO orders VALUES 
(1001, 501, '2024-01-15', 'completed', 44.98),
(1002, 502, '2024-01-16', 'pending', 24.99);

INSERT INTO books VALUES 
(201, 'The Great Gatsby', 301, 101, 24.99, 50, 1925),
(205, 'To Kill a Mockingbird', 302, 101, 19.99, 30, 1960);

INSERT INTO order_items VALUES 
(1, 1001, 201, 2, 24.99),
(2, 1001, 205, 1, 19.99),
(3, 1002, 201, 1, 24.99);
3 rows inserted into orders.
2 rows inserted into books.
3 rows inserted into order_items.

Data successfully normalized - each fact stored once.
-- Test 2NF: Update book price in one place
-- Should affect all order calculations through references
UPDATE books 
SET price = 22.99 
WHERE book_id = 201;

-- Verify the change appears in joined results
SELECT o.order_id, b.title, oi.quantity, b.price as current_price, oi.unit_price as order_price
FROM order_items oi
JOIN books b ON oi.book_id = b.book_id  
JOIN orders o ON oi.order_id = o.order_id
WHERE b.book_id = 201;
1 row updated in books table.

order_id | title           | quantity | current_price | order_price
---------|-----------------|----------|---------------|------------
1001     | The Great Gatsby| 2        | 22.99         | 24.99
1002     | The Great Gatsby| 1        | 22.99         | 24.99

Book price updated once, visible across all references.

What just happened?

Updated book price once in books table, affecting all future references. Historical order_price preserved (business rule). Perfect 2NF behavior — no redundancy, consistent updates. Try this: Change a book title and see it reflect everywhere.

Common 2NF Violations and Fixes

Spot these typical 2NF problems in real applications. Each creates maintenance nightmares and data inconsistencies:

Student Grades Problem

Table: (student_id, course_id, grade, student_name, course_title)
Issue: student_name depends only on student_id

2NF Solution

Split: grades(student_id, course_id, grade)
+ students(student_id, name) + courses(course_id, title)

Employee Projects Problem

Table: (emp_id, project_id, hours, emp_dept, project_budget)
Issue: emp_dept depends only on emp_id

2NF Solution

Split: assignments(emp_id, project_id, hours)
+ employees(emp_id, dept) + projects(project_id, budget)

Common Mistake

Adding computed columns like "order_total" to order_items table. This creates partial dependency — order_total depends on order_id only, not the complete (order_id, book_id) key. Store totals in orders table instead.

The Surrogate Key Advantage

Notice how BookVault's order_items uses order_item_id as surrogate primary key? Smart move. Single-column primary keys eliminate partial dependency concerns entirely. You can't have partial dependencies when the key has only one part.

But keep logical constraints. The UNIQUE (order_id, book_id) constraint maintains business logic while the surrogate key simplifies relationships.

Data Insight

Spotify's playlists table likely uses surrogate keys to avoid 2NF issues. Instead of composite (user_id, playlist_name) primary key creating partial dependencies, they use playlist_id as surrogate. Result: 450+ million playlists without redundancy.

Your BookVault database now meets Second Normal Form requirements. No partial dependencies. No redundant data. Each update touches exactly one row. But there's still work ahead — transitive dependencies can lurk in 2NF tables, waiting to cause problems.

Quiz

1. BookVault wants to track hotel reservations with table structure (customer_id, room_id, nights, customer_name, room_type, room_rate). The composite key is (customer_id, room_id). What 2NF violation exists and how do you fix it?


2. In BookVault's order_items table with composite key (order_id, book_id), which attributes can remain in the table without violating 2NF?


3. Why do BookVault's books and orders tables automatically satisfy 2NF requirements even without analysis?


Up Next

Third Normal Form (3NF)

Eliminate transitive dependencies where non-key columns depend on other non-key columns, completing your normalization foundation.