DataBase Design Lesson 19 – First Normal Form(1NF) | Dataplexa

Normalization · Lesson 19

First Normal Form (1NF)

Transform messy, repeating data into atomic table structures that follow 1NF rules

The Problem with Repeating Data

Imagine opening Excel and dumping all BookVault customer orders into one giant spreadsheet. Each row contains everything about the order. But what happens when a customer orders multiple books? You repeat their information over and over.

This creates a nightmare. Customer email changes? You have to update fifty rows. Miss one row? Your data becomes inconsistent. Database systems can't handle this chaos efficiently.

Here's what BookVault's order data looked like before normalization:

Before — Unnormalized

customer_name	email	books_ordered
John Smith	john@email.com	Database Design, SQL Basics
Sarah Davis	sarah@email.com	Python Programming, Data Science, Machine Learning

After — 1NF Compliant

customer_name	email	book_title
John Smith	john@email.com	Database Design
John Smith	john@email.com	SQL Basics
Sarah Davis	sarah@email.com	Python Programming

Notice the difference? The "Before" table stuffs multiple books into one cell. The "After" table breaks each book into its own row. This is the foundation of First Normal Form.

What is First Normal Form (1NF)?

First Normal Form (1NF) is the most basic rule in database normalization. It eliminates repeating groups and ensures every cell contains only atomic (indivisible) values.

Think of 1NF as the "one value per cell" rule. If you can break down what's in a cell into smaller pieces, it violates 1NF. A cell containing "Database Design, SQL Basics" breaks this rule because it actually contains two separate book titles.

The Four Rules of 1NF

1. Atomic Values Only

Each cell contains one indivisible piece of data. No comma-separated lists or multiple values.

2. No Repeating Groups

Eliminate columns like book1, book2, book3. Create separate rows instead.

3. Primary Key Required

Every table must have a unique identifier to distinguish each row.

4. Column Order Irrelevant

The arrangement of columns shouldn't affect the table's meaning or functionality.

What just happened?

We broke down the essential requirements for 1NF. Rule #1 is the most violated - developers often stuff JSON strings or comma-separated values into database columns, thinking it saves space. It doesn't. It creates maintenance nightmares.

Common 1NF Violations

Database designers violate 1NF in predictable ways. Here are the patterns that cause problems in BookVault:

Violation #1: Multiple Values in One Cell

-- WRONG: Storing multiple authors in one column
-- This violates 1NF because author_names contains multiple values
CREATE TABLE books_wrong (
    book_id INT PRIMARY KEY,
    title VARCHAR(200),
    author_names VARCHAR(500),  -- "Stephen King, Peter Straub"
    price DECIMAL(10,2)
);

Table 'books_wrong' created successfully.

Sample data problems:
book_id | title           | author_names              | price
1       | The Talisman    | Stephen King, Peter Straub| 15.99
2       | It              | Stephen King              | 12.99

Issues: How do you search for books by "Stephen King"? 
How do you count books per author? Complex string parsing required.

What just happened?

We created a table that violates 1NF by storing multiple author names in a single column. Searching becomes a nightmare - you can't use simple WHERE clauses to find books by a specific author. Try this: create a proper authors table with author_id as the foreign key instead.

Violation #2: Repeating Column Groups

-- WRONG: Creating repeating columns for multiple books per order
-- This violates 1NF because of repeating groups (book1, book2, etc.)
CREATE TABLE orders_wrong (
    order_id INT PRIMARY KEY,
    customer_id INT,
    book1_id INT,
    book1_quantity INT,
    book2_id INT,
    book2_quantity INT,
    book3_id INT,
    book3_quantity INT
    -- What if customer orders 4 books? Or 10?
);

Table 'orders_wrong' created successfully.

Problems with this approach:
- Wastes space when orders have fewer than 3 books
- Can't handle orders with more than 3 books
- Complex queries to find all books in an order
- Difficult to add/remove items dynamically

Common Mistake

Developers create book1, book2, book3 columns thinking it's simpler than a separate order_items table. This approach fails when customers order more books than you planned for. Create a proper order_items table that can handle unlimited books per order.

Converting to First Normal Form

Converting unnormalized data to 1NF follows a systematic process. We'll transform BookVault's messy order data into properly structured tables.

Step 1: Identify Atomic Values

Start with the unnormalized data and break down composite values:

-- Original unnormalized BookVault order data
-- Multiple violations: composite values, repeating groups
SELECT 
    'John Smith' as customer_name,
    'john@email.com, john.smith@work.com' as email_addresses,
    'Database Design, SQL Basics, Advanced Queries' as books_ordered;

customer_name | email_addresses                    | books_ordered
John Smith    | john@email.com, john.smith@work.com| Database Design, SQL Basics, Advanced Queries

Problems identified:
- email_addresses contains 2 emails (should be separate rows)
- books_ordered contains 3 book titles (should be separate rows)
- customer_name might need first_name/last_name split

Step 2: Create Proper 1NF Tables

Now we'll create the normalized structure that follows 1NF rules:

-- CORRECT: 1NF compliant tables for BookVault
-- Each table has atomic values and a primary key

-- Customers table - one row per customer
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Customer emails - separate table for multiple emails per customer
CREATE TABLE customer_emails (
    email_id INT PRIMARY KEY,
    customer_id INT,
    email VARCHAR(100) NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Tables created successfully.

customers:
- customer_id (PK) - unique identifier
- first_name, last_name - atomic name components  
- created_at - timestamp value

customer_emails:
- email_id (PK) - unique identifier
- customer_id (FK) - links to customers table
- email - single atomic email address
- is_primary - boolean flag (atomic value)

-- Books and order structure - 1NF compliant
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    author_id INT,
    price DECIMAL(10,2) NOT NULL,
    stock_qty INT DEFAULT 0
);

-- Orders table - one row per order
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Order items - handles multiple books per order
CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT,
    book_id INT,
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id)
);

Tables created successfully.

1NF Compliance Check:
✓ All values are atomic (single, indivisible)
✓ No repeating groups (no book1, book2 columns)
✓ Each table has a primary key
✓ Column order doesn't affect functionality

order_items table eliminates the repeating groups problem:
- Can handle unlimited books per order
- Each book gets its own row
- Quantity and price are atomic values per item

Step 3: Insert Normalized Data

-- Insert data following 1NF rules
-- Notice how each previously composite value gets its own row

-- Customer data
INSERT INTO customers (customer_id, first_name, last_name) 
VALUES (1, 'John', 'Smith');

-- Multiple emails for same customer - separate rows
INSERT INTO customer_emails (email_id, customer_id, email, is_primary) 
VALUES 
    (1, 1, 'john@email.com', TRUE),
    (2, 1, 'john.smith@work.com', FALSE);

-- Books data
INSERT INTO books (book_id, title, author_id, price, stock_qty) 
VALUES 
    (1, 'Database Design', 101, 29.99, 15),
    (2, 'SQL Basics', 102, 24.99, 8),
    (3, 'Advanced Queries', 101, 34.99, 12);

-- Order and items
INSERT INTO orders (order_id, customer_id, order_date, status)
VALUES (1, 1, '2024-01-15', 'completed');

-- Each book order gets its own row in order_items
INSERT INTO order_items (order_item_id, order_id, book_id, quantity, unit_price)
VALUES 
    (1, 1, 1, 1, 29.99),  -- Database Design
    (2, 1, 2, 1, 24.99),  -- SQL Basics  
    (3, 1, 3, 1, 34.99);  -- Advanced Queries

Data inserted successfully.

Result: John Smith's order is now properly normalized:
- Customer info: 1 row in customers table
- Email addresses: 2 rows in customer_emails table  
- Order info: 1 row in orders table
- Books ordered: 3 rows in order_items table

Total: 7 rows instead of 1 messy unnormalized row
Benefits: Easy to query, update, and maintain

Data Insight

BookVault processes over 50,000 orders monthly. Before 1NF normalization, updating a customer's email required scanning every order row. After normalization, one UPDATE statement in the customer_emails table fixes it instantly across all orders.

Benefits and Trade-offs

Converting to 1NF brings immediate benefits but also introduces complexity. Understanding both sides helps you make informed design decisions.

Benefits of 1NF

Data Consistency

Update a customer's email once in the customers table - it automatically applies to all their orders. No more hunting through thousands of rows looking for duplicates.

Query Simplicity

Find all books by an author using simple WHERE clauses. No more complex string parsing or regular expressions to extract data from composite fields.

Trade-offs to Consider

More Complex Queries

Getting complete order information now requires JOINs across multiple tables. A simple SELECT becomes a 4-table JOIN. Plan your queries carefully and use proper indexing.

-- Before 1NF: Simple query (but messy data)
-- SELECT * FROM orders_unnormalized WHERE customer_name = 'John Smith'

-- After 1NF: More complex query (but clean data)
SELECT 
    c.first_name,
    c.last_name,
    o.order_date,
    b.title,
    oi.quantity,
    oi.unit_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id  
JOIN books b ON oi.book_id = b.book_id
WHERE c.first_name = 'John' AND c.last_name = 'Smith';

first_name | last_name | order_date | title            | quantity | unit_price
John       | Smith     | 2024-01-15 | Database Design  | 1        | 29.99
John       | Smith     | 2024-01-15 | SQL Basics       | 1        | 24.99
John       | Smith     | 2024-01-15 | Advanced Queries | 1        | 34.99

The 4-table JOIN retrieves complete order details.
Performance tip: Index the foreign key columns for faster JOINs.

When 1NF Isn't Enough

1NF solves repeating data problems but doesn't eliminate all redundancy. Notice that John's first_name and last_name still repeat in every order result row. This duplication creates update anomalies.

Second Normal Form (2NF) addresses these partial dependency issues. But 1NF is the foundation - you can't achieve higher normal forms without first satisfying 1NF requirements.

Pro Tip: Modern NoSQL databases like MongoDB store documents with nested arrays, seemingly violating 1NF. However, within the document paradigm, each array element is treated as atomic. The 1NF principles still apply within the chosen data model.

Quiz

Up Next

Second Normal Form (2NF)

Eliminate partial dependencies by ensuring every non-key attribute depends on the complete primary key.

← Previous Course Index Next →