DataBase Design Lesson 26 – Referential Integrity | Dataplexa
Normalization · Lesson 26

Referential Integrity

Master foreign key constraints and data consistency rules that prevent orphan records and maintain reliable relationships between tables.

What Is Referential Integrity

Referential integrity is a database rule that ensures relationships between tables remain valid. Think of it as a safety net that prevents you from creating orphan records — data that references something that doesn't exist.

In BookVault, what happens if someone deletes an author while their books still exist? Without referential integrity, you'd have books pointing to an author ID that no longer exists. That's a broken relationship — and broken data means broken applications.

Broken Data Example
BookVault has 500 books by author_id = 42. Someone deletes author 42. Now 500 books reference a ghost author. Customer searches crash. Reports fail. The website breaks.

Referential integrity prevents this nightmare. It's enforced through FOREIGN KEY constraints that automatically check every insert, update, and delete operation.

Without Referential Integrity

Orphan records, broken relationships, application crashes, data corruption, manual cleanup required

With Referential Integrity

Automatic validation, consistent data, prevented deletions, controlled cascades, reliable applications

Foreign Key Constraints

A foreign key constraint is the mechanism that enforces referential integrity. It creates a link between two tables and ensures the foreign key value always matches a primary key in the referenced table.

Here's how to add foreign key constraints to BookVault tables. Notice how each constraint has a specific name — this makes debugging much easier when constraints are violated:

-- Create authors table first (referenced table)
CREATE TABLE authors (
    author_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    nationality VARCHAR(50)
);

-- Create categories table with self-reference
CREATE TABLE categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_category_id INT,
    CONSTRAINT fk_category_parent 
        FOREIGN KEY (parent_category_id) 
        REFERENCES categories(category_id)
);

-- Create books table with foreign key constraints
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    author_id INT NOT NULL,
    category_id INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    stock_qty INT DEFAULT 0,
    published_year INT,
    CONSTRAINT fk_book_author 
        FOREIGN KEY (author_id) 
        REFERENCES authors(author_id),
    CONSTRAINT fk_book_category 
        FOREIGN KEY (category_id) 
        REFERENCES categories(category_id)
);
Query OK, 0 rows affected (0.12 sec)
Query OK, 0 rows affected (0.08 sec)  
Query OK, 0 rows affected (0.15 sec)

Tables created with foreign key constraints:
- books.author_id → authors.author_id
- books.category_id → categories.category_id  
- categories.parent_category_id → categories.category_id

What just happened?

We created named foreign key constraints. Each CONSTRAINT fk_book_author creates a rule that validates every insert and update. Try this: attempt to insert a book with author_id = 999 when that author doesn't exist — the database will reject it.

Foreign key constraints work in both directions. You can't insert a book with a non-existent author_id. And you can't delete an author that has books — unless you specify what should happen to those books.

Constraint Violation Scenarios

Understanding when foreign key constraints get violated helps you design better applications. There are four main scenarios where violations occur:

1
2
3
4
INSERT Violation
UPDATE Violation
DELETE Violation
Prevention Strategy

Let's see these violations in action with BookVault data:

-- First, insert valid reference data
INSERT INTO authors (author_id, first_name, last_name, nationality)
VALUES (1, 'Stephen', 'King', 'American'),
       (2, 'J.K.', 'Rowling', 'British');

INSERT INTO categories (category_id, name, parent_category_id)
VALUES (1, 'Fiction', NULL),
       (2, 'Horror', 1),
       (3, 'Fantasy', 1);

-- VIOLATION 1: INSERT with non-existent foreign key
INSERT INTO books (book_id, title, author_id, category_id, price)
VALUES (1, 'The Shining', 999, 2, 15.99);
Query OK, 2 rows affected (0.05 sec)
Query OK, 3 rows affected (0.04 sec)

ERROR 1452 (23000): Cannot add or update a child row: 
a foreign key constraint fails (`bookvault`.`books`, 
CONSTRAINT `fk_book_author` FOREIGN KEY (`author_id`) 
REFERENCES `authors` (`author_id`))

Perfect! The database rejected our attempt to insert a book with author_id = 999 because that author doesn't exist. The error message even tells us exactly which constraint was violated.

-- INSERT valid book first
INSERT INTO books (book_id, title, author_id, category_id, price)
VALUES (1, 'The Shining', 1, 2, 15.99);

-- VIOLATION 2: UPDATE to non-existent foreign key
UPDATE books 
SET author_id = 888 
WHERE book_id = 1;

-- VIOLATION 3: DELETE referenced parent record
DELETE FROM authors WHERE author_id = 1;
Query OK, 1 row affected (0.03 sec)

ERROR 1452 (23000): Cannot add or update a child row: 
a foreign key constraint fails (fk_book_author)

ERROR 1451 (23000): Cannot delete or update a parent row: 
a foreign key constraint fails (fk_book_author)

What just happened?

Two different constraint violations occurred. The UPDATE failed because author 888 doesn't exist (child row violation). The DELETE failed because author 1 has books (parent row violation). Both operations were automatically rejected to maintain data integrity.

Cascade Operations

Sometimes you want to allow deletions and updates to cascade through relationships. CASCADE options tell the database what to do when a referenced record changes or gets deleted.

There are several cascade options, each with different behavior patterns. Choose carefully — some operations are irreversible and can delete massive amounts of data:

ON DELETE CASCADE

Automatically deletes child records when parent is deleted. Dangerous but sometimes necessary.

ON DELETE SET NULL

Sets foreign key to NULL when parent is deleted. Column must allow NULL values.

ON UPDATE CASCADE

Updates foreign key when referenced primary key changes. Usually safe operation.

ON DELETE RESTRICT

Prevents deletion of referenced records. Default behavior for safety.

Let's rebuild some BookVault tables with cascade options. Notice how different relationships need different cascade behaviors:

-- Drop and recreate tables with cascade options
DROP TABLE IF EXISTS order_items, orders, books;

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT NOT NULL,
    book_id INT NOT NULL,
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    CONSTRAINT fk_orderitem_order 
        FOREIGN KEY (order_id) 
        REFERENCES orders(order_id) 
        ON DELETE CASCADE,  -- Delete order items when order is deleted
    CONSTRAINT fk_orderitem_book 
        FOREIGN KEY (book_id) 
        REFERENCES books(book_id) 
        ON DELETE RESTRICT  -- Prevent book deletion if in orders
        ON UPDATE CASCADE   -- Update book_id references automatically
);
Query OK, 0 rows affected (0.08 sec)
Query OK, 0 rows affected (0.12 sec)

Foreign key constraints created with cascade options:
- order_items.order_id: CASCADE on delete
- order_items.book_id: RESTRICT on delete, CASCADE on update

Now let's test the cascade behavior. When we delete an order, all its items should automatically be deleted too:

-- Insert test data
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (100, 1, '2024-01-15', 47.98);

INSERT INTO books (book_id, title, author_id, category_id, price)
VALUES (10, 'It', 1, 2, 18.99),
       (11, 'Pet Sematary', 1, 2, 16.99);

INSERT INTO order_items (order_item_id, order_id, book_id, quantity, unit_price)
VALUES (1, 100, 10, 1, 18.99),
       (2, 100, 11, 1, 16.99);

-- Check current data
SELECT 'Order Items' as table_name, COUNT(*) as count FROM order_items
UNION ALL
SELECT 'Orders', COUNT(*) FROM orders;

-- Delete the order - should cascade to order_items
DELETE FROM orders WHERE order_id = 100;

-- Check data after cascade delete
SELECT 'Order Items After Delete' as table_name, COUNT(*) as count FROM order_items
UNION ALL
SELECT 'Orders After Delete', COUNT(*) FROM orders;
Query OK, 1 row affected (0.02 sec)
Query OK, 2 rows affected (0.03 sec)
Query OK, 2 rows affected (0.02 sec)

table_name          count
Order Items            2
Orders                 1

Query OK, 1 row affected (0.04 sec)

table_name               count
Order Items After Delete     0
Orders After Delete          0

CASCADE worked perfectly! Deleting order 100 automatically deleted its 2 order items. This maintains data consistency — no orphan order items exist. But be careful: CASCADE deletes can remove large amounts of data in complex relationship chains.

Common Pitfalls and Solutions

Referential integrity catches many database problems, but it can also create new challenges. Here are the most common issues developers face when working with foreign key constraints:

Circular Dependencies

Sometimes tables need to reference each other, creating a chicken-and-egg problem. You can't insert into either table because each requires the other to exist first.

Problem: Authors and Books Circle

What if books table needs author_id, but authors table needs featured_book_id? Neither record can be inserted first because each references the other.

Solution: Make one of the foreign keys nullable and use a two-step insert process:

-- Create tables with one nullable foreign key
CREATE TABLE authors_extended (
    author_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    featured_book_id INT NULL  -- Nullable to break circular dependency
);

CREATE TABLE books_extended (
    book_id INT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    author_id INT NOT NULL,
    CONSTRAINT fk_book_author_ext 
        FOREIGN KEY (author_id) 
        REFERENCES authors_extended(author_id)
);

-- Add foreign key constraint after table creation
ALTER TABLE authors_extended 
ADD CONSTRAINT fk_author_featured_book 
    FOREIGN KEY (featured_book_id) 
    REFERENCES books_extended(book_id);

-- Two-step insert process
INSERT INTO authors_extended (author_id, first_name, last_name)
VALUES (1, 'Stephen', 'King');

INSERT INTO books_extended (book_id, title, author_id)
VALUES (100, 'The Stand', 1);

UPDATE authors_extended 
SET featured_book_id = 100 
WHERE author_id = 1;
Query OK, 0 rows affected (0.08 sec)
Query OK, 0 rows affected (0.06 sec)
Query OK, 0 rows affected (0.12 sec)
Query OK, 1 row affected (0.03 sec)
Query OK, 1 row affected (0.02 sec)
Query OK, 1 row affected (0.04 sec)

Circular dependency resolved:
- Author 1 created first (featured_book_id = NULL)  
- Book 100 created (references author 1)
- Author 1 updated (featured_book_id = 100)

Import Order Problems

When importing data or running migration scripts, you must insert parent records before child records. This becomes complex with many tables and relationships.

Common Mistake

Running INSERT INTO books before INSERT INTO authors will fail. Always import in dependency order: authors → categories → books → orders → order_items.

Performance Impact

Every foreign key constraint adds a lookup operation on insert/update. With millions of records, this can slow down bulk operations significantly.

Solution: Temporarily disable constraint checking during large imports, then re-enable:

-- MySQL syntax for disabling foreign key checks
SET FOREIGN_KEY_CHECKS = 0;

-- Run your bulk imports here
INSERT INTO books (book_id, title, author_id, category_id, price)
VALUES (1000, 'Bulk Import Book', 999, 888, 19.99);  -- Even with invalid FKs

-- Re-enable foreign key checks
SET FOREIGN_KEY_CHECKS = 1;

-- Verify data integrity after import
-- This will fail because author 999 doesn't exist
UPDATE books SET title = 'Updated Title' WHERE book_id = 1000;
Query OK, 0 rows affected (0.01 sec)
Query OK, 1 row affected (0.02 sec)
Query OK, 0 rows affected (0.01 sec)

ERROR 1452 (23000): Cannot add or update a child row: 
a foreign key constraint fails (fk_book_author)

Warning: Disable Constraints Carefully

Disabling foreign key checks allows invalid data to enter your database. Always validate data integrity immediately after re-enabling constraints. Clean up any orphan records before they cause application problems.

Referential Integrity Best Practices

Building reliable databases requires following proven patterns for referential integrity. These practices prevent common problems and make your database easier to maintain:

Database Insight
Netflix uses thousands of foreign key constraints across their content database. Every movie, episode, actor, and rating is connected through referential integrity rules. This prevents impossible data like episodes belonging to non-existent series.

1. Name Your Constraints

Always use descriptive names for foreign key constraints. When a constraint violation occurs, the error message will include the constraint name. fk_book_author tells you exactly what's wrong, while books_ibfk_1 tells you nothing.

2. Choose CASCADE Options Carefully

Default to RESTRICT for safety. Only use CASCADE when you're absolutely certain child records should be deleted. For BookVault, cascading order deletion to order_items makes sense, but cascading author deletion to books would be destructive.

3. Document Your Relationships

Create a reference document showing all foreign key relationships and their cascade behaviors. This becomes crucial when multiple developers work on the same database or when troubleshooting production issues.

4. Test Constraint Violations

Write test cases that deliberately try to violate referential integrity. Your application should handle constraint violation errors gracefully, not crash or show cryptic database errors to users.

-- Create a comprehensive view of all foreign key relationships
SELECT 
    TABLE_NAME as child_table,
    COLUMN_NAME as foreign_key_column,
    CONSTRAINT_NAME as constraint_name,
    REFERENCED_TABLE_NAME as parent_table,
    REFERENCED_COLUMN_NAME as parent_column
FROM information_schema.KEY_COLUMN_USAGE 
WHERE REFERENCED_TABLE_NAME IS NOT NULL 
    AND TABLE_SCHEMA = 'bookvault'
ORDER BY TABLE_NAME, CONSTRAINT_NAME;
child_table    foreign_key_column   constraint_name        parent_table   parent_column
books          author_id            fk_book_author         authors        author_id
books          category_id          fk_book_category       categories     category_id  
categories     parent_category_id   fk_category_parent     categories     category_id
order_items    book_id              fk_orderitem_book      books          book_id
order_items    order_id             fk_orderitem_order     orders         order_id

5 foreign key relationships found in BookVault database

What just happened?

We queried the information_schema to see all foreign key relationships in BookVault. This system view shows exactly which columns reference which tables — perfect for documentation and debugging. Save this query for every database project.

Referential integrity is not optional for production databases. The short-term convenience of skipping foreign key constraints leads to long-term data corruption that's expensive and time-consuming to fix. Better to design constraints properly from the start than to clean up orphan records later.

Quiz

1. What happens when you add `ON DELETE CASCADE` to the foreign key constraint between order_items.order_id and orders.order_id in BookVault?


2. You try to insert a new book with `INSERT INTO books (book_id, title, author_id, category_id, price) VALUES (50, 'New Book', 500, 1, 24.99)` but author_id 500 doesn't exist in the authors table. What will happen?


3. You have a circular dependency where authors table needs featured_book_id and books table needs author_id. How should you solve this in BookVault?


Up Next

Logical Schema Design

Combine normalization rules and referential integrity into complete logical database designs that serve as blueprints for physical implementation.