DataBase Design Lesson 30 – Partitioning Concepts | Dataplexa
Physical Design · Lesson 30

Partitioning Concepts

Master database partitioning strategies to split massive tables across multiple storage locations, dramatically improving BookVault's query performance and data management at enterprise scale.

Why Partition Tables?

Your BookVault database starts small. Then reality hits. Ten million orders. Fifty million books sold. Daily queries that scan entire tables. The database crawls to a halt.

Partitioning solves this by splitting one massive table into smaller, manageable pieces called partitions. Each partition holds a subset of the data based on specific rules you define.

Think of it like organizing a massive library. Instead of one enormous room with millions of books, you create separate wings: Fiction Wing, Science Wing, History Wing. Finding a specific book becomes dramatically faster when you know which wing to search.

Without Partitioning

Single orders table with 50M rows. Every query scans the entire table. Backup takes 8 hours. Index rebuilds lock the entire system.

With Partitioning

Orders split by year: 12 partitions of 4M rows each. Queries hit only relevant partitions. Backup one partition in minutes.

Real-World Impact

Netflix partitions viewing data by region and time. A query for "US viewers last month" only touches the relevant partitions, not the global dataset. Result: millisecond responses instead of minutes.

Types of Partitioning

Database systems offer several partitioning strategies. Each works best for different data patterns and query types.

1
Range Partitioning
2
List Partitioning
3
Hash Partitioning

Range Partitioning

Split data based on value ranges. Perfect for dates, ages, prices, or any ordered data. Most common type because it matches how humans think about data.

-- Range partition BookVault orders by year
-- Each partition holds one year of order data
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    total_amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
Table created.

Partitions created:
- p2021: orders from 2021
- p2022: orders from 2022  
- p2023: orders from 2023
- p2024: orders from 2024
- p_future: any future dates

What just happened?

The database automatically routes each INSERT to the correct partition based on order_date. Query "WHERE order_date >= '2023-01-01'" only scans the p2023 and p2024 partitions. Try this: Add a query with a date range to see partition pruning in action.

List Partitioning

Assign specific values to specific partitions. Great for categories, regions, or any discrete values. Gives you complete control over data placement.

-- List partition BookVault books by category
-- Popular categories get dedicated partitions
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title VARCHAR(255),
    author_id INT,
    category_id INT,
    price DECIMAL(8,2),
    stock_qty INT,
    published_year INT
)
PARTITION BY LIST (category_id) (
    PARTITION p_fiction VALUES IN (1, 2, 3),     -- Fiction, Romance, Mystery  
    PARTITION p_tech VALUES IN (4, 5),           -- Programming, Science
    PARTITION p_business VALUES IN (6, 7, 8),    -- Business, Finance, Marketing
    PARTITION p_other VALUES IN (DEFAULT)        -- Everything else
);
Table created.

Partitions created:
- p_fiction: Fiction books (categories 1,2,3)
- p_tech: Technical books (categories 4,5)
- p_business: Business books (categories 6,7,8)
- p_other: All remaining categories

What just happened?

Books automatically route to partitions based on category_id. Search for "Programming books" only scans the p_tech partition. The p_other partition catches any category not explicitly listed. Try this: Query books by category to see partition elimination working.

Hash Partitioning

Database applies a hash function to distribute data evenly across partitions. Perfect when you need balanced partition sizes but don't care which partition holds specific records.

-- Hash partition BookVault customers across 4 partitions
-- Ensures even distribution regardless of customer patterns
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    city VARCHAR(50),
    created_at TIMESTAMP
)
PARTITION BY HASH (customer_id)
PARTITIONS 4;
Table created.

Partitions created:
- p0: ~25% of customers
- p1: ~25% of customers  
- p2: ~25% of customers
- p3: ~25% of customers

Distribution based on hash of customer_id

What just happened?

The database hashes customer_id values and assigns each row to a partition. Customer #1001 might land in partition p2, while #1002 goes to p0. Guarantees roughly equal partition sizes. Try this: Insert customers and check which partition they land in.

Partition Pruning

Here's where partitioning pays off. Partition pruning means the database automatically ignores irrelevant partitions during query execution.

Your WHERE clause determines which partitions get scanned. Smart queries touch only the necessary partitions. Poor queries scan everything.

Good Query

WHERE order_date >= '2023-01-01'
Scans only 2023+ partitions

Better Query

WHERE order_date = '2023-06-15'
Scans only p2023 partition

Perfect Query

WHERE order_date BETWEEN '2023-06-01' AND '2023-06-30'
Single partition scan

Bad Query

WHERE customer_id = 1001
Scans all partitions

Why does the bad query scan everything? Because the orders table is partitioned by order_date, not customer_id. The database can't eliminate any partitions without checking the partition key.

-- Check partition pruning with EXPLAIN
-- Shows which partitions MySQL will scan
EXPLAIN PARTITIONS
SELECT order_id, total_amount 
FROM orders 
WHERE order_date BETWEEN '2023-06-01' AND '2023-06-30';
+----+-------------+--------+------------+------+
| id | select_type | table  | partitions | rows |
+----+-------------+--------+------------+------+
|  1 | SIMPLE      | orders | p2023      |  847 |
+----+-------------+--------+------------+------+

Only p2023 partition scanned!
Total rows examined: 847 (not 50 million)

Data Insight

Uber partitions ride data by city and date. A query for "San Francisco rides last Tuesday" examines less than 0.1% of their global trip data, delivering results in under 50 milliseconds.

Maintenance Operations

Partitioned tables require different maintenance strategies. The good news? You can maintain individual partitions without affecting the entire table.

Adding New Partitions

Range partitions need new partitions as data grows. Create them before you need them, or your database will reject new data.

-- Add 2025 partition to BookVault orders
-- Must be done before 2025 data arrives
ALTER TABLE orders 
ADD PARTITION (
    PARTITION p2025 VALUES LESS THAN (2026)
);
Partition added successfully.

Available partitions:
- p2021, p2022, p2023, p2024, p2025, p_future

Ready for 2025 order data.

Dropping Old Partitions

Dropping a partition removes all its data instantly. Much faster than DELETE statements, but irreversible. Perfect for data retention policies.

-- Remove 2021 orders to comply with 3-year retention policy  
-- Instantly deletes all 2021 data
ALTER TABLE orders 
DROP PARTITION p2021;
Partition dropped successfully.
Rows deleted: 4,200,000
Time: 0.3 seconds

Remaining partitions:
- p2022, p2023, p2024, p2025, p_future

Backup Before Dropping

DROP PARTITION is instant and irreversible. Always backup the partition first with mysqldump --where="YEAR(order_date)=2021" orders or similar before dropping.

Choosing Your Strategy

Not every table needs partitioning. Wrong partitioning can make performance worse, not better. Here's when each type makes sense.

Partition Type Best For BookVault Example Query Pattern
Range Time series, ordered data Orders by date, books by year Date ranges, recent data
List Categories, regions, fixed values Books by category, customers by region Filter by specific categories
Hash Even distribution, parallel processing Customer data, user sessions Random access, bulk operations

Pro tip: Start with range partitioning by date. Most business queries filter by time ranges, making it the most universally useful partitioning strategy. You can always add secondary partitioning later.

When NOT to Partition

Skip partitioning if your table is under 10 million rows, your queries don't filter on potential partition keys, or you need complex JOIN operations across partitions. Partitioning adds complexity that small datasets don't need.

Honestly, most performance problems come from missing indexes, not table size. Fix your indexing strategy first, then consider partitioning for truly massive datasets.

Quiz

1. BookVault's orders table has 25 million rows and most queries filter by date ranges like "last month" or "Q4 2023". What's the best partitioning strategy?


2. Your books table is partitioned by category_id, but you run SELECT * FROM books WHERE author_id = 42. What happens during query execution?


3. BookVault needs to delete all orders from 2020 to comply with data retention policies. The orders table is range partitioned by year. What's the most efficient approach?


Up Next

Designing for Performance

Combine indexing, partitioning, and query optimization techniques to build BookVault databases that handle millions of users without breaking a sweat.