DataBase Design Lesson 31 – Designing for Performance | Dataplexa
Physical & Enterprise · Lesson 31

Designing for Performance

Build high-performance database schemas that handle millions of queries efficiently using indexing strategies, query optimization, and architectural patterns proven by companies like Amazon and Netflix.

Performance Fundamentals

Database performance matters more than you think. A slow query taking 3 seconds instead of 50 milliseconds doesn't just annoy users — it crashes entire systems. Amazon found that every 100ms delay in page load time costs them 1% in sales. Netflix processes billions of viewing records daily. They can't afford slow databases.

Performance design happens at multiple levels. Query performance focuses on individual SQL statements. Schema performance involves table structure and relationships. System performance covers hardware, memory, and storage.

Data Insight

BookVault processes 50,000 book searches per hour during peak shopping. Without proper indexing, each search scans all 2 million books. With indexes, searches complete in under 10ms.

1

Measure Current Performance

2

Identify Bottlenecks

3

Apply Optimization

4

Verify Results

Strategic Indexing

Indexes accelerate data retrieval. Think of them like book indexes — instead of reading every page to find "databases," you check the index and jump to page 247. Database indexes work identically. But they cost storage space and slow down writes.

BookVault's most common query searches books by title. Without an index, the database scans every row sequentially. With 2 million books, that's 2 million comparisons per search. An index on the title column reduces this to roughly 21 comparisons using tree traversal.

With Index

Tree traversal: ~21 comparisons
Query time: 8-15ms
Storage cost: +15% space
Perfect for read-heavy workloads

✓ RECOMMENDED

Full Table Scan

Sequential scan: 2M comparisons
Query time: 800-1200ms
Storage cost: No overhead
Acceptable for tiny tables only

-- Create performance indexes on BookVault's most queried columns
-- These cover 80% of customer search patterns

-- Book title searches (most common)
CREATE INDEX idx_books_title ON books(title);

-- Author name searches  
CREATE INDEX idx_authors_fullname ON authors(last_name, first_name);

-- Category browsing
CREATE INDEX idx_books_category ON books(category_id);

-- Customer order history
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date DESC);
Index "idx_books_title" created successfully
Index "idx_authors_fullname" created successfully  
Index "idx_books_category" created successfully
Index "idx_orders_customer_date" created successfully

Affected: 4 indexes created

What just happened?

We created four strategic indexes covering BookVault's primary search patterns. The idx_authors_fullname uses a compound index — last name first because customers typically search "King Stephen" not "Stephen King." Try this: Check your database's most frequent queries and index those columns first.

Composite Index Strategy

Composite indexes cover multiple columns in a single index. Order matters tremendously. The idx_orders_customer_date index works perfectly for queries filtering by customer_id, or by both customer_id and order_date. But it's useless for queries filtering only by order_date.

Think of composite indexes like phone books. Organized by last name, then first name. You can find "Smith John" or all "Smith" entries efficiently. But finding all "John" entries requires scanning every page.

Query Optimization Patterns

Writing fast queries requires understanding how databases execute SQL. The query optimizer chooses execution plans automatically, but poor query structure forces bad choices. Small syntax changes create massive performance differences.

-- SLOW: Function in WHERE clause prevents index usage
SELECT * FROM books 
WHERE UPPER(title) = 'THE GREAT GATSBY';

-- FAST: Direct column comparison uses index
SELECT * FROM books 
WHERE title = 'The Great Gatsby';
Query 1: 847ms (table scan - 2M rows examined)
Query 2: 12ms (index seek - 1 row examined)

Performance improvement: 98.6% faster

What just happened?

The UPPER() function forces the database to calculate UPPER(title) for every row before comparison. This bypasses the index completely. The second query uses the title index directly. Try this: Use EXPLAIN before any query to see which indexes are used.

Fast Query Patterns

• Use = instead of LIKE when possible
• Put most selective conditions first
• Avoid functions in WHERE clauses
• Use EXISTS over IN for subqueries

Index-Friendly JOINs

• JOIN on indexed foreign keys
• Keep JOIN conditions simple
• Consider denormalization for complex JOINs
• Use covering indexes when possible

LIMIT Optimization

• Always use LIMIT for pagination
• Combine with ORDER BY on indexed columns
• Use OFFSET carefully (gets slower with large offsets)
• Consider cursor-based pagination

Performance Killers

• SELECT * (retrieves unnecessary data)
• N+1 query patterns
• Unnecessary DISTINCT
• Correlated subqueries in loops

Real-World Query Optimization

-- BookVault's optimized bestseller query
-- Finds top 10 books by sales volume in the last 30 days
SELECT 
    b.title,
    a.first_name,
    a.last_name,
    SUM(oi.quantity) as total_sold
FROM books b
JOIN authors a ON b.author_id = a.author_id
JOIN order_items oi ON b.book_id = oi.book_id  
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
    AND o.status = 'completed'
GROUP BY b.book_id, b.title, a.first_name, a.last_name
ORDER BY total_sold DESC
LIMIT 10;
title                    | first_name | last_name | total_sold
-------------------------|------------|-----------|------------
The Seven Husbands        | Taylor     | Reid      | 2,847
Project Hail Mary         | Andy       | Weir      | 2,103
The Thursday Murder Club  | Richard    | Osman     | 1,956
Klara and the Sun        | Kazuo      | Ishiguro  | 1,842

Execution time: 23ms (with proper indexes)
Rows examined: 45,203 from 2.1M total

Schema-Level Performance Design

Performance starts with smart schema design. Normalization reduces storage but increases JOIN complexity. Denormalization speeds up reads but complicates updates. The trick is finding the right balance for your workload.

BookVault faces a classic trade-off. Customer order summaries require joining five tables: orders, order_items, books, authors, and categories. For 50,000 daily order views, those JOINs become expensive. Selectively denormalizing saves 60% query time.

Approach Storage Read Speed Write Speed Best For
Full Normalization Minimal Slow (many JOINs) Fast Write-heavy apps
Selective Denormalization Moderate (+20%) Fast Moderate Balanced workloads
Heavy Denormalization Large (+50%) Very Fast Slow (many updates) Read-heavy analytics

Common Mistake: Premature Denormalization

Developers often denormalize too early, before measuring actual performance. Start normalized, measure query performance under realistic data volumes, then denormalize only the proven bottlenecks. BookVault kept full normalization until reaching 500K books — then selectively denormalized order summaries.

Data Type Performance Impact

Data type choices significantly affect performance. Smaller data types mean faster comparisons, smaller indexes, and better memory usage. INT comparisons run 4x faster than VARCHAR(255) comparisons. UUID primary keys create 16-byte indexes versus 4-byte integer indexes.

-- Performance-optimized BookVault schema
-- Carefully chosen data types for speed and storage efficiency

CREATE TABLE books_optimized (
    book_id INT PRIMARY KEY,           -- 4 bytes, fast comparisons
    title VARCHAR(200) NOT NULL,      -- Most titles under 200 chars  
    author_id SMALLINT NOT NULL,      -- 2 bytes, max 65K authors
    category_id TINYINT NOT NULL,     -- 1 byte, max 255 categories
    price DECIMAL(6,2) NOT NULL,      -- Exact precision for money
    stock_qty SMALLINT NOT NULL,      -- 2 bytes, reasonable stock limits
    published_year SMALLINT NOT NULL, -- 2 bytes vs 4-byte DATE
    is_featured BOOLEAN DEFAULT FALSE -- 1 bit, faster than VARCHAR
);
Table "books_optimized" created successfully

Row size: ~220 bytes (vs 280 bytes in original schema)
Index size reduction: ~21% smaller
Memory usage: 2M rows fit in 440MB vs 560MB

Advanced Performance Techniques

Enterprise applications demand advanced optimization. Read replicas distribute query load across multiple database servers. Connection pooling prevents the overhead of establishing new database connections for each request.

Caching eliminates database queries entirely for frequently accessed data. BookVault caches bestseller lists, popular categories, and author information in Redis. Cache hits serve responses in under 1ms versus 15-50ms database queries.

Partitioning for Scale

Partitioning splits large tables across multiple storage units. Horizontal partitioning splits rows — recent orders in one partition, archived orders in another. Vertical partitioning splits columns — frequently accessed columns separate from rarely accessed ones.

-- Partition BookVault orders by date for better performance
-- Recent orders (hot data) separate from historical orders (cold data)

CREATE TABLE orders_2024 (
    LIKE orders INCLUDING ALL
) PARTITION BY RANGE (order_date);

-- Current year partition - fast SSD storage
CREATE TABLE orders_2024_current PARTITION OF orders_2024
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Archive partition - slower but cheaper storage  
CREATE TABLE orders_2023_archive PARTITION OF orders_2024
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
Table "orders_2024" created successfully
Partition "orders_2024_current" created successfully  
Partition "orders_2023_archive" created successfully

Query performance: 340% faster for recent orders
Storage cost: 45% reduction using tiered storage

Pro Tip: Partition pruning automatically excludes irrelevant partitions from queries. Searching for 2024 orders completely ignores the 2023 partition, dramatically reducing I/O. Design partition keys around your most common query filters.

Monitoring and Maintenance

Performance degrades over time without maintenance. Index fragmentation slows queries as data changes. Statistics become outdated, causing poor query plan choices. Regular maintenance prevents performance regression.

Monitor key metrics continuously: query execution time, index usage, lock contention, and memory consumption. Set alerts for queries exceeding 100ms. Most performance problems show early warning signs before becoming critical.

Quiz

1. BookVault's category browsing page loads slowly because the query "SELECT * FROM books WHERE category_id = 5" scans all 2 million books. What's the best first optimization?


2. Why is "SELECT * FROM books WHERE UPPER(title) = 'THE GREAT GATSBY'" much slower than "SELECT * FROM books WHERE title = 'The Great Gatsby'" even with an index on the title column?


3. BookVault frequently runs queries like "SELECT * FROM orders WHERE customer_id = 123 AND order_date >= '2024-01-01'". What's the optimal column order for a composite index?


Up Next

Designing for Scalability

Learn horizontal scaling, microservices data patterns, and distributed database architectures that power applications serving millions of users simultaneously.