Database Design
Indexing Strategy
Design high-performance indexes for BookVault's database using real query patterns and performance analysis to create an optimal indexing strategy.
1,000,000 rows checked
Response: 2.4 seconds
50 rows examined
Response: 0.003 seconds
Understanding Index Types
Database indexes come in several flavors. Each solves specific performance problems. Clustered indexes physically reorder table data. Non-clustered indexes create separate lookup structures. Composite indexes cover multiple columns together.Clustered Index
Physically sorts table data. Only one per table. Usually on primary key. Fast for range queries.
Non-Clustered
Separate lookup structure. Multiple allowed per table. Points to actual data rows. Best for specific searches.
Composite Index
Covers multiple columns in specific order. Perfect for complex WHERE clauses and sorting.
Partial Index
Indexes subset of rows meeting conditions. Smaller, faster, targets specific query patterns.
books table illustrates these differences perfectly. The primary key book_id gets a clustered index automatically. But searches by title or author_id need separate non-clustered indexes.
Query Pattern Analysis
Before creating indexes, analyze your actual queries. Not what you think users will search for. What they actually search for. The difference kills performance budgets. BookVault's most common queries reveal clear indexing needs:-- BookVault's top 5 query patterns from application logs
-- Query 1: Book search by title (40% of all queries)
SELECT book_id, title, price
FROM books
WHERE title LIKE 'Harry Potter%';
-- Query 2: Books by specific author (25% of queries)
SELECT b.title, b.price, a.first_name, a.last_name
FROM books b
JOIN authors a ON b.author_id = a.author_id
WHERE a.last_name = 'Rowling';
-- Query 3: Category browsing with pagination (20% of queries)
SELECT book_id, title, price
FROM books
WHERE category_id = 5
ORDER BY title
LIMIT 20 OFFSET 100;
Query 1: Full table scan - 2.1 seconds Query 2: Hash join with table scans - 1.8 seconds Query 3: Full table scan + sort - 3.2 seconds Total response time: 7.1 seconds User experience: Unacceptable
What just happened?
Without proper indexes, every query performs full table scans. The database examines every single row to find matches. Try this: Run EXPLAIN on your slowest queries to see their execution plans.
Query Frequency vs Performance Impact
Not all slow queries deserve indexes. A query running once per day can stay slow. A query running 10,000 times per hour must be fast. Calculate the performance impact score: frequency × current response time × user frustration level.| Query Type | Daily Frequency | Avg Response | Impact Score | Priority |
|---|---|---|---|---|
| Book title search | 45,000 | 2.1s | 94,500 | CRITICAL |
| Author page load | 28,000 | 1.8s | 50,400 | HIGH |
| Category browsing | 22,000 | 3.2s | 70,400 | HIGH |
| Sales reports | 50 | 8.5s | 425 | LOW |
Designing Composite Indexes
Single-column indexes solve simple problems. Real applications need composite indexes covering multiple columns. But column order matters enormously. Get it wrong, and your index becomes useless for most queries. The golden rule: most selective column first, then by query frequency. "Selective" means the column that eliminates the most rows fastest.-- BookVault composite index design
-- Query: Find fantasy books under $20, sorted by publication year
SELECT title, price, published_year
FROM books
WHERE category_id = 3 -- Fantasy category (selectivity: 8%)
AND price < 20.00 -- Price filter (selectivity: 45%)
AND stock_qty > 0 -- In stock (selectivity: 78%)
ORDER BY published_year DESC;
Without index: 1,000,000 rows scanned Expected result: ~2,800 matching books Selectivity analysis: - category_id = 3: eliminates 92% of rows (best) - price < 20.00: eliminates 55% of remaining rows - stock_qty > 0: eliminates 22% of remaining rows
What just happened?
Selectivity determines index effectiveness. The category_id filter eliminates 92% of rows immediately, making it the perfect leading column. Try this: Calculate selectivity with COUNT DISTINCT on your filter columns.
-- Optimal composite index for BookVault fantasy search
CREATE INDEX idx_books_category_price_stock_year ON books (
category_id, -- Most selective (8% of rows match)
price, -- Second most selective (helps range queries)
stock_qty, -- Least selective but commonly filtered
published_year -- Included for ORDER BY optimization
);
-- This index supports multiple query patterns:
-- Pattern 1: WHERE category_id = ?
-- Pattern 2: WHERE category_id = ? AND price < ?
-- Pattern 3: WHERE category_id = ? AND price < ? AND stock_qty > ?
-- Pattern 4: All above + ORDER BY published_year
Index created successfully. Estimated size: 45 MB Expected performance improvement: - Fantasy book queries: 2.1s → 0.08s (26x faster) - Price range searches: 1.8s → 0.12s (15x faster) - Stock availability: 1.2s → 0.05s (24x faster)
Column Order Mistake
Never put low-selectivity columns first. An index starting with stock_qty would be nearly useless because 78% of books are in stock. The database still examines 780,000 rows instead of 80,000.
Index Maintenance Strategy
Indexes aren't "create and forget" structures. They require ongoing maintenance. Every INSERT, UPDATE, and DELETE modifies indexes too. Write-heavy tables with too many indexes become slower than tables with no indexes. BookVault'sorders table demonstrates this challenge perfectly. Orders are created frequently but rarely updated. The order_items table follows the same pattern.
Data Insight
BookVault processes 50,000 orders daily but only 200 order modifications. This 250:1 read-to-write ratio justifies aggressive indexing on order lookup columns.
-- Monitor index usage in BookVault database
-- PostgreSQL version (adjust for your database system)
SELECT
schemaname,
tablename,
indexname,
idx_scan AS times_used,
idx_tup_read AS rows_read,
idx_tup_fetch AS rows_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'bookvault'
ORDER BY idx_scan DESC;
indexname | times_used | rows_read idx_books_title | 450,234 | 2,100,445 idx_books_author_id | 289,567 | 1,200,334 idx_orders_customer_id | 234,123 | 890,234 idx_books_experimental_date | 0 | 0 idx_categories_old_system | 12 | 48
What just happened?
Usage statistics reveal index effectiveness. The experimental date index was never used and should be dropped. The old system category index is barely used and wastes 15 MB. Try this: Schedule monthly index usage reviews.
Index Rebuilding and Statistics
Database indexes fragment over time, just like hard drives. Query performance gradually degrades as indexes become less efficient. Most databases provide tools to rebuild and optimize indexes. Index statistics become stale when data distribution changes significantly. A category that used to contain 5% of books now contains 40%. The query optimizer makes poor decisions based on outdated statistics.-- BookVault index maintenance routine
-- Update statistics for better query plans
-- PostgreSQL example
ANALYZE books;
ANALYZE orders;
ANALYZE order_items;
-- Reindex fragmented indexes (run during maintenance window)
REINDEX INDEX idx_books_title;
REINDEX INDEX idx_orders_customer_id;
-- Check index bloat and effectiveness
SELECT
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'bookvault';
Statistics updated for 3 tables. Reindexing completed. Index sizes after optimization: idx_books_title | 89 MB (was 156 MB - 43% reduction) idx_orders_customer_id | 34 MB (was 52 MB - 35% reduction) Total space reclaimed: 85 MB
Covering Indexes and Include Columns
The ultimate index optimization: covering indexes. These indexes contain all columns needed by a query. The database never touches the actual table. Pure index lookups are 10-50x faster than index + table lookups. BookVault's book search demonstrates covering index power:-- BookVault covering index for complete book search
-- Query needs: title, price, stock_qty, author name
-- Without covering index:
-- 1. Use title index to find book_id values
-- 2. Look up each book_id in books table for price, stock_qty
-- 3. Look up author_id in authors table for name
-- Result: Multiple table accesses per book
-- With covering index:
CREATE INDEX idx_books_search_covering ON books (
title, -- Search key
category_id -- Filter key
) INCLUDE (
price, -- Include columns (non-key data)
stock_qty,
author_id,
published_year
);
-- Now this query uses ONLY the index:
SELECT title, price, stock_qty, published_year
FROM books
WHERE title LIKE 'Harry%'
AND category_id = 1;
Covering index created: 145 MB Query execution plan: -> Index Only Scan on idx_books_search_covering Filter: (title ~~ 'Harry%' AND category_id = 1) Rows: 47 Time: 0.003 seconds (was 0.24 seconds) Performance improvement: 80x faster
What just happened?
"Index Only Scan" means the query never accessed the books table. All required data came from the index. This eliminates random disk I/O and cache misses. Try this: Identify your most frequent SELECT columns for covering index opportunities.
Pro Tip: Create covering indexes for queries that run more than 1,000 times per hour and access predictable column sets. Monitor storage growth and write performance impact carefully.
Quiz
1. BookVault needs to optimize this query that runs 5,000 times daily: SELECT * FROM orders WHERE order_date >= '2024-01-01' AND customer_id = 123 ORDER BY order_date. Which index provides the best performance?
2. BookVault's index usage monitoring shows: title index (450,000 uses), author index (289,000 uses), experimental date index (0 uses), old system category index (12 uses). Which index should be dropped first?
3. For BookVault's composite index on the books table, you analyzed selectivity: category_id filters to 8% of rows, price < 20 filters to 45% of rows, stock_qty > 0 filters to 78% of rows. What's the optimal column order?
Up Next
Partitioning Concepts
Learn how to split massive tables across multiple physical storage units for better performance and easier maintenance.