Database Design
Physical Database Design
Transform your logical schema into an optimized physical database ready for production workloads
Your logical schema looks perfect on paper. Clean relationships, proper normalization, elegant design. But now comes the harsh reality check. Will this database actually perform when 50,000 customers browse BookVault simultaneously?
Physical database design bridges the gap between theory and production. You take your logical schema and make concrete decisions about storage, performance, and scalability. Every choice affects real users waiting for book search results.
Think of it like architectural blueprints versus actual construction. The blueprint shows where rooms go. But the contractor must decide concrete types, insulation methods, and electrical routing. Similarly, physical design determines how your data actually lives on disk drives.
Physical vs Logical Design
Logical Design
What data relationships exist
- Tables and columns
- Primary and foreign keys
- Relationships and constraints
- Business rules
Physical Design
How data is stored and accessed
- Storage engines and file formats
- Indexes and partitioning
- Hardware and memory allocation
- Query execution paths
Spotify's music database illustrates this perfectly. Logically, songs relate to artists, albums, and playlists. Physically, popular songs get cached in memory across multiple data centers. New releases might live on faster SSD storage. Archive tracks from 1950 could sit on cheaper magnetic drives.
Storage Engine Selection
Your first major decision: which storage engine powers each table. Think of storage engines as different filing systems for your data.
| Engine | Best For | BookVault Use Case | Trade-offs |
|---|---|---|---|
| InnoDB | Transactions, consistency | orders, order_items | More disk and memory overhead |
| MyISAM | Fast reads, full-text search | books (search catalog) | No transactions, table-level locks, no crash recovery |
| Memory | Ultra-fast temporary data | user_sessions, shopping_cart | Data lost on restart |
| Archive | Compressed long-term storage | order_history, audit_logs | Insert-only, slow queries |
BookVault's orders table absolutely needs InnoDB. Why? When a customer buys three books, all three order_items must save together or not at all. No partial orders allowed. InnoDB provides these transaction guarantees.
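The all-or-nothing behavior described above can be sketched as a single transaction. This is a minimal illustration, not BookVault's actual checkout code: the order_items table and its columns (order_id, book_id, quantity, unit_price) are assumptions, as are the customer and book IDs.

```sql
-- Sketch: one order with three items, committed as a unit.
-- order_items and its columns are hypothetical names for illustration.
START TRANSACTION;

INSERT INTO orders (customer_id, status, total_amount)
VALUES (42, 'pending', 54.97);

-- Capture the AUTO_INCREMENT id generated for the new order
SET @order_id = LAST_INSERT_ID();

INSERT INTO order_items (order_id, book_id, quantity, unit_price)
VALUES
    (@order_id, 101, 1, 19.99),
    (@order_id, 205, 1, 14.99),
    (@order_id, 318, 1, 19.99);

-- If any statement above failed, issue ROLLBACK instead of COMMIT
COMMIT;
```

On a non-transactional engine such as MyISAM, ROLLBACK cannot undo rows that were already written, which is why the engine choice matters here.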
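But the books table for browsing? MyISAM might be smarter. Customers search book titles constantly, and catalog details such as title and author rarely change. MyISAM handles read-heavy workloads and full-text search well. Weigh that against its costs: table-level locking, no foreign keys, and no crash recovery. InnoDB has also supported FULLTEXT indexes since MySQL 5.6, so benchmark both engines before committing.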
-- Choose storage engines based on access patterns
-- BookVault orders need transactions
CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status ENUM('pending', 'paid', 'shipped', 'delivered'),
    total_amount DECIMAL(10,2)
) ENGINE=InnoDB;

-- Books table optimized for search performance
CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author_id INT NOT NULL,
    category_id INT NOT NULL,
    price DECIMAL(8,2),
    stock_qty INT DEFAULT 0,
    published_year INT,
    FULLTEXT(title)
) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)
Query OK, 0 rows affected (0.03 sec)
What just happened?
We created two tables with different storage engines based on their usage patterns. The orders table uses InnoDB for transaction safety, while books uses MyISAM for read-heavy search queries. Try this: Add a description column to books with a FULLTEXT index on it so searches can match more than the title.
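To confirm which engine each table ended up with, and to exercise the FULLTEXT index on title, something like the following should work. It is a sketch that assumes the two tables above exist in the current database; the search phrase is arbitrary.

```sql
-- Check the storage engine assigned to each table
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('orders', 'books');

-- Search book titles through the FULLTEXT index, best matches first
SELECT book_id,
       title,
       MATCH(title) AGAINST('database design' IN NATURAL LANGUAGE MODE) AS relevance
FROM books
WHERE MATCH(title) AGAINST('database design' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC
LIMIT 10;
```

The same MATCH ... AGAINST query runs unchanged if books is later converted to InnoDB, which makes it a convenient benchmark for comparing the two engines.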
Data Type Optimization
Every column needs the smallest data type that safely holds its values. Sounds trivial, but this decision impacts every single query for the next five years.
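Large catalogs show why this matters. A team running a catalog at Amazon's scale would not use VARCHAR(255) for product names just because it's convenient. They would measure actual name lengths and then declare a tighter limit such as VARCHAR(120). A VARCHAR stores only the bytes actually used plus a length prefix, so the tighter limit mainly protects data quality and can keep index keys and some temporary tables smaller. Fixed-width types are different: choosing TINYINT over INT shrinks every row on disk.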
Poor Data Types
VARCHAR(255) everywhere
TEXT for short descriptions
BIGINT for small counters
Optimized Data Types
VARCHAR(120) for book titles
VARCHAR(500) for descriptions
TINYINT UNSIGNED for quantity (0-255)
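BookVault's stock_qty column is a good example. If no title ever stocks more than 255 copies, TINYINT UNSIGNED (1 byte) can replace INT (4 bytes), saving 3 bytes per row. With 100,000 books, that's about 300KB, plus the same saving in any index that includes the column. Each saving is small, but it compounds across columns and larger tables. If a bestseller could exceed 255 copies, use SMALLINT UNSIGNED (2 bytes, up to 65,535) instead, because strict SQL mode rejects out-of-range values.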
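Before narrowing a column, it helps to measure what the data actually contains. The query below is a minimal sketch against the books table defined earlier; the thresholds to compare against (255 copies, 120 characters) come from the text, not from any real dataset.

```sql
-- Measure real value ranges before choosing smaller types
SELECT MAX(stock_qty)              AS max_stock,        -- must be <= 255 for TINYINT UNSIGNED
       MAX(CHAR_LENGTH(title))     AS longest_title,    -- must fit the new VARCHAR limit
       MIN(published_year)         AS earliest_year,
       MAX(published_year)         AS latest_year,
       COUNT(DISTINCT category_id) AS category_count,
       COUNT(DISTINCT author_id)   AS author_count
FROM books;
```

If any result sits close to the limit of the proposed type, the next size up is usually the safer choice.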
-- Optimize data types for actual usage patterns
-- BookVault books with right-sized columns
CREATE TABLE books_optimized (
    book_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(120) NOT NULL,             -- Verify no title exceeds 120 chars
    author_id MEDIUMINT UNSIGNED NOT NULL,   -- Up to 16,777,215 authors
    category_id TINYINT UNSIGNED NOT NULL,   -- Max 255 categories
    price DECIMAL(6,2),                      -- $9999.99 max (UNSIGNED on DECIMAL is deprecated since MySQL 8.0.17)
    stock_qty TINYINT UNSIGNED DEFAULT 0,    -- 0-255 copies
    published_year YEAR,                     -- 1 byte, but only covers 1901-2155
    isbn CHAR(13),                           -- ISBN-13 is always 13 characters
    page_count SMALLINT UNSIGNED             -- Up to 65535 pages
);
Query OK, 0 rows affected (0.04 sec)
Data Insight
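The fixed-width columns shared with the original books table shrink from 20 bytes to 9 bytes per row: author_id from 4 to 3, category_id from 4 to 1, price from 4 to 3, stock_qty from 4 to 1, and published_year from 4 to 1. With 1 million books, that saves roughly 11MB of row data. Any index containing these columns shrinks too, so more rows fit in each cached page.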
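Estimated savings are worth checking against what the server reports. The following sketch assumes both books and books_optimized exist in the current database and hold comparable data; the figures in information_schema are estimates, so ANALYZE TABLE is run first to refresh them.

```sql
-- Refresh table statistics so the size figures are current
ANALYZE TABLE books, books_optimized;

-- Compare average row length and on-disk size of the two layouts
SELECT TABLE_NAME,
       TABLE_ROWS,
       AVG_ROW_LENGTH,
       ROUND(DATA_LENGTH  / 1024 / 1024, 2) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('books', 'books_optimized');
```

Row counts and sizes are approximate for InnoDB tables, so treat the comparison as a sanity check rather than an exact measurement.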
Partitioning Strategy
Partitioning splits large tables into smaller, more manageable pieces. Imagine your filing cabinet getting too full. You could split it by year, department, or alphabet. Database partitioning works similarly.
BookVault's orders table grows rapidly. Every day adds thousands of new orders. After two years, queries slow down because the database must scan millions of rows. Partitioning by month lets queries ignore irrelevant historical data.
| Partition Type | How It Works | BookVault Example | Query Benefit |
|---|---|---|---|
| Range | Split by value ranges | orders by order_date | Skip old months |
| Hash | Distribute evenly by hash | customers by customer_id | Even data distribution |
| List | Split by specific values | books by category_id | Category-specific queries |
| Composite | Range or list partitions split into hash subpartitions | orders by date + customer | Complex query optimization |
-- Partition orders table by month for better performance
-- BookVault orders growing rapidly over time
-- RANGE COLUMNS on the DATE column lets range filters on order_date prune partitions;
-- an expression such as YEAR(order_date) * 100 + MONTH(order_date) is not among the
-- expressions MySQL documents as prunable for date ranges
CREATE TABLE orders_partitioned (
    order_id INT AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    status ENUM('pending', 'paid', 'shipped', 'delivered'),
    total_amount DECIMAL(10,2),
    PRIMARY KEY (order_id, order_date)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (order_date) (
    PARTITION p202301 VALUES LESS THAN ('2023-02-01'),
    PARTITION p202302 VALUES LESS THAN ('2023-03-01'),
    PARTITION p202303 VALUES LESS THAN ('2023-04-01'),
    PARTITION p202304 VALUES LESS THAN ('2023-05-01'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
Query OK, 0 rows affected (0.08 sec)
What just happened?
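We created a partitioned orders table that splits data by month. A query that filters on order_date can skip partitions that cannot contain matching rows (partition pruning), provided the optimizer can map the filter onto the partitioning expression. The composite primary key includes order_date because MySQL requires every primary or unique key on a partitioned table to contain all columns used in the partitioning expression. Try this: Run EXPLAIN on a query for March 2023 orders and check the partitions column to see which partitions MySQL reads.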
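Whether pruning actually happens is easy to check rather than assume. The statements below are a sketch against the orders_partitioned table above; the date range and the partition being dropped are illustrative choices.

```sql
-- The "partitions" column of the EXPLAIN output lists the partitions MySQL will read
EXPLAIN
SELECT order_id, customer_id, total_amount
FROM orders_partitioned
WHERE order_date >= '2023-03-01'
  AND order_date <  '2023-04-01';

-- Inspect partition boundaries and approximate row counts
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'orders_partitioned'
ORDER BY PARTITION_ORDINAL_POSITION;

-- Retire an old month in one metadata operation instead of a large DELETE
-- (this permanently removes the rows in that partition)
ALTER TABLE orders_partitioned DROP PARTITION p202301;
```

If EXPLAIN lists every partition for a one-month filter, the partitioning expression is not prunable for that query and is worth revisiting.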
Memory and Buffer Configuration
Your database server has limited RAM. How you allocate this memory between different buffers determines query performance. Get this wrong and even perfect schemas run slowly.
Think of database memory like a restaurant kitchen. You need space for ingredients (data pages), prep work (sort buffers), and completed dishes (query caches). Too little prep space and chefs work slowly. Too little storage and ingredients spoil.
InnoDB Buffer Pool
Caches data pages in memory
BookVault: 70% of available RAM
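Query Cache
Stores frequent query results (MySQL 5.7 and earlier; removed in MySQL 8.0)
BookVault: 256MB for book searches, only on versions that still include it
Sort Buffer
Handles ORDER BY and GROUP BY sorts
BookVault: 4MB per connection (sort_buffer_size)
Join Buffer
Speeds up joins that cannot use an index
BookVault: 2MB for complex queries (join_buffer_size)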
BookVault's server has 16GB RAM. We allocate 11GB to InnoDB buffer pool because most queries hit the books and orders tables. The more frequently accessed data stays in memory, the faster queries run.
Common Memory Mistake
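Setting innodb_buffer_pool_size close to or above physical RAM forces the operating system to swap, which can make queries slower than a much smaller buffer pool would. Always leave 2GB+ for the operating system and other processes, and remember that sort and join buffers are allocated per connection on top of the buffer pool.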
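The allocations above map onto server variables roughly as follows. This is a sketch for a self-managed MySQL 5.7+ server and assumes an account allowed to change global variables; on managed services such as AWS RDS the same values are normally set through a parameter group instead of SET GLOBAL.

```sql
-- Current buffer pool size in GB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- Resize the buffer pool online to 11GB (about 70% of a 16GB server)
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;

-- Per-connection buffers; new values apply to connections opened afterwards
SET GLOBAL sort_buffer_size = 4 * 1024 * 1024;
SET GLOBAL join_buffer_size = 2 * 1024 * 1024;

-- Compare logical read requests with reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

A low ratio of Innodb_buffer_pool_reads to Innodb_buffer_pool_read_requests suggests the working set fits in memory. Values set with SET GLOBAL are lost on restart unless they are also written to the server configuration.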
Hardware Considerations
Your schema design must match your hardware reality. SSD versus magnetic drives. Cloud versus dedicated servers. Single machine versus distributed cluster. Each choice affects how you structure data.
Netflix learned this lesson scaling from DVDs to streaming. Their original database assumed local disk storage. But streaming requires global distribution. They completely redesigned their schemas around cloud infrastructure and geographic regions.
| Hardware Type | Characteristics | Schema Impact | BookVault Decision |
|---|---|---|---|
| SSD Storage | Fast random access | More indexes acceptable | Aggressive indexing strategy |
| Magnetic HDD | Sequential reads faster | Minimize indexes, cluster data | Partition by access patterns |
| Cloud Database | Auto-scaling, managed | Design for horizontal scaling | Use managed services |
| Dedicated Server | Full control, fixed resources | Optimize for specific hardware | Tune for known capacity |
BookVault runs on AWS RDS with SSD storage. This enables aggressive indexing without worrying about disk seek times. We can add indexes on book titles, author names, and categories because SSD handles multiple index lookups efficiently.
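As a rough sketch, the indexes mentioned above might look like this. The books columns come from the earlier table definition; the authors table and its name column are assumptions, since the lesson does not define them.

```sql
-- Lookups by author and by category
CREATE INDEX idx_books_author ON books (author_id);

-- Composite index for "books in a category, sorted or filtered by price"
CREATE INDEX idx_books_category_price ON books (category_id, price);

-- Hypothetical authors table: search authors by name
CREATE INDEX idx_authors_name ON authors (name);

-- Review what is now indexed on books
SHOW INDEX FROM books;
```

Fast storage lowers the read cost of extra indexes, but each one still adds work to every INSERT and UPDATE, so it is worth dropping any index that queries never use.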
Security at the Physical Layer
Physical design includes security decisions that logical schemas cannot address. Data encryption, user privileges, network access, and audit logging all require specific implementation choices.
Stripe's payment processing illustrates this perfectly. Their logical schema shows customer and payment tables. But physically, credit card numbers live in encrypted storage with separate access keys. Payment processors and customer service see different views of the same logical data.
-- Implement column-level encryption for sensitive data
-- BookVault customer personal information
ALTER TABLE customers
    ADD COLUMN email_encrypted VARBINARY(255),
    ADD COLUMN phone_encrypted VARBINARY(255);

-- Create separate user accounts with limited privileges
-- Passwords are placeholders; '%' accepts connections from any host,
-- so restrict it to the application subnet in production
CREATE USER 'bookvault_app'@'%' IDENTIFIED BY 'secure_password';
GRANT SELECT, INSERT, UPDATE ON bookvault.orders TO 'bookvault_app'@'%';
GRANT SELECT ON bookvault.books TO 'bookvault_app'@'%';
GRANT SELECT ON bookvault.customers TO 'bookvault_app'@'%';

-- Analytics user cannot see sensitive customer data
CREATE USER 'bookvault_analytics'@'%' IDENTIFIED BY 'analytics_password';
GRANT SELECT ON bookvault.books TO 'bookvault_analytics'@'%';
GRANT SELECT ON bookvault.categories TO 'bookvault_analytics'@'%';
GRANT SELECT(order_id, order_date, total_amount) ON bookvault.orders TO 'bookvault_analytics'@'%';
Query OK, 1 row affected (0.12 sec)
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.01 sec)
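The new encrypted columns start out empty, so something has to fill them. The sketch below uses MySQL's built-in AES_ENCRYPT and AES_DECRYPT and assumes the customers table has plaintext email and phone columns and a customer_id key, none of which the lesson defines. The key string is a placeholder; in practice it would come from a secrets manager, and many teams encrypt in the application so the key never reaches the database.

```sql
-- Derive a 256-bit key from a placeholder secret (never hard-code a real key)
SET @enc_key = UNHEX(SHA2('replace-with-secret-from-a-vault', 256));

-- Populate the encrypted columns from the assumed plaintext columns
UPDATE customers
SET email_encrypted = AES_ENCRYPT(email, @enc_key),
    phone_encrypted = AES_ENCRYPT(phone, @enc_key);

-- Read one value back; AES_DECRYPT returns binary, so cast it to text
SELECT customer_id,
       CAST(AES_DECRYPT(email_encrypted, @enc_key) AS CHAR) AS email
FROM customers
WHERE customer_id = 42;

-- Confirm what each account is actually allowed to do
SHOW GRANTS FOR 'bookvault_app'@'%';
SHOW GRANTS FOR 'bookvault_analytics'@'%';
```

Once the encrypted columns are verified, the plaintext columns can be dropped. The server's block_encryption_mode setting determines which AES mode these functions use, so check it before relying on this for production data.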
Physical security extends beyond encryption. Consider network security groups, database firewalls, and audit logging. Your production database should be accessible only from your application servers, never directly from the internet.
Physical database design transforms your elegant logical schema into a production-ready system. Every decision from storage engines to memory allocation affects real user experience. BookVault customers don't care about your normalization theory. They care whether book searches return results in under 200 milliseconds.
But physical design isn't a one-time decision. Traffic patterns change. New features require different access patterns. Hardware costs shift. The best database architects continuously monitor and adjust their physical implementations while keeping logical schemas stable.
Quiz
1. BookVault needs to optimize storage engines for their main tables. Which combination makes the most sense for orders and books tables?
2. BookVault's stock_qty column stores book inventory counts, typically ranging from 0 to 200 copies. What's the most efficient data type choice?
3. BookVault's orders table grows by 10,000+ rows monthly and most queries search recent orders. What partitioning strategy provides the best query performance?