Database Design Lesson 28 – Physical Database Design | Dataplexa

Physical Database Design

Transform your logical schema into an optimized physical database ready for production workloads

1. Logical Schema → 2. Storage Decisions → 3. Performance Optimization → 4. Production Database

Your logical schema looks perfect on paper. Clean relationships, proper normalization, elegant design. But now comes the harsh reality check. Will this database actually perform when 50,000 customers browse BookVault simultaneously?

Physical database design bridges the gap between theory and production. You take your logical schema and make concrete decisions about storage, performance, and scalability. Every choice affects real users waiting for book search results.

Think of it like architectural blueprints versus actual construction. The blueprint shows where rooms go. But the contractor must decide concrete types, insulation methods, and electrical routing. Similarly, physical design determines how your data actually lives on disk drives.

Physical vs Logical Design

Logical Design

What data relationships exist

  • Tables and columns
  • Primary and foreign keys
  • Relationships and constraints
  • Business rules

Physical Design

How data is stored and accessed

  • Storage engines and file formats
  • Indexes and partitioning
  • Hardware and memory allocation
  • Query execution paths

A music catalog such as Spotify's shows the split. Logically, songs relate to artists, albums, and playlists. Physically, popular songs might be cached in memory across multiple data centers, new releases might live on faster SSD storage, and archive tracks from 1950 could sit on cheaper magnetic drives.

Storage Engine Selection

Your first major decision: which storage engine powers each table. Think of storage engines as different filing systems for your data.

Engine | Best For | BookVault Use Case | Trade-offs
InnoDB | Transactions, consistency | orders, order_items | Higher storage and memory overhead
MyISAM | Read-heavy tables, full-text search | books (search catalog) | No transactions, table-level locking, no crash recovery
Memory | Ultra-fast temporary data | user_sessions, shopping_cart | Data lost on restart
Archive | Compressed long-term storage | order_history, audit_logs | No UPDATE or DELETE, slow queries

BookVault's orders table absolutely needs InnoDB. Why? When a customer buys three books, all three order_items must save together or not at all. No partial orders allowed. InnoDB provides these transaction guarantees.
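
A minimal sketch of that all-or-nothing behavior. The order_items columns (order_id, book_id, quantity, unit_price) and the customer and book IDs are assumptions made for illustration; the lesson does not define them.

```sql
-- Assumed for illustration: order_items(order_id, book_id, quantity, unit_price)
START TRANSACTION;

INSERT INTO orders (customer_id, status, total_amount)
VALUES (42, 'pending', 59.97);

-- Capture the new order's ID for the line items
SET @order_id = LAST_INSERT_ID();

INSERT INTO order_items (order_id, book_id, quantity, unit_price) VALUES
    (@order_id, 101, 1, 19.99),
    (@order_id, 205, 1, 24.99),
    (@order_id, 310, 1, 14.99);

-- COMMIT makes the order row and its three items permanent together
COMMIT;
```

If any statement fails, issuing ROLLBACK instead of COMMIT discards the order row and every item. On a MyISAM table the same statements would run, but each INSERT would become permanent immediately.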

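But the books table for browsing? MyISAM can be an option. Customers search book titles constantly and rarely update book records, and MyISAM is built for read-heavy workloads and full-text search. Weigh that against its costs: MyISAM locks the whole table on every write and has no crash recovery, and since MySQL 5.6 InnoDB supports FULLTEXT indexes too. Benchmark both engines before committing.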

-- Choose storage engines based on access patterns
-- BookVault orders need transactions
CREATE TABLE orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status ENUM('pending', 'paid', 'shipped', 'delivered'),
    total_amount DECIMAL(10,2)
) ENGINE=InnoDB;

-- Books table optimized for search performance
CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author_id INT NOT NULL,
    category_id INT NOT NULL,
    price DECIMAL(8,2),
    stock_qty INT DEFAULT 0,
    published_year INT,
    FULLTEXT(title)
) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)

Query OK, 0 rows affected (0.03 sec)

What just happened?

We created two tables with different storage engines based on their usage patterns. The orders table uses InnoDB for transaction safety, while books uses MyISAM for read-heavy search queries. Try this: Add a description column to books and give it its own FULLTEXT index so searches can match more than the title.
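
To check the result, the following sketch confirms the engine behind each table and runs a search through the FULLTEXT index on title. It assumes the schema is named bookvault; the search phrase is only an example.

```sql
-- Confirm which engine each table uses
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'bookvault'
  AND table_name IN ('orders', 'books');

-- Search titles through the FULLTEXT index on books.title
SELECT book_id, title,
       MATCH(title) AGAINST('database design' IN NATURAL LANGUAGE MODE) AS relevance
FROM books
WHERE MATCH(title) AGAINST('database design' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC
LIMIT 10;
```

On a very small MyISAM test table a natural-language search can return nothing, because words that appear in more than half of the rows are ignored.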

Data Type Optimization

Every column needs the smallest data type that safely holds its values. Sounds trivial, but this decision impacts every single query for the next five years.

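Picture a product catalog the size of Amazon's. Using VARCHAR(255) for every product name is convenient, but measuring the actual name lengths and choosing, say, VARCHAR(120) documents the real limit and rejects bad data. VARCHAR stores only the bytes each value uses, so the tighter limit saves little disk space by itself. The gains come from elsewhere: some in-memory operations, such as MEMORY-engine temporary tables, reserve space for the declared maximum, and shorter columns stay well inside index key-length limits.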
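
Measuring comes before choosing. A short sketch of that profiling step, run against the books table defined earlier in this lesson:

```sql
-- Measure what the data actually contains before picking column sizes
SELECT
    MAX(CHAR_LENGTH(title)) AS longest_title,
    AVG(CHAR_LENGTH(title)) AS average_title_length,
    MAX(stock_qty)          AS highest_stock,
    MIN(published_year)     AS earliest_year,
    MAX(published_year)     AS latest_year
FROM books;
```

Leave headroom above the observed maximums. Today's longest title or largest stock count is a lower bound, not a guarantee about future data.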

Poor Data Types

  • VARCHAR(255) everywhere
  • TEXT for short descriptions
  • BIGINT for small counters

Optimized Data Types

  • VARCHAR(120) for book titles
  • VARCHAR(500) for descriptions
  • TINYINT UNSIGNED for quantity (0-255)

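BookVault's stock_qty column is a good example. If BookVault never stocks more than 255 copies of a title, TINYINT UNSIGNED (1 byte) can replace INT (4 bytes) and save 3 bytes per row. With 100,000 books, that's 300KB in the table, and the saving repeats in every index that includes the column. Make sure the limit is real, though: in strict SQL mode, storing 256 in a TINYINT UNSIGNED column fails with an out-of-range error.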

-- Optimize data types for actual usage patterns
-- BookVault books with right-sized columns
CREATE TABLE books_optimized (
    book_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(120) NOT NULL,            -- Most titles under 120 chars
    author_id MEDIUMINT UNSIGNED NOT NULL,  -- Up to 16M authors
    category_id TINYINT UNSIGNED NOT NULL,  -- Max 255 categories
    price DECIMAL(6,2) CHECK (price >= 0),  -- $9999.99 max; UNSIGNED is deprecated for DECIMAL (MySQL 8.0.17+)
    stock_qty TINYINT UNSIGNED DEFAULT 0,   -- 0-255 copies
    published_year YEAR,                    -- 1 byte, but valid only for 1901-2155
    isbn CHAR(13),                          -- ISBN-13 is always 13 characters
    page_count SMALLINT UNSIGNED            -- Up to 65535 pages
);
Query OK, 0 rows affected (0.04 sec)

Data Insight

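For the columns it shares with the earlier books table, this schema cuts the fixed-width bytes per row from 24 to 13 by shrinking the INT columns to MEDIUMINT, TINYINT, and YEAR, and DECIMAL(8,2) to DECIMAL(6,2). With 1 million books, that saves about 11MB in the table alone, and smaller rows mean more of them fit in each page held in memory.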
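
Rather than trusting estimates, you can ask MySQL how much space each version occupies once both tables hold data. This sketch assumes the schema is named bookvault:

```sql
-- Compare on-disk size of the original and right-sized tables
SELECT table_name,
       table_rows,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'bookvault'
  AND table_name IN ('books', 'books_optimized');
```

For InnoDB tables, table_rows is an estimate, so treat it as a rough guide and rely on the size columns for the comparison.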

Partitioning Strategy

Partitioning splits large tables into smaller, more manageable pieces. Imagine your filing cabinet getting too full. You could split it by year, department, or alphabet. Database partitioning works similarly.

BookVault's orders table grows rapidly. Every day adds thousands of new orders. After two years, queries slow down because the database must scan millions of rows. Partitioning by month lets queries ignore irrelevant historical data.

Partition Type | How It Works | BookVault Example | Query Benefit
Range | Split by value ranges | orders by order_date | Skip old months
Hash | Distribute evenly by hash | customers by customer_id | Even data distribution
List | Split by specific values | books by category_id | Category-specific queries
Composite | Partitions split again into subpartitions | orders by date + customer | Complex query optimization

-- Partition orders table by month for better performance
-- BookVault orders growing rapidly over time
CREATE TABLE orders_partitioned (
    order_id INT AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    status ENUM('pending', 'paid', 'shipped', 'delivered'),
    total_amount DECIMAL(10,2),
    PRIMARY KEY (order_id, order_date)
) ENGINE=InnoDB
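-- RANGE COLUMNS on the raw DATE lets MySQL prune partitions for date-range filters;
-- an expression such as YEAR(d) * 100 + MONTH(d) is only pruned for exact-date matches
PARTITION BY RANGE COLUMNS (order_date) (
    PARTITION p202301 VALUES LESS THAN ('2023-02-01'),
    PARTITION p202302 VALUES LESS THAN ('2023-03-01'),
    PARTITION p202303 VALUES LESS THAN ('2023-04-01'),
    PARTITION p202304 VALUES LESS THAN ('2023-05-01'),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)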
);
Query OK, 0 rows affected (0.08 sec)

What just happened?

We created a partitioned orders table that splits data by month. Queries that filter on order_date can skip partitions that cannot contain matching rows, which is called partition pruning. Notice the composite primary key includes order_date: MySQL requires every primary or unique key on a partitioned table to include the partitioning column. Try this: Run EXPLAIN on a query for March 2023 orders and check which partitions MySQL reads.
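
One way to run that check, as a sketch. It assumes the schema is named bookvault and MySQL 5.7 or later, where plain EXPLAIN includes a partitions column (older versions need EXPLAIN PARTITIONS).

```sql
-- The "partitions" column of the EXPLAIN output lists the partitions MySQL will read
EXPLAIN
SELECT order_id, customer_id, total_amount
FROM orders_partitioned
WHERE order_date >= '2023-03-01'
  AND order_date <  '2023-04-01';

-- Row counts per partition (estimates for InnoDB)
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = 'bookvault'
  AND table_name = 'orders_partitioned'
ORDER BY partition_ordinal_position;
```

If the EXPLAIN output lists every partition, the filter is not being pruned, and the partitioning expression or the WHERE clause needs another look.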

Memory and Buffer Configuration

Your database server has limited RAM. How you allocate this memory between different buffers determines query performance. Get this wrong and even perfect schemas run slowly.

Think of database memory like a restaurant kitchen. You need space for ingredients (data pages), prep work (sort buffers), and completed dishes (query caches). Too little prep space and chefs work slowly. Too little storage and ingredients spoil.

  • InnoDB Buffer Pool: caches data and index pages in memory. BookVault: 70% of available RAM
  • Query Cache: stores results of repeated queries (MySQL 5.7 and earlier only; removed in MySQL 8.0). BookVault: 256MB for book searches
  • Sort Buffer: handles ORDER BY operations. BookVault: 4MB per connection
  • Join Buffer: speeds up joins that cannot use an index. BookVault: 2MB for complex queries

BookVault's server has 16GB RAM. We allocate 11GB to InnoDB buffer pool because most queries hit the books and orders tables. The more frequently accessed data stays in memory, the faster queries run.

Common Memory Mistake

Setting innodb_buffer_pool_size larger than available RAM causes swapping to disk, making queries slower than no buffer at all. Always leave 2GB+ for the operating system and other processes.
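
A sketch of how these settings could be applied and sanity-checked on a self-managed server using BookVault's numbers. It assumes MySQL 5.7.5 or later (where the buffer pool can be resized at runtime) and an account allowed to set global variables. On managed services such as RDS, the same values are set through parameter groups instead.

```sql
-- Inspect the current setting
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- 11GB buffer pool: 11 * 1024 * 1024 * 1024 bytes
SET GLOBAL innodb_buffer_pool_size = 11811160064;

-- Defaults for new connections
SET GLOBAL sort_buffer_size = 4194304;   -- 4MB
SET GLOBAL join_buffer_size = 2097152;   -- 2MB

-- Rough upper bound if every allowed connection used its sort and join buffers at once
SELECT ROUND(
         (@@global.innodb_buffer_pool_size
          + @@global.max_connections
            * (@@global.sort_buffer_size + @@global.join_buffer_size))
         / 1024 / 1024 / 1024, 2) AS worst_case_gb;

-- Compare logical read requests with reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

If worst_case_gb approaches the server's physical RAM, lower the per-connection buffers or max_connections before raising the buffer pool.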

Hardware Considerations

Your schema design must match your hardware reality. SSD versus magnetic drives. Cloud versus dedicated servers. Single machine versus distributed cluster. Each choice affects how you structure data.

Netflix learned this lesson scaling from DVDs to streaming. Their original database assumed local disk storage. But streaming requires global distribution. They completely redesigned their schemas around cloud infrastructure and geographic regions.

Hardware Type | Characteristics | Schema Impact | BookVault Decision
SSD Storage | Fast random access | More indexes acceptable | Aggressive indexing strategy
Magnetic HDD | Sequential reads faster | Minimize indexes, cluster data | Partition by access patterns
Cloud Database | Auto-scaling, managed | Design for horizontal scaling | Use managed services
Dedicated Server | Full control, fixed resources | Optimize for specific hardware | Tune for known capacity

BookVault runs on AWS RDS with SSD storage. This enables aggressive indexing without worrying about disk seek times. We can add indexes on book titles, author names, and categories because SSD handles multiple index lookups efficiently. Every index still adds work to each INSERT and UPDATE, so create them for queries you actually run.
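
A brief sketch of what that could look like on the books table from earlier. The index names and the category value are illustrative assumptions.

```sql
-- Secondary indexes for common browse filters
CREATE INDEX idx_books_author   ON books (author_id);
CREATE INDEX idx_books_category ON books (category_id);

-- Check whether a category browse query picks up the new index
EXPLAIN
SELECT book_id, title, price
FROM books
WHERE category_id = 7
ORDER BY title
LIMIT 20;

-- List the indexes defined on the table
SHOW INDEX FROM books;
```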

Security at the Physical Layer

Physical design includes security decisions that logical schemas cannot address. Data encryption, user privileges, network access, and audit logging all require specific implementation choices.

A payment processor such as Stripe is a useful mental model. The logical schema shows customer and payment tables. Physically, credit card numbers would live in encrypted storage with separate access keys, and payment processing and customer service would see different views of the same logical data.

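-- Add columns to hold encrypted values for sensitive data
-- (the encryption itself happens in the application or with functions such as AES_ENCRYPT)
-- BookVault customer personal information
ALTER TABLE customers
ADD COLUMN email_encrypted VARBINARY(255),
ADD COLUMN phone_encrypted VARBINARY(255);

-- Create separate user accounts with limited privileges
-- '%' accepts connections from any host; restrict it to your application servers in production
CREATE USER 'bookvault_app'@'%' IDENTIFIED BY 'secure_password';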
GRANT SELECT, INSERT, UPDATE ON bookvault.orders TO 'bookvault_app'@'%';
GRANT SELECT ON bookvault.books TO 'bookvault_app'@'%';
GRANT SELECT ON bookvault.customers TO 'bookvault_app'@'%';

-- Analytics user cannot see sensitive customer data
CREATE USER 'bookvault_analytics'@'%' IDENTIFIED BY 'analytics_password';
GRANT SELECT ON bookvault.books TO 'bookvault_analytics'@'%';
GRANT SELECT ON bookvault.categories TO 'bookvault_analytics'@'%';
GRANT SELECT(order_id, order_date, total_amount) ON bookvault.orders TO 'bookvault_analytics'@'%';
Query OK, 1 row affected (0.12 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.01 sec)
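
The ALTER TABLE statement only creates the columns; something still has to write encrypted values into them. A minimal sketch using MySQL's built-in AES_ENCRYPT and AES_DECRYPT functions, assuming customers has a customer_id column and an existing plaintext email column (neither is defined in this lesson) and that the key comes from outside the database:

```sql
-- Assumption: the key is supplied by the application, never stored in the database
SET @encryption_key = 'replace-with-key-from-your-secrets-manager';

-- Fill the encrypted column from the existing plaintext values
UPDATE customers
SET email_encrypted = AES_ENCRYPT(email, @encryption_key)
WHERE email IS NOT NULL;

-- Decrypt for a caller that holds the key
SELECT customer_id,
       CAST(AES_DECRYPT(email_encrypted, @encryption_key) AS CHAR) AS email
FROM customers
WHERE customer_id = 42;

-- Verify what each account is allowed to do
SHOW GRANTS FOR 'bookvault_app'@'%';
SHOW GRANTS FOR 'bookvault_analytics'@'%';
```

Many teams encrypt in the application layer instead, so the key never reaches the database server. Treat this as one option, not the only design.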

Physical security extends beyond encryption. Consider network security groups, database firewalls, and audit logging. Your production database should be accessible only from your application servers, never directly from the internet.

Physical database design transforms your elegant logical schema into a production-ready system. Every decision from storage engines to memory allocation affects real user experience. BookVault customers don't care about your normalization theory. They care whether book searches return results in under 200 milliseconds.

But physical design isn't a one-time decision. Traffic patterns change. New features require different access patterns. Hardware costs shift. The best database architects continuously monitor and adjust their physical implementations while keeping logical schemas stable.
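
For the monitoring side, one possible starting point is the Performance Schema statement digest table. This sketch assumes the Performance Schema is enabled and the schema is named bookvault.

```sql
-- Statements that consume the most total time (timer values are in picoseconds)
SELECT digest_text,
       count_star                      AS executions,
       ROUND(avg_timer_wait / 1e9, 1)  AS avg_ms,
       ROUND(sum_timer_wait / 1e12, 1) AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
WHERE schema_name = 'bookvault'
ORDER BY sum_timer_wait DESC
LIMIT 10;
```

Statements whose avg_ms sits above the 200 millisecond target are the first candidates for a new index, a different partitioning scheme, or more buffer pool memory.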

Quiz

1. BookVault needs to optimize storage engines for their main tables. Which combination makes the most sense for orders and books tables?


2. BookVault's stock_qty column stores book inventory counts, typically ranging from 0 to 200 copies. What's the most efficient data type choice?


3. BookVault's orders table grows by 10,000+ rows monthly and most queries search recent orders. What partitioning strategy provides the best query performance?


Up Next

Indexing Strategy

Learn how to choose the right indexes to make your physical database design fly with lightning-fast query performance.