DataBase Design Lesson 34 – Database Design for Microservices | Dataplexa

Database Design · Lesson 34

Database Design for Microservices

Design separate databases for each microservice, implement data sharing patterns, and handle distributed transactions across BookVault's order management system.

Understanding Microservices Architecture

Traditional applications use one big database. Microservices break that apart. Each service owns its data completely.

Think of Netflix. They don't store user profiles, movie recommendations, and billing in one massive database. Each team owns their slice. The recommendation service has its own database. The billing service has its own. And payment processing? Separate again.

This creates a problem. How do you handle an order that needs customer data AND book inventory data AND payment processing? That's what we're solving today.

Monolith to Services

Database Per Service

Data Integration Patterns

Distributed Transactions

BookVault currently has all tables in one database. We'll split it into services: Customer Service, Catalog Service, Order Service, and Payment Service.

Database Per Service Pattern

Each microservice gets its own database. No sharing. No cross-service SQL joins. This sounds crazy at first. But it works.

Here's how BookVault splits across services:

Customer Service DB

        customers

        customer_preferences

        customer_addresses
      
Catalog Service DB

        books

        authors

        categories

        inventory
      
Order Service DB

        orders

        order_items

        order_status_history
      
Payment Service DB

        payments

        payment_methods

        transactions

Each service can choose its own database technology. Customer Service might use PostgreSQL. Catalog Service could use MongoDB for flexible book metadata. Order Service needs ACID transactions, so PostgreSQL again. Payment Service might use a specialized financial database.

Creating Service Schemas

Let's build the Order Service database. It needs to store orders without direct foreign keys to customers or books:

-- Order Service Database Schema
-- Stores order data with external references

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,  -- Reference to Customer Service
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2) NOT NULL,
    shipping_address TEXT NOT NULL
);

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    book_id INTEGER NOT NULL,  -- Reference to Catalog Service
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    book_title VARCHAR(255),   -- Cached from Catalog Service
    book_author VARCHAR(255)   -- Cached from Catalog Service
);

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(order_date);

CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX

What just happened?

Notice customer_id and book_id are integers, not foreign keys. We cache book_title and book_author to avoid constant API calls. Try this: Add a customer_email field for cached customer data.

Data Sharing Patterns

Services can't do database joins anymore. They need other ways to get data from each other. Three main patterns solve this.

Pattern 1: API Calls

Services call each other's APIs to get data. When Order Service processes an order, it calls Customer Service to get the shipping address:

-- Order Service needs customer data
-- But customer table is in Customer Service
-- Solution: API call, then store the order

INSERT INTO orders (
    customer_id,
    total_amount,
    shipping_address
) VALUES (
    12345,
    89.99,
    'Retrieved via Customer Service API'
);

Pattern 2: Data Caching

Store copies of frequently-needed data locally. Order Service caches book titles and prices to avoid calling Catalog Service every time:

-- Cached book data in Order Service
-- Updated when Catalog Service publishes changes

CREATE TABLE book_cache (
    book_id INTEGER PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author_name VARCHAR(255),
    current_price DECIMAL(10,2),
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Use cached data for order creation
INSERT INTO order_items (
    order_id, book_id, quantity, unit_price, book_title
) 
SELECT 1, 101, 2, current_price, title
FROM book_cache 
WHERE book_id = 101;

Pattern 3: Event Streaming

Services publish events when data changes. Other services listen and update their local copies. Kafka and RabbitMQ handle this messaging.

-- Event log table for Order Service
-- Tracks all events this service publishes

CREATE TABLE order_events (
    event_id SERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL,  -- 'order_created', 'order_shipped'
    order_id INTEGER NOT NULL,
    event_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published BOOLEAN DEFAULT FALSE
);

-- Example: Order created event
INSERT INTO order_events (event_type, order_id, event_data)
VALUES (
    'order_created',
    1001,
    '{"customer_id": 12345, "total": 89.99, "items": [{"book_id": 101, "qty": 2}]}'
);

CREATE TABLE
INSERT 0 1

Pro Tip: Use JSONB for event data in PostgreSQL. It's indexable and flexible for different event types. Each service can extract the fields it needs from the event payload.

Distributed Transactions

Here's the big problem. An order needs to:

1. Create the order record
2. Reduce book inventory
3. Charge the customer
4. Send confirmation email

Each step hits a different service. What if step 3 fails but steps 1 and 2 succeeded? You've got an order and reduced inventory, but no payment. Disaster.

Saga Pattern

Break the transaction into steps. Each step can succeed or fail. If any step fails, run compensating actions to undo previous steps. Think of it like a carefully choreographed dance.

-- Saga orchestration table in Order Service
-- Tracks multi-step transaction progress

CREATE TABLE order_saga (
    saga_id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL,
    current_step VARCHAR(50) NOT NULL,
    saga_status VARCHAR(20) DEFAULT 'in_progress',
    steps_completed JSONB DEFAULT '[]',
    compensation_needed JSONB DEFAULT '[]',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Example saga record for order processing
INSERT INTO order_saga (order_id, current_step, steps_completed)
VALUES (
    1001,
    'inventory_reserved',
    '["order_created", "inventory_reserved"]'
);

When payment fails, the saga runs compensating actions:

-- Compensation tracking when saga fails
-- Records what needs to be undone

UPDATE order_saga 
SET 
    saga_status = 'compensating',
    compensation_needed = '[
        {"action": "release_inventory", "book_id": 101, "quantity": 2},
        {"action": "cancel_order", "order_id": 1001}
    ]'
WHERE order_id = 1001;

-- Cancel the order
UPDATE orders 
SET status = 'cancelled' 
WHERE order_id = 1001;

UPDATE 1
UPDATE 1

What just happened?

The saga marked itself as compensating and listed the undo actions needed. Each service will process its compensation step. The order status changed to cancelled. Try this: Add a retry mechanism for failed payment steps.

Data Consistency Challenges

Microservices trade strong consistency for availability and scalability. Eventually consistent becomes your new reality.

A customer might see their order as "confirmed" while the inventory service still shows the book as "available" to other customers. The data will sync up, but not immediately.

Strong Consistency

All services see the same data at the same time. ACID transactions across services. Slow but predictable.

Eventual Consistency

Services sync data over time through events. Fast and scalable. Data might be temporarily out of sync.

Handling Data Conflicts

What happens when two services have conflicting data? Build conflict resolution into your schema:

-- Conflict resolution with version numbers
-- Each update increments the version

ALTER TABLE book_cache 
ADD COLUMN version_number INTEGER DEFAULT 1,
ADD COLUMN source_service VARCHAR(50) DEFAULT 'catalog';

-- Update only if version is newer
UPDATE book_cache 
SET 
    current_price = 24.99,
    version_number = version_number + 1,
    last_updated = CURRENT_TIMESTAMP
WHERE 
    book_id = 101 
    AND version_number < 5;  -- Only if our version is older

Common Mistake

Don't replicate ALL data everywhere. Only cache what you frequently need. Replicating entire customer records in every service creates a maintenance nightmare when customer data changes.

Data Insight

Amazon processes over 66,000 orders per hour during peak times. Their order processing spans 12+ microservices. Each service handles one piece: inventory, pricing, shipping, recommendations. The key? They cache aggressively and accept eventual consistency.

Implementation Strategy

Don't split everything at once. That's a recipe for disaster. Start small and grow.

For BookVault, split in this order:

Phase 1: Read-Only

Split Catalog Service first. Books and authors are mostly read-only. Low risk.

Phase 2: User Data

Extract Customer Service. Users can update profiles independently.

Phase 3: Transactions

Split Order Service. Implement saga pattern for order processing.

Phase 4: Critical Systems

Extract Payment Service last. Money is sensitive. Test thoroughly.

Uber took 3 years to fully decompose their monolith. Airbnb is still migrating services 6 years after starting. Patience beats perfection.

Quiz

Up Next

Mini Project

Design a complete database system for an e-learning platform, applying all the concepts from normalization to microservices architecture.

← Previous Course Index Next →