NoSQL Lesson 33 – NoSQL in Microservices | Dataplexa
Enterprise & Cloud · Lesson 33

NoSQL in Microservices

Microservices and NoSQL were practically born together. Both emerged from the same frustration: a single monolithic system — one codebase, one database, one deployment — that could not scale independently, could not be updated without risk, and could not adopt the best tool for each problem. When you split a monolith into services, you do not just split the code. You split the data too — and the choices you make about which database each service gets determine whether you build something that scales, or a distributed monolith that is harder to operate than what you started with.

Database per Service — The Core Pattern

The foundational rule of microservices data architecture is that each service owns its own database. No service reads directly from another service's database. No service shares a database schema with another service. Data crosses service boundaries only through APIs — never through a shared table or a shared connection pool.

This sounds strict. It is. The constraint is what makes independent deployment possible. If Service A and Service B share a database table, you cannot deploy A without coordinating with B. Schema changes become cross-team negotiations. The boundary that was supposed to give you independence disappears.

❌ Shared database — hidden coupling

// Order service reads user table directly
SELECT * FROM users WHERE id = ?
// Recommendation service reads order table
SELECT * FROM orders WHERE user_id = ?

// Problem: every team must coordinate
// schema changes. Deploy together or
// break each other. Distributed monolith.

✅ Database per service — true independence

// Order service calls User service API
GET /users/{id} → { name, email }
// Recommendation service calls Order API
GET /orders?user_id={id} → [...]

// Each service owns its DB schema.
// Deploy independently. Scale separately.
// Choose the right DB per service.

Polyglot Persistence in Practice — Matching Database to Service

Once each service owns its own database, you are free to choose the right database for each service's specific access patterns — rather than forcing everything into one general-purpose database. This is polyglot persistence applied at service granularity.

E-commerce Platform — Each Service Gets the Right Database

User Service

PostgreSQL

ACID required. Auth, profile, addresses. Structured, relational, transactional.

Product Service

MongoDB

Flexible schema per category. Rich attribute queries. Schema evolves fast.

Session Service

Redis

Sub-ms reads. TTL for expiry. Cart, tokens, rate-limit counters.

Event Service

Cassandra

Millions of writes/sec. Clickstream, page views, A/B events. TTL for retention.

Recommendation Service

Neo4j

Purchase graph traversal. Collaborative filtering. Multi-hop relationships.

Order Service

PostgreSQL

Multi-step transactions. Inventory debit + order creation must be ACID.

The Data Consistency Challenge — Cross-Service Transactions

Database per service solves deployment independence but creates a new problem: there is no ACID transaction that spans service boundaries. When a customer places an order, you need to: debit inventory in the Product service, create the order in the Order service, and charge the payment in the Payment service — three operations across three services with three different databases. Any of them can fail independently.

The two patterns for handling this are the Saga pattern (covered in Lesson 30) and the Outbox pattern. The Outbox pattern is particularly important for NoSQL-backed services because it solves the dual-write problem — the race condition between writing to your database and publishing an event to a message broker.

The Outbox Pattern — Solving Dual-Write

Every event-driven microservice faces the same problem: you need to write to your database and publish a message to Kafka (or another broker). If you do both in sequence, one can succeed while the other fails. The inventory is decremented but the event never reaches the Order service. Or the event is published but the inventory write fails. Either way, the system is inconsistent.
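
Before looking at the fix, it helps to see the race itself. A minimal sketch of the sequential dual write and its failure window, using hypothetical stand-ins `save_inventory` and `publish_event` (not real driver calls):

```python
# Dual-write race, sketched with stub functions.
# save_inventory and publish_event are illustrative placeholders.

def save_inventory(product_id: str, delta: int) -> None:
    print(f"DB write committed: {product_id} stock {delta:+d}")

def publish_event(topic: str, payload: dict) -> None:
    # Simulate a network blip between the service and the broker
    raise ConnectionError("broker unreachable")

def naive_decrement(product_id: str, quantity: int) -> bool:
    save_inventory(product_id, -quantity)      # step 1 succeeds...
    try:
        publish_event("inventory.updated",
                      {"product_id": product_id, "quantity": quantity})
    except ConnectionError:
        # ...step 2 fails. The DB change is already durable, but no
        # downstream service will ever hear about it. Silent drift.
        return False
    return True

naive_decrement("prod_trainers_xl", 1)   # DB updated, event lost
```

There is no way to order the two writes that avoids this: whichever runs second can fail after the first has committed.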

The scenario: You are building the Product service. When inventory is decremented, an inventory.updated event must reach the Order service reliably. A network blip between your MongoDB and Kafka means the event is sometimes lost. You are implementing the Outbox pattern to make event publishing atomic with the database write.

from pymongo import MongoClient
import datetime, uuid

client = MongoClient("mongodb://localhost:27017/")
db     = client["product_service"]

def decrement_inventory(product_id: str, quantity: int):
    """
    Outbox pattern: write the business state change AND the outbox event
    in a single MongoDB transaction. The relay process reads the outbox
    and publishes to Kafka — separately and reliably.
    """
    # Note: multi-document transactions require MongoDB to run as a
    # replica set or sharded cluster; they are unavailable on a standalone.
    with client.start_session() as session:
        session.start_transaction()
        try:
            # Step 1: update inventory
            result = db.inventory.update_one(
                {"_id": product_id, "stock": {"$gte": quantity}},
                {"$inc": {"stock": -quantity}},
                session=session
            )
            if result.modified_count == 0:
                raise ValueError("Insufficient stock")

            # Step 2: write event to outbox collection (same transaction)
            db.outbox.insert_one({
                "_id":        str(uuid.uuid4()),
                "topic":      "inventory.updated",
                "payload":    {"product_id": product_id,
                               "quantity_decremented": quantity},
                "created_at": datetime.datetime.utcnow(),
                "published":  False     # relay will set this True after Kafka ACK
            }, session=session)

            session.commit_transaction()
            return {"ok": True}

        except Exception as e:
            session.abort_transaction()
            return {"ok": False, "reason": str(e)}
>>> decrement_inventory("prod_trainers_xl", 1)
{'ok': True}
Inventory decremented: prod_trainers_xl  stock -1
Outbox event written: inventory.updated  (published: False)
Both writes in one transaction — either both commit or neither does  ✓

>>> decrement_inventory("prod_trainers_xl", 9999)
{'ok': False, 'reason': 'Insufficient stock'}
Transaction aborted: inventory unchanged, no outbox event written  ✓
Outbox event in the same transaction as the business write

The key insight: the outbox event is written to a MongoDB collection, not directly to Kafka. Since both the inventory update and the outbox insert are inside the same MongoDB transaction, they are atomic — either both commit or neither does. The dual-write problem is eliminated at the database level. Kafka never receives a direct write from the business logic.

published: False — the relay picks this up

A separate relay process (often called the Outbox Relay or Transactional Outbox Processor) polls the outbox collection for events where published: False, publishes them to Kafka, then sets published: True. If the relay crashes between publishing and updating the flag, it will re-publish on restart — at-least-once delivery. Consumers must be idempotent to handle duplicates.

The scenario continues: You implement the relay process that polls the outbox collection and forwards events to Kafka. It runs as a separate lightweight service alongside the Product service.

import time
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

def outbox_relay():
    """Poll outbox for unpublished events and forward to Kafka."""
    while True:
        # Find up to 100 unpublished events, oldest first
        pending = list(db.outbox.find(
            {"published": False},
            sort=[("created_at", 1)],
            limit=100
        ))

        for event in pending:
            # Publish to Kafka topic
            producer.send(event["topic"], value=event["payload"])
            producer.flush()   # wait for broker ACK

            # Mark as published only after Kafka confirms receipt
            db.outbox.update_one(
                {"_id": event["_id"]},
                {"$set": {"published": True,
                          "published_at": datetime.datetime.utcnow()}}
            )

        time.sleep(0.1)   # poll every 100ms
Outbox relay started — polling every 100ms

[10:42:01] Found 1 unpublished event
  → Kafka: inventory.updated  {"product_id": "prod_trainers_xl", ...}
  → Kafka ACK received
  → outbox.published = True  ✓

[10:42:01] No pending events — sleeping 100ms

Guarantee: if relay crashes before marking published=True,
the event is re-sent on restart (at-least-once delivery).
producer.flush() before marking published

producer.flush() blocks until Kafka acknowledges the message. Only then does the relay update published: True. If the relay crashes between the Kafka send and the MongoDB update, the event has already reached Kafka but published is still False — the relay will re-send on restart, and downstream consumers must handle the duplicate. This is at-least-once delivery: the event is guaranteed to arrive, but may arrive more than once.
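
Since duplicates are possible, every consumer needs an idempotency check. A minimal in-memory sketch, assuming the relay forwards the outbox `_id` as an `event_id` field (names are illustrative; a production service would persist processed ids rather than hold them in memory):

```python
# Idempotent consumer sketch: deduplicate on the outbox event id.

processed_ids = set()                          # seen event ids
stock_view = {"prod_trainers_xl": 43}          # local read model

def handle_inventory_updated(event: dict) -> bool:
    """Apply the event once; silently skip redeliveries."""
    if event["event_id"] in processed_ids:
        return False                           # duplicate: already applied
    pid = event["payload"]["product_id"]
    stock_view[pid] -= event["payload"]["quantity_decremented"]
    processed_ids.add(event["event_id"])       # record only after applying
    return True

evt = {"event_id": "e-001",
       "payload": {"product_id": "prod_trainers_xl",
                   "quantity_decremented": 1}}
handle_inventory_updated(evt)   # applied: stock 43 -> 42
handle_inventory_updated(evt)   # redelivery: ignored, stock stays 42
```

With this check in place, at-least-once delivery from the relay behaves like exactly-once processing from the consumer's point of view.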

Event Sourcing with NoSQL — Commands and Events

Some microservices go further than the Outbox pattern and adopt event sourcing — storing the full history of changes as an immutable event log rather than storing just the current state. The current state is always derived by replaying the event log. This pairs naturally with NoSQL document stores and Cassandra's append-heavy write model.

The scenario: Your Order service uses event sourcing. Every state change to an order — created, confirmed, shipped, delivered, refunded — is stored as an immutable event. The current order state is computed by replaying all events for that order ID.

# Event store in MongoDB — append-only, never update or delete
def append_event(order_id: str, event_type: str, data: dict):
    db.order_events.insert_one({
        "order_id":   order_id,
        "event_type": event_type,
        "data":       data,
        "occurred_at": datetime.datetime.utcnow(),
        # Note: deriving the version from a count is racy under concurrent
        # writers; production event stores use optimistic locking on
        # (order_id, version) instead.
        "version":    db.order_events.count_documents(
                          {"order_id": order_id}) + 1
    })

# Record the order lifecycle
append_event("ord_9912", "OrderCreated",
    {"customer_id": "cust_441", "items": ["prod_001"], "total": 149.99})
append_event("ord_9912", "PaymentConfirmed",
    {"payment_id": "pay_7721", "amount": 149.99})
append_event("ord_9912", "OrderShipped",
    {"carrier": "DHL", "tracking": "DHL998821"})

# Rebuild current state by replaying all events
def get_order_state(order_id: str) -> dict:
    events = list(db.order_events.find(
        {"order_id": order_id},
        sort=[("version", 1)]
    ))
    state = {}
    for event in events:
        if event["event_type"] == "OrderCreated":
            state.update(event["data"])
            state["status"] = "created"
        elif event["event_type"] == "PaymentConfirmed":
            state["status"] = "paid"
        elif event["event_type"] == "OrderShipped":
            state.update(event["data"])
            state["status"] = "shipped"
    return state
>>> get_order_state("ord_9912")
{
  "customer_id": "cust_441",
  "items":       ["prod_001"],
  "total":       149.99,
  "status":      "shipped",
  "carrier":     "DHL",
  "tracking":    "DHL998821"
}

Full audit trail preserved — replay to any point in time ✓
Events: [OrderCreated v1, PaymentConfirmed v2, OrderShipped v3]
Append-only — never update or delete

The event store is a pure append log — no event is ever modified or deleted. This gives you a complete audit trail and the ability to replay history to any point in time. To issue a refund, you append a RefundIssued event — you do not update the OrderCreated event. This pairs naturally with Cassandra's LSM-tree write model, where sequential appends are the fastest possible operation.

Replay at scale — projections and snapshots

Replaying thousands of events per order to compute current state becomes expensive at scale. The standard fix is a snapshot — periodically persist the computed state after every N events, then only replay events since the last snapshot. For an order with 1,000 events, a snapshot every 100 events reduces replay to at most 99 events. Store snapshots in a separate collection alongside the event log.
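
The replay shortcut can be sketched with an in-memory event list (the numbers and field names here are illustrative, not tied to the Order service code above):

```python
# Snapshot sketch: replay only the events newer than the latest snapshot.

events = [{"version": v, "delta": 1} for v in range(1, 251)]   # 250 events
snapshot = {"version": 200, "state": {"stock": 200}}           # persisted earlier

def current_state() -> dict:
    """Start from the snapshot, then apply only the newer events."""
    state = dict(snapshot["state"])
    newer = [e for e in events if e["version"] > snapshot["version"]]
    for e in newer:
        state["stock"] += e["delta"]
    state["replayed"] = len(newer)   # 50 events replayed instead of 250
    return state

print(current_state())   # {'stock': 250, 'replayed': 50}
```

The snapshot is a pure optimisation: it can always be deleted and rebuilt from the event log, which remains the source of truth.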

Service Mesh and NoSQL Connection Management

In a Kubernetes-based microservices deployment, each service pod has its own database connection pool. With 50 pods running the Product service, each maintaining a pool of 20 MongoDB connections, the database receives 1,000 simultaneous connections just from one service — before the other 20 services connect. Connection management becomes a first-class concern at microservices scale.

Connection pooling per pod

Each pod maintains its own pool. Size the pool to match actual concurrency per pod — not the maximum the database can handle. 50 pods × 20 connections = 1,000 total connections. Monitor with db.serverStatus().connections.

ProxySQL / PgBouncer

A connection proxy sits between your pods and the database, multiplexing many application connections onto fewer actual database connections. Reduces the connection overhead on the database from 1,000 to 50 — a 20× reduction with no application changes.

Sidecar pattern

Deploy a database proxy as a sidecar container in each pod. The sidecar handles connection pooling, circuit breaking, and retry logic — keeping the main service container focused on business logic. Used by Istio, Envoy, and similar service mesh tools.

Circuit breaker

If the database becomes slow or unavailable, a circuit breaker stops sending requests to it and returns a fallback response immediately. Prevents cascading failures where one slow database causes all pods to block, exhausting threads across the entire service mesh.
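
A circuit breaker can be sketched in a few lines. This is an illustrative toy under simplified assumptions (consecutive-failure counting, a fixed cooldown), not the Istio/Envoy implementation:

```python
import time

class CircuitBreaker:
    """Open the circuit after N consecutive failures, reject calls
    for a cooldown period, then let one probe call through."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after  = reset_after
        self.failures     = 0
        self.opened_at    = None   # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()      # fail fast: don't touch the DB
            self.opened_at = None      # cooldown over: half-open probe
        try:
            result = fn()
            self.failures = 0          # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()

breaker = CircuitBreaker(max_failures=2, reset_after=30.0)

def flaky_query():
    raise TimeoutError("db slow")

for _ in range(3):
    print(breaker.call(flaky_query, fallback=lambda: {"stock": "unknown"}))
# After 2 failures the circuit opens; the 3rd call never reaches the DB.
```

Returning a degraded fallback (`{"stock": "unknown"}`) is what keeps request threads from blocking on a database that is already struggling.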

Hands-on — Wiring a Microservice with MongoDB and Kafka

The scenario: You are wiring up the Product service for the e-commerce platform. The service exposes a REST API, uses MongoDB for its product catalogue, and publishes events to Kafka when products change. You are implementing the service entry point with proper connection management and the Outbox pattern already in place.

from fastapi import FastAPI, HTTPException
from pymongo import MongoClient
from contextlib import asynccontextmanager
import os

# ---- Startup / shutdown connection lifecycle ----
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open pool once at startup — not per-request
    app.state.mongo = MongoClient(
        os.environ["MONGO_URI"],
        maxPoolSize=20,          # sized to pod concurrency
        minPoolSize=5,
        serverSelectionTimeoutMS=3000,
        waitQueueTimeoutMS=2000  # fail fast, don't queue forever
    )
    app.state.db = app.state.mongo["product_service"]
    yield
    # Clean shutdown — return connections to pool
    app.state.mongo.close()

app = FastAPI(lifespan=lifespan)

@app.get("/products/{product_id}")
async def get_product(product_id: str):
    doc = app.state.db.products.find_one(
        {"_id": product_id},
        {"_id": 1, "name": 1, "price": 1, "stock": 1}  # projection
    )
    if not doc:
        raise HTTPException(status_code=404, detail="Not found")
    return doc

@app.post("/products/{product_id}/purchase")
async def purchase(product_id: str, quantity: int):
    result = decrement_inventory(product_id, quantity)
    if not result["ok"]:
        raise HTTPException(status_code=409, detail=result["reason"])
    return {"purchased": True}
GET /products/prod_trainers_xl
  → {"_id": "prod_trainers_xl", "name": "Trainers XL",
     "price": 89.99, "stock": 43}
  Response time: 2.1ms

POST /products/prod_trainers_xl/purchase?quantity=1
  → {"purchased": true}
  Inventory decremented, outbox event written  ✓

POST /products/prod_sold_out/purchase?quantity=1
  → 409 Conflict: "Insufficient stock"
  Transaction aborted, no event written  ✓
MongoClient created once in lifespan — not per request

The lifespan context manager is FastAPI's lifecycle hook — it runs once at startup and once at shutdown. Creating the MongoClient here instead of inside a request handler means one connection pool is shared across all requests on this pod. The most common microservices MongoDB bug is creating a new MongoClient on every request — exhausting file descriptors under any meaningful load within seconds.

Projection on the GET handler

The product endpoint projects only four fields — _id, name, price, stock. Product documents in production often contain hundreds of specification fields. Without the projection, the database transfers the entire document over the network, the driver deserialises it, and the response serialiser re-serialises it — all for data the client never requested. At high request volume, projections reduce database I/O, network bandwidth, and CPU in the application server simultaneously.

Teacher's Note

The two mistakes that turn a well-designed microservices architecture into an operational nightmare are sharing a database between services (destroying the independence that microservices were built to provide) and skipping the Outbox pattern (creating silent data loss when any service-to-broker write fails). Both are easy to do in prototypes and expensive to fix six months later when inconsistencies surface in production. Get these two right from the start and the rest of the architecture will be much easier to evolve.

Practice Questions — You're the Engineer

Scenario:

Your inventory service writes a stock decrement to MongoDB and then publishes an event to Kafka. Over the past month, monitoring revealed 38 cases where the MongoDB write succeeded but the Kafka publish failed silently — leaving downstream services with stale inventory data. Your tech lead says to stop publishing directly to Kafka from business logic and instead write the event to a collection in the same MongoDB transaction as the inventory update, then use a separate relay process to forward it to Kafka. What pattern is this?


Scenario:

Your outbox relay process publishes an event to Kafka and then marks it as published: True in MongoDB. The relay crashes between the Kafka send and the MongoDB update. On restart, the relay finds the event still marked published: False and re-sends it to Kafka. The downstream consumer receives the same event twice. A senior engineer says this is expected behaviour for this delivery guarantee. What delivery guarantee does the outbox relay provide?


Scenario:

Your event-sourced Order service stores every state change as an immutable event. Orders from long-running enterprise customers now have 8,000+ events each. Rebuilding the current order state requires replaying all 8,000 events on every read — taking 800ms per request. Your tech lead says to periodically persist the computed state after every 100 events so that rebuilds only need to replay at most 99 events. What is this persisted computed state called?


Quiz — NoSQL in Microservices in Production

Scenario:

During an architecture review, you discover that the Order service and the Recommendation service both have direct MongoDB connections to the same database and both read from the orders collection. The Order service team wants to rename the created_at field to ordered_at. They cannot deploy the change without coordinating with the Recommendation team. What architectural violation is causing this coordination dependency?

Scenario:

A new microservice deployed to Kubernetes is crashing under modest load with Too many open files errors. You review the handler code and find that every incoming HTTP request calls MongoClient("mongodb://...") to create a new client, runs the query, then calls client.close(). There are 30 pods each receiving 100 req/sec. What is the correct fix?

Scenario:

Your payment service writes a transaction record to MongoDB and then publishes a payment.completed event to Kafka. A network partition between the service and Kafka causes the event to be dropped 0.1% of the time. The shipping service misses 1 in every 1,000 payments and never ships those orders. The customers do not get their products. What is the correct architectural fix?

Up Next · Lesson 34

NoSQL Security Best Practices

Authentication, authorisation, encryption at rest and in transit — the controls that prevent your database from becoming the breach that ends your company.