NoSQL Lesson 15 – MongoDB in NoSQL | Dataplexa
NoSQL Database Types · Lesson 15

MongoDB in the NoSQL Ecosystem

MongoDB is the most widely deployed document database in the world — used by over 47,000 companies including Forbes, Toyota, eBay, and Verizon. But it exists alongside CouchDB, Firestore, Couchbase, and RavenDB, each built around different philosophies and trade-offs. Knowing where MongoDB sits in this landscape — what it does brilliantly, what it struggles with, and when a different document database is the better call — is what separates engineers who use NoSQL effectively from engineers who use it by accident.

The Document Database Family — Who Built What and Why

MongoDB

Born: 2009 — Developer speed at web scale

Built by the team at 10gen (now MongoDB Inc.) to solve the same problem Foursquare hit — web apps generating rich JSON data that SQL was forcing into unnatural shapes. Prioritises: rich query language, aggregation pipeline, horizontal scaling, developer experience.

CouchDB

Born: 2005 — Offline-first sync and HTTP everywhere

Built for apps that need to work offline and sync when reconnected — think field service apps, medical devices in areas with poor connectivity. Every CouchDB operation is a REST HTTP call. Prioritises: multi-master replication, conflict resolution, HTTP native API.

Firestore

Born: 2017 — Real-time sync for mobile and web

Google's fully managed document database. Designed for apps where data must update in real time across all connected clients — like collaborative tools, live feeds, and mobile apps. Prioritises: real-time listeners, offline support, tight Firebase/GCP integration, zero ops.

Couchbase

Born: 2011 — Document database with key-value speed

A hybrid — document storage with a built-in distributed key-value cache (Memcached lineage). Used heavily in gaming, travel booking, and telecoms where sub-millisecond reads AND rich document queries are both required. Prioritises: memory-first architecture, N1QL (SQL-like query language), edge computing.

MongoDB's Strengths — Where It Genuinely Wins

MongoDB earns its dominance in specific scenarios. These are not opinions — they're patterns seen across thousands of production deployments:

🚀 Rapidly evolving product schemas

Startups shipping 3 features a week. Every sprint adds new fields. Zero migration scripts. Zero 2am ALTER TABLE incidents. The schema evolves with the product without touching the database.

📦 Heterogeneous product catalogues

Electronics have specs, clothing has sizes and colours, food has ingredients and nutrition. One MongoDB collection handles all product types naturally. A SQL table would have hundreds of nullable columns.
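A minimal sketch of that flexibility, using hypothetical product shapes. With pymongo the two dicts would simply go into one collection via insert_many; here a tiny in-memory filter stands in for the query so the example runs without a server:

```python
# Two differently-shaped products that can live in ONE MongoDB collection.
# Hypothetical field names — with pymongo you would just call:
#   db.products.insert_many([laptop, tshirt])
laptop = {
    "sku": "ELEC-4412", "category": "electronics", "price": 1299,
    "specs": {"cpu": "M3", "ram_gb": 16, "screen_in": 14},
}
tshirt = {
    "sku": "CLTH-0087", "category": "clothing", "price": 25,
    "sizes": ["S", "M", "L"], "colours": ["black", "navy"],
}
catalogue = [laptop, tshirt]

# A Mongo-style equality filter, e.g. {"category": "clothing"}, matches
# documents regardless of what other fields they carry:
def matches(doc, query):
    return all(doc.get(k) == v for k, v in query.items())

clothing = [d["sku"] for d in catalogue if matches(d, {"category": "clothing"})]
print(clothing)   # ['CLTH-0087']
```

Each document only carries the fields it actually has; the SQL alternative is one wide table where specs, sizes, colours, and ingredients are all nullable columns.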

🌍 Content management at scale

Articles, videos, podcasts, events — each content type has its own metadata shape. MongoDB stores them all in one collection and queries across them with rich filters. The BBC, The New York Times, and Forbes run their CMS on MongoDB.

📊 Aggregation-heavy analytics

MongoDB's aggregation pipeline handles $group, $bucket, $facet, $lookup — complex analytics without moving data to a separate warehouse for routine reporting.
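As a sketch, this is the shape of a pipeline you would hand to collection.aggregate() (hypothetical orders collection and field names), with a tiny in-memory stand-in for $match/$group/$sort so the result is visible without a server:

```python
from collections import defaultdict

# Pipeline you would pass to db.orders.aggregate(pipeline) — the
# 'orders' collection and its 'category'/'total' fields are assumptions.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$category", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
]

# In-memory stand-in so the stages' combined effect is visible here:
orders = [
    {"status": "paid",     "category": "books", "total": 12},
    {"status": "paid",     "category": "games", "total": 60},
    {"status": "refunded", "category": "games", "total": 60},
    {"status": "paid",     "category": "books", "total": 8},
]
revenue = defaultdict(int)
for o in orders:
    if o["status"] == "paid":                 # $match
        revenue[o["category"]] += o["total"]  # $group with $sum
result = sorted(revenue.items(), key=lambda kv: -kv[1])  # $sort (descending)
print(result)   # [('games', 60), ('books', 20)]
```

The server-side pipeline does exactly this filter-group-sort, but on indexed data and without shipping documents to the application.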

MongoDB's Weaknesses — Where Engineers Regret It

MongoDB is frequently chosen for the wrong reasons. These are the patterns that consistently cause pain:

REGRET 1

Multi-entity financial transactions

Teams switch e-commerce payment systems to MongoDB because "NoSQL scales better" — then discover that debit Account A + credit Account B across two documents requires multi-document transactions (added in MongoDB 4.0). These work but add significant complexity and latency compared to a single SQL transaction. For pure financial workloads, PostgreSQL is almost always the better call.

REGRET 2

Complex cross-collection reporting

$lookup across 4 collections to generate a monthly P&L report. Each $lookup is expensive without careful indexing. Teams end up recreating SQL JOINs in MongoDB — getting the worst of both worlds: the complexity of NoSQL data modelling with the performance of multi-table JOINs.
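For reference, a single $lookup stage looks like this (hypothetical orders and customers collections); multiply it by four collections and the cost described above compounds quickly:

```python
# One $lookup stage joining orders to customers — collection and field
# names are assumptions. Each additional $lookup compounds the cost
# unless the foreignField side is indexed.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join against
        "localField": "customer_id",  # field on the orders side
        "foreignField": "_id",        # field on the customers side
        "as": "customer",             # output array field on each order
    }
}
pipeline = [lookup_stage, {"$unwind": "$customer"}]
# You would run: db.orders.aggregate(pipeline)
print(pipeline[0]["$lookup"]["from"])   # customers
```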

REGRET 3

High-volume time-series data

Storing 50,000 IoT sensor readings per second in MongoDB. MongoDB 5.0 added time-series collections to address this — but Cassandra and InfluxDB are still significantly faster and cheaper for pure time-series workloads at extreme scale.
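If a time-series workload does stay on MongoDB, version 5.0+ lets you declare a time-series collection at creation time. A minimal sketch of the options, assuming hypothetical field names ('ts', 'sensor_id'); with a live server you would pass the dict to db.create_collection():

```python
# Options for a MongoDB 5.0+ time-series collection. The field names
# ('ts', 'sensor_id') are assumptions for illustration.
ts_options = {
    "timeField": "ts",          # required: the BSON date field on each reading
    "metaField": "sensor_id",   # optional: source identifier used for bucketing
    "granularity": "seconds",   # bucketing hint: "seconds", "minutes", "hours"
}

# With a live connection (PyMongo), creation looks like:
#   db.create_collection("sensor_readings", timeseries=ts_options)
print(ts_options["granularity"])
```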

MongoDB vs Firestore — The Mobile/Web Decision

The scenario: You're building a collaborative task management app. Multiple users edit the same task list simultaneously. Changes should appear on all connected devices in real time. Here's how Firestore handles this natively — something MongoDB Atlas does not offer out of the box:

// Firestore real-time listener — JavaScript (web/mobile)
import { getFirestore, doc, onSnapshot } from "firebase/firestore"

const db = getFirestore()

// Subscribe to a task document — fires on every change
const unsubscribe = onSnapshot(
  doc(db, "tasks", "task_8821"),
  (snapshot) => {
    const task = snapshot.data()
    // This callback runs automatically whenever ANY client updates this doc
    console.log("Task updated:", task.title, "— Status:", task.status)
  }
)

// To stop listening:
// unsubscribe()
onSnapshot()

This is Firestore's real-time listener. The callback fires immediately with the current data, then fires again every time the document changes — from any client, anywhere in the world. Google's infrastructure handles the WebSocket connection, fan-out, and delivery. Zero infrastructure to set up.

The MongoDB equivalent

MongoDB has Change Streams — a way to watch for changes on a collection. But it requires you to run a WebSocket server, manage connections, handle reconnection logic, and fan out updates to connected clients. Firestore does all of this for you. For mobile/web apps where real-time sync is the core feature — Firestore wins clearly.

// Firestore write — optimistic offline support built in
import { updateDoc, serverTimestamp } from "firebase/firestore"

await updateDoc(doc(db, "tasks", "task_8821"), {
  status:     "completed",
  completed_by: "u_441",
  completed_at: serverTimestamp()   // server-side timestamp — not client clock
})
// Write acknowledged immediately (optimistic local update)
// If offline: change queued locally, syncs when connection restored
// All onSnapshot() listeners for this document fire within ~100ms
// serverTimestamp() resolves to server time — prevents clock skew issues
// Conflict resolution handled automatically by Firestore

serverTimestamp() — a Firestore sentinel value. Instead of sending the client's timestamp (which may be wrong if the device clock is skewed), Firestore replaces it with the server's timestamp when it processes the write. For any time-sensitive data, always use server timestamps.

Offline persistence: If the device has no internet, Firestore queues the write locally and syncs automatically when reconnected — no code required. Building this yourself with MongoDB would take weeks.

MongoDB Change Streams — Real-Time Without Firestore

When you're on MongoDB and need real-time capabilities — for example, notifying a fulfilment system every time an order is placed — Change Streams are the answer:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
db = client['storefront']

# Watch for any insert or update on the orders collection
pipeline = [{'$match': {'operationType': {'$in': ['insert', 'update']}}}]

with db.orders.watch(pipeline) as stream:
    print("Watching for order changes...")
    for change in stream:                            # blocks, waiting for changes
        op   = change['operationType']               # 'insert' or 'update'
        doc  = change.get('fullDocument', {})        # present on inserts; for updates pass full_document='updateLookup' to watch()
        keys = change.get('updateDescription', {})   # which fields changed (updates only)

        if op == 'insert':
            print(f"New order: {doc.get('order_id')} — £{doc.get('total')}")
        elif op == 'update':
            print(f"Order updated. Changed fields: {keys.get('updatedFields', {}).keys()}")
Watching for order changes...
New order: ord_9001 — £149.99
New order: ord_9002 — £34.50
Order updated. Changed fields: dict_keys(['status', 'updated_at'])
New order: ord_9003 — £220.00
db.orders.watch(pipeline)

Change Streams use MongoDB's oplog (operation log) — the same log used for replication. Every write to the database is appended to the oplog. The Change Stream tails this log and surfaces matching changes. Requires a replica set (even a single-node one) — it does not work on standalone MongoDB instances.

change['updateDescription']['updatedFields']

For update operations, Change Streams can tell you exactly which fields changed — not just that something changed. This lets downstream consumers react intelligently: only trigger email notification if status changed, not if internal_notes changed.
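A small sketch of that consumer-side filter (the helper name should_notify is hypothetical; the event dicts mimic the change-stream shape shown above):

```python
# Decide whether a change-stream event warrants an email — react only
# when 'status' changed, ignore edits to internal_notes and the like.
def should_notify(change):
    if change.get("operationType") != "update":
        return False
    updated = change.get("updateDescription", {}).get("updatedFields", {})
    return "status" in updated

# Event shapes mimicking real change-stream documents:
status_change = {
    "operationType": "update",
    "updateDescription": {"updatedFields": {"status": "shipped",
                                            "updated_at": "2024-05-01"}},
}
note_change = {
    "operationType": "update",
    "updateDescription": {"updatedFields": {"internal_notes": "call back"}},
}
print(should_notify(status_change), should_notify(note_change))   # True False
```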

MongoDB Transactions — When You Need Them

The scenario: A user purchases a product. Two things must happen atomically: decrement the inventory count AND create the order record. If the server crashes between them, neither should persist:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/?replicaSet=rs0')
db = client['storefront']

def purchase_product(user_id, product_id, quantity):
    # Start a session for the transaction
    with client.start_session() as session:
        with session.start_transaction():

            # Step 1: Check and decrement inventory
            result = db.products.find_one_and_update(
                {'_id': product_id, 'stock': {'$gte': quantity}},
                {'$inc': {'stock': -quantity}},
                session=session
            )

            if not result:
                # Insufficient stock — transaction auto-aborts
                raise ValueError("Insufficient stock")

            # Step 2: Create the order record
            db.orders.insert_one({
                'user_id':    user_id,
                'product_id': product_id,
                'quantity':   quantity,
                'status':     'confirmed'
            }, session=session)

            # Both operations committed atomically
session.start_transaction()

MongoDB transactions require a session. All operations inside the with session.start_transaction() block are part of the transaction. If any operation fails or raises an exception, the entire transaction is rolled back — both the inventory decrement and the order insert are undone.

session=session passed to every operation

Every database operation that should be part of the transaction must receive the session parameter. Operations without it run outside the transaction — a common bug that causes partial writes.

replicaSet=rs0 in connection string

Multi-document transactions only work on replica sets. A standalone MongoDB instance doesn't support transactions. In production, always run a replica set — even a single-node one works for transaction support.

Complete Comparison — Document Databases Side by Side

Criterion · MongoDB · CouchDB · Firestore · Couchbase
Query language · Rich MQL + aggregation pipeline · MapReduce views + Mango · Limited (equality + range) · N1QL (SQL-like)
Real-time sync · Change Streams (manual) · _changes feed · Built-in onSnapshot · DCP (eventing)
Offline support · Manual (Realm for mobile) · Native, core feature · Built-in · Couchbase Lite (embedded)
Transactions · Multi-doc (v4.0+, replica set) · Single doc only (MVCC) · Multi-doc (runTransaction) · ACID multi-doc
Hosting · Self-hosted or Atlas (managed) · Self-hosted or Cloudant · Google Cloud only · Self-hosted or Capella
Best for · Web apps, catalogues, CMS · Offline-first, field apps · Mobile/web real-time apps · Gaming, travel, telecoms

Real Architecture — How Companies Actually Use MongoDB

📰

Forbes

CMS for 70M+ monthly articles. Each article type (story, listicle, video, podcast) has different metadata. MongoDB handles schema variation with zero migrations.

🚗

Toyota

Connected vehicle platform. 10M+ vehicles send telemetry — each vehicle model has different sensor data shapes. MongoDB's flexible schema absorbs all variations.

🛍️

eBay

Metadata for 1.4 billion product listings. Each category has different required fields. MongoDB's document model handles all 15,000+ product categories in one database.

Decision Framework — MongoDB vs the Alternatives

Choose MongoDB when:

Your data is JSON-shaped with variable structure per record ✦ Your schema evolves frequently ✦ You need rich query language and aggregation ✦ You can self-host or want cloud flexibility (AWS, Azure, GCP) ✦ Your team already knows JavaScript/Python and wants a natural fit

Choose Firestore when:

You're building a mobile or web app ✦ Real-time sync across clients is a core feature ✦ You want zero backend infrastructure ✦ You're already on Google Cloud / Firebase ✦ Offline-first behaviour is required

Choose CouchDB when:

Offline-first is non-negotiable (field apps, medical devices) ✦ Multi-master replication across locations ✦ You want a pure HTTP/REST API with no custom client ✦ Conflict resolution built into your replication model

Teacher's Note

MongoDB's biggest competitive advantage isn't its query language or its scale — it's its ecosystem. The combination of Atlas (managed cloud), Atlas Search (full-text search), Atlas Charts (visualisation), Realm (mobile sync), and Change Streams gives a team the ability to build a full production system without stitching together six different products. That ecosystem integration is why it dominates enterprise adoption. But none of that matters if you pick it for the wrong data model. Match the tool to your data shape first — ecosystem benefits second.

Practice Questions — You're the Engineer

Scenario:

You are building a real-time collaborative whiteboard app for mobile and web. Multiple users draw on the same canvas simultaneously. Every stroke must appear on all connected devices within 100ms. The app must continue working when a user's device goes offline — syncing changes when reconnected. Your team is already using Google Cloud. Which document database should you choose?


Scenario:

Your team wants to use MongoDB multi-document transactions to atomically update inventory and create order records. Your developer runs the code on their local MongoDB installation and gets the error: "Transaction numbers are only allowed on a replica set member or mongos." What infrastructure requirement are they missing?


Scenario:

You are using MongoDB for your order management system. Whenever a new order is inserted into the orders collection, your fulfilment service needs to be notified immediately and start processing it. You do not want to poll the database every second. What MongoDB feature lets you listen for new inserts in real time?


Quiz — Pick the Right Document Database

Scenario:

You're building a B2B marketplace where suppliers list industrial equipment — generators, forklifts, compressors, CNC machines. Each category has completely different technical specifications. New equipment categories are added monthly. Buyers filter by category-specific attributes. You host on AWS and your team uses Node.js. Which document database?

Scenario:

A healthcare NGO deploys field workers in remote areas with no reliable internet. Workers collect patient data on tablets. Data must sync with central servers when connectivity is available. Multiple workers may update the same patient record offline — conflict resolution must be automatic. Which document database was specifically designed for this scenario?

Scenario:

A developer implements a MongoDB transaction: debit a wallet balance AND create a payment record. The wallet debit is inside session.start_transaction(). When a test simulates a failure after the debit, the wallet is debited but no payment record is created — money is lost. Investigation shows the insert_one call was missing a parameter. What was the missing parameter?

Up Next · Lesson 16

CouchDB Overview

The database that turns every operation into an HTTP call — how CouchDB's MVCC engine, multi-master replication, and offline-first architecture make it the right tool for disconnected systems.