MongoDB Features & Use Cases
MongoDB's popularity is not accidental. It was built from the ground up to solve the pain points that developers and architects hit repeatedly with relational databases — rigid schemas, painful scaling, and a mismatch between application objects and database rows. Understanding MongoDB's full feature set helps you know exactly what it can do for your project and where it genuinely excels over alternatives.
This lesson covers every major MongoDB feature in depth and maps each one to the real-world scenarios where it delivers the most value.
1. Flexible Document Model
The document model is MongoDB's foundational feature. Instead of rows with fixed columns, data is stored as BSON documents — rich, nested objects that can hold strings, numbers, arrays, sub-documents, dates, and binary data all in one place.
Why it matters: application objects rarely map cleanly to flat rows. A user has multiple addresses, multiple orders, and multiple preferences. In a relational database these require four tables and three JOINs. In MongoDB it is one document.
Real-world use: content management systems like The Guardian store articles with varying structures — some have videos, some have image galleries, some have related links — all in the same collection with different fields per document.
# Flexible document model — same collection, different structures
product_shoe = {
"_id": "prod_001",
"type": "shoe",
"name": "Air Runner Pro",
"brand": "SpeedFoot",
"sizes": [6, 7, 8, 9, 10, 11], # shoe-specific field
"colours": ["black", "white", "red"], # shoe-specific field
"price": 89.99
}
product_laptop = {
"_id": "prod_002",
"type": "laptop",
"name": "UltraBook X1",
"brand": "TechCore",
"ram_gb": 16, # laptop-specific field
"storage_gb": 512, # laptop-specific field
"cpu": "Intel i7", # laptop-specific field
"price": 1199.99
}
# Both live in the same "products" collection — no schema clash
products = [product_shoe, product_laptop]
for p in products:
    print(f"{p['name']} ({p['type']}) — ${p['price']}")
- No ALTER TABLE needed when adding new fields — just add them to new documents
- Different documents in the same collection can have completely different shapes
- Nested documents and arrays eliminate the need for most JOIN operations
- Schema changes in MongoDB are additive — old documents coexist with new ones during migration
2. Rich Query Language (MQL)
MongoDB's Query Language (MQL) is expressive and powerful. Despite being NoSQL, MongoDB supports filtering, sorting, projection, pagination, regular expressions, geospatial queries, and full-text search — all through a consistent JSON-based API.
# MQL — rich queries as Python dictionaries
# Filter: products priced between $50 and $200, sorted by price ascending
query = {
"price": {"$gte": 50, "$lte": 200}
}
sort = [("price", 1)] # 1 = ascending
# Projection: return only name and price, exclude _id
projection = {"name": 1, "price": 1, "_id": 0}
# Regex: find products whose name starts with "Ultra"
regex_query = {"name": {"$regex": "^Ultra", "$options": "i"}}
# Nested field query using dot notation
nested_query = {"address.city": "London"}
# Array query: products available in size 9
array_query = {"sizes": 9}
# Logical operators
logic_query = {
"$or": [
{"price": {"$lt": 50}},
{"brand": "TechCore"}
]
}
print("MQL queries built — ready to pass to db.collection.find()")
- Comparison operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin
- Logical operators: $and, $or, $not, $nor
- Array operators: $elemMatch, $all, $size
- Dot notation queries nested fields directly — no JOIN or subquery needed
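To make the operator semantics concrete, here is a minimal sketch in plain Python that evaluates a small subset of MQL (equality, $gte, $lte, $lt, and $or) against in-memory dictionaries. This is a teaching illustration using made-up sample products, not how MongoDB evaluates queries internally.

```python
# Minimal sketch: evaluate a small subset of MQL against plain dicts.
# Supports equality, $gte/$lte/$lt, and a top-level $or. Illustration only.

def matches(doc, query):
    if "$or" in query:
        return any(matches(doc, sub) for sub in query["$or"])
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op == "$gte" and not (value is not None and value >= operand):
                    return False
                elif op == "$lte" and not (value is not None and value <= operand):
                    return False
                elif op == "$lt" and not (value is not None and value < operand):
                    return False
        elif value != cond:
            return False
    return True

# Hypothetical sample data, same shape as the query dicts above
products = [
    {"name": "Air Runner Pro", "brand": "SpeedFoot", "price": 89.99},
    {"name": "UltraBook X1", "brand": "TechCore", "price": 1199.99},
]

query = {"price": {"$gte": 50, "$lte": 200}}
print([p["name"] for p in products if matches(p, query)])  # ['Air Runner Pro']

logic_query = {"$or": [{"price": {"$lt": 50}}, {"brand": "TechCore"}]}
print([p["name"] for p in products if matches(p, logic_query)])  # ['UltraBook X1']
```

The same query dictionaries work unchanged when passed to a real `db.collection.find()` call, which is the point of MQL's JSON-based design.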
3. Aggregation Pipeline
The aggregation pipeline is MongoDB's answer to SQL's GROUP BY, HAVING, and complex analytical queries. Data flows through a sequence of stages — each stage transforms the documents and passes them to the next. This makes complex analytics both readable and highly performant.
Real-world use: an e-commerce dashboard calculates total revenue per category, average order value per customer segment, and top-selling products by month — all through aggregation pipelines running directly in the database.
# Aggregation pipeline — analytics without leaving the database
pipeline = [
# Stage 1: filter — only completed orders
{"$match": {"status": "completed"}},
# Stage 2: group — total revenue and order count per category
{"$group": {
"_id": "$category",
"total_revenue": {"$sum": "$amount"},
"order_count": {"$sum": 1},
"avg_order": {"$avg": "$amount"}
}},
# Stage 3: add computed field — average formatted
{"$addFields": {
"avg_order_rounded": {"$round": ["$avg_order", 2]}
}},
# Stage 4: sort — highest revenue first
{"$sort": {"total_revenue": -1}},
# Stage 5: limit — top 5 categories only
{"$limit": 5}
]
# Run with: db.orders.aggregate(pipeline)
print("Pipeline defined — 5 stages: match → group → addFields → sort → limit")
- Pipeline stages run in order — each stage receives the output of the previous one
- Common stages: $match, $group, $sort, $limit, $project, $unwind, $lookup, $addFields
- Aggregations run inside the database — no need to pull data to the application layer for processing
- MongoDB optimises pipelines automatically — placing $match early reduces the data processed by later stages
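To show what the stages above actually compute, here is the same match → group → sort logic re-done in plain Python on hypothetical sample orders. This is an illustration of the semantics only; in practice the pipeline runs inside MongoDB via `db.orders.aggregate(pipeline)`.

```python
# Sketch: the $match -> $group -> $sort computation on sample in-memory orders.
from collections import defaultdict

orders = [
    {"category": "shoes",   "amount": 90.0,   "status": "completed"},
    {"category": "shoes",   "amount": 110.0,  "status": "completed"},
    {"category": "laptops", "amount": 1200.0, "status": "completed"},
    {"category": "laptops", "amount": 800.0,  "status": "cancelled"},  # dropped by $match
]

# Stage 1: $match keeps completed orders only
completed = [o for o in orders if o["status"] == "completed"]

# Stage 2: $group accumulates revenue and count per category
groups = defaultdict(lambda: {"total_revenue": 0.0, "order_count": 0})
for o in completed:
    g = groups[o["category"]]
    g["total_revenue"] += o["amount"]
    g["order_count"] += 1

# Stage 3: $addFields equivalent, computing the rounded average
results = [
    {"_id": cat, **g,
     "avg_order_rounded": round(g["total_revenue"] / g["order_count"], 2)}
    for cat, g in groups.items()
]

# Stage 4: $sort, highest revenue first
results.sort(key=lambda r: r["total_revenue"], reverse=True)
for r in results:
    print(r["_id"], r["total_revenue"], r["avg_order_rounded"])
```

Running this prints `laptops 1200.0 1200.0` then `shoes 200.0 100.0`: the cancelled order never reaches the grouping stage, which is exactly why placing `$match` first reduces the work of every later stage.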
4. Indexing
MongoDB supports a rich variety of indexes that dramatically speed up query performance. Without an index, MongoDB performs a full collection scan — reading every document. With the right index, it jumps directly to matching documents.
# Index types in MongoDB
# Single field index — speed up queries on one field
# db.users.createIndex({"email": 1})
# Compound index — speed up queries on multiple fields together
# db.orders.createIndex({"user_id": 1, "status": 1})
# Text index — full-text search across string fields
# db.articles.createIndex({"title": "text", "body": "text"})
# Geospatial index — location-based queries (find nearby restaurants)
# db.places.createIndex({"location": "2dsphere"})
# TTL index — auto-delete documents after a time period (e.g. sessions, logs)
# db.sessions.createIndex({"created_at": 1}, {"expireAfterSeconds": 3600})
# Unique index — enforce uniqueness (like a UNIQUE constraint in SQL)
# db.users.createIndex({"email": 1}, {"unique": True})
# Partial index — index only documents matching a filter
# db.orders.createIndex({"amount": 1}, {"partialFilterExpression": {"status": "active"}})
print("Index types defined — each targets a specific query pattern")
- Always create indexes for fields used in find() filters, sort operations, and aggregation $match stages
- TTL indexes are powerful for automatic data expiry — sessions, tokens, temporary records
- Text indexes enable full-text search without a separate search engine for simple use cases
- Too many indexes slow down writes — index strategically based on actual query patterns
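The difference between a collection scan and an index lookup can be sketched in plain Python: a dict keyed on the indexed field plays the role of a single-field index. This is a conceptual illustration (MongoDB's indexes are B-trees, not hash maps), using made-up user documents.

```python
# Sketch: collection scan vs index lookup, on hypothetical in-memory users.

users = [{"_id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

# Full collection scan: examines documents one by one until a match is found
def find_by_scan(email):
    for doc in users:
        if doc["email"] == email:
            return doc
    return None

# "Index": built once, then each equality lookup jumps straight to the document
email_index = {doc["email"]: doc for doc in users}

def find_by_index(email):
    return email_index.get(email)

target = "user99999@example.com"
assert find_by_scan(target) == find_by_index(target)
print("Scan may touch up to", len(users), "documents; the index lookup touches 1")
```

This also illustrates the trade-off in the last bullet: the index must be maintained on every write, which is why over-indexing slows inserts and updates.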
5. Horizontal Scaling with Sharding
When a single server can no longer handle the data volume or query throughput, MongoDB distributes data across multiple servers through sharding. Each shard holds a subset of the data, and MongoDB routes queries to the correct shard automatically — the application sees one database.
Real-world use: eBay uses MongoDB sharding to distribute billions of listing documents across dozens of servers, keeping query latency flat as the catalogue grows.
# Sharding concept — distributing data across servers
# Choose a shard key — determines how data is distributed
# Good shard key: high cardinality, even distribution, matches query patterns
# Range-based sharding — documents with similar shard key values go to same shard
shard_config_range = {
"shardKey": {"user_id": 1},
"type": "range"
# user_id 1-1000 → Shard A
# user_id 1001-2000 → Shard B
# user_id 2001-3000 → Shard C
}
# Hashed sharding — shard key is hashed for even distribution
shard_config_hashed = {
"shardKey": {"user_id": "hashed"},
"type": "hashed"
# hash(user_id) distributed evenly across all shards
}
print("Sharding config defined")
print("Range sharding — good for range queries")
print("Hashed sharding — good for even write distribution")
- Sharding is transparent to the application — the mongos router handles query routing automatically
- Choose a shard key carefully — a poor choice causes hotspots where one shard gets all the traffic
- Hashed sharding distributes writes evenly; range sharding is better for range-based queries
- MongoDB can add new shards to a running cluster without downtime
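A simplified routing sketch makes the two strategies concrete. The shard names and ranges below are invented, and a real cluster routes through mongos using config-server metadata and MongoDB's own hash function, not Python's hashlib.

```python
# Sketch: mapping a shard key to a shard under range vs hashed sharding.
import hashlib

SHARDS = ["shard_a", "shard_b", "shard_c"]  # hypothetical shard names

def route_range(user_id):
    # Range-based: contiguous key ranges map to the same shard,
    # so neighbouring user_ids (and range queries) stay together
    if user_id <= 1000:
        return "shard_a"
    elif user_id <= 2000:
        return "shard_b"
    return "shard_c"

def route_hashed(user_id):
    # Hashed: a stable hash of the key spreads writes evenly,
    # at the cost of scattering range queries across shards
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print("range:", route_range(1500))
print("hashed:", route_hashed(1500), route_hashed(1501))
```

The sketch shows why a monotonically increasing key (like a timestamp) is a poor range shard key: every new write lands on the last range, creating the hotspot described above, while hashing the same key spreads those writes evenly.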
6. High Availability with Replica Sets
A replica set is a group of MongoDB servers that hold copies of the same data. One node is the primary (handles all writes), the others are secondaries (replicate from the primary). If the primary fails, the secondaries automatically elect a new primary — typically within seconds.
Real-world use: every production MongoDB deployment uses replica sets. A three-node replica set across three availability zones means the database survives the failure of an entire data centre.
# Replica set — automatic failover and data redundancy
replica_set = {
"name": "rs0",
"members": [
{"id": 0, "host": "mongo1:27017", "role": "PRIMARY"},
{"id": 1, "host": "mongo2:27017", "role": "SECONDARY"},
{"id": 2, "host": "mongo3:27017", "role": "SECONDARY"}
],
"behaviour": {
"writes": "always go to PRIMARY",
"reads": "PRIMARY by default, can read from SECONDARY",
"failover": "automatic — secondary elected primary within seconds",
"heartbeat": "every 2 seconds between members"
}
}
for member in replica_set["members"]:
print(f" {member['host']} — {member['role']}")
print("\nFailover:", replica_set["behaviour"]["failover"])
- Minimum recommended replica set size is three nodes — ensures a majority vote for elections
- Read preference can be set to secondaryPreferred to distribute read traffic across replicas
- Replica sets also serve as the building block for sharded clusters — each shard is itself a replica set
- Oplog (operations log) is how secondaries replicate changes from the primary
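The three-node minimum follows from simple arithmetic: electing a new primary requires votes from a strict majority of members, so an n-node replica set tolerates the loss of n minus majority(n) members. A short sketch:

```python
# Sketch: why three nodes is the recommended minimum for a replica set.
# An election needs a strict majority of voting members.

def majority(n):
    return n // 2 + 1

def failures_tolerated(n):
    return n - majority(n)

for n in (1, 2, 3, 5):
    print(f"{n} nodes: majority = {majority(n)}, "
          f"survives {failures_tolerated(n)} failure(s)")
```

Note the two-node case: its majority is also 2, so losing either member leaves no quorum and no primary can be elected. Three nodes is the smallest set that survives one failure.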
7. ACID Transactions
Since MongoDB 4.0, multi-document ACID transactions are supported on replica sets (and across sharded clusters since 4.2). You can update documents across multiple collections atomically — either all changes commit or none do. This closes the gap with relational databases for transaction-heavy workloads.
# Multi-document ACID transaction (PyMongo)
from pymongo import MongoClient

# Note: multi-document transactions require a replica set (or sharded
# cluster) — they are not available on a standalone mongod
client = MongoClient("mongodb://localhost:27017")
db = client["bank"]
# Transfer $500 from Alice to Bob — must be atomic
with client.start_session() as session:
with session.start_transaction():
db.accounts.update_one(
{"owner": "Alice"},
{"$inc": {"balance": -500}},
session=session
)
db.accounts.update_one(
{"owner": "Bob"},
{"$inc": {"balance": 500}},
session=session
)
# Both updates commit together — or neither does if an error occurs
print("Transfer committed successfully")
- Use transactions when you need to update multiple documents or collections atomically
- For single-document updates, MongoDB is always atomic — no transaction needed
- Transactions have a performance cost — use them only when ACID guarantees are required
- Transactions work across shards in MongoDB 4.2+ — full distributed ACID support
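The all-or-nothing behaviour a transaction guarantees can be simulated on an in-memory dict: snapshot the state, apply both writes, and restore the snapshot if anything fails. This is purely an illustration of the semantics; in production the atomicity comes from MongoDB, not from application code.

```python
# Sketch: all-or-nothing transfer semantics, simulated in memory.
import copy

accounts = {"Alice": 1000, "Bob": 200}

def transfer(accounts, src, dst, amount):
    snapshot = copy.deepcopy(accounts)   # like the pre-transaction state
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount          # both writes succeed, or...
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # ..."abort": roll back everything
        raise

transfer(accounts, "Alice", "Bob", 500)
print(accounts)  # {'Alice': 500, 'Bob': 700}

try:
    transfer(accounts, "Alice", "Bob", 9999)  # would overdraw, so it aborts
except ValueError:
    pass
print(accounts)  # unchanged: {'Alice': 500, 'Bob': 700}
```

The failed second transfer leaves neither a debited Alice nor a credited Bob, which is exactly the guarantee the PyMongo transaction above provides across real collections.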
Key Use Cases
MongoDB is the database of choice across a wide range of industries and application types. Here is where it consistently delivers the most value.
- Content & catalogue management — variable document structures per content type. CMS platforms, e-commerce catalogues, media libraries.
- User profiles & personalisation — rich, nested user objects with preferences, history, and settings. Social networks, SaaS dashboards.
- Real-time analytics — aggregation pipelines running directly on live data. IoT dashboards, operational reporting.
- Mobile & web applications — rapid iteration, evolving schemas, JSON-native API responses. Startups, consumer apps.
- Gaming — player profiles, game state, leaderboards, inventory. EA, Activision, and Ubisoft all use MongoDB.
- IoT & time-series data — high-volume sensor writes, time-series collections. Smart devices, industrial monitoring.
- Geospatial applications — location-based search, proximity queries, mapping. Delivery apps, real-estate platforms.
Summary Table
| Feature | What It Does | Best Used For |
|---|---|---|
| Document model | Flexible nested BSON storage | Variable structures, rapid schema evolution |
| Rich MQL | JSON-based queries with operators | Filtering, sorting, search, geospatial |
| Aggregation pipeline | Multi-stage data transformation | Analytics, reporting, dashboards |
| Indexing | Speed up queries on specific fields | High-traffic queries, text search, geo, TTL |
| Sharding | Distribute data across multiple servers | Web-scale data volume and throughput |
| Replica sets | Automatic failover and redundancy | High availability, disaster recovery |
| ACID transactions | Multi-document atomic operations | Financial operations, inventory updates |
Practice Questions
Practice 1. What makes the document model ideal for a product catalogue with different product types?
Practice 2. What is a TTL index and what is it used for?
Practice 3. In a replica set, which node handles all write operations?
Practice 4. From which version of MongoDB are multi-document ACID transactions supported?
Practice 5. What is the role of the mongos router in a sharded cluster?
Quiz
Quiz 1. Which aggregation stage is used to filter documents before processing in a pipeline?
Quiz 2. What type of sharding distributes documents evenly by hashing the shard key?
Quiz 3. How many nodes are the minimum recommended for a replica set?
Quiz 4. What MongoDB index type is used for location-based proximity queries?
Quiz 5. What is the key advantage of running aggregation pipelines inside the database rather than in the application?
Next up — MongoDB Architecture: how mongod, mongos, replica sets, and sharded clusters fit together under the hood.