MongoDB Features & Use Cases
MongoDB's popularity is not accidental. It was built from the ground up to solve the pain points that developers and architects hit repeatedly with relational databases — rigid schemas, painful scaling, and a mismatch between application objects and database rows. Understanding MongoDB's full feature set helps you know exactly what it can do for your project and where it genuinely excels over alternatives.
This lesson covers every major MongoDB feature in depth and maps each one to the real-world scenarios where it delivers the most value.
1. Flexible Document Model
The document model is MongoDB's foundational feature. Instead of rows with fixed columns, data is stored as BSON documents — rich, nested objects that can hold strings, numbers, arrays, sub-documents, dates, and binary data all in one place.
Why it matters: application objects rarely map cleanly to flat rows. A user has multiple addresses, multiple orders, and multiple preferences. In a relational database these require four tables and three JOINs. In MongoDB it is one document.
Real-world use: content management systems like The Guardian store articles with varying structures — some have videos, some have image galleries, some have related links — all in the same collection with different fields per document.
# Flexible document model — same collection, different structures
product_shoe = {
"_id": "prod_001",
"type": "shoe",
"name": "Air Runner Pro",
"brand": "SpeedFoot",
"sizes": [6, 7, 8, 9, 10, 11], # shoe-specific field
"colours": ["black", "white", "red"], # shoe-specific field
"price": 89.99
}
product_laptop = {
"_id": "prod_002",
"type": "laptop",
"name": "UltraBook X1",
"brand": "TechCore",
"ram_gb": 16, # laptop-specific field
"storage_gb": 512, # laptop-specific field
"cpu": "Intel i7", # laptop-specific field
"price": 1199.99
}
# Both live in the same "products" collection — no schema clash
products = [product_shoe, product_laptop]
for p in products:
    print(f"{p['name']} ({p['type']}) — ${p['price']}")
- No ALTER TABLE needed when adding new fields — just add them to new documents
- Different documents in the same collection can have completely different shapes
- Nested documents and arrays eliminate the need for most JOIN operations
- Schema changes in MongoDB are additive — old documents coexist with new ones during migration
2. Rich Query Language (MQL)
MongoDB's Query Language (MQL) is expressive and powerful. Despite being NoSQL, MongoDB supports filtering, sorting, projection, pagination, regular expressions, geospatial queries, and full-text search — all through a consistent JSON-based API.
# MQL — rich queries as Python dictionaries
# Filter: products priced between $50 and $200, sorted by price ascending
query = {
"price": {"$gte": 50, "$lte": 200}
}
sort = [("price", 1)] # 1 = ascending
# Projection: return only name and price, exclude _id
projection = {"name": 1, "price": 1, "_id": 0}
# Regex: find products whose name starts with "Ultra"
regex_query = {"name": {"$regex": "^Ultra", "$options": "i"}}
# Nested field query using dot notation
nested_query = {"address.city": "London"}
# Array query: products available in size 9
array_query = {"sizes": 9}
# Logical operators
logic_query = {
"$or": [
{"price": {"$lt": 50}},
{"brand": "TechCore"}
]
}
print("MQL queries built — ready to pass to db.collection.find()")
- Comparison operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin
- Logical operators: $and, $or, $not, $nor
- Array operators: $elemMatch, $all, $size
- Dot notation queries nested fields directly — no JOIN or subquery needed
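To make the operator semantics concrete, here is a minimal sketch in plain Python that evaluates a small subset of MQL (equality, $gte, $lte, $lt, and $or) against in-memory dictionaries. This is a teaching illustration using made-up sample products, not how MongoDB evaluates queries internally.

```python
# Minimal sketch: evaluate a small subset of MQL against plain dicts.
# Supports equality, $gte/$lte/$lt, and a top-level $or. Illustration only.

def matches(doc, query):
    if "$or" in query:
        return any(matches(doc, sub) for sub in query["$or"])
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op == "$gte" and not (value is not None and value >= operand):
                    return False
                elif op == "$lte" and not (value is not None and value <= operand):
                    return False
                elif op == "$lt" and not (value is not None and value < operand):
                    return False
        elif value != cond:
            return False
    return True

# Hypothetical sample data, same shape as the query dicts above
products = [
    {"name": "Air Runner Pro", "brand": "SpeedFoot", "price": 89.99},
    {"name": "UltraBook X1", "brand": "TechCore", "price": 1199.99},
]

query = {"price": {"$gte": 50, "$lte": 200}}
print([p["name"] for p in products if matches(p, query)])  # ['Air Runner Pro']

logic_query = {"$or": [{"price": {"$lt": 50}}, {"brand": "TechCore"}]}
print([p["name"] for p in products if matches(p, logic_query)])  # ['UltraBook X1']
```

The same query dictionaries work unchanged when passed to a real `db.collection.find()` call, which is the point of MQL's JSON-based design.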
3. Aggregation Pipeline
The aggregation pipeline is MongoDB's answer to SQL's GROUP BY, HAVING, and complex analytical queries. Data flows through a sequence of stages — each stage transforms the documents and passes them to the next. This makes complex analytics both readable and highly performant.
Real-world use: an e-commerce dashboard calculates total revenue per category, average order value per customer segment, and top-selling products by month — all through aggregation pipelines running directly in the database.
# Aggregation pipeline — analytics without leaving the database
pipeline = [
# Stage 1: filter — only completed orders
{"$match": {"status": "completed"}},
# Stage 2: group — total revenue and order count per category
{"$group": {
"_id": "$category",
"total_revenue": {"$sum": "$amount"},
"order_count": {"$sum": 1},
"avg_order": {"$avg": "$amount"}
}},
# Stage 3: add computed field — average formatted
{"$addFields": {
"avg_order_rounded": {"$round": ["$avg_order", 2]}
}},
# Stage 4: sort — highest revenue first
{"$sort": {"total_revenue": -1}},
# Stage 5: limit — top 5 categories only
{"$limit": 5}
]
# Run with: db.orders.aggregate(pipeline)
print("Pipeline defined — 5 stages: match → group → addFields → sort → limit")
- Pipeline stages run in order — each stage receives the output of the previous one
- Common stages: $match, $group, $sort, $limit, $project, $unwind, $lookup, $addFields
- Aggregations run inside the database — no need to pull data to the application layer for processing
- MongoDB optimises pipelines automatically — placing $match early reduces the data processed by later stages
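To show what the stages above actually compute, here is the same match → group → sort logic re-done in plain Python on hypothetical sample orders. This is an illustration of the semantics only; in practice the pipeline runs inside MongoDB via `db.orders.aggregate(pipeline)`.

```python
# Sketch: the $match -> $group -> $sort computation on sample in-memory orders.
from collections import defaultdict

orders = [
    {"category": "shoes",   "amount": 90.0,   "status": "completed"},
    {"category": "shoes",   "amount": 110.0,  "status": "completed"},
    {"category": "laptops", "amount": 1200.0, "status": "completed"},
    {"category": "laptops", "amount": 800.0,  "status": "cancelled"},  # dropped by $match
]

# Stage 1: $match keeps completed orders only
completed = [o for o in orders if o["status"] == "completed"]

# Stage 2: $group accumulates revenue and count per category
groups = defaultdict(lambda: {"total_revenue": 0.0, "order_count": 0})
for o in completed:
    g = groups[o["category"]]
    g["total_revenue"] += o["amount"]
    g["order_count"] += 1

# Stage 3: $addFields equivalent, computing the rounded average
results = [
    {"_id": cat, **g,
     "avg_order_rounded": round(g["total_revenue"] / g["order_count"], 2)}
    for cat, g in groups.items()
]

# Stage 4: $sort, highest revenue first
results.sort(key=lambda r: r["total_revenue"], reverse=True)
for r in results:
    print(r["_id"], r["total_revenue"], r["avg_order_rounded"])
```

Running this prints `laptops 1200.0 1200.0` then `shoes 200.0 100.0`: the cancelled order never reaches the grouping stage, which is exactly why placing `$match` first reduces the work of every later stage.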
4. Indexing
MongoDB supports a rich variety of indexes that dramatically speed up query performance. Without an index, MongoDB performs a full collection scan — reading every document. With the right index, it jumps directly to matching documents.
# Index types in MongoDB
# Single field index — speed up queries on one field
# db.users.createIndex({"email": 1})
# Compound index — speed up queries on multiple fields together
# db.orders.createIndex({"user_id": 1, "status": 1})
# Text index — full-text search across string fields
# db.articles.createIndex({"title": "text", "body": "text"})
# Geospatial index — location-based queries (find nearby restaurants)
# db.places.createIndex({"location": "2dsphere"})
# TTL index — auto-delete documents after a time period (e.g. sessions, logs)
# db.sessions.createIndex({"created_at": 1}, {"expireAfterSeconds": 3600})
# Unique index — enforce uniqueness (like a UNIQUE constraint in SQL)
# db.users.createIndex({"email": 1}, {"unique": True})
# Partial index — index only documents matching a filter
# db.orders.createIndex({"amount": 1}, {"partialFilterExpression": {"status": "active"}})
print("Index types defined — each targets a specific query pattern")
- Always create indexes for fields used in find() filters, sort operations, and aggregation $match stages
- TTL indexes are powerful for automatic data expiry — sessions, tokens, temporary records
- Text indexes enable full-text search without a separate search engine for simple use cases
- Too many indexes slow down writes — index strategically based on actual query patterns
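The difference between a collection scan and an index lookup can be sketched in plain Python: a dict keyed on the indexed field plays the role of a single-field index. This is a conceptual illustration (MongoDB's indexes are B-trees, not hash maps), using made-up user documents.

```python
# Sketch: collection scan vs index lookup, on hypothetical in-memory users.

users = [{"_id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

# Full collection scan: examines documents one by one until a match is found
def find_by_scan(email):
    for doc in users:
        if doc["email"] == email:
            return doc
    return None

# "Index": built once, then each equality lookup jumps straight to the document
email_index = {doc["email"]: doc for doc in users}

def find_by_index(email):
    return email_index.get(email)

target = "user99999@example.com"
assert find_by_scan(target) == find_by_index(target)
print("Scan may touch up to", len(users), "documents; the index lookup touches 1")
```

This also illustrates the trade-off in the last bullet: the index must be maintained on every write, which is why over-indexing slows inserts and updates.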
5. Horizontal Scaling with Sharding
When a single server can no longer handle the data volume or query throughput, MongoDB distributes data across multiple servers through sharding. Each shard holds a subset of the data, and MongoDB routes queries to the correct shard automatically — the application sees one database.
Real-world use: eBay uses MongoDB sharding to distribute billions of listing documents across dozens of servers, keeping query latency flat as the catalogue grows.
# Sharding concept — distributing data across servers
# Choose a shard key — determines how data is distributed
# Good shard key: high cardinality, even distribution, matches query patterns
# Range-based sharding — documents with similar shard key values go to same shard
shard_config_range = {
"shardKey": {"user_id": 1},
"type": "range"
# user_id 1-1000 → Shard A
# user_id 1001-2000 → Shard B
# user_id 2001-3000 → Shard C
}
# Hashed sharding — shard key is hashed for even distribution
shard_config_hashed = {
"shardKey": {"user_id": "hashed"},
"type": "hashed"
# hash(user_id) distributed evenly across all shards
}
print("Sharding config defined")
print("Range sharding — good for range queries")
print("Hashed sharding — good for even write distribution")
- Sharding is transparent to the application — the mongos router handles query routing automatically
- Choose a shard key carefully — a poor choice causes hotspots where one shard gets all the traffic
- Hashed sharding distributes writes evenly; range sharding is better for range-based queries
- MongoDB can add new shards to a running cluster without downtime
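A simplified routing sketch makes the two strategies concrete. The shard names and ranges below are invented, and a real cluster routes through mongos using config-server metadata and MongoDB's own hash function, not Python's hashlib.

```python
# Sketch: mapping a shard key to a shard under range vs hashed sharding.
import hashlib

SHARDS = ["shard_a", "shard_b", "shard_c"]  # hypothetical shard names

def route_range(user_id):
    # Range-based: contiguous key ranges map to the same shard,
    # so neighbouring user_ids (and range queries) stay together
    if user_id <= 1000:
        return "shard_a"
    elif user_id <= 2000:
        return "shard_b"
    return "shard_c"

def route_hashed(user_id):
    # Hashed: a stable hash of the key spreads writes evenly,
    # at the cost of scattering range queries across shards
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print("range:", route_range(1500))
print("hashed:", route_hashed(1500), route_hashed(1501))
```

The sketch shows why a monotonically increasing key (like a timestamp) is a poor range shard key: every new write lands on the last range, creating the hotspot described above, while hashing the same key spreads those writes evenly.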
6. High Availability with Replica Sets
A replica set is a group of MongoDB servers that hold copies of the same data. One node is the primary (handles all writes), the others are secondaries (replicate from the primary). If the primary fails, the secondaries automatically elect a new primary — typically within seconds.
Real-world use: every production MongoDB deployment uses replica sets. A three-node replica set across three availability zones means the database survives the failure of an entire data centre.
# Replica set — automatic failover and data redundancy
replica_set = {
"name": "rs0",
"members": [
{"id": 0, "host": "mongo1:27017", "role": "PRIMARY"},
{"id": 1, "host": "mongo2:27017", "role": "SECONDARY"},
{"id": 2, "host": "mongo3:27017", "role": "SECONDARY"}
],
"behaviour": {
"writes": "always go to PRIMARY",
"reads": "PRIMARY by default, can read from SECONDARY",
"failover": "automatic — secondary elected primary within seconds",
"heartbeat": "every 2 seconds between members"
}
}
for member in replica_set["members"]:
print(f" {member['host']} — {member['role']}")
print("\nFailover:", replica_set["behaviour"]["failover"])
- Minimum recommended replica set size is three nodes — ensures a majority vote for elections
- Read preference can be set to secondaryPreferred to distribute read traffic across replicas
- Replica sets also serve as the building block for sharded clusters — each shard is itself a replica set
- Oplog (operations log) is how secondaries replicate changes from the primary
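The three-node minimum follows from simple arithmetic: electing a new primary requires votes from a strict majority of members, so an n-node replica set tolerates the loss of n minus majority(n) members. A short sketch:

```python
# Sketch: why three nodes is the recommended minimum for a replica set.
# An election needs a strict majority of voting members.

def majority(n):
    return n // 2 + 1

def failures_tolerated(n):
    return n - majority(n)

for n in (1, 2, 3, 5):
    print(f"{n} nodes: majority = {majority(n)}, "
          f"survives {failures_tolerated(n)} failure(s)")
```

Note the two-node case: its majority is also 2, so losing either member leaves no quorum and no primary can be elected. Three nodes is the smallest set that survives one failure.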
7. ACID Transactions
Since MongoDB 4.0, multi-document ACID transactions are supported on replica sets (and across sharded clusters since 4.2). You can update documents across multiple collections atomically — either all changes commit or none do. This closes the gap with relational databases for transaction-heavy workloads.
# Multi-document ACID transaction (PyMongo)
from pymongo import MongoClient

# Note: multi-document transactions require a replica set (or sharded
# cluster) — they are not available on a standalone mongod
client = MongoClient("mongodb://localhost:27017")
db = client["bank"]
# Transfer $500 from Alice to Bob — must be atomic
with client.start_session() as session:
with session.start_transaction():
db.accounts.update_one(
{"owner": "Alice"},
{"$inc": {"balance": -500}},
session=session
)
db.accounts.update_one(
{"owner": "Bob"},
{"$inc": {"balance": 500}},
session=session
)
# Both updates commit together — or neither does if an error occurs
print("Transfer committed successfully")
- Use transactions when you need to update multiple documents or collections atomically
- For single-document updates, MongoDB is always atomic — no transaction needed
- Transactions have a performance cost — use them only when ACID guarantees are required
- Transactions work across shards in MongoDB 4.2+ — full distributed ACID support
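The all-or-nothing behaviour a transaction guarantees can be simulated on an in-memory dict: snapshot the state, apply both writes, and restore the snapshot if anything fails. This is purely an illustration of the semantics; in production the atomicity comes from MongoDB, not from application code.

```python
# Sketch: all-or-nothing transfer semantics, simulated in memory.
import copy

accounts = {"Alice": 1000, "Bob": 200}

def transfer(accounts, src, dst, amount):
    snapshot = copy.deepcopy(accounts)   # like the pre-transaction state
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount          # both writes succeed, or...
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # ..."abort": roll back everything
        raise

transfer(accounts, "Alice", "Bob", 500)
print(accounts)  # {'Alice': 500, 'Bob': 700}

try:
    transfer(accounts, "Alice", "Bob", 9999)  # would overdraw, so it aborts
except ValueError:
    pass
print(accounts)  # unchanged: {'Alice': 500, 'Bob': 700}
```

The failed second transfer leaves neither a debited Alice nor a credited Bob, which is exactly the guarantee the PyMongo transaction above provides across real collections.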
Key Use Cases
MongoDB is the database of choice across a wide range of industries and application types. Here is where it consistently delivers the most value.
- Content & catalogue management — variable document structures per content type. CMS platforms, e-commerce catalogues, media libraries.
- User profiles & personalisation — rich, nested user objects with preferences, history, and settings. Social networks, SaaS dashboards.
- Real-time analytics — aggregation pipelines running directly on live data. IoT dashboards, operational reporting.
- Mobile & web applications — rapid iteration, evolving schemas, JSON-native API responses. Startups, consumer apps.
- Gaming — player profiles, game state, leaderboards, inventory. EA, Activision, and Ubisoft all use MongoDB.
- IoT & time-series data — high-volume sensor writes, time-series collections. Smart devices, industrial monitoring.
- Geospatial applications — location-based search, proximity queries, mapping. Delivery apps, real-estate platforms.
Summary Table
| Feature | What It Does | Best Used For |
|---|---|---|
| Document model | Flexible nested BSON storage | Variable structures, rapid schema evolution |
| Rich MQL | JSON-based queries with operators | Filtering, sorting, search, geospatial |
| Aggregation pipeline | Multi-stage data transformation | Analytics, reporting, dashboards |
| Indexing | Speed up queries on specific fields | High-traffic queries, text search, geo, TTL |
| Sharding | Distribute data across multiple servers | Web-scale data volume and throughput |
| Replica sets | Automatic failover and redundancy | High availability, disaster recovery |
| ACID transactions | Multi-document atomic operations | Financial operations, inventory updates |
Practice Questions
Practice 1. What makes the document model ideal for a product catalogue with different product types?
Practice 2. What is a TTL index and what is it used for?
Practice 3. In a replica set, which node handles all write operations?
Practice 4. From which version of MongoDB are multi-document ACID transactions supported?
Practice 5. What is the role of the mongos router in a sharded cluster?
Quiz
Quiz 1. Which aggregation stage is used to filter documents before processing in a pipeline?
Quiz 2. What type of sharding distributes documents evenly by hashing the shard key?
Quiz 3. How many nodes are the minimum recommended for a replica set?
Quiz 4. What MongoDB index type is used for location-based proximity queries?
Quiz 5. What is the key advantage of running aggregation pipelines inside the database rather than in the application?
Next up — MongoDB Architecture: how mongod, mongos, replica sets, and sharded clusters fit together under the hood.