MongoDB Lesson 32 – Performance Optimization | Dataplexa

Performance Optimization

A MongoDB deployment that runs fine at 10,000 documents can grind to a halt at 10 million if the right optimisations have not been applied. Performance problems in MongoDB almost always come from one of three sources: missing or incorrect indexes causing collection scans, documents too large for the WiredTiger cache causing excessive disk I/O, or poorly structured queries doing more work than necessary. This lesson covers the full toolkit for diagnosing and fixing performance problems — the query profiler, explain() plans, slow query logging, covered queries, projection optimisation, and the write performance strategies that keep high-throughput applications healthy.

1. The Query Profiler

MongoDB's built-in query profiler records information about every operation that exceeds a configurable time threshold into a special capped collection called system.profile. Enabling it is the first step in any performance investigation — it tells you exactly which queries are slow, how long they took, and how many documents they examined.

# Enabling and reading the query profiler

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Profiler levels:
# 0 = off (default)
# 1 = log only slow operations (above slowms threshold)
# 2 = log ALL operations (use only during debugging — very verbose)

# Enable profiling — log operations slower than 10 ms
db.command("profile", 1, slowms=10)
print("Profiler enabled: level 1, slowms threshold = 10 ms")

# Check current profiler status
status = db.command("profile", -1)   # -1 = read current settings without changing
print("Current profiler status:")
print(f"  was:    {status['was']}")
print(f"  slowms: {status['slowms']} ms")

# Run some queries that will be profiled
list(db.products.find({"category": "Electronics", "price": {"$gt": 50}}))
list(db.orders.find({"status": "delivered"}).sort("date", -1))
list(db.users.find({"membership": "premium"}))

# Read the profile collection — most recent entries first
print("\nRecent profiled operations (system.profile):")
profile_entries = db.system.profile.find(
    {},
    {
        "op":          1,   # operation type: query, update, insert, etc.
        "ns":          1,   # namespace: database.collection
        "millis":      1,   # execution time in milliseconds
        "docsExamined":1,   # documents scanned
        "nreturned":   1,   # documents returned
        "keysExamined":1,   # index keys scanned
        "planSummary": 1,   # index used or COLLSCAN
    }
).sort("ts", -1).limit(5)

for entry in profile_entries:
    ns        = entry.get("ns", "")
    op        = entry.get("op", "")
    millis    = entry.get("millis", 0)
    examined  = entry.get("docsExamined", 0)
    returned  = entry.get("nreturned", 0)
    plan      = entry.get("planSummary", "unknown")
    print(f"  {op:8} {ns:30} {millis:4} ms  "
          f"examined: {examined:4}  returned: {returned:3}  plan: {plan}")

# Turn profiler off when done
db.command("profile", 0)
print("\nProfiler disabled")
Profiler enabled: level 1, slowms threshold = 10 ms
Current profiler status:
  was:    0
  slowms: 10 ms

Recent profiled operations (system.profile):
  query    dataplexa.products                2 ms  examined:    4  returned:   3  plan: IXSCAN { category: 1 }
  query    dataplexa.orders                  1 ms  examined:    4  returned:   4  plan: IXSCAN { status: 1 }
  query    dataplexa.users                   0 ms  examined:    3  returned:   3  plan: IXSCAN { membership: 1 }

Profiler disabled
  • Always turn the profiler off after debugging — level 2 logs every operation and the system.profile capped collection will fill quickly, adding write overhead to every database operation
  • A query where docsExamined is much larger than nreturned is a red flag — it means the index is not selective enough or is missing entirely
  • In production, use level 1 with a slowms of 100 ms — this captures genuine performance problems without logging the thousands of fast queries that are already performing correctly
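The red-flag check described in the tips above can be scripted once you have profiler entries in hand. Below is a minimal sketch: the dictionary shape mirrors real system.profile documents, but the sample entries (and the dataplexa.logs namespace) are illustrative, not actual profiler output.

```python
# Sketch: flag profiler entries that scanned the collection or examined
# far more documents than they returned

def flag_slow_entries(entries, max_ratio=5.0):
    """Return (ns, planSummary, ratio) for entries that did a collection
    scan or whose docsExamined/nreturned ratio exceeds max_ratio."""
    flagged = []
    for e in entries:
        returned = e.get("nreturned", 0)
        examined = e.get("docsExamined", 0)
        ratio = examined / returned if returned else float("inf")
        if e.get("planSummary", "").startswith("COLLSCAN") or ratio > max_ratio:
            flagged.append((e["ns"], e.get("planSummary", "?"), ratio))
    return flagged

# Illustrative entries shaped like system.profile documents
sample_entries = [
    {"ns": "dataplexa.products", "planSummary": "IXSCAN { category: 1 }",
     "docsExamined": 4, "nreturned": 3},
    {"ns": "dataplexa.logs", "planSummary": "COLLSCAN",
     "docsExamined": 50000, "nreturned": 12},
]
flagged = flag_slow_entries(sample_entries)
for ns, plan, ratio in flagged:
    print(f"{ns}: {plan} (ratio {ratio:.0f}x)")   # only dataplexa.logs is flagged
```

Against a live deployment you would feed this the results of `db.system.profile.find()` rather than hand-built dicts.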

2. explain() — Reading the Query Plan

explain() is the most detailed diagnostic tool available. It shows exactly what MongoDB did to execute a query — which index it used, how many keys and documents it examined, whether it sorted in memory or used an index, and how long each stage took. Reading explain output is an essential skill for any MongoDB developer.

# explain() — reading execution stats and winning plan

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

def explain_query(query_cursor, label: str):
    """Run explain and print the key stats in a readable format."""
    plan = query_cursor.explain("executionStats")

    stats      = plan["executionStats"]
    win_plan   = plan["queryPlanner"]["winningPlan"]

    # Navigate to the deepest input stage to find IXSCAN or COLLSCAN
    def get_leaf_stage(stage):
        if "inputStage" in stage:
            return get_leaf_stage(stage["inputStage"])
        return stage

    leaf       = get_leaf_stage(win_plan)
    stage_name = leaf.get("stage", "UNKNOWN")
    index_name = leaf.get("indexName", "—")

    print(f"\n  {label}")
    print(f"  Stage:              {win_plan.get('stage', '?')} → {stage_name}")
    print(f"  Index used:         {index_name}")
    print(f"  Keys examined:      {stats.get('totalKeysExamined', 0)}")
    print(f"  Docs examined:      {stats['totalDocsExamined']}")
    print(f"  Docs returned:      {stats['nReturned']}")
    print(f"  Execution time:     {stats['executionTimeMillis']} ms")
    ratio = (stats['totalDocsExamined'] / stats['nReturned']
             if stats['nReturned'] > 0 else float('inf'))
    print(f"  Examine ratio:      {ratio:.1f}x  "
          f"{'✓ efficient' if ratio <= 2 else '✗ investigate'}")

print("Query explain analysis:\n")

# Query 1 — good: uses compound index, examine ratio close to 1
explain_query(
    db.products.find(
        {"category": "Electronics", "price": {"$gt": 20}}
    ).sort("rating", DESCENDING),
    "Electronics > $20 sorted by rating (compound index)"
)

# Query 2 — bad: no index on rating alone — collection scan
explain_query(
    db.products.find({"rating": {"$gt": 4.5}}),
    "Products with rating > 4.5 (no index on rating)"
)

# Query 3 — good: after adding a single-field index
db.products.create_index([("rating", DESCENDING)], name="idx_products_rating")
explain_query(
    db.products.find({"rating": {"$gt": 4.5}}),
    "Products with rating > 4.5 (after adding index)"
)
Query explain analysis:

  Electronics > $20 sorted by rating (compound index)
  Stage:              FETCH → IXSCAN
  Index used:         idx_products_cat_rating_price
  Keys examined:      5
  Docs examined:      4
  Docs returned:      4
  Execution time:     0 ms
  Examine ratio:      1.0x  ✓ efficient

  Products with rating > 4.5 (no index on rating)
  Stage:              COLLSCAN → COLLSCAN
  Index used:         —
  Keys examined:      0
  Docs examined:      7
  Docs returned:      3
  Execution time:     0 ms
  Examine ratio:      2.3x  ✗ investigate

  Products with rating > 4.5 (after adding index)
  Stage:              FETCH → IXSCAN
  Index used:         idx_products_rating
  Keys examined:      3
  Docs examined:      3
  Docs returned:      3
  Execution time:     0 ms
  Examine ratio:      1.0x  ✓ efficient
  • An examine ratio of 1.0 means every document fetched was returned — the index is perfectly selective for this query. The helper above flags anything over 2×; on real workloads a ratio above 5× almost always warrants investigation
  • COLLSCAN in the winning plan stage is the most important warning sign — it means no index was used and every document was read
  • The three explain() verbosity levels are "queryPlanner" (plan only), "executionStats" (plan + runtime stats), and "allPlansExecution" (stats for all candidate plans) — use "executionStats" for routine diagnosis
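One warning sign the example above does not surface is a blocking in-memory sort: a SORT stage in the winning plan means MongoDB could not satisfy the sort from an index. The sketch below walks a plan tree to collect stage names; the plan document is hand-built for illustration, not real explain() output, though its shape matches what explain() returns.

```python
# Sketch: walk an explain() winningPlan tree and look for a SORT stage

def plan_stages(stage):
    """Yield every stage name in a winningPlan tree, depth-first."""
    yield stage.get("stage", "UNKNOWN")
    if "inputStage" in stage:
        yield from plan_stages(stage["inputStage"])
    for child in stage.get("inputStages", []):   # e.g. under OR / SORT_MERGE
        yield from plan_stages(child)

# Illustrative winningPlan for a query whose sort could not use an index
plan = {"stage": "SORT",
        "inputStage": {"stage": "COLLSCAN", "direction": "forward"}}
stages = list(plan_stages(plan))
print(stages)   # ['SORT', 'COLLSCAN']
if "SORT" in stages:
    print("Blocking in-memory sort — consider an index matching the sort key")
```

The same walker finds COLLSCAN anywhere in the tree, which matters for plans where the scan is buried below a FETCH or SORT stage.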

3. Covered Queries — Zero Document Reads

A covered query is one where the index contains every field the query needs — both for filtering and for the projection. MongoDB can then satisfy the query entirely from the index without touching any documents. This is the fastest possible execution path because document fetches are eliminated entirely: as long as the relevant index pages are in cache, nothing is read from disk at all.

# Covered queries — satisfying a query entirely from the index

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Create an index that covers our most common product listing query
# Fields needed: category (filter), rating (sort + return), price (return)
db.products.create_index(
    [("category", ASCENDING), ("rating", DESCENDING), ("price", ASCENDING)],
    name="idx_cover_cat_rating_price"
)

# NON-COVERED query — requests 'name' field which is NOT in the index
# MongoDB must fetch the full document from disk to get 'name'
non_covered = db.products.find(
    {"category": "Electronics"},
    {"category": 1, "rating": 1, "price": 1, "name": 1, "_id": 0}  # name not in index
).explain("executionStats")

nc_stats = non_covered["executionStats"]
print("Non-covered query (name not in index):")
print(f"  Docs examined:  {nc_stats['totalDocsExamined']}")
print(f"  Keys examined:  {nc_stats['totalKeysExamined']}")
print(f"  Stage requires document fetch: YES (name not in index)")

# COVERED query — only requests fields that exist in the index
covered = db.products.find(
    {"category": "Electronics"},
    {"category": 1, "rating": 1, "price": 1, "_id": 0}  # all in index, _id excluded
).explain("executionStats")

c_stats = covered["executionStats"]
print("\nCovered query (all projected fields in index):")
print(f"  Docs examined:  {c_stats['totalDocsExamined']}")
print(f"  Keys examined:  {c_stats['totalKeysExamined']}")
print(f"  Stage requires document fetch: "
      f"{'YES' if c_stats['totalDocsExamined'] > 0 else 'NO — pure index scan'}")

# Run the covered query to see actual results
results = list(db.products.find(
    {"category": "Electronics"},
    {"category": 1, "rating": 1, "price": 1, "_id": 0}
).sort("rating", DESCENDING))

print("\nResults from covered query:")
for p in results:
    print(f"  rating: {p['rating']}  price: ${p['price']:.2f}  ({p['category']})")
Non-covered query (name not in index):
  Docs examined:  4
  Keys examined:  4
  Stage requires document fetch: YES (name not in index)

Covered query (all projected fields in index):
  Docs examined:  0
  Keys examined:  4
  Stage requires document fetch: NO — pure index scan

Results from covered query:
  rating: 4.7  price: $89.99  (Electronics)
  rating: 4.6  price: $299.99  (Electronics)
  rating: 4.5  price: $29.99  (Electronics)
  rating: 4.3  price: $49.99  (Electronics)
  • A covered query shows totalDocsExamined: 0 in explain output — the ultimate performance result, because zero documents were read from disk
  • To cover a query, the index must contain every field in the filter AND every field in the projection — always explicitly exclude _id with "_id": 0 unless it is in the index
  • Covered queries are most valuable for high-frequency read operations like product listing pages, search autocomplete, and dashboard summary cards
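Designing the covering index itself follows a pattern: equality filters first, then sort fields, then any remaining projected fields. The helper below is a deliberately simplified sketch of that ordering — it ignores sort direction, range predicates, and the rule that multikey (array) fields cannot participate in a covered query, so treat its output as a starting point, not a final index.

```python
# Sketch: suggest a compound index that could cover a query
# (simplified Equality-Sort-Range ordering)

def covering_index_spec(equality_fields, sort_fields, projected_fields):
    """Build a compound index spec: equality filter fields first, then sort
    fields, then remaining projected fields. Skips _id and duplicates."""
    spec, seen = [], set()
    for group in (equality_fields, sort_fields, projected_fields):
        for field in group:
            if field not in seen and field != "_id":
                seen.add(field)
                spec.append((field, 1))
    return spec

# The product-listing query from above: filter on category, sort by rating,
# project category/rating/price
spec = covering_index_spec(["category"], ["rating"],
                           ["category", "rating", "price"])
print(spec)   # [('category', 1), ('rating', 1), ('price', 1)]
```

The resulting list is in the shape `create_index` expects, so on a live database you could pass it straight to `db.products.create_index(spec)`.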

4. Projection Optimisation

Even when a query cannot be covered by an index, projecting only the fields your application needs reduces the amount of data transferred from the server to the client and the amount of data MongoDB must deserialise. On documents with many fields or large embedded arrays, this can cut network transfer and application memory usage dramatically.

# Projection optimisation — transferring only the fields you need

from pymongo import MongoClient
import sys, json

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Simulate a product with many fields (as it would exist in production)
# Typical product document might have description, images, specs, metadata...
sample_product = db.products.find_one({"_id": "p001"})

# Measure full document size
full_size = len(json.dumps(sample_product, default=str).encode("utf-8"))
print(f"Full product document size:  {full_size} bytes")

# Projected document — only what the listing page needs
projected = db.products.find_one(
    {"_id": "p001"},
    {"name": 1, "price": 1, "rating": 1, "category": 1, "_id": 0}
)
proj_size = len(json.dumps(projected, default=str).encode("utf-8"))
print(f"Projected document size:     {proj_size} bytes  "
      f"({100 - round(proj_size/full_size*100)}% smaller)")

# Bad pattern — fetching full documents and filtering in Python
print("\nBAD pattern — fetch all, filter in Python:")
all_products = list(db.products.find({}))
electronics_names = [p["name"] for p in all_products if p["category"] == "Electronics"]
print(f"  Fetched {len(all_products)} full documents, used 4")

# Good pattern — filter and project in MongoDB
print("\nGOOD pattern — filter and project in MongoDB:")
electronics_names = list(db.products.find(
    {"category": "Electronics"},
    {"name": 1, "_id": 0}
))
print(f"  Fetched {len(electronics_names)} documents, each with only 'name' field")
for p in electronics_names:
    print(f"  {p['name']}")

# Projection rules reminder
print("\nProjection rules:")
print("  Inclusion: {'field': 1} — only listed fields returned")
print("  Exclusion: {'field': 0} — all fields except listed ones returned")
print("  Cannot mix inclusion and exclusion (except _id which can always be excluded)")
print("  $slice on arrays: {'tags': {'$slice': 2}} — return first 2 elements only")
Full product document size:  198 bytes
Projected document size:     73 bytes  (63% smaller)

BAD pattern — fetch all, filter in Python:
  Fetched 7 full documents, used 4

GOOD pattern — filter and project in MongoDB:
  Fetched 4 documents, each with only 'name' field
  Wireless Mouse
  Mechanical Keyboard
  USB-C Hub
  Monitor 27-inch
  • Always project only the fields the application actually uses — never fetch full documents and filter fields in Python when MongoDB can do it for you
  • On large embedded arrays use $slice in a projection to return only the first N elements rather than the entire array — especially valuable for reviews, items, and log entries
  • The network transfer cost of unnecessary fields compounds at scale — 100 extra bytes per document becomes 100 MB on a query returning one million documents
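The $slice semantics from the projection rules can be illustrated without a server: a positive count keeps the first N array elements, a negative count keeps the last N (MongoDB also accepts a [skip, limit] form, not shown here). The function below is a pure-Python sketch of what the server does, and the review document is made-up sample data.

```python
# Sketch: pure-Python model of the {'field': {'$slice': n}} projection

def apply_slice(doc, field, n):
    """Return a copy of doc with doc[field] sliced the way $slice would:
    positive n keeps the first n elements, negative n keeps the last n."""
    out = dict(doc)
    out[field] = doc[field][:n] if n >= 0 else doc[field][n:]
    return out

# Illustrative product document with an embedded review array
review_doc = {"_id": "p001", "reviews": ["r1", "r2", "r3", "r4", "r5"]}
first_two = apply_slice(review_doc, "reviews", 2)
last_two  = apply_slice(review_doc, "reviews", -2)
print(first_two["reviews"])   # ['r1', 'r2']
print(last_two["reviews"])    # ['r4', 'r5']
```

On a live collection the equivalent server-side projection would be `db.products.find_one({"_id": "p001"}, {"reviews": {"$slice": 2}})`, which avoids ever transferring the full array.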

5. Write Performance — Bulk Operations and Indexes

Write performance degrades when operations are sent individually over many round trips or when a collection has too many indexes. Each insert, update, or delete must update every index on the collection — a collection with ten indexes pays ten times the index maintenance cost on every write. Bulk operations batch multiple writes into a single server round trip, dramatically improving throughput.

# Write performance — bulk_write vs individual writes, index overhead

from pymongo import MongoClient, InsertOne, UpdateOne, DeleteOne
import time

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

db.perf_test.drop()

# ── Individual writes — one round trip per operation ───────────────────────
start = time.perf_counter()
for i in range(100):
    db.perf_test.insert_one({"_id": i, "val": i * 2, "tag": "batch_a"})
individual_ms = (time.perf_counter() - start) * 1000

db.perf_test.drop()

# ── Bulk write — all 100 operations in ONE round trip ─────────────────────
start = time.perf_counter()
operations = [InsertOne({"_id": i, "val": i * 2, "tag": "batch_b"})
              for i in range(100)]
db.perf_test.bulk_write(operations, ordered=False)
bulk_ms = (time.perf_counter() - start) * 1000

print(f"100 individual inserts: {individual_ms:.1f} ms")
print(f"100 bulk inserts:       {bulk_ms:.1f} ms")
print(f"Bulk speedup:           ~{individual_ms / bulk_ms:.1f}x faster")

# ── Mixed bulk operations — insert + update + delete in one call ───────────
mixed_ops = [
    InsertOne({"_id": "new_p1", "name": "Test Product A", "price": 9.99}),
    InsertOne({"_id": "new_p2", "name": "Test Product B", "price": 19.99}),
    UpdateOne({"_id": "new_p1"}, {"$set": {"price": 7.99}}),
    DeleteOne({"_id": "new_p2"}),
]
result = db.perf_test.bulk_write(mixed_ops)
print(f"\nMixed bulk_write result:")
print(f"  inserted:  {result.inserted_count}")
print(f"  modified:  {result.modified_count}")
print(f"  deleted:   {result.deleted_count}")

# ── Index overhead on writes ───────────────────────────────────────────────
print("\nIndex write overhead — rule of thumb:")
overhead = [
    ("0 extra indexes", "baseline write cost"),
    ("1 extra index",   "~1.1x baseline"),
    ("3 extra indexes", "~1.3x baseline"),
    ("10 extra indexes", "~2x+ baseline — audit and drop unused indexes"),
]
for indexes, cost in overhead:
    print(f"  {indexes:20}  {cost}")

db.perf_test.drop()
100 individual inserts: 312.4 ms
100 bulk inserts:       18.7 ms
Bulk speedup:           ~16.7x faster

Mixed bulk_write result:
  inserted:  2
  modified:  1
  deleted:   1

Index write overhead — rule of thumb:
  0 extra indexes       baseline write cost
  1 extra index         ~1.1x baseline
  3 extra indexes       ~1.3x baseline
  10 extra indexes      ~2x+ baseline — audit and drop unused indexes
  • ordered=False in bulk_write allows MongoDB to execute operations in any order — this enables parallel execution and is faster than ordered bulk writes when operations are independent
  • Use ordered=True (the default) only when later operations depend on earlier ones — for example when inserting a user and then inserting their first order
  • Every index on a collection must be updated on every write — run $indexStats regularly and drop any index with zero usage to keep write performance healthy
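The $indexStats audit in the last tip can be scripted. The sketch below works on documents shaped like $indexStats output (real ones also carry `key`, `host`, and `accesses.since`, and the counters reset on server restart); the index names here are illustrative, not from a real deployment.

```python
# Sketch: find drop candidates from $indexStats-shaped documents

def unused_indexes(index_stats):
    """Return names of indexes with zero recorded accesses, skipping the
    mandatory _id_ index which can never be dropped."""
    return [s["name"] for s in index_stats
            if s["name"] != "_id_" and s["accesses"]["ops"] == 0]

# Illustrative stats documents
sample_stats = [
    {"name": "_id_",                "accesses": {"ops": 5120}},
    {"name": "idx_products_rating", "accesses": {"ops": 834}},
    {"name": "idx_products_legacy", "accesses": {"ops": 0}},
]
print(unused_indexes(sample_stats))   # ['idx_products_legacy']
```

Against a live collection you would gather the stats with `db.products.aggregate([{"$indexStats": {}}])` — and because the counters reset on restart, check that the server has been up long enough to have seen representative traffic before dropping anything.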

6. The Working Set and Memory

MongoDB's WiredTiger storage engine serves reads from an in-memory cache. The working set is the portion of data and indexes that your queries actually touch: when it fits in the cache, reads are fast; when it does not, WiredTiger must fetch pages from disk — orders of magnitude slower. Keeping the working set within available RAM is one of the most important performance considerations for production deployments.

# Working set and memory — monitoring cache usage

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# Server status — WiredTiger cache statistics
server_stats = client.admin.command("serverStatus")
wt           = server_stats.get("wiredTiger", {})
cache        = wt.get("cache", {})

bytes_in_cache  = cache.get("bytes currently in the cache", 0)
max_cache_bytes = cache.get("maximum bytes configured", 0)
pages_read_disk = cache.get("pages read into cache", 0)
pages_evicted   = cache.get("pages evicted because they exceeded the in-memory maximum", 0)

mb = 1024 * 1024
print("WiredTiger cache status:")
print(f"  Cache size configured:  {max_cache_bytes / mb:.0f} MB")
print(f"  Cache currently used:   {bytes_in_cache / mb:.1f} MB")
print(f"  Pages read from disk:   {pages_read_disk}")
print(f"  Pages evicted:          {pages_evicted}")

# Collection stats — data size vs index size
print("\nCollection sizes:")
for coll_name in ["users", "products", "orders", "reviews"]:
    stats = db.command("collStats", coll_name)
    data_kb  = stats.get("size",        0) / 1024
    index_kb = stats.get("totalIndexSize", 0) / 1024
    count    = stats.get("count", 0)
    print(f"  {coll_name:10}  docs: {count:4}  "
          f"data: {data_kb:6.1f} KB  indexes: {index_kb:6.1f} KB")

# Optimisation rules for working set
print("\nWorking set optimisation rules:")
rules = [
    "Configure WiredTiger cache to 50–60% of available RAM",
    "Keep your hottest indexes fully in RAM — check $indexStats for usage",
    "Use projections to reduce document size in cache",
    "Archive cold data to separate collections or Atlas Online Archive",
    "Monitor 'pages evicted' — high eviction rate means working set exceeds RAM",
]
for r in rules:
    print(f"  • {r}")
WiredTiger cache status:
  Cache size configured:  256 MB
  Cache currently used:   12.4 MB
  Pages read from disk:   847
  Pages evicted:          0

Collection sizes:
  users       docs:    5  data:    2.3 KB  indexes:   28.0 KB
  products    docs:    7  data:    3.1 KB  indexes:   64.0 KB
  orders      docs:    7  data:    4.8 KB  indexes:   36.0 KB
  reviews     docs:    5  data:    2.9 KB  indexes:   24.0 KB

Working set optimisation rules:
  • Configure WiredTiger cache to 50–60% of available RAM
  • Keep your hottest indexes fully in RAM — check $indexStats for usage
  • Use projections to reduce document size in cache
  • Archive cold data to separate collections or Atlas Online Archive
  • Monitor 'pages evicted' — high eviction rate means working set exceeds RAM
  • If pages evicted is consistently non-zero, the working set is larger than the cache — the most reliable fix is to add more RAM or reduce document and index sizes
  • Index size often exceeds data size on well-indexed collections — this is normal and expected, but indexes that are never used waste cache space that hot indexes could use
  • WiredTiger's default cache size is the larger of 256 MB or 50% of RAM minus 1 GB — on a 4 GB server this gives approximately 1.5 GB of cache
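The eviction and fill checks above can be rolled into one health function. The long field names below are the real keys found under `serverStatus → wiredTiger.cache`; the sample numbers are illustrative, and the 95% fill threshold is an assumption chosen for this sketch rather than an official cutoff.

```python
# Sketch: classify WiredTiger cache health from serverStatus-shaped data

def cache_pressure(cache):
    """Return a one-line health verdict for a wiredTiger.cache dict."""
    used  = cache.get("bytes currently in the cache", 0)
    limit = cache.get("maximum bytes configured", 1)
    evicted = cache.get(
        "pages evicted because they exceeded the in-memory maximum", 0)
    fill = used / limit
    if evicted > 0 or fill > 0.95:   # assumed threshold for this sketch
        return f"PRESSURE: {fill:.0%} full, {evicted} forced evictions"
    return f"healthy: {fill:.0%} full, no forced evictions"

# Illustrative cache stats matching the output shown above
sample_cache = {
    "bytes currently in the cache": 13_000_000,
    "maximum bytes configured": 268_435_456,
    "pages evicted because they exceeded the in-memory maximum": 0,
}
print(cache_pressure(sample_cache))   # healthy: 5% full, no forced evictions
```

In production you would call this on `client.admin.command("serverStatus")["wiredTiger"]["cache"]` on a schedule and alert on the PRESSURE verdict.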

Summary Table

Technique                   What It Fixes                               Key Metric to Check
Query profiler (level 1)    Identifies slow queries in production       docsExamined / nreturned ratio
explain("executionStats")   Diagnoses individual query plans            COLLSCAN vs IXSCAN, examine ratio
Covered queries             Eliminates document reads entirely          totalDocsExamined: 0 in explain
Projection                  Reduces network transfer and cache usage    Document size before vs after projection
bulk_write                  Reduces write round trips                   Operations per second throughput
Drop unused indexes         Reduces write overhead and cache waste      $indexStats accesses.ops = 0
Working set monitoring      Prevents disk I/O from hot data             WiredTiger pages evicted count

Practice Questions

Practice 1. What does a high docsExamined / nreturned ratio in the profiler or explain output indicate?



Practice 2. What two conditions must be true for a query to be a covered query?



Practice 3. Why does ordered=False in bulk_write improve performance and when should you use ordered=True instead?



Practice 4. What does a consistently high pages evicted count in WiredTiger cache stats indicate and what is the fix?



Practice 5. What profiler level should you use in production and why not use level 2?



Quiz

Quiz 1. What does totalDocsExamined: 0 in an explain() output mean?






Quiz 2. Which profiler level logs ALL operations and should only be used during short debugging sessions?






Quiz 3. Why must _id be explicitly excluded when creating a covered query projection?






Quiz 4. What is the approximate performance benefit of bulk_write compared to individual writes for 100 operations?






Quiz 5. What WiredTiger cache metric indicates the working set has grown larger than available RAM?






Next up — Replication: How replica sets work, primary and secondary roles, automatic failover, and configuring read preferences to distribute query load.