NoSQL Performance Optimization
A slow NoSQL system is almost never a hardware problem. More RAM, faster disks, and bigger nodes mask symptoms for weeks before the same patterns resurface at the next scale milestone. Real performance fixes target root causes — the wrong partition key, a missing index, an unbounded scan, a consistency level set carelessly. This lesson is a systematic playbook for diagnosing and fixing performance problems in production NoSQL systems.
The Optimization Hierarchy
Performance problems in NoSQL systems almost always belong to one of four layers. Always start at the top — a schema fix gives orders-of-magnitude improvement. Hardware upgrades at the bottom give 20–50% improvement at 10× the cost.
1 — Schema & Data Model
Wrong partition key, missing table for access pattern, low-cardinality keys, unbounded partitions. Fix here = 100–10,000× improvement.
2 — Indexes
Missing index on filtered fields, wrong compound index field order (ESR), over-indexed collection adding write overhead. Fix here = 100–1,000× improvement.
3 — Query Patterns
N+1 query problems, fetching more data than needed, missing projections, scatter-gather queries that could hit one partition. Fix here = 5–50× improvement.
4 — Infrastructure & Configuration
JVM heap sizing, compaction strategy, connection pool sizing, caching configuration, consistency level tuning. Fix here = 20–50% improvement.
Diagnosing MongoDB Performance — The Toolkit
The scenario: You are on call at 2 AM. The ops dashboard shows MongoDB query latency at p99 climbing from 8ms to 4.2 seconds over the past hour. Traffic is normal. You have four minutes before your SLA is breached. You need to identify the slow query, understand why it is slow, and apply a fix.
// Step 1: find currently running slow operations
// filter to active operations running longer than 1 second
db.currentOp({ "active": true, "secs_running": { "$gt": 1 } })
{
inprog: [
{
opid: 8821,
op: "query",
ns: "marketplace.orders",
command: { find: "orders", filter: { status: "pending" },
sort: { created_at: -1 } },
secs_running: 3,
planSummary: "COLLSCAN", ← full table scan
numYields: 1204,
docsExamined: 48200000 ← 48M docs examined so far
}
]
}

db.currentOp() — live query inspection
currentOp shows every operation currently executing on the server. secs_running > 1 filters to anything that has been running for more than a second; those are your immediate suspects. planSummary: "COLLSCAN" confirms the problem instantly: a full collection scan over 48 million orders. numYields: 1204 means the query has yielded its locks more than a thousand times so other operations can interleave, the signature of a long-running scan dragging down the whole collection.
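The same triage logic can be sketched in plain Python: filter the inprog array for operations that are both slow and doing a full scan. The sample documents below are invented for illustration, shaped like db.currentOp() output.

```python
def slow_collscans(current_op, min_secs=1):
    """Return running ops that are both slow and doing a full collection scan."""
    return [
        op for op in current_op.get("inprog", [])
        if op.get("secs_running", 0) > min_secs
        and op.get("planSummary") == "COLLSCAN"
    ]

# Invented sample shaped like db.currentOp() output
ops = {"inprog": [
    {"opid": 8821, "secs_running": 3, "planSummary": "COLLSCAN"},
    {"opid": 9001, "secs_running": 0, "planSummary": "IXSCAN"},
]}

print([op["opid"] for op in slow_collscans(ops)])  # → [8821]
```

The op IDs this returns are exactly the candidates for db.killOp() while you prepare the index fix.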
Emergency kill while you fix the index
If a runaway query is blocking other operations, kill it with db.killOp(8821) to restore normal latency immediately. This buys time to apply the proper fix — the missing index — without the query continuing to degrade the cluster for another 3 minutes.
// Step 2: confirm diagnosis with explain()
db.orders.find(
{ status: "pending" }
).sort(
{ created_at: -1 }
).explain("executionStats")
// Step 3: apply the fix — compound index following ESR rule
// E: status (equality) S: created_at (sort)
db.orders.createIndex(
{ status: 1, created_at: -1 },
{ name: "orders_status_date_idx", background: true }
)
explain() before index:
  stage: COLLSCAN | docsExamined: 48200000 | executionTimeMs: 4180
createIndex: "orders_status_date_idx" — building in background...
  Build complete in 62 seconds (one-time cost)
explain() after index:
  stage: IXSCAN | docsExamined: 1840 | executionTimeMs: 3
p99 latency restored to 6ms ✓
background: true — safe for live clusters
Building an index in the foreground on a live collection holds a write lock for the entire build duration — completely blocking writes for 62 seconds. background: true builds the index while allowing concurrent reads and writes, at the cost of a slightly slower build. In MongoDB 4.2+ all index builds are concurrent by default, but on older versions always specify background: true on production clusters.
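The ESR ordering applied in Step 3 can be captured as a tiny helper. This function is purely illustrative (it is not part of any driver): it assembles a compound index spec with equality fields first, sort fields next, range fields last.

```python
def esr_index(equality, sort, range_fields):
    """Order compound-index fields by the ESR rule: Equality, Sort, Range."""
    spec = [(f, 1) for f in equality]        # equality fields first, ascending
    spec += list(sort)                       # sort fields keep their direction
    spec += [(f, 1) for f in range_fields]   # range-filtered fields last
    return spec

print(esr_index(["status"], [("created_at", -1)], []))
# → [('status', 1), ('created_at', -1)]
```

Fed to pymongo's create_index, that list reproduces the { status: 1, created_at: -1 } index built above.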
Diagnosing Cassandra Performance — The Toolkit
The scenario: Your Cassandra-backed order service has been running fine for eight months. Over the past two weeks, read latency on the orders_by_customer table has climbed steadily from 2ms to 180ms. No traffic change, no schema change. You suspect a growing data problem rather than a query problem.
# Check table-level read/write latency stats
nodetool tablestats orders_keyspace.orders_by_customer
# Check the size and count of the largest partitions
# (requires cassandra-stress or a custom script at scale)
# Quick check: use nodetool cfhistograms to see partition size distribution
nodetool cfhistograms orders_keyspace orders_by_customer
nodetool tablestats:
  Table: orders_by_customer
  Read Count: 8,204,112
  Read Latency: 184.2 ms avg    ← was 2ms eight months ago
  Write Count: 142,891,044
  Write Latency: 1.1 ms avg     ← writes still fast
  SSTable Count: 284            ← very high, compaction lagging
  Live SSTable Disk Used: 892 GB

nodetool cfhistograms (partition size):
  Percentile   Partition Size
  50%          12 KB
  75%          48 KB
  95%          8.4 MB
  99%          892 MB    ← enormous partitions at the tail
  Max:         4.2 GB    ← way beyond the 100MB recommendation
SSTable Count: 284 — compaction is falling behind
Every time the memtable flushes, a new SSTable file is written, and compaction merges SSTables in the background. 284 SSTables means compaction cannot keep up with write volume: each read must consult more files, and each file requires a Bloom filter check and potentially a disk seek. A healthy table typically stays in the low tens of SSTables; 284 is a compaction emergency.
Max partition: 4.2 GB — the real root cause
The 99th percentile partition at 892MB and the max at 4.2GB tell the real story. A handful of extremely large partitions are causing both problems: reads on large partitions must scan gigabytes of data, and compaction struggles to merge 4GB partition files efficiently. The partition key is unbounded — high-volume customers have accumulated years of order history in a single partition.
The scenario continues: You identify the root cause — the partition key is customer_id with no time bucketing. Your top 50 customers have been on the platform for three years and have accumulated millions of orders each. You redesign the partition key to add a monthly bucket, capping each partition at one month of orders per customer.
-- BEFORE: unbounded partition — grows forever per customer
CREATE TABLE orders_by_customer (
customer_id TEXT,
ordered_at TIMESTAMP,
order_id UUID,
total DECIMAL,
PRIMARY KEY (customer_id, ordered_at, order_id)
);
-- AFTER: monthly bucket — caps partition at ~30 days of orders per customer
-- order_month derived from ordered_at in application: "2025-01"
CREATE TABLE orders_by_customer_v2 (
customer_id TEXT,
order_month TEXT, -- "YYYY-MM" bucket
ordered_at TIMESTAMP,
order_id UUID,
total DECIMAL,
PRIMARY KEY ((customer_id, order_month), ordered_at, order_id)
) WITH CLUSTERING ORDER BY (ordered_at DESC, order_id ASC);
Created table orders_by_customer_v2
Took 0.644 seconds

After migration and compaction:
  SSTable Count: 8        ← down from 284
  Read Latency: 1.8ms     ← down from 184ms
  Max Partition: 18 MB    ← down from 4.2 GB ✓
Monthly bucket as part of composite partition key
Adding order_month to the partition key creates one partition per customer per month. The busiest customer with 50,000 orders per month creates a partition of roughly 5–15MB — well within Cassandra's sweet spot. The application derives order_month from the order timestamp before writing. When querying the last 3 months, the app fires 3 partition queries and merges results.
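The two application-side pieces described above, deriving the bucket at write time and enumerating buckets at read time, are a few lines of Python. This is a sketch; the function names are my own.

```python
from datetime import datetime

def order_month(ts):
    """Derive the "YYYY-MM" partition bucket from an order timestamp."""
    return ts.strftime("%Y-%m")

def month_buckets(end, months):
    """Buckets to query for the trailing N months, newest first."""
    year, month = end.year, end.month
    buckets = []
    for _ in range(months):
        buckets.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:          # roll back across the year boundary
            year, month = year - 1, 12
    return buckets

print(order_month(datetime(2025, 1, 17)))       # → 2025-01
print(month_buckets(datetime(2025, 1, 17), 3))  # → ['2025-01', '2024-12', '2024-11']
```

At write time the app stores order_month(ordered_at) alongside the row; at read time it issues one partition query per bucket returned by month_buckets and merges the results.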
The N+1 Query Problem — A Universal NoSQL Tax
The N+1 problem is not unique to SQL — it is just as common in NoSQL applications that use referencing. You fetch a list of N items, then fire one additional query per item to retrieve related data. The result: N+1 round trips to the database for what should be 1 or 2.
The scenario: A product listing page fetches 20 products, then for each product fires a separate query to get the seller's name. 21 database round trips per page load. Under load this becomes the primary bottleneck.
# BAD: N+1 — 1 query for products + 20 queries for sellers
products = list(db.products.find(
{"category": "Electronics"},
limit=20
))
for product in products:
# One extra query per product — 20 extra round trips
seller = db.sellers.find_one({"_id": product["seller_id"]})
product["seller_name"] = seller["name"]
# GOOD: 2 queries total — collect IDs, fetch in one $in query
products = list(db.products.find(
{"category": "Electronics"},
limit=20
))
seller_ids = list({p["seller_id"] for p in products})
sellers = {
s["_id"]: s for s in
db.sellers.find({"_id": {"$in": seller_ids}})
}
for product in products:
product["seller_name"] = sellers[product["seller_id"]]["name"]
N+1 approach: 21 round trips | total latency: 62ms
$in approach: 2 round trips | total latency: 4ms
At 500 req/sec: N+1 = 10,500 DB queries/sec
              $in = 1,000 DB queries/sec ✓

$in with a set of IDs — the standard fix
The $in operator fetches all matching documents in a single query. Building a set of seller IDs first (a Python set comprehension deduplicates repeated sellers), then fetching every seller in one shot, collapses 20 queries into 1. The sellers collection is indexed on _id, so MongoDB resolves all 20 lookups efficiently within that single query. The extended reference pattern from Lesson 25 goes further still: had seller_name been embedded in the product document, no extra query would be needed at all.
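The join can also be pushed server-side with MongoDB's $lookup aggregation stage. A sketch of the equivalent pipeline, using the same collection and field names as the example above, built as a plain Python list so the structure is visible:

```python
# One round trip: products matched, limited, and joined to sellers server-side
pipeline = [
    {"$match": {"category": "Electronics"}},
    {"$limit": 20},
    {"$lookup": {
        "from": "sellers",           # collection to join
        "localField": "seller_id",   # field on products
        "foreignField": "_id",       # field on sellers
        "as": "seller",              # output array field on each product
    }},
]
# executed with: db.products.aggregate(pipeline)

print(pipeline[2]["$lookup"]["from"])  # → sellers
```

This trades a second application-side query for server-side work; the $in approach is usually simpler to reason about, but $lookup keeps the logic in one place.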
Connection Pool Sizing — The Hidden Bottleneck
Application servers connect to database nodes through a connection pool. If the pool is too small, application threads queue waiting for a free connection — query latency climbs even though the database itself is underloaded. If the pool is too large, the database is overwhelmed with concurrent connections — each connection consumes memory and CPU on the database node.
The scenario: Your MongoDB service has 10 application server instances. Database CPU is at 15%. Application server threads are waiting for connections 40% of the time. Adding more database nodes has not helped. The problem is pool sizing, not database capacity.
from pymongo import MongoClient
# BEFORE: pool left at the pymongo default of 100 per app server
# Under a spike of 500 concurrent requests per server, all 100
# connections are busy and ~400 threads queue for a free one
client_bad = MongoClient(
    "mongodb://localhost:27017/",
    maxPoolSize=100  # default — never checked against real concurrency
)
# AFTER: sized to match actual concurrency
# Rule of thumb: maxPoolSize ≈ (peak concurrent threads per server) * 1.2
# 50 concurrent threads per server * 1.2 = 60
client_good = MongoClient(
"mongodb://localhost:27017/",
maxPoolSize=60, # sized to actual concurrency
minPoolSize=10, # keep warm connections ready
waitQueueTimeoutMS=2000 # fail fast rather than hang
)
# Monitor server-side connection counts (serverStatus reports all clients combined)
db = client_good["marketplace"]
server_status = db.command("serverStatus")
pool_stats = server_status["connections"]
print(f"Current: {pool_stats['current']} "
f"Available: {pool_stats['available']} "
f"Total created: {pool_stats['totalCreated']}")
Before (maxPoolSize=100, 10 servers):
  Connections in use: 100/100 — SATURATED
  Thread wait queue: 487 threads waiting ← bottleneck
  p99 latency: 2,400ms

After (maxPoolSize=60, 10 servers = 600 total DB connections):
  Connections in use: 48/60 — healthy headroom
  Thread wait queue: 0
  p99 latency: 11ms ✓
  DB server connections: current=612 available=388 (1000 max)
waitQueueTimeoutMS: 2000
If all pool connections are in use, a new request waits up to waitQueueTimeoutMS for a free connection before raising a timeout error. Without this, requests queue indefinitely — a traffic spike causes threads to pile up, consuming server memory until the process crashes. 2 seconds is a reasonable ceiling: fast enough to fail before the HTTP request timeout triggers, slow enough to allow brief spikes to clear.
Total DB connections = maxPoolSize × app server count
With 10 app servers at maxPoolSize=60, the database receives up to 600 connections total. MongoDB's server-side connection limit (net.maxIncomingConnections, 65,536 by default) is not the constraint. The constraint is the database's memory: each MongoDB connection can consume up to about 1MB, so 600 connections mean roughly 600MB of connection overhead on the database server. Size your pool to match actual concurrency, not to the maximum the database can theoretically accept.
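The rule of thumb and the totals above reduce to a few lines of arithmetic. A sketch (function name and return shape are my own):

```python
import math

def pool_plan(app_servers, peak_threads_per_server,
              mb_per_connection=1.0, headroom=1.2):
    """Rule of thumb from the text: maxPoolSize ≈ peak concurrency × 1.2."""
    max_pool = math.ceil(peak_threads_per_server * headroom)
    total = max_pool * app_servers
    return {
        "maxPoolSize": max_pool,
        "total_db_connections": total,
        "db_memory_overhead_mb": total * mb_per_connection,
    }

print(pool_plan(app_servers=10, peak_threads_per_server=50))
# → {'maxPoolSize': 60, 'total_db_connections': 600, 'db_memory_overhead_mb': 600.0}
```

Running the plan before changing maxPoolSize makes the trade-off explicit: the pool serves the app's concurrency, and the total across all servers is what the database pays for in memory.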
Compaction Strategy in Cassandra — Choosing the Right Mode
Cassandra's compaction strategy determines how SSTables are merged. The wrong strategy for your workload pattern is one of the most common sources of read latency degradation over time.
| Strategy | Best for | Avoid when | Space amplification |
|---|---|---|---|
| STCS (SizeTieredCompaction) | Write-heavy, insert-mostly workloads (Cassandra's longtime default) | Read-heavy with random access — SSTables multiply | High — needs up to 2× space during compaction |
| LCS (LeveledCompaction) | Read-heavy, updates, consistent latency | Very high write throughput — compaction can't keep up | Low — 10× smaller SSTables, predictable reads |
| TWCS (TimeWindowCompaction) | Time-series with TTL — best for IoT, metrics | Data without clear time windows or no TTL | Low — expired time windows deleted as whole files |
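The strategy is set per table with ALTER TABLE. A CQL sketch matching the table above (the table names are hypothetical; the TWCS window options shown are the standard ones):

```sql
-- Time-series metrics with TTL: TWCS with 1-day windows
ALTER TABLE metrics.sensor_readings
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
  };

-- Read-heavy table with occasional updates: LCS for predictable read latency
ALTER TABLE users.profiles_by_id
  WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

Changing the strategy triggers a gradual re-compaction of existing SSTables, so expect elevated background I/O until it settles.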
A Performance Optimization Checklist
MongoDB
- explain("executionStats") on every slow query
- totalDocsExamined / nReturned ratio — target 1:1
- Batch N+1 lookups with $in
- db.currentOp() during latency spikes

Cassandra
- nodetool tablestats — check read/write latency
- nodetool cfhistograms — check partition size tail
- nodetool status — verify even data distribution

Teacher's Note
The fastest path to a performance improvement is almost always finding one query that is doing a full scan and adding the right index or table. Before tuning JVM settings, compaction strategies, or anything at the infrastructure layer — spend ten minutes with explain() in MongoDB or nodetool cfhistograms in Cassandra. The schema fix is almost always hiding in plain sight, and it is worth 100× more than any configuration tweak.
Practice Questions — You're the Engineer
Scenario:
You run db.currentOp() during a latency spike and find a query that has been running for 4 seconds. It shows planSummary: "COLLSCAN" and docsExamined: 61,000,000. You also notice a field in the output showing that the query has paused and resumed over 2,400 times to yield execution to other operations, evidence of a long scan contending with concurrent reads. What is this field called?
Scenario:
Your Cassandra orders table is partitioned by customer_id. After 18 months in production, nodetool cfhistograms shows the 99th percentile partition at 2.1GB. Your five biggest enterprise customers each have a 4GB+ partition, and their reads take 12 seconds. You need to bound partition growth without changing the access pattern, so you add a second field to the composite partition key derived from the order date, limiting each partition to one month of data. What is this technique called?
Scenario:
A product listing page fetches 20 products, then for each product fires a separate db.sellers.find_one({"_id": product["seller_id"]}) call to retrieve the seller name. Under load at 300 requests per second, your MongoDB server receives 6,300 queries per second for what should be 600 queries per second. You need to collapse the 20 individual seller queries into a single query. Which MongoDB operator do you use?
Quiz — Performance Optimization in Production
Scenario:
On a read-heavy user-profile table, nodetool tablestats shows read latency at 340ms — up from 4ms at launch. Write latency is still 1ms. SSTable count is 312. The table receives mostly reads with occasional profile updates. Partition sizes are healthy (under 1MB each). What is the most likely root cause and the correct fix?
Scenario:
During a latency incident, db.serverStatus() shows connections.current: 100 and connections.available: 0. Application server logs show threads waiting an average of 3.1 seconds for a database connection. Adding more MongoDB nodes has not helped. What is the root cause?
Up Next · Lesson 33
NoSQL in Microservices
Each microservice owns its own database — and the decisions you make about which database each service gets determine whether your system scales or becomes a distributed monolith.