Choosing the Right NoSQL Database
The most expensive database mistake is not picking the wrong engine — it is picking it for the wrong reasons. Teams choose MongoDB because it is popular, Redis because it is fast, Cassandra because Netflix uses it. Six months later they are fighting the data model, rewriting queries, or migrating to something else entirely. This lesson gives you a systematic framework so you choose the right database the first time.
The Decision Framework — Four Questions First
Before you look at any database, answer these four questions about your workload. The answers will eliminate most of the options immediately.
What does a read look like?
A single key lookup? A range scan? A multi-hop traversal? A full-text search? Aggregation across millions of rows? The read pattern is the single most important constraint.
What is the write volume and latency requirement?
100 writes/sec or 1 million? Does every write need sub-millisecond acknowledgement, or can you tolerate 10ms? This determines whether you need an LSM-tree engine or a B-tree.
How flexible is your data structure?
Does every record look the same, or do different entities have wildly different shapes? Do you need schema enforcement or schema freedom? Is the structure evolving rapidly?
How critical is consistency?
Can users briefly see stale data? Or does your domain — banking, healthcare, inventory — require that every read sees the latest committed write? This determines your CAP theorem position.
The Full Comparison — All Four NoSQL Families
| Type | Best read pattern | Write strength | Consistency | Avoid when | Examples |
|---|---|---|---|---|---|
| Key-Value | Single key lookup by exact ID | Sub-millisecond, extremely high throughput | Tunable (DynamoDB, eventual by default) or strong (single-node Redis) | You need to query by anything other than the key | Redis, DynamoDB, Memcached |
| Document | Rich queries on nested fields, flexible filters | Good — indexed writes, flexible schema | Tunable (MongoDB) or eventual (CouchDB) | You need multi-document transactions or time-series at scale | MongoDB, CouchDB, Firestore |
| Column-Family | Range scans on a partition key + clustering column | Exceptional — LSM-tree, millions of writes/sec | Tunable (Cassandra) or strong (HBase) | You need ad-hoc queries or multi-partition transactions | Cassandra, HBase, ScyllaDB |
| Graph | Multi-hop traversals, relationship pattern matching | Moderate — strong for connected writes | Strong (Neo4j) | You need heavy aggregations or high write throughput | Neo4j, Amazon Neptune, TigerGraph |
Decision Tree — From Workload to Database
Follow the path that matches your workload:

- Reads are single-key lookups by exact ID → Key-Value (Redis, DynamoDB)
- Reads are range scans on a partition key, or writes run to hundreds of thousands per second → Column-Family (Cassandra, ScyllaDB)
- Reads are multi-hop relationship traversals → Graph (Neo4j)
- Reads are rich, flexible queries on nested or varied documents → Document (MongoDB)
- Uniform schema, strong consistency, moderate scale → stay relational (PostgreSQL)
Real Architecture Decisions — Five Case Studies
The scenario: You have just joined a fast-growing startup as the lead engineer. Five different product requirements land on your desk the same week. Each one needs a database decision justified to your CTO. Work through each one.
Live Football Score Updates
5 million concurrent users. Each needs the current score for one match — accessed by match ID. Scores change every few minutes. Stale data older than 5 seconds is acceptable. The backend already pushes updates via WebSocket.
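This is the canonical key-value workload: one exact-ID lookup, tolerance for briefly stale data. In Redis it would be `SET match:{id} score EX 5`. As a language-neutral sketch, the same pattern, a key-value store with per-key expiry, looks like this (class and method names are hypothetical):

```python
import time

class TTLScoreCache:
    """In-memory key-value store with per-key expiry, mimicking the
    Redis 'SET key value EX seconds' pattern for live match scores."""

    def __init__(self):
        self._data = {}  # match_id -> (score, expires_at)

    def set(self, match_id, score, ttl=5.0):
        self._data[match_id] = (score, time.monotonic() + ttl)

    def get(self, match_id):
        entry = self._data.get(match_id)
        if entry is None:
            return None
        score, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[match_id]  # lazy expiry on read, as Redis does
            return None
        return score
```

The 5-second TTL encodes the staleness budget directly in the store, so no cache-invalidation logic is needed.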
E-commerce Product Catalogue
200,000 products, each with a completely different attribute structure — electronics have voltage and warranty, clothing has size and material, food has allergens and calories. Marketing wants to filter by any combination of attributes. New product types are added monthly.
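A document store fits because each product is a self-describing document, queryable on any field without schema migrations. A minimal in-memory sketch of the access pattern, where the `find` helper stands in for a document database's equality-filter query (the sample products are invented):

```python
# Heterogeneous product documents -- each type carries its own attributes,
# the way a document database stores them without a fixed schema.
products = [
    {"sku": "tv-1",    "type": "electronics", "voltage": 230, "warranty_months": 24},
    {"sku": "shirt-1", "type": "clothing",    "size": "M",    "material": "cotton"},
    {"sku": "bar-1",   "type": "food",        "allergens": ["nuts"], "calories": 250},
]

def find(query):
    """Match documents on any combination of fields, like a simple
    find() with an equality filter over one collection."""
    return [d for d in products if all(d.get(k) == v for k, v in query.items())]
```

Adding a new product type next month means inserting documents with new fields; no other record is touched.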
IoT Temperature Monitoring
80,000 sensors, each sending a reading every 5 seconds — 16,000 writes per second sustained. Queries are always "get all readings for sensor X between time A and time B." Data older than 90 days is irrelevant and should be automatically deleted.
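This is the textbook column-family shape: partition key of (sensor, time bucket), rows clustered by timestamp so a time-range read is a contiguous scan of a few partitions. A pure-Python sketch of that layout (the one-day bucket size and class names are illustrative choices):

```python
from bisect import insort, bisect_left, bisect_right
from collections import defaultdict

class SensorStore:
    """Column-family-style layout: partition key = (sensor_id, day bucket),
    rows kept sorted by timestamp so a range read is a contiguous scan."""

    BUCKET = 86_400  # one-day buckets keep each partition bounded in size

    def __init__(self):
        self._partitions = defaultdict(list)  # (sensor, bucket) -> sorted [(ts, value)]

    def write(self, sensor, ts, value):
        insort(self._partitions[(sensor, ts // self.BUCKET)], (ts, value))

    def read_range(self, sensor, start, end):
        """'All readings for sensor X between time A and B' touches only
        the buckets that overlap the range -- never a full scan."""
        out = []
        for bucket in range(start // self.BUCKET, end // self.BUCKET + 1):
            rows = self._partitions.get((sensor, bucket), [])
            lo = bisect_left(rows, (start,))
            hi = bisect_right(rows, (end, float("inf")))
            out.extend(rows[lo:hi])
        return out
```

The 90-day retention maps to Cassandra's per-write `USING TTL`, so expiry happens in the storage engine rather than in application code.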
LinkedIn-Style "People You May Know"
10 million users. Each user has connections to other users, and those connections have connections. The feature needs second- and third-degree connections in under 50ms. The prototype in Postgres using recursive CTEs takes 8 seconds and degrades further as the network grows.
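A graph database answers this with an index-free traversal; conceptually it is a bounded breadth-first search over an adjacency structure, which is what the recursive CTE was simulating at great cost. A sketch, assuming a plain dict-of-sets graph:

```python
from collections import deque

def connections_within(graph, user, max_depth=3):
    """Breadth-first traversal returning users reachable in 2..max_depth
    hops -- the 'people you may know' set.

    graph: dict mapping user -> set of direct connections."""
    seen = {user: 0}  # user -> hop distance from the start node
    queue = deque([user])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return {u for u, depth in seen.items() if depth >= 2}
```

Graph engines keep adjacency physically co-located per node, so each hop is a pointer chase rather than a join, which is why the traversal stays fast as the network grows.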
Hospital Patient Records
Complex nested data (diagnoses, medications, allergies, test results). Relationships between patients, doctors, and departments matter. Every write must be ACID. Regulators require that data is never lost or inconsistent. 50,000 records total — no massive scale.
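The non-negotiable requirement here is ACID, which is exactly what a relational engine provides out of the box. A minimal demonstration using Python's built-in sqlite3 (standing in for PostgreSQL; the schema and validation rule are invented for illustration):

```python
import sqlite3

# In-memory database standing in for the hospital's relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medications (patient_id TEXT, drug TEXT, dose_mg INTEGER)")

def prescribe(patient_id, drug, dose_mg):
    """Write the prescription atomically: the insert either commits
    in full or is rolled back -- no half-written record survives."""
    try:
        with conn:  # sqlite3 connection as context manager = one transaction
            conn.execute("INSERT INTO medications VALUES (?, ?, ?)",
                         (patient_id, drug, dose_mg))
            if dose_mg <= 0:
                raise ValueError("invalid dose")  # rolls back the insert above
    except ValueError:
        return False
    return True
```

At 50,000 records there is no scale argument against a relational database, so the decision reduces to the consistency question alone.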
The Polyglot Pattern — When You Need More Than One
Most production systems at scale do not use a single database. They use the right database for each concern. This is called polyglot persistence — different data stores for different parts of the same application.
Example: E-commerce Platform — Polyglot Architecture
| Concern | Database | Why |
|---|---|---|
| Orders & payments | PostgreSQL | ACID transactions, financial integrity, referential constraints |
| Product catalogue | MongoDB | Flexible schema per category, rich attribute queries, fast iteration |
| Sessions & cart | Redis | Sub-ms reads, TTL for abandoned carts, pub/sub for real-time updates |
| Recommendations | Neo4j | Purchase graph traversal, collaborative filtering, real-time recommendations |
| Clickstream events | Cassandra | Millions of events/sec, partitioned by user, TTL for 90-day retention |
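One common way to keep polyglot persistence manageable is a service layer that hides which store backs which concern. A sketch, where every store object and method name is a hypothetical thin client rather than a real driver API:

```python
class CheckoutService:
    """Polyglot persistence in miniature: each concern is routed to the
    store suited to it. The injected store objects are hypothetical thin
    clients -- in production they would wrap the real drivers."""

    def __init__(self, orders_db, catalogue_db, cart_cache, events_store):
        self.orders = orders_db        # relational: ACID order + payment
        self.catalogue = catalogue_db  # document: flexible product attributes
        self.cart = cart_cache         # key-value: sub-ms session reads, TTL
        self.events = events_store     # column-family: append-only clickstream

    def place_order(self, user_id):
        items = self.cart.get(user_id) or []
        products = [self.catalogue.find_one(sku) for sku in items]
        order_id = self.orders.create_order(user_id, products)  # transactional
        self.cart.delete(user_id)  # cart is disposable session state
        self.events.append(user_id, {"type": "order_placed", "order": order_id})
        return order_id
```

The trade-off is operational: every extra store is another system to monitor, back up, and upgrade, which is why polyglot persistence is earned at scale rather than adopted on day one.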
The Warning Signs — You Chose the Wrong Database
You picked Cassandra and now…
Every feature request requires a new table because your queries keep changing. You are doing ALLOW FILTERING on every query. Developers are asking "how do I JOIN these two tables?" You needed a document database.
You picked MongoDB and now…
Writes are piling up at 500,000/sec and the cluster is gasping. The collection has 20 billion documents. Every read scans a huge index. You needed Cassandra for the write throughput and partition-based access.
You picked Redis and now…
The dataset grew to 400GB and you are paying a fortune for RAM. You have complex filtering requirements beyond key lookup. You needed DynamoDB or MongoDB — Redis was the right tool for a much smaller hot dataset.
You picked Neo4j and now…
You are trying to aggregate revenue across 2 billion transactions using graph traversal. Query times are minutes. The data is fundamentally tabular — relationships are not the interesting part. You needed a columnar data warehouse.
Teacher's Note
The right answer is almost always "it depends" — but that is not an excuse to avoid deciding. Use the four questions at the top of this lesson every single time, and when in doubt, choose the simplest option that solves the problem. PostgreSQL solves more problems than most engineers give it credit for. NoSQL is not an upgrade from SQL — it is a specialisation. Reach for it only when SQL's constraints are genuinely causing you pain, not before.
Quiz — Database Selection in Production
Scenario: A team stores its analytics events in Cassandra. Every ad-hoc query requires ALLOW FILTERING and takes 40 seconds. They need to filter events by region, event_type, user_plan, and date — in any combination, with no fixed access pattern. What is the root cause of the problem?
Up Next · Lesson 23
NoSQL Data Modeling
The principles that separate a schema that scales from one that falls apart at 10 million records — and how to apply them from day one.