NoSQL Lesson 22 – Choosing the Right NoSQL | Dataplexa
NoSQL Database Types · Lesson 22

Choosing the Right NoSQL Database

The most expensive database mistake is not picking the wrong engine — it is picking it for the wrong reasons. Teams choose MongoDB because it is popular, Redis because it is fast, Cassandra because Netflix uses it. Six months later they are fighting the data model, rewriting queries, or migrating to something else entirely. This lesson gives you a systematic framework so you choose the right database the first time.

The Decision Framework — Four Questions First

Before you look at any database, answer these four questions about your workload. The answers will eliminate most of the options immediately.

Question 1 — What does a read look like?

A single key lookup? A range scan? A multi-hop traversal? A full-text search? Aggregation across millions of rows? The read pattern is the single most important constraint.

Question 2 — What is the write volume and latency requirement?

100 writes/sec or 1 million? Does every write need sub-millisecond acknowledgement, or can you tolerate 10ms? This determines whether you need an LSM-tree engine or a B-tree.

Question 3 — How flexible is your data structure?

Does every record look the same, or do different entities have wildly different shapes? Do you need schema enforcement or schema freedom? Is the structure evolving rapidly?

Question 4 — How critical is consistency?

Can users briefly see stale data? Or does your domain — banking, healthcare, inventory — require that every read sees the latest committed write? This determines your CAP theorem position.

The Full Comparison — All Four NoSQL Families

Key-Value — Redis, DynamoDB, Memcached
    Best read pattern: single key lookup by exact ID
    Write strength: sub-millisecond, extremely high throughput
    Consistency: strong on a single node (Redis); eventual by default, strong on request (DynamoDB)
    Avoid when: you need to query by anything other than the key

Document — MongoDB, CouchDB, Firestore
    Best read pattern: rich queries on nested fields, flexible filters
    Write strength: good — indexed writes, flexible schema
    Consistency: tunable (MongoDB) or eventual (CouchDB)
    Avoid when: you need heavy cross-document transactions or time-series at scale

Column-Family — Cassandra, HBase, ScyllaDB
    Best read pattern: range scans on a partition key + clustering column
    Write strength: exceptional — LSM-tree storage, millions of writes/sec
    Consistency: tunable (Cassandra) or strong (HBase)
    Avoid when: you need ad-hoc queries or multi-partition transactions

Graph — Neo4j, Amazon Neptune, TigerGraph
    Best read pattern: multi-hop traversals, relationship pattern matching
    Write strength: moderate — strong for connected writes
    Consistency: strong (Neo4j)
    Avoid when: you need heavy aggregations or very high write throughput

Decision Tree — From Workload to Database

Follow the path that matches your workload

Is your primary access pattern a single key lookup?
├── YES: Is the data ephemeral / needs expiry or pub-sub?
│ ├── YES → Redis (in-memory, TTL, pub-sub, data structures)
│ └── NO → DynamoDB (durable, serverless, global tables)
└── NO: continue ↓
Does your data have rich, varying structure with nested fields?
├── YES: Do you need offline sync / multi-master replication?
│ ├── YES → CouchDB (MVCC, offline-first, HTTP API)
│ └── NO → MongoDB (rich queries, aggregation pipeline, indexes)
└── NO: continue ↓
Is the workload time-series or event-stream with very high write volume?
├── YES: Are you already running Hadoop / HDFS?
│ ├── YES → HBase (HDFS-native, batch + random access)
│ └── NO → Cassandra (peer-to-peer, tunable consistency, no HDFS needed)
└── NO: continue ↓
Is the primary challenge multi-hop relationship traversal?
├── YES → Neo4j / Graph DB (fraud, recommendations, knowledge graphs)
└── NO → PostgreSQL — you probably do not need NoSQL at all
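
The tree above can be sketched as a plain function. This is a minimal illustration, not a standard API; the workload keys are invented here purely to make the branching explicit:

```python
def choose_database(workload: dict) -> str:
    """Walk the decision tree for a workload described as boolean flags.

    The flag names (single_key_lookup, ephemeral_or_pubsub, ...) are
    illustrative inventions mirroring the questions in the tree above.
    """
    if workload.get("single_key_lookup"):
        # Ephemeral data, expiry, or pub-sub points at Redis
        return "Redis" if workload.get("ephemeral_or_pubsub") else "DynamoDB"
    if workload.get("rich_varying_structure"):
        # Offline sync / multi-master replication points at CouchDB
        return "CouchDB" if workload.get("offline_sync") else "MongoDB"
    if workload.get("high_volume_time_series"):
        # Existing Hadoop/HDFS infrastructure points at HBase
        return "HBase" if workload.get("hadoop_in_place") else "Cassandra"
    if workload.get("multi_hop_traversal"):
        return "Neo4j"
    return "PostgreSQL"  # you probably do not need NoSQL at all

# Time-series workload with no Hadoop in place:
print(choose_database({"high_volume_time_series": True}))  # Cassandra
```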

Real Architecture Decisions — Five Case Studies

The scenario: You have just joined a fast-growing startup as the lead engineer. Five different product requirements land on your desk the same week. Each one needs a database decision justified to your CTO. Work through each one.

Case 1 — Live Football Score Updates

5 million concurrent users. Each needs the current score for one match — accessed by match ID. Scores change every few minutes. Data up to 5 seconds stale is acceptable. The backend already pushes updates via WebSocket.

Decision: Redis — single key lookup by match ID, sub-millisecond reads, TTL keeps stale scores from accumulating, pub-sub pushes updates to WebSocket handlers the moment the score changes.
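
The TTL pattern behind this decision can be sketched with a toy in-memory class. This is a stand-in, not a Redis client: in production you would call redis-py's `setex` and `get`, and `ScoreCache` and its method names are invented here:

```python
import time

class ScoreCache:
    """Toy stand-in for the Case 1 Redis pattern: SET with a TTL,
    GET by match ID. Real code would use redis-py's setex/get."""

    def __init__(self):
        self._store = {}  # match_id -> (score, expires_at)

    def set_score(self, match_id: str, score: str, ttl_seconds: float = 5):
        # Equivalent of SETEX: store the value with an expiry deadline
        self._store[match_id] = (score, time.monotonic() + ttl_seconds)

    def get_score(self, match_id: str):
        entry = self._store.get(match_id)
        if entry is None:
            return None
        score, expires_at = entry
        if time.monotonic() >= expires_at:
            # Past its TTL: evict the stale score and report a miss
            del self._store[match_id]
            return None
        return score

cache = ScoreCache()
cache.set_score("match:4421", "2-1", ttl_seconds=5)
print(cache.get_score("match:4421"))  # 2-1 (until the TTL expires)
```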

Case 2 — E-commerce Product Catalogue

200,000 products, each with a completely different attribute structure — electronics have voltage and warranty, clothing has size and material, food has allergens and calories. Marketing wants to filter by any combination of attributes. New product types are added monthly.

Decision: MongoDB — flexible document schema absorbs any product shape without migrations, rich query operators filter on any field, indexes cover the specific attribute combinations marketing uses most often.
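
A sketch of how such a filter could be assembled as a standard MongoDB query document. The `attributes` sub-document, the field names, and the helper itself are assumptions for illustration; real code would pass the resulting dict to `collection.find(...)` via PyMongo:

```python
def build_catalogue_filter(**attrs):
    """Turn marketing's attribute choices into a MongoDB filter dict.

    Assumes an invented document shape where category is a top-level
    field and everything else lives under an `attributes` sub-document,
    e.g. {"category": "clothing", "attributes": {"size": "M", ...}}.
    """
    filt = {}
    for key, value in attrs.items():
        # Dot notation reaches into the nested sub-document
        field = key if key == "category" else f"attributes.{key}"
        filt[field] = value
    return filt

print(build_catalogue_filter(category="electronics", voltage="230V"))
# {'category': 'electronics', 'attributes.voltage': '230V'}
```

Because the filter is built per request, marketing can combine any attributes without schema changes; indexes then cover the combinations used most often.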

Case 3 — IoT Temperature Monitoring

80,000 sensors, each sending a reading every 5 seconds — 16,000 writes per second sustained. Queries are always "get all readings for sensor X between time A and time B." Data older than 90 days is irrelevant and should be automatically deleted.

Decision: Cassandra — partition key on sensor_id, clustering column on timestamp, TTL handles 90-day expiry automatically. LSM-tree write path sustains 16,000 writes/sec without breaking a sweat. Range scans per sensor are a single partition read.
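
One plausible data model for this decision, sketched as a pure function. The table definition in the comment and all names are illustrative, not a prescribed schema; bucketing the partition by day keeps any single partition bounded (at one reading every 5 seconds, that is 17,280 rows per sensor per day):

```python
from datetime import datetime, timezone

# In CQL terms the sketch corresponds to something like (names invented):
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, temp_c double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH default_time_to_live = 7776000;  -- 90 days, in seconds
# Reads for "sensor X between A and B" then touch only the day
# partitions in that range, each a sequential clustering-column scan.

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Composite partition key for one sensor-day of readings."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2024, 3, 1, 12, 30, tzinfo=timezone.utc)
print(partition_key("sensor-042", ts))  # ('sensor-042', '2024-03-01')
```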

Case 4 — LinkedIn-Style "People You May Know"

10 million users. Each user has connections to other users, and those connections have connections. The feature needs second- and third-degree connections in under 50ms. The prototype in Postgres using recursive CTEs takes 8 seconds and degrades further as the network grows.

Decision: Neo4j — variable-length traversal follows direct relationship pointers, so second- and third-degree lookups are native operations. Traversal cost depends on the size of the neighbourhood visited, not on the total size of the network, whereas each hop of a Postgres recursive CTE re-joins the entire connections table and slows down as the graph grows.
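
To see why the traversal pattern suits a graph engine, here is a pure-Python breadth-first sketch over a toy adjacency dict. This is not Neo4j code; in Cypher the equivalent is roughly a variable-length match such as `-[:KNOWS*2..3]-`:

```python
from collections import deque

def connections_at_depth(graph, start, min_depth=2, max_depth=3):
    """Return people exactly 2-3 hops away (the 'you may know' set).

    `graph` is a plain adjacency dict: node -> list of neighbours.
    """
    seen = {start}                 # each node visited once, at its shortest depth
    queue = deque([(start, 0)])
    result = set()
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:     # never expand past the maximum hop count
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                if depth + 1 >= min_depth:
                    result.add(neighbour)
                queue.append((neighbour, depth + 1))
    return result

graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob", "frank"],
}
print(sorted(connections_at_depth(graph, "alice")))  # ['dave', 'erin', 'frank']
```

Note that the work done is proportional to the edges actually followed from `start`, which is the property a native graph store exploits at scale.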

Case 5 — Hospital Patient Records

Complex nested data (diagnoses, medications, allergies, test results). Relationships between patients, doctors, and departments matter. Every write must be ACID. Regulators require that data is never lost or inconsistent. 50,000 records total — no massive scale.

Decision: PostgreSQL + JSONB — full ACID, JSONB columns handle nested data flexibly, foreign keys enforce referential integrity, 50,000 records is not a NoSQL-scale problem. Do not reach for NoSQL when Postgres already solves the problem cleanly.

The Polyglot Pattern — When You Need More Than One

Most production systems at scale do not use a single database. They use the right database for each concern. This is called polyglot persistence — different data stores for different parts of the same application.

Example: E-commerce Platform — Polyglot Architecture

Orders & Payments → PostgreSQL
    ACID transactions, financial integrity, referential constraints

Product Catalogue → MongoDB
    Flexible schema per category, rich attribute queries, fast iteration

Sessions & Cart → Redis
    Sub-ms reads, TTL for abandoned carts, pub-sub for real-time updates

Recommendations → Neo4j
    Purchase graph traversal, collaborative filtering, real-time recommendations

Clickstream Events → Cassandra
    Millions of events/sec, partitioned by user, TTL for 90-day retention

The Warning Signs — You Chose the Wrong Database

You picked Cassandra and now…

Every feature request requires a new table because your queries keep changing. You are doing ALLOW FILTERING on every query. Developers are asking "how do I JOIN these two tables?" You needed a document database.

You picked MongoDB and now…

Writes are piling up at 500,000/sec and the cluster is gasping. The collection has 20 billion documents. Every read scans a huge index. You needed Cassandra for the write throughput and partition-based access.

You picked Redis and now…

The dataset grew to 400GB and you are paying a fortune for RAM. You have complex filtering requirements beyond key lookup. You needed DynamoDB or MongoDB — Redis was the right tool for a much smaller hot dataset.

You picked Neo4j and now…

You are trying to aggregate revenue across 2 billion transactions using graph traversal. Query times are minutes. The data is fundamentally tabular — relationships are not the interesting part. You needed a columnar data warehouse.

Teacher's Note

The right answer is almost always "it depends" — but that is not an excuse to avoid deciding. Use the four questions at the top of this lesson every single time, and when in doubt, choose the simplest option that solves the problem. PostgreSQL solves more problems than most engineers give it credit for. NoSQL is not an upgrade from SQL — it is a specialisation. Reach for it only when SQL's constraints are genuinely causing you pain, not before.

Practice Questions — You're the Engineer

Scenario:

A fitness app logs a GPS coordinate every 3 seconds for each active user. At peak, 200,000 users are active simultaneously — that is 66,000 writes per second. The only query the app ever makes is "get all coordinates for user X between time A and time B." Coordinates older than 30 days are never needed. You have no existing Hadoop infrastructure. Which database family and specific system should you choose?


Scenario:

Your CTO reviews the architecture diagram for your new platform and notices it uses PostgreSQL for orders, MongoDB for the product catalogue, Redis for session data, and Cassandra for event streams. She asks what architectural pattern this represents — using multiple specialised databases rather than a single general-purpose one.


Scenario:

A startup is building an invoicing tool. It has customers, invoices, line items, and payments — all tightly related with referential integrity requirements. The dataset is expected to reach 500,000 records. A junior developer suggests MongoDB because "NoSQL is more modern." What database should you actually choose, and why is the junior developer's reasoning flawed?


Quiz — Database Selection in Production

Scenario:

A retail company is rebuilding its product catalogue. Electronics have voltage, warranty period, and connector type. Clothing has size, material, and wash instructions. Furniture has dimensions and weight capacity. Marketing filters products by any combination of these attributes. New product categories are added quarterly. Write volume is modest — 500 updates per day. Which database should you choose?

Scenario:

Your authentication service handles 50,000 login sessions simultaneously. Every API request includes a session token. The service must look up that token in under 1ms to validate the user. Sessions expire after 30 minutes of inactivity. The session object contains only a user ID, permissions list, and last-seen timestamp — no complex structure. Which database should store the session data?

Scenario:

A data analyst joins your company and immediately complains that every query they write against your Cassandra-backed analytics store requires ALLOW FILTERING and takes 40 seconds. They need to filter events by region, event_type, user_plan, and date — in any combination, with no fixed access pattern. What is the root cause of the problem?

Up Next · Lesson 23

NoSQL Data Modeling

The principles that separate a schema that scales from one that falls apart at 10 million records — and how to apply them from day one.