NoSQL
Problems with RDBMS
A fintech startup is running smoothly on PostgreSQL. 2 million users. Clean schema. Fast queries. Then they hit 50 million users. Their DBA runs one routine command — ALTER TABLE users ADD COLUMN loyalty_tier VARCHAR(20) DEFAULT 'bronze' — and the entire app freezes for 4 hours. No payments. No logins. No anything. This lesson is about exactly why that happens, and the four other walls you'll hit when you push a relational database past what it was designed for.
What RDBMS Was Built For — and What Changed
Relational databases were engineered in the 1970s around a very specific world: predictable data, moderate scale, strong consistency. Think bank ledgers, employee payroll, inventory counts. That world still exists — and SQL still dominates it.
But the modern web added three things SQL never planned for: unpredictable data shapes, planet-scale write volumes, and the need to spread data across hundreds of machines. Those three things expose five specific cracks in the relational model. Let's look at each one with real code.
Problem 1 — The Schema Lock
In SQL, every row in a table must match the same schema. That sounds reasonable until your product evolves — and products always evolve. Adding a new column to a small table is instant. Adding it to a 200-million-row table is a different story entirely.
The scenario: You're a backend engineer at a payments company. Your product team wants to add a loyalty_tier field to the users table. The table has 180 million rows. Your DBA runs the migration on staging first to check. Here's what happens:
-- Step 1: Check how big the table is before we touch it
SELECT COUNT(*) FROM users;
   count
-------------
 180,442,819
(1 row)

Time: 2340ms
What this tells you:
180 million rows. Even just counting them took 2.3 seconds. Now watch what happens when you try to add a column.
-- Step 2: Add the new column, backfilled with a default
-- This looks innocent. It is not.
ALTER TABLE users
  ADD COLUMN loyalty_tier VARCHAR(20) NOT NULL DEFAULT 'bronze';

-- What PostgreSQL does internally (on versions before 11,
-- or whenever a table rewrite is required):
--   1. Acquires an ACCESS EXCLUSIVE lock on the table
--   2. Rewrites every single row to fill in the new column
--   3. Updates every index on the table
--   4. Releases the lock
-- Every read AND write to the users table is BLOCKED during this time

ALTER TABLE
Time: 14400000ms  -- that's 4 hours
What just happened — and why it's painful:
ACCESS EXCLUSIVE lock
PostgreSQL has to guarantee that no query ever sees a half-migrated table, so it takes the strongest lock there is: nobody can read or write the table until the migration finishes. Worse, the lock request itself queues behind any long-running query — and every query that arrives after it queues behind the lock. With 180M rows to rewrite, that's hours of blocked traffic.
Rewrites every single row
When a rewrite is required — a backfilled default on older PostgreSQL, or any column type change — the database doesn't just update a header. It physically rewrites every row on disk to include the new field. 180 million disk writes. Every. Single. One. (PostgreSQL 11+ skips the rewrite for constant defaults, but the exclusive lock still applies, and plenty of other migrations still pay the full cost.)
The app during those 4 hours
Every user trying to log in, pay, or view their account gets an error or a spinning loader. Your on-call phone is ringing. Your CEO is awake. This is a real scenario that happens to real companies every month.
In NoSQL: You just start storing loyalty_tier in new documents. Old documents don't have it — that's fine. Zero downtime. Zero migrations. Zero 4am incidents.
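Here's that idea in a plain JavaScript sketch — two documents coexisting in one collection, one written before the field existed and one after. The collection shape and the "bronze" default are illustrative, not a prescribed schema:

```javascript
// Two user documents in the same collection — hypothetical data.
// The old document simply lacks loyalty_tier; no migration needed.
const users = [
  { _id: 1, name: "Old User", email: "old@example.com" },                       // written before the field existed
  { _id: 2, name: "New User", email: "new@example.com", loyalty_tier: "gold" }  // written after
];

// Readers handle the missing field with a default
// instead of a 4-hour ALTER TABLE.
function loyaltyTier(user) {
  return user.loyalty_tier ?? "bronze"; // assumed default tier
}

console.log(users.map(loyaltyTier)); // [ 'bronze', 'gold' ]
```

The migration, in effect, happens lazily at read time — each document is upgraded the moment some code looks at it.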
Problem 2 — Join Performance at Scale
Joins are SQL's superpower — and its Achilles heel. A 3-table join on small tables is fast. The same join on tables with tens of millions of rows each starts to crawl. And the data you need is almost always spread across multiple tables.
The scenario: You're building a dashboard for an e-commerce platform. The product manager wants to see each order, the customer name, the product name, the warehouse it shipped from, and the courier assigned. That requires joining 5 tables. Here's what that query looks like:
-- A real dashboard query — 5 table join
SELECT
o.id AS order_id,
u.name AS customer,
p.title AS product,
w.location AS warehouse,
c.name AS courier
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
JOIN warehouses w ON o.warehouse_id = w.id
JOIN couriers c ON o.courier_id = c.id
WHERE o.created_at > NOW() - INTERVAL '30 days';
 order_id | customer     | product           | warehouse    | courier
----------+--------------+-------------------+--------------+---------
 10044221 | Priya Sharma | Wireless Keyboard | London North | DHL
 10044222 | James Okafor | USB-C Hub         | Manchester   | FedEx
 10044223 | Lin Wei      | Laptop Stand      | London South | UPS
 ...
(1,204,882 rows)

Time: 38,400ms  -- 38 seconds to load a dashboard
Why 38 seconds? Let's trace roughly what the database did. (We'll assume per-row index lookups; a hash join changes the mechanics, but it still has to touch every table.)
Step 1 — Scan orders
Find all orders from the last 30 days. Say that's 1.2 million rows. The database reads those from disk.
Step 2 — Join users
For each of those 1.2M orders, find the matching user. That's 1.2M lookups into the users table.
Steps 3, 4, 5 — Repeat for products, warehouses, couriers
Three more rounds of 1.2M lookups each. The database is doing roughly 4.8 million individual lookups to build your dashboard page.
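The arithmetic behind that number is worth making explicit — a back-of-the-envelope cost model using the scenario's figures, not a measured benchmark:

```javascript
// Back-of-the-envelope lookup count for the 5-table dashboard join.
// These are the scenario's numbers, not real measurements.
const matchingOrders = 1_200_000; // orders in the last 30 days
const joinedTables = 4;           // users, products, warehouses, couriers

// One lookup per order per joined table.
const totalLookups = matchingOrders * joinedTables;
console.log(totalLookups); // 4800000
```

Notice the shape of the problem: the work multiplies with every table you join, so each new relationship in the query makes the dashboard another 1.2M lookups slower.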
In a document database: All of this data lives in one order document — customer name, product title, warehouse, courier — all embedded. One read. One document. Done in under 5ms.
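For contrast, here's a sketch (plain JavaScript, with illustrative field names) of what that single embedded order document might look like, and how the dashboard row falls out of one read:

```javascript
// One order document with everything the dashboard needs embedded.
// Field names are illustrative, not a prescribed schema.
const order = {
  _id: 10044221,
  created_at: "2024-05-01T12:00:00Z",
  customer:  { name: "Priya Sharma" },       // embedded instead of JOIN users
  product:   { title: "Wireless Keyboard" }, // embedded instead of JOIN products
  warehouse: { location: "London North" },   // embedded instead of JOIN warehouses
  courier:   { name: "DHL" }                 // embedded instead of JOIN couriers
};

// The dashboard row is a single document read —
// no lookups into four other tables.
const row = [order._id, order.customer.name, order.product.title,
             order.warehouse.location, order.courier.name];
console.log(row.join(" | "));
```

The trade-off is real, though: the customer's name is now duplicated into every order, so a name change means updating many documents instead of one row. You're buying read speed with write-time denormalization.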
Problem 3 — Vertical Scaling Hits a Ceiling
When a relational database gets slow, the traditional answer is: buy a bigger server. More RAM. Faster CPU. Bigger disk. This works — up to a point. Then you hit the ceiling.
The Cost of Scaling Up vs Scaling Out
⬆️ SQL: Scale Up (Vertical)
Cost grows steeply — at the high end you pay roughly 10x the price for 2x the performance. There's a hard ceiling: at some point no bigger server exists. And one server is one point of failure.
➡️ NoSQL: Scale Out (Horizontal)
Cost grows linearly. No ceiling. Losing one node doesn't bring down the system.
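Scaling out works because each piece of data has exactly one home node, computed from its key. A minimal routing sketch — real systems use consistent hashing so that adding a node relocates only a fraction of the keys, but simple modulo hashing shows the principle (node names are invented):

```javascript
// Minimal hash-sharding sketch: route each user id to one of N nodes.
const NODES = ["node-a", "node-b", "node-c"];

function shardFor(userId) {
  return NODES[userId % NODES.length]; // every key has exactly one home
}

// Writes for different users land on different machines,
// so total write capacity grows with the node count.
console.log(shardFor(101), shardFor(102), shardFor(103));
```

Because no node needs to know about any other node's data to serve its own keys, you add capacity by adding machines — which is exactly the property a single big server can never give you.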
Problem 4 — Forcing Unstructured Data Into Rows
Modern applications live on APIs. Every API returns JSON. And JSON is almost never perfectly flat — it has nested objects, arrays, and optional fields. Stuffing that into SQL tables is a constant battle.
The scenario: You're integrating a third-party payments API. Every transaction comes back as a JSON object with nested billing details. You need to store it. Here's what you get from the API, and the painful workaround SQL forces on you:
// What the payments API gives you — natural JSON
{
"txn_id": "txn_9982",
"amount": 149.99,
"currency": "GBP",
"billing": {
"name": "Sara Osei",
"city": "London",
"postcode": "E1 6RF"
},
"tags": ["online", "card", "3ds-verified"]
}
The problem:
This JSON has a nested object (billing) and an array (tags). A flat relational table has no natural home for either. (Modern engines do bolt on JSON column types — PostgreSQL's JSONB, for example — but the moment you lean on them, you're running a document store inside your relational database, with weaker schema tooling for exactly the data that changes most.) Staying within the pure relational model, you have two bad options:
❌ Option A — Flatten it (ugly)
amount DECIMAL
currency VARCHAR
billing_name VARCHAR
billing_city VARCHAR
billing_postcode VARCHAR
tag_1 VARCHAR
tag_2 VARCHAR
tag_3 VARCHAR
What if a transaction has 5 tags? 10 tags? You need to guess the max and add that many columns — most of which will be NULL.
❌ Option B — Separate tables (complex)
TABLE transaction_billing
TABLE transaction_tags
-- Now every read needs 3 JOINs
-- Every write needs 3 INSERTs
-- All inside a transaction block
Three tables to store what the API gives you in one object. Complexity triples. Performance suffers. Migrations are a nightmare.
// In MongoDB — store the API response directly, zero transformation
db.transactions.insertOne({
txn_id: "txn_9982",
amount: 149.99,
currency: "GBP",
billing: { name: "Sara Osei", city: "London", postcode: "E1 6RF" },
tags: ["online", "card", "3ds-verified"]
})
What just happened:
billing: { ... } — MongoDB stores nested objects natively. No flattening, no extra table. The billing address lives inside the transaction document exactly as it arrived.
tags: [...] — Arrays are a first-class data type in MongoDB. You can even query against them: db.transactions.find({tags: "3ds-verified"}) returns all 3DS-verified transactions in one shot.
Problem 5 — Write Throughput Under Load
SQL databases are built to guarantee that every write is safe, consistent, and immediately visible to every other query. That guarantee has a cost — it requires locks, transaction logs, and disk flushes. At moderate volume, you never notice. At high volume, it becomes a bottleneck.
The scenario: You're running a live sports platform. A major football match just ended. 2 million users click "view result" at the same moment. Your app logs every page view. Here's what your PostgreSQL write log looks like under that load:
-- What PostgreSQL does for every single INSERT:
-- 1. Begin transaction
-- 2. Check constraints (foreign keys, unique, not-null)
-- 3. Acquire row-level lock
-- 4. Write to WAL (Write-Ahead Log) on disk
-- 5. Write actual row to heap
-- 6. Update all relevant indexes
-- 7. Commit and flush to disk
-- 8. Release lock
-- At 50,000 inserts/sec: steps 3-4 become the bottleneck
-- PostgreSQL performance under load (illustrative numbers, not a benchmark):

 Writes/sec   Avg latency   Queue depth   Status
 ----------   -----------   -----------   -----------
      1,000           2ms             0   Healthy
     10,000           8ms            12   Fine
     30,000          45ms           340   Degraded
     50,000         320ms         2,100   Critical
     80,000       timeout   overflowing   System down
Why the cliff at 50k writes/sec:
The WAL bottleneck: PostgreSQL writes every change to a Write-Ahead Log before writing the actual data. At high volume, this single log file becomes the chokepoint — everything queues behind it.
Lock contention: Thousands of concurrent inserts try to lock the same index pages simultaneously. They queue behind each other. Latency multiplies.
Cassandra handles this differently: each write is appended to a sequential commit log (one fast, append-only disk write) and to an in-memory memtable, then acknowledged immediately; memtables are flushed to disk as sorted files (SSTables) in the background. Durability comes from the commit log, but the write path never waits on row locks or random I/O. That's how well-tuned Cassandra clusters sustain hundreds of thousands of writes per second — famous large-cluster benchmarks have exceeded a million — scaling roughly linearly as you add nodes.
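The shape of that write path can be sketched in a few lines — a toy in-memory store (all names invented, no real durability or background threads) that acknowledges writes from memory and flushes in batches, the way an LSM-style engine like Cassandra does:

```javascript
// Toy sketch of an LSM-style write path: buffer writes in memory,
// acknowledge immediately, flush to "disk" in batches later.
class MemtableStore {
  constructor(flushThreshold = 3) {
    this.memtable = [];            // in-memory writes, acknowledged instantly
    this.sstables = [];            // flushed, immutable batches ("on disk")
    this.flushThreshold = flushThreshold;
  }

  write(key, value) {
    this.memtable.push({ key, value }); // no locks, no per-write disk flush
    if (this.memtable.length >= this.flushThreshold) this.flush();
    return "ack";                       // caller is unblocked immediately
  }

  flush() {
    // In a real engine this runs on a background thread,
    // and a commit log makes the buffered writes durable.
    this.sstables.push(this.memtable);
    this.memtable = [];
  }
}

const store = new MemtableStore();
["a", "b", "c", "d"].forEach((k, i) => store.write(k, i));
console.log(store.sstables.length, store.memtable.length); // 1 1
```

Compare this with the eight-step PostgreSQL INSERT above: the expensive steps (locks, synchronous WAL flush, index updates) are either skipped or deferred off the caller's path.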
All 5 Problems — at a Glance
| Problem | What Breaks | When You Feel It | NoSQL Fix |
|---|---|---|---|
| Schema Lock | ALTER TABLE locks entire table for hours | Adding any new field at scale | Schema-less — just add fields |
| Join Performance | Multi-table JOINs do millions of lookups | Complex queries on large tables | Embed related data in one document |
| Vertical Scaling | Bigger servers cost 10x more for 2x performance | Sustained traffic growth | Horizontal sharding across nodes |
| Unstructured Data | JSON/arrays need ugly workarounds | Any modern API integration | Store JSON natively as documents |
| Write Throughput | WAL + locks bottleneck at high write rates | High-volume event/log ingestion | In-memory writes, async flush to disk |
SQL Is Not the Villain
Everything above is true — and none of it means you should replace SQL with NoSQL everywhere. That would be swapping one set of problems for another.
Keep SQL when:
- Your data has clear, stable relationships
- You need multi-table transactions (bank transfers, stock deductions)
- Your team already knows SQL deeply
- Your scale is under a few million rows per table
- Regulatory compliance requires strict ACID guarantees
Switch to NoSQL when:
- Your schema changes frequently with your product
- You need to scale writes beyond ~30k/sec
- Your data is naturally JSON or graph-shaped
- You need global distribution across regions
- High availability matters more than perfect consistency
Teacher's Note
The best engineers I've seen don't pick sides. They run PostgreSQL for their financial transactions and MongoDB for their user profiles and Redis for their cache — all in the same system. Each tool doing the job it was designed for. That's the goal of this course: give you the judgment to make those calls confidently.
Practice Questions — You're the Engineer
Scenario:
Your team runs ALTER TABLE orders ADD COLUMN delivery_estimate DATE on a table with 250 million rows. The entire orders page on your app stops loading for 3 hours. What is this specific problem called?
Quiz — What Would You Do?
Scenario:
Your products table now has 60 columns, and 55 of them are NULL for any given product. Your schema migrations are taking hours. What is the right architectural fix?
Scenario:
Your dashboard query joins five tables — orders, customers, products, warehouses, and delivery_slots — and takes 42 seconds to load on a table with 80M orders. You've already added indexes. They helped a little. What is the root cause, and what is the right fix?
Up Next · Lesson 4
SQL vs NoSQL
A head-to-head comparison with real queries — the same data problem solved in both SQL and NoSQL so you can see exactly where each one wins.