NoSQL Lesson 5 – CAP Theorem | Dataplexa
NoSQL Fundamentals · Lesson 5

CAP Theorem

It's 3am. A construction crew accidentally cuts a fibre cable between two AWS data centres. Suddenly, your London servers can't talk to your Dublin servers. Your database is split in two — and it has to make an instant decision: do I keep accepting writes and risk the two halves getting out of sync? Or do I refuse all writes until the connection is restored? That decision — made in milliseconds, automatically — is what CAP Theorem is about. And every distributed database you'll ever use has already made it.

Why This Only Matters When You Have Multiple Servers

On a single server, none of this matters. You write data, it's saved, anyone who reads gets the latest version. Simple. But the moment you add a second server — for redundancy, for performance, for scale — you've entered distributed systems territory, and a hard mathematical reality kicks in.

🖥️

Single Server — Easy

One machine. One copy of the data. Write goes in, read comes back. No coordination needed. But if this server dies — everything dies with it.

🖥️ ↔ 🖥️

Multiple Servers — Hard

Two machines. Two copies. A write on Server 1 must somehow get to Server 2. What if the connection between them breaks? Now you have exactly the problem CAP Theorem describes.

The Three Letters — One Analogy Each

CAP stands for Consistency, Availability, and Partition Tolerance. Every database textbook makes these sound complicated. They aren't. Here's each one in plain English:

C

Consistency

Definition: Every read gets the most recent write. Every node in the cluster shows the same data at the same time.

Real-world analogy: You check your bank balance on your phone — it shows £500. Your partner checks the same account on their laptop at the exact same second — they also see £500. Not £480. Not £520. The exact same number. That's consistency.

A

Availability

Definition: Every request gets a response — always. The system never refuses or times out, even if some nodes are down.

Real-world analogy: You go to an ATM during a bank system outage. An available system lets you withdraw cash even if it can't verify the latest balance from headquarters. You always get a response. The answer might not be perfectly up to date — but you get an answer.

P

Partition Tolerance

Definition: The system keeps working even when the network between nodes breaks and messages are lost.

Real-world analogy: Your London office and Dublin office both run the same system. A storm cuts the undersea cable between them. Partition tolerance means both offices keep running — processing orders, serving customers — even though they can't talk to each other temporarily.

The Brutal Truth — Why You Can Only Pick Two

In 2000, computer scientist Eric Brewer conjectured that no distributed system can guarantee all three properties simultaneously — and in 2002, Seth Gilbert and Nancy Lynch formally proved it. This became CAP Theorem. Let's make it concrete with a scenario so it clicks permanently.

The Scenario: A network partition just happened

You have two database nodes — Node A in London, Node B in Dublin. The network between them is cut. A user writes balance = £300 to Node A. Node B still shows balance = £500 (the old value). Now another user reads from Node B.

Your database has to choose — right now:

Option 1 — Choose Consistency

Refuse the read on Node B. Return an error: "Data may be stale — try again later." The user gets no answer. But at least nobody reads wrong data.

You chose C. You sacrificed A (availability).

Option 2 — Choose Availability

Answer the read from Node B with £500. The user gets a response instantly. But it's the wrong balance — Node A already updated it to £300.

You chose A. You sacrificed C (consistency).

The point: There is no Option 3. When a partition happens, you must pick one. The database you choose has already decided which one it picks — and that decision defines its entire design philosophy.
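The forced choice above can be sketched as a toy model. This is purely illustrative — the class and function names are made up for this lesson, not a real database API:

```python
# Toy model of the scenario above: two nodes, a partition between them,
# and the forced choice between consistency (CP) and availability (AP).

class Node:
    def __init__(self, balance):
        self.balance = balance

node_a = Node(300)   # received the new write: balance = £300
node_b = Node(500)   # partitioned away, still holds the old £500

def read(node, mode, partitioned):
    """Read a balance under a partition, in 'CP' or 'AP' mode."""
    if partitioned and mode == "CP":
        # Option 1: refuse rather than risk serving stale data
        raise RuntimeError("Data may be stale - try again later")
    # Option 2: answer with whatever this node has (possibly stale)
    return node.balance

# AP: Node B answers instantly, but with the stale value
print(read(node_b, "AP", partitioned=True))   # 500 (stale)

# CP: Node B refuses - the caller gets an error, never wrong data
try:
    read(node_b, "CP", partitioned=True)
except RuntimeError as e:
    print(e)   # Data may be stale - try again later
```

Notice there is no branch that returns 300 from Node B during the partition — that value simply never arrived. Error or stale data: those are the only two options.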

The CAP Triangle

Every database sits on one edge of this triangle — getting the two properties at that edge, and giving up the one at the opposite corner:

[CAP triangle — C (Consistency), A (Availability), P (Partition tolerance) at the corners; each database sits on one edge]
CP edge: MongoDB · Redis · HBase
AP edge: Cassandra · CouchDB · DynamoDB
CA edge: PostgreSQL · MySQL

CP — Consistent + Partition Tolerant

During a partition: refuses requests rather than returning stale data. Availability is sacrificed.

AP — Available + Partition Tolerant

During a partition: keeps answering but might return stale data. Consistency is sacrificed.

CA — Consistent + Available

Only works on a single node or trusted network. Not partition tolerant — breaks in distributed environments.

CP in Action — MongoDB Under a Partition

MongoDB is a CP database. When a partition happens, it chooses to protect data consistency — even if that means refusing writes temporarily. Here's exactly what that looks like in code:

The scenario: You're a backend engineer. Your MongoDB replica set has 3 nodes. One node goes down — maybe a network split, maybe a server crash. You try to write a payment record. Here's what MongoDB does:

// Connect to the MongoDB replica set
const { MongoClient } = require("mongodb")

const client = new MongoClient(
  "mongodb://node1,node2,node3/payments_db?replicaSet=rs0"
)
await client.connect()
const db = client.db("payments_db")

// Write with "majority" concern
// This means: confirm that MOST nodes received this write
const result = await db.collection("payments").insertOne(
  { user_id: "u_441", amount: 250, status: "confirmed" },
  { writeConcern: { w: "majority", wtimeout: 5000 } } // give up after 5s instead of waiting forever
)

What each line does:

replicaSet=rs0

Tells the driver this is a replica set — multiple nodes that all hold copies of the same data. The driver knows to coordinate between them.

writeConcern: { w: "majority" }

This is MongoDB's consistency dial. w: "majority" means: don't confirm this write until more than half the nodes have saved it. With 3 nodes, at least 2 must confirm. If only 1 node is reachable — the write is rejected.

// What happens when node2 and node3 are unreachable
// MongoDB cannot reach "majority" — so it REFUSES the write
// This is CP behaviour: protect consistency over availability
MongoServerError: Write concern error
{
  code: 100,
  codeName: "UnsatisfiableWriteConcern",
  errmsg: "Not enough data-bearing nodes are available to satisfy the write concern",
  writeConcernError: {
    code: 64,
    errmsg: "waiting for replication timed out"
  }
}

What just happened — this is CP:

Not enough data-bearing nodes

MongoDB detected it couldn't reach the majority. Rather than write to just one node and risk inconsistency, it threw an error. The payment record was not saved. The user sees an error. That's the trade-off: consistency over availability.

When is this the right call?

Payment systems. Medical records. Inventory counts. Anywhere where showing wrong data is worse than showing no data. A failed payment is recoverable. A double payment or a wrong balance is a compliance nightmare.
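The "majority" rule itself is just arithmetic. Here's a sketch of the quorum maths the write concern enforces — illustrative only, not MongoDB's actual implementation:

```python
# A "majority" write commits only when more than half of the
# replica set acknowledges it.

def majority(total_nodes):
    """Smallest number of nodes that is more than half."""
    return total_nodes // 2 + 1

def can_commit(acks, total_nodes):
    """Would a majority-concern write succeed with this many acks?"""
    return acks >= majority(total_nodes)

print(majority(3))        # 2 - with 3 nodes, at least 2 must confirm
print(can_commit(2, 3))   # True  - partition left 2 reachable: write succeeds
print(can_commit(1, 3))   # False - only 1 reachable: write refused (CP)
```

This is also why replica sets use odd node counts: with 3 nodes, one side of any partition can still form a majority; with 2 nodes, a partition leaves neither side able to commit.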

AP in Action — Cassandra Under a Partition

Cassandra is an AP database. When a partition happens, it keeps accepting writes and reads — even from nodes that might be behind. It chooses to stay available, accepting that some reads might be slightly stale.

The scenario: Same situation — network split, one node unreachable. You write a user activity event. Watch what Cassandra does differently:

from datetime import datetime

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('analytics')

Setup explained:

Cluster(['node1','node2','node3']) — connects to all three Cassandra nodes. The driver knows about all of them and routes requests intelligently.

cluster.connect('analytics') — connects to the keyspace (Cassandra's equivalent of a database) and returns a session. All queries now run against this keyspace.

# Write with consistency level ONE
# ONE means: just one node needs to confirm — then we're done
query = SimpleStatement(
  "INSERT INTO page_views (user_id, page, ts) VALUES (%s, %s, %s)",
  consistency_level=ConsistencyLevel.ONE
)

session.execute(query, ('u_441', '/checkout', datetime.now()))

# Even with node3 unreachable, Cassandra responds:
#   Write acknowledged by node1
#   Status: SUCCESS

# node3 will sync when it comes back online
# via Cassandra's "hinted handoff" mechanism

What just happened — this is AP:

ConsistencyLevel.ONE

Only one node needs to acknowledge the write. Even if two nodes are down, as long as one is alive, Cassandra writes the data and returns success. The system stays available.

hinted handoff

When node3 comes back online, Cassandra automatically syncs the missed writes to it. This is how AP systems achieve eventual consistency — data converges to the same state, just not immediately.
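Hinted handoff can be sketched as a toy model — illustrative class names, not Cassandra's actual implementation:

```python
# Toy sketch of hinted handoff: writes that a down node misses are
# queued as "hints" and replayed when it rejoins the cluster.

class ToyNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

class ToyCluster:
    def __init__(self, nodes):
        self.nodes = nodes
        self.hints = []   # (node, key, value) tuples awaiting replay

    def write(self, key, value):
        for node in self.nodes:
            if node.up:
                node.data[key] = value
            else:
                self.hints.append((node, key, value))  # queue a hint

    def rejoin(self, node):
        node.up = True
        # replay every hint destined for this node, then discard them
        for target, key, value in self.hints:
            if target is node:
                target.data[key] = value
        self.hints = [h for h in self.hints if h[0] is not node]

a, b, c = ToyNode("node1"), ToyNode("node2"), ToyNode("node3")
cluster = ToyCluster([a, b, c])

c.up = False                          # node3 partitioned away
cluster.write("u_441", "/checkout")   # AP: write still succeeds on node1/node2
print(c.data)                         # {} - node3 missed it

cluster.rejoin(c)                     # partition heals, hints replay
print(c.data)                         # {'u_441': '/checkout'} - converged
```

Between the write and the rejoin, node3 would serve stale reads — that window is exactly the "eventual" in eventual consistency.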

When is this the right call?

Page views, click tracking, activity feeds, social likes. If someone's like count shows 1,482 instead of 1,483 for 200 milliseconds — nobody is hurt. Availability matters more than perfect precision here.

What Actually Happens During a Network Partition

Here's the same event — a network split between two data centres — shown through the eyes of three different database types:

CP Database

MongoDB · Redis · HBase

🔌 Partition occurs
London ↔ Dublin connection lost
🔒 DB detects split
Cannot confirm majority
❌ Writes refused
Returns error to app
✅ Partition heals
Normal operation resumes

Result: No stale data ever. Some requests fail during partition.

AP Database

Cassandra · CouchDB · DynamoDB

🔌 Partition occurs
London ↔ Dublin connection lost
✅ Both sides keep running
Each accepts reads + writes
⚠️ Data diverges
London + Dublin out of sync
🔄 Partition heals
Conflict resolution runs

Result: Always available. Briefly inconsistent during partition.
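The "conflict resolution" step in the AP timeline can be sketched with the simplest common strategy, last-write-wins. This is one of several real approaches (others include vector clocks and application-level merges); the timestamps below are made up for illustration:

```python
# Last-write-wins (LWW) merge: when the partition heals and the two
# sides hold different versions of the same item, keep the version
# with the newer timestamp and discard the older one.

def lww_merge(version_a, version_b):
    """Each version is a (timestamp, value) pair; newer timestamp wins."""
    return version_a if version_a[0] >= version_b[0] else version_b

london = (1700000050, 4201)   # London counted the extra like
dublin = (1700000049, 4200)   # Dublin is one write behind

print(lww_merge(london, dublin))   # (1700000050, 4201)
```

LWW is cheap and deterministic, but it silently drops the losing write — another reason AP databases suit data like page views and likes, where losing one update is tolerable.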

CA Database

PostgreSQL · MySQL (single node)

🔌 Partition occurs
London ↔ Dublin connection lost
💥 System breaks
Not designed for this scenario
❌ One side goes offline
Manual intervention needed
👨‍💻 DBA manually fixes
Slow, painful recovery

Result: Works perfectly until a partition. Then needs human help.

Which Type Should You Choose?

Type | Guarantees | Sacrifices | Use When | Examples
CP | Consistency + Partition tolerance | Availability | Wrong data is worse than no data. Financial, medical, inventory. | MongoDB, Redis, HBase
AP | Availability + Partition tolerance | Consistency | Always-on matters more than perfect data. Social, analytics, IoT. | Cassandra, DynamoDB, CouchDB
CA | Consistency + Availability | Partition tolerance | Single-node or trusted private network. Not for internet-scale systems. | PostgreSQL, MySQL (single node)

Teacher's Note

CAP Theorem is often taught as a strict binary choice. In practice, most modern databases let you tune the dial. MongoDB lets you lower write concern to get more availability. Cassandra lets you raise consistency level to QUORUM for more consistency. The theorem defines the limits — but within those limits, you have real control. Understanding CAP tells you which knobs exist and what each one costs you.
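The "tunable dial" in the note follows a well-known rule: with N replicas, a read quorum of R nodes and a write quorum of W nodes, a read is guaranteed to overlap the latest write whenever R + W > N. A sketch of that arithmetic (illustrative, not a specific database's API):

```python
# Quorum overlap rule: if R + W > N, every read quorum must share at
# least one node with every write quorum, so reads see the latest write.

def is_strongly_consistent(n, r, w):
    """True when read quorum R and write quorum W must overlap on N replicas."""
    return r + w > n

# N = 3 replicas
print(is_strongly_consistent(3, 1, 1))   # False - ONE/ONE style: fast, eventually consistent
print(is_strongly_consistent(3, 2, 2))   # True  - QUORUM reads + QUORUM writes: strong consistency
print(is_strongly_consistent(3, 1, 3))   # True  - write to all, read from any one
```

Every row is a different point on the CAP dial: lowering R or W buys latency and availability, raising them buys consistency.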

Practice Questions — You're the Engineer

Scenario:

Your social media app's "likes" counter shows 4,201 on one server and 4,200 on another for about 300 milliseconds after a network hiccup — then both show 4,201. The system never returned an error to any user. Which CAP type is your database?


Scenario:

During a network partition your database throws a "Write concern error: not enough nodes available" error and refuses to save the record. The app shows an error to the user. No stale data is ever returned. Which CAP type is this?


Scenario:

You are designing a live sports scoreboard. Millions of users watch scores update in real time. A 200ms delay in score sync between two data centres is acceptable. The scoreboard must never go down — not for a network blip, not for a node failure, not for anything. Between consistency and availability, which property must you prioritise?


Quiz — Network Just Split. What Does Your Database Do?

Scenario:

Your ride-hailing app tracks driver locations. 50,000 drivers are updating their GPS position every 2 seconds. A network partition splits your US-East and US-West data centres. You need both regions to keep accepting location updates without interruption — even if they're briefly out of sync. Which database should you use?

Scenario:

You're building a wallet system. A user transfers £200 to a friend. This write must be refused entirely if the database can't confirm it reached multiple nodes — you'd rather show an error than risk saving a partial or stale balance. Which setup gives you this guarantee?

Scenario:

During a 4-minute network partition, your DynamoDB table kept responding to all reads and writes with no errors. After the partition healed, you noticed some items had conflicting versions that needed to be reconciled. What trade-off did DynamoDB make during those 4 minutes?

Up Next · Lesson 6

ACID vs BASE

Two completely different contracts a database makes with your application — and why choosing the wrong one can corrupt your data or kill your throughput.