NoSQL
Consistency vs Availability
In October 2012, a significant portion of AWS US-East went down. Services that prioritised consistency went offline entirely — they refused to serve requests rather than risk returning stale data. Services that prioritised availability kept running — they returned whatever data they had, even if it was minutes old. Neither choice was wrong. But every team that had not explicitly made a choice was caught off guard. This lesson is about making that choice deliberately, before the outage happens.
The CAP Theorem Revisited — From Theory to Production
You covered the CAP theorem in Lesson 5. The core claim: a distributed system can guarantee at most two of three properties — Consistency, Availability, and Partition tolerance. Since network partitions are unavoidable in any real distributed system, every database must choose between C and A when a partition occurs.
But the real world is messier than the theory. Systems are not permanently CP or permanently AP — most are tunable. Cassandra with QUORUM consistency behaves like a CP system. The same Cassandra cluster with ONE consistency behaves like an AP system. The decision is made per operation, not per database. The engineer who understands this is the one who sets consistency levels correctly for each use case instead of applying a single setting across everything.
The PACELC Extension
CAP only describes behaviour during a partition. PACELC extends the model to cover normal operation too: during a Partition, choose between Availability and Consistency; Else (during normal operation), choose between Latency and Consistency. Most real-world decisions are about the Else branch — you are not partitioned most of the time, but you are always making a latency vs consistency trade-off on every read and write.
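The PACELC taxonomy can be written down as a lookup table. The classifications below follow Daniel Abadi's original PACELC analysis; the dict and helper function are illustrative, not part of any driver API.

```python
# PACELC classifications of some well-known systems, following Daniel Abadi's
# original PACELC analysis. "PA/EL": during a Partition choose Availability,
# Else choose Latency. "PC/EC": consistency in both branches.
PACELC = {
    "Cassandra": "PA/EL",   # available under partition, low latency otherwise
    "DynamoDB":  "PA/EL",
    "Riak":      "PA/EL",
    "MongoDB":   "PA/EC",   # consistent in normal operation, available under partition
    "PNUTS":     "PC/EL",
    "VoltDB":    "PC/EC",   # consistency in both branches
}

def pacelc(system: str) -> str:
    """Return the PACELC classification for a system, or 'unknown'."""
    return PACELC.get(system, "unknown")

print(pacelc("Cassandra"))  # PA/EL
```

Note that these are defaults: as the Cassandra discussion above shows, a tunable system can be pushed toward either branch per operation.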
Strong Consistency — Every Read Sees the Latest Write
Strong consistency means that after a write completes, every subsequent read from any node returns that write's value. There is a single, agreed-upon ordering of all operations. The database behaves as if it were a single machine regardless of how many nodes it has.
The cost is latency and availability. To achieve strong consistency in a multi-node system, writes must be coordinated across replicas before returning to the client. If a node is slow or unreachable, the write blocks or fails rather than proceeding without full consensus.
Banking & payments
Account balances must reflect every committed debit and credit. A stale read showing £500 when the true balance is £0 could approve a fraudulent transaction. Strong consistency is non-negotiable.
Inventory reservation
Two users must not both see "1 item in stock" and both have their order confirmed. The second reservation must see the result of the first. Eventual consistency would oversell.
Unique constraints
Two users registering the same username simultaneously must not both succeed. A compare-and-swap or transaction must be serialised across replicas.
Configuration & permissions
If an admin revokes a user's access, the revocation must be visible immediately on every node. A stale read that grants access to a revoked user is a security vulnerability.
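The unique-constraint case hinges on the check and the write being one atomic step, serialised across replicas. In Cassandra that is what a lightweight transaction (`INSERT ... IF NOT EXISTS`) provides; the sketch below simulates the same compare-and-swap semantics in memory, with a hypothetical `register` helper, to show why the second claim must fail.

```python
import threading

# A minimal sketch of the unique-constraint problem: the existence check and
# the write must be a single atomic compare-and-swap, never a read followed by
# a separate write. Here a lock plays the role that a serialised transaction
# (e.g. Cassandra's INSERT ... IF NOT EXISTS) plays across replicas.
_users: dict[str, str] = {}
_lock = threading.Lock()

def register(username: str, user_id: str) -> bool:
    """Atomically claim a username. Returns False if already taken."""
    with _lock:
        if username in _users:        # check...
            return False
        _users[username] = user_id    # ...and set, under one serialised step
        return True

print(register("alice", "u1"))  # True  — first claim wins
print(register("alice", "u2"))  # False — second claim sees the first
```

If the check and the write were two separate eventually consistent operations, both users could pass the check before either write replicated, and both registrations would succeed.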
Eventual Consistency — Replicas Converge Over Time
Eventual consistency means that if no new writes occur, all replicas will eventually converge to the same value. There is no guarantee about how long "eventually" takes — it could be milliseconds or, in the event of a network partition, minutes. During that window, different nodes may return different values for the same key.
The benefit is speed and availability. Writes are acknowledged the moment one replica (or a small quorum) confirms them — the write does not wait for all replicas to sync. Reads can be served from any node regardless of whether it has the latest write. The system keeps running even during a partition.
Social media likes
A post showing 4,821 likes vs 4,819 likes for a few seconds is invisible to users. The count will converge. Blocking writes on all replicas for an exact count would kill throughput.
Product catalogue
A product description update taking 200ms to propagate to all regions is undetectable by users. The catalogue is read far more than it is written — eventual consistency delivers faster reads at no meaningful cost.
Analytics & metrics
A dashboard showing yesterday's revenue as £482,100 vs £482,099 is a non-issue. Analytic aggregates tolerate seconds or even minutes of staleness without any business consequence.
User activity feeds
A follower seeing a new post two seconds after it was published rather than instantly is an invisible latency. The feed does not require real-time accuracy — availability matters far more.
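The convergence behaviour these use cases rely on can be modelled in a few lines. This is a toy simulation, not driver code: one write lands on a single replica, and a stand-in `anti_entropy` pass plays the role of Cassandra's background repair.

```python
# A toy model of eventual convergence: three replicas of one like counter.
# A write is acknowledged by one replica immediately; a background sync pass
# later copies it to the rest. During the lag window, reads from different
# replicas disagree — harmlessly, for data like this.
replicas = [{"likes": 4819}, {"likes": 4819}, {"likes": 4819}]

def write(value: int) -> None:
    replicas[0]["likes"] = value          # acknowledged by one replica only

def anti_entropy() -> None:
    latest = max(r["likes"] for r in replicas)   # toy resolution: counts only grow here
    for r in replicas:
        r["likes"] = latest               # background sync: replicas converge

write(4821)
print([r["likes"] for r in replicas])     # [4821, 4819, 4819] — stale window
anti_entropy()
print([r["likes"] for r in replicas])     # [4821, 4821, 4821] — converged
```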
Hands-on — Tuning Consistency in Cassandra
The scenario: You are the platform engineer for a fintech company running a 9-node Cassandra cluster with RF=3 across three data centres. The platform serves two very different workloads: a payment ledger that requires every read to see the latest committed balance, and a notification feed where a few seconds of staleness is fine. You need to configure the correct consistency levels for each and demonstrate the latency difference.
```python
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
import time

session = Cluster(['10.0.1.1']).connect('fintech')

# --- PAYMENT LEDGER: strong consistency ---
# LOCAL_QUORUM: 2 of 3 replicas in local DC must agree
# W+R > N within DC ensures we always read the latest write
payment_read = SimpleStatement(
    "SELECT balance FROM accounts WHERE account_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)
payment_write = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE account_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)

t0 = time.time()
session.execute(payment_write, (4850.00, 'acc_441'))
row = session.execute(payment_read, ('acc_441',)).one()
payment_latency = (time.time() - t0) * 1000
print(f"Payment balance: £{row.balance:.2f}  latency: {payment_latency:.1f}ms")
```
```text
Payment balance: £4850.00  latency: 6.2ms
Write acknowledged by 2/3 local replicas (LOCAL_QUORUM)
Read confirmed by 2/3 local replicas (LOCAL_QUORUM)
Guarantee: read always sees latest committed write ✓
```
LOCAL_QUORUM for payment data
With RF=3 and LOCAL_QUORUM (W=2, R=2), W+R=4 > N=3 within the DC. The write quorum and read quorum must share at least one replica — so any read after a committed write sees that write. This is strong consistency within the DC. The latency cost is real (6.2ms vs ~1ms for ONE) but acceptable for financial data where correctness is non-negotiable.
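The quorum-overlap rule generalises beyond this one configuration. A minimal sketch, assuming only the W + R > N arithmetic from the text:

```python
# The quorum-overlap rule: with replication factor N, write quorum W and read
# quorum R, any read quorum must share at least one replica with the latest
# write quorum whenever W + R > N — so the read cannot miss the latest write.
def overlapping_quorums(n: int, w: int, r: int) -> bool:
    return w + r > n

print(overlapping_quorums(3, 2, 2))  # True  — QUORUM writes + QUORUM reads
print(overlapping_quorums(3, 1, 1))  # False — ONE + ONE: stale reads possible
print(overlapping_quorums(3, 3, 1))  # True  — ALL writes allow ONE reads
```

The last line shows a useful variant: paying the full coordination cost on writes (ALL) buys the cheapest possible reads (ONE) while keeping the overlap guarantee.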
```python
# --- NOTIFICATION FEED: eventual consistency ---
# ONE: a single replica acknowledges the write and serves the read
# Fastest possible — acceptable for non-critical feed data
notif_read = SimpleStatement(
    "SELECT body, created_at FROM notifications WHERE user_id = %s LIMIT 20",
    consistency_level=ConsistencyLevel.ONE
)
notif_write = SimpleStatement(
    "INSERT INTO notifications (user_id, notif_id, body, created_at) "
    "VALUES (%s, uuid(), %s, toTimestamp(now()))",
    consistency_level=ConsistencyLevel.ONE
)

t0 = time.time()
session.execute(notif_write, ('usr_882', 'Alice liked your post'))
rows = session.execute(notif_read, ('usr_882',))
notif_latency = (time.time() - t0) * 1000
print(f"Notification feed latency: {notif_latency:.1f}ms "
      f"rows: {len(list(rows))}")
```
```text
Notification feed latency: 1.1ms  rows: 20
Write acknowledged by 1 replica (ONE)
Read served by nearest replica (ONE)
Trade-off: read may be up to ~200ms stale during replica lag ✓
```
ONE for notification feed — 6× faster
ONE consistency means the nearest replica responds immediately without waiting for quorum. The 1.1ms vs 6.2ms difference is the cost of quorum coordination eliminated. During normal operation this is rarely visible — replication lag is typically under 50ms and users would never notice a notification appearing 50ms late. The staleness risk is real but the business impact is zero.
Same cluster, two consistency levels
Both workloads run on the same 9-node Cassandra cluster. The consistency level is set per query in application code — not per cluster or per keyspace. This is the practical power of Cassandra's tunable consistency: you pay the latency cost only for the data that truly requires it, and keep everything else fast.
Consistency Anomalies — The Failure Modes of Eventual Consistency
Eventual consistency is not a single guarantee — it is a spectrum. Depending on the system and the configuration, you may encounter different anomalies during the lag window between write and full replication.
Stale reads
A read returns an old value because the replica serving the read has not yet received the latest write. A user updates their display name — a second user still sees the old name for a few hundred milliseconds. Usually invisible.
Fix: use stronger consistency level for reads that must see latest state.
Read-your-own-writes violations
A user writes data and immediately reads it back — the read goes to a different replica that has not replicated the write yet. The user sees their own update disappear. More frustrating than stale reads because it contradicts what the user just did.
Fix: route reads for a user to the same replica they wrote to, or use LOCAL_QUORUM.
Non-monotonic reads
A user makes two successive reads of the same data. The first returns the new value. The second — served by a different, more lagged replica — returns the old value. Time appears to go backwards. Deeply confusing and hard to debug.
Fix: pin a user's session to the same replica, or use session tokens with version checks.
Write conflicts
Two clients write to the same key on different replicas during a partition. When the partition heals, the replicas have conflicting values. Cassandra resolves this with last-write-wins (LWW) using timestamps — the write with the later timestamp survives. If the clocks are skewed, the earlier write can overwrite the later one.
Fix: use NTP for clock sync, or use CRDTs for conflict-free merging.
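The last anomaly and its fixes can be made concrete. The sketch below first shows how last-write-wins lets a skewed clock drop the logically newer write, then implements a minimal grow-only counter (G-counter), the simplest CRDT, whose merge is conflict-free by construction. Both are illustrative toys, not Cassandra internals.

```python
# Last-write-wins: each value is (timestamp, data); the later timestamp
# survives. If replica A's clock runs fast, A's logically OLDER write can
# silently overwrite B's logically newer one.
def lww_merge(a: tuple[float, str], b: tuple[float, str]) -> tuple[float, str]:
    return a if a[0] >= b[0] else b

# A's clock is 5s fast, so its stale write "wins" over B's fresher write.
print(lww_merge((105.0, "old-from-A"), (102.0, "new-from-B")))

# G-counter CRDT: each replica increments only its own slot, and merge takes
# per-slot maxima — concurrent increments combine instead of conflicting.
class GCounter:
    def __init__(self, n_replicas: int):
        self.slots = [0] * n_replicas

    def increment(self, replica: int) -> None:
        self.slots[replica] += 1

    def merge(self, other: "GCounter") -> None:
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self) -> int:
        return sum(self.slots)

a, b = GCounter(2), GCounter(2)
a.increment(0); a.increment(0)    # 2 likes recorded in one data centre
b.increment(1)                    # 1 like recorded in the other
a.merge(b)
print(a.value())                  # 3 — no increment lost, no conflict
```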
The Consistency Decision Matrix
| Data type | Staleness impact | Consistency level | Example |
|---|---|---|---|
| Financial balances | Critical — fraud, loss | LOCAL_QUORUM / ALL | Wallet balance, stock price |
| Inventory counts | High — overselling | LOCAL_QUORUM | Flash sale stock levels |
| User profile data | Medium — confusing UX | LOCAL_QUORUM (writes), ONE (reads) | Display name, avatar |
| Social counts | Low — invisible to users | ONE | Like counts, follower counts |
| Event streams / logs | Negligible | ONE / ANY | Clickstream, analytics events |
| Access control / permissions | Critical — security risk | QUORUM / ALL | Role revocation, API keys |
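One way to act on the matrix is a policy table consulted at query time, so consistency choices live in one reviewed place instead of scattered across call sites. The category names and helper below are invented for illustration; adapt them to your own schema.

```python
# Hypothetical policy table mirroring the decision matrix above.
# Values are Cassandra consistency level names as strings.
CONSISTENCY_POLICY = {
    "financial_balance":  "LOCAL_QUORUM",
    "inventory_count":    "LOCAL_QUORUM",
    "user_profile_write": "LOCAL_QUORUM",
    "user_profile_read":  "ONE",
    "social_count":       "ONE",
    "event_stream":       "ONE",
    "access_control":     "QUORUM",
}

def level_for(category: str) -> str:
    """Fail loudly on unmapped data rather than silently defaulting to ONE."""
    if category not in CONSISTENCY_POLICY:
        raise KeyError(f"no consistency policy for {category!r}")
    return CONSISTENCY_POLICY[category]

print(level_for("financial_balance"))  # LOCAL_QUORUM
```

Raising on an unmapped category is the point: it forces every new table to get an explicit staleness decision before it ships.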
Behaviour During a Real Partition
The scenario: Your 6-node Cassandra cluster loses connectivity between its London and Frankfurt data centres. London nodes cannot reach Frankfurt nodes. Reads and writes are still arriving from both regions. You have configured different consistency levels for different data — and the partition reveals exactly what each setting means in practice.
```python
from cassandra import Unavailable, WriteTimeout

def safe_payment_write(session, account_id, new_balance):
    """Payment write — strong consistency, will fail during partition."""
    stmt = SimpleStatement(
        "UPDATE accounts SET balance = %s WHERE account_id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM
    )
    try:
        session.execute(stmt, (new_balance, account_id))
        return {"ok": True}
    except (Unavailable, WriteTimeout):
        # LOCAL_QUORUM requires 2 of 3 replicas to ACK.
        # If the partition leaves only 1 replica reachable: FAIL.
        # Correct behaviour — refuse to write rather than risk inconsistency.
        return {"ok": False, "reason": "quorum unavailable — partition detected",
                "action": "retry_after_partition_heals"}

def safe_feed_write(session, user_id, body):
    """Notification write — eventual consistency, survives partition."""
    stmt = SimpleStatement(
        "INSERT INTO notifications (user_id, notif_id, body, created_at) "
        "VALUES (%s, uuid(), %s, toTimestamp(now()))",
        consistency_level=ConsistencyLevel.ONE
    )
    try:
        session.execute(stmt, (user_id, body))
        return {"ok": True, "note": "written to local replica, will sync later"}
    except WriteTimeout:
        return {"ok": False, "reason": "even local replica unreachable"}
```
```text
--- During London ↔ Frankfurt partition ---

safe_payment_write(session, 'acc_441', 4850.00):
{'ok': False, 'reason': 'quorum unavailable — partition detected',
 'action': 'retry_after_partition_heals'}
Behaviour: CP — refuses write, preserves consistency ✓

safe_feed_write(session, 'usr_882', 'Alice liked your post'):
{'ok': True, 'note': 'written to local replica, will sync later'}
Behaviour: AP — accepts write, syncs when partition heals ✓

After partition heals:
Payment write retried successfully — balance correct
Feed notification synced to Frankfurt replicas
```

CP behaviour — fail rather than risk inconsistency
The payment write fails because LOCAL_QUORUM requires 2 of 3 London replicas — and during the partition, only 1 is reachable from London. Refusing the write is the correct behaviour for financial data. A write that only reaches 1 of 3 replicas risks being overwritten by a conflicting write on another replica once the partition heals. The application must handle this error, queue the payment, and retry.
AP behaviour — accept write, sync later
The feed write succeeds because ONE consistency only needs one replica. The notification is stored locally and will sync to Frankfurt replicas once the partition heals. A Frankfurt user might see the notification 30 seconds late — completely acceptable. The system remains available at the cost of brief inconsistency.
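The retry path described for the failed payment write can be sketched as a small backoff loop. Everything here is illustrative: `attempt_write` is a stub standing in for `safe_payment_write`, rigged to fail while the partition lasts and succeed once it heals.

```python
import time

# Queue-and-retry with exponential backoff: keep retrying the CP write until
# quorum is reachable again, then give up cleanly if the partition outlasts
# the retry budget.
def retry_with_backoff(attempt_write, max_attempts=5, base_delay=0.01):
    for attempt in range(max_attempts):
        result = attempt_write()
        if result["ok"]:
            return result
        time.sleep(base_delay * (2 ** attempt))   # 10ms, 20ms, 40ms, ...
    return {"ok": False, "reason": "gave up; escalate to an operator"}

# Stub: fails twice (partition ongoing), then succeeds (partition healed).
outcomes = iter([{"ok": False}, {"ok": False}, {"ok": True}])
result = retry_with_backoff(lambda: next(outcomes))
print(result)  # {'ok': True} — succeeded on the third attempt
```

In production the queued payment would also be persisted locally before retrying, so a process crash during the partition does not lose it.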
Teacher's Note
The most dangerous consistency mistake is not choosing eventual consistency for the wrong data — it is applying the same consistency level to everything and never thinking about it. One team sets ONE on all Cassandra operations because it is the default and it is fast. Six months later, an auditor finds payment records with stale balances. Another team sets QUORUM on everything and wonders why their notification service is slow. Map each data type to the staleness impact it can tolerate, then set the consistency level accordingly. Write that mapping down and review it when new tables are added.
Practice Questions — You're the Engineer
Quiz — Consistency vs Availability in Production
Up Next · Lesson 32
NoSQL Performance Optimization
From slow queries to overloaded nodes — the systematic process for diagnosing and fixing performance problems in production NoSQL systems.