NoSQL
Cassandra Introduction
In 2008, Facebook had a problem: their inbox search needed to handle 50 billion messages. Each search query had to scan millions of rows across hundreds of servers in milliseconds. Their MySQL cluster was drowning. Two engineers — Avinash Lakshman and Prashant Malik — designed a new database that borrowed ideas from both Amazon's Dynamo paper and Google's Bigtable paper. They called it Cassandra. Facebook open-sourced it in 2008. Apache adopted it in 2010. Today it handles millions of writes per second for Netflix, Instagram, Apple, Twitter, and Uber — and it's about to make complete sense to you.
What Makes Cassandra Fundamentally Different
Cassandra was engineered around one core belief: writes must never block. Every architectural decision flows from this. The result is a database that can ingest a million writes per second across a 10-node cluster — sustained, without degradation.
No single point of failure
Every node is equal. No master, no primary, no leader. Any node can accept any read or write.
Linear scalability
Double the nodes, double the throughput. Cassandra scales horizontally with near-perfect linearity.
Write-optimised
Writes go to RAM first (memtable), confirmed immediately, flushed to disk in the background.
The Ring Architecture — No Masters, No Hierarchy
Cassandra organises its nodes in a ring. Each node owns a range of the token space (-2^63 to 2^63-1 with the default Murmur3 partitioner). When you write a row, Cassandra hashes the partition key to a token, finds which node owns that token range, and writes there. Every node knows the ring topology — any node can route any request.
Cassandra Ring — 4 nodes, replication factor 2
With Replication Factor 2, every write goes to the primary node AND the next node clockwise. Node A failure? Node B still has the data. No data loss. No manual failover.
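The routing described above can be sketched in a few lines of Python. This is an illustrative model, not Cassandra's implementation: real clusters use Murmur3 hashing and virtual nodes, while here we use MD5 over a fixed 4-node ring with RF=2.

```python
import bisect
import hashlib

# Toy token ring: 4 nodes, each owning a quarter of a 2^64 token space.
TOKEN_SPACE = 2**64
NODES = {0: "A", TOKEN_SPACE // 4: "B",
         TOKEN_SPACE // 2: "C", 3 * TOKEN_SPACE // 4: "D"}
TOKENS = sorted(NODES)

def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 here, Murmur3 in Cassandra)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def replicas_for(partition_key: str, rf: int = 2) -> list:
    """Primary owner is the node whose range covers the token;
    extra replicas are the next nodes clockwise around the ring."""
    token = token_for(partition_key)
    idx = bisect.bisect_right(TOKENS, token) - 1  # owner of this token range
    return [NODES[TOKENS[(idx + i) % len(TOKENS)]] for i in range(rf)]

print(replicas_for("driver_d7821"))  # primary node plus the next one clockwise
```

The same key always hashes to the same token, so every node (and every token-aware driver) computes the same replica set without asking a master.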
CQL — Cassandra Query Language
CQL looks like SQL — familiar syntax, familiar keywords. But it behaves completely differently under the hood. The most important difference: an efficient CQL query must filter on the partition key. Anything else requires a secondary index or an explicit ALLOW FILTERING full scan — which Cassandra was not designed for.
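The restriction makes sense once you picture the storage layout: a partition key is a hash lookup, any other column is a scan of everything. A hypothetical Python model (the dict stands in for partitions spread across nodes):

```python
# Model: Cassandra stores rows grouped by partition key, like a hash map.
partitions = {
    "driver_d7821": [{"recorded_at": "15:59:55", "speed_kmh": 42.5}],
    "driver_a1004": [{"recorded_at": "15:59:52", "speed_kmh": 38.0}],
}

# WHERE driver_id = 'driver_d7821'  -> one hash lookup, one node
rows = partitions["driver_d7821"]     # O(1) to locate the partition

# WHERE speed_kmh > 40  -> no partition key: must touch every partition
fast = [r for p in partitions.values() for r in p if r["speed_kmh"] > 40]
# This is what ALLOW FILTERING does: a scan across the whole cluster.
```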
Table Design — The Most Critical Cassandra Skill
In Cassandra, your table design is your query. You design tables around the queries you need to run — not around the data's natural relationships. This is the opposite of relational design.
The scenario: You're building a ride-hailing platform. Drivers send GPS coordinates every 5 seconds. You need two query patterns: (1) get all locations for a specific driver in the last hour, (2) get the most recent location for a driver. Here's how to design for both:
-- Connect to Cassandra
cqlsh
-- Create a keyspace (equivalent to a database)
CREATE KEYSPACE ridehailing
WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east': 3 -- 3 replicas in us-east datacenter
};
USE ridehailing;
NetworkTopologyStrategy
The production-grade replication strategy. Distributes replicas across datacenters and racks. 'us-east': 3 means 3 replicas in the us-east datacenter. For multi-region: 'us-east': 3, 'eu-west': 3 gives 6 total replicas across two regions.
SimpleStrategy
The alternative for single-datacenter or development clusters: WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}. Never use SimpleStrategy in production multi-DC deployments — it doesn't understand datacenter topology.
-- Design the table around the query: "all locations for driver X in last hour"
CREATE TABLE driver_locations (
driver_id TEXT, -- partition key: all rows for one driver on one node
recorded_at TIMESTAMP, -- clustering column: rows sorted by time within partition
latitude DOUBLE,
longitude DOUBLE,
speed_kmh FLOAT,
PRIMARY KEY (driver_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);
-- DESC = most recent first — "latest location" query reads the first row only
PRIMARY KEY (driver_id, recorded_at)
The first element in PRIMARY KEY is always the partition key. Everything after is a clustering column. This table design means: all GPS pings from driver_id live on the same node (partition key), sorted by recorded_at (clustering column). Range queries on recorded_at are extremely fast — sequential reads on pre-sorted data.
CLUSTERING ORDER BY (recorded_at DESC)
Data is physically stored on disk in this order. DESC means newest first. Getting the latest location for a driver = reading just the first row of the partition. No sort at query time — it's already sorted. Massively efficient for "get latest" queries.
Writing Data — The Fast Path
The scenario: 80,000 drivers sending GPS pings every 5 seconds — that's 16,000 writes per second. Here's how Cassandra handles them:
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
from datetime import datetime
cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('ridehailing')
Cluster(['node1','node2','node3']) — the driver uses these nodes as initial contact points. It downloads the ring topology from any one of them and learns exactly which node owns which token range. Future requests are sent directly to the right node — no proxy, no routing layer.
# Prepare the statement once — reuse for every write
# Prepared statements are compiled once and sent by ID on reuse — faster
insert_location = session.prepare("""
INSERT INTO driver_locations (driver_id, recorded_at, latitude, longitude, speed_kmh)
VALUES (?, ?, ?, ?, ?)
USING TTL 86400
""")
# TTL 86400 = auto-delete after 24 hours (location data is temporary)
session.prepare()
Sends the CQL statement to the cluster once for compilation. Cassandra returns a prepared statement ID. On every subsequent execution, only the ID + bound values are sent — not the full CQL text. At 16,000 inserts/sec, this eliminates enormous parsing overhead.
USING TTL 86400
Every row expires and is automatically deleted after 86,400 seconds (24 hours). GPS history older than 24 hours is irrelevant — no cleanup jobs needed. Cassandra's TTL applies per-row, not per-table. You can even set different TTLs on different columns.
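TTL behaviour can be modeled as an expiry timestamp checked at read time, which is roughly what Cassandra does: expired cells read as absent and are physically purged later during compaction. A toy sketch, not driver code:

```python
import time

class TTLStore:
    """Toy per-row TTL: rows carry an expiry time, reads skip expired rows."""
    def __init__(self):
        self._rows = {}

    def insert(self, key, value, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._rows[key] = (value, now + ttl_seconds)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._rows.get(key)
        if entry is None or now >= entry[1]:
            return None   # expired rows read as absent; purged at compaction
        return entry[0]

store = TTLStore()
store.insert("driver_d7821", (51.5074, -0.1278), ttl_seconds=86400, now=0)
print(store.get("driver_d7821", now=1000))    # still live inside the 24h window
print(store.get("driver_d7821", now=90000))   # None: past 86,400s, auto-expired
```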
# Execute a write with QUORUM consistency
# QUORUM = majority of replicas must confirm (2 out of 3 with RF=3)
bound = insert_location.bind((
'driver_d7821',
datetime.now(),
51.5074, # latitude
-0.1278, # longitude
42.5 # speed in km/h
))
bound.consistency_level = ConsistencyLevel.QUORUM
session.execute(bound)
-- Write path internally:
-- 1. Driver receives the write request
-- 2. Hashes 'driver_d7821' to a token → finds the primary node
-- 3. Writes to the primary node's commit log (disk) — for durability
-- 4. Writes to the primary node's memtable (RAM) — for fast reads
-- 5. Forwards to the RF-1 replica nodes
-- 6. QUORUM: waits for 2 of 3 replicas to confirm → returns OK

Write acknowledged: ~1.8ms
No locking. No transaction log coordination across nodes.
The write path — why it's so fast:
Commit log write
A sequential append to the commit log file — the fastest possible disk write. Sequential I/O is 100x faster than random I/O. Even if the node crashes before the memtable is flushed, the commit log is replayed on restart to recover the write.
Memtable → SSTable flush
The memtable fills up in RAM, then Cassandra flushes it to an immutable SSTable file on disk in the background. SSTables are never modified — new writes always create new SSTables. No in-place updates. No random writes. This is why Cassandra sustains millions of writes/sec.
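The memtable-to-SSTable flow can be sketched as a tiny log-structured store in Python (illustrative only; no commit log or compaction): writes land in an in-memory dict, a full memtable is frozen into an immutable "SSTable", and reads check newest data first.

```python
class ToyLSM:
    """Minimal log-structured store: memtable in RAM, immutable flushed tables."""
    def __init__(self, memtable_limit=3):
        self.memtable = {}
        self.sstables = []          # each entry is an immutable flushed snapshot
        self.limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value  # always an in-memory insert, never touches old files
        if len(self.memtable) >= self.limit:
            self.sstables.append(dict(self.memtable))  # flush: one sequential write
            self.memtable = {}      # flushed SSTables are never modified again

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest SSTable wins
            if key in table:
                return table[key]
        return None

db = ToyLSM()
db.write("a", 1); db.write("b", 2); db.write("c", 3)   # third write triggers a flush
db.write("a", 99)   # an "update" is just a newer write; the old file is untouched
print(db.read("a"))   # 99: the memtable shadows the flushed value
print(db.read("b"))   # 2: served from the flushed SSTable
```

This is why updates are as cheap as inserts: nothing is modified in place, the newest version simply shadows older ones until compaction merges them.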
Reading Data — Partition Key First, Always
The scenario: Fetch the last hour of GPS data for a driver, and separately fetch just the most recent location:
from datetime import datetime, timedelta
one_hour_ago = datetime.now() - timedelta(hours=1)
# Query 1: last hour of locations for driver_d7821
rows = session.execute("""
SELECT recorded_at, latitude, longitude, speed_kmh
FROM driver_locations
WHERE driver_id = 'driver_d7821'
AND recorded_at >= %s
""", (one_hour_ago,))
for row in rows:
print(f"{row.recorded_at}: ({row.latitude}, {row.longitude}) @ {row.speed_kmh}km/h")
2024-01-15 15:59:55: (51.5074, -0.1278) @ 42.5km/h
2024-01-15 15:59:50: (51.5071, -0.1275) @ 41.2km/h
2024-01-15 15:59:45: (51.5068, -0.1272) @ 40.8km/h
...720 rows (one per 5 seconds for 1 hour)

Query time: 6ms
Rows scanned: 720 (sequential read of one partition — no cross-node lookups)
Why this query is fast: driver_id = 'driver_d7821' directs Cassandra to exactly one partition (one node). All 720 rows are co-located on that node, pre-sorted by recorded_at DESC. The range filter is a sequential read from the top of the partition. No cross-node coordination, no sorting at query time.
# Query 2: most recent location only — LIMIT 1 on DESC-ordered partition
row = session.execute("""
SELECT recorded_at, latitude, longitude
FROM driver_locations
WHERE driver_id = 'driver_d7821'
LIMIT 1
""").one()
print(f"Last seen: {row.recorded_at} at ({row.latitude}, {row.longitude})")
Last seen: 2024-01-15 15:59:55 at (51.5074, -0.1278)

Query time: 0.9ms
Rows scanned: 1 (reads first row of partition — already sorted DESC)
LIMIT 1 on a DESC-ordered partition = instant — because data is stored newest-first, the first row IS the latest entry. Cassandra reads one row and stops. 0.9ms for what would be an expensive MAX(recorded_at) aggregate in SQL.
Consistency Levels — The Tunable Dial
Cassandra's consistency level is set per-query — not per-table or per-database. This is unique: you can run the same table with strong consistency for critical reads and eventual consistency for bulk analytics, all in the same application.
| Level | Nodes Required | Latency | Use When |
|---|---|---|---|
| ONE | 1 of N | Fastest | Stale data acceptable. Analytics, logs, metrics. |
| QUORUM | ⌊N/2⌋+1 of N | Medium | Strong consistency without ALL overhead. Most production reads/writes. |
| LOCAL_QUORUM | Majority in local DC | Low (local DC only) | Multi-DC: strong locally, eventual across regions. |
| ALL | All N nodes | Slowest | Financial writes where every replica must confirm. Rarely used. |
The QUORUM guarantee — a worked example with RF=3:
Write QUORUM + Read QUORUM = Strong Consistency
RF=3 → QUORUM = 2 nodes. Write confirmed on 2 nodes. Read returns from 2 nodes and picks the newest. Since write touched 2 and read touches 2, at least 1 node overlap is guaranteed → you always read the latest write.
Write ONE + Read ONE = Eventual Consistency
Write confirmed on 1 node. Read from 1 node (could be a different node that hasn't received the write yet). Brief stale window. For analytics and counters — perfectly acceptable and 3x faster.
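The overlap guarantee is just arithmetic: with W write replicas and R read replicas out of N, W + R > N forces at least one node to appear in both sets. A quick check in Python (function names are ours, not the driver's):

```python
def quorum(rf: int) -> int:
    """Quorum for replication factor rf: floor(N/2) + 1."""
    return rf // 2 + 1

def is_strongly_consistent(rf: int, write_cl: int, read_cl: int) -> bool:
    """A read is guaranteed to see the latest write iff W + R > N,
    because the write set and read set must then share a replica."""
    return write_cl + read_cl > rf

rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(rf, quorum(rf), quorum(rf)))  # True: QUORUM + QUORUM
print(is_strongly_consistent(rf, 1, 1))                    # False: ONE + ONE can be stale
```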
The Partition Key Problem — Hot Partitions
The most common Cassandra mistake is a poor partition key. If your partition key has low cardinality — few unique values — all the traffic goes to a small number of nodes. This is called a hot partition and it destroys performance.
❌ Hot partition — BAD design
CREATE TABLE events_by_country (
    country TEXT,    -- BAD partition key
    event_id UUID,
    PRIMARY KEY (country, event_id)
);
-- "US" partition receives 300M writes
-- "LI" (Liechtenstein) gets 100
-- One node is melting, others idle
Country has low cardinality — only ~200 values. All US traffic to one partition. That one node becomes a bottleneck while others are idle. This is the #1 Cassandra anti-pattern.
✅ Even distribution — GOOD design
CREATE TABLE events_by_user (
    user_id UUID,    -- GOOD: millions of unique values
    event_id UUID,
    PRIMARY KEY (user_id, event_id)
);
-- Millions of unique user_ids
-- Traffic spread evenly across all nodes
-- Linear scalability achieved
High-cardinality partition key = even distribution. Each user's events live on one node (fast reads) but different users are spread across all nodes (no bottleneck).
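The difference shows up immediately if you simulate it. This hypothetical sketch hashes 100,000 events onto 4 nodes twice: once keyed by country (low cardinality, 70% of traffic from one value) and once keyed by a unique per-event user id.

```python
import hashlib
from collections import Counter

def node_for(partition_key: str, num_nodes: int = 4) -> int:
    """Stand-in for token ownership: hash the key, pick a node."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

# Low cardinality, skewed: 70% of events come from one country.
countries = ["US"] * 70_000 + ["GB"] * 20_000 + ["LI"] * 10_000
by_country = Counter(node_for(c) for c in countries)

# High cardinality: every event carries a distinct user id.
by_user = Counter(node_for(f"user_{i}") for i in range(100_000))

print(by_country)  # one node absorbs all 70,000 "US" writes: a hot partition
print(by_user)     # roughly 25,000 writes per node: even distribution
```

All 70,000 "US" events hash to the same token, so the same node, no matter how many nodes you add; the high-cardinality key spreads load within a fraction of a percent of perfectly even.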
Teacher's Note
The single biggest mindset shift when learning Cassandra: stop thinking about your data model and start thinking about your query patterns. In SQL you design a normalised schema and let the query planner figure out how to answer any query. In Cassandra, you design tables for specific queries — one table per query pattern is a common approach. If you have 5 different query patterns, you might have 5 tables, each storing the same data in a different shape. Duplication is intentional. Cassandra trades storage space for query performance, and at its scale, that trade-off is absolutely worth it.
Practice Questions — You're the Engineer
Scenario: A team designed an events table with product_category as the partition key. They have 20 categories but one category ("electronics") receives 70% of all writes. One Cassandra node is consistently at 95% CPU while the others sit at 10%. Query latency on electronics is degrading. What is this anti-pattern called?
Up Next · Lesson 19
HBase Overview
Google's Bigtable paper implemented in open source — how HBase integrates with the Hadoop ecosystem, its column family architecture, and why it's the database of choice for petabyte-scale analytics workloads.