NoSQL Lesson 24 – Query-Driven Design | Dataplexa
Data Modeling & Design · Lesson 24

Query-Driven Design

Knowing that NoSQL requires query-first thinking is one thing. Sitting down with a blank whiteboard, a list of product requirements, and actually building a schema from scratch is another. This lesson is a complete, worked methodology — from requirements to production-ready tables — using a real application so you can follow the same process on your next project.

The Query-Driven Design Process — Four Steps

Every NoSQL schema design follows the same four-step process. Skip a step and you will be rewriting tables in production.

Step 1 — Identify the entities

What are the real-world things your application manages? Users, orders, products, messages, events. List them — this gives you your vocabulary.

Step 2 — List every access pattern

Sit with every team that touches the data — product, analytics, ops. For each feature, write the exact query: what you filter by, what you sort by, what you return, and how often it runs.

Step 3 — Map each access pattern to a table

Each distinct query gets its own table. The filter field becomes the partition key. The sort field becomes the clustering column. The fields you return become the table columns.

Step 4 — Validate partition size and distribution

Check that no partition grows unboundedly. Check that your partition key has high cardinality. Estimate the maximum rows per partition — if it could exceed a few hundred MB, add a bucketing dimension.
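The Step 4 check is back-of-the-envelope arithmetic. A minimal sketch in Python (the ~100MB warning threshold is an assumption drawn from commonly cited guidance, not a hard Cassandra limit):

```python
# Rough partition-size check for Step 4.
WARN_BYTES = 100 * 1024 * 1024  # ~100MB: low end of the usual guidance

def partition_size_bytes(max_rows: int, avg_row_bytes: int) -> int:
    """Worst-case partition size estimate."""
    return max_rows * avg_row_bytes

def needs_bucketing(max_rows: int, avg_row_bytes: int) -> bool:
    """True if the estimated partition exceeds the warning threshold."""
    return partition_size_bytes(max_rows, avg_row_bytes) > WARN_BYTES

# A 10,000-track playlist at ~100 bytes/row (~1MB): fine as-is.
print(needs_bucketing(10_000, 100))      # False
# 100M chart rows at ~50 bytes/row (~5GB): needs a bucketing dimension.
print(needs_bucketing(100_000_000, 50))  # True
```

Run this estimate for every table before it ships; it takes seconds and prevents the production rewrite described above.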

The Application — A Music Streaming Service

We will use a single application throughout this entire lesson and build its complete Cassandra schema from scratch. The app is a music streaming service — think Spotify at scale. Tens of millions of users, billions of play events, a catalogue of 100 million tracks.

Step 1 — The entities: Users, Tracks, Artists, Albums, Playlists, Play Events. These are the things the app manages.

Step 2 — The access patterns. After sitting with the product, analytics, and mobile teams, you come out with this list:

#     Access Pattern                           Filter by     Sort by           Frequency
AP1   Get a user's listening history           user_id       played_at DESC    Every page load
AP2   Get top tracks for an artist             artist_id     play_count DESC   Every artist page view
AP3   Get all tracks in a playlist             playlist_id   position ASC      Every playlist open
AP4   Get global play counts per track today   date          play_count DESC   Charts page, every hour

Step 3 — Translating Access Patterns into Tables

The scenario: You are the lead engineer on the streaming platform's backend team. Your tech lead has signed off on the access patterns above. Now you need to translate each one into a Cassandra table definition. You are designing for 50 million users and 5 billion play events per month — every partition decision matters.

-- AP1: "Get a user's listening history, newest first"
-- Filter: user_id  |  Sort: played_at DESC
CREATE TABLE listening_history (
  user_id   TEXT,
  played_at TIMESTAMP,
  track_id  TEXT,
  track_name TEXT,
  artist    TEXT,
  duration_s INT,
  PRIMARY KEY (user_id, played_at, track_id)
) WITH CLUSTERING ORDER BY (played_at DESC, track_id ASC)
  AND default_time_to_live = 7776000; -- 90-day TTL
PRIMARY KEY (user_id, played_at, track_id)

user_id is the partition key — all of one user's history lives in a single partition, stored together on the same replicas. played_at is the first clustering column — it physically sorts the rows within the partition. track_id is appended to ensure uniqueness if two tracks are played at the exact same millisecond (unlikely but possible).

default_time_to_live = 7776000

90 days in seconds. Play events older than 90 days are automatically deleted by Cassandra — no cleanup job needed. This keeps the partition size bounded: a user who streams 8 hours a day will never accumulate more than 90 days × 8 hours × ~15 tracks/hour ≈ 10,800 rows. Predictable, manageable, forever.
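With this layout, serving AP1 is a single-partition read. A hypothetical query (user ID and page size are illustrative):

```sql
-- AP1 read: one partition, rows already sorted newest-first
SELECT track_name, artist, played_at
FROM listening_history
WHERE user_id = 'user_123'
LIMIT 20;
```

No ORDER BY is needed: the table's CLUSTERING ORDER BY already returns rows newest-first.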

-- AP2: "Get top tracks for an artist, sorted by play count"
-- Filter: artist_id  |  Sort: play_count DESC
CREATE TABLE top_tracks_by_artist (
  artist_id  TEXT,
  play_count BIGINT,
  track_id   TEXT,
  track_name TEXT,
  album      TEXT,
  duration_s INT,
  PRIMARY KEY (artist_id, play_count, track_id)
) WITH CLUSTERING ORDER BY (play_count DESC, track_id ASC);
play_count as clustering column — the sorting trick

Cassandra physically stores rows in clustering column order. Putting play_count DESC means the most-played track is always the first row in the partition — reading the top 10 tracks is a 10-row sequential read with zero computation. In SQL you would sort at query time; here the sort is baked into the storage layout at write time.

Updating play_count — the delete-then-insert pattern

Because play_count is a clustering column, you cannot UPDATE it in place — changing it would change the row's position in the sort order, so Cassandra rejects any SET on a primary key column. The correct pattern is to DELETE the old row and INSERT a new one with the updated count. This is standard Cassandra practice for mutable sort keys.
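In CQL, one way to sketch the pattern (all IDs and values are illustrative). Wrapping both statements in a logged batch keeps them atomic, since the two rows share a partition:

```sql
-- Hypothetical: bump trk_42 from 5,000,000 to 5,000,001 plays
BEGIN BATCH
  DELETE FROM top_tracks_by_artist
   WHERE artist_id = 'art_99'
     AND play_count = 5000000
     AND track_id = 'trk_42';
  INSERT INTO top_tracks_by_artist
    (artist_id, play_count, track_id, track_name, album, duration_s)
  VALUES ('art_99', 5000001, 'trk_42', 'Example Track', 'Example Album', 215);
APPLY BATCH;
```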

-- AP3: "Get all tracks in a playlist, in order"
-- Filter: playlist_id  |  Sort: position ASC
CREATE TABLE playlist_tracks (
  playlist_id TEXT,
  position    INT,
  track_id    TEXT,
  track_name  TEXT,
  artist      TEXT,
  duration_s  INT,
  added_by    TEXT,
  PRIMARY KEY (playlist_id, position, track_id)
) WITH CLUSTERING ORDER BY (position ASC, track_id ASC);
position as clustering column

The playlist's track order is preserved on disk — track at position 1 is literally the first row, position 2 is second. When the app renders the playlist, it reads rows sequentially in physical storage order. No sorting, no ranking function, no window clause. The data is already in the right order.

Partition size check

A playlist with 10,000 tracks (extreme edge case) is still a small partition — a few MB at most. No bucketing needed here. If this were a table storing every song ever added/removed across all playlists, you would need to add a time bucket. Know your data growth shape before you model.

-- AP4: "Global play counts per track, for today's charts"
-- Filter: date  |  Sort: play_count DESC
-- First attempt: one partition per day (size checked in Step 4)
CREATE TABLE daily_charts (
  chart_date  DATE,
  play_count  BIGINT,
  track_id    TEXT,
  track_name  TEXT,
  artist      TEXT,
  PRIMARY KEY (chart_date, play_count, track_id)
) WITH CLUSTERING ORDER BY (play_count DESC, track_id ASC)
  AND default_time_to_live = 2592000; -- 30-day TTL
chart_date as partition key — one day, one partition

Every track that gets played today is in the same partition — rows sorted by play_count DESC so the most-streamed track is first. The charts page reads the top 100 rows: a single sequential partition read. After 30 days the TTL fires and that day's chart is automatically deleted.

Partition size: 100 million tracks × ~50 bytes per row ≈ 5GB

This is the partition size check from Step 4. A 5GB partition is too large for Cassandra — Cassandra's recommended maximum is 100–500MB per partition. In production this table would be redesigned to use a composite partition key like (chart_date, shard) where shard is a number 0–99, distributing tracks across 100 partitions per day.

Step 4 — Partition Size Validation in Practice

The scenario: Your tech lead flags that the daily_charts partition could reach 5GB on a day with 100 million active tracks. You need to redesign it with a shard bucket to cap each partition at a safe size — and update the write and read logic accordingly.
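One possible definition for the sharded table, a sketch that mirrors daily_charts with shard promoted into a composite partition key:

```sql
CREATE TABLE daily_charts_sharded (
  chart_date  DATE,
  shard       INT,        -- 0-99, derived from the track_id hash
  play_count  BIGINT,
  track_id    TEXT,
  track_name  TEXT,
  artist      TEXT,
  PRIMARY KEY ((chart_date, shard), play_count, track_id)
) WITH CLUSTERING ORDER BY (play_count DESC, track_id ASC)
  AND default_time_to_live = 2592000; -- 30-day TTL, as before
```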

import hashlib

# Shard the daily_charts partition into 100 buckets
# Each shard holds ~1M tracks → ~50MB per partition (safe)
NUM_SHARDS = 100

def get_shard(track_id: str) -> int:
    """Deterministic shard from track_id — same track always same shard."""
    return int(hashlib.md5(track_id.encode()).hexdigest(), 16) % NUM_SHARDS

# Write to daily_charts_sharded
def record_play(session, track_id, track_name, artist, play_count, date):
    shard = get_shard(track_id)
    session.execute("""
        INSERT INTO daily_charts_sharded
          (chart_date, shard, play_count, track_id, track_name, artist)
        VALUES (%s, %s, %s, %s, %s, %s)
    """, (date, shard, play_count, track_id, track_name, artist))
Shard for 'track_abc123': 47
Shard for 'track_xyz789': 12
100M tracks distributed across 100 shards → ~50MB per partition ✓
hashlib.md5(track_id.encode()).hexdigest()

MD5 hash of the track ID modulo 100 gives a deterministic shard number between 0 and 99. The same track always maps to the same shard — critical for reads. If the shard were random, you could never find the row again. MD5 is fast and its output is uniformly distributed, so all 100 shards receive roughly equal data.

The read trade-off — scatter-gather

To build today's top 100 chart, you now need to query all 100 shards and merge their results in application code. This is called a scatter-gather read — more complex than a single partition read, but acceptable for a charts page that runs once per hour. The write safety (bounded partitions) justifies the read complexity.
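The merge half of the scatter-gather can be modelled in plain Python. A minimal sketch in which the per-shard Cassandra queries are replaced by in-memory lists, so only the merge logic is shown:

```python
import heapq
from itertools import chain

def top_n_across_shards(shard_results, n=100):
    """Merge per-shard top lists into one global top-n by play_count.

    shard_results: one list per shard of (play_count, track_id) rows,
    as a query like 'SELECT ... WHERE chart_date = ? AND shard = ? LIMIT n'
    would return for each of the 100 shards.
    """
    return heapq.nlargest(n, chain.from_iterable(shard_results),
                          key=lambda row: row[0])

# Hypothetical results from two of the 100 shards, each sorted DESC:
shard_0 = [(900, "trk_a"), (850, "trk_b")]
shard_1 = [(1200, "trk_c"), (700, "trk_d")]
print(top_n_across_shards([shard_0, shard_1], n=3))
# [(1200, 'trk_c'), (900, 'trk_a'), (850, 'trk_b')]
```

Because each shard is already sorted, each per-shard query only needs to return its own top n rows, so the application merges at most 100 × n rows per chart build.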

The Complete Schema — Side by Side

Music Streaming Service — Final Schema Summary

listening_history

PK: user_id

CK: played_at DESC

Answers AP1. TTL 90 days.

Max partition: ~10,800 rows / user

top_tracks_by_artist

PK: artist_id

CK: play_count DESC

Answers AP2. Delete-insert on update.

Max partition: tracks per artist (~500 typical)

playlist_tracks

PK: playlist_id

CK: position ASC

Answers AP3. Order baked in at write time.

Max partition: tracks per playlist (typically <1,000)

daily_charts_sharded

PK: (chart_date, shard)

CK: play_count DESC

Answers AP4. Scatter-gather on read.

Max partition: ~1M tracks / shard ≈ 50MB ✓

Query-Driven Design vs SQL Design — The Key Differences

Aspect              SQL / Relational                               Query-Driven NoSQL
Starting point      Model entities and relationships               Model the queries the app will run
Tables              One table per entity, normalised               One table per access pattern
Duplication         Minimised — single source of truth per field   Accepted — same data in multiple tables
Sorting             Done at query time with ORDER BY               Baked into the storage layout at write time
Adding a new query  Write a new SELECT — schema unchanged          May need a new table + backfill of historical data
Read performance    Flexible but degrades at scale with JOINs      Consistently fast — single partition reads

Teacher's Note

The biggest sign of a query-driven design done right is that every production query in your application looks almost identical to the table definition — WHERE partition_key = X, optional AND clustering_col > Y, and a LIMIT. If your queries look like they need ALLOW FILTERING, multiple round trips, or post-query sorting in application code, go back to step 2 and re-examine whether you captured the access pattern correctly.

Practice Questions — You're the Engineer

Scenario:

Your team is designing the top_tracks_by_artist table. The product requirement says the artist page must always show tracks sorted highest-plays to lowest with no delay. Rather than sorting at query time, your tech lead says to bake the sort order into the physical storage layout by giving a specific column a particular role in the primary key. Which column in the table schema achieves this?


Scenario:

After sharding the daily_charts table into 100 partitions to prevent a 5GB single partition, a new engineer asks how the charts page now builds its top 100 list. You explain that the app must query all 100 shards and merge the results in application code. What is this read pattern called?


Scenario:

A developer on your team tries to run UPDATE top_tracks_by_artist SET play_count = 5000001 WHERE artist_id = 'art_99' AND play_count = 5000000 AND track_id = 'trk_42'. Cassandra rejects the statement with an error, because play_count is a clustering column: primary key columns cannot appear in a SET clause, since changing one would have to move the row to a different position on disk. What is the correct write pattern?


Quiz — Query-Driven Design in Production

Scenario:

Your Cassandra-backed order management system has one table: orders_by_customer, partitioned by customer_id. Three months after launch, the fulfilment team asks for a dashboard showing all orders with status 'pending' sorted by order date — a query your current schema cannot serve without a full table scan. What is the correct approach?

Scenario:

A streaming app stores every play event ever for each user in a table partitioned by user_id with no TTL. The app has been live for three years. One Cassandra node starts falling behind on compaction, read latency on that node spikes, and you trace the problem to a small number of accounts. One user with an account since day one has 180 million play events. What is the root cause?

Scenario:

A junior engineer reviews the top_tracks_by_artist table and asks why querying the top 10 tracks for an artist with 50,000 tracks takes the same 2ms as querying an artist with only 10 tracks. She expected the 50,000-track artist to be slower because "there is more data to sort through." How do you explain the constant-time performance?

Up Next · Lesson 25

Embedding vs Referencing

The decision that shapes every MongoDB schema — when to nest data inside a document and when to keep it in a separate collection.