NoSQL Lesson 11 – Key-Value Databases | Dataplexa
NoSQL Database Types · Lesson 11

Key-Value Databases

Twitter's timeline service was once the slowest part of their stack. Loading a user's feed meant assembling tweets from hundreds of accounts, sorting them, and returning thousands of rows — all in under 100ms. No relational database could do that at their scale. The solution was a key-value layer in front of everything: pre-compute each user's timeline and store it behind a single key. Feed load time dropped from 800ms to 6ms. That's a key-value database doing exactly what it was built for.

How a Key-Value Store Works Under the Hood

At its core, a key-value store is a hash map — the same data structure you've used in every programming language. You give it a key; it hashes that key to a location in memory and reads or writes that location directly. No scanning. No searching. No index traversal.

The lookup — visualised:

  Your Key          "session:u_4421"
        │
  Hash Function     SHA1("session:u_4421")
        │
  Memory Address    0x7f4a2b8c
        │
  Value Returned    {"token":"abc123"}

This entire path — from your key to the value — takes a single memory read. That's why Redis achieves 100,000+ operations per second on a single thread. There is no parsing, no index scan, no disk seek. Just a hash and a memory address.
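The hash-to-slot path above can be sketched in pure Python. This is a toy model, not Redis's actual implementation (the real dict adds collision handling in C, incremental rehashing, and memory management); NUM_BUCKETS and the per-bucket chaining scheme are illustrative choices:

```python
import hashlib

# Toy model of the lookup path: hash the key, map the hash to a bucket,
# access the slot directly. No scan, no index traversal.
NUM_BUCKETS = 1024
buckets = [dict() for _ in range(NUM_BUCKETS)]   # per-bucket chaining

def bucket_index(key: str) -> int:
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def kv_set(key, value):
    buckets[bucket_index(key)][key] = value      # O(1): hash -> slot -> write

def kv_get(key):
    return buckets[bucket_index(key)].get(key)   # O(1): hash -> slot -> read

kv_set("session:u_4421", '{"token":"abc123"}')
print(kv_get("session:u_4421"))  # → {"token":"abc123"}
```

However many keys you store, the cost of a lookup stays the same: one hash computation plus one direct access.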

Redis — The Most Widely Used Key-Value Database

Redis (Remote Dictionary Server) is not just a key-value store — it's a data structure server. It supports strings, lists, sets, sorted sets, hashes, streams, and more — all stored in RAM, all accessible in sub-millisecond time. Redis was voted the most-loved database in the Stack Overflow Developer Survey for five consecutive years (2017–2021).

String — any binary-safe value: text, JSON, integers, serialised objects. Max 512MB per value.

Hash — a map of field→value pairs inside one key. Perfect for user objects or configuration.

List — an ordered collection with push/pop from both ends. Message queues and activity feeds.

Set — unique, unordered members with union/intersection operations. Tagging, followers.

Sorted Set — members with scores, kept auto-sorted. Leaderboards, priority queues, time windows.

Stream — an append-only log of events with consumer groups. Real-time event processing pipelines.

Strings — The Foundation

The scenario: You're building a distributed lock system. Two instances of your order-processing service might try to process the same order simultaneously, and only one instance may claim an order at a time. Redis's SET with the NX flag (Set if Not eXists) is the standard solution:

import redis
r = redis.Redis(host='localhost', port=6379)

def claim_order(order_id, worker_id, timeout_seconds=30):
    key = f"lock:order:{order_id}"

    # SET key value NX EX — only set if key doesn't exist
    # NX = only write if key does NOT exist (atomic check + set)
    # EX = auto-expire after timeout_seconds (prevents deadlocks)
    acquired = r.set(key, worker_id, nx=True, ex=timeout_seconds)

    return acquired is not None  # True = lock claimed, False = already taken

nx=True — the critical flag. NX means "only write this key if it does not already exist." The check and the write happen atomically — no other command can run between them. If two workers call this simultaneously, only one gets True; the other gets None. No race condition is possible.

ex=timeout_seconds — the safety net. If the worker that claimed the lock crashes before releasing it, the lock auto-expires after 30 seconds. Without the TTL, a crashed worker would hold the order forever — a deadlock.

# Two workers race for the same order
worker_a = claim_order('ord_8821', 'worker-1')
worker_b = claim_order('ord_8821', 'worker-2')

print(f"Worker A got lock: {worker_a}")
print(f"Worker B got lock: {worker_b}")
Worker A got lock: True    ← claimed first
Worker B got lock: False   ← key already exists, NX rejected the write

-- lock:order:ord_8821 holds "worker-1"
-- Expires in 30 seconds automatically
-- Worker A processes the order, then deletes the key
-- Worker B can retry after lock is released

Why this can't be done cleanly in SQL: You'd need a SELECT to check existence, then an INSERT to claim. Between those two operations, another process can sneak in. You'd need explicit row-level locking with SELECT FOR UPDATE, adding transaction overhead. Redis does it atomically in a single command.
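To make the atomicity argument concrete, here's a toy in-process model of SET key value NX EX — a plain dict guarded by one lock, standing in for Redis's single-threaded command loop. It sketches the semantics under those assumptions; it is not how Redis or redis-py is implemented:

```python
import threading
import time

# Toy model of SET key value NX EX. One lock stands in for Redis's
# single-threaded command loop: the exists-check and the write can
# never interleave with another caller's check and write.
_store = {}               # key -> (value, expires_at)
_mutex = threading.Lock()

def set_nx_ex(key, value, ttl_seconds):
    with _mutex:
        now = time.monotonic()
        current = _store.get(key)
        if current is not None and current[1] > now:
            return False                       # live key exists -> NX rejects
        _store[key] = (value, now + ttl_seconds)
        return True                            # claimed (or reclaimed after expiry)

a = set_nx_ex("lock:order:ord_8821", "worker-1", 30)
b = set_nx_ex("lock:order:ord_8821", "worker-2", 30)
print(a, b)  # → True False
```

Because check and write sit inside one critical section, the SELECT-then-INSERT gap that plagues the SQL version simply doesn't exist.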

Hashes — Objects Without Serialisation

Redis Hashes let you store a whole object under one key, with individual fields accessible without deserialising the entire value. This is more efficient than storing a JSON string when you only need to update one field.

The scenario: You're caching user profile data. Profiles are read on every page load. You want to update just the last_seen field without re-serialising the whole object:

# Store a user profile as a Hash — each field accessible individually
r.hset('user:u_4421', mapping={
    'name':       'Priya Sharma',
    'email':      'priya@example.com',
    'plan':       'pro',
    'login_count': '142'
})

hset with mapping — stores multiple field-value pairs under one key in a single command. Each field is individually addressable. The entire hash is one Redis key — one TTL, one memory allocation, one eviction decision.

# Update just one field — no need to read and rewrite the whole object
r.hset('user:u_4421', 'login_count', 143)

# Read just one field
plan = r.hget('user:u_4421', 'plan')

# Read all fields at once
profile = r.hgetall('user:u_4421')
plan = b'pro'

profile = {
  b'name':        b'Priya Sharma',
  b'email':       b'priya@example.com',
  b'plan':        b'pro',
  b'login_count': b'143'
}

hget vs hgetall — hget fetches a single field and is the efficient choice when you need one value; hgetall fetches the entire hash for when you need the full object. Unlike JSON strings, you never deserialise data you didn't ask for.

r.hset('user:u_4421', 'login_count', 143)

This updates only login_count. The other three fields — name, email, plan — are untouched. Compare with storing the profile as a JSON string: you'd have to GET the string, parse it, update the field, serialise it back, and SET it. Four operations vs one. At 50,000 profile updates per second, that difference adds up fast.
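The four-step JSON round-trip versus the one-step hash update can be sketched with plain dicts — no Redis needed, this just mirrors the two access patterns:

```python
import json

# JSON-string approach: four steps to change one field.
store = {"user:u_4421": json.dumps({"name": "Priya Sharma", "plan": "pro",
                                    "login_count": 142})}

raw = store["user:u_4421"]                   # 1. GET the whole string
profile = json.loads(raw)                    # 2. parse it
profile["login_count"] += 1                  # 3. change one field
store["user:u_4421"] = json.dumps(profile)   # 4. SET it back

# Hash approach: the value is already a map of fields, so one
# operation (the HSET equivalent) touches one field and nothing else.
hash_store = {"user:u_4421": {"name": "Priya Sharma", "plan": "pro",
                              "login_count": 142}}
hash_store["user:u_4421"]["login_count"] += 1

print(json.loads(store["user:u_4421"])["login_count"],
      hash_store["user:u_4421"]["login_count"])  # → 143 143
```

Both end in the same state, but the first path pays serialisation cost proportional to the whole object on every update; the second pays only for the field that changed.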

Lists — Message Queues and Activity Feeds

The scenario: You're building a notification system. Background workers process notifications asynchronously. Producers push to the queue, consumers pop from it. Redis Lists give you a built-in queue with blocking pop — consumers wait efficiently without polling:

import json

# Producer — push notification jobs to the right end of the list
def queue_notification(user_id, message):
    job = json.dumps({'user_id': user_id, 'msg': message})
    r.rpush('queue:notifications', job)  # RPUSH = push to Right end

# Push 3 jobs
queue_notification('u_001', 'Your order shipped!')
queue_notification('u_002', 'New follower: Priya')
queue_notification('u_003', '50% off — today only')

rpush — appends a value to the right (tail) of the list and returns the new list length. Multiple workers can push simultaneously: Redis executes commands on a single thread, so every command is serialised and concurrent write conflicts are impossible.

# Consumer — blocking pop from the left end
# BLPOP blocks for up to 30 seconds waiting for a job
# Returns immediately if a job is already in the queue
def process_notifications():
    while True:
        result = r.blpop('queue:notifications', timeout=30)
        if result:
            _, job_json = result              # BLPOP returns (queue_name, value)
            job = json.loads(job_json)
            send_notification(job['user_id'], job['msg'])  # defined elsewhere
-- Queue state after 3 pushes:
queue:notifications = [
  '{"user_id": "u_001", "msg": "Your order shipped!"}',   ← left (oldest)
  '{"user_id": "u_002", "msg": "New follower: Priya"}',
  '{"user_id": "u_003", "msg": "50% off — today only"}'   ← right (newest)
]

-- Consumer pops from left (FIFO order):
Processing: u_001 — "Your order shipped!"
Processing: u_002 — "New follower: Priya"
Processing: u_003 — "50% off — today only"

blpop (Blocking Left Pop) — the B means blocking. If the queue is empty, the consumer waits up to 30 seconds for a job to arrive instead of polling every few milliseconds. Your worker uses zero CPU while waiting, and the moment a producer pushes, Redis wakes the waiting consumer instantly.

RPUSH + BLPOP = FIFO queue — push to the right, pop from the left: first in, first out. This is a production-grade message queue with no additional infrastructure — no RabbitMQ, no SQS. For moderate throughput (under ~100k messages/sec), Redis Lists handle it natively.
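The producer/consumer shape above can be modelled in-process with Python's queue.Queue, which gives the same FIFO ordering and the same blocking consumer. This is a sketch of the pattern, not of Redis itself; the "STOP" message is a hypothetical shutdown sentinel added for the demo:

```python
import json
import queue
import threading

# Toy model of RPUSH + BLPOP: queue.Queue preserves FIFO order, and
# get() parks the consumer thread until a producer puts — no polling.
jobs = queue.Queue()
processed = []

def queue_notification(user_id, message):
    jobs.put(json.dumps({"user_id": user_id, "msg": message}))  # RPUSH-like

def worker():
    while True:
        job = json.loads(jobs.get(timeout=5))   # BLPOP-like blocking pop
        if job["msg"] == "STOP":                # demo-only shutdown sentinel
            break
        processed.append(job["user_id"])        # stand-in for send_notification

t = threading.Thread(target=worker)
t.start()
queue_notification("u_001", "Your order shipped!")
queue_notification("u_002", "New follower: Priya")
queue_notification("u_003", "STOP")
t.join()
print(processed)  # → ['u_001', 'u_002']
```

The worker consumes jobs strictly in the order they were produced, and sleeps (rather than spins) whenever the queue is empty — exactly the behaviour BLPOP gives you across processes and machines.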

Persistence — RAM is Volatile, Redis Has a Plan

The obvious concern with an in-memory database: what happens when the server restarts? Redis has two persistence strategies — and most production deployments use both:

RDB — Snapshot (Point-in-Time)

# redis.conf
save 900 1      # snapshot if ≥1 change in 15 min
save 300 10     # snapshot if ≥10 changes in 5 min
save 60 10000   # snapshot if ≥10,000 changes in 1 min

How it works: Redis forks a child process that writes the entire dataset to a binary file (dump.rdb) on disk. The parent keeps serving requests. On restart, Redis loads the snapshot.

Trade-off: fast restarts and a compact file, but a crash between snapshots loses every write since the last one — up to several minutes of data with the thresholds above.

AOF — Append-Only File (Every Write)

# redis.conf
appendonly yes
appendfsync everysec # flush to disk every second
# or: appendfsync always (every write)
# or: appendfsync no (OS decides)

How it works: Every write command is appended to a log file. On restart, Redis replays the log to reconstruct the dataset.

Trade-off: Maximum 1 second of data loss with everysec. Larger file size. Slower restart on very large datasets. But much stronger durability guarantee.

Production recommendation:

Use RDB + AOF together. On restart, Redis prefers the AOF file (more complete). The RDB snapshot provides a fast backup baseline. This gives you both fast restarts and near-zero data loss.
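The AOF replay idea is worth seeing in miniature. Here's a toy sketch — append every write to a log, then replay the log into an empty store to simulate recovery after a restart. Real AOF logs actual Redis protocol commands and handles fsync policy and log rewriting; none of that is modelled here:

```python
import json

# Toy model of AOF recovery: every write is appended to a log; on
# "restart" the log is replayed into a fresh dict to rebuild state.
aof_log = []
data = {}

def aof_set(key, value):
    aof_log.append(json.dumps(["SET", key, value]))  # log the command
    data[key] = value                                # apply the write

aof_set("user:u_4421", "pro")
aof_set("counter", "41")
aof_set("counter", "42")     # replay applies this last, superseding "41"

# Simulated crash + restart: replay the log into an empty store.
recovered = {}
for entry in aof_log:
    cmd, key, value = json.loads(entry)
    if cmd == "SET":
        recovered[key] = value

print(recovered == data)  # → True
```

Replay reproduces the final state because later commands overwrite earlier ones — which is also why real AOF files grow until Redis rewrites them into a compact form.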

Key Expiry — The TTL System

TTL (Time To Live) is one of Redis's most powerful features. Every key can have an expiry time — Redis automatically deletes it when the time is up. No cron jobs. No cleanup scripts. Here's the full TTL toolkit:

# Set TTL at creation time
r.set('otp:u_4421', '847291', ex=300)        # expire in 5 minutes

# Add TTL to an existing key
r.expire('session:u_4421', 1800)             # expire in 30 minutes

# Check remaining TTL
ttl = r.ttl('otp:u_4421')                   # returns seconds remaining

# Make a key permanent again (remove TTL)
r.persist('session:u_4421')                 # key now lives forever

# Set expiry as an absolute Unix timestamp
r.expireat('report:jan_2024', 1735689600)   # expires Jan 1 2025
ttl = 287   # 287 seconds remaining (300 - 13 seconds elapsed)

# Special TTL return values:
# -1 = key exists with no expiry (persistent)
# -2 = key does not exist

How Redis handles expiry internally:

Lazy expiry — when you access a key, Redis checks whether it has expired; if so, it deletes the key and returns nil. Expired keys cost no CPU until they're accessed.

Active expiry (background sweep) — every 100ms, Redis randomly samples a set of keys with TTLs and deletes the expired ones. This stops memory filling up with expired-but-never-accessed keys. Combined with lazy expiry, almost no expired key outlives its TTL by more than a few hundred milliseconds.
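Both expiry paths can be sketched in a few lines of Python. This is a toy model — the TTLs here are shortened for the demo, and the sample size and timing are illustrative, not Redis's exact internals:

```python
import random
import time

# Toy model of the two expiry paths.
# store maps key -> (value, expires_at); expires_at is None for persistent keys.
store = {}

def set_ex(key, value, ttl=None):
    store[key] = (value, (time.monotonic() + ttl) if ttl else None)

def get(key):
    # Lazy expiry: the read itself notices the stale key and deletes it.
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if expires_at is not None and expires_at <= time.monotonic():
        del store[key]
        return None
    return value

def active_sweep(sample_size=20):
    # Background sweep: sample keys that carry a TTL and evict the dead ones.
    with_ttl = [k for k, (_, exp) in store.items() if exp is not None]
    for key in random.sample(with_ttl, min(sample_size, len(with_ttl))):
        get(key)   # get() performs the deletion if the key is expired

set_ex("otp:u_4421", "847291", ttl=0.01)   # short TTL for the demo
set_ex("user:u_4421", "priya")             # no TTL — persistent
time.sleep(0.05)
active_sweep()
print("otp:u_4421" in store, get("user:u_4421"))  # → False priya
```

Note that the sweep never touches persistent keys — only keys carrying a TTL are candidates, which keeps the background work proportional to the expiring population, not the whole keyspace.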

Key Design Patterns — Naming Your Keys Well

Redis has no namespacing at the database level beyond 16 logical databases (db 0–15). Good key naming is how you organise everything in one Redis instance. The community standard is colon-separated namespacing:

Pattern              Example Key                   Used For
type:id              user:u_4421                   Simple entity lookup
type:id:field        session:u_4421:token          Specific attribute of an entity
feature:scope:id     rate:api:ip_203.0.113.5       Feature-scoped counters
queue:name           queue:email:notifications     Job queues — scannable by pattern
cache:version:key    cache:v3:homepage             Versioned cache — invalidate by bumping the version
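A small helper can enforce the convention so key shapes stay consistent across a codebase. make_key is a hypothetical utility sketched here, not part of redis-py:

```python
# Hypothetical helper that builds keys in the colon-separated convention
# and rejects segments that would blur the namespace boundaries.
def make_key(*parts):
    segments = [str(p) for p in parts]
    if any(":" in s for s in segments):
        raise ValueError("key segments must not contain ':'")
    return ":".join(segments)

print(make_key("user", "u_4421"))            # → user:u_4421
print(make_key("cache", "v3", "homepage"))   # → cache:v3:homepage
```

Centralising key construction like this also makes pattern-based operations (SCAN with a `user:*` match, bumping a cache version) a one-line change instead of a grep across the codebase.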

Teacher's Note

The biggest mistake I see teams make with Redis is using it as a primary database. Redis is brilliant as a cache, a session store, a queue, a leaderboard, a rate limiter — but it's not designed to be the source of truth for data that can't be reconstructed. Always ask: if this Redis instance went down and the RDB/AOF files were lost, could I rebuild this data from somewhere else? If the answer is no, either add AOF persistence or reconsider what's in Redis vs your primary database.

Practice Questions — You're the Engineer

Scenario:

You're implementing a distributed lock in Redis. Two worker processes might try to acquire the same lock simultaneously. You need the write to succeed only if the key does not already exist — and this check-and-set must be atomic. Which Redis SET flag ensures this?


Scenario:

Your Redis instance stores financial calculation results that are expensive to recompute. The RDB snapshot runs every 5 minutes. Your team decides that losing up to 5 minutes of data on a crash is unacceptable — you need maximum durability with at most 1 second of data loss. Which Redis persistence mode should you enable?


Scenario:

You store user profiles in Redis as JSON strings. Every time a user's last_active timestamp updates (which happens on every request), you GET the string, parse it, update the field, serialise it, and SET it back — 4 operations for one field update. A colleague suggests switching to Redis Hashes. Which single Redis command would let you update just the last_active field without touching any other profile data?


Quiz — Key-Value in Production

Scenario:

You're storing one-time passwords (OTPs) in Redis for two-factor authentication. Each OTP must expire after exactly 5 minutes. A junior developer suggests running a separate cleanup process every minute to delete expired OTPs. What is the cleaner, more reliable Redis-native approach?

Scenario:

A developer asks: "We use r.incr('visit_count') to count page views. 5,000 requests hit this simultaneously. How does Redis ensure the count is accurate and not affected by race conditions?" What is the correct explanation?

Scenario:

Your image processing service has producer processes that queue resize jobs and worker processes that pick them up. Workers currently poll a Redis key every 100ms looking for new jobs — causing 600 Redis reads per minute per worker with no jobs to process. What is the correct Redis pattern to eliminate this wasted polling?

Up Next · Lesson 12

Redis Introduction

A full hands-on tour of Redis — installation, CLI, Sorted Sets, Pub/Sub, pipelines, and the Lua scripting that makes complex atomic operations possible.