NoSQL Lesson 13 – DynamoDB Introduction | Dataplexa

DynamoDB Introduction

Prime Day 2023. Amazon sells over 375 million items in 48 hours. At peak, that's millions of cart reads and writes per second, across every country, every device, and every payment method simultaneously. The database underneath it is DynamoDB. It grew out of Dynamo, the internal key-value store Amazon described publicly in 2007, and launched as a public AWS service in 2012. It has been a backbone of the world's largest e-commerce platform ever since. This lesson teaches you how it works and how to use it.

What Makes DynamoDB Different

DynamoDB is a fully managed, serverless key-value and document database. "Fully managed" means AWS handles everything — servers, patching, replication, scaling. You never SSH into a DynamoDB node. You never set up replication. You just use it.

Single-digit ms

Consistent latency at any scale. 10 users or 10 million: DynamoDB is designed to respond in single-digit milliseconds either way.

Unlimited scale

No maximum table size. AWS automatically partitions your data as it grows.

Zero administration

No servers to manage. No index rebuilds. No vacuum. No connection pools.

The Core Concept — Tables, Items, and Attributes

DynamoDB uses its own terminology. Before writing a single line of code, these three concepts must be clear:

DynamoDB Term | SQL Equivalent | What It Actually Is
Table | Table | A collection of items. No fixed schema: items can have different attributes.
Item | Row | A collection of attributes. Like a JSON object. Max 400 KB per item.
Attribute | Column | A name-value pair. Can be string, number, binary, boolean, null, list, map, or set.
Partition Key | Primary Key (first part) | Mandatory. DynamoDB hashes this value to decide which partition stores the item. Unique on its own only when the table has no sort key.
Sort Key | Composite PK (second part) | Optional. When combined with the partition key, it forms the unique identifier and enables range queries within a partition.
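
To make the "hashes this to decide where the item lives" idea concrete, here is a toy sketch of hash-based routing. The function name `partition_for`, the use of MD5, and the fixed count of four partitions are all assumptions for illustration; DynamoDB's real hash function and partition layout are internal and not something you configure.

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int = 4) -> int:
    """Map a partition key to a partition number (toy model only).

    MD5 and the fixed partition count are illustrative assumptions,
    not a description of DynamoDB's internals.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# The same key always routes to the same partition, so a lookup
# never has to search the other partitions.
for user_id in ["u_441", "u_882", "u_441"]:
    print(user_id, "-> partition", partition_for(user_id))
```

The point of the sketch is only that routing is a pure function of the key: no index walk, no search across servers.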

Partition Keys and Sort Keys — The Design That Changes Everything

DynamoDB's performance entirely depends on your key design. This is the most important concept in the entire lesson. Get it right and queries are instant. Get it wrong and you'll scan entire tables.

The two key patterns:

Simple Primary Key (Partition Key only)

Table: users
PK: user_id (String)

{ user_id: "u_441", name: "Priya" }
{ user_id: "u_882", name: "Carlos" }

Fetch: GetItem(user_id="u_441")
Result: instant — direct hash lookup

Use when you always fetch by one unique ID. One item per partition key value.

Composite Primary Key (Partition Key + Sort Key)

Table: orders
PK: user_id (String)
SK: order_date (String)

{ user_id:"u_441", order_date:"2024-01-15", total:£49 }
{ user_id:"u_441", order_date:"2024-02-03", total:£120 }

Fetch: Query(user_id="u_441", order_date BETWEEN...)
Result: all orders for u_441 in date range

Use when you need to query a set of related items by range. Multiple items share the same partition key.

The golden rule of DynamoDB key design:

Design your keys around your most frequent query pattern. Unlike SQL where you can query any column, DynamoDB is fast only when you query by partition key. Everything else requires a scan or a secondary index.
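
The composite key pattern above can be modelled in a few lines of plain Python. This is a rough in-memory sketch, not how DynamoDB is implemented: `ToyTable` and its methods are hypothetical names, and the only thing it borrows from the real service is the idea that items sharing a partition key are kept in sort key order, so a range query is a slice rather than a search.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition key -> list of (sort key, item) pairs, kept sorted
        self._partitions = defaultdict(list)

    def put_item(self, pk, sk, item):
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)        # same full key: replace the item
        else:
            rows.insert(i, (sk, item))  # keep the partition sorted

    def query_between(self, pk, low, high):
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        start = bisect_left(keys, low)
        end = bisect_right(keys, high)  # upper bound is inclusive
        return [item for _, item in rows[start:end]]

orders = ToyTable()
orders.put_item("u_441", "2024-02-03", {"total": 120})
orders.put_item("u_441", "2024-01-15", {"total": 49})
orders.put_item("u_882", "2024-01-20", {"total": 75})

january = orders.query_between("u_441", "2024-01-01", "2024-01-31")
print(january)  # [{'total': 49}]
```

Note that the range condition works here because ISO-8601 date strings sort in the same order as the dates they represent, which is why a string sort key such as order_date is a common choice.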

Creating a Table and Writing Items

The scenario: You're building the order management system for an e-commerce platform. Orders need to be retrieved by customer — all orders for a specific user, filtered by date range. The composite key pattern is the right design:

import boto3
from datetime import datetime
from decimal import Decimal   # needed for numeric values in put_item below

# Connect to DynamoDB
dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')

# Create the orders table
table = dynamodb.create_table(
    TableName='orders',
    KeySchema=[
        {'AttributeName': 'user_id',    'KeyType': 'HASH'},   # partition key
        {'AttributeName': 'order_date', 'KeyType': 'RANGE'}   # sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id',    'AttributeType': 'S'},  # S = String
        {'AttributeName': 'order_date', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'   # on-demand pricing — no capacity planning
)
table.wait_until_exists()   # block until the new table is ready for writes
KeyType: 'HASH' vs 'RANGE'

AWS uses HASH for partition key and RANGE for sort key — legacy naming from DynamoDB's origins. Just remember: HASH = partition key, RANGE = sort key. Only these two attributes need to be in AttributeDefinitions — all other item attributes are schema-free and don't need to be declared here.

BillingMode: 'PAY_PER_REQUEST'

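On-demand mode: you pay per read/write request, not for reserved capacity. There is no capacity planning, and the table adapts to traffic automatically, although requests can still be throttled if traffic spikes far beyond previous peaks or concentrates on one hot partition. For most applications this is simpler and cost-effective. The alternative is PROVISIONED: you specify read/write capacity units upfront, which is usually cheaper for steady, predictable throughput.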

table = dynamodb.Table('orders')

# Write an order — note: all attributes beyond keys are schema-free
table.put_item(Item={
    'user_id':    'u_441',            # partition key — required
    'order_date': '2024-01-15',       # sort key — required
    'order_id':   'ord_8821',         # additional attributes — any shape
    'status':     'delivered',
    'total':      Decimal('149.99'),  # use Decimal for numbers in DynamoDB
    'items': [                        # lists and nested objects supported
        {'sku': 'TEE-M-BLUE', 'qty': 2, 'price': Decimal('29.99')},
        {'sku': 'HAT-L-RED',  'qty': 1, 'price': Decimal('19.99')}
    ]
})
Decimal('149.99')

DynamoDB's Python SDK requires Decimal for non-integer numbers and rejects Python floats, because binary floating point can introduce precision errors. Import it with from decimal import Decimal and use it for any monetary or fractional value in DynamoDB.
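
A short sketch of why this matters, using only the standard library. The helper name `to_decimal` is hypothetical; converting through `str()` is one common way to get the value a human would expect, under the assumption that the float's printed form is the intended number.

```python
from decimal import Decimal

def to_decimal(value):
    """Recursively replace floats with Decimals built from their string form."""
    if isinstance(value, float):
        return Decimal(str(value))
    if isinstance(value, list):
        return [to_decimal(v) for v in value]
    if isinstance(value, dict):
        return {k: to_decimal(v) for k, v in value.items()}
    return value

print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(to_decimal(29.99))                                   # 29.99
```

A helper like this can be applied to a whole item before writing it, so nested line-item prices are converted along with the top-level total.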

items: [{"sku": ..., "qty": ..., "price": ...}]

Lists and nested maps are native DynamoDB types. The entire order — including line items — is one item in the table. No separate order_items table. No JOIN needed to get the full order. One get_item call returns everything.

Reading Data — GetItem vs Query vs Scan

DynamoDB has three ways to read data. Knowing which one to use — and which one to avoid — is critical for both performance and cost.

GetItem — Fetch one exact item

O(1) — Always use this if you can

Provide the full primary key (partition key + sort key if composite). Returns exactly one item. Uses the hash function to go directly to the storage partition, with no scanning. Single-digit millisecond latency at any table size.

Query — Fetch items in one partition

O(log N) — Use for range access

Provide the partition key + a condition on the sort key. Returns multiple items from one partition sorted by the sort key. Efficient — reads only the relevant partition. Use for "all orders by user X between date A and B."

Scan — Read every item in the table

O(N) — Avoid in production

Reads every single item in the table and applies a filter afterwards. Expensive: you're billed for every item read even if it's filtered out. Slow on large tables. Only acceptable for one-time data exports or tables under a few thousand items.
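
The cost difference between Query and Scan can be estimated with simple arithmetic. The sketch below assumes the standard sizing rule (one read capacity unit covers a strongly consistent read of up to 4 KB, and an eventually consistent read costs half), an average item size of 1 KB, and a table of one million items. The item size and table size are made-up numbers, and per-page rounding is ignored.

```python
import math

def read_units(total_bytes: int, strongly_consistent: bool = False) -> float:
    """Read capacity units for one read that touches total_bytes of data."""
    units = max(1, math.ceil(total_bytes / 4096))
    return float(units) if strongly_consistent else units / 2

ITEM_BYTES = 1024  # assumed average item size

query_cost = read_units(4 * ITEM_BYTES)          # Query that reads 4 items
scan_cost = read_units(1_000_000 * ITEM_BYTES)   # Scan over 1,000,000 items
print(query_cost, scan_cost)  # 0.5 125000.0
```

Under these assumptions the Scan consumes 250,000 times as much read capacity as the Query, and the gap keeps growing with the table.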

The scenario: Retrieve a specific order by user and date (GetItem), then get all orders for a user in January (Query):

# GetItem — fetch one exact order
response = table.get_item(
    Key={
        'user_id':    'u_441',       # partition key
        'order_date': '2024-01-15'   # sort key
    }
)
order = response.get('Item')   # None if no item has this key
if order:
    print(f"Order total: £{order['total']}")
Order total: £149.99

-- Consumed capacity: 0.5 RCU (read capacity units)
-- Latency: 2.1ms
-- Items scanned: 1 (direct hash lookup — no scanning)

response.get('Item') — returns None if the item doesn't exist, rather than raising an exception. Always use .get('Item') rather than response['Item'] to avoid KeyError on missing items.
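
One way to package the missing-item handling is a small wrapper. This is a sketch: `fetch_order` is a hypothetical helper, and `FakeTable` is a stand-in written only so the example runs without AWS access. The `Key` and `ConsistentRead` parameters are the real `get_item` arguments.

```python
def fetch_order(table, user_id, order_date, consistent=False):
    """Return the order item, or None when no item has this key."""
    response = table.get_item(
        Key={"user_id": user_id, "order_date": order_date},
        ConsistentRead=consistent,  # True requests a strongly consistent read
    )
    return response.get("Item")

class FakeTable:
    """Stand-in for a boto3 Table so the sketch runs without AWS access."""

    def __init__(self, items):
        self._items = items

    def get_item(self, Key, ConsistentRead=False):
        item = self._items.get((Key["user_id"], Key["order_date"]))
        # Like the real API, omit the 'Item' entry when nothing matches.
        return {"Item": item} if item is not None else {}

table = FakeTable({("u_441", "2024-01-15"): {"total": "149.99"}})
order = fetch_order(table, "u_441", "2024-01-15")
missing = fetch_order(table, "u_441", "1999-01-01")
print(order, missing)
```

With a real table you would pass `dynamodb.Table('orders')` instead of the fake, and check the result for None before using it.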

from boto3.dynamodb.conditions import Key

# Query — all orders for user u_441 in January 2024
response = table.query(
    KeyConditionExpression=
        Key('user_id').eq('u_441') &                          # partition key — exact match
        Key('order_date').between('2024-01-01', '2024-01-31') # sort key — range condition
)

orders = response['Items']
print(f"Orders in January: {len(orders)}")
Orders in January: 4

-- Response Items:
[
  { user_id: "u_441", order_date: "2024-01-03", total: Decimal("89.99") },
  { user_id: "u_441", order_date: "2024-01-15", total: Decimal("149.99") },
  { user_id: "u_441", order_date: "2024-01-22", total: Decimal("34.50") },
  { user_id: "u_441", order_date: "2024-01-29", total: Decimal("220.00") }
]

-- Consumed capacity: 2 RCU (read only u_441's partition)
-- Latency: 3.8ms
-- Items scanned: 4 (only items matching the partition key)
Key('order_date').between('2024-01-01', '2024-01-31')

Sort key conditions: eq, lt, lte, gt, gte, between, begins_with. The between is inclusive on both ends. Sort keys work because items within a partition are physically stored in sort key order — range queries are sequential reads.

Key('user_id').eq('u_441') & Key('order_date').between(...)

The & operator combines key conditions. The partition key condition must always be an exact match (eq). Only the sort key condition can be a range. You cannot range-query on the partition key alone — that would require scanning all partitions.
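
One detail the examples above skip: a single Query call returns at most 1 MB of data. When more items match, the response includes a LastEvaluatedKey that you pass back as ExclusiveStartKey to fetch the next page. A minimal sketch of that loop follows. `query_all` is a hypothetical helper, and `FakePagedTable` exists only so the example runs offline (it uses a page number where the real service returns a primary key).

```python
def query_all(table, **query_kwargs):
    """Collect every matching item by following LastEvaluatedKey."""
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        query_kwargs["ExclusiveStartKey"] = last_key

class FakePagedTable:
    """Offline stand-in that serves pre-built pages of results."""

    def __init__(self, pages):
        self._pages = pages

    def query(self, **kwargs):
        start = kwargs.get("ExclusiveStartKey", {"page": 0})["page"]
        response = {"Items": self._pages[start]}
        if start + 1 < len(self._pages):
            response["LastEvaluatedKey"] = {"page": start + 1}
        return response

fake = FakePagedTable([["a", "b"], ["c"], ["d", "e"]])
all_items = query_all(fake)
print(all_items)  # ['a', 'b', 'c', 'd', 'e']
```

Against a real table, you would call it with the same KeyConditionExpression shown earlier, for example `query_all(table, KeyConditionExpression=...)`.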

Updating Items — Granular Field Updates

The scenario: An order's status changes to "shipped". You need to update just the status field and add a tracking number — without overwriting the entire item:

# Update specific attributes — leave everything else untouched
table.update_item(
    Key={
        'user_id':    'u_441',
        'order_date': '2024-01-15'
    },
    UpdateExpression='SET #s = :status, tracking = :tracking, updated_at = :ts',
    ExpressionAttributeNames={
        '#s': 'status'              # 'status' is a reserved word in DynamoDB
    },
    ExpressionAttributeValues={
        ':status':   'shipped',
        ':tracking': 'DHL-9921-EU',
        ':ts':       datetime.now().isoformat()
    }
)
UpdateExpression='SET #s = :status, tracking = :tracking'

The SET action adds or replaces specific attributes. Only the named attributes change — everything else in the item (items list, total, order_id) stays exactly as it was. No full-item replacement. DynamoDB also supports REMOVE (delete an attribute), ADD (increment a number or add to a set), and DELETE (remove set elements).

ExpressionAttributeNames: {'#s': 'status'}

DynamoDB has reserved words — status, name, type, count and many others. If your attribute name is a reserved word, you must use an expression attribute name (prefixed with #) as an alias. This is a common gotcha that causes confusing validation errors.
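
A simple way to sidestep reserved words entirely is to alias every attribute name, reserved or not. The sketch below builds the three update_item arguments from a plain dictionary. `build_set_update` and the `#n0` / `:v0` placeholder scheme are hypothetical choices for illustration, not part of boto3.

```python
def build_set_update(changes: dict) -> dict:
    """Build update_item keyword arguments that SET each attribute in changes."""
    if not changes:
        raise ValueError("changes must contain at least one attribute")
    names, values, clauses = {}, {}, []
    for i, (attr, value) in enumerate(changes.items()):
        name_ph, value_ph = f"#n{i}", f":v{i}"
        names[name_ph] = attr      # alias, so reserved words are safe
        values[value_ph] = value
        clauses.append(f"{name_ph} = {value_ph}")
    return {
        "UpdateExpression": "SET " + ", ".join(clauses),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }

update_kwargs = build_set_update({"status": "shipped", "tracking": "DHL-9921-EU"})
print(update_kwargs["UpdateExpression"])  # SET #n0 = :v0, #n1 = :v1
```

The result can be splatted into the real call, for example `table.update_item(Key={...}, **update_kwargs)`.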

Global Secondary Indexes — Querying Non-Key Attributes

DynamoDB only lets you query efficiently by the table's primary key. But what if you need to find all orders with status "shipped"? Status is not the partition key — so a table query would require a full scan. The solution is a Global Secondary Index (GSI) — a separate copy of the data organised by a different key.

Think of a GSI as a second table with a different primary key, automatically kept in sync by DynamoDB:

Main Table
PK: user_id
SK: order_date

Fast: "orders by user"
Slow: "orders by status"
→ GSI →
GSI: status-date-index
PK: status
SK: order_date

Fast: "orders by status"
Fast: "shipped in Jan"
# Query the GSI — find all shipped orders in January
response = table.query(
    IndexName='status-date-index',           # name of the GSI
    KeyConditionExpression=
        Key('status').eq('shipped') &         # GSI partition key
        Key('order_date').between('2024-01-01', '2024-01-31')
)

print(f"Shipped orders in January: {len(response['Items'])}")
Shipped orders in January: 847

-- DynamoDB queried the GSI partition for "shipped"
-- Then range-filtered by order_date within that partition
-- No full table scan — efficient even at millions of orders
-- GSI is automatically kept in sync with the main table

IndexName='status-date-index' — tells DynamoDB to use the GSI instead of the main table. The query syntax is identical to a regular table query — you just point it at a different index. DynamoDB bills GSI reads separately from main table reads.

Important GSI limitation: GSIs support only eventually consistent reads. There's a brief lag (typically milliseconds to seconds) between writing to the main table and the GSI reflecting the change. For strongly consistent reads, you must read from the main table using the original key.
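
The query above assumes the index already exists. As a sketch, here is what the table definition might look like with the GSI declared at creation time. The function name `orders_table_definition` is hypothetical, and projecting ALL attributes into the index is an assumption (it is the simplest choice, at the price of storing a full copy of each item).

```python
def orders_table_definition() -> dict:
    """Keyword arguments for create_table, including the status/date GSI."""
    return {
        "TableName": "orders",
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # Every attribute used as a table key or index key must be declared.
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "status-date-index",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                # ALL copies every attribute into the index (assumed here).
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

definition = orders_table_definition()
print(definition["GlobalSecondaryIndexes"][0]["IndexName"])
```

With a real connection this would be passed as `dynamodb.create_table(**orders_table_definition())`. Note that status now appears in AttributeDefinitions because it has become a key attribute of the index.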

DynamoDB vs Redis — When to Use Which

Criteria | Redis | DynamoDB
Data persistence | Optional, primarily in-memory | Always, durable by design
Data size limit | Limited by RAM (GBs) | Unlimited (petabytes)
Latency | Sub-ms (RAM access) | Single-digit ms (SSD)
Data structures | Rich: sorted sets, lists, pub/sub | Key-value + document only
Management | Self-managed or Redis Cloud | Fully managed by AWS
Best for | Cache, sessions, real-time counters | Durable application data at scale

Teacher's Note

The biggest DynamoDB mistake I see is treating it like a SQL database with a weird syntax. DynamoDB rewards a completely different mental model: you are pre-computing your query results at write time by choosing the right keys. If you find yourself writing a Scan or adding a GSI for every new query, that's a signal your key design doesn't match your access patterns. The best time to think about DynamoDB key design is before you write the first item — not after you have 100 million of them.

Practice Questions — You're the Engineer

Scenario:

You are designing a DynamoDB table for a messaging app. Every query will be "get all messages in conversation X." The most important design decision is choosing which attribute DynamoDB uses to hash and decide which physical server stores an item — ensuring all messages from the same conversation are co-located. What is this attribute called in DynamoDB?


Scenario:

A junior developer writes code to find all users who signed up in the last 7 days. The users table has user_id as the only key. Their code reads every item in the table and filters by created_at in the application. They notice it gets slower as the user base grows and AWS bills are increasing unexpectedly. Which DynamoDB operation are they using, and why is it problematic?


Scenario:

Your DynamoDB orders table uses user_id as partition key and order_date as sort key. Your operations team now needs to query all orders with status "pending" across all users to process them in bulk. You cannot change the primary key of an existing table. What DynamoDB feature lets you query by status efficiently without a Scan?


Quiz — DynamoDB Design Decisions

Scenario:

You're designing a DynamoDB table for a messaging app. The most frequent query is: "get the last 50 messages in conversation X, sorted newest first." Messages have message_id, conversation_id, sender_id, body, and timestamp. What is the correct key design?

Scenario:

A developer passes a product price as 'price': 29.99, a Python float, to a boto3 put_item call, and the call fails with a TypeError saying float types are not supported. Elsewhere, prices computed with float arithmetic show values like 29.989999999999, and your finance team is reporting rounding errors in invoices. What is the fix?

Scenario:

Your team is debating DynamoDB billing modes. One engineer says "always use PAY_PER_REQUEST." Another says "PROVISIONED is cheaper." When is each correct?
