NoSQL
Backup & Recovery
Every engineer knows backups are important. Far fewer have actually restored from one under pressure, at 2am, with a VP on the phone asking when the data will be back. A backup strategy that has never been tested is not a strategy — it is a hope. This lesson is about building something you can actually rely on when the worst happens.
Why NoSQL Backup Is Different
Relational databases have decades of mature backup tooling. NoSQL systems are more varied — a Cassandra cluster, a MongoDB replica set, and a DynamoDB table each require completely different backup approaches. Three concerns are common to all of them: capturing a consistent point in time (hard while writes keep arriving, harder still across nodes), restoring fast enough to meet your obligations, and proving the restore actually works before you need it.
Restore speed varies enormously by method: a mongodump restored over a slow link could take 6 hours, while a snapshot-based restore from the same cloud region can be under 10 minutes. Two key metrics every backup strategy must define upfront: the Recovery Time Objective (RTO) — the maximum time you can afford to spend restoring — and the Recovery Point Objective (RPO) — the maximum window of recent writes you can afford to lose. A nightly backup, for example, implies an RPO of up to 24 hours.
Method 1 — mongodump and mongorestore
mongodump is MongoDB's native logical backup tool. It connects to a running MongoDB instance and exports collections as BSON files. It is simple, portable, and works without any cloud infrastructure. The trade-off: for large databases it is slow, and restoring a large dump to a fresh instance takes significant time.
The scenario: You are the sole backend engineer at an early-stage startup. Your MongoDB database holds 3 months of user data — about 8GB. You have no cloud snapshot setup yet. You need a nightly backup script that runs from a secondary replica (to avoid hitting the primary), compresses the output, uploads it to S3, and retains the last 7 days. You need this running before you go to sleep tonight.
#!/bin/bash
set -euo pipefail # exit on error, unset vars, pipe failures
DATE=$(date +%Y-%m-%d)
BACKUP_DIR="/tmp/mongo-backup-$DATE"
S3_BUCKET="s3://myapp-backups/mongodb"
RETENTION_DAYS=7
echo "[$DATE] Starting mongodump from secondary..."
# --readPreference=secondary — dump from replica, not primary
# --gzip — compress BSON files inline (reduces size ~70%)
# --oplog — capture oplog entries for a point-in-time consistent backup
mongodump \
--uri="mongodb://backup_user:$MONGO_PASS@mongo-secondary:27017" \
--readPreference=secondary \
--gzip \
--oplog \
--out="$BACKUP_DIR"
echo "Dump complete. Uploading to S3..."
# Tar the directory and stream directly to S3 (no extra disk space needed)
tar -czf - -C /tmp "mongo-backup-$DATE" | aws s3 cp - "$S3_BUCKET/backup-$DATE.tar.gz"
# Remove local temp files
rm -rf "$BACKUP_DIR"
# Delete S3 objects older than the retention window.
# ISO dates (YYYY-MM-DD) compare correctly as plain strings.
CUTOFF=$(date -d "-$RETENTION_DAYS days" +%Y-%m-%d)
aws s3 ls "$S3_BUCKET/" \
| while read -r mod_date _ _ key; do
    if [[ -n "$key" && "$mod_date" < "$CUTOFF" ]]; then
      aws s3 rm "$S3_BUCKET/$key"
    fi
  done
echo "[$DATE] Backup complete. Stored at $S3_BUCKET/backup-$DATE.tar.gz"
[2025-03-10] Starting mongodump from secondary...
2025-03-10T02:00:01.441+0000 writing myapp.users to dump/myapp/users.bson
2025-03-10T02:00:04.112+0000 done dumping myapp.users (184203 documents)
2025-03-10T02:00:04.113+0000 writing myapp.orders to dump/myapp/orders.bson
2025-03-10T02:00:09.887+0000 done dumping myapp.orders (442891 documents)
2025-03-10T02:00:09.901+0000 writing captured oplog
✓ Dump complete. Uploading to S3...
upload: - to s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz
[2025-03-10] Backup complete. Stored at s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz
Elapsed: 4m 12s | Size: 2.1GB (compressed from 8.3GB)
--readPreference=secondary
Directs mongodump to read from a replica set secondary. The backup reads every document in every collection — a full collection scan that saturates disk I/O. Running this on the primary would degrade query performance for every user actively using the app during the backup window.
--oplog
Without --oplog, the dump is not point-in-time consistent — if writes arrive while the dump is running, some collections are captured before those writes and some after. The --oplog flag captures the oplog entries generated during the dump, allowing mongorestore to replay them and produce a consistent snapshot of a single moment in time.
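A cheap guard follows from this: before trusting a nightly dump, confirm the captured oplog file actually landed in the output directory. A minimal sketch — the check_oplog helper is mine, not part of the script above; with --gzip the file is written as oplog.bson.gz:

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_oplog DIR — succeed only if the dump directory contains the
# captured oplog (oplog.bson, or oplog.bson.gz when --gzip was used)
check_oplog() {
  if [ -f "$1/oplog.bson" ] || [ -f "$1/oplog.bson.gz" ]; then
    echo "oplog captured"
  else
    echo "WARNING: no oplog file — dump is not point-in-time consistent" >&2
    return 1
  fi
}

# Demonstration against a throwaway directory standing in for $BACKUP_DIR
demo_dir=$(mktemp -d)
touch "$demo_dir/oplog.bson.gz"
check_oplog "$demo_dir"     # prints: oplog captured
rm -rf "$demo_dir"
```

Wiring this in right after mongodump exits turns a silently inconsistent backup into a loud nightly alert.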
tar -czf - "$BACKUP_DIR" | aws s3 cp -
Piping the tar output directly to the AWS CLI means the compressed archive is never written to disk locally — it streams straight to S3. On a server with limited disk space (common on small EC2 instances), this avoids needing to store both the dump directory and the compressed archive simultaneously.
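One step the callouts above skip is the retention loop. Its core is a plain string comparison of ISO dates, which can be exercised without touching S3. A sketch — the keys_older_than helper is mine, and the listing lines are fabricated examples:

```shell
#!/usr/bin/env bash
# keys_older_than CUTOFF — reads `aws s3 ls`-style lines (DATE TIME SIZE KEY)
# on stdin and prints the keys whose date sorts before the cutoff.
# YYYY-MM-DD dates compare correctly as plain strings, so no parsing needed.
keys_older_than() {
  local cutoff="$1"
  while read -r mod_date _ _ key; do
    if [[ -n "$key" && "$mod_date" < "$cutoff" ]]; then
      echo "$key"
    fi
  done
}

# Example run against two fake listing lines — only the first is old enough
printf '%s\n' \
  '2025-03-01 02:04:10 2147483648 backup-2025-03-01.tar.gz' \
  '2025-03-10 02:04:12 2254857830 backup-2025-03-10.tar.gz' \
  | keys_older_than '2025-03-03'
```

This prints backup-2025-03-01.tar.gz and nothing else. Isolating the filter like this makes the deletion logic testable before it is allowed anywhere near `aws s3 rm`.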
Restoring from a mongodump Backup
Knowing how to take a backup matters. Knowing how to restore from one matters more. Here is the restore process for the backup we just created — including the oplog replay for point-in-time consistency.
The scenario: A developer accidentally ran a script that dropped the orders collection in production. The incident happened at 03:47 UTC. Your last backup was taken at 02:00 UTC and completed at 02:04 UTC. You need to restore the orders collection to its exact state at 02:04 UTC — the moment the backup completed — without touching any other collections.
# Step 1: download the backup from S3
aws s3 cp s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz /tmp/
tar -xzf /tmp/backup-2025-03-10.tar.gz -C /tmp/
# Step 2: restore only the orders collection
# --nsInclude — restore only this namespace (db.collection)
# --oplogReplay — replay the captured oplog for consistency
# --drop — drop the existing (empty) collection before restore
mongorestore \
--uri="mongodb://admin:$MONGO_PASS@mongo-primary:27017" \
--nsInclude="myapp.orders" \
--oplogReplay \
--drop \
--gzip \
/tmp/mongo-backup-2025-03-10/
echo "Restore complete. Validating document count..."
# Step 3: sanity check — count should match pre-incident count
mongosh myapp --quiet --eval "db.orders.countDocuments()"
2025-03-10T03:55:12+0000 restoring myapp.orders from orders.bson.gz
2025-03-10T03:55:18+0000 442891 document(s) restored successfully
2025-03-10T03:55:18+0000 replaying oplog — 1247 entries applied
✓ Restore complete. Validating document count...
442891 ✓ matches pre-incident count
Total restore time: 6m 44s
Orders collection live again at 04:01 UTC
Incident window: 03:47 → 04:01 (14 minutes)
--nsInclude="myapp.orders"
Limits the restore to exactly one collection. Without this flag, mongorestore would restore every collection in the dump — overwriting current production data in users, products, and every other collection with their state from 02:04 UTC. That would undo 2 hours of legitimate writes across the entire database.
--oplogReplay
Replays the oplog entries captured during the original dump. This fast-forwards the restored data to the exact moment the backup completed — 02:04 UTC — ensuring the restored collection is consistent with the oplog-captured state rather than a mid-dump snapshot.
db.orders.countDocuments()
Always validate after a restore. A document count matching the pre-incident baseline is a fast sanity check that the restore completed without truncation. For critical data, follow up with spot-checks on specific known records before declaring the incident resolved.
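As a sketch, the count check can be wired into the runbook as a hard pass/fail gate. The baseline value is illustrative, and the mongosh line is commented out so the gate logic stands alone:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pre-incident baseline, e.g. copied from your metrics dashboard (illustrative)
EXPECTED_COUNT=442891

# In the real runbook the count comes from the restored database:
# ACTUAL_COUNT=$(mongosh myapp --quiet --eval 'db.orders.countDocuments()')
ACTUAL_COUNT=442891   # stand-in value so the gate below is self-contained

if [ "$ACTUAL_COUNT" -eq "$EXPECTED_COUNT" ]; then
  echo "PASS: $ACTUAL_COUNT documents, matches baseline"
else
  echo "FAIL: expected $EXPECTED_COUNT, got $ACTUAL_COUNT" >&2
  exit 1
fi
```

Making the script exit non-zero on mismatch means the check can sit inside an automated pipeline rather than relying on a tired engineer eyeballing numbers at 4am.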
Method 2 — Cloud Snapshots (Atlas / EBS)
Logical backups like mongodump are flexible but slow. For large databases, volume snapshots are faster by an order of magnitude — because they operate at the storage block level rather than reading every document through the database engine. MongoDB Atlas continuous cloud backups and AWS EBS snapshots both use this approach.
The scenario: Your team has migrated to MongoDB Atlas. Your database is now 200GB and growing. A mongodump takes 3 hours and restoration takes 5 hours — far beyond your 1-hour RTO. You are switching to Atlas continuous cloud backups with point-in-time recovery and need to verify the configuration via the Atlas API.
PROJECT_ID="abc123"
CLUSTER="prod-cluster"
API_BASE="https://cloud.mongodb.com/api/atlas/v1.0"
# Enable continuous cloud backups with 7-day point-in-time window
curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
-X PATCH "$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER" \
-H "Content-Type: application/json" \
-d '{
"providerBackupEnabled": true,
"pitEnabled": true
}'
# List the most recent snapshots to verify backups are running
curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
"$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER/backup/snapshots" \
| jq '.results[] | {id, createdAt, storageSizeBytes, status}'
{
"id": "snap_20250310_020000",
"createdAt": "2025-03-10T02:00:00Z",
"storageSizeBytes": 214748364800, // 200GB
"status": "completed"
}
{
"id": "snap_20250309_020000",
"createdAt": "2025-03-09T02:00:00Z",
"storageSizeBytes": 209715200000,
"status": "completed"
}
Continuous backup: ENABLED ✓
Point-in-time recovery window: 7 days ✓
Latest snapshot: 2025-03-10T02:00:00Z ✓
pitEnabled: true
Point-in-time recovery means you can restore to any second within the past 7 days — not just to a scheduled snapshot. Atlas continuously captures oplog data between snapshots. If a developer drops a collection at 14:37:22 UTC, you can restore to 14:37:21 UTC — one second before the incident — recovering every write right up to the moment of the accident.
providerBackupEnabled: true
Enables Atlas Cloud Provider Snapshots — block-level snapshots of the underlying EBS volumes taken by AWS directly. Because they operate at the storage layer rather than the MongoDB layer, a 200GB snapshot completes in minutes rather than hours. Restore spins up a new EBS volume from the snapshot — typically under 10 minutes to a running cluster.
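The restore side of point-in-time recovery is also driven through the API. The sketch below is hedged: the restoreJobs endpoint and field names follow the legacy v1.0 Cloud Backup API and should be verified against your Atlas API version before use, so the curl call is commented out and only the timestamp computation runs (GNU date assumed):

```shell
#!/usr/bin/env bash
set -euo pipefail

PROJECT_ID="abc123"
CLUSTER="prod-cluster"
API_BASE="https://cloud.mongodb.com/api/atlas/v1.0"

# Restore to one second before the incident (14:37:22 → 14:37:21 UTC)
POINT_IN_TIME=$(date -u -d "2025-03-10 14:37:21" +%s)
echo "pointInTimeUTCSeconds: $POINT_IN_TIME"

# Hypothetical restore-job request — endpoint and body fields assume the
# v1.0 Cloud Backup API; confirm against your Atlas API docs before running.
# curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
#   -X POST "$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER/backup/restoreJobs" \
#   -H "Content-Type: application/json" \
#   -d "{
#     \"deliveryType\": \"automated\",
#     \"pointInTimeUTCSeconds\": $POINT_IN_TIME,
#     \"targetGroupId\": \"$PROJECT_ID\",
#     \"targetClusterName\": \"$CLUSTER\"
#   }"
```

The key detail is that the API takes a UTC epoch second, which is why converting the incident timestamp precisely matters: one second too late and you restore the drop itself.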
Backup for Cassandra — nodetool and Snapshots
Cassandra's backup approach is fundamentally different from MongoDB's. Cassandra uses an LSM-tree storage engine — data is written to immutable SSTable files and never modified in place. This makes backups simpler in one way: you can copy SSTables directly because they do not change after they are written. The challenge is consistency across a multi-node cluster.
The scenario: You are a platform engineer managing a 6-node Cassandra cluster storing IoT sensor data. You need to take a consistent snapshot across all nodes simultaneously and back it up to object storage. A snapshot taken one node at a time would capture different points in time — useless for a consistent restore.
SNAPSHOT_TAG="backup-$(date +%Y%m%d)"
NODES=("cass-node-1" "cass-node-2" "cass-node-3" "cass-node-4" "cass-node-5" "cass-node-6")
KEYSPACE="iot_sensors"
echo "Triggering snapshot on all nodes simultaneously..."
# Run nodetool snapshot on all nodes in parallel
# -t = tag name for the snapshot
for node in "${NODES[@]}"; do
ssh "$node" "nodetool snapshot -t $SNAPSHOT_TAG $KEYSPACE" &
done
wait # wait for all parallel snapshots to complete
echo "All nodes snapshotted. Uploading SSTables to S3..."
for node in "${NODES[@]}"; do
# Cassandra stores snapshots inside the data directory
ssh "$node" \
"find /var/lib/cassandra/data/$KEYSPACE -name 'snapshots/$SNAPSHOT_TAG' \
-exec tar -czf - {} \; \
| aws s3 cp - s3://iot-backups/cassandra/$node-$SNAPSHOT_TAG.tar.gz" &
done
wait
echo "Snapshot $SNAPSHOT_TAG complete across all nodes."
Triggering snapshot on all nodes simultaneously...
[cass-node-1] Requested creating snapshot(s) for [iot_sensors]
[cass-node-2] Requested creating snapshot(s) for [iot_sensors]
[cass-node-3] Requested creating snapshot(s) for [iot_sensors]
[cass-node-4] Requested creating snapshot(s) for [iot_sensors]
[cass-node-5] Requested creating snapshot(s) for [iot_sensors]
[cass-node-6] Requested creating snapshot(s) for [iot_sensors]
All nodes snapshotted. Uploading SSTables to S3...
Uploaded: cass-node-1-backup-20250310.tar.gz (34.2GB)
Uploaded: cass-node-2-backup-20250310.tar.gz (33.9GB)
... (all 6 nodes)
Snapshot backup-20250310 complete across all nodes.
Total backup size: 204GB | Elapsed: 8m 21s
nodetool snapshot -t $SNAPSHOT_TAG
nodetool snapshot creates hard links to the current SSTable files on disk — it is essentially instant and does not block reads or writes. Because Cassandra SSTables are immutable, those hard links will continue to point to valid data even as new compactions create new SSTable files. The snapshot tag lets you identify and clean up a specific snapshot later without deleting current data files — run nodetool clearsnapshot -t <tag> once the upload has succeeded, or the hard links will keep superseded SSTables on disk indefinitely.
Running all nodes in parallel with & and wait
The & sends each SSH command to the background. wait blocks until all background jobs finish. This triggers the snapshot on all six nodes at as close to the same moment as possible — maximising consistency across the cluster. Running them sequentially would mean node-1 is snapshotted minutes before node-6, capturing different states.
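The fan-out pattern itself is easy to isolate. A sketch using sleeps in place of the ssh/nodetool calls, showing that six parallel jobs finish in roughly the time of the slowest one rather than the sum:

```shell
#!/usr/bin/env bash
set -euo pipefail

start=$(date +%s)

# Six background jobs standing in for the six ssh/nodetool calls
for i in 1 2 3 4 5 6; do
  sleep 1 &
done
wait   # blocks until every background job has finished

elapsed=$(( $(date +%s) - start ))
echo "6 parallel jobs finished in ${elapsed}s (sequential would be ~6s)"
```

The same skeleton — loop, `&`, then a single `wait` — carries over directly to any per-node cluster operation where tight time alignment matters.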
Testing Your Recovery — The Most Important Step
A backup you have never restored from is not a backup — it is a hypothesis. The only way to know your backup works is to restore it, validate the data, and measure how long it took. This should be a scheduled, automated process — not something you discover is broken at 3am during an actual incident.
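A minimal skeleton for such a scheduled drill, assuming the S3 backup from Method 1 — the restore commands are commented placeholders (paths illustrative), with a sleep standing in so the timing gate itself is demonstrable:

```shell
#!/usr/bin/env bash
set -euo pipefail

RTO_SECONDS=3600   # the objective this drill is measured against (1 hour)

restore_start=$(date +%s)

# Real drill steps would go here, e.g.:
#   aws s3 cp "s3://myapp-backups/mongodb/backup-$(date +%Y-%m-%d).tar.gz" /tmp/
#   tar -xzf /tmp/backup-*.tar.gz -C /tmp/
#   mongorestore --uri="mongodb://localhost:27018" --gzip --oplogReplay /tmp/mongo-backup-*/
#   ...then validate counts against baselines
sleep 1   # stand-in for the restore so the gate below can run anywhere

restore_end=$(date +%s)
elapsed=$((restore_end - restore_start))

if [ "$elapsed" -le "$RTO_SECONDS" ]; then
  echo "DRILL PASS: restore took ${elapsed}s (RTO ${RTO_SECONDS}s)"
else
  echo "DRILL FAIL: restore took ${elapsed}s, exceeds RTO ${RTO_SECONDS}s" >&2
  exit 1
fi
```

Because the script fails loudly when the measured time blows the RTO, a slowly growing database turns from a 3am surprise into a red monthly job you can act on.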
Recovery Testing Checklist — Run This Monthly
- Restore the latest backup into a fresh, throwaway instance — never into production.
- Validate the data: document counts against baselines, plus spot-checks of known records.
- Time the full restore end to end and compare it against your RTO.
- Test against the same database version, OS, and hardware class as production.
- Fix whatever breaks, then put the next drill on the calendar.
Teacher's Note
The most expensive database recovery I have ever seen was not caused by a missing backup — the backups existed. It was caused by a restore process that had never been run against that version of MongoDB on that OS, with those indexes, on hardware with that amount of RAM. The restore failed halfway through with an obscure error. Then the secondary backup failed too. They recovered by manually replaying application logs from a message queue. The lesson: the backup matters, but the restore test matters more. Run it. Time it. Fix what breaks. Repeat every month.
Practice Questions — You're the Engineer
Scenario:
mongodump runs for 4 minutes. During that window, your application is writing thousands of new orders per minute. Without a specific flag, the dump captures the users collection at 02:00 UTC and the orders collection at 02:04 UTC — two different points in time. A restore from this dump would produce an inconsistent state where some orders reference users that did not exist yet at the time the users collection was captured. What mongodump flag captures the oplog changes during the dump to produce a point-in-time consistent snapshot?
Scenario:
An engineer accidentally dropped the products collection at 11:42 UTC. Your last backup at 02:00 UTC included every collection in the ecommerce database. You need to restore only the products collection — not the users, orders, or any other collection, because those have had legitimate writes since 02:00 UTC that you cannot afford to lose. What mongorestore flag limits the restore to a single specified namespace?
Quiz — Backup & Recovery in Production
Scenario:
Your nightly mongodump runs against the primary MongoDB node at 02:00 UTC. Since moving to a 3-node replica set last month, you have noticed that API response times spike from 12ms to 340ms every night during the backup window, and your on-call rotation is getting paged. Customers on the US East Coast are awake at 02:00 UTC and reporting slow checkouts. What is the correct fix?
Scenario:
Your database has grown: mongodump now takes 6 hours and a full restore takes 9 hours. Your SLA requires a Recovery Time Objective of 30 minutes. You are in a production incident right now: a bad migration script has corrupted 3 collections and you need to restore them to their state from 2 hours ago. The mongodump approach will take 9 hours. What is the correct long-term architectural fix that would have made this incident recoverable within your SLA?
Scenario:
You have been running nightly mongodump backups to S3 for 8 months. The cron job shows green in your monitoring dashboard every morning. A major incident occurs: an attacker deletes your database and you need to restore immediately. You download the latest backup, run mongorestore, and it fails with a version mismatch error — the dump was taken on MongoDB 5.0 but your replacement instance runs 6.0 with a different BSON format for certain index types. Recovery takes 11 hours instead of the planned 45 minutes. What is the fundamental backup strategy failure that caused this outcome?
Up Next · Lesson 36
Monitoring NoSQL Systems
Metrics, alerts, and the dashboards that tell you your database is about to have a bad day — before it does.