NoSQL
Backup & Recovery
Every engineer knows backups are important. Far fewer have actually restored from one under pressure, at 2am, with a VP on the phone asking when the data will be back. A backup strategy that has never been tested is not a strategy — it is a hope. This lesson is about building something you can actually rely on when the worst happens.
Why NoSQL Backup Is Different
Relational databases have decades of mature backup tooling. NoSQL systems are more varied — a Cassandra cluster, a MongoDB replica set, and a DynamoDB table each require completely different backup approaches. Three concerns are common to all of them: capturing a consistent point in time (hard while writes keep arriving, harder still across nodes), restoring fast enough to meet your obligations, and proving the restore actually works before you need it.
Restore speed varies enormously by method: a mongodump restored over a slow link could take 6 hours, while a snapshot-based restore from the same cloud region can be under 10 minutes. Two key metrics every backup strategy must define upfront: the Recovery Time Objective (RTO) — the maximum time you can afford to spend restoring — and the Recovery Point Objective (RPO) — the maximum window of recent writes you can afford to lose. A nightly backup, for example, implies an RPO of up to 24 hours.
Method 1 — mongodump and mongorestore
mongodump is MongoDB's native logical backup tool. It connects to a running MongoDB instance and exports collections as BSON files. It is simple, portable, and works without any cloud infrastructure. The trade-off: for large databases it is slow, and restoring a large dump to a fresh instance takes significant time.
The scenario: You are the sole backend engineer at an early-stage startup. Your MongoDB database holds 3 months of user data — about 8GB. You have no cloud snapshot setup yet. You need a nightly backup script that runs from a secondary replica (to avoid hitting the primary), compresses the output, uploads it to S3, and retains the last 7 days. You need this running before you go to sleep tonight.
#!/bin/bash
set -euo pipefail # exit on error, unset vars, pipe failures
DATE=$(date +%Y-%m-%d)
BACKUP_DIR="/tmp/mongo-backup-$DATE"
S3_BUCKET="s3://myapp-backups/mongodb"
RETENTION_DAYS=7
echo "[$DATE] Starting mongodump from secondary..."
# --readPreference=secondary — dump from replica, not primary
# --gzip — compress BSON files inline (reduces size ~70%)
# --oplog — capture oplog entries for a point-in-time consistent backup
mongodump \
--uri="mongodb://backup_user:$MONGO_PASS@mongo-secondary:27017" \
--readPreference=secondary \
--gzip \
--oplog \
--out="$BACKUP_DIR"
echo "Dump complete. Uploading to S3..."
# Tar the directory and stream directly to S3 (no extra disk space needed)
tar -czf - -C /tmp "mongo-backup-$DATE" | aws s3 cp - "$S3_BUCKET/backup-$DATE.tar.gz"
# Remove local temp files
rm -rf "$BACKUP_DIR"
# Delete S3 objects older than the retention window.
# ISO dates (YYYY-MM-DD) compare correctly as plain strings.
CUTOFF=$(date -d "-$RETENTION_DAYS days" +%Y-%m-%d)
aws s3 ls "$S3_BUCKET/" \
| while read -r mod_date _ _ key; do
    if [[ -n "$key" && "$mod_date" < "$CUTOFF" ]]; then
      aws s3 rm "$S3_BUCKET/$key"
    fi
  done
echo "[$DATE] Backup complete. Stored at $S3_BUCKET/backup-$DATE.tar.gz"
[2025-03-10] Starting mongodump from secondary...
2025-03-10T02:00:01.441+0000 writing myapp.users to dump/myapp/users.bson
2025-03-10T02:00:04.112+0000 done dumping myapp.users (184203 documents)
2025-03-10T02:00:04.113+0000 writing myapp.orders to dump/myapp/orders.bson
2025-03-10T02:00:09.887+0000 done dumping myapp.orders (442891 documents)
2025-03-10T02:00:09.901+0000 writing captured oplog
✓ Dump complete. Uploading to S3...
upload: - to s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz
[2025-03-10] Backup complete. Stored at s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz
Elapsed: 4m 12s | Size: 2.1GB (compressed from 8.3GB)
--readPreference=secondary
Directs mongodump to read from a replica set secondary. The backup reads every document in every collection — a full collection scan that saturates disk I/O. Running this on the primary would degrade query performance for every user actively using the app during the backup window.
--oplog
Without --oplog, the dump is not point-in-time consistent — if writes arrive while the dump is running, some collections are captured before those writes and some after. The --oplog flag captures the oplog entries generated during the dump, allowing mongorestore to replay them and produce a consistent snapshot of a single moment in time.
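A cheap guard follows from this: before trusting a nightly dump, confirm the captured oplog file actually landed in the output directory. A minimal sketch — the check_oplog helper is mine, not part of the script above; with --gzip the file is written as oplog.bson.gz:

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_oplog DIR — succeed only if the dump directory contains the
# captured oplog (oplog.bson, or oplog.bson.gz when --gzip was used)
check_oplog() {
  if [ -f "$1/oplog.bson" ] || [ -f "$1/oplog.bson.gz" ]; then
    echo "oplog captured"
  else
    echo "WARNING: no oplog file — dump is not point-in-time consistent" >&2
    return 1
  fi
}

# Demonstration against a throwaway directory standing in for $BACKUP_DIR
demo_dir=$(mktemp -d)
touch "$demo_dir/oplog.bson.gz"
check_oplog "$demo_dir"     # prints: oplog captured
rm -rf "$demo_dir"
```

Wiring this in right after mongodump exits turns a silently inconsistent backup into a loud nightly alert.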
tar -czf - "$BACKUP_DIR" | aws s3 cp -
Piping the tar output directly to the AWS CLI means the compressed archive is never written to disk locally — it streams straight to S3. On a server with limited disk space (common on small EC2 instances), this avoids needing to store both the dump directory and the compressed archive simultaneously.
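One step the callouts above skip is the retention loop. Its core is a plain string comparison of ISO dates, which can be exercised without touching S3. A sketch — the keys_older_than helper is mine, and the listing lines are fabricated examples:

```shell
#!/usr/bin/env bash
# keys_older_than CUTOFF — reads `aws s3 ls`-style lines (DATE TIME SIZE KEY)
# on stdin and prints the keys whose date sorts before the cutoff.
# YYYY-MM-DD dates compare correctly as plain strings, so no parsing needed.
keys_older_than() {
  local cutoff="$1"
  while read -r mod_date _ _ key; do
    if [[ -n "$key" && "$mod_date" < "$cutoff" ]]; then
      echo "$key"
    fi
  done
}

# Example run against two fake listing lines — only the first is old enough
printf '%s\n' \
  '2025-03-01 02:04:10 2147483648 backup-2025-03-01.tar.gz' \
  '2025-03-10 02:04:12 2254857830 backup-2025-03-10.tar.gz' \
  | keys_older_than '2025-03-03'
```

This prints backup-2025-03-01.tar.gz and nothing else. Isolating the filter like this makes the deletion logic testable before it is allowed anywhere near `aws s3 rm`.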
Restoring from a mongodump Backup
Knowing how to take a backup matters. Knowing how to restore from one matters more. Here is the restore process for the backup we just created — including the oplog replay for point-in-time consistency.
The scenario: A developer accidentally ran a script that dropped the orders collection in production. The incident happened at 03:47 UTC. Your last backup was taken at 02:00 UTC and completed at 02:04 UTC. You need to restore the orders collection to its exact state at 02:04 UTC — the moment the backup completed — without touching any other collections.
# Step 1: download the backup from S3
aws s3 cp s3://myapp-backups/mongodb/backup-2025-03-10.tar.gz /tmp/
tar -xzf /tmp/backup-2025-03-10.tar.gz -C /tmp/
# Step 2: restore only the orders collection
# --nsInclude — restore only this namespace (db.collection)
# --oplogReplay — replay the captured oplog for consistency
# --drop — drop the existing (empty) collection before restore
mongorestore \
--uri="mongodb://admin:$MONGO_PASS@mongo-primary:27017" \
--nsInclude="myapp.orders" \
--oplogReplay \
--drop \
--gzip \
/tmp/mongo-backup-2025-03-10/
echo "Restore complete. Validating document count..."
# Step 3: sanity check — count should match pre-incident count
mongosh myapp --quiet --eval "db.orders.countDocuments()"
2025-03-10T03:55:12+0000 restoring myapp.orders from orders.bson.gz
2025-03-10T03:55:18+0000 442891 document(s) restored successfully
2025-03-10T03:55:18+0000 replaying oplog — 1247 entries applied
✓ Restore complete. Validating document count...
442891 ✓ matches pre-incident count
Total restore time: 6m 44s
Orders collection live again at 04:01 UTC
Incident window: 03:47 → 04:01 (14 minutes)
--nsInclude="myapp.orders"
Limits the restore to exactly one collection. Without this flag, mongorestore would restore every collection in the dump — overwriting current production data in users, products, and every other collection with their state from 02:04 UTC. That would undo 2 hours of legitimate writes across the entire database.
--oplogReplay
Replays the oplog entries captured during the original dump. This fast-forwards the restored data to the exact moment the backup completed — 02:04 UTC — ensuring the restored collection is consistent with the oplog-captured state rather than a mid-dump snapshot.
db.orders.countDocuments()
Always validate after a restore. A document count matching the pre-incident baseline is a fast sanity check that the restore completed without truncation. For critical data, follow up with spot-checks on specific known records before declaring the incident resolved.
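As a sketch, the count check can be wired into the runbook as a hard pass/fail gate. The baseline value is illustrative, and the mongosh line is commented out so the gate logic stands alone:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pre-incident baseline, e.g. copied from your metrics dashboard (illustrative)
EXPECTED_COUNT=442891

# In the real runbook the count comes from the restored database:
# ACTUAL_COUNT=$(mongosh myapp --quiet --eval 'db.orders.countDocuments()')
ACTUAL_COUNT=442891   # stand-in value so the gate below is self-contained

if [ "$ACTUAL_COUNT" -eq "$EXPECTED_COUNT" ]; then
  echo "PASS: $ACTUAL_COUNT documents, matches baseline"
else
  echo "FAIL: expected $EXPECTED_COUNT, got $ACTUAL_COUNT" >&2
  exit 1
fi
```

Making the script exit non-zero on mismatch means the check can sit inside an automated pipeline rather than relying on a tired engineer eyeballing numbers at 4am.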
Method 2 — Cloud Snapshots (Atlas / EBS)
Logical backups like mongodump are flexible but slow. For large databases, volume snapshots are faster by an order of magnitude — because they operate at the storage block level rather than reading every document through the database engine. MongoDB Atlas continuous cloud backups and AWS EBS snapshots both use this approach.
The scenario: Your team has migrated to MongoDB Atlas. Your database is now 200GB and growing. A mongodump takes 3 hours and restoration takes 5 hours — far beyond your 1-hour RTO. You are switching to Atlas continuous cloud backups with point-in-time recovery and need to verify the configuration via the Atlas API.
PROJECT_ID="abc123"
CLUSTER="prod-cluster"
API_BASE="https://cloud.mongodb.com/api/atlas/v1.0"
# Enable continuous cloud backups with 7-day point-in-time window
curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
-X PATCH "$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER" \
-H "Content-Type: application/json" \
-d '{
"providerBackupEnabled": true,
"pitEnabled": true
}'
# List the most recent snapshots to verify backups are running
curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
"$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER/backup/snapshots" \
| jq '.results[] | {id, createdAt, storageSizeBytes, status}'
{
"id": "snap_20250310_020000",
"createdAt": "2025-03-10T02:00:00Z",
"storageSizeBytes": 214748364800, // 200GB
"status": "completed"
}
{
"id": "snap_20250309_020000",
"createdAt": "2025-03-09T02:00:00Z",
"storageSizeBytes": 209715200000,
"status": "completed"
}
Continuous backup: ENABLED ✓
Point-in-time recovery window: 7 days ✓
Latest snapshot: 2025-03-10T02:00:00Z ✓
pitEnabled: true
Point-in-time recovery means you can restore to any second within the past 7 days — not just to a scheduled snapshot. Atlas continuously captures oplog data between snapshots. If a developer drops a collection at 14:37:22 UTC, you can restore to 14:37:21 UTC — one second before the incident — recovering every write right up to the moment of the accident.
providerBackupEnabled: true
Enables Atlas Cloud Provider Snapshots — block-level snapshots of the underlying EBS volumes taken by AWS directly. Because they operate at the storage layer rather than the MongoDB layer, a 200GB snapshot completes in minutes rather than hours. Restore spins up a new EBS volume from the snapshot — typically under 10 minutes to a running cluster.
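The restore side of point-in-time recovery is also driven through the API. The sketch below is hedged: the restoreJobs endpoint and field names follow the legacy v1.0 Cloud Backup API and should be verified against your Atlas API version before use, so the curl call is commented out and only the timestamp computation runs (GNU date assumed):

```shell
#!/usr/bin/env bash
set -euo pipefail

PROJECT_ID="abc123"
CLUSTER="prod-cluster"
API_BASE="https://cloud.mongodb.com/api/atlas/v1.0"

# Restore to one second before the incident (14:37:22 → 14:37:21 UTC)
POINT_IN_TIME=$(date -u -d "2025-03-10 14:37:21" +%s)
echo "pointInTimeUTCSeconds: $POINT_IN_TIME"

# Hypothetical restore-job request — endpoint and body fields assume the
# v1.0 Cloud Backup API; confirm against your Atlas API docs before running.
# curl -s -u "$ATLAS_PUBLIC:$ATLAS_PRIVATE" --digest \
#   -X POST "$API_BASE/groups/$PROJECT_ID/clusters/$CLUSTER/backup/restoreJobs" \
#   -H "Content-Type: application/json" \
#   -d "{
#     \"deliveryType\": \"automated\",
#     \"pointInTimeUTCSeconds\": $POINT_IN_TIME,
#     \"targetGroupId\": \"$PROJECT_ID\",
#     \"targetClusterName\": \"$CLUSTER\"
#   }"
```

The key detail is that the API takes a UTC epoch second, which is why converting the incident timestamp precisely matters: one second too late and you restore the drop itself.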
Backup for Cassandra — nodetool and Snapshots
Cassandra's backup approach is fundamentally different from MongoDB's. Cassandra uses an LSM-tree storage engine — data is written to immutable SSTable files and never modified in place. This makes backups simpler in one way: you can copy SSTables directly because they do not change after they are written. The challenge is consistency across a multi-node cluster.
The scenario: You are a platform engineer managing a 6-node Cassandra cluster storing IoT sensor data. You need to take a consistent snapshot across all nodes simultaneously and back it up to object storage. A snapshot taken one node at a time would capture different points in time — useless for a consistent restore.
SNAPSHOT_TAG="backup-$(date +%Y%m%d)"
NODES=("cass-node-1" "cass-node-2" "cass-node-3" "cass-node-4" "cass-node-5" "cass-node-6")
KEYSPACE="iot_sensors"
echo "Triggering snapshot on all nodes simultaneously..."
# Run nodetool snapshot on all nodes in parallel
# -t = tag name for the snapshot
for node in "${NODES[@]}"; do
ssh "$node" "nodetool snapshot -t $SNAPSHOT_TAG $KEYSPACE" &
done
wait # wait for all parallel snapshots to complete
echo "All nodes snapshotted. Uploading SSTables to S3..."
for node in "${NODES[@]}"; do
# Cassandra stores snapshots inside the data directory
ssh "$node" \
"find /var/lib/cassandra/data/$KEYSPACE -name 'snapshots/$SNAPSHOT_TAG' \
-exec tar -czf - {} \; \
| aws s3 cp - s3://iot-backups/cassandra/$node-$SNAPSHOT_TAG.tar.gz" &
done
wait
echo "Snapshot $SNAPSHOT_TAG complete across all nodes."
Triggering snapshot on all nodes simultaneously...
[cass-node-1] Requested creating snapshot(s) for [iot_sensors]
[cass-node-2] Requested creating snapshot(s) for [iot_sensors]
[cass-node-3] Requested creating snapshot(s) for [iot_sensors]
[cass-node-4] Requested creating snapshot(s) for [iot_sensors]
[cass-node-5] Requested creating snapshot(s) for [iot_sensors]
[cass-node-6] Requested creating snapshot(s) for [iot_sensors]
All nodes snapshotted. Uploading SSTables to S3...
Uploaded: cass-node-1-backup-20250310.tar.gz (34.2GB)
Uploaded: cass-node-2-backup-20250310.tar.gz (33.9GB)
... (all 6 nodes)
Snapshot backup-20250310 complete across all nodes.
Total backup size: 204GB | Elapsed: 8m 21s
nodetool snapshot -t $SNAPSHOT_TAG
nodetool snapshot creates hard links to the current SSTable files on disk — it is essentially instant and does not block reads or writes. Because Cassandra SSTables are immutable, those hard links will continue to point to valid data even as new compactions create new SSTable files. The snapshot tag lets you identify and clean up a specific snapshot later without deleting current data files — run nodetool clearsnapshot -t <tag> once the upload has succeeded, or the hard links will keep superseded SSTables on disk indefinitely.
Running all nodes in parallel with & and wait
The & sends each SSH command to the background. wait blocks until all background jobs finish. This triggers the snapshot on all six nodes at as close to the same moment as possible — maximising consistency across the cluster. Running them sequentially would mean node-1 is snapshotted minutes before node-6, capturing different states.
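The fan-out pattern itself is easy to isolate. A sketch using sleeps in place of the ssh/nodetool calls, showing that six parallel jobs finish in roughly the time of the slowest one rather than the sum:

```shell
#!/usr/bin/env bash
set -euo pipefail

start=$(date +%s)

# Six background jobs standing in for the six ssh/nodetool calls
for i in 1 2 3 4 5 6; do
  sleep 1 &
done
wait   # blocks until every background job has finished

elapsed=$(( $(date +%s) - start ))
echo "6 parallel jobs finished in ${elapsed}s (sequential would be ~6s)"
```

The same skeleton — loop, `&`, then a single `wait` — carries over directly to any per-node cluster operation where tight time alignment matters.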
Testing Your Recovery — The Most Important Step
A backup you have never restored from is not a backup — it is a hypothesis. The only way to know your backup works is to restore it, validate the data, and measure how long it took. This should be a scheduled, automated process — not something you discover is broken at 3am during an actual incident.
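A minimal skeleton for such a scheduled drill, assuming the S3 backup from Method 1 — the restore commands are commented placeholders (paths illustrative), with a sleep standing in so the timing gate itself is demonstrable:

```shell
#!/usr/bin/env bash
set -euo pipefail

RTO_SECONDS=3600   # the objective this drill is measured against (1 hour)

restore_start=$(date +%s)

# Real drill steps would go here, e.g.:
#   aws s3 cp "s3://myapp-backups/mongodb/backup-$(date +%Y-%m-%d).tar.gz" /tmp/
#   tar -xzf /tmp/backup-*.tar.gz -C /tmp/
#   mongorestore --uri="mongodb://localhost:27018" --gzip --oplogReplay /tmp/mongo-backup-*/
#   ...then validate counts against baselines
sleep 1   # stand-in for the restore so the gate below can run anywhere

restore_end=$(date +%s)
elapsed=$((restore_end - restore_start))

if [ "$elapsed" -le "$RTO_SECONDS" ]; then
  echo "DRILL PASS: restore took ${elapsed}s (RTO ${RTO_SECONDS}s)"
else
  echo "DRILL FAIL: restore took ${elapsed}s, exceeds RTO ${RTO_SECONDS}s" >&2
  exit 1
fi
```

Because the script fails loudly when the measured time blows the RTO, a slowly growing database turns from a 3am surprise into a red monthly job you can act on.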
Recovery Testing Checklist — Run This Monthly
- Restore the latest backup into a fresh, throwaway instance — never into production.
- Validate the data: document counts against baselines, plus spot-checks of known records.
- Time the full restore end to end and compare it against your RTO.
- Test against the same database version, OS, and hardware class as production.
- Fix whatever breaks, then put the next drill on the calendar.
Teacher's Note
The most expensive database recovery I have ever seen was not caused by a missing backup — the backups existed. It was caused by a restore process that had never been run against that version of MongoDB on that OS, with those indexes, on hardware with that amount of RAM. The restore failed halfway through with an obscure error. Then the secondary backup failed too. They recovered by manually replaying application logs from a message queue. The lesson: the backup matters, but the restore test matters more. Run it. Time it. Fix what breaks. Repeat every month.
Practice Questions — You're the Engineer
Scenario:
mongodump runs for 4 minutes. During that window, your application is writing thousands of new orders per minute. Without a specific flag, the dump captures the users collection at 02:00 UTC and the orders collection at 02:04 UTC — two different points in time. A restore from this dump would produce an inconsistent state where some orders reference users that did not exist yet at the time the users collection was captured. What mongodump flag captures the oplog changes during the dump to produce a point-in-time consistent snapshot?
Scenario:
An engineer accidentally dropped the products collection at 11:42 UTC. Your last backup at 02:00 UTC included every collection in the ecommerce database. You need to restore only the products collection — not the users, orders, or any other collection, because those have had legitimate writes since 02:00 UTC that you cannot afford to lose. What mongorestore flag limits the restore to a single specified namespace?
Quiz — Backup & Recovery in Production
Scenario:
Your nightly mongodump runs against the primary MongoDB node at 02:00 UTC. Since moving to a 3-node replica set last month, you have noticed that API response times spike from 12ms to 340ms every night during the backup window, and your on-call rotation is getting paged. Customers on the US East Coast are awake at 02:00 UTC and reporting slow checkouts. What is the correct fix?
Scenario:
Your database has grown: mongodump now takes 6 hours and a full restore takes 9 hours. Your SLA requires a Recovery Time Objective of 30 minutes. You are in a production incident right now: a bad migration script has corrupted 3 collections and you need to restore them to their state from 2 hours ago. The mongodump approach will take 9 hours. What is the correct long-term architectural fix that would have made this incident recoverable within your SLA?
Scenario:
You have been running nightly mongodump backups to S3 for 8 months. The cron job shows green in your monitoring dashboard every morning. A major incident occurs: an attacker deletes your database and you need to restore immediately. You download the latest backup, run mongorestore, and it fails with a version mismatch error — the dump was taken on MongoDB 5.0 but your replacement instance runs 6.0 with a different BSON format for certain index types. Recovery takes 11 hours instead of the planned 45 minutes. What is the fundamental backup strategy failure that caused this outcome?
Up Next · Lesson 36
Monitoring NoSQL Systems
Metrics, alerts, and the dashboards that tell you your database is about to have a bad day — before it does.