MongoDB Lesson 35 – Backup & Restore | Dataplexa

Backup & Restore

Data loss is not a question of if — it is a question of when. Accidental deletes, hardware failures, software bugs, ransomware, and human error all cause data loss in real production systems. A database without a tested backup strategy is a liability waiting to materialise. MongoDB provides several backup mechanisms suited to different recovery needs: mongodump/mongorestore for logical backups of individual collections or databases, mongodump with --oplog for consistent point-in-time recovery, and MongoDB Atlas Cloud Backup for automated cloud-managed snapshots with continuous oplog replay. This lesson covers these approaches, when to use each, and how to implement a robust automated backup strategy in PyMongo — including verifying that backups are actually restorable.

1. mongodump — Logical Backups

mongodump is a command-line tool that reads documents from a running MongoDB instance and writes them to BSON files on disk. It produces a logical backup — a portable, human-inspectable directory of files that can be restored to any MongoDB instance regardless of the original hardware or operating system. It is the simplest backup tool and the right choice for individual collection exports, development snapshots, and migration between environments.

# mongodump — logical backups from Python using subprocess

import subprocess
import os
from datetime import datetime, timezone
from pathlib import Path

# Backup directory with timestamp
BACKUP_ROOT = Path("/var/backups/mongodb")
timestamp   = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
backup_dir  = BACKUP_ROOT / f"dataplexa_{timestamp}"

def run_mongodump(uri: str, backup_path: Path,
                  db: str | None = None, collection: str | None = None) -> bool:
    """Run mongodump and return True on success."""
    cmd = [
        "mongodump",
        f"--uri={uri}",
        f"--out={backup_path}",
        "--gzip",            # compress each BSON file with gzip
    ]
    if db:
        cmd += ["--db", db]
    if collection:
        cmd += ["--collection", collection]

    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Backup 1: full database
print("Backup scenarios:\n")
print("1. Full database backup:")
cmd_full = (
    f"mongodump --uri=mongodb://localhost:27017/ "
    f"--db=dataplexa "
    f"--out=/var/backups/mongodb/dataplexa_{timestamp} "
    f"--gzip"
)
print(f"   {cmd_full}")
print(f"   Output: dataplexa_{timestamp}/dataplexa/")
print(f"     ├── users.bson.gz")
print(f"     ├── users.metadata.json")
print(f"     ├── products.bson.gz")
print(f"     ├── products.metadata.json")
print(f"     ├── orders.bson.gz")
print(f"     └── ...")

# Backup 2: single collection
print("\n2. Single collection backup (orders only):")
cmd_coll = (
    f"mongodump --uri=mongodb://localhost:27017/ "
    f"--db=dataplexa --collection=orders "
    f"--out=/var/backups/mongodb/orders_{timestamp} "
    f"--gzip"
)
print(f"   {cmd_coll}")

# Backup 3: filtered — only delivered orders
print("\n3. Filtered backup — delivered orders only:")
cmd_query = (
    f'mongodump --uri=mongodb://localhost:27017/ '
    f'--db=dataplexa --collection=orders '
    f'--query=\'{{"status":"delivered"}}\' '
    f'--out=/var/backups/mongodb/orders_delivered_{timestamp} '
    f'--gzip'
)
print(f"   {cmd_query}")

# Backup metadata — track what was backed up
import json
metadata = {
    "backup_id":   f"bkp_{timestamp}",
    "timestamp":   datetime.now(timezone.utc).isoformat(),
    "type":        "mongodump",
    "database":    "dataplexa",
    "collections": ["users", "products", "orders", "order_items", "reviews"],
    "compressed":  True,
    "uri":         "mongodb://localhost:27017/",
}
print(f"\nBackup metadata record:")
for k, v in metadata.items():
    print(f"  {k:14}  {v}")
Backup scenarios:

1. Full database backup:
   mongodump --uri=mongodb://localhost:27017/ --db=dataplexa --out=/var/backups/mongodb/dataplexa_20240401_140522 --gzip
   Output: dataplexa_20240401_140522/dataplexa/
     ├── users.bson.gz
     ├── users.metadata.json
     ├── products.bson.gz
     ├── products.metadata.json
     ├── orders.bson.gz
     └── ...

2. Single collection backup (orders only):
   mongodump --uri=mongodb://localhost:27017/ --db=dataplexa --collection=orders --out=/var/backups/mongodb/orders_20240401_140522 --gzip

3. Filtered backup — delivered orders only:
   mongodump --uri=mongodb://localhost:27017/ --db=dataplexa --collection=orders --query='{"status":"delivered"}' --out=/var/backups/mongodb/orders_delivered_20240401_140522 --gzip

Backup metadata record:
  backup_id       bkp_20240401_140522
  timestamp       2024-04-01T14:05:22+00:00
  type            mongodump
  database        dataplexa
  collections     ['users', 'products', 'orders', 'order_items', 'reviews']
  compressed      True
  uri             mongodb://localhost:27017/
  • --gzip compresses each BSON file individually — on typical MongoDB data this reduces backup size by 60–80% with no loss of fidelity
  • mongodump does not lock the database — it reads each collection with a normal cursor while writes continue, so without --oplog the dump is not guaranteed to be consistent across collections (or even within one) if writes occur during the dump
  • Always record backup metadata (timestamp, collections, version) in a separate tracking document or file — without it, knowing which backup to restore becomes a manual investigation

2. mongorestore — Restoring from a Dump

mongorestore reads BSON files produced by mongodump and inserts them back into a MongoDB instance. It can restore a full database, a single collection, or even a subset of documents from a backup. The most important option is --drop — it drops the existing collection before restoring, preventing duplicates when restoring to a running database.

# mongorestore — restoring from a dump

import subprocess
from pathlib import Path
from pymongo import MongoClient

BACKUP_DIR = Path("/var/backups/mongodb/dataplexa_20240401_140522")

# Restore scenario reference
print("mongorestore scenarios:\n")

# Scenario 1: restore full database to same instance
print("1. Restore full database (same instance, drop existing collections):")
print("   mongorestore \\")
print("     --uri=mongodb://localhost:27017/ \\")
print("     --db=dataplexa \\")
print("     --drop \\")           # drop collection before inserting
print("     --gzip \\")           # input files are gzip-compressed
print(f"     {BACKUP_DIR}/dataplexa/")

# Scenario 2: restore to a different database (e.g. staging)
print("\n2. Restore to a different database (dataplexa → dataplexa_staging):")
print("   mongorestore \\")
print("     --uri=mongodb://localhost:27017/ \\")
print("     --db=dataplexa_staging \\")   # restore into new DB name
print("     --drop --gzip \\")
print(f"     {BACKUP_DIR}/dataplexa/")

# Scenario 3: restore single collection only
print("\n3. Restore orders collection only:")
print("   mongorestore \\")
print("     --uri=mongodb://localhost:27017/ \\")
print("     --db=dataplexa \\")
print("     --collection=orders \\")
print("     --drop --gzip \\")
print(f"     {BACKUP_DIR}/dataplexa/orders.bson.gz")

# Scenario 4: restore with index rebuild
print("\n4. Restore without indexes then rebuild (faster for large restores):")
print("   mongorestore --noIndexRestore ...  # skip index creation during restore")
print("   # Then rebuild indexes afterwards:")

client = MongoClient("mongodb://localhost:27017/")
db     = client["dataplexa"]

# After restore — verify document counts match expected
print("\nPost-restore verification:")
expected = {
    "users":    5,
    "products": 7,
    "orders":   7,
    "reviews":  5,
}
all_ok = True
for coll_name, expected_count in expected.items():
    actual = db[coll_name].count_documents({})
    status = "✓" if actual == expected_count else "✗"
    print(f"  {status} {coll_name:12}  expected: {expected_count}  "
          f"actual: {actual}")
    if actual != expected_count:
        all_ok = False

print(f"\nRestore verification: {'PASSED' if all_ok else 'FAILED — investigate'}")
mongorestore scenarios:

1. Restore full database (same instance, drop existing collections):
   mongorestore \
     --uri=mongodb://localhost:27017/ \
     --db=dataplexa \
     --drop \
     --gzip \
     /var/backups/mongodb/dataplexa_20240401_140522/dataplexa/

2. Restore to a different database (dataplexa → dataplexa_staging):
   mongorestore \
     --uri=mongodb://localhost:27017/ \
     --db=dataplexa_staging \
     --drop --gzip \
     /var/backups/mongodb/dataplexa_20240401_140522/dataplexa/

3. Restore orders collection only:
   mongorestore \
     --uri=mongodb://localhost:27017/ \
     --db=dataplexa \
     --collection=orders \
     --drop --gzip \
     /var/backups/mongodb/dataplexa_20240401_140522/dataplexa/orders.bson.gz

4. Restore without indexes then rebuild (faster for large restores):
   mongorestore --noIndexRestore ...  # skip index creation during restore
   # Then rebuild indexes afterwards:

Post-restore verification:
  ✓ users         expected: 5  actual: 5
  ✓ products      expected: 7  actual: 7
  ✓ orders        expected: 7  actual: 7
  ✓ reviews       expected: 5  actual: 5

Restore verification: PASSED
  • Always use --drop when restoring to a database that already has data — without it, mongorestore inserts documents alongside existing ones, creating duplicates for any document whose _id already exists
  • For large restores, --noIndexRestore significantly speeds up the import — rebuild indexes after the data is loaded rather than maintaining them on every insert during the restore
  • Always run a post-restore verification — check document counts, spot-check key documents, and run application smoke tests before declaring a restore successful
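After a --noIndexRestore import, the index definitions can be recovered from the *.metadata.json files that mongodump writes next to each BSON file. A sketch of extracting them (the metadata layout below mirrors typical mongodump output, but verify it against your own dump; index_specs is our name):

```python
import json
import tempfile
from pathlib import Path

def index_specs(metadata_file: Path) -> list:
    """Parse a mongodump *.metadata.json file and return (keys, name)
    pairs for every non-_id index, ready to pass to create_index()."""
    meta  = json.loads(metadata_file.read_text())
    specs = []
    for idx in meta.get("indexes", []):
        if idx.get("name") == "_id_":
            continue  # the _id index is always created automatically
        keys = list(idx["key"].items())   # e.g. [("user_id", 1), ("created_at", -1)]
        specs.append((keys, idx["name"]))
    return specs

# Demonstrate with a small synthetic metadata file
with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "orders.metadata.json"
    sample.write_text(json.dumps({
        "indexes": [
            {"v": 2, "key": {"_id": 1}, "name": "_id_"},
            {"v": 2, "key": {"user_id": 1, "created_at": -1}, "name": "user_created"},
        ]
    }))
    specs = index_specs(sample)
    for keys, name in specs:
        print(f"db.orders.create_index({keys}, name='{name}')")
```

Each (keys, name) pair can then be fed to PyMongo's create_index on the restored collection.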

3. Point-in-Time Recovery with Oplog

A nightly mongodump captures the database at one point in time — but what if you need to recover to 2 hours before the nightly backup, or undo an accidental bulk delete that happened 20 minutes ago? Oplog-based point-in-time recovery solves this. You restore the last full backup and then replay the oplog up to the exact timestamp just before the damaging event.

# Point-in-time recovery — oplog backup and replay

import subprocess
from datetime import datetime, timezone
from pathlib import Path
from pymongo import MongoClient
from bson import Timestamp

BACKUP_DIR = Path("/var/backups/mongodb")
client     = MongoClient("mongodb://localhost:27017/")

# STEP 1: capture the oplog alongside the data dump
# --oplog captures all oplog entries that occur DURING the dump
# This makes the dump consistent — a single point in time
print("Step 1: Full backup WITH oplog capture:")
print("  mongodump \\")
print("    --uri=mongodb://localhost:27017/ \\")
print("    --oplog \\")       # capture oplog entries during dump
print("    --gzip \\")
print("    --out=/var/backups/mongodb/full_with_oplog_20240401_020000/")
print("\n  Output includes: oplog.bson.gz in the backup root")
print("  This file contains all operations that ran during the dump")
print("  Together: base data + oplog = consistent point-in-time snapshot\n")

# STEP 2: replay oplog to a specific point in time
# Find the timestamp just before the damaging event

# Example: accidental delete happened at 2024-04-01 14:30:00 UTC
# We want to recover to 14:29:59 — one second before the delete

accident_time = datetime(2024, 4, 1, 14, 29, 59, tzinfo=timezone.utc)
# Convert to BSON Timestamp (seconds since epoch, increment)
oplog_ts = Timestamp(int(accident_time.timestamp()), 0)

print("Step 2: Restore base backup + replay oplog up to accident time:")
print("  # 2a: restore the base dump")
print("  mongorestore \\")
print("    --uri=mongodb://localhost:27017/ \\")
print("    --drop --gzip \\")
print("    --oplogReplay \\")  # replay the oplog.bson captured in step 1
print("    /var/backups/mongodb/full_with_oplog_20240401_020000/\n")

print("  # 2b: apply continuous oplog backup up to the accident timestamp")
print("  mongorestore \\")
print("    --uri=mongodb://localhost:27017/ \\")
print(f"    --oplogReplay \\")
print(f"    --oplogLimit={int(accident_time.timestamp())}:0 \\")
print("    /var/backups/mongodb/oplog_continuous/oplog.bson\n")

# Continuous oplog backup — copy the oplog regularly
print("Step 3: Continuous oplog backup strategy:")
strategy = [
    ("Frequency",  "Copy oplog every 15–60 minutes to a backup location"),
    ("Retention",  "Keep oplog copies for at least as long as your RPO window"),
    ("RPO",        "Recovery Point Objective — how much data loss is acceptable"),
    ("RTO",        "Recovery Time Objective — how long recovery can take"),
    ("Validation", "Test the restore + oplog replay process monthly"),
]
for label, desc in strategy:
    print(f"  {label:12}  {desc}")

print(f"\nAccident timestamp for oplog limit: {int(accident_time.timestamp())}:0")
print(f"Readable:                           {accident_time.isoformat()}")
Step 1: Full backup WITH oplog capture:
  mongodump \
    --uri=mongodb://localhost:27017/ \
    --oplog \
    --gzip \
    --out=/var/backups/mongodb/full_with_oplog_20240401_020000/

  Output includes: oplog.bson.gz in the backup root
  This file contains all operations that ran during the dump
  Together: base data + oplog = consistent point-in-time snapshot

Step 2: Restore base backup + replay oplog up to accident time:
  # 2a: restore the base dump
  mongorestore \
    --uri=mongodb://localhost:27017/ \
    --drop --gzip \
    --oplogReplay \
    /var/backups/mongodb/full_with_oplog_20240401_020000/

  # 2b: apply continuous oplog backup up to the accident timestamp
  mongorestore \
    --uri=mongodb://localhost:27017/ \
    --oplogReplay \
    --oplogLimit=1711981799:0 \
    /var/backups/mongodb/oplog_continuous/oplog.bson

Step 3: Continuous oplog backup strategy:
  Frequency     Copy oplog every 15–60 minutes to a backup location
  Retention     Keep oplog copies for at least as long as your RPO window
  RPO           Recovery Point Objective — how much data loss is acceptable
  RTO           Recovery Time Objective — how long recovery can take
  Validation    Test the restore + oplog replay process monthly

Accident timestamp for oplog limit: 1711981799:0
Readable:                           2024-04-01T14:29:59+00:00
  • --oplog in mongodump and --oplogReplay in mongorestore must be used together — they are the mechanism that makes a dump consistent to a single point in time rather than a rolling snapshot
  • --oplogLimit takes a Unix timestamp in seconds, optionally followed by a colon and an increment — mongorestore skips oplog entries with a timestamp at or after the limit, so pass the first second you want excluded from the replay
  • Oplog-based PITR only works on replica sets — standalone servers have no oplog. This is one of the strongest reasons to always run MongoDB as a replica set, even in development
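Getting the --oplogLimit value wrong by a timezone silently shifts the recovery point, so the conversion is worth wrapping in a helper that refuses naive datetimes. A minimal sketch (oplog_limit is our name):

```python
from datetime import datetime, timezone

def oplog_limit(dt: datetime) -> str:
    """Format an aware datetime as the <seconds>:<increment> string
    that mongorestore --oplogLimit expects. Naive datetimes are
    rejected so a local-time value cannot silently move the cutoff."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime (use timezone.utc)")
    return f"{int(dt.timestamp())}:0"

accident = datetime(2024, 4, 1, 14, 29, 59, tzinfo=timezone.utc)
print(oplog_limit(accident))   # 1711981799:0
```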

4. Automated Backup Strategy from PyMongo

A backup that runs manually is a backup that eventually gets forgotten. Automating the backup process — scheduling, rotation, verification, and alerting — transforms backup from a risky manual task into a reliable operational process. This section shows a complete backup automation pattern implementable from Python.

# Automated backup strategy — scheduling, rotation, and verification

import subprocess
import json
import hashlib
from datetime import datetime, timezone, timedelta
from pathlib import Path
from pymongo import MongoClient

BACKUP_ROOT    = Path("/var/backups/mongodb")
MONGO_URI      = "mongodb://localhost:27017/"
RETENTION_DAYS = 7    # keep backups for 7 days
client         = MongoClient(MONGO_URI)
db             = client["dataplexa"]

def create_backup() -> dict:
    """Create a timestamped compressed backup and return metadata."""
    ts         = datetime.now(timezone.utc)
    backup_id  = f"bkp_{ts.strftime('%Y%m%d_%H%M%S')}"
    backup_dir = BACKUP_ROOT / backup_id
    backup_dir.mkdir(parents=True, exist_ok=True)

    # NOTE: --oplog requires a full-instance dump; mongodump rejects it
    # when combined with --db, so this backs up every database on the server
    cmd = [
        "mongodump",
        f"--uri={MONGO_URI}",
        "--oplog",
        "--gzip",
        f"--out={backup_dir}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    success = (result.returncode == 0)

    metadata = {
        "backup_id":   backup_id,
        "timestamp":   ts.isoformat(),
        "success":     success,
        "size_bytes":  sum(f.stat().st_size for f in backup_dir.rglob("*") if f.is_file()),
        "stderr":      result.stderr[-500:] if not success else "",
    }
    # Save metadata alongside the backup
    (backup_dir / "backup_metadata.json").write_text(
        json.dumps(metadata, indent=2)
    )
    return metadata

def verify_backup(backup_id: str) -> bool:
    """
    Restore the backup into a temporary database and
    verify document counts match the source.
    """
    backup_dir = BACKUP_ROOT / backup_id
    temp_db    = f"verify_{backup_id}"

    # Restore to a temporary database
    cmd = [
        "mongorestore",
        f"--uri={MONGO_URI}",
        f"--db={temp_db}",
        "--drop", "--gzip",
        str(backup_dir / "dataplexa"),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return False

    # Compare document counts
    temp  = client[temp_db]
    ok    = True
    for coll in ["users", "products", "orders", "reviews"]:
        src_count  = db[coll].count_documents({})
        test_count = temp[coll].count_documents({})
        if src_count != test_count:
            ok = False

    # Clean up temp database
    client.drop_database(temp_db)
    return ok

def rotate_old_backups(retention_days: int) -> int:
    """Delete backups older than retention_days; return how many were removed."""
    import shutil
    cutoff  = datetime.now(timezone.utc) - timedelta(days=retention_days)
    deleted = 0
    for meta_file in BACKUP_ROOT.glob("*/backup_metadata.json"):
        meta     = json.loads(meta_file.read_text())
        bkp_time = datetime.fromisoformat(meta["timestamp"])
        if bkp_time < cutoff:
            shutil.rmtree(meta_file.parent)   # remove the whole backup directory
            deleted += 1
    return deleted

# Simulate a full backup cycle
print("Automated backup cycle:\n")
print("1. create_backup()    → mongodump + oplog + metadata saved")
print("2. verify_backup(id)  → restore to temp DB + count verification")
print("3. rotate_old_backups → delete backups older than 7 days")
print("4. alert if any step fails\n")

# Backup schedule recommendation
print("Recommended backup schedule:")
schedule = [
    ("Daily 02:00 UTC",   "Full mongodump --oplog of all databases"),
    ("Every 15 minutes",  "Continuous oplog copy to backup storage"),
    ("Weekly Sunday",     "Full backup + restore verification test"),
    ("Monthly",           "Full disaster recovery drill — restore to staging"),
]
for time, action in schedule:
    print(f"  {time:22}  {action}")
Automated backup cycle:

1. create_backup()    → mongodump + oplog + metadata saved
2. verify_backup(id)  → restore to temp DB + count verification
3. rotate_old_backups → delete backups older than 7 days
4. alert if any step fails

Recommended backup schedule:
  Daily 02:00 UTC         Full mongodump --oplog of all databases
  Every 15 minutes        Continuous oplog copy to backup storage
  Weekly Sunday           Full backup + restore verification test
  Monthly                 Full disaster recovery drill — restore to staging
  • A backup that has never been restored is not a backup — it is a hypothesis. Run a full restore verification at least weekly and a complete disaster recovery drill monthly
  • Store backups on separate infrastructure from the MongoDB servers — a fire, hardware failure, or ransomware attack that takes down your servers should not also destroy your backups
  • Implement alerting on backup failures immediately — a silent backup failure leaves you with no recovery option until the next successful backup
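The age check inside rotate_old_backups is easiest to trust when it is a pure function that can be unit-tested without touching the filesystem. A sketch (is_expired is our name):

```python
from datetime import datetime, timezone, timedelta

def is_expired(backup_iso: str, now: datetime, retention_days: int) -> bool:
    """Return True if a backup whose metadata timestamp is backup_iso
    falls outside the retention window ending at `now`."""
    backup_time = datetime.fromisoformat(backup_iso)
    return backup_time < now - timedelta(days=retention_days)

now = datetime(2024, 4, 10, 2, 0, tzinfo=timezone.utc)
print(is_expired("2024-04-01T02:00:00+00:00", now, 7))   # True  (9 days old)
print(is_expired("2024-04-05T02:00:00+00:00", now, 7))   # False (5 days old)
```

rotate_old_backups would then call is_expired for each backup_metadata.json and only touch the filesystem for backups the function flags.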

5. MongoDB Atlas Backup

For Atlas-hosted clusters, MongoDB provides a fully managed backup service — Cloud Backup — that takes automated snapshots of the entire cluster without any manual configuration. Snapshots are stored in cloud object storage and can be restored with a few clicks or a single API call. The Atlas continuous backup option extends this with oplog-based point-in-time recovery to any second within the retention window.

# Atlas Backup — querying backup status and restore via Atlas Admin API

# NOTE: this script only prints request shapes — real Atlas API calls
# would use the requests library with HTTP digest authentication

# Atlas API credentials (use environment variables in production)
PUBLIC_KEY  = "your_atlas_public_key"
PRIVATE_KEY = "your_atlas_private_key"
PROJECT_ID  = "your_project_id"
CLUSTER     = "DataplexaCluster"

BASE_URL    = "https://cloud.mongodb.com/api/atlas/v1.0"

# Atlas backup approaches
print("MongoDB Atlas Backup options:\n")
options = [
    {
        "name":        "Cloud Backup (Snapshots)",
        "frequency":   "Hourly, daily, weekly, monthly",
        "retention":   "Configurable 1–365 days",
        "restore":     "Full cluster restore or single collection restore",
        "pitr":        "No — only restores to snapshot times",
        "best_for":    "Standard backup requirement, restore to known good state",
    },
    {
        "name":        "Continuous Cloud Backup",
        "frequency":   "Continuous oplog capture",
        "retention":   "Up to 35 days of point-in-time recovery",
        "restore":     "Restore to any second within retention window",
        "pitr":        "Yes — any second within 35 days",
        "best_for":    "Financial data, low RPO requirement, accidental write recovery",
    },
]
for opt in options:
    print(f"  {opt['name']}")
    for k, v in opt.items():
        if k != "name":
            print(f"    {k:12}  {v}")
    print()

# Atlas API: list recent snapshots
print("Atlas API — list snapshots:")
print(f"  GET {BASE_URL}/groups/{PROJECT_ID}/clusters/{CLUSTER}/backup/snapshots")
print()

# Simulated snapshot list response
snapshots = [
    {"id": "snap_001", "createdAt": "2024-04-01T02:00:00Z",
     "status": "completed", "storageSizeBytes": 52428800, "type": "scheduled"},
    {"id": "snap_002", "createdAt": "2024-04-02T02:00:00Z",
     "status": "completed", "storageSizeBytes": 54525952, "type": "scheduled"},
    {"id": "snap_003", "createdAt": "2024-04-03T02:00:00Z",
     "status": "completed", "storageSizeBytes": 53477376, "type": "scheduled"},
]
print("Recent snapshots:")
print(f"  {'ID':10}  {'Created':22}  {'Status':10}  {'Size':10}  Type")
print(f"  {'─'*10}  {'─'*22}  {'─'*10}  {'─'*10}  {'─'*10}")
for s in snapshots:
    size_mb = s["storageSizeBytes"] / (1024 * 1024)
    print(f"  {s['id']:10}  {s['createdAt']:22}  "
          f"{s['status']:10}  {size_mb:>7.1f} MB  {s['type']}")

# Atlas API: trigger a point-in-time restore
restore_time = "2024-04-01T14:29:59Z"   # one second before the accident
print(f"\nAtlas API — trigger PITR restore to {restore_time}:")
print(f"  POST {BASE_URL}/groups/{PROJECT_ID}/clusters/{CLUSTER}/backup/restoreJobs")
restore_payload = {
    "deliveryType":      "automated",
    "targetClusterName": "DataplexaCluster",
    "targetGroupId":     PROJECT_ID,
    "oplogTs":           1711981799,   # Unix timestamp
    "oplogInc":          0,
}
print(f"  Payload: {restore_payload}")
MongoDB Atlas Backup options:

  Cloud Backup (Snapshots)
    frequency     Hourly, daily, weekly, monthly
    retention     Configurable 1–365 days
    restore       Full cluster restore or single collection restore
    pitr          No — only restores to snapshot times
    best_for      Standard backup requirement, restore to known good state

  Continuous Cloud Backup
    frequency     Continuous oplog capture
    retention     Up to 35 days of point-in-time recovery
    restore       Restore to any second within retention window
    pitr          Yes — any second within 35 days
    best_for      Financial data, low RPO requirement, accidental write recovery

Atlas API — list snapshots:
  GET https://cloud.mongodb.com/api/atlas/v1.0/groups/.../clusters/DataplexaCluster/backup/snapshots

Recent snapshots:
  ID          Created                 Status      Size        Type
  ──────────  ──────────────────────  ──────────  ──────────  ──────────
  snap_001    2024-04-01T02:00:00Z    completed      50.0 MB  scheduled
  snap_002    2024-04-02T02:00:00Z    completed      52.0 MB  scheduled
  snap_003    2024-04-03T02:00:00Z    completed      51.0 MB  scheduled

Atlas API — trigger PITR restore to 2024-04-01T14:29:59Z:
  POST https://cloud.mongodb.com/api/atlas/v1.0/groups/.../clusters/DataplexaCluster/backup/restoreJobs
  Payload: {'deliveryType': 'automated', 'targetClusterName': 'DataplexaCluster', ...}
  • Atlas Continuous Cloud Backup stores snapshots in the same cloud region as your cluster — always also configure cross-region backup copies for genuine disaster recovery protection
  • Snapshot restores are non-destructive in Atlas — you restore to a new cluster and verify before switching traffic, giving you a safety net if the restore reveals unexpected issues
  • Atlas backup costs are based on storage consumed — enable snapshot compression and set appropriate retention windows to avoid paying for more backup history than your RPO requires

Summary Table

mongodump
  What it does:    Logical export to BSON files
  Best for:        Dev snapshots, migration, single collection export
  Key limitation:  Not consistent across collections without --oplog

mongorestore
  What it does:    Imports BSON back into MongoDB
  Best for:        Restoring from mongodump, cross-environment copies
  Key limitation:  Slow on large datasets — indexes rebuilt per insert

--oplog / --oplogReplay
  What it does:    Consistent PITR backup and restore
  Best for:        Production backups, accidental delete recovery
  Key limitation:  Requires replica set — no oplog on standalone

Automated rotation
  What it does:    Schedules, verifies, rotates backups
  Best for:        Production — removes human dependency
  Key limitation:  Must test restore path — not just creation

Atlas Cloud Backup
  What it does:    Managed snapshots, configurable schedule
  Best for:        Atlas clusters, standard recovery requirement
  Key limitation:  Restores to snapshot time only — no PITR

Atlas Continuous Backup
  What it does:    Oplog capture — PITR to any second
  Best for:        Financial data, low RPO, accidental write undo
  Key limitation:  Additional cost — 35-day maximum retention window

Practice Questions

Practice 1. What does the --oplog flag do when used with mongodump, and why does it make the backup more reliable?



Practice 2. Why should you always use --drop when running mongorestore on a database that already contains data?



Practice 3. What is the --oplogLimit flag used for during mongorestore and what format does it accept?



Practice 4. What is the difference between Atlas Cloud Backup and Atlas Continuous Cloud Backup?



Practice 5. Why is it important to store backups on separate infrastructure from your MongoDB servers?



Quiz

Quiz 1. What flag must be passed to mongodump to make the backup consistent to a single point in time across all collections?






Quiz 2. Why does --noIndexRestore speed up large mongorestore operations?






Quiz 3. Oplog-based point-in-time recovery requires which MongoDB deployment type?






Quiz 4. What is the recommended minimum frequency for testing that a backup can actually be restored?






Quiz 5. What does RPO stand for and what does it measure in a backup strategy?






Next up — Security & Authentication: Enabling access control, creating users with roles, configuring TLS, and hardening a MongoDB deployment against unauthorised access.