MongoDB
MongoDB in the Cloud
Running MongoDB on your own servers gives you full control but comes with a maintenance burden — patching, hardware provisioning, backup management, monitoring setup, and scaling all fall on your team. MongoDB Atlas is MongoDB's fully managed cloud database service that handles all of this automatically. You deploy a cluster with a few clicks or an API call, and Atlas manages the underlying infrastructure, automatic backups, monitoring, security patching, and horizontal scaling. Atlas runs on AWS, Google Cloud, and Azure and is the recommended deployment path for new applications. This lesson covers deploying a cluster, connecting from PyMongo, configuring network access and database users, using the Atlas Data API, Atlas Search for full-text search, and the Atlas CLI for infrastructure-as-code deployments.
1. Atlas Cluster Tiers and Deployment
Atlas organises clusters into tiers — from a free shared tier for development to dedicated multi-region clusters for enterprise workloads. Each tier specifies the RAM, storage, and vCPUs available to the cluster. Choosing the right tier for your workload is the first decision when deploying to Atlas.
# Atlas cluster tiers — reference guide and tier selection logic
cluster_tiers = {
    "M0 (Free)": {
        "ram": "Shared",
        "storage": "512 MB",
        "vcpus": "Shared",
        "ops_limit": "100 ops/sec shared",
        "region": "Single region only",
        "best_for": "Learning, prototyping, small hobby projects",
        "cost": "Free forever",
    },
    "M2 / M5 (Shared)": {
        "ram": "Shared",
        "storage": "2 GB / 5 GB",
        "vcpus": "Shared",
        "ops_limit": "Moderate shared throughput",
        "region": "Single region",
        "best_for": "Development and staging environments",
        "cost": "$9 / $25 per month",
    },
    "M10 (Dedicated)": {
        "ram": "2 GB",
        "storage": "10 GB",
        "vcpus": "2 vCPUs",
        "ops_limit": "Dedicated — no noisy neighbours",
        "region": "Single or multi-region replica set",
        "best_for": "Small production apps, low-traffic APIs",
        "cost": "~$0.10/hour",
    },
    "M30 (Dedicated)": {
        "ram": "8 GB",
        "storage": "40 GB",
        "vcpus": "2 vCPUs",
        "ops_limit": "Suitable for moderate production load",
        "region": "Multi-region replica set available",
        "best_for": "Production APIs, e-commerce, SaaS applications",
        "cost": "~$0.54/hour",
    },
    "M50+ (High Performance)": {
        "ram": "16–768 GB",
        "storage": "80 GB – 4 TB NVMe",
        "vcpus": "4–96 vCPUs",
        "ops_limit": "High throughput, suitable for 100K+ ops/sec",
        "region": "Global multi-region, sharding available",
        "best_for": "High-traffic production, analytics, global apps",
        "cost": "$1.04/hour and up",
    },
}

print("Atlas Cluster Tier Reference:\n")
for tier, specs in cluster_tiers.items():
    print(f"  {tier}")
    for k, v in specs.items():
        print(f"    {k:12} {v}")
    print()

# Tier selection guide
print("Tier selection guide:\n")
decisions = [
    ("Learning / prototype", "M0 Free — zero cost, start immediately"),
    ("Dev / staging", "M2 or M5 — cheap, no performance guarantees needed"),
    ("Small production", "M10 — dedicated resources, backups enabled"),
    ("Mid-traffic production", "M30 — 8 GB RAM handles most workloads"),
    ("High-traffic / analytics", "M50+ — scale RAM and vCPUs to match working set"),
    ("Global / multi-region", "M30+ with cross-region replica set members"),
]
for scenario, recommendation in decisions:
    print(f"  {scenario:30} {recommendation}")
- M0, M2, and M5 clusters are shared infrastructure — other tenants run on the same hardware, so performance can vary. Always move to a dedicated tier (M10+) before going to production
- Atlas clusters can be scaled up or down with zero downtime by changing the tier in the UI or via the Atlas API — start smaller and scale when you have real traffic data
- Storage auto-scaling is available on dedicated tiers — Atlas automatically increases disk when utilisation exceeds 90%, preventing disk-full incidents without manual intervention
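The selection guide above can be sketched as a small helper function. The tier names come from the table, but the numeric thresholds below are illustrative assumptions, not official Atlas sizing rules — real sizing should come from working-set analysis and observed traffic:

```python
def pick_tier(working_set_gb: float, ops_per_sec: int, production: bool) -> str:
    """Suggest an Atlas tier from rough workload figures.

    Thresholds are illustrative only — they echo the selection guide,
    not any official Atlas capacity formula.
    """
    if not production:
        # Shared tiers are fine when performance guarantees don't matter
        return "M0" if working_set_gb < 0.5 else "M2/M5"
    if working_set_gb > 8 or ops_per_sec >= 100_000:
        return "M50+"   # scale RAM and vCPUs to the working set
    if working_set_gb > 2 or ops_per_sec >= 1_000:
        return "M30"
    return "M10"        # smallest dedicated tier for production

print(pick_tier(0.2, 50, production=False))   # M0
print(pick_tier(1.5, 500, production=True))   # M10
```

Encoding the decision in code also makes the rule "never ship shared tiers to production" enforceable in review or CI.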
2. Connecting PyMongo to Atlas
An Atlas cluster exposes a connection string that includes the cluster hostname, authentication credentials, and configuration options. The Atlas connection string format is slightly different from a local MongoDB URI — it uses DNS seed list discovery (mongodb+srv://) which automatically resolves all replica set members from a single hostname.
# Connecting PyMongo to Atlas — connection string, options, and best practices
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import os
# Atlas connection string format:
# mongodb+srv://username:password@cluster.mongodb.net/
# The +srv scheme uses DNS SRV records to discover all replica set members
# automatically — no need to list all three nodes manually
# ── Best practice: read credentials from environment variables ────────────
ATLAS_URI = os.environ.get(
    "ATLAS_URI",
    "mongodb+srv://app_user:<password>@dataplexa.abc12.mongodb.net/"
)
# ── Recommended connection options for Atlas ──────────────────────────────
client = MongoClient(
    ATLAS_URI,
    # Enforce Stable API version — protects against breaking changes
    server_api=ServerApi("1"),
    # TLS is always on for Atlas — no extra config needed
    # (Atlas rejects non-TLS connections by default)
    # Connection pool sizing
    maxPoolSize=50,        # max connections in pool (default 100)
    minPoolSize=5,         # keep at least 5 connections open
    maxIdleTimeMS=60000,   # close idle connections after 60s
    # Timeouts
    connectTimeoutMS=10000,          # 10s to establish a connection
    serverSelectionTimeoutMS=5000,   # 5s to find a server
    # Retry
    retryWrites=True,   # retry writes on network errors
    retryReads=True,    # retry reads on network errors
    # Compression — reduces bandwidth between app and Atlas
    compressors=["zstd", "snappy", "zlib"],
)
# ── Verify connection ─────────────────────────────────────────────────────
try:
    client.admin.command("ping")
    print("✓ Connected to MongoDB Atlas successfully")
except Exception as e:
    print(f"✗ Connection failed: {e}")
db = client["dataplexa"]
# ── Stable API — version 1 behaviour ─────────────────────────────────────
print("\nStable API version 1 guarantees:")
guarantees = [
    "Commands behave consistently regardless of MongoDB server version",
    "New server features never silently change existing command behaviour",
    "Deprecated commands raise errors rather than silently degrading",
    "Safe to upgrade the Atlas cluster version without re-testing all queries",
]
for g in guarantees:
    print(f"  ✓ {g}")
# ── Connection string components explained ────────────────────────────────
print("\nAtlas connection string anatomy:")
parts = [
    ("mongodb+srv://", "DNS SRV scheme — auto-discovers all replica members"),
    ("app_user:password@", "Credentials — use env vars, never hardcode"),
    ("cluster.abc12.mongodb.net", "Atlas cluster hostname — unique per cluster"),
    ("/?retryWrites=true", "Options — can also be set in MongoClient kwargs"),
    ("&w=majority", "Write concern — majority is Atlas default"),
]
for part, desc in parts:
    print(f"  {part:35} {desc}")
- Always use `mongodb+srv://` for Atlas connections — the DNS SRV scheme automatically discovers all replica set members and updates when Atlas replaces nodes during maintenance or scaling
- Never hardcode credentials in connection strings in source code — use environment variables, a `.env` file excluded from version control, or a secrets manager like AWS Secrets Manager or HashiCorp Vault
- `ServerApi("1")` is strongly recommended for Atlas — it protects your application from unexpected behaviour changes when MongoDB releases new server versions
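A related pitfall when building connection strings from environment variables: usernames and passwords containing URI-reserved characters (`@`, `:`, `/`, `%`) must be percent-encoded, or the URI parser misreads them. A minimal sketch (the hostname is a placeholder):

```python
from urllib.parse import quote_plus

def build_atlas_uri(user: str, password: str, host: str) -> str:
    # quote_plus percent-encodes reserved characters so the URI parser
    # does not mistake them for structural delimiters
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"

uri = build_atlas_uri("app_user", "p@ss:w/rd", "cluster.abc12.mongodb.net")
print(uri)  # mongodb+srv://app_user:p%40ss%3Aw%2Frd@cluster.abc12.mongodb.net/
```

PyMongo's own documentation recommends exactly this `quote_plus` treatment for credentials embedded in a URI.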
3. Network Access and Database Users via Atlas API
Atlas enforces two access control layers before a connection reaches your data: the IP access list (which source IPs are allowed to connect) and database users (which credentials are accepted). Both can be managed through the Atlas UI, the Atlas CLI, or the Atlas Administration API — making them automatable in CI/CD pipelines and infrastructure-as-code workflows.
# Atlas network access and database users via Admin API
import requests
from requests.auth import HTTPDigestAuth
import os
# Atlas Admin API credentials
PUBLIC_KEY = os.environ.get("ATLAS_PUBLIC_KEY", "your_public_key")
PRIVATE_KEY = os.environ.get("ATLAS_PRIVATE_KEY", "your_private_key")
PROJECT_ID = os.environ.get("ATLAS_PROJECT_ID", "your_project_id")
BASE = "https://cloud.mongodb.com/api/atlas/v1.0"
auth = HTTPDigestAuth(PUBLIC_KEY, PRIVATE_KEY)
def atlas_get(path: str) -> dict:
    r = requests.get(f"{BASE}{path}", auth=auth,
                     headers={"Content-Type": "application/json"})
    return r.json()

def atlas_post(path: str, payload: dict) -> dict:
    r = requests.post(f"{BASE}{path}", auth=auth,
                      headers={"Content-Type": "application/json"},
                      json=payload)
    return r.json()

# ── 1. Add an IP address to the access list ───────────────────────────────
print("1. Add IP to Atlas access list:")
ip_payload = [
    {
        "ipAddress": "203.0.113.50",   # your app server's public IP
        "comment": "Production app server — dataplexa-api-01"
    }
]
print(f"   POST /groups/{PROJECT_ID}/accessList")
print(f"   Payload: {ip_payload[0]}")

# ── 2. Allow access from anywhere (development only) ─────────────────────
print("\n2. Allow access from anywhere (0.0.0.0/0) — DEV ONLY:")
dev_ip_payload = [{"cidrBlock": "0.0.0.0/0", "comment": "Dev only — remove before production"}]
print("   ⚠ Never use 0.0.0.0/0 in production — add specific IPs or use VPC peering")

# ── 3. Create a database user ─────────────────────────────────────────────
print("\n3. Create database user via Atlas API:")
user_payload = {
    "username": "dataplexa_app",
    "password": os.environ.get("DB_APP_PASSWORD", "StrongPass!123"),
    "databaseName": "admin",
    "roles": [
        {"roleName": "readWrite", "databaseName": "dataplexa"}
    ],
    "scopes": [
        {"name": "DataplexaCluster", "type": "CLUSTER"}
    ]
}
print(f"   POST /groups/{PROJECT_ID}/databaseUsers")
print(f"   username: {user_payload['username']}")
print("   roles: readWrite@dataplexa")
print("   scopes: DataplexaCluster only")

# ── 4. VPC Peering — private network access ───────────────────────────────
print("\n4. VPC Peering (production recommended):")
vpc_benefits = [
    "Traffic between your VPC and Atlas never traverses the public internet",
    "No IP allowlist needed — network-level isolation replaces it",
    "Lower latency — same cloud region, private routing",
    "Required for compliance in many regulated industries (PCI, HIPAA)",
]
for b in vpc_benefits:
    print(f"   ✓ {b}")
print("\n   Setup: Atlas UI → Network Access → Peering → Add Peering Connection")
print("   Supports AWS VPC, Google Cloud VPC, and Azure VNet peering")
- Use `scopes` when creating Atlas database users to restrict a user to a specific cluster — without scopes the user can connect to any cluster in the project
- VPC peering is the production-grade alternative to IP allowlisting — it creates a private network route between your cloud VPC and the Atlas cluster, eliminating the need to maintain allowlist entries for every app server IP
- The Atlas Administration API uses HTTP Digest Authentication — store your public and private keys in a secrets manager and rotate them regularly, and never commit them to version control
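The 0.0.0.0/0 warning can be enforced in automation before a payload ever reaches the Admin API. A hypothetical guard function (the one-/24 cut-off is an arbitrary policy choice, not an Atlas rule):

```python
import ipaddress

def allowed_in_production(cidr: str) -> bool:
    """Reject overly broad access-list entries for production projects."""
    net = ipaddress.ip_network(cidr, strict=False)
    # 0.0.0.0/0 (and anything broader than a /24) is too permissive
    return net.num_addresses <= 256

print(allowed_in_production("203.0.113.50/32"))  # True
print(allowed_in_production("0.0.0.0/0"))        # False
```

Running this check in a CI step before calling the access-list endpoint turns the "dev only" comment into an actual safeguard.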
4. Atlas Search — Full-Text Search
Atlas Search is a full-text search engine built into MongoDB Atlas, powered by Apache Lucene under the hood. It enables relevance-ranked search, autocomplete, fuzzy matching, and faceted navigation without requiring a separate Elasticsearch deployment. Searches run as aggregation pipeline stages using $search — fully integrated with the rest of the aggregation framework.
# Atlas Search — full-text search on the Dataplexa products collection
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import os
client = MongoClient(os.environ.get("ATLAS_URI"), server_api=ServerApi("1"))
db = client["dataplexa"]
# ── Step 1: create a search index (done once via Atlas UI or API) ─────────
# Atlas Search index definition for products collection:
search_index_def = {
    "name": "products_search",
    "mappings": {
        "dynamic": False,   # only index explicitly listed fields
        "fields": {
            "name": {
                "type": "string",
                "analyzer": "lucene.standard"
            },
            "category": {
                "type": "string",
                "analyzer": "lucene.keyword"
            },
            "price": {
                "type": "number"
            },
            "rating": {
                "type": "number"
            },
        }
    }
}
print("Atlas Search index definition (create once in Atlas UI or API):")
print("  index name: products_search")
print("  fields: name (standard), category (keyword), price, rating\n")

# ── Step 2: full-text search with $search ─────────────────────────────────
print("1. Basic text search — products matching 'wireless keyboard':")
results = list(db.products.aggregate([
    {"$search": {
        "index": "products_search",
        "text": {
            "query": "wireless keyboard",
            "path": "name",               # search in the name field
            "fuzzy": {"maxEdits": 1}      # allow one character difference (typo tolerance)
        }
    }},
    {"$project": {
        "name": 1,
        "price": 1,
        "rating": 1,
        "score": {"$meta": "searchScore"},   # relevance score from Lucene
        "_id": 0
    }},
    {"$sort": {"score": -1}},
    {"$limit": 3}
]))
for r in results:
    print(f"  score: {r.get('score', 0):.3f}  {r['name']:25} ${r['price']:.2f}  ★{r['rating']}")

# ── Step 3: autocomplete search ───────────────────────────────────────────
print("\n2. Autocomplete — products starting with 'mec':")
# Requires an autocomplete type in the index mapping for the name field
results_ac = list(db.products.aggregate([
    {"$search": {
        "index": "products_search",
        "autocomplete": {
            "query": "mec",
            "path": "name",
            "tokenOrder": "sequential"
        }
    }},
    {"$project": {"name": 1, "price": 1, "_id": 0}},
    {"$limit": 3}
]))
for r in results_ac:
    print(f"  {r['name']:25} ${r['price']:.2f}")

# ── Step 4: compound search — text + filter + range ───────────────────────
print("\n3. Compound search — Electronics under $100 matching 'mouse hub':")
results_c = list(db.products.aggregate([
    {"$search": {
        "index": "products_search",
        "compound": {
            "must": [
                {"text": {"query": "mouse hub", "path": "name"}}
            ],
            "filter": [
                {"text": {"query": "Electronics", "path": "category"}},
                {"range": {"path": "price", "lte": 100}}
            ]
        }
    }},
    {"$project": {
        "name": 1, "category": 1, "price": 1,
        "score": {"$meta": "searchScore"}, "_id": 0
    }},
    {"$sort": {"score": -1}}
]))
for r in results_c:
    print(f"  score: {r.get('score', 0):.3f}  {r['name']:25} ${r['price']:.2f}")
- `{"$meta": "searchScore"}` in a `$project` stage returns the Lucene relevance score — use it to sort results by relevance and to understand why certain documents ranked higher
- Fuzzy matching with `maxEdits: 1` tolerates one character insertion, deletion, or substitution — this catches common typos like "keybaord" matching "keyboard"
- Atlas Search indexes are separate from regular MongoDB indexes and are built asynchronously — newly inserted documents appear in search results within a few seconds of insertion, not immediately
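To see why "keybaord" can match "keyboard" within one edit, it helps to compute the edit distance by hand. Lucene-style fuzzy matching typically counts an adjacent-character swap as a single edit (Damerau-Levenshtein); plain Levenshtein would count it as two. This is an illustrative sketch of the distance calculation, not Atlas code:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, and
    substitutions cost 1, and an adjacent transposition also costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

print(osa_distance("keybaord", "keyboard"))  # 1 — a single adjacent swap
```

With `maxEdits: 1`, any query term within distance 1 of an indexed term is a candidate match; `maxEdits: 2` widens the net at the cost of more false positives.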
5. Atlas CLI — Infrastructure as Code
The Atlas CLI is a command-line tool that lets you create clusters, configure access, deploy search indexes, and manage your Atlas project entirely from the terminal or CI/CD pipeline — no UI clicks required. This makes Atlas deployments repeatable, version-controllable, and automatable.
# Atlas CLI — common commands and automation patterns
# Atlas CLI commands are shell commands — shown here as reference strings
# Run them in your terminal after installing: brew install mongodb-atlas-cli
# (or download the Atlas CLI from the MongoDB Download Center)

atlas_commands = {
    "Setup & auth": [
        ("atlas auth login",
         "Authenticate with your Atlas account"),
        ("atlas config set project_id <PROJECT_ID>",
         "Set default project for all subsequent commands"),
    ],
    "Cluster management": [
        ("atlas clusters create DataplexaCluster \\\n"
         "  --provider AWS --region EU_WEST_1 \\\n"
         "  --tier M10 --mdbVersion 7.0",
         "Create an M10 cluster on AWS in eu-west-1"),
        ("atlas clusters describe DataplexaCluster",
         "Show cluster details including connection string"),
        ("atlas clusters pause DataplexaCluster",
         "Pause cluster to stop billing (dev clusters only)"),
        ("atlas clusters delete DataplexaCluster --force",
         "Delete cluster permanently"),
    ],
    "Database users": [
        ("atlas dbusers create \\\n"
         "  --username app_user \\\n"
         "  --password '$APP_PASSWORD' \\\n"
         "  --role readWrite@dataplexa",
         "Create app_user with readWrite on dataplexa"),
        ("atlas dbusers list",
         "List all database users in the project"),
    ],
    "Network access": [
        ("atlas accessLists create 203.0.113.50/32 \\\n"
         "  --type ipAddress --comment 'App server'",
         "Allow a specific IP address"),
        ("atlas accessLists list",
         "Show all IP access list entries"),
    ],
    "Backups & restore": [
        ("atlas backups snapshots list DataplexaCluster",
         "List available snapshots"),
        ("atlas backups restores start automated \\\n"
         "  --snapshotId <SNAPSHOT_ID> --targetClusterName DataplexaStaging",
         "Restore a snapshot to a staging cluster"),
    ],
}

print("Atlas CLI command reference:\n")
for category, commands in atlas_commands.items():
    print(f"  {category}:")
    for cmd, desc in commands:
        print(f"    # {desc}")
        for line in cmd.split("\n"):
            print(f"    {line}")
        print()

# Atlas CLI in CI/CD — example GitHub Actions step
print("Atlas CLI in GitHub Actions (CI/CD example):\n")
gha_step = """
- name: Create Atlas cluster for PR environment
  env:
    MONGODB_ATLAS_PUBLIC_API_KEY: ${{ secrets.ATLAS_PUBLIC_KEY }}
    MONGODB_ATLAS_PRIVATE_API_KEY: ${{ secrets.ATLAS_PRIVATE_KEY }}
    MONGODB_ATLAS_ORG_ID: ${{ secrets.ATLAS_ORG_ID }}
  run: |
    atlas clusters create pr-${{ github.event.pull_request.number }} \\
      --provider AWS --region EU_WEST_1 --tier M0
    atlas dbusers create \\
      --username ci_user --password $CI_DB_PASS \\
      --role readWrite@dataplexa
"""
print(gha_step)
- The Atlas CLI reads credentials from the environment variables `MONGODB_ATLAS_PUBLIC_API_KEY` and `MONGODB_ATLAS_PRIVATE_API_KEY` — set these in your CI/CD secrets store rather than in config files
- Use the Atlas CLI to spin up ephemeral M0 clusters for pull request testing environments and tear them down after the PR merges — this gives every PR its own isolated database at zero cost
- Combine the Atlas CLI with Terraform's MongoDB Atlas provider for full infrastructure-as-code — clusters, users, network access, and backup policies all defined in `.tf` files and applied with `terraform apply`
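When scripting the CLI from Python, it is safer to build invocations as argument lists and pass them to `subprocess.run` without a shell, which sidesteps quoting problems with passwords and comments. A hypothetical helper mirroring the cluster-creation command above:

```python
def atlas_create_cluster_cmd(name: str, provider: str = "AWS",
                             region: str = "EU_WEST_1", tier: str = "M10") -> list[str]:
    # Argument-list form: safe to hand to subprocess.run(cmd) with no shell
    return ["atlas", "clusters", "create", name,
            "--provider", provider, "--region", region, "--tier", tier]

cmd = atlas_create_cluster_cmd("pr-42", tier="M0")
print(" ".join(cmd))
# To execute for real (requires an installed, authenticated Atlas CLI):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The same pattern extends to `atlas dbusers create` and `atlas accessLists create`, letting a PR pipeline assemble its whole ephemeral environment from a few list-building helpers.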
Summary Table
| Feature | What It Does | Key Rule |
|---|---|---|
| Cluster tiers | M0 free → M50+ high-performance | Never use shared tiers (M0–M5) in production |
| `mongodb+srv://` | DNS SRV — auto-discovers all replica members | Always use for Atlas — never hardcode replica IPs |
| `ServerApi("1")` | Stable API — consistent behaviour across versions | Include in every Atlas MongoClient constructor |
| IP access list | First network access layer — only listed IPs connect | Never use 0.0.0.0/0 in production — use VPC peering |
| VPC peering | Private network route — no public internet traffic | Required for production and compliance workloads |
| Atlas Search | Lucene-powered full-text search via `$search` | Use `compound` for combined text + filter + range |
| Atlas CLI | Automate clusters, users, access from terminal / CI | Use in CI/CD for ephemeral PR test environments |
Practice Questions
Practice 1. What is the difference between the mongodb:// and mongodb+srv:// connection schemes and why does Atlas use the SRV scheme?
Practice 2. Why should you use M10 or higher for production Atlas workloads rather than M0, M2, or M5?
Practice 3. What does the $search stage's compound operator allow you to do that a simple text search cannot?
Practice 4. What is VPC peering and why is it preferred over IP allowlisting for production Atlas deployments?
Practice 5. What does ServerApi("1") do when included in a PyMongo MongoClient constructor for Atlas?
Quiz
Quiz 1. Which Atlas cluster tier provides dedicated resources and is the minimum recommended tier for production workloads?
Quiz 2. What does the fuzzy option in an Atlas Search text query do?
Quiz 3. How do you retrieve the Lucene relevance score from an Atlas Search query in a $project stage?
Quiz 4. Which Atlas CLI command creates a new M10 cluster on AWS in the eu-west-1 region?
Quiz 5. What environment variables does the Atlas CLI use for authentication in a CI/CD pipeline?
Next up — Mini Project: Build a complete Dataplexa Store command-line application using everything covered in this course — CRUD, aggregation, indexes, transactions, and Atlas deployment.