MongoDB
What is MongoDB
MongoDB is the world's most popular NoSQL database and one of the most widely used databases overall. It stores data as flexible, JSON-like documents rather than rigid rows in tables — making it a natural fit for modern application development where data structures evolve rapidly and scale requirements are enormous. The name comes from "humongous" — a nod to its design goal of handling massive amounts of data.
This lesson covers MongoDB's origin, its core architecture, the document model, how it compares to traditional databases conceptually, and why it has become the database of choice for startups and enterprises alike.
The Origin of MongoDB
MongoDB was created by Dwight Merriman, Eliot Horowitz, and Kevin Ryan — the same team behind DoubleClick, one of the internet's largest advertising platforms. While building DoubleClick, they experienced first-hand the limitations of relational databases at web scale — painful schema migrations, difficulty scaling horizontally, and a mismatch between the object-oriented code they wrote and the tables they had to store it in.
In 2007 they founded 10gen (renamed MongoDB Inc. in 2013) to build the database they wished they had. MongoDB was first released as open source in 2009. MongoDB Inc. went public on NASDAQ in 2017, and by 2023 it reported more than 40,000 customers in over 100 countries.
- 2007 — development begins at 10gen
- 2009 — first open-source release
- 2013 — company renamed MongoDB Inc.
- 2016 — MongoDB Atlas (fully managed cloud service) launched
- 2017 — IPO on NASDAQ
- 2022 — MongoDB 6.0 released with improvements to time-series collections (first introduced in MongoDB 5.0) and Cluster-to-Cluster Sync
The Document Model
The central idea in MongoDB is the document. A document is a record stored in BSON format (Binary JSON) — essentially a rich JSON object that can contain strings, numbers, booleans, arrays, nested objects, dates, and more. Documents map directly to objects in your programming language, eliminating the translation layer between application code and the database.
Why it exists: in a relational database, a user with multiple addresses, multiple orders, and multiple preferences requires four or five tables and complex JOINs just to reassemble one user object. In MongoDB, one document holds everything — the way you naturally think about a user in your code.
Real-world use: MongoDB has been used for the user profiles of Forbes, the content management of The Guardian newspaper, the data platform of Adobe, and the backend of EA Games. These are all cases where rich, varied documents can be a better fit than rigid tables.
# A MongoDB document: rich, nested, self-contained
user_document = {
    "_id": "64a1f2e3b4c5d6e7f8a9b0c1",  # unique identifier (an ObjectId, shown here as its hex string)
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 30,
    "verified": True,
    "created_at": "2024-01-15T09:30:00Z",
    "address": {  # embedded sub-document
        "street": "42 Baker Street",
        "city": "London",
        "country": "UK",
        "postcode": "W1U 6RY"
    },
    "tags": ["premium", "early_adopter"],  # array of strings
    "orders": [  # array of sub-documents
        {
            "order_id": "ORD-001",
            "product": "Laptop",
            "amount": 999.99,
            "status": "delivered"
        },
        {
            "order_id": "ORD-002",
            "product": "Mouse",
            "amount": 29.99,
            "status": "shipped"
        }
    ],
    "settings": {  # flexible nested config
        "theme": "dark",
        "language": "en-GB",
        "notifications": True
    }
}

print(f"User: {user_document['name']}")
print(f"City: {user_document['address']['city']}")
print(f"Orders: {len(user_document['orders'])}")
print(f"Tags: {user_document['tags']}")

Output:
User: Alice Johnson
City: London
Orders: 2
Tags: ['premium', 'early_adopter']
- Every document has a unique _id field, automatically generated as an ObjectId if not provided
- Documents can nest objects and arrays many levels deep (MongoDB allows up to 100 levels of nesting), with no artificial flattening into tables
- The document structure mirrors how developers think about objects in their code
- Two documents in the same collection can have completely different fields — the schema is flexible
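To make the contrast with a relational layout concrete, here is a minimal sketch in plain Python with no database involved. The row lists `users_rows`, `addresses_rows`, and `orders_rows` and the helper `assemble_user_document` are hypothetical names invented for this illustration. They stand in for separate tables and for the JOIN logic an application would otherwise need to rebuild one user object.

```python
# Hypothetical relational-style rows: one list per "table", linked by user_id.
users_rows = [{"user_id": 1, "name": "Alice Johnson", "email": "alice@example.com"}]
addresses_rows = [{"user_id": 1, "city": "London", "country": "UK"}]
orders_rows = [
    {"user_id": 1, "order_id": "ORD-001", "product": "Laptop", "amount": 999.99},
    {"user_id": 1, "order_id": "ORD-002", "product": "Mouse", "amount": 29.99},
]


def assemble_user_document(user_id):
    """Rebuild one nested user object from three flat row lists (a manual JOIN)."""
    user = next(row for row in users_rows if row["user_id"] == user_id)
    address = next(row for row in addresses_rows if row["user_id"] == user_id)
    orders = [row for row in orders_rows if row["user_id"] == user_id]
    return {
        "name": user["name"],
        "email": user["email"],
        "address": {"city": address["city"], "country": address["country"]},
        # The foreign key is no longer needed once orders are embedded.
        "orders": [
            {key: value for key, value in order.items() if key != "user_id"}
            for order in orders
        ],
    }


document = assemble_user_document(1)
print(document["address"]["city"])  # nested access, no further lookups
print(len(document["orders"]))

# Flexible schema: two documents in the same (simulated) collection
# do not have to share the same fields.
collection = [document, {"name": "Bob", "loyalty_points": 120}]
fields_not_shared = sorted(set(collection[0]) ^ set(collection[1]))
print(fields_not_shared)
```

In a document database the nested shape returned by `assemble_user_document` is what gets stored directly, so the reassembly step is not needed at read time.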
Core MongoDB Terminology
MongoDB uses its own vocabulary. Understanding the mapping between MongoDB terms and relational database terms makes everything click faster.
| SQL Term | MongoDB Term | Description |
|---|---|---|
| Database | Database | Top-level namespace containing collections |
| Table | Collection | Group of documents (like a table but schema-free) |
| Row | Document | A single record stored as BSON |
| Column | Field | A key-value pair inside a document |
| Primary Key | _id field | Unique identifier for every document |
| JOIN | $lookup | Aggregation stage to join collections |
| Index | Index | Same concept — speeds up queries |
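The table pairs JOIN with `$lookup`. As a rough illustration of what that stage produces, the sketch below emulates its basic left-outer-join behaviour on plain Python lists. The function `lookup` and the sample data are hypothetical. Its parameters mirror the `localField`, `foreignField`, and `as` options of the real stage, but this is only a conceptual model, not how the server implements it.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate the basic equality form of $lookup on in-memory lists.

    Every local document is kept (left outer join); matching foreign
    documents are attached as an array under `as_field`.
    """
    joined = []
    for doc in local_docs:
        matches = [
            foreign
            for foreign in foreign_docs
            if foreign.get(foreign_field) == doc.get(local_field)
        ]
        # Build a new dict so the input documents are left unmodified.
        joined.append({**doc, as_field: matches})
    return joined


users = [{"_id": 1, "name": "Alice"}, {"_id": 2, "name": "Bob"}]
orders = [
    {"order_id": "ORD-001", "user_id": 1},
    {"order_id": "ORD-002", "user_id": 1},
]

result = lookup(users, orders, "_id", "user_id", "orders")
for user in result:
    print(user["name"], [order["order_id"] for order in user["orders"]])
```

Unlike an SQL inner join, a user with no orders still appears in the result, with an empty array in the joined field.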
How MongoDB Stores Data — BSON
While you write and read data as JSON, MongoDB stores it internally as BSON (Binary JSON). BSON extends JSON with additional data types — dates, binary data, 64-bit integers, and the ObjectId type — and is designed for efficient encoding and decoding.
# BSON data types: richer than plain JSON
from bson import ObjectId
from datetime import datetime

# ObjectId: 12-byte unique identifier generated automatically
doc_id = ObjectId()
print("ObjectId:", doc_id)
print("Timestamp embedded in ObjectId:", doc_id.generation_time)

# A BSON document with rich data types
bson_document = {
    "_id": ObjectId(),                            # BSON ObjectId (not just a string)
    "name": "Alice Johnson",                      # String
    "age": 30,                                    # Int32
    "balance": 9999.99,                           # Double
    "verified": True,                             # Boolean
    "created_at": datetime(2024, 1, 15, 9, 30),   # BSON Date (not a string)
    "scores": [95, 87, 92],                       # Array
    "address": {"city": "London"},                # Embedded document
    "avatar": None                                # Null
}

print("\nDocument fields:")
for key, value in bson_document.items():
    print(f"  {key}: {type(value).__name__} = {value}")

Example output (ObjectId values and the embedded timestamp differ on every run):
ObjectId: <24-character hex string>
Timestamp embedded in ObjectId: 2024-01-15 09:30:00+00:00

Document fields:
  _id: ObjectId = 64a1f2e3b4c5d6e7f8a9b0c1
  name: str = Alice Johnson
  age: int = 30
  balance: float = 9999.99
  verified: bool = True
  created_at: datetime = 2024-01-15 09:30:00
  scores: list = [95, 87, 92]
  address: dict = {'city': 'London'}
  avatar: NoneType = None
- ObjectId is 12 bytes: 4-byte timestamp + 5-byte random + 3-byte incrementing counter — globally unique without a central coordinator
- The timestamp embedded in ObjectId means that sorting by _id orders documents roughly by creation time (the timestamp has one-second resolution)
- BSON Date stores milliseconds since epoch — far more reliable than storing dates as strings
- BSON is designed for fast traversal and encoding/decoding, and it supports richer types than plain JSON
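The byte layout described above can be checked without a driver. The following sketch uses only the Python standard library to split an ObjectId hex string into its three parts and to show the milliseconds-since-epoch form of a BSON Date. The helper names `parse_object_id` and `to_bson_millis` are hypothetical, introduced only for this illustration. In real code the `bson` package does this work.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_object_id(hex_string):
    """Split a 24-character hex ObjectId into its three components."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte big-endian timestamp
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                      # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"),   # 3-byte counter
    }


def to_bson_millis(moment):
    """Express a timezone-aware datetime as integer milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


parts = parse_object_id("64a1f2e3b4c5d6e7f8a9b0c1")
print("Created:", parts["timestamp"].isoformat())
print("Random :", parts["random"])
print("Counter:", parts["counter"])

created_at = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print("BSON Date value:", to_bson_millis(created_at))
```

Because the timestamp occupies the leading bytes, comparing two ObjectIds byte by byte compares their creation seconds first, which is why sorting by `_id` approximates sorting by creation time.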
The MongoDB Query Language (MQL)
MongoDB uses its own query language — MQL — expressed as JSON-like objects rather than SQL strings. This keeps queries consistent with the document model and easy to build programmatically.
# MQL vs SQL: same questions, different syntax
# The comments marked "PyMongo" assume a hypothetical `collection` handle from the PyMongo driver.

# SQL: SELECT * FROM users WHERE age > 25 AND city = 'London'
# MQL:
mql_find = {
    "age": {"$gt": 25},
    "address.city": "London"  # dot notation for nested fields
}
# PyMongo: collection.find(mql_find)

# SQL: SELECT name, email FROM users WHERE verified = true ORDER BY age DESC LIMIT 5
# MQL:
mql_query = {"verified": True}
mql_project = {"name": 1, "email": 1, "_id": 0}  # 1 = include, 0 = exclude
mql_sort = [("age", -1)]                         # -1 = descending
mql_limit = 5
# PyMongo: collection.find(mql_query, mql_project).sort(mql_sort).limit(mql_limit)

# SQL: UPDATE users SET age = 31 WHERE name = 'Alice'
# MQL:
mql_update_filter = {"name": "Alice"}
mql_update_op = {"$set": {"age": 31}}
# PyMongo: collection.update_one(mql_update_filter, mql_update_op)

# SQL: INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com')
# MQL:
mql_insert = {"name": "Bob", "email": "bob@example.com"}
# PyMongo: collection.insert_one(mql_insert)

print("MQL queries are Python dictionaries: no SQL strings to concatenate or parse.")
print("Queries are built programmatically: easy to compose dynamically in code.")

Output:
MQL queries are Python dictionaries: no SQL strings to concatenate or parse.
Queries are built programmatically: easy to compose dynamically in code.
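- MQL queries are JSON-like objects, so they compose naturally in code without string concatenation
- Dot notation ("address.city") queries nested fields directly, with no joins or aliases needed
- Operators like $gt, $lt, $in, $set follow a consistent $-prefix convention
- Because queries are data structures rather than strings, classic string-concatenation SQL injection does not apply. Unvalidated user input can still smuggle in operators such as $ne or $where (often called NoSQL injection), so validate input types before building a filter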
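To show how a filter document is interpreted, here is a deliberately small in-memory matcher written in plain Python. It is a sketch under stated assumptions: `get_path`, `matches`, and `OPERATORS` are hypothetical names, only `$gt`, `$lt`, and `$in` are handled, and array matching rules are ignored. The real server supports far more operators and semantics.

```python
def get_path(document, dotted_path):
    """Follow dot notation ("address.city") through nested dictionaries."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # treat a missing field as None in this sketch
        current = current[key]
    return current


# The small subset of query operators this sketch understands.
OPERATORS = {
    "$gt": lambda value, operand: value is not None and value > operand,
    "$lt": lambda value, operand: value is not None and value < operand,
    "$in": lambda value, operand: value in operand,
}


def is_operator_expression(condition):
    """True for dicts such as {"$gt": 25}, where every key starts with "$"."""
    return (
        isinstance(condition, dict)
        and bool(condition)
        and all(key.startswith("$") for key in condition)
    )


def matches(document, query):
    """Return True if the document satisfies every condition in the filter."""
    for path, condition in query.items():
        value = get_path(document, path)
        if is_operator_expression(condition):
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:  # plain value means equality
            return False
    return True


users = [
    {"name": "Alice", "age": 30, "address": {"city": "London"}},
    {"name": "Bob", "age": 22, "address": {"city": "London"}},
    {"name": "Chen", "age": 41, "address": {"city": "Paris"}},
]

# Filters are ordinary dictionaries, so they can be composed step by step.
query = {"age": {"$gt": 25}}
query["address.city"] = "London"

selected = [user["name"] for user in users if matches(user, query)]
print(selected)
```

Every top-level key in the filter must match, which corresponds to the implicit AND in the first SQL comparison above.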
MongoDB Editions
MongoDB is available in several editions to suit different needs and budgets.
- MongoDB Community Server — free, source-available (licensed under the Server Side Public License since 2018), self-hosted. All core features. The right choice for development, small projects, and learning.
- MongoDB Enterprise — paid, self-hosted. Adds advanced security (LDAP, Kerberos), encrypted storage engine, audit logging, and commercial support.
- MongoDB Atlas — fully managed cloud database service. Runs on AWS, GCP, or Azure. No infrastructure management — MongoDB Inc. handles backups, scaling, patching, and monitoring. Free tier available.
- MongoDB Atlas Serverless — consumption-based pricing, scales to zero. Pay only for operations performed — ideal for intermittent workloads.
Summary Table
| Concept | Detail | Key Point |
|---|---|---|
| Document | BSON record — nested, flexible | Core unit of storage in MongoDB |
| Collection | Group of documents | Equivalent to a SQL table |
| BSON | Binary JSON with richer types | Internal storage format — faster than plain JSON |
| ObjectId | 12-byte unique identifier | Auto-generated, timestamp-embedded, globally unique |
| MQL | MongoDB Query Language | JSON-based queries — no SQL strings |
| Atlas | Managed cloud service | No infrastructure management — free tier available |
Practice Questions
Practice 1. What does the name "MongoDB" derive from and what was its design goal?
Practice 2. What is the MongoDB equivalent of a SQL table?
Practice 3. What format does MongoDB use internally to store documents, and how does it differ from JSON?
Practice 4. What is an ObjectId and what information is embedded in it?
Practice 5. What syntax does MQL use to query a nested field such as the city inside an address sub-document?
Quiz
Quiz 1. In what year was MongoDB first released as open source?
Quiz 2. What is the unique identifier field automatically added to every MongoDB document?
Quiz 3. What does the $lookup stage in MongoDB correspond to in SQL?
Quiz 4. Which MongoDB edition is fully managed and runs on AWS, GCP, or Azure?
Quiz 5. Why does classic SQL injection through string concatenation not apply to MongoDB's MQL?
Next up — MongoDB Features & Use Cases: the full feature set and the real-world scenarios where MongoDB excels.