MongoDB Lesson 3 – What is MongoDB | Dataplexa

What is MongoDB

MongoDB is one of the most widely used NoSQL databases and among the most popular databases overall. It stores data as flexible, JSON-like documents rather than rigid rows in tables, which makes it a natural fit for application development where data structures evolve quickly and data volumes grow large. The name comes from "humongous", a nod to its design goal of handling massive amounts of data.

This lesson covers MongoDB's origin, its core architecture, the document model, how it compares to traditional databases conceptually, and why it has become the database of choice for startups and enterprises alike.

The Origin of MongoDB

MongoDB was created by Dwight Merriman, Eliot Horowitz, and Kevin Ryan — the same team behind DoubleClick, one of the internet's largest advertising platforms. While building DoubleClick, they experienced first-hand the limitations of relational databases at web scale — painful schema migrations, difficulty scaling horizontally, and a mismatch between the object-oriented code they wrote and the tables they had to store it in.

In 2007 they started 10gen (later renamed MongoDB Inc.) to build the database they wished they had. MongoDB was first released as open source in 2009. MongoDB Inc. went public on NASDAQ in 2017, and by 2023 it reported over 40,000 customers in more than 100 countries.

2007: development begins at 10gen
2009: first open-source release
2013: company renamed MongoDB Inc.
2016: MongoDB Atlas (fully managed cloud service) launched
2017: IPO on NASDAQ
2022: MongoDB 6.0 released with cluster-to-cluster sync and enhancements to time-series collections (first introduced in 5.0)

The Document Model

The central idea in MongoDB is the document. A document is a record stored in BSON format (Binary JSON) — essentially a rich JSON object that can contain strings, numbers, booleans, arrays, nested objects, dates, and more. Documents map directly to objects in your programming language, eliminating the translation layer between application code and the database.

Why it exists: in a relational database, a user with multiple addresses, multiple orders, and multiple preferences requires four or five tables and complex JOINs just to reassemble one user object. In MongoDB, one document holds everything — the way you naturally think about a user in your code.

Real-world use: MongoDB powers the user profiles of Forbes, the content management of The Guardian newspaper, the data platform of Adobe, and the backend of EA Games — all cases where rich, varied documents outperform rigid tables.

# A MongoDB document — rich, nested, self-contained

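user_document = {
    "_id": "64a1f2e3b4c5d6e7f8a9b0c1",  # unique identifier (an ObjectId in MongoDB, shown here as its hex string)
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 30,
    "verified": True,
    "created_at": "2024-01-15T09:30:00Z",   # shown as a string for readability; store a real date in practice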
    "address": {                            # embedded sub-document
        "street": "42 Baker Street",
        "city": "London",
        "country": "UK",
        "postcode": "W1U 6RY"
    },
    "tags": ["premium", "early_adopter"],   # array of strings
    "orders": [                             # array of sub-documents
        {
            "order_id": "ORD-001",
            "product": "Laptop",
            "amount": 999.99,
            "status": "delivered"
        },
        {
            "order_id": "ORD-002",
            "product": "Mouse",
            "amount": 29.99,
            "status": "shipped"
        }
    ],
    "settings": {                           # flexible nested config
        "theme": "dark",
        "language": "en-GB",
        "notifications": True
    }
}

print(f"User: {user_document['name']}")
print(f"City: {user_document['address']['city']}")
print(f"Orders: {len(user_document['orders'])}")
print(f"Tags: {user_document['tags']}")

User: Alice Johnson
City: London
Orders: 2
Tags: ['premium', 'early_adopter']

Every document has a unique _id field — automatically generated as an ObjectId if not provided
Documents can nest objects and arrays up to 100 levels deep (within the 16 MB document size limit), with no artificial flattening into tables
The document structure mirrors how developers think about objects in their code
Two documents in the same collection can have completely different fields — the schema is flexible
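
The flexible-schema point can be illustrated without a database. The following is a minimal sketch that uses a plain Python list as a stand-in for a collection; the sample documents and the `field_names` helper are hypothetical and exist only for this illustration.

```python
# Two documents that could live in the same collection, modeled here
# as plain Python dicts in a list (a stand-in, not a real collection).
users = [
    {"_id": 1, "name": "Alice Johnson", "tags": ["premium"]},
    {"_id": 2, "name": "Bob Lee", "company": {"name": "Acme", "size": 50}},
]


def field_names(documents):
    """Return the set of top-level field names used across all documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return names


# Fields that only some documents carry should be read defensively.
companies = [doc.get("company", {}).get("name") for doc in users]

print(sorted(field_names(users)))
print(companies)
```

Because nothing forces every document to carry every field, application code usually reads optional fields with a default, as the `.get()` calls do here.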

Core MongoDB Terminology

MongoDB uses its own vocabulary. Understanding the mapping between MongoDB terms and relational database terms makes everything click faster.

SQL Term	MongoDB Term	Description
Database	Database	Top-level namespace containing collections
Table	Collection	Group of documents (like a table but schema-free)
Row	Document	A single record stored as BSON
Column	Field	A key-value pair inside a document
Primary Key	`_id` field	Unique identifier for every document
JOIN	`$lookup`	Aggregation stage to join collections
Index	Index	Same concept — speeds up queries
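
To make the JOIN to `$lookup` row concrete, here is a rough sketch of what an equality `$lookup` stage looks like and what it produces. The sample data and the `emulate_lookup` helper are assumptions for illustration only; the helper imitates the stage in pure Python and is not how MongoDB executes it.

```python
# Hypothetical sample data: two "collections" as lists of dicts.
users = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"order_id": "ORD-001", "user_id": 1, "amount": 999.99},
    {"order_id": "ORD-002", "user_id": 1, "amount": 29.99},
]

# The aggregation stage as it would be written for MongoDB.
lookup_stage = {
    "$lookup": {
        "from": "orders",           # collection to join with
        "localField": "_id",        # field in the users documents
        "foreignField": "user_id",  # field in the orders documents
        "as": "orders",             # name of the output array field
    }
}


def emulate_lookup(left, right, stage):
    """Pure-Python imitation of an equality $lookup (a left outer join)."""
    spec = stage["$lookup"]
    joined = []
    for doc in left:
        matches = [
            other for other in right
            if other.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # Every left-hand document is kept, even with no matches.
        joined.append({**doc, spec["as"]: matches})
    return joined


result = emulate_lookup(users, orders, lookup_stage)
for doc in result:
    print(doc["name"], len(doc["orders"]))
```

Unlike a SQL inner join, the result keeps users without orders and attaches the matches as an array on each user document.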

How MongoDB Stores Data — BSON

While you write and read data as JSON, MongoDB stores it internally as BSON (Binary JSON). BSON extends JSON with additional data types — dates, binary data, 64-bit integers, and the ObjectId type — and is designed for efficient encoding and decoding.

# BSON data types — richer than plain JSON

from bson import ObjectId
from datetime import datetime

# ObjectId — 12-byte unique identifier generated automatically
doc_id = ObjectId()
print("ObjectId:", doc_id)
print("Timestamp embedded in ObjectId:", doc_id.generation_time)

# A BSON document with rich data types
bson_document = {
    "_id":        ObjectId(),                    # BSON ObjectId (not just a string)
    "name":       "Alice Johnson",               # String
    "age":        30,                            # Int32
    "balance":    9999.99,                       # Double
    "verified":   True,                          # Boolean
    "created_at": datetime(2024, 1, 15, 9, 30), # BSON Date (not a string)
    "scores":     [95, 87, 92],                  # Array
    "address":    {"city": "London"},            # Embedded document
    "avatar":     None                           # Null
}

print("\nDocument fields:")
for key, value in bson_document.items():
    print(f"  {key}: {type(value).__name__} = {value}")

ObjectId: 64a1f2e3b4c5d6e7f8a9b0c1
Timestamp embedded in ObjectId: 2023-07-02 21:57:55+00:00

Document fields:
  _id: ObjectId = 64a1f2e3b4c5d6e7f8a9b0c2
  name: str = Alice Johnson
  age: int = 30
  balance: float = 9999.99
  verified: bool = True
  created_at: datetime = 2024-01-15 09:30:00
  scores: list = [95, 87, 92]
  address: dict = {'city': 'London'}
  avatar: NoneType = None

ObjectId is 12 bytes: 4-byte timestamp + 5-byte random + 3-byte incrementing counter — globally unique without a central coordinator
The timestamp embedded in ObjectId means documents are roughly sortable by creation time, to one-second resolution
BSON Date stores milliseconds since epoch — far more reliable than storing dates as strings
BSON is designed for fast encoding, decoding, and traversal, and supports richer types than plain JSON
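
The 12-byte layout described above can be checked with the standard library alone. This is a minimal sketch: `decode_object_id` is a hypothetical helper, and the hex string is a constructed example id rather than one taken from a real database.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte big-endian timestamp
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                     # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"),  # 3-byte counter
    }


# Constructed example id whose first 4 bytes encode 2024-01-15 09:30:00 UTC.
parts = decode_object_id("65a4fb18b4c5d6e7f8a9b0c1")
print(parts["created"].isoformat())
print(parts["random"], parts["counter"])

# A BSON Date holds milliseconds since the Unix epoch.
moment = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
millis = int(moment.timestamp() * 1000)
print(millis)
```

The ObjectId timestamp has one-second resolution, while a BSON Date keeps milliseconds, which is why a dedicated date field is still useful when precise times matter.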

The MongoDB Query Language (MQL)

MongoDB uses its own query language — MQL — expressed as JSON-like objects rather than SQL strings. This keeps queries consistent with the document model and easy to build programmatically.

# MQL vs SQL — same questions, different syntax

# SQL:  SELECT * FROM users WHERE age > 25 AND city = 'London'
# MQL:
mql_find = {
    "age":              {"$gt": 25},
    "address.city":     "London"       # dot notation for nested fields
}

# SQL:  SELECT name, email FROM users WHERE verified = true ORDER BY age DESC LIMIT 5
# MQL:
mql_query  = {"verified": True}
mql_project = {"name": 1, "email": 1, "_id": 0}   # 1 = include, 0 = exclude
mql_sort   = [("age", -1)]    # -1 = descending
mql_limit  = 5

# SQL:  UPDATE users SET age = 31 WHERE name = 'Alice'
# MQL:
mql_update_filter = {"name": "Alice"}
mql_update_op     = {"$set": {"age": 31}}

# SQL:  INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com')
# MQL:
mql_insert = {"name": "Bob", "email": "bob@example.com"}

print("MQL queries are Python dictionaries: no query strings to parse or concatenate.")
print("Queries are built programmatically, so they are easy to compose dynamically in code.")

MQL queries are Python dictionaries: no query strings to parse or concatenate.
Queries are built programmatically, so they are easy to compose dynamically in code.
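
The dictionaries above are only data until they are handed to a driver. The sketch below shows one way they might be passed to PyMongo. It assumes `users` is a PyMongo collection object; the connection string, database name, and collection name in the comment are hypothetical. `update_many` is used because a SQL UPDATE changes every matching row.

```python
def run_examples(users):
    """Run the lesson's four operations against a PyMongo-style collection.

    `users` is assumed to be a pymongo Collection, for example
    MongoClient("mongodb://localhost:27017")["shop"]["users"]
    (hypothetical connection string and names).
    """
    # SELECT * FROM users WHERE age > 25 AND city = 'London'
    londoners = list(users.find({"age": {"$gt": 25}, "address.city": "London"}))

    # SELECT name, email FROM users WHERE verified = true
    # ORDER BY age DESC LIMIT 5
    top_verified = list(
        users.find({"verified": True}, {"name": 1, "email": 1, "_id": 0})
        .sort("age", -1)
        .limit(5)
    )

    # UPDATE users SET age = 31 WHERE name = 'Alice'
    users.update_many({"name": "Alice"}, {"$set": {"age": 31}})

    # INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com')
    users.insert_one({"name": "Bob", "email": "bob@example.com"})

    return londoners, top_verified
```

The function takes the collection as a parameter so the same code can be pointed at a local server, an Atlas cluster, or a test double.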

MQL queries are JSON objects — they compose naturally in code without string concatenation
Dot notation ("address.city") queries nested fields directly — no joins or aliases needed
Operators like $gt, $lt, $in, $set follow a consistent $-prefix convention
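Because queries are data structures, not strings, classic string-concatenation SQL injection does not apply; untrusted input must still be validated, since a user-supplied object such as {"$ne": null} can inject operators into a query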
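
To show how dot notation and `$`-prefixed operators are read, here is a toy matcher in pure Python. It is a sketch under strong assumptions: `get_path` and `matches` are hypothetical helpers, only `$gt`, `$lt`, and `$in` are modeled, arrays are ignored, and this is not how MongoDB evaluates queries internally.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "address.city" through nested dicts."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Only a handful of comparison operators are modeled in this sketch.
OPERATORS = {
    "$gt": lambda actual, expected: actual is not None and actual > expected,
    "$lt": lambda actual, expected: actual is not None and actual < expected,
    "$in": lambda actual, expected: actual in expected,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for path, condition in query.items():
        actual = get_path(document, path)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 25}}
            if not all(OPERATORS[op](actual, arg) for op, arg in condition.items()):
                return False
        elif actual != condition:
            # Plain equality, e.g. {"address.city": "London"}
            return False
    return True


alice = {"name": "Alice", "age": 30, "address": {"city": "London"}}
bob = {"name": "Bob", "age": 22, "address": {"city": "Leeds"}}
query = {"age": {"$gt": 25}, "address.city": "London"}

print([doc["name"] for doc in (alice, bob) if matches(doc, query)])
```

All conditions in one query object are combined with an implicit AND, which is why Bob fails the example query on both fields.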

MongoDB Editions

MongoDB is available in several editions to suit different needs and budgets.

MongoDB Community Server: free, self-hosted, and source-available (licensed under the SSPL since 2018). All core features. The right choice for development, small projects, and learning.
MongoDB Enterprise: paid, self-hosted. Adds advanced security (LDAP, Kerberos), encrypted storage engine, audit logging, and commercial support.
MongoDB Atlas: fully managed cloud database service. Runs on AWS, GCP, or Azure. No infrastructure management; MongoDB Inc. handles backups, scaling, patching, and monitoring. Free tier available.
MongoDB Atlas Serverless: consumption-based pricing, scales to zero. Pay only for operations performed, which suits intermittent workloads.

Summary Table

Concept	Detail	Key Point
Document	BSON record — nested, flexible	Core unit of storage in MongoDB
Collection	Group of documents	Equivalent to a SQL table
BSON	Binary JSON with richer types	Internal storage format, designed for fast encoding and traversal
ObjectId	12-byte unique identifier	Auto-generated, timestamp-embedded, globally unique
MQL	MongoDB Query Language	JSON-based queries — no SQL strings
Atlas	Managed cloud service	No infrastructure management — free tier available

Practice Questions

Practice 1. What does the name "MongoDB" derive from and what was its design goal?

Practice 2. What is the MongoDB equivalent of a SQL table?

Practice 3. What format does MongoDB use internally to store documents, and how does it differ from JSON?

Practice 4. What is an ObjectId and what information is embedded in it?

Practice 5. What syntax does MQL use to query a nested field such as the city inside an address sub-document?

Quiz

Quiz 1. In what year was MongoDB first released as open source?

2007
2009
2013
2016

Quiz 2. What is the unique identifier field automatically added to every MongoDB document?

_id
id
document_id
uuid

Quiz 3. What does the $lookup stage in MongoDB correspond to in SQL?

JOIN
WHERE
GROUP BY
UNION

Quiz 4. Which MongoDB edition is fully managed and runs on AWS, GCP, or Azure?

MongoDB Atlas
MongoDB Enterprise
MongoDB Community Server
MongoDB Serverless

Quiz 5. Why does classic string-based SQL injection not apply to MongoDB's MQL?

Because MQL queries are data structures (dictionaries), not strings, so there is no query string to splice input into
Because MongoDB automatically escapes all input
Because MongoDB does not support user input
Because MQL queries run in a sandbox

Next up — MongoDB Features & Use Cases: the full feature set and the real-world scenarios where MongoDB excels.
