MongoDB Lesson 3 – What is MongoDB | Dataplexa

What is MongoDB

MongoDB is one of the most widely used NoSQL databases and regularly ranks among the most popular databases of any kind. It stores data as flexible, JSON-like documents rather than rigid rows in tables, which makes it a natural fit for applications whose data structures evolve rapidly and whose scale requirements are large. The name comes from "humongous", a nod to its design goal of handling massive amounts of data.

This lesson covers MongoDB's origin, its core architecture, the document model, how it compares to traditional databases conceptually, and why it has become the database of choice for startups and enterprises alike.

The Origin of MongoDB

MongoDB was created by Dwight Merriman, Eliot Horowitz, and Kevin Ryan — the same team behind DoubleClick, one of the internet's largest advertising platforms. While building DoubleClick, they experienced first-hand the limitations of relational databases at web scale — painful schema migrations, difficulty scaling horizontally, and a mismatch between the object-oriented code they wrote and the tables they had to store it in.

In 2007 they founded 10gen (renamed MongoDB Inc. in 2013) to build the database they wished they had. MongoDB was first released as open source in 2009. MongoDB Inc. went public on NASDAQ in 2017, and by 2023 it reported over 40,000 customers in more than 100 countries.

  • 2007: development begins at 10gen
  • 2009: first open-source release
  • 2013: company renamed MongoDB Inc.
  • 2016: MongoDB Atlas (fully managed cloud service) launched
  • 2017: IPO on NASDAQ
  • 2022: MongoDB 6.0 released, improving the time-series collections introduced in 5.0 and arriving alongside Cluster-to-Cluster Sync

The Document Model

The central idea in MongoDB is the document. A document is a record stored in BSON format (Binary JSON): essentially a rich JSON object that can contain strings, numbers, booleans, arrays, nested objects, dates, and more. Documents map closely to objects in your programming language, which removes much of the translation layer between application code and the database.

Why it exists: in a relational database, a user with multiple addresses, multiple orders, and multiple preferences requires four or five tables and complex JOINs just to reassemble one user object. In MongoDB, one document holds everything — the way you naturally think about a user in your code.

Real-world use: MongoDB powers the user profiles of Forbes, the content management of The Guardian newspaper, the data platform of Adobe, and the backend of EA Games — all cases where rich, varied documents outperform rigid tables.

# A MongoDB document — rich, nested, self-contained

user_document = {
    "_id": "64a1f2e3b4c5d6e7f8a9b0c1",  # unique identifier (hex string form of an ObjectId)
    "name": "Alice Johnson",
    "email": "alice@example.com",
    "age": 30,
    "verified": True,
    "created_at": "2024-01-15T09:30:00Z",  # ISO string here; a BSON Date when stored via a driver
    "address": {                            # embedded sub-document
        "street": "42 Baker Street",
        "city": "London",
        "country": "UK",
        "postcode": "W1U 6RY"
    },
    "tags": ["premium", "early_adopter"],   # array of strings
    "orders": [                             # array of sub-documents
        {
            "order_id": "ORD-001",
            "product": "Laptop",
            "amount": 999.99,
            "status": "delivered"
        },
        {
            "order_id": "ORD-002",
            "product": "Mouse",
            "amount": 29.99,
            "status": "shipped"
        }
    ],
    "settings": {                           # flexible nested config
        "theme": "dark",
        "language": "en-GB",
        "notifications": True
    }
}

print(f"User: {user_document['name']}")
print(f"City: {user_document['address']['city']}")
print(f"Orders: {len(user_document['orders'])}")
print(f"Tags: {user_document['tags']}")

Output:
User: Alice Johnson
City: London
Orders: 2
Tags: ['premium', 'early_adopter']

  • Every document has an _id field that is unique within its collection; it is generated automatically as an ObjectId if not provided
  • Documents can nest objects and arrays deeply (up to 100 levels, within the 16 MB document size limit), so there is no artificial flattening into tables
  • The document structure mirrors how developers think about objects in their code
  • Two documents in the same collection can have completely different fields: the schema is flexible unless you opt in to validation rules
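
To make the contrast with relational storage concrete, here is a minimal sketch in plain Python (no database involved). The three flat lists stand in for hypothetical relational tables, and assemble_user is an illustrative helper, not part of any MongoDB API. It shows the matching work a JOIN performs to rebuild the nested object that a single document already contains.

```python
# Hypothetical flat "tables", as a relational schema might store the same user
users = [{"user_id": 1, "name": "Alice Johnson"}]
addresses = [{"user_id": 1, "city": "London", "country": "UK"}]
orders = [
    {"user_id": 1, "order_id": "ORD-001", "amount": 999.99},
    {"user_id": 1, "order_id": "ORD-002", "amount": 29.99},
]


def assemble_user(user_id):
    """Rebuild one nested user object by matching rows on user_id (a manual JOIN)."""
    user = next(u for u in users if u["user_id"] == user_id)
    address = next(a for a in addresses if a["user_id"] == user_id)
    user_orders = [o for o in orders if o["user_id"] == user_id]
    return {
        "name": user["name"],
        "address": {"city": address["city"], "country": address["country"]},
        "orders": [
            {"order_id": o["order_id"], "amount": o["amount"]} for o in user_orders
        ],
    }


assembled = assemble_user(1)
print(assembled["address"]["city"])   # nested access, same shape as user_document
print(len(assembled["orders"]))
```

With the document model, the result of assemble_user is simply what is stored, so the read path is one lookup by _id rather than three matched scans.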

Core MongoDB Terminology

MongoDB uses its own vocabulary. Understanding the mapping between MongoDB terms and relational database terms makes everything click faster.

SQL Term     | MongoDB Term | Description
Database     | Database     | Top-level namespace containing collections
Table        | Collection   | Group of documents (like a table but schema-free)
Row          | Document     | A single record stored as BSON
Column       | Field        | A key-value pair inside a document
Primary Key  | _id field    | Unique identifier for every document
JOIN         | $lookup      | Aggregation stage to join collections
Index        | Index        | Same concept: speeds up queries
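
The JOIN row is the least obvious mapping, so here is a rough sketch of what a basic $lookup stage does. The lookup function is a plain-Python imitation written for this lesson (an assumption, not MongoDB code), and the users and orders lists are hypothetical sample data. The lookup_stage dictionary shows the shape of the real aggregation stage for comparison.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Imitate a basic $lookup: attach matching foreign documents as an array."""
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs
            if f.get(foreign_field) == doc.get(local_field)
        ]
        # Every input document is kept, even when nothing matches (left outer join)
        joined.append({**doc, as_field: matches})
    return joined


users = [
    {"_id": 1, "name": "Alice Johnson"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"_id": "ORD-001", "user_id": 1, "product": "Laptop"},
    {"_id": "ORD-002", "user_id": 1, "product": "Mouse"},
]

# The corresponding aggregation stage, written as the dictionary a driver would send
lookup_stage = {
    "$lookup": {
        "from": "orders",
        "localField": "_id",
        "foreignField": "user_id",
        "as": "orders",
    }
}

result = lookup(users, orders, "_id", "user_id", "orders")
for user in result:
    print(user["name"], len(user["orders"]))
```

Note that the joined documents land in an array field, so a user with no orders still appears, with an empty list.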

How MongoDB Stores Data — BSON

While you write and read data as JSON, MongoDB stores it internally as BSON (Binary JSON). BSON extends JSON with additional data types — dates, binary data, 64-bit integers, and the ObjectId type — and is designed for efficient encoding and decoding.

# BSON data types — richer than plain JSON

from bson import ObjectId  # the bson package ships with PyMongo (pip install pymongo)
from datetime import datetime

# ObjectId — 12-byte unique identifier generated automatically
doc_id = ObjectId()
print("ObjectId:", doc_id)
print("Timestamp embedded in ObjectId:", doc_id.generation_time)

# A BSON document with rich data types
bson_document = {
    "_id":        ObjectId(),                    # BSON ObjectId (not just a string)
    "name":       "Alice Johnson",               # String
    "age":        30,                            # Int32
    "balance":    9999.99,                       # Double
    "verified":   True,                          # Boolean
    "created_at": datetime(2024, 1, 15, 9, 30), # BSON Date (not a string)
    "scores":     [95, 87, 92],                  # Array
    "address":    {"city": "London"},            # Embedded document
    "avatar":     None                           # Null
}

print("\nDocument fields:")
for key, value in bson_document.items():
    print(f"  {key}: {type(value).__name__} = {value}")

Example output (ObjectId values and the embedded timestamp differ on every run):
ObjectId: 64a1f2e3b4c5d6e7f8a9b0c1
Timestamp embedded in ObjectId: 2023-07-02 21:57:55+00:00

Document fields:
  _id: ObjectId = 64a1f2e3b4c5d6e7f8a9b0c2
  name: str = Alice Johnson
  age: int = 30
  balance: float = 9999.99
  verified: bool = True
  created_at: datetime = 2024-01-15 09:30:00
  scores: list = [95, 87, 92]
  address: dict = {'city': 'London'}
  avatar: NoneType = None

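  • ObjectId is 12 bytes: a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter. This makes collisions extremely unlikely without a central coordinator
  • The timestamp embedded in ObjectId means documents are roughly sortable by creation time, to one-second resolution
  • BSON Date stores milliseconds since the Unix epoch as a 64-bit integer, which is far more reliable than storing dates as strings
  • BSON is designed for fast encoding, decoding, and traversal and supports richer types than plain JSON, although a BSON document is not always smaller than its JSON equivalent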
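
The 4 + 5 + 3 byte layout of an ObjectId can be checked by hand with the standard library alone. This is a minimal sketch: split_object_id and to_bson_millis are hypothetical helper names, and the hex string is the sample identifier used in this lesson.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into its three components."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # big-endian seconds since the Unix epoch
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                    # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"), # 3-byte incrementing counter
    }


def to_bson_millis(moment):
    """Convert an aware datetime to the integer a BSON Date stores."""
    return int(moment.timestamp() * 1000)


parts = split_object_id("64a1f2e3b4c5d6e7f8a9b0c1")
print("Created:", parts["created"])
print("Random: ", parts["random"])
print("Counter:", parts["counter"])
print("Millis: ", to_bson_millis(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)))
```

Because the timestamp occupies the leading bytes, comparing two ObjectIds byte by byte orders them by creation second first, which is why sorting on _id approximates sorting by insertion time.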

The MongoDB Query Language (MQL)

MongoDB uses its own query language — MQL — expressed as JSON-like objects rather than SQL strings. This keeps queries consistent with the document model and easy to build programmatically.
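
As a small illustration of building a query programmatically, the sketch below assembles a filter from optional arguments. build_user_filter is a hypothetical helper, and the field names follow the user document shown earlier in this lesson. It also type-checks its inputs, on the assumption that they may come from an untrusted source: a dictionary passed where a plain value is expected could otherwise smuggle in an operator such as $ne.

```python
def build_user_filter(min_age=None, city=None, verified=None):
    """Compose an MQL filter dict from whichever criteria were supplied."""
    query = {}
    if min_age is not None:
        # Reject bools explicitly, since bool is a subclass of int
        if isinstance(min_age, bool) or not isinstance(min_age, int):
            raise TypeError("min_age must be an integer")
        query["age"] = {"$gt": min_age}
    if city is not None:
        # Only a plain string is accepted, never a dict that could hold operators
        if not isinstance(city, str):
            raise TypeError("city must be a plain string")
        query["address.city"] = city  # dot notation reaches into the sub-document
    if verified is not None:
        query["verified"] = bool(verified)
    return query


print(build_user_filter(min_age=25, city="London"))
print(build_user_filter(verified=True))
print(build_user_filter())  # an empty filter matches every document
```

Each criterion is just another key in the dictionary, so optional filters need no string assembly or conditional WHERE clauses.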

# MQL vs SQL — same questions, different syntax

# SQL:  SELECT * FROM users WHERE age > 25 AND city = 'London'
# MQL:
mql_find = {
    "age":              {"$gt": 25},
    "address.city":     "London"       # dot notation for nested fields
}

# SQL:  SELECT name, email FROM users WHERE verified = true ORDER BY age DESC LIMIT 5
# MQL:
mql_query  = {"verified": True}
mql_project = {"name": 1, "email": 1, "_id": 0}   # 1 = include, 0 = exclude
mql_sort   = [("age", -1)]    # -1 = descending
mql_limit  = 5

# SQL:  UPDATE users SET age = 31 WHERE name = 'Alice'
# MQL:
mql_update_filter = {"name": "Alice"}
mql_update_op     = {"$set": {"age": 31}}

# SQL:  INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com')
# MQL:
mql_insert = {"name": "Bob", "email": "bob@example.com"}

print("MQL queries are Python dictionaries, so nothing is assembled by concatenating strings.")
print("Queries are built programmatically and are easy to compose dynamically in code.")

Output:
MQL queries are Python dictionaries, so nothing is assembled by concatenating strings.
Queries are built programmatically and are easy to compose dynamically in code.

  • MQL queries are JSON objects — they compose naturally in code without string concatenation
  • Dot notation ("address.city") queries nested fields directly — no joins or aliases needed
  • Operators like $gt, $lt, $in, $set follow a consistent $-prefix convention
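  • Because queries are data structures, not strings, classic string-concatenation SQL injection does not apply to MQL. Operator injection is still possible if untrusted input is passed through as a query object (for example {"$ne": null}), so user input should still be validated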
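
The dictionaries in the example are only query descriptions; they do nothing until handed to a driver. The sketch below shows one way they might be passed to PyMongo, using the find, sort, limit, update_one, and insert_one methods of its Collection and Cursor classes. The three wrapper functions are illustrative names, and the connection string, the "shop" database, and the "users" collection in the trailing comment are assumptions that require a running server.

```python
def find_top_verified(collection, limit=5):
    """SELECT name, email FROM users WHERE verified = true ORDER BY age DESC LIMIT 5"""
    cursor = (
        collection.find({"verified": True}, {"name": 1, "email": 1, "_id": 0})
        .sort("age", -1)   # -1 = descending
        .limit(limit)
    )
    return list(cursor)


def set_age(collection, name, age):
    """UPDATE users SET age = ? WHERE name = ? (first matching document only)"""
    result = collection.update_one({"name": name}, {"$set": {"age": age}})
    return result.modified_count


def add_user(collection, name, email):
    """INSERT INTO users (name, email) VALUES (?, ?)"""
    result = collection.insert_one({"name": name, "email": email})
    return result.inserted_id  # the _id MongoDB assigned to the new document


# With a MongoDB server running locally, a collection could be obtained like this:
#   from pymongo import MongoClient
#   users = MongoClient("mongodb://localhost:27017")["shop"]["users"]
#   print(find_top_verified(users))
```

One difference from the SQL statement is worth noting: update_one changes at most one document, whereas a SQL UPDATE affects every matching row. PyMongo's update_many is the closer equivalent when several documents should change.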

MongoDB Editions

MongoDB is available in several editions to suit different needs and budgets.

  • MongoDB Community Server: free and self-hosted, with all core features. Its source code is published under the Server Side Public License (SSPL). The right choice for development, small projects, and learning.
  • MongoDB Enterprise: paid, self-hosted. Adds advanced security (LDAP, Kerberos), encrypted storage engine, audit logging, and commercial support.
  • MongoDB Atlas: fully managed cloud database service. Runs on AWS, GCP, or Azure. No infrastructure management is needed, because MongoDB Inc. handles backups, scaling, patching, and monitoring. Free tier available.
  • MongoDB Atlas Serverless: consumption-based pricing that scales to zero. Pay only for operations performed, which suits intermittent workloads.

Summary Table

Concept    | Detail                         | Key Point
Document   | BSON record, nested, flexible  | Core unit of storage in MongoDB
Collection | Group of documents             | Equivalent to a SQL table
BSON       | Binary JSON with richer types  | Internal storage format, designed for fast encoding and decoding
ObjectId   | 12-byte unique identifier      | Auto-generated, timestamp-embedded, unique in practice
MQL        | MongoDB Query Language         | JSON-based queries, no SQL strings
Atlas      | Managed cloud service          | No infrastructure management, free tier available

Practice Questions

Practice 1. What does the name "MongoDB" derive from and what was its design goal?

Practice 2. What is the MongoDB equivalent of a SQL table?

Practice 3. What format does MongoDB use internally to store documents, and how does it differ from JSON?

Practice 4. What is an ObjectId and what information is embedded in it?

Practice 5. What syntax does MQL use to query a nested field such as the city inside an address sub-document?

Quiz

Quiz 1. In what year was MongoDB first released as open source?

Quiz 2. What is the unique identifier field automatically added to every MongoDB document?

Quiz 3. What does the $lookup stage in MongoDB correspond to in SQL?

Quiz 4. Which MongoDB edition is fully managed and runs on AWS, GCP, or Azure?

Quiz 5. Why does classic string-concatenation SQL injection not apply to MongoDB's MQL?

Next up — MongoDB Features & Use Cases: the full feature set and the real-world scenarios where MongoDB excels.