MongoDB
Introduction to Databases
Every application that stores information, whether a social media platform, a banking system, an e-commerce store, or a hospital record system, relies on a database. A database is an organised collection of data stored and accessed electronically. Without persistent storage, everything you entered into an app would vanish when you closed it. Databases go further: they let information persist reliably, scale, and stay consistent across millions of users.
Before diving into MongoDB, you need a solid understanding of what databases are, why they exist, how they evolved, and the core concepts that underpin every database system you will ever work with.
What is a Database?
A database is more than just a place to store data. It is a structured system that allows you to store, retrieve, update, and delete data efficiently and reliably — even when thousands of users are accessing it simultaneously.
Why it exists: before databases, applications stored data in flat files — plain text or CSV files on disk. These worked for tiny amounts of data but fell apart quickly: no way to search efficiently, no protection against two users writing at the same time, no way to link related data, and no recovery if the file got corrupted.
Real-world use: when you log into Instagram, a database looks up your account. When you post a photo, a database records the post and its metadata (large files such as the image itself are typically kept in separate file or object storage). When you like a post, a database records that interaction. Every tap, swipe, and search on a major app triggers database operations that typically complete in milliseconds.
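The four operations a database supports (store, retrieve, update, delete, often shortened to CRUD) can be sketched with Python's built-in sqlite3 module. SQLite is used only because it ships with Python. The `users` table and its columns are made up for the example, and MongoDB expresses the same operations with its own query API rather than SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database: nothing is written to disk
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE)"
)

# Create: store a new record.
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Alice", "alice@example.com"))

# Read: retrieve a record by a condition instead of reading a whole file.
row = conn.execute(
    "SELECT id, name, email FROM users WHERE email = ?", ("alice@example.com",)
).fetchone()
print(row)  # (1, 'Alice', 'alice@example.com')

# Update: change one field of an existing record.
conn.execute("UPDATE users SET name = ? WHERE id = ?", ("Alice Smith", row[0]))
updated = conn.execute("SELECT name FROM users WHERE id = ?", (row[0],)).fetchone()

# Delete: remove the record entirely.
conn.execute("DELETE FROM users WHERE id = ?", (row[0],))
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

conn.commit()  # commit the transaction (nothing persists here, since the database is in memory)
```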
Key Database Concepts
These terms appear in every database system — relational or otherwise — and are worth understanding from the start.
- Data — raw facts and figures: a name, a price, a timestamp, a location coordinate
- Information — data that has been processed and given meaning: "Alice spent $49.99 on 3rd March 2025"
- Database — an organised collection of related data stored together
- DBMS (Database Management System) — the software that manages the database: MySQL, PostgreSQL, MongoDB, Oracle. You interact with the DBMS, and it handles the data on disk.
- Query — a request to retrieve or manipulate data. "Find all orders over $100 placed in the last 7 days" is a query.
- Schema — the structure or blueprint of how data is organised: what fields exist, what types they are, what rules they follow
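To make schema, data, query, and information concrete, here is a small sketch using Python's built-in sqlite3 module. The `orders` table, its rows, and the fixed "current time" are invented for illustration; the query is the "orders over $100 placed in the last 7 days" request from the list, written in SQL.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")

# Schema: every order must have a customer, a non-negative total, and a timestamp.
conn.execute(
    """
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        customer  TEXT NOT NULL,
        total     REAL NOT NULL CHECK (total >= 0),
        placed_at TEXT NOT NULL  -- ISO-8601 text, so string order matches time order
    )
    """
)

now = datetime(2025, 3, 10, 12, 0, 0)  # fixed "current time" keeps the example reproducible
orders = [  # data: raw facts
    ("Alice", 149.99, now - timedelta(days=2)),
    ("Bob", 250.00, now - timedelta(days=10)),  # too old
    ("Carol", 80.00, now - timedelta(days=1)),  # too small
    ("Dave", 120.00, now - timedelta(days=6)),
]
conn.executemany(
    "INSERT INTO orders (customer, total, placed_at) VALUES (?, ?, ?)",
    [(c, t, ts.isoformat()) for c, t, ts in orders],
)

# The schema rejects data that breaks its rules.
try:
    conn.execute(
        "INSERT INTO orders (customer, total, placed_at) VALUES (?, ?, ?)",
        ("Eve", -5.0, now.isoformat()),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Query: "find all orders over $100 placed in the last 7 days".
cutoff = (now - timedelta(days=7)).isoformat()
recent_big = conn.execute(
    "SELECT customer, total FROM orders WHERE total > ? AND placed_at >= ? ORDER BY placed_at",
    (100, cutoff),
).fetchall()

# Information: the same rows, given meaning.
for customer, total in recent_big:
    print(f"{customer} placed an order of ${total:.2f} in the last 7 days")
```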
A Brief History of Databases
Understanding how databases evolved helps you understand why different types exist today and when to use each one.
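- 1960s — File systems and the first DBMSs: most data lived in flat files with no structure, no search, and no relationships, so reading data often meant reading the entire file. The earliest database systems, such as IBM's hierarchical IMS and databases following the CODASYL network model, linked records through fixed navigation paths that were hard to change.
- 1970s — Relational model: in 1970, Edgar F. Codd at IBM proposed organising data into tables with rows and columns, connected by relationships. IBM researchers then developed SQL (originally called SEQUEL) to query such tables. The relational model became the dominant paradigm for the next 40 years.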
- 1980s–90s — RDBMS dominance: Oracle, IBM DB2, Microsoft SQL Server, and later MySQL and PostgreSQL became the backbone of enterprise software.
- 2000s — The web explosion: the internet scaled to billions of users. Traditional relational databases struggled with the volume, velocity, and variety of web-scale data. New requirements emerged — flexible schemas, horizontal scaling, unstructured data.
- Late 2000s — NoSQL movement: new database types emerged to address web-scale challenges, including document stores (MongoDB), key-value stores (Redis), wide-column stores (Cassandra), and graph databases (Neo4j). MongoDB was first released in 2009.
- Today — Polyglot persistence: modern systems use multiple database types together. A single application might use PostgreSQL for financial records, MongoDB for user profiles, Redis for caching, and Elasticsearch for search.
Types of Databases
Databases are not one-size-fits-all. Different problems call for different database designs.
- Relational (SQL) — data stored in tables with fixed schemas and relationships. Best for structured, consistent data with complex relationships. Examples: MySQL, PostgreSQL, Oracle, SQL Server.
- Document — data stored as flexible JSON-like documents. Best for varied, nested, or rapidly changing data. Examples: MongoDB, CouchDB, Firestore.
- Key-Value — data stored as simple key→value pairs. Extremely fast for lookups. Best for caching, sessions, counters. Examples: Redis, DynamoDB.
- Column-Family (wide-column) — data grouped into rows under a partition key, where each row can hold a different set of columns. Best for very high write volumes and time-series data spread across many servers. Examples: Apache Cassandra, HBase.
- Graph — data stored as nodes and edges. Best for relationship-heavy data such as social networks and fraud detection. Examples: Neo4j, Amazon Neptune.
- Time-Series — optimised for data indexed by time. Best for metrics, IoT sensors, financial ticks. Examples: InfluxDB, TimescaleDB.
- Search Engines — optimised for full-text search. Examples: Elasticsearch, Solr.
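The difference between these models is easiest to see in the shape of the data. The sketch below uses plain Python structures as stand-ins for four of the models; the customer, orders, session key, and "FOLLOWS" edges are hypothetical, and none of this reflects how the named products store data internally.

```python
import json

# Relational: fixed columns; related orders live in a separate table, linked by customer_id.
customer_row = (1, "Alice", "alice@example.com")  # (id, name, email)
order_rows = [(101, 1, 49.99), (102, 1, 120.00)]  # (order_id, customer_id, total)

# Document: one self-contained, nested record, similar in shape to a MongoDB document.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "orders": [{"order_id": 101, "total": 49.99}, {"order_id": 102, "total": 120.00}],
}

# Key-value: an opaque value looked up by a single key (for example, a cached session).
kv_store = {"session:abc123": json.dumps({"user_id": 1})}

# Graph: relationships are first-class data (edges between named nodes).
edges = {("Alice", "FOLLOWS", "Bob"), ("Bob", "FOLLOWS", "Carol")}

# The same kinds of question, answered against each shape.
relational_total = sum(total for _, cust_id, total in order_rows if cust_id == customer_row[0])
document_total = sum(order["total"] for order in customer_doc["orders"])
session_user = json.loads(kv_store["session:abc123"])["user_id"]
followed_by_alice = {dst for src, rel, dst in edges if src == "Alice" and rel == "FOLLOWS"}
```

The relational version needs a join-like step (matching `customer_id`) to answer "what did Alice spend?", while the document version already holds the orders inside the customer record.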
The ACID Properties
ACID is the set of guarantees that make a database transaction reliable. Understanding ACID tells you exactly what a database promises to do to protect your data.
- Atomicity — a transaction is all-or-nothing. If you transfer $500 from account A to account B, either both the debit and credit happen, or neither does. No half-transactions.
- Consistency — a transaction brings the database from one valid state to another. Rules and constraints are never violated. You cannot end up with a negative bank balance if the rules forbid it.
- Isolation — concurrent transactions do not interfere with each other; at the strictest (serializable) level, the result is as if they had run one after another. Two users booking the last seat on a flight at the same time will not both succeed.
- Durability — once a transaction is committed, it is permanent. Even if the server crashes immediately after, the data is not lost.
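Atomicity and consistency can be watched in action with Python's built-in sqlite3 module and a CHECK constraint. The `accounts` table and the `transfer` function below are hypothetical, and the SQL is SQLite's, not MongoDB's. The credit is applied before the debit on purpose, so that a failed debit leaves a half-finished transfer for the rollback to undo.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode,
# so transactions are controlled explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 700), ('B', 100)")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        conn.execute("BEGIN")
        # Credit first: this statement succeeds even if the debit below will fail.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # Debit: violates the CHECK constraint if it would make the balance negative.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # atomicity: the earlier credit is undone too
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_first = transfer(conn, "A", "B", 500)   # A has 700, so this succeeds
after_first = balances(conn)               # {'A': 200, 'B': 600}
ok_second = transfer(conn, "A", "B", 500)  # A has only 200, so the debit fails
after_second = balances(conn)              # unchanged: no half-transaction
```

Because the second transfer is rolled back as a whole, B never keeps the 500 that A could not pay, and the "no negative balance" rule (consistency) is never violated.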
The CAP Theorem
The CAP theorem states that a distributed database can guarantee at most two of the following three properties at the same time — never all three.
- Consistency — every read receives the most recent write, or an error
- Availability — every request receives a non-error response, though not necessarily the most recent data
- Partition Tolerance — the system continues operating even when network partitions occur between nodes
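Since network partitions are inevitable in distributed systems, the real choice is what to give up while a partition lasts: CP systems (consistency + partition tolerance) may refuse requests, and AP systems (availability + partition tolerance) may return stale data. MongoDB is configurable. It leans CP by default, because a replica set routes reads and writes to a single primary, and a primary that loses contact with a majority of members steps down, so the minority side cannot accept writes. Reading from secondaries (for example with the secondaryPreferred read preference) or using weaker read and write concerns tilts it toward AP.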
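The CP/AP choice can be simulated with a toy three-replica cluster. `ToyCluster`, `PartitionError`, and the `mode` switch are hypothetical names invented for this sketch; the switch stands in for the kind of tuning described above, and this is not how MongoDB actually replicates data. In CP mode the cluster only answers when it can reach a majority of replicas; in AP mode it always answers from the nearest replica.

```python
class PartitionError(Exception):
    """Raised when the toy CP cluster refuses a request it cannot answer consistently."""


class ToyCluster:
    """Hypothetical replicated store; `mode` ('CP' or 'AP') decides behaviour during a partition."""

    def __init__(self, mode, n_replicas=3):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [(1, "v1")] * n_replicas  # (version, value); all replicas start in sync
        self.reachable = set(range(n_replicas))   # indexes of replicas this client can contact

    def partition(self, reachable):
        """Simulate a network split: only the given replicas can be contacted."""
        self.reachable = set(reachable)

    def _has_majority(self):
        return len(self.reachable) > len(self.replicas) // 2

    def write(self, value):
        if self.mode == "CP" and not self._has_majority():
            raise PartitionError("no majority reachable: refusing write")
        version = max(self.replicas[i][0] for i in self.reachable) + 1
        for i in self.reachable:
            self.replicas[i] = (version, value)

    def read(self):
        if self.mode == "CP":
            if not self._has_majority():
                raise PartitionError("no majority reachable: refusing read")
            # CP writes always reach a majority, and any two majorities share a replica,
            # so the highest version among the reachable replicas is the latest write.
            return max(self.replicas[i] for i in self.reachable)[1]
        # AP: answer from the nearest reachable replica, even if it is stale.
        return self.replicas[min(self.reachable)][1]


def demo(mode):
    cluster = ToyCluster(mode)
    cluster.partition({1, 2})  # a writer on the majority side updates the value
    cluster.write("v2")
    cluster.partition({0})     # our reader is cut off with replica 0 alone
    try:
        return cluster.read()
    except PartitionError as exc:
        return f"error: {exc}"


print(demo("CP"))  # error: no majority reachable: refusing read
print(demo("AP"))  # v1  (stale: the latest write was v2)
```

The CP cluster gives up availability (it returns an error) to avoid a stale answer; the AP cluster gives up consistency (it returns the old value) to keep answering.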
How a DBMS Works — The Big Picture
When your application queries a database, a lot happens behind the scenes.
- Query parser — receives the query string and checks syntax
- Query optimiser — finds the most efficient way to execute the query, deciding whether to use an index, which order to join tables, etc.
- Storage engine — reads and writes data to/from disk in an efficient format
- Buffer pool / cache — keeps recently accessed data in memory to avoid slow disk reads
- Transaction manager — ensures ACID properties are maintained during concurrent access
- Lock manager — controls which transactions can read/write specific data simultaneously
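You can ask a real query optimiser what it decided. SQLite, bundled with Python as the sqlite3 module, exposes this through `EXPLAIN QUERY PLAN`; the table, generated rows, and index name below are invented for the example. MongoDB has an analogous `explain()` facility, though its output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer{i % 100}", float(i)) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")


def plan(sql, params=()):
    """Return the optimiser's chosen strategy as a list of human-readable steps."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description text


# Filtering on the indexed column: the optimiser can jump straight to matching rows.
indexed_plan = plan("SELECT * FROM orders WHERE customer = ?", ("customer7",))
# Filtering on an unindexed column: the only option is to scan the whole table.
unindexed_plan = plan("SELECT * FROM orders WHERE total > ?", (500.0,))

print(indexed_plan)
print(unindexed_plan)
```

The exact wording varies between SQLite versions, but the first plan names `idx_orders_customer` (an index search) and the second reports a full table `SCAN`.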
Why MongoDB?
MongoDB fits a specific set of problems exceptionally well — and understanding those problems helps you know when to reach for it.
- Flexible schema — documents in the same collection can have different fields. Perfect when your data structure evolves rapidly during development or varies per record.
- JSON-native — data is stored as BSON (Binary JSON), which extends JSON with extra types such as dates, binary data, and 64-bit integers. Application objects map closely to documents, so there is usually little translation between code and storage.
- Horizontal scaling — MongoDB scales out by distributing data across many servers (sharding), not just up by buying a bigger server.
- Developer speed — documents often have the same shape as the objects in application code and the JSON that APIs send and receive. Less time mapping, more time building.
- Rich queries — despite being a NoSQL database, MongoDB supports complex queries, aggregation pipelines, geospatial queries, and full-text search.
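To picture a flexible schema and nested documents, here is a plain-Python sketch of a "collection". The `users` list and the `find` and `get_path` helpers are hypothetical and handle equality matches only; they are not the MongoDB driver's API, although MongoDB's own query language uses the same dot notation for nested fields, as in `{"address.city": "London"}`.

```python
import json

# A hypothetical in-memory "collection": its documents do not share a fixed set of fields.
users = [
    {"_id": 1, "name": "Alice", "address": {"city": "London"}, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},  # no address, extra email field
    {"_id": 3, "name": "Carol", "address": {"city": "Paris", "zip": "75001"}},
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'address.city' into nested dicts; None if it is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection, filter_):
    """Return documents whose fields equal every value in filter_ (equality only)."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in filter_.items())
    ]


londoners = find(users, {"address.city": "London"})
print([doc["name"] for doc in londoners])  # ['Alice']

# Documents like these serialise to JSON directly; MongoDB stores them as BSON.
print(json.dumps(users[1]))  # {"_id": 2, "name": "Bob", "email": "bob@example.com"}
```

Bob's document has no `address` at all, yet it sits in the same collection and simply does not match the filter; a relational table would instead need a nullable `city` column for every row.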
Summary Table
| Concept | What It Means | Why It Matters |
|---|---|---|
| Database | Organised collection of persistent data | Foundation of every application that stores state |
| DBMS | Software managing the database | Handles storage, retrieval, concurrency, and recovery |
| Schema | Blueprint of data structure | Defines what data is valid and how it is organised |
| ACID | Atomicity, Consistency, Isolation, Durability | Guarantees data integrity during transactions |
| CAP Theorem | Consistency, Availability, Partition Tolerance — pick two | Explains trade-offs in distributed database design |
| NoSQL | Non-relational databases (document, key-value, graph…) | Built for flexibility, scale, and unstructured data |
Practice Questions
Practice 1. What is a DBMS and how does it differ from a database?
Practice 2. What does the A in ACID stand for and what does it mean?
Practice 3. Which database type is best suited for relationship-heavy data like social networks?
Practice 4. What format does MongoDB use to store data?
Practice 5. What does the CAP theorem state about distributed databases?
Quiz
Quiz 1. Which decade saw the birth of the relational database model?
Quiz 2. What does Durability in ACID guarantee?
Quiz 3. Which component of a DBMS decides the most efficient way to execute a query?
Quiz 4. What is polyglot persistence?
Quiz 5. Which property of MongoDB makes it particularly developer-friendly compared to relational databases?
Next up — SQL vs NoSQL Databases: understanding the fundamental differences, trade-offs, and when to choose each.