MongoDB Lesson 1 – Introduction to Databases | Dataplexa

Introduction to Databases

Every application that stores information — a social media platform, a banking system, an e-commerce store, a hospital record system — relies on a database. A database is an organised collection of data stored and accessed electronically. Without databases, every time you closed an app, all your data would vanish. Databases are what make information persist, scale, and stay consistent across millions of users.

Before diving into MongoDB, you need a solid understanding of what databases are, why they exist, how they evolved, and the core concepts that underpin every database system you will ever work with.

What is a Database?

A database is more than just a place to store data. It is a structured system that allows you to store, retrieve, update, and delete data efficiently and reliably — even when thousands of users are accessing it simultaneously.

Why it exists: before databases, applications stored data in flat files — plain text or CSV files on disk. These worked for tiny amounts of data but fell apart quickly: no way to search efficiently, no protection against two users writing at the same time, no way to link related data, and no recovery if the file got corrupted.

Real-world use: when you log into Instagram, a database looks up your account. When you post a photo, a database stores it. When you like a post, a database records that interaction. Every tap, swipe, and search on every major app is powered by database operations happening in milliseconds.

Key Database Concepts

These terms appear in every database system — relational or otherwise — and are worth understanding from the start.

Data — raw facts and figures: a name, a price, a timestamp, a location coordinate
Information — data that has been processed and given meaning: "Alice spent $49.99 on 3rd March 2025"
Database — an organised collection of related data stored together
DBMS (Database Management System) — the software that manages the database: MySQL, PostgreSQL, MongoDB, Oracle. You interact with the DBMS, and it handles the data on disk.
Query — a request to retrieve or manipulate data. "Find all orders over $100 placed in the last 7 days" is a query.
Schema — the structure or blueprint of how data is organised: what fields exist, what types they are, what rules they follow
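
The terms above can be seen together in a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the orders table, its columns, and the sample rows are hypothetical and chosen only to mirror the example query in the list.

```python
import sqlite3
from datetime import date, timedelta

# sqlite3 plays the role of the DBMS; ":memory:" keeps the database in RAM.
conn = sqlite3.connect(":memory:")

# Schema: which fields exist, their types, and the rules they follow.
conn.execute(
    """
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        customer  TEXT NOT NULL,
        total     REAL NOT NULL CHECK (total >= 0),
        placed_on TEXT NOT NULL  -- ISO date string, e.g. 2025-03-03
    )
    """
)

# Data: raw facts (hypothetical sample rows, dated relative to today).
today = date.today()
rows = [
    ("Alice", 49.99, (today - timedelta(days=2)).isoformat()),
    ("Bob", 150.00, (today - timedelta(days=3)).isoformat()),
    ("Carol", 320.50, (today - timedelta(days=30)).isoformat()),
]
conn.executemany(
    "INSERT INTO orders (customer, total, placed_on) VALUES (?, ?, ?)", rows
)

# Query: "find all orders over $100 placed in the last 7 days".
cutoff = (today - timedelta(days=7)).isoformat()
recent_large_orders = conn.execute(
    "SELECT customer, total FROM orders WHERE total > ? AND placed_on >= ?",
    (100, cutoff),
).fetchall()

print(recent_large_orders)
```

Only Bob's order matches both conditions: Alice's is too small and Carol's is too old. The application states what it wants, and the DBMS works out how to find it.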

A Brief History of Databases

Understanding how databases evolved helps you understand why different types exist today and when to use each one.

1960s — File systems: data stored in flat files. No structure, no search, no relationships. Reading data meant reading the entire file.
1970s — Relational model: Edgar Codd at IBM proposed organising data into tables with rows and columns, connected by relationships. SQL was born. This became the dominant paradigm for the next 40 years.
1980s–90s — RDBMS dominance: Oracle, IBM DB2, Microsoft SQL Server, and later MySQL and PostgreSQL became the backbone of enterprise software.
2000s — The web explosion: the internet scaled to billions of users. Traditional relational databases struggled with the volume, velocity, and variety of web-scale data. New requirements emerged — flexible schemas, horizontal scaling, unstructured data.
2009 — NoSQL movement: new database types emerged to address web-scale challenges — document stores (MongoDB), key-value stores (Redis), column stores (Cassandra), graph databases (Neo4j). MongoDB was first released in 2009.
Today — Polyglot persistence: modern systems use multiple database types together. A single application might use PostgreSQL for financial records, MongoDB for user profiles, Redis for caching, and Elasticsearch for search.

Types of Databases

Databases are not one-size-fits-all. Different problems call for different database designs.

Relational (SQL) — data stored in tables with fixed schemas and relationships. Best for structured, consistent data with complex relationships. Examples: MySQL, PostgreSQL, Oracle, SQL Server.
Document — data stored as flexible JSON-like documents. Best for varied, nested, or rapidly changing data. Examples: MongoDB, CouchDB, Firestore.
Key-Value — data stored as simple key→value pairs. Extremely fast for lookups. Best for caching, sessions, counters. Examples: Redis, DynamoDB.
Column-Family — data stored in rows whose columns are grouped into families, so each row can hold a different set of columns. Best for write-heavy workloads and time-series data at very large scale. Examples: Apache Cassandra, HBase.
Graph — data stored as nodes and edges. Best for relationship-heavy data like social networks, fraud detection. Examples: Neo4j, Amazon Neptune.
Time-Series — optimised for data indexed by time. Best for metrics, IoT sensors, financial ticks. Examples: InfluxDB, TimescaleDB.
Search Engines — optimised for full-text search. Examples: Elasticsearch, Solr.
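
To make the differences concrete, here is a rough sketch of how similar data might be shaped under four of the models above. It uses plain Python structures only, with no real database involved, and every name and value is hypothetical.

```python
# Relational: flat rows in separate tables, linked by customer_id.
customers = [{"customer_id": 1, "name": "Alice"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 49.99},
    {"order_id": 11, "customer_id": 1, "total": 120.00},
]

# Document: one nested, JSON-like record holds the customer and their orders.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "orders": [
        {"order_id": 10, "total": 49.99},
        {"order_id": 11, "total": 120.00},
    ],
}

# Key-value: a value looked up by a single key, nothing more.
kv_store = {"session:alice": "logged_in", "cart_count:alice": 2}

# Graph: nodes plus edges that describe relationships.
nodes = {"alice", "bob", "carol"}
edges = {("alice", "follows", "bob"), ("bob", "follows", "carol")}

# Relational access needs a join-like step across the two tables.
alice_totals_relational = [
    o["total"]
    for o in orders
    for c in customers
    if c["customer_id"] == o["customer_id"] and c["name"] == "Alice"
]

# Document access reads the nested data from a single record.
alice_totals_document = [o["total"] for o in customer_doc["orders"]]

# Graph access walks edges from a starting node.
followed_by_alice = {
    dst for src, rel, dst in edges if src == "alice" and rel == "follows"
}
```

Both the relational and document shapes yield the same order totals; what differs is whether the data must be joined at read time or is already stored together.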

The ACID Properties

ACID is the set of guarantees that make a database transaction reliable. Understanding ACID tells you what a database promises to protect your data.

Atomicity — a transaction is all-or-nothing. If you transfer $500 from account A to account B, either both the debit and credit happen, or neither does. No half-transactions.
Consistency — a transaction brings the database from one valid state to another. Rules and constraints are never violated. You cannot end up with a negative bank balance if the rules forbid it.
Isolation — concurrent transactions do not interfere with each other. Two users booking the last seat on a flight at the same time will not both succeed.
Durability — once a transaction is committed, it is permanent. Even if the server crashes immediately after, the data is not lost.
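
Atomicity and consistency can be demonstrated with a small sketch of the bank transfer example. It assumes Python's built-in sqlite3 module; the accounts table, the transfer helper, and the starting balances are hypothetical. The credit is deliberately applied before the debit so that a failed transfer has already made one change when the error occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 800), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if the transaction committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


first_ok = transfer(conn, "A", "B", 500)   # A has 800, so this succeeds
after_first = balances(conn)
second_ok = transfer(conn, "A", "B", 500)  # A now has 300, so the debit fails
after_second = balances(conn)
```

After the second call the balances are unchanged from after_first: the credit to B that ran before the failing debit has been undone, which is the all-or-nothing behaviour atomicity describes.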

The CAP Theorem

The CAP theorem states that a distributed database can guarantee at most two of the following three properties at the same time — never all three.

Consistency — every read receives the most recent write
Availability — every request receives a response (not necessarily the most recent data)
Partition Tolerance — the system continues operating even when network partitions occur between nodes

Since network partitions are inevitable in distributed systems, real systems choose between CP (consistency plus partition tolerance, at the cost of possibly being unavailable during a partition) and AP (availability plus partition tolerance, at the cost of possibly returning stale data during a partition). MongoDB is configurable: it leans CP by default but can be tuned toward AP, for example by allowing reads from secondary replicas.
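
The CP versus AP choice can be illustrated with a deliberately simplified toy model. This is not how MongoDB or any real database implements replication; the Replica and ToyCluster classes are hypothetical and exist only to show the two behaviours from the point of view of a client that can reach just the secondary during a partition.

```python
class Replica:
    """A toy replica holding a single value; not a real database node."""

    def __init__(self, value):
        self.value = value


class ToyCluster:
    def __init__(self, mode):
        self.mode = mode  # "CP" or "AP"
        self.primary = Replica("v1")
        self.secondary = Replica("v1")
        self.partitioned = False

    def write(self, value):
        # Writes land on the primary; replication needs a working network.
        self.primary.value = value
        if not self.partitioned:
            self.secondary.value = value

    def read_from_secondary(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP: always answer, even though the value may be stale.
        return self.secondary.value


# AP cluster: the read succeeds but returns the old value.
ap = ToyCluster("AP")
ap.partitioned = True
ap.write("v2")
ap_read = ap.read_from_secondary()

# CP cluster: the read is rejected instead of returning stale data.
cp = ToyCluster("CP")
cp.partitioned = True
cp.write("v2")
try:
    cp.read_from_secondary()
    cp_available = True
except RuntimeError:
    cp_available = False
```

In the AP case the client gets "v1" although "v2" has been written; in the CP case it gets an error. Neither is wrong: they are different answers to the same trade-off.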

How a DBMS Works — The Big Picture

When your application queries a database, a lot happens behind the scenes.

Query parser — receives the query string and checks syntax
Query optimiser — finds the most efficient way to execute the query, deciding whether to use an index, which order to join tables, etc.
Storage engine — reads and writes data to/from disk in an efficient format
Buffer pool / cache — keeps recently accessed data in memory to avoid slow disk reads
Transaction manager — ensures ACID properties are maintained during concurrent access
Lock manager — controls which transactions can read/write specific data simultaneously
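
The optimiser's decision about indexes can be observed directly in some systems. The sketch below assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the orders table and the index name idx_orders_customer are hypothetical. The exact wording of the plan text varies between SQLite versions, so the code only inspects it loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(conn):
    """Ask the DBMS how it intends to run QUERY, without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is a human-readable description of the step.
    return " | ".join(row[-1] for row in rows)


# Without an index, the only option is to scan the whole table.
plan_without_index = query_plan(conn)

# Once an index exists, the optimiser can choose to search it instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls. Only the plan changes, which is the point: the application describes the result it wants, and the optimiser picks the access path.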

Why MongoDB?

MongoDB fits a specific set of problems exceptionally well — and understanding those problems helps you know when to reach for it.

Flexible schema — documents in the same collection can have different fields. Perfect when your data structure evolves rapidly during development or varies per record.
JSON-native — data is stored as BSON (Binary JSON). Your application objects map directly to documents — no complex translation layer between code and storage.
Horizontal scaling — MongoDB scales out by distributing data across many servers (sharding), not just up by buying a bigger server.
Developer speed — the document model matches how developers think and how APIs send data. Less time mapping, more time building.
Rich queries — despite being a NoSQL database, MongoDB supports complex queries, aggregation pipelines, geospatial queries, and full-text search.
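
The flexible-schema and JSON-native points can be sketched without a running MongoDB server. The example below uses plain Python dictionaries and the standard json module as stand-ins for documents and BSON; it does not use a MongoDB driver, and the users list and the find helper are hypothetical.

```python
import json

# A "collection" modelled as a list of dicts; each dict stands in for a document.
# Note that the three documents do not share the same set of fields.
users = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "address": {"city": "Leeds", "postcode": "LS1"}},
    {"_id": 3, "name": "Carol", "tags": ["admin", "beta"], "address": {"city": "Leeds"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Documents without an "address" field are skipped rather than causing an error.
leeds_users = [
    doc["name"] for doc in users
    if doc.get("address", {}).get("city") == "Leeds"
]

# A nested application object converts to JSON text and back unchanged.
round_tripped = json.loads(json.dumps(users[1]))

alice = find(users, name="Alice")
```

Here leeds_users contains Bob and Carol, and Alice's document is left alone even though it has no address. In a relational table, the same variation would typically need nullable columns or extra tables.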

Summary Table

Concept	What It Means	Why It Matters
Database	Organised collection of persistent data	Foundation of every application that stores state
DBMS	Software managing the database	Handles storage, retrieval, concurrency, and recovery
Schema	Blueprint of data structure	Defines what data is valid and how it is organised
ACID	Atomicity, Consistency, Isolation, Durability	Guarantees data integrity during transactions
CAP Theorem	Consistency, Availability, Partition Tolerance — pick two	Explains trade-offs in distributed database design
NoSQL	Non-relational databases (document, key-value, graph…)	Built for flexibility, scale, and unstructured data

Practice Questions

Practice 1. What is a DBMS and how does it differ from a database?

Practice 2. What does the A in ACID stand for and what does it mean?

Practice 3. Which database type is best suited for relationship-heavy data like social networks?

Practice 4. What format does MongoDB use to store data?

Practice 5. What does the CAP theorem state about distributed databases?

Quiz

Quiz 1. Which decade saw the birth of the relational database model?

1960s
1970s
1980s
2000s

Quiz 2. What does Durability in ACID guarantee?

Once a transaction is committed it is permanent even if the server crashes
Data is never deleted from disk
Concurrent transactions never conflict
All reads return the same value

Quiz 3. Which component of a DBMS decides the most efficient way to execute a query?

Query optimiser
Storage engine
Lock manager
Buffer pool

Quiz 4. What is polyglot persistence?

Using multiple different database types together in one application
Storing data in multiple languages
A database that supports SQL and NoSQL queries
Replicating data across multiple servers

Quiz 5. Which property of MongoDB makes it particularly developer-friendly compared to relational databases?

Flexible schema — documents in the same collection can have different fields
It uses SQL for all queries
It only runs in the cloud
It stores data in CSV format

Next up — SQL vs NoSQL Databases: understanding the fundamental differences, trade-offs, and when to choose each.
