PostgreSQL Architecture
Most people use PostgreSQL without ever thinking about what is happening underneath. That is fine for writing basic queries — but once you start optimising performance, debugging slow queries, or designing production systems, understanding the architecture becomes essential. This lesson walks you through how PostgreSQL is structured internally, how it processes your queries, and how all the pieces fit together.
The Big Picture
PostgreSQL follows a client-server architecture. The server runs continuously in the background, managing data and waiting for connections. Clients — whether your application, psql, or pgAdmin — connect to the server, send SQL queries, and receive results back. The client and server can be on the same machine or on completely different machines across a network.
Every time a client connects, PostgreSQL spawns a dedicated backend process just for that connection. This process handles all queries from that client until the connection closes. This design means each connection is fully isolated — one slow query from one user does not directly interfere with another user's connection.
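You can observe this process-per-connection model directly (assuming a running server and permission to read system views): the pg_stat_activity view lists one row per backend process.

```sql
-- Each row is one backend process; pid is the operating-system
-- process ID the postmaster spawned for that connection.
SELECT pid, usename, application_name, state
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

Open a second psql session and run this again; a new row appears for the new backend.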
The Postmaster — The Master Process
When PostgreSQL starts, the first thing that runs is the postmaster process (also called postgres in modern versions). This is the master supervisor process. Its job is to listen for incoming connections on its port (5432 by default) and spawn a new backend process for each one. It also monitors all running processes; if one crashes unexpectedly, it cleans up shared state and restarts processes to keep the system consistent.
You never interact with the postmaster directly — it works silently in the background managing everything. Think of it as the front desk of a hotel. You walk in (connect), the front desk assigns you a room (backend process), and from that point your room handles everything for your stay.
Background Worker Processes
Beyond the postmaster and backend processes, PostgreSQL runs several specialised background workers that keep the database healthy and performant at all times.
WAL Writer continuously writes changes to the Write-Ahead Log — a durability mechanism that ensures no committed data is ever lost even if the server crashes. Before any change is applied to the actual data files, it is written to the WAL first.
Checkpointer periodically flushes dirty data pages from memory to disk. This keeps the database files up to date and limits how much WAL needs to be replayed during crash recovery.
Autovacuum is one of PostgreSQL's most important background workers. When rows are updated or deleted in PostgreSQL, the old versions are not immediately removed — they are marked as dead. Autovacuum runs quietly in the background, cleaning up these dead rows and reclaiming space. Without it, tables would grow endlessly and queries would slow down.
Background Writer writes shared buffer pages to disk proactively, reducing the work the checkpointer needs to do in big bursts.
Stats Collector tracks detailed statistics about table usage, query activity, and index hits. These feed the pg_stat_* monitoring views and help autovacuum decide when a table needs attention. (Since PostgreSQL 15 this is no longer a separate process; the statistics are kept in shared memory.) The row-distribution statistics the query planner uses for its cost estimates come from a different source: ANALYZE, which autovacuum runs automatically.
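You can watch some of this background work yourself (a sketch, assuming a running server): pg_stat_user_tables reports how many dead rows each table carries and when autovacuum last cleaned it up.

```sql
-- Dead-row counts and last (auto)vacuum/analyze times per table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

A table with a persistently high n_dead_tup relative to n_live_tup is a sign autovacuum is falling behind.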
Shared Memory — The Shared Buffer Cache
PostgreSQL allocates a block of shared memory when it starts, and the most important part of this is the shared buffer cache. When PostgreSQL reads data from disk, it keeps a copy in this cache. The next time the same data is needed, it comes from memory instead of disk — which is orders of magnitude faster.
The size of the shared buffer cache is controlled by the shared_buffers configuration parameter. By default it is set conservatively low (128MB). For a production server, the recommended value is around 25% of total RAM. A database that fits largely in the buffer cache will be dramatically faster than one that constantly reads from disk.
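You can check and change this setting from SQL. Note that shared_buffers only takes effect after a server restart, and the 4GB below is purely illustrative; size it to roughly 25% of your machine's RAM.

```sql
-- Current value (the default is typically 128MB).
SHOW shared_buffers;

-- Persist a new value; applied on the next server restart.
ALTER SYSTEM SET shared_buffers = '4GB';
```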
The Query Lifecycle — From SQL to Result
Every query you write goes through a precise pipeline inside PostgreSQL before results come back to you. Understanding this pipeline helps you write better SQL and understand why some queries are fast and others are slow.
Step 1 — Parser: Your SQL text is received and checked for syntax errors. If you have a typo or invalid SQL, the error is caught here before anything else happens. The parser converts your SQL into a parse tree — an internal structured representation of what you asked for.
Step 2 — Rewriter: The parse tree is passed through the rule system. If you are querying a view, the rewriter replaces the view reference with the actual underlying query. Rules and view definitions are applied at this stage.
Step 3 — Planner / Optimizer: This is one of the most sophisticated parts of PostgreSQL. The planner looks at your query and considers dozens of possible ways to execute it — which indexes to use, which order to join tables, whether to scan a full table or use an index. It uses table statistics gathered by ANALYZE (stored in pg_statistic) to estimate the cost of each approach and picks the most efficient plan. The quality of this plan is what separates a fast query from a slow one.
Step 4 — Executor: The chosen plan is handed to the executor, which actually runs the operations — reading rows, applying filters, joining tables, sorting, aggregating — and collects the final result rows to return to the client.
```sql
-- You can see the query plan PostgreSQL chooses using EXPLAIN.
-- This shows you exactly what the planner decided to do.
EXPLAIN SELECT * FROM customers WHERE city = 'New York';

--                        QUERY PLAN
-- ------------------------------------------------------
--  Seq Scan on customers  (cost=... rows=... width=...)
--    Filter: ((city)::text = 'New York'::text)
```
This output tells you PostgreSQL chose a sequential scan (reading every row) because the table is small. On a large table with an index on city, it would choose an Index Scan instead — far faster. You will use EXPLAIN regularly from Lesson 36 onwards when we cover query optimisation.
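If you want to see how a plan actually performed, not just what was chosen, EXPLAIN ANALYZE runs the query and reports real row counts and timing alongside the planner's estimates. Be careful: it genuinely executes the statement, so wrap data-modifying queries in a transaction you can roll back.

```sql
-- Runs the query for real and shows estimated vs. actual
-- row counts and execution time for each plan node.
EXPLAIN ANALYZE SELECT * FROM customers WHERE city = 'New York';
```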
The Write-Ahead Log (WAL)
The Write-Ahead Log is PostgreSQL's crash recovery mechanism and one of the most important parts of its architecture. Every change to the database — every INSERT, UPDATE, DELETE — is first written to the WAL before it touches the actual data files. This is the "write-ahead" part: the log is always written before the data.
If the server crashes mid-operation, PostgreSQL replays the WAL on startup to recover any changes that were committed but not yet fully written to the data files. This is how PostgreSQL guarantees the Durability part of ACID — committed data survives crashes.
The WAL also powers replication — streaming the WAL to standby servers in real time is how PostgreSQL replication works, keeping replica databases in sync with the primary.
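A couple of built-in functions and views let you peek at the WAL and replication state (a sketch; pg_stat_replication only returns rows on a primary with connected standbys):

```sql
-- Current write position in the WAL.
SELECT pg_current_wal_lsn();

-- One row per connected standby, showing how far each has replayed.
SELECT client_addr, state, replay_lsn
FROM pg_stat_replication;
```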
Physical Storage — How Data Lives on Disk
PostgreSQL stores all its data in a directory called the data directory (also called PGDATA). Inside this directory, each database has its own subdirectory. Each table is stored on disk as one or more heap files, and each index as its own separate set of files. PostgreSQL divides these files into fixed-size pages — each page is 8KB by default.
When PostgreSQL reads or writes data, it works in page-sized chunks. Loading a single row from a large table still requires loading the entire 8KB page that contains it. This is why good indexing matters so much — an index lets PostgreSQL load only the pages it actually needs instead of scanning every page in the table.
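You can inspect this physical layout for any table (the customers table here is illustrative; the reported path is relative to the data directory):

```sql
-- File that backs the table, relative to PGDATA.
SELECT pg_relation_filepath('customers');

-- On-disk size of the table in bytes; always a multiple of the page size.
SELECT pg_relation_size('customers');

-- The page size the server was compiled with (8192 bytes by default).
SHOW block_size;
```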
Architecture at a Glance
| Component | Role | Why It Matters |
|---|---|---|
| Postmaster | Master process, manages connections | Entry point for every client connection |
| Backend process | Dedicated process per connection | Isolates each client session |
| Shared buffer cache | In-memory data cache | Avoids slow disk reads for hot data |
| WAL | Change log written before data files | Crash recovery and replication |
| Query planner | Chooses the most efficient execution plan | Difference between fast and slow queries |
| Autovacuum | Cleans up dead rows automatically | Keeps tables lean and queries fast |
| Data pages | 8KB storage units on disk | Unit of I/O — indexes reduce pages read |
🧪 Practice Questions
Answer based on what you learned in this lesson.
1. What is the name of the master PostgreSQL process that listens for connections and spawns backend processes?
2. What three-letter abbreviation refers to the log that PostgreSQL writes before making changes to data files?
3. Which background process cleans up dead rows left behind by updates and deletes?
4. What PostgreSQL configuration parameter controls the size of the shared buffer cache?
5. Which SQL command shows you the execution plan PostgreSQL chose for a query?
🎯 Quiz — Test Your Understanding
Q1. What does the postmaster do when a new client connects?
Q2. Which stage of the query lifecycle decides which indexes to use and how to join tables?
Q3. What is the default size of a PostgreSQL data page on disk?
Q4. What are the two main purposes of the Write-Ahead Log?
Q5. What is the recommended size for shared_buffers on a production PostgreSQL server?
Next up: The tools you will use every day — psql and pgAdmin explored in full.