MS SQL Lesson 6 – SQL Server Architecture | Dataplexa

SQL Server Architecture

Every time you type a query and press Execute, something remarkable happens inside SQL Server. Your human-readable SQL text travels through five distinct internal stages — getting parsed, bound, optimised, executed, and returned — all within milliseconds. Understanding this journey is not just theoretical. It explains why some queries are instant and others crawl, why more RAM directly improves performance, why certain index choices matter enormously, and how SQL Server can recover a consistent database even after a sudden power failure. This lesson is the blueprint of the engine you will be working with for the rest of this course.

We will use the DataplexaStore dataset throughout every lesson in this course. Download it below and run it against your SQL Server 2022 Developer Edition instance before continuing.

The Big Picture — Five Layers of the Engine

SQL Server is not one monolithic program — it is a layered system where each layer has a single, clear responsibility. The outermost layer receives your connection and speaks the network protocol. The next layer translates your SQL text into an internal logical structure. The optimiser turns that structure into an efficient physical execution plan. The storage engine carries out the plan by reading data from disk or memory. And underneath everything, the transaction log records every change before it happens, making recovery possible. These five layers work in sequence on every single query, and understanding each one unlocks a completely different category of performance tuning skill.

-- Check the version, edition, and uptime of your SQL Server instance
-- This is always the first query to run on any server you connect to

SELECT
    @@VERSION                                          AS full_version,
    SERVERPROPERTY('Edition')                          AS edition,
    SERVERPROPERTY('EngineEdition')                   AS engine_edition_code,
    sqlserver_start_time                               AS service_started,
    DATEDIFF(HOUR, sqlserver_start_time, GETDATE())   AS uptime_hours
FROM sys.dm_os_sys_info;
full_version edition engine_edition_code service_started uptime_hours
──────────────────────────────────────────────────────────────────── ───────────────── ─────────────────── ─────────────────────── ────────────
Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 Developer Edition (64-bit) Developer Edition 3 2024-03-28 09:14:22.000 87
  • EngineEdition 3 confirms Developer Edition, which is technically identical to Enterprise — every lesson in this course works without restrictions
  • sys.dm_os_sys_info is a Dynamic Management View — SQL Server's built-in window into its own running state, covered in depth in the Monitoring lesson
  • uptime_hours reveals how long SQL Server has been running since its last restart — longer uptime usually means a warmer, faster Buffer Pool

Layer 1 — The Protocol Layer

When your application sends a query, it does not teleport directly into the SQL Server engine. It travels over the network using a Microsoft protocol called TDS (Tabular Data Stream) — the same protocol used since the very first version of SQL Server. TDS wraps your T-SQL text in a binary packet and delivers it to SQL Server's Protocol Layer, which unpacks it and hands the raw text to the next stage. SSMS, Python's pyodbc, .NET's SqlClient, and every other SQL Server client all speak TDS. The default port is 1433 for the default instance, and named instances use dynamically assigned ports resolved by the SQL Server Browser service. The query below shows all active connections and which protocol each one is using.

-- View all active connections to this SQL Server instance
-- Each row is one open session — your own session appears as @@SPID

SELECT
    s.session_id,                  -- unique ID for this connection (your SPID)
    s.login_name,                  -- who connected
    s.host_name,                   -- which machine they connected from
    s.program_name,                -- which client they used (SSMS, pyodbc...)
    c.net_transport,               -- TCP, Named Pipes, Shared Memory
    c.client_net_address,          -- IP address of the connecting client
    c.connect_time,                -- when this session was established
    s.status                       -- running, sleeping, background
FROM sys.dm_exec_sessions s
JOIN sys.dm_exec_connections c ON s.session_id = c.session_id
WHERE s.is_user_process = 1       -- exclude SQL Server's own internal sessions
ORDER BY c.connect_time DESC;
session_id login_name host_name program_name net_transport client_net_address connect_time status
────────── ────────────────────── ────────────── ──────────────────────────── ───────────── ────────────────── ─────────────────────── ───────
53 DATAPLEXA-DEV\Student DATAPLEXA-DEV Microsoft SQL Server Mngmt Shared Memory <local machine> 2024-03-28 11:42:18.000 running
52 DATAPLEXA-DEV\Student DATAPLEXA-DEV pyodbc TCP 127.0.0.1 2024-03-28 11:40:05.000 sleeping
  • Shared Memory is the fastest transport — it bypasses the network entirely and is used when both client and server are on the same machine (typical for SSMS during development)
  • TCP is used for all network connections — any application running on a different machine connects this way
  • @@SPID returns your current session_id — the number you would give a DBA who needs to identify or terminate your specific connection
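The @@SPID trick above pairs naturally with sys.dm_exec_connections. Here is a minimal sketch, using only documented DMV columns, that shows the transport and authentication scheme your own session negotiated:

```sql
-- Which transport and auth scheme did MY connection negotiate?
SELECT
    @@SPID           AS my_session_id,   -- your own session's ID
    c.net_transport  AS my_transport,    -- Shared Memory locally, TCP over the network
    c.auth_scheme    AS my_auth_scheme   -- SQL, NTLM, or KERBEROS
FROM sys.dm_exec_connections c
WHERE c.session_id = @@SPID;
```

Run it once from SSMS on the server itself and once from a remote client, and you will see my_transport change from Shared Memory to TCP.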

Layer 2 — Parsing and Binding

Once the Protocol Layer delivers the raw T-SQL text, the Parser reads it character by character and checks whether it is syntactically valid T-SQL. A missing comma, an unclosed parenthesis, or a misspelled keyword causes the Parser to reject the query immediately with an error message — the query never touches the database. If the syntax is valid, the Parser hands off to the Algebrizer (also called the Binder), which resolves every name in the query to a real database object. It checks that every table exists, every column belongs to the table referenced, and that data types are compatible. Only after both stages succeed does the query move forward to the Optimiser. This is why a query referencing a non-existent column fails even before a single data page is read.

-- Demonstrate Parser vs Algebrizer errors — two different failure modes

-- This fails at the PARSER stage (syntax error — stray trailing comma)
-- SQL Server rejects it before even looking at the database
SELECT ProductName,               -- trailing comma causes a parse error
FROM Products;

-- This passes the Parser but fails at the ALGEBRIZER stage
-- Syntax is valid T-SQL, but 'ProductCost' column does not exist
SELECT ProductName, ProductCost   -- ProductCost is not a column in Products
FROM Products;

-- This passes both stages and reaches the Optimiser
SELECT ProductName, Price         -- correct table, correct columns
FROM Products
WHERE Category = 'Electronics';
-- Query 1 (Parser error):
Msg 156, Level 15, State 1
Incorrect syntax near the keyword 'FROM'.

-- Query 2 (Algebrizer error):
Msg 207, Level 16, State 1
Invalid column name 'ProductCost'.

-- Query 3 (success — reaches Optimiser):
ProductName Price
─────────────────── ─────
Wireless Mouse 29.99
Mechanical Keyboard 89.99
USB-C Hub 49.99
Webcam HD 79.99

(4 rows affected)
  • Parser errors (Msg 102, 156) mean the T-SQL text itself is not grammatically valid — check for missing brackets, stray commas, or misspelled keywords
  • Beware the classic gotcha: a missing comma between two column names (SELECT ProductName Price) is NOT a syntax error — the Parser accepts it and the second name silently becomes a column alias for the first
  • Algebrizer errors (Msg 207, 208) mean the syntax is fine but a named object does not exist — check table names, column names, and schema prefixes
  • Both types of error are caught before any data is read — they cost essentially no server resources to fail

Layer 3 — The Query Optimiser

The Query Optimiser is the most sophisticated component in SQL Server and the one with the greatest impact on performance. It receives the logical query tree from the Algebrizer and must decide how to physically execute it. For even a moderately complex query there can be thousands of possible execution plans — different join orders, different join algorithms, different index choices — so the Optimiser searches this space heuristically rather than exhaustively, stopping once it finds a plan it judges good enough. It evaluates candidates using statistics — histograms SQL Server builds automatically to describe the distribution of values in a column — and assigns each candidate an estimated cost. The plan with the lowest estimated cost wins and is compiled into an execution plan. To avoid repeating this expensive work, the chosen plan is cached in the Plan Cache so the next identical query reuses it instantly. You can view the plan SQL Server chose for any query using the Show Actual Execution Plan button in SSMS, or by querying the DMV below.

-- View the most expensive cached query plans on this server
-- total_logical_reads reveals which queries hit disk the most

SELECT TOP 10
    qs.execution_count,                                -- how many times this plan has been used
    qs.total_logical_reads,                            -- total 8 KB pages read across all executions
    qs.total_logical_reads / qs.execution_count
        AS avg_logical_reads,                          -- average pages read per execution
    qs.total_elapsed_time / qs.execution_count / 1000
        AS avg_duration_ms,                            -- average duration in milliseconds
    SUBSTRING(qt.text, 1, 100)
        AS query_text                                  -- first 100 characters of the cached query
FROM   sys.dm_exec_query_stats     qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt   -- join to get the actual query text
ORDER  BY qs.total_logical_reads DESC;                 -- worst offenders first
execution_count total_logical_reads avg_logical_reads avg_duration_ms query_text
─────────────── ─────────────────── ───────────────── ─────────────── ──────────────────────────────────────────────────────────────────
847 12483 14 2 SELECT ProductName, Price FROM Products WHERE Category = 'Electr
312 8821 28 5 SELECT o.OrderID, u.FullName FROM Orders o JOIN Users u ON o.User
156 4102 26 4 SELECT * FROM Reviews WHERE ProductID = 1
89 2341 26 3 UPDATE Products SET StockQty = StockQty - 1 WHERE ProductID = 1
44 1823 41 8 SELECT UserID, FullName FROM Users WHERE MembershipTier = 'Premium'
  • avg_logical_reads is the key metric — high reads on a simple query almost always means a missing index that would let SQL Server jump directly to the data instead of scanning the whole table
  • The Plan Cache stores compiled plans in memory — the first execution pays the Optimiser cost, every subsequent identical execution reuses the cached plan for free
  • Statistics go stale as data changes — SQL Server updates them automatically but you can force an update with UPDATE STATISTICS tablename when a plan suddenly degrades after a large data load
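The stale-statistics fix mentioned above can be sketched as follows — assuming a large overnight load into Products has left its histograms out of date:

```sql
-- Rebuild the statistics histograms on Products after a large data load
UPDATE STATISTICS dbo.Products WITH FULLSCAN;   -- FULLSCAN reads every row for maximum accuracy

-- Mark every cached plan that touches Products for recompilation,
-- so the next execution re-optimises against the fresh statistics
EXEC sp_recompile 'dbo.Products';
```

FULLSCAN is the thorough option; omitting it lets SQL Server sample the table instead, which is faster on very large tables at some cost in histogram accuracy.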

Layer 4 — The Storage Engine and Buffer Pool

After the Optimiser produces a plan, the Storage Engine carries it out. Its first responsibility is finding the data. SQL Server stores all data in 8 KB blocks called pages, allocated in groups of eight pages (64 KB) called extents. Rather than reading from disk every time, the Storage Engine checks the Buffer Pool first — a region of RAM where recently accessed pages are held in memory. A page found in the Buffer Pool is a logical read (fast, memory access). A page that must be fetched from disk is a physical read (slow, disk access). SQL Server tries to keep frequently accessed pages in the Buffer Pool as long as possible. The more RAM available to SQL Server, the more of your working dataset fits in the Buffer Pool — which is why adding RAM is often the single most impactful hardware upgrade for a SQL Server that is doing too many physical reads. The query below reveals exactly how the Buffer Pool is being used right now.

-- Inspect the Buffer Pool — how much RAM is SQL Server using and what is in it?

SELECT
    physical_memory_in_use_kb / 1024           AS ram_used_by_sql_mb,
    memory_utilization_percentage               AS sql_memory_utilisation_pct,
    page_fault_count                            AS page_faults_since_start
FROM sys.dm_os_process_memory;

-- Break down Buffer Pool usage by database — which database owns the most cached pages?
SELECT
    DB_NAME(database_id)                        AS database_name,
    COUNT(*) * 8 / 1024                        AS buffer_pool_mb,
    SUM(CASE WHEN is_modified = 1
             THEN 1 ELSE 0 END)                AS dirty_pages,  -- modified but not yet written to disk
    SUM(CASE WHEN is_modified = 0
             THEN 1 ELSE 0 END)                AS clean_pages   -- in sync with disk
FROM sys.dm_os_buffer_descriptors
WHERE database_id > 4                           -- exclude system databases
GROUP BY database_id
ORDER BY buffer_pool_mb DESC;
ram_used_by_sql_mb sql_memory_utilisation_pct page_faults_since_start
────────────────── ────────────────────────── ───────────────────────
1842 67 12847

database_name buffer_pool_mb dirty_pages clean_pages
────────────── ────────────── ─────────── ───────────
DataplexaStore 312 24 1458
  • Dirty pages are pages that have been modified in memory but not yet written back to the data file — a background process called a checkpoint periodically flushes them to disk
  • If ram_used_by_sql_mb is close to the total physical RAM on the server, SQL Server is memory-pressured — pages are being evicted before they are needed again, causing extra physical reads
  • Adding RAM lets SQL Server keep a larger working set in the Buffer Pool — the result is fewer physical reads and dramatically faster query response
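To connect the page arithmetic to real tables, the sketch below counts how many 8 KB pages each user table occupies — useful for judging how much of DataplexaStore can sit in the Buffer Pool at once:

```sql
-- How many 8 KB pages does each user table occupy?
SELECT
    OBJECT_NAME(ps.object_id)       AS table_name,
    SUM(ps.used_page_count)         AS pages_8kb,    -- pages in use
    SUM(ps.used_page_count) * 8     AS approx_kb     -- pages × 8 KB
FROM sys.dm_db_partition_stats ps
WHERE OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
ORDER BY pages_8kb DESC;
```

At this dataset's size every table fits in a handful of pages, which is exactly why the Buffer Pool can serve most queries entirely from RAM.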

Layer 5 — Write-Ahead Logging and the Transaction Log

The final and most critical architectural component is the Transaction Log — the .ldf file that lives alongside every database. SQL Server uses a pattern called Write-Ahead Logging (WAL): before any change is made to a data page in the Buffer Pool, a description of that change is written to the transaction log. This is not optional and cannot be disabled — it is the foundation of SQL Server's crash recovery capability and the Durability guarantee in ACID. If the server loses power mid-transaction, SQL Server replays the log on the next startup to roll forward any changes that were committed and roll back any that were not, leaving the database in a consistent state as if the crash never happened. The log also enables point-in-time restore — with a full backup and an unbroken chain of log backups, you can restore a database to any specific minute in its history. The query below shows the current state of the transaction log for DataplexaStore.

-- Inspect the transaction log state for the DataplexaStore database

USE DataplexaStore;

-- DBCC SQLPERF shows log space usage across all databases at a glance
DBCC SQLPERF(LOGSPACE);

-- For detailed information about the data and log files
SELECT
    name                                        AS file_name,
    type_desc                                   AS file_type,
    physical_name                               AS path_on_disk,
    size * 8 / 1024                            AS size_mb,
    FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb,
    growth                                      AS auto_growth_setting
FROM sys.database_files;
-- DBCC SQLPERF(LOGSPACE):
Database Name Log Size (MB) Log Space Used (%) Status
──────────────── ───────────── ────────────────── ──────
DataplexaStore 8.00 12.50 0

-- sys.database_files:
file_name file_type path_on_disk size_mb used_mb auto_growth_setting
───────────────── ───────── ──────────────────────────────────────────────────────── ─────── ─────── ───────────────────
DataplexaStore ROWS C:\...MSSQL\DATA\DataplexaStore.mdf 8 1 1024
DataplexaStore_log LOG C:\...MSSQL\DATA\DataplexaStore_log.ldf 8 1 10
  • Log Space Used below 70% is healthy — above 70% it is time to investigate whether log backups are running regularly, because uncollected log records cause the log to grow until it fills the disk
  • The log file's auto_growth_setting of 10 is a percentage (the growth column is read as a percentage when is_percent_growth = 1 for the file), so the log grows by 10% each time it fills up — in production, set a fixed MB growth size instead, because percentage growth on a large log causes increasingly long pause spikes
  • Write-Ahead Logging explains why database writes are split across two files: data goes to the .mdf, the log record goes to the .ldf first — putting them on separate physical drives removes the I/O contention between the two write patterns
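The point-in-time restore capability described above depends on an unbroken backup chain. Here is a hedged sketch — the paths and the STOPAT time are illustrative, so adjust them to your environment, and note that log backups require the database to be in the FULL recovery model:

```sql
-- 1. A full backup anchors the chain
BACKUP DATABASE DataplexaStore
    TO DISK = 'C:\Backups\DataplexaStore_full.bak';

-- 2. Frequent log backups capture every change since the full backup
BACKUP LOG DataplexaStore
    TO DISK = 'C:\Backups\DataplexaStore_log_1.trn';

-- 3. To restore to a specific minute: restore the full backup first
--    (NORECOVERY keeps the database in a restoring state), then replay
--    log backups up to the target time
-- RESTORE DATABASE DataplexaStore
--     FROM DISK = 'C:\Backups\DataplexaStore_full.bak' WITH NORECOVERY;
-- RESTORE LOG DataplexaStore
--     FROM DISK = 'C:\Backups\DataplexaStore_log_1.trn'
--     WITH STOPAT = '2024-03-28 11:30:00', RECOVERY;
```

The restore statements are commented out because running them would take DataplexaStore offline mid-lesson — they show the shape of the chain, not a step to execute now.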

How the Five Layers Work Together — A Complete Query Journey

To make the architecture concrete, here is a single SELECT query traced through every layer from keypress to result. When you run the query below in SSMS: the Protocol Layer receives the TDS packet. The Parser validates the SQL syntax and the Algebrizer confirms Products and Category exist. The Optimiser checks the Plan Cache — if this exact query ran before it reuses the cached plan; if not, it generates a new plan choosing between a table scan and any available index on Category. The Storage Engine checks the Buffer Pool for the Products pages — if they are cached the result returns in microseconds from RAM; if not it reads from the .mdf file first. The result rows are packaged back into a TDS response and sent to SSMS. No part of this journey involves writing to the transaction log because SELECT statements do not modify data.

-- A query whose full journey through all five layers we can observe

-- Step 1: enable IO statistics so we can see Buffer Pool activity
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Step 2: run the query
SELECT
    ProductID,
    ProductName,
    Price,
    StockQty
FROM Products
WHERE Category = 'Electronics'
ORDER BY Price DESC;

-- Step 3: disable statistics
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
ProductID ProductName Price StockQty
───────── ─────────────────── ───── ────────
2 Mechanical Keyboard 89.99 15
6 Webcam HD 79.99 18
3 USB-C Hub 49.99 30
1 Wireless Mouse 29.99 42

(4 rows affected)

Table 'Products'. Scan count 1, logical reads 1, physical reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
  • Physical reads = 0 means the Products pages were already in the Buffer Pool — the query was served entirely from RAM, which is why it finished in 1 ms
  • Logical reads = 1 means the entire Products table fits on a single 8 KB page at this dataset size — at millions of rows the same query without an index on Category could touch thousands of pages
  • CPU time = 0 ms reflects that a single-page table scan is trivially cheap — as the table grows and logical reads climb, CPU time becomes a meaningful signal of query complexity
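Once the query above has run, you can confirm the Optimiser cached its plan. A sketch — the LIKE filter is illustrative, and any text fragment unique to the query works:

```sql
-- Confirm the Electronics query now has a cached, reusable plan
SELECT
    qs.execution_count,                       -- climbs by 1 on each reuse
    qs.creation_time,                         -- when the plan was first compiled
    SUBSTRING(qt.text, 1, 80) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.text LIKE '%Electronics%'
  AND qt.text NOT LIKE '%dm_exec_query_stats%';   -- exclude this probe query itself
```

Run the SELECT from the journey example a few more times and re-run this probe: execution_count climbs while creation_time stays fixed, which is plan reuse in action.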

Summary Table

Layer  Component                     Responsibility                                   Failure Symptom
─────  ────────────────────────────  ───────────────────────────────────────────────  ─────────────────────────────────────────────────────
1      Protocol Layer (TDS)          Receive network connection on port 1433          Connection timeout, port blocked by firewall
2      Parser + Algebrizer           Validate syntax, resolve object names            Msg 102 (syntax), Msg 207/208 (invalid object)
3      Query Optimiser               Generate and select the cheapest execution plan  Slow queries from stale statistics or missing indexes
4      Storage Engine + Buffer Pool  Execute plan, serve data from RAM or disk        High physical reads from insufficient RAM
5      Transaction Log (WAL)         Record all changes before applying them          Log full error if backups are not running

Practice Questions

Practice 1. What is Write-Ahead Logging and which ACID guarantee does it directly enable?



Practice 2. What is the difference between a logical read and a physical read in SQL Server?



Practice 3. What is the Plan Cache and why does it matter for performance?



Practice 4. A query that was fast yesterday is now slow after a large overnight data load. Which architectural component is most likely responsible and what is the fix?



Practice 5. What is the difference between a Parser error and an Algebrizer error, and at what point in the query pipeline does each occur?



Quiz

Quiz 1. What network protocol does SQL Server use to communicate with all client applications?






Quiz 2. What does the Query Optimiser use to estimate the cost of different execution plans?






Quiz 3. What is the size of a single SQL Server data page?






Quiz 4. Which DMV combination reveals which cached queries are generating the most disk reads?






Quiz 5. Why does SQL Server write changes to the transaction log BEFORE modifying the data page?






Next up - Databases & File Structure - Understand how SQL Server organises your data into files, pages, and filegroups on disk.