MS SQL Server
Query Optimisation
A query that returns the right answer in 30 seconds is not a correct query in production — it is a problem waiting for a deadline. SQL Server's query optimiser does significant work automatically, but it can only work with what you give it: accurate statistics, usable indexes, and queries written in ways that allow the optimiser to reason about them. The difference between a query that runs in 2 milliseconds and one that runs in 20 seconds is almost never the business logic — it is the shape of the query, the presence or absence of the right index, a function wrapping a column in a WHERE clause, or a data type mismatch that silently disables index use. This lesson covers the practical techniques that reliably improve query performance: understanding what the optimiser needs, identifying the patterns that prevent it from doing its job, reading the signals that tell you where the problem is, and applying the fixes — all using the DataplexaStore schema.
How the Optimiser Chooses a Plan
Before executing any query, SQL Server's query optimiser generates one or more candidate execution plans and estimates the cost of each based on statistics — metadata about the distribution of values in each column and index. It picks the plan with the lowest estimated cost and caches it for reuse. The optimiser is cost-based, not rule-based: it does not have a fixed algorithm for every query shape; it explores the search space of possible join orders, index choices, and operator selections and tries to find a good enough plan within a time budget. The phrase "good enough" is important — the optimiser does not guarantee the optimal plan, only a plan that is estimated to be reasonably efficient based on the statistics it has. When statistics are stale, when data has unusual distributions, or when the query shape prevents the optimiser from reasoning correctly, the resulting plan can be dramatically suboptimal.
-- Inspect statistics on the Products table
USE DataplexaStore;
-- List all statistics objects on Products
SELECT
s.name AS stats_name,
s.auto_created,
s.user_created,
sp.last_updated,
sp.rows,
sp.rows_sampled,
sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID('Products')
ORDER BY sp.last_updated DESC;
-- Update statistics manually — forces the optimiser to re-examine data distribution
UPDATE STATISTICS Products; -- updates all stats on the table
UPDATE STATISTICS Products PK_Products; -- updates one specific stats object
-- Check when statistics were last updated across all DataplexaStore tables
SELECT
OBJECT_NAME(s.object_id) AS table_name,
s.name AS stats_name,
sp.last_updated,
sp.rows,
sp.modification_counter AS modifications_since_last_update
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE OBJECT_NAME(s.object_id) IN ('Users','Products','Orders','OrderItems','Reviews')
ORDER BY sp.modification_counter DESC, OBJECT_NAME(s.object_id);stats_name auto_created user_created last_updated rows rows_sampled modification_counter
─────────────────────────────── ──────────── ──────────── ─────────────────────── ──── ──────────── ────────────────────
PK_Products 0 0 2024-03-28 15:00:00.000 10 10 0
IX_Products_Category_Price 0 1 2024-03-28 15:10:00.000 10 10 0
IX_Products_Category_Covering 0 1 2024-03-28 15:10:00.000 10 10 0
_WA_Sys_00000003_Products 1 0 2024-03-28 15:00:00.000 10 10 2
-- Cross-table stats freshness:
table_name stats_name last_updated rows modifications
─────────── ─────────────────────── ─────────────────────── ──── ─────────────
Products _WA_Sys_00000003_Price 2024-03-28 15:00:00.000 10 2
Orders IX_Orders_Status 2024-03-28 15:10:00.000 4 0
Orders PK_Orders 2024-03-28 15:10:00.000 4 0
- modification_counter = 2 on the auto-created Price stats means two rows were modified since the last update — SQL Server auto-updates stats when roughly 20% of rows change, so on a 10-row table this threshold is hit very quickly; on a million-row table 20% means 200,000 modifications before auto-update triggers
- Auto-created statistics (_WA_Sys_...) are created automatically by SQL Server when a column is used in a WHERE clause for the first time and no existing stats cover it — they are a sign the optimiser needed information it did not have
- In production, stale statistics on large tables are one of the most common causes of sudden query regressions — a query that ran in 50ms for months suddenly takes 30 seconds after a bulk load because the statistics no longer reflect the actual data distribution
SARGability — Writing WHERE Clauses the Optimiser Can Use
A SARGable predicate (Search ARGument able) is a WHERE condition that SQL Server can use to perform an index seek — navigating directly to the matching rows without scanning the entire index. A predicate is SARGable when the indexed column appears alone on one side of the comparison and the value or expression it is compared to is on the other side. Wrapping the column in a function, performing arithmetic on it, or comparing it to a value of a different data type all make the predicate non-SARGable, forcing a scan. Non-SARGable predicates are one of the most common and most fixable sources of unnecessary full-index scans in production queries.
-- SARGable vs non-SARGable predicates — each pair produces the same result
-- but only the SARGable form allows an index seek
-- NON-SARGable: function wrapping the column
SELECT ProductName, Price
FROM Products
WHERE ROUND(Price, 0) = 30; -- ROUND applied to column — forces scan
-- SARGable equivalent: move the logic to the value side
SELECT ProductName, Price
FROM Products
WHERE Price >= 29.50 -- column is bare — index seek possible
AND Price < 30.50;
-- NON-SARGable: implicit type conversion — column is INT, literal is VARCHAR
-- This converts every UserID to VARCHAR before comparing — forces scan
SELECT UserID, FullName
FROM Users
WHERE CAST(UserID AS VARCHAR) = '2';
-- SARGable equivalent: use the correct type
SELECT UserID, FullName
FROM Users
WHERE UserID = 2; -- INT compared to INT — seek on clustered index
-- NON-SARGable: arithmetic on the column
SELECT OrderID, TotalAmount
FROM Orders
WHERE TotalAmount / 2 > 50; -- divides every row then compares — scan
-- SARGable equivalent: rearrange the expression to isolate the column
SELECT OrderID, TotalAmount
FROM Orders
WHERE TotalAmount > 100; -- column is bare — index seek possible
-- NON-SARGable: leading wildcard
SELECT ProductName FROM Products WHERE ProductName LIKE '%Mouse'; -- scan
-- SARGable equivalent: trailing wildcard
SELECT ProductName FROM Products WHERE ProductName LIKE 'Wire%'; -- seekProductName Price
────────────── ─────
Wireless Mouse 29.99
-- Range equivalent (SARGable):
ProductName Price
────────────── ─────
Wireless Mouse 29.99
-- INT = INT (SARGable):
UserID FullName
────── ────────────
2 Bob Williams
-- TotalAmount > 100 (SARGable):
OrderID TotalAmount
─────── ───────────
2 512.92
3 159.91
7 104.37
- Every non-SARGable form forces a full scan of the index or table — on a 10-row Products table this is imperceptible; on a Products table with 5 million rows each non-SARGable scan reads every row and every page, making the query orders of magnitude slower than the seek equivalent
- The ROUND workaround (using a range instead) is slightly different in semantics — it returns any price that rounds to 30, not exactly 30; the query intent should confirm whether this is the right interpretation before replacing the non-SARGable form
- Type mismatch is particularly dangerous because it is invisible —
WHERE UserID = '2'looks correct, works correctly, and returns the right result, but forces SQL Server to convert every UserID in the table before comparing, disabling the clustered index seek silently
Query Rewriting Techniques
Beyond SARGability, several structural query patterns consistently produce worse plans than their equivalents. OR conditions on different columns are often better served by UNION ALL because OR prevents the optimiser from using separate indexes on each column independently. SELECT DISTINCT used to eliminate duplicates caused by incorrect joins is a symptom of a structural problem and often hides a missing GROUP BY or an incorrect relationship — fixing the root cause is better than masking it with DISTINCT. Scalar subqueries in SELECT that run once per row (correlated) can often be replaced by a single JOIN or window function that the optimiser can process in a single pass rather than row by row.
-- Query rewriting — structural improvements for better plan quality
-- OR on different indexed columns — may not use either index efficiently
SELECT UserID, FullName, Country
FROM Users
WHERE Country = 'UK'
OR MembershipTier = 'VIP';
-- UNION ALL equivalent — each branch can independently use its own index seek
SELECT UserID, FullName, Country, MembershipTier
FROM Users
WHERE Country = 'UK'
UNION ALL
SELECT UserID, FullName, Country, MembershipTier
FROM Users
WHERE MembershipTier = 'VIP'
AND Country <> 'UK'; -- exclude the overlap to avoid duplicates
-- Correlated scalar subquery — re-executed once per product row
SELECT
p.ProductName,
p.Price,
(SELECT COUNT(*) FROM OrderItems oi WHERE oi.ProductID = p.ProductID) AS times_ordered
FROM Products p
ORDER BY times_ordered DESC;
-- JOIN + GROUP BY equivalent — single pass, no per-row subquery execution
SELECT
p.ProductName,
p.Price,
COUNT(oi.OrderItemID) AS times_ordered
FROM Products p
LEFT JOIN OrderItems oi ON p.ProductID = oi.ProductID
GROUP BY p.ProductName, p.Price
ORDER BY times_ordered DESC;UserID FullName Country MembershipTier
────── ──────────── ─────── ──────────────
2 Bob Williams UK Premium
7 Grace Patel India VIP
3 Clara Davis Canada VIP
-- UNION ALL result (same rows, no duplicates):
UserID FullName Country MembershipTier
────── ──────────── ─────── ──────────────
2 Bob Williams UK Premium
7 Grace Patel India VIP
3 Clara Davis Canada VIP
-- JOIN + GROUP BY (same result as correlated subquery, single pass):
ProductName Price times_ordered
──────────────────── ────── ─────────────
Monitor 27 inch 359.99 2
Notebook A5 12.99 2
Mechanical Keyboard 81.80 2
Desk Lamp 36.74 2
USB-C Hub 45.44 2
Wireless Mouse 29.99 1
Webcam HD 71.99 0
Standing Desk 367.49 0
Ergonomic Chair 262.49 0
Ballpoint Pen Set 8.54 0
- The OR and UNION ALL forms return identical results — on small tables the optimiser may transform the OR into a UNION internally anyway, but on large tables with separate indexes on Country and MembershipTier, UNION ALL gives the optimiser an explicit path to use both indexes independently
- The JOIN + GROUP BY form replaces the correlated subquery — the optimiser processes all products and all order items in a single join pass rather than running a separate COUNT query for each of the 10 product rows; at 10 products the difference is negligible, at 100,000 products the correlated form would run 100,000 subqueries
- Prices have drifted from earlier lesson values due to the accumulated price changes applied in L17, L19, and L29 — the DataplexaStore data reflects all cumulative modifications across the lesson series
Identifying Slow Queries — sys.dm_exec_query_stats
SQL Server caches every query plan that executes and tracks cumulative performance statistics for each one — total elapsed time, CPU time, logical reads, and execution count. This data is available in sys.dm_exec_query_stats and is one of the first places to look when investigating a performance problem. Sorting by total logical reads identifies the queries consuming the most I/O across all executions. Sorting by total elapsed time finds the queries responsible for the most wall-clock time. Dividing by execution count reveals the per-execution cost, distinguishing a query that runs once and takes 10 seconds from one that runs 10,000 times and takes 1ms each — both are performance concerns but they require different solutions.
-- Top queries by total logical reads — most I/O-intensive queries in the plan cache
SELECT TOP 10
qs.execution_count,
qs.total_logical_reads,
qs.total_logical_reads / qs.execution_count
AS avg_logical_reads,
qs.total_elapsed_time / 1000 AS total_elapsed_ms,
qs.total_elapsed_time / qs.execution_count / 1000
AS avg_elapsed_ms,
qs.total_worker_time / 1000 AS total_cpu_ms,
SUBSTRING(qt.text, 1, 150) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.dbid = DB_ID()
ORDER BY qs.total_logical_reads DESC;
-- Queries with the worst average elapsed time — slowest per execution
SELECT TOP 10
qs.execution_count,
qs.total_elapsed_time / qs.execution_count / 1000
AS avg_elapsed_ms,
qs.total_logical_reads / qs.execution_count
AS avg_reads,
SUBSTRING(qt.text, 1, 150) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.dbid = DB_ID()
AND qs.execution_count > 0
ORDER BY avg_elapsed_ms DESC;execution_count total_logical_reads avg_logical_reads total_elapsed_ms avg_elapsed_ms query_text
─────────────── ─────────────────── ───────────────── ──────────────── ────────────── ───────────────────────────────────────
4 48 12 112 28 SELECT p.ProductName, p.Category, p...
3 36 12 84 28 SELECT u.UserID, u.FullName, u.Email...
2 18 9 52 26 SELECT o.OrderID, o.Status, o.TotalA...
1 12 12 31 31 UPDATE Products SET Price = ROUND(Pri...
1 8 8 22 22 SELECT ProductName, Category, Price F...
- avg_logical_reads is the key metric — logical reads measure pages read from the buffer pool, directly reflecting how much of the table was scanned; a query with 12 logical reads on a 10-row table is reading nearly every page, which is expected here but would be alarming on a million-row table
- sys.dm_exec_query_stats resets when the plan is evicted from cache (due to memory pressure, statistics updates, or server restart) — in production the data represents recent history, not all-time history
- The plan cache on a development dataset shows queries from the current session only; in production with hundreds of concurrent sessions this view reveals the true workload profile — which queries run most often and which cost the most per execution
The Missing Index DMV
SQL Server tracks situations where it estimated a query would have run faster if a particular index existed, and stores these recommendations in sys.dm_db_missing_index_details. When the optimiser builds an execution plan and chooses a scan where it could have used a seek — if only an index existed on the right columns — it records the columns that would have helped and an estimate of the improvement. These recommendations are a starting point for index analysis, not a prescription. The optimiser recommends indexes based on individual queries in isolation and does not account for the write overhead of adding indexes or the possibility that a single composite index could serve multiple recommendations simultaneously.
-- Missing index recommendations from the optimiser
SELECT
mid.statement AS table_name,
migs.avg_total_user_cost AS avg_query_cost_reduction,
migs.avg_user_impact AS avg_pct_improvement,
migs.user_seeks,
migs.user_scans,
mid.equality_columns, -- columns used in = predicates
mid.inequality_columns, -- columns used in <, >, BETWEEN predicates
mid.included_columns -- columns to INCLUDE for covering
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs
ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid
ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact DESC;
-- Translate a recommendation into a CREATE INDEX statement
-- Example: if equality_columns = 'Status', included_columns = 'TotalAmount, OrderDate'
CREATE NONCLUSTERED INDEX IX_Orders_Status_Covering
ON Orders (Status)
INCLUDE (TotalAmount, OrderDate, UserID); -- covering for common order status queriestable_name avg_query_cost_reduction avg_pct_improvement user_seeks equality_columns inequality_columns included_columns
──────────────────────────── ──────────────────────── ─────────────────── ────────── ──────────────── ────────────────── ─────────────────
[DataplexaStore].[dbo].[Users] 0.008 78.23 2 Country NULL FullName, Email
-- Translated index created:
Command(s) completed successfully.
- avg_pct_improvement = 78.23 means the optimiser estimates a 78% reduction in query cost if an index on Users.Country were created — a high-impact recommendation worth investigating
- The recommendation separates equality_columns (used in =) from inequality_columns (used in <, >, BETWEEN) because composite index key order matters: equality columns should come first in the index key, then inequality columns, then INCLUDE non-key columns
- The missing index DMVs reset on SQL Server restart — in production they should be queried after a representative workload period, not immediately after startup when the optimiser has had little opportunity to gather recommendations
Summary Table
| Technique | Problem It Solves | Key Rule |
|---|---|---|
| Update statistics | Stale row count estimates | Run after bulk loads or large modifications |
| SARGable predicates | Functions on columns disable index seeks | Column bare on one side — value or expression on the other |
| Match literal types to column types | Implicit conversion forces scan | INT column = INT literal, not VARCHAR |
| UNION ALL instead of OR | OR on different columns blocks individual index use | Each branch seeks independently |
| JOIN instead of correlated subquery | Per-row subquery re-execution | Single pass replaces N subquery executions |
| sys.dm_exec_query_stats | Identify high-cost queries | Sort by total_logical_reads or avg_elapsed_ms |
| sys.dm_db_missing_index_details | Find indexes the optimiser wanted | Equality cols first, inequality next, INCLUDE the rest |
Practice Questions
Practice 1. What makes a WHERE clause predicate non-SARGable, and why does it hurt performance?
Practice 2. Statistics on a large Orders table have not been updated after a bulk load of 2 million new rows. What problem might occur?
Practice 3. A query has WHERE YEAR(OrderDate) = 2024. Why is this non-SARGable and how would you rewrite it?
Practice 4. What does avg_logical_reads in sys.dm_exec_query_stats measure and why is it useful?
Practice 5. The missing index DMV recommends an index with equality_columns = 'Category' and included_columns = 'ProductName, Price'. Write the CREATE INDEX statement.
Quiz
Quiz 1. WHERE CAST(UserID AS VARCHAR) = '2' prevents an index seek. What is the correct rewrite?
Quiz 2. In a missing index recommendation, equality_columns should come before inequality_columns in the index key. Why?
Quiz 3. sys.dm_exec_query_stats resets in which circumstance?
Quiz 4. Why is replacing a correlated scalar subquery with a LEFT JOIN + GROUP BY often faster?
Quiz 5. SQL Server auto-updates statistics when approximately what percentage of rows in a table are modified?
Next up - Execution Plans - Read the graphical and text execution plans SQL Server generates, understand the operators they contain, and use them to pinpoint exactly where a query is spending its time.