MS SQL Server
SQL Server Monitoring
Why Monitor SQL Server?
A SQL Server instance running in production is under constant load — queries executing, transactions committing, indexes being read, memory being allocated and released. Without monitoring, the first sign of a problem is often a phone call reporting that the application is slow or unresponsive. By the time that call comes, the root cause may have been building for hours. Proactive monitoring lets you detect problems while they are still small: a query that doubled in execution time last night, a table that is now missing an index, a blocking chain that is holding up dozens of sessions.
SQL Server exposes its internals through Dynamic Management Views (DMVs) — system views that reflect the live state of the engine. Querying DMVs requires no third-party tools and no special setup. Every DBA and developer with appropriate permissions can query them directly in SSMS or any SQL client. The key DMV families covered in this lesson are: wait statistics, query performance, index usage, blocking, and database file I/O.
- DMVs expose real-time engine state — connections, waits, query plans, index usage, blocking
- No additional tools are required — DMV queries run in any SQL client with VIEW SERVER STATE permission
- Wait statistics reveal what the engine is waiting for — the single most diagnostic signal in SQL Server
- Missing index DMVs show SQL Server's own recommendations based on actual query patterns
- Blocking DMVs identify sessions holding locks that are stalling other work
- DMV data resets when SQL Server restarts — it reflects only the period since the last service start
-- Server health snapshot: connections, memory, CPU, and uptime.
-- A quick first look when investigating a performance complaint.
SELECT
@@SERVERNAME AS ServerName,
@@VERSION AS SqlVersion,
DATEDIFF(HOUR, sqlserver_start_time, GETDATE())
AS UptimeHours,
cpu_count AS LogicalCPUs,
physical_memory_kb / 1024 AS PhysicalMemoryMB,
(SELECT COUNT(*) FROM sys.dm_exec_sessions
WHERE is_user_process = 1) AS ActiveUserSessions,
(SELECT COUNT(*) FROM sys.dm_exec_requests
WHERE status = 'running') AS ActiveRequests,
(SELECT COUNT(*) FROM sys.dm_exec_requests
WHERE blocking_session_id > 0) AS BlockedRequests
FROM sys.dm_os_sys_info;
-- Active sessions with their current statement and wait type
SELECT TOP 20
s.session_id,
s.login_name,
s.host_name,
s.program_name,
r.status,
r.wait_type,
r.wait_time / 1000.0 AS WaitSecs,
r.cpu_time / 1000.0 AS CpuSecs,
r.total_elapsed_time / 1000.0 AS ElapsedSecs,
SUBSTRING(t.text, (r.statement_start_offset/2)+1,
((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(t.text)
ELSE r.statement_end_offset END - r.statement_start_offset)/2)+1)
AS CurrentStatement
FROM sys.dm_exec_sessions s
JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.is_user_process = 1
ORDER BY r.total_elapsed_time DESC;ServerName | UptimeHours | LogicalCPUs | PhysicalMemoryMB | ActiveUserSessions | ActiveRequests | BlockedRequests
-------------|-------------|-------------|------------------|--------------------|----------------|----------------
DPLX-SQL-01 | 312 | 8 | 16384 | 23 | 4 | 1
-- Active sessions (top 5 by elapsed time)
session_id | login_name | status | wait_type | WaitSecs | CpuSecs | ElapsedSecs | CurrentStatement
-----------|--------------|---------|--------------|----------|---------|-------------|------------------
55 | DPLX\svc_app | running | PAGEIOLATCH_SH | 12.4 | 8.2 | 14.1 | SELECT o.OrderId...
62 | app_readonly | suspended | LCK_M_S | 8.1 | 0.0 | 8.1 | SELECT * FROM...
41 | DPLX\svc_app | running | CXPACKET | 3.2 | 18.4 | 5.6 | INSERT INTO...
(3 rows shown)
- One blocked request is visible in the health snapshot — session 62 is waiting on a shared lock (
LCK_M_S) PAGEIOLATCH_SHon session 55 means the engine is waiting for a data page to be read from disk — a storage I/O signalCXPACKETon session 41 indicates parallel query exchange — common on large aggregation or sort operationsCROSS APPLY sys.dm_exec_sql_textextracts the exact T-SQL text being executed by each active request
Wait Statistics — The Most Diagnostic Signal
Every time SQL Server cannot immediately proceed with work it records a wait. It logs what it was waiting for and for how long. The cumulative totals in sys.dm_os_wait_stats reveal the dominant bottleneck across the entire instance since the last restart. High CPU waits point to query plan problems. High I/O waits point to missing indexes or insufficient storage. High lock waits point to contention between concurrent transactions. Reading wait statistics is the fastest way to identify the category of a performance problem before drilling into individual queries.
-- Top wait types since last SQL Server restart.
-- Exclude known benign waits (idle waits that are always high).
-- This reveals the dominant bottleneck on the instance.
SELECT TOP 15
wait_type,
waiting_tasks_count AS WaitCount,
wait_time_ms / 1000.0 AS TotalWaitSecs,
max_wait_time_ms / 1000.0 AS MaxWaitSecs,
ROUND(100.0 * wait_time_ms /
NULLIF(SUM(wait_time_ms) OVER (), 0), 2) AS PctOfTotal
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ( -- exclude benign idle waits
'SLEEP_TASK', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP',
'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
'FT_IFTS_SCHEDULER_IDLE_WAIT', 'HADR_WORK_QUEUE',
'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE',
'SERVER_IDLE_CHECK', 'SLEEP_DBSTARTUP', 'SLEEP_DCOMSTARTUP',
'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY',
'SLEEP_MASTERUPGRADED', 'SLEEP_MSDBSTARTUP',
'SLEEP_SYSTEMTASK', 'SLEEP_TEMPDBSTARTUP',
'SNI_HTTP_ACCEPT', 'SP_SERVER_DIAGNOSTICS_SLEEP',
'SQLTRACE_BUFFER_FLUSH', 'WAITFOR',
'XE_DISPATCHER_WAIT', 'XE_TIMER_EVENT'
)
AND wait_time_ms > 0
ORDER BY wait_time_ms DESC;-------------------|-----------|---------------|-------------|------------
PAGEIOLATCH_SH | 48210 | 1842.30 | 14.10 | 34.2
LCK_M_S | 1240 | 980.50 | 28.40 | 18.2
CXPACKET | 8820 | 642.10 | 5.20 | 11.9
WRITELOG | 92100 | 480.20 | 0.80 | 8.9
SOS_SCHEDULER_YIELD| 124800 | 310.40 | 0.01 | 5.8
(5 rows shown -- truncated)
PAGEIOLATCH_SHat 34% of total waits means the engine is spending a third of its wait time reading data pages from disk — missing indexes or insufficient buffer pool memory are the likely causesLCK_M_Sat 18% confirms significant lock contention — long-running transactions holding locks while others wait for themWRITELOGindicates transaction log write latency — the log disk may be too slow or shared with other workloadsSOS_SCHEDULER_YIELDat high counts suggests CPU pressure — queries competing for CPU time and yielding their scheduler slot
Top Queries by CPU and Elapsed Time
Once wait statistics identify the category of problem, sys.dm_exec_query_stats identifies the specific queries responsible. This DMV records the cumulative resource consumption of every query plan in the cache — total CPU, total elapsed time, total reads, total executions. Dividing by execution count gives the average cost per run, which reveals whether a query is slow because it is inefficient or simply because it runs thousands of times per minute.
-- Top 10 queries by total CPU time since last restart.
-- Divide by execution_count to find the average cost per run.
SELECT TOP 10
qs.total_worker_time / 1000 AS TotalCPUms,
qs.total_worker_time / qs.execution_count / 1000
AS AvgCPUms,
qs.total_elapsed_time / qs.execution_count / 1000
AS AvgElapsedMs,
qs.total_logical_reads / qs.execution_count
AS AvgLogicalReads,
qs.execution_count AS Executions,
qs.total_logical_reads AS TotalLogicalReads,
SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1)
AS QueryText,
DB_NAME(qt.dbid) AS DatabaseName
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.dbid = DB_ID() -- current database only
ORDER BY qs.total_worker_time DESC;-----------|----------|--------------|-----------------|------------|----------
184230 | 1842 | 2104 | 48200 | 100 | SELECT o.OrderId, u.FirstName...
94820 | 9482 | 11240 | 312000 | 10 | SELECT p.Name, SUM(oi.Quantity)...
42100 | 421 | 380 | 8420 | 100 | UPDATE dbo.Orders SET Status...
18400 | 184 | 142 | 1240 | 100 | SELECT COUNT(*) FROM Orders...
(4 rows shown -- truncated)
-- Row 2 stands out: 9,482ms avg CPU and 312,000 avg logical reads per execution
-- Run only 10 times but each run is extremely expensive -- investigate its plan
- Row 2 runs only 10 times but averages 9,482ms CPU and 312,000 logical reads per execution — a table scan on a large join is the likely cause
- Row 1 has lower average cost but runs 100 times — its total CPU dominates because of frequency, not inefficiency per run
total_logical_readsis the most reliable indicator of missing index impact — each logical read is a page pulled from the buffer pool or diskCROSS APPLY sys.dm_exec_sql_textretrieves the query text from the plan cache for each stats row
Missing Index Recommendations
SQL Server's query optimiser records every time it estimates that an index would have significantly improved a query it just compiled. These estimates accumulate in sys.dm_db_missing_index_details and related DMVs. The missing index DMVs do not tell you to blindly create every suggested index — they are a starting point. Cross-reference with sys.dm_db_index_usage_stats to confirm that similar indexes are not already unused before adding new ones.
-- Missing index recommendations ordered by estimated improvement impact.
-- The impact score is a rough guide -- investigate each suggestion before creating.
SELECT TOP 10
ROUND(migs.avg_total_user_cost *
migs.avg_user_impact *
(migs.user_seeks + migs.user_scans), 0) AS ImpactScore,
migs.user_seeks AS Seeks,
migs.user_scans AS Scans,
ROUND(migs.avg_user_impact, 1) AS AvgImpactPct,
mid.statement AS TableName,
mid.equality_columns AS EqualityColumns,
mid.inequality_columns AS InequalityColumns,
mid.included_columns AS IncludedColumns,
-- generate the CREATE INDEX statement automatically
'CREATE INDEX IX_' +
REPLACE(REPLACE(mid.statement, '[', ''), ']', '') +
'_' + REPLACE(ISNULL(mid.equality_columns,'') +
ISNULL('_' + mid.inequality_columns,''), ', ', '_') +
' ON ' + mid.statement +
' (' + ISNULL(mid.equality_columns, '') +
CASE WHEN mid.inequality_columns IS NOT NULL
THEN ISNULL(', ' + mid.inequality_columns, '') ELSE '' END + ')' +
ISNULL(' INCLUDE (' + mid.included_columns + ')', '')
AS SuggestedCreateStatement
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY ImpactScore DESC;------------|-------|-------|--------------|------------------|-----------------|-------------------|----------------
9842100 | 8420 | 0 | 94.2 | [dbo].[Orders] | [UserId] | [OrderDate] | [Total],[Status]
1240800 | 1240 | 0 | 82.4 | [dbo].[OrderItems]| [ProductId] | NULL | [Quantity],[Price]
384200 | 384 | 12 | 71.8 | [dbo].[Products] | [Category] | [Price] | [Name],[Stock]
(3 rows shown)
- The top suggestion has an impact score of 9,842,100 — the optimiser estimates adding this index on
Orders(UserId, OrderDate)would reduce query cost by 94.2% on average - The generated CREATE INDEX statement is a starting template — review it, rename it to your convention, and test it in a non-production environment first
- Do not create every suggested index blindly — each index adds write overhead and maintenance cost; a table with 20 indexes may write slower than one with 5 well-chosen indexes
- Combine missing index data with
sys.dm_db_index_usage_statsto ensure you are not adding an index that duplicates an existing unused one
Detecting Blocking and Deadlocks
Blocking occurs when one session holds a lock and another session needs the same lock to proceed. The waiting session is suspended until the blocking session commits or rolls back. In a busy OLTP system, short blocks lasting a few hundred milliseconds are normal. Blocks lasting seconds or minutes indicate transactions that are too long, isolation levels that are too strict, or missing indexes that cause lock escalation. sys.dm_exec_requests combined with sys.dm_os_waiting_tasks identifies active blocking chains in real time.
-- Real-time blocking chain: who is blocking whom and for how long?
-- blocking_session_id > 0 means the session is waiting for a lock.
SELECT
r.session_id AS BlockedSession,
r.blocking_session_id AS BlockingSession,
r.wait_type,
r.wait_time / 1000.0 AS WaitSecs,
r.status,
SUBSTRING(bt.text, 1, 200) AS BlockedStatement,
s.login_name AS BlockedLogin,
bs.login_name AS BlockingLogin,
bs.last_request_start_time AS BlockingStarted
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
JOIN sys.dm_exec_sessions bs ON r.blocking_session_id = bs.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) bt
WHERE r.blocking_session_id > 0
ORDER BY r.wait_time DESC;
-- Kill a blocking session (use with care -- rolls back its open transaction):
-- KILL 55; -- terminates session 55 and rolls back any uncommitted work---------------|-----------------|-----------|----------|--------------------------|--------------|---------------|------------------
62 | 55 | LCK_M_S | 28.4 | SELECT * FROM dbo.Orders | app_readonly | DPLX\svc_app | 2024-07-01 14:22:08
71 | 55 | LCK_M_U | 15.2 | UPDATE dbo.Orders SET... | DPLX\svc_app | DPLX\svc_app | 2024-07-01 14:22:08
(2 rows affected)
-- Session 55 (DPLX\svc_app) is blocking both session 62 and session 71
-- It has been running since 14:22:08 -- check what transaction it has open
- Session 55 is the head blocker — it is holding locks that both session 62 and session 71 need
LCK_M_Sis a shared lock wait (SELECT blocked) andLCK_M_Uis an update lock wait — both are waiting for session 55 to release its transactionKILL 55would terminate session 55, roll back its open transaction, and unblock both waiting sessions — use only when the blocking is severe and the business impact of the rollback is understood- Repeated blocking on the same table usually points to a missing index causing lock escalation or a transaction that opens a lock and then waits on slow application logic before committing
Lesson Summary
| DMV / Concept | What It Shows | Key Use |
|---|---|---|
| sys.dm_os_sys_info | Server hardware, uptime, CPU count, memory | First stop health snapshot |
| sys.dm_exec_sessions | All active connections — login, host, program, status | Who is connected right now |
| sys.dm_exec_requests | Currently executing requests — wait type, CPU, elapsed time | What is running and what is it waiting for |
| sys.dm_os_wait_stats | Cumulative wait totals since last restart — by wait type | Identify the dominant bottleneck category |
| PAGEIOLATCH_SH | Waiting to read a data page from disk | Missing indexes or I/O subsystem pressure |
| LCK_M_S / LCK_M_U | Waiting for a shared or update lock | Lock contention between concurrent sessions |
| sys.dm_exec_query_stats | Cumulative resource usage per cached query plan | Find the most expensive queries by CPU or reads |
| sys.dm_exec_sql_text | Retrieves T-SQL text from a sql_handle | CROSS APPLY to get query text alongside stats |
| sys.dm_db_missing_index_details | Indexes the optimiser wished existed during query compilation | Starting point for index tuning — not a definitive list |
| blocking_session_id | Non-zero on a request means it is waiting for another session's lock | Filter sys.dm_exec_requests to find active blocking chains |
| KILL | Terminates a session and rolls back its open transaction | Last resort to break a severe blocking chain |
Practice Questions
Practice 1. Which DMV shows the cumulative wait totals across all wait types since the last SQL Server restart?
Practice 2. A wait type of PAGEIOLATCH_SH dominates your wait statistics. What is the most likely cause?
Practice 3. How do you determine the average CPU cost per execution of a cached query using sys.dm_exec_query_stats?
Practice 4. What does a non-zero blocking_session_id on a row in sys.dm_exec_requests mean?
Practice 5. Why should you not create every index suggested by the missing index DMVs?
Quiz
Quiz 1. You run a health check and see BlockedRequests = 8. Which DMV and filter do you query next to identify the head blocker?
Quiz 2. A query runs 5,000 times per day with an average of 2ms CPU. Another runs 3 times with an average of 8,000ms CPU. Which should you investigate first for tuning?
Quiz 3. DMV data for wait statistics and query stats resets when SQL Server restarts. What does this mean for long-term monitoring?
Quiz 4. What T-SQL operator is used to retrieve query text from sys.dm_exec_sql_text alongside rows from sys.dm_exec_query_stats?
Quiz 5. You issue KILL 55 to break a blocking chain. What happens to session 55's open transaction?
Next up — SQL Server in the Cloud (Azure SQL) — How Azure SQL Database differs from on-premises SQL Server, its deployment options, and how to migrate and manage databases in the cloud.