MS SQL Lesson 38 – SQL Server Monitoring | Dataplexa

SQL Server Monitoring

Why Monitor SQL Server?

A SQL Server instance running in production is under constant load — queries executing, transactions committing, indexes being read, memory being allocated and released. Without monitoring, the first sign of a problem is often a phone call reporting that the application is slow or unresponsive. By the time that call comes, the root cause may have been building for hours. Proactive monitoring lets you detect problems while they are still small: a query that doubled in execution time last night, a table that is now missing an index, a blocking chain that is holding up dozens of sessions.

SQL Server exposes its internals through Dynamic Management Views (DMVs) — system views that reflect the live state of the engine. Querying DMVs requires no third-party tools and no special setup. Every DBA and developer with appropriate permissions can query them directly in SSMS or any SQL client. The key DMV families covered in this lesson are: wait statistics, query performance, index usage, blocking, and database file I/O.

DMVs expose real-time engine state — connections, waits, query plans, index usage, blocking
No additional tools are required — DMV queries run in any SQL client with VIEW SERVER STATE permission
Wait statistics reveal what the engine is waiting for — the single most diagnostic signal in SQL Server
Missing index DMVs show SQL Server's own recommendations based on actual query patterns
Blocking DMVs identify sessions holding locks that are stalling other work
DMV data resets when SQL Server restarts — it reflects only the period since the last service start

-- Server health snapshot: connections, memory, CPU, and uptime.
-- A quick first look when investigating a performance complaint.
SELECT
    @@SERVERNAME                            AS ServerName,
    @@VERSION                               AS SqlVersion,
    DATEDIFF(HOUR, sqlserver_start_time, GETDATE())
                                            AS UptimeHours,
    cpu_count                               AS LogicalCPUs,
    physical_memory_kb / 1024              AS PhysicalMemoryMB,
    (SELECT COUNT(*) FROM sys.dm_exec_sessions
     WHERE is_user_process = 1)            AS ActiveUserSessions,
    (SELECT COUNT(*) FROM sys.dm_exec_requests
     WHERE status = 'running')             AS ActiveRequests,
    (SELECT COUNT(*) FROM sys.dm_exec_requests
     WHERE blocking_session_id > 0)        AS BlockedRequests
FROM sys.dm_os_sys_info;
-- Active sessions with their current statement and wait type
SELECT TOP 20
    s.session_id,
    s.login_name,
    s.host_name,
    s.program_name,
    r.status,
    r.wait_type,
    r.wait_time / 1000.0        AS WaitSecs,
    r.cpu_time / 1000.0         AS CpuSecs,
    r.total_elapsed_time / 1000.0 AS ElapsedSecs,
    SUBSTRING(t.text, (r.statement_start_offset/2)+1,
        ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(t.text)
          ELSE r.statement_end_offset END - r.statement_start_offset)/2)+1)
                                AS CurrentStatement
FROM sys.dm_exec_sessions    s
JOIN sys.dm_exec_requests    r ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.is_user_process = 1
ORDER BY r.total_elapsed_time DESC;

-- Server health snapshot
ServerName | UptimeHours | LogicalCPUs | PhysicalMemoryMB | ActiveUserSessions | ActiveRequests | BlockedRequests
-------------|-------------|-------------|------------------|--------------------|----------------|----------------
DPLX-SQL-01 | 312 | 8 | 16384 | 23 | 4 | 1

-- Active sessions (top 5 by elapsed time)
session_id | login_name | status | wait_type | WaitSecs | CpuSecs | ElapsedSecs | CurrentStatement
-----------|--------------|---------|--------------|----------|---------|-------------|------------------
55 | DPLX\svc_app | running | PAGEIOLATCH_SH | 12.4 | 8.2 | 14.1 | SELECT o.OrderId...
62 | app_readonly | suspended | LCK_M_S | 8.1 | 0.0 | 8.1 | SELECT * FROM...
41 | DPLX\svc_app | running | CXPACKET | 3.2 | 18.4 | 5.6 | INSERT INTO...
(3 rows shown)

One blocked request is visible in the health snapshot — session 62 is waiting on a shared lock (LCK_M_S)
PAGEIOLATCH_SH on session 55 means the engine is waiting for a data page to be read from disk — a storage I/O signal
CXPACKET on session 41 indicates parallel query exchange — common on large aggregation or sort operations
CROSS APPLY sys.dm_exec_sql_text extracts the exact T-SQL text being executed by each active request

Wait Statistics — The Most Diagnostic Signal

Every time SQL Server cannot immediately proceed with work it records a wait. It logs what it was waiting for and for how long. The cumulative totals in sys.dm_os_wait_stats reveal the dominant bottleneck across the entire instance since the last restart. High CPU waits point to query plan problems. High I/O waits point to missing indexes or insufficient storage. High lock waits point to contention between concurrent transactions. Reading wait statistics is the fastest way to identify the category of a performance problem before drilling into individual queries.

-- Top wait types since last SQL Server restart.
-- Exclude known benign waits (idle waits that are always high).
-- This reveals the dominant bottleneck on the instance.
SELECT TOP 15
    wait_type,
    waiting_tasks_count                         AS WaitCount,
    wait_time_ms / 1000.0                       AS TotalWaitSecs,
    max_wait_time_ms / 1000.0                   AS MaxWaitSecs,
    ROUND(100.0 * wait_time_ms /
        NULLIF(SUM(wait_time_ms) OVER (), 0), 2) AS PctOfTotal
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (               -- exclude benign idle waits
    'SLEEP_TASK', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP',
    'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
    'FT_IFTS_SCHEDULER_IDLE_WAIT', 'HADR_WORK_QUEUE',
    'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
    'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE',
    'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE',
    'SERVER_IDLE_CHECK', 'SLEEP_DBSTARTUP', 'SLEEP_DCOMSTARTUP',
    'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY',
    'SLEEP_MASTERUPGRADED', 'SLEEP_MSDBSTARTUP',
    'SLEEP_SYSTEMTASK', 'SLEEP_TEMPDBSTARTUP',
    'SNI_HTTP_ACCEPT', 'SP_SERVER_DIAGNOSTICS_SLEEP',
    'SQLTRACE_BUFFER_FLUSH', 'WAITFOR',
    'XE_DISPATCHER_WAIT', 'XE_TIMER_EVENT'
)
  AND wait_time_ms > 0
ORDER BY wait_time_ms DESC;

wait_type | WaitCount | TotalWaitSecs | MaxWaitSecs | PctOfTotal
-------------------|-----------|---------------|-------------|------------
PAGEIOLATCH_SH | 48210 | 1842.30 | 14.10 | 34.2
LCK_M_S | 1240 | 980.50 | 28.40 | 18.2
CXPACKET | 8820 | 642.10 | 5.20 | 11.9
WRITELOG | 92100 | 480.20 | 0.80 | 8.9
SOS_SCHEDULER_YIELD| 124800 | 310.40 | 0.01 | 5.8
(5 rows shown -- truncated)

PAGEIOLATCH_SH at 34% of total waits means the engine is spending a third of its wait time reading data pages from disk — missing indexes or insufficient buffer pool memory are the likely causes
LCK_M_S at 18% confirms significant lock contention — long-running transactions holding locks while others wait for them
WRITELOG indicates transaction log write latency — the log disk may be too slow or shared with other workloads
SOS_SCHEDULER_YIELD at high counts suggests CPU pressure — queries competing for CPU time and yielding their scheduler slot

Top Queries by CPU and Elapsed Time

Once wait statistics identify the category of problem, sys.dm_exec_query_stats identifies the specific queries responsible. This DMV records the cumulative resource consumption of every query plan in the cache — total CPU, total elapsed time, total reads, total executions. Dividing by execution count gives the average cost per run, which reveals whether a query is slow because it is inefficient or simply because it runs thousands of times per minute.

-- Top 10 queries by total CPU time since last restart.
-- Divide by execution_count to find the average cost per run.
SELECT TOP 10
    qs.total_worker_time / 1000             AS TotalCPUms,
    qs.total_worker_time / qs.execution_count / 1000
                                            AS AvgCPUms,
    qs.total_elapsed_time / qs.execution_count / 1000
                                            AS AvgElapsedMs,
    qs.total_logical_reads / qs.execution_count
                                            AS AvgLogicalReads,
    qs.execution_count                      AS Executions,
    qs.total_logical_reads                  AS TotalLogicalReads,
    SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text)
          ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1)
                                            AS QueryText,
    DB_NAME(qt.dbid)                        AS DatabaseName
FROM sys.dm_exec_query_stats  qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.dbid = DB_ID()        -- current database only
ORDER BY qs.total_worker_time DESC;

TotalCPUms | AvgCPUms | AvgElapsedMs | AvgLogicalReads | Executions | QueryText
-----------|----------|--------------|-----------------|------------|----------
184230 | 1842 | 2104 | 48200 | 100 | SELECT o.OrderId, u.FirstName...
94820 | 9482 | 11240 | 312000 | 10 | SELECT p.Name, SUM(oi.Quantity)...
42100 | 421 | 380 | 8420 | 100 | UPDATE dbo.Orders SET Status...
18400 | 184 | 142 | 1240 | 100 | SELECT COUNT(*) FROM Orders...
(4 rows shown -- truncated)

-- Row 2 stands out: 9,482ms avg CPU and 312,000 avg logical reads per execution
-- Run only 10 times but each run is extremely expensive -- investigate its plan

Row 2 runs only 10 times but averages 9,482ms CPU and 312,000 logical reads per execution — a table scan on a large join is the likely cause
Row 1 has lower average cost but runs 100 times — its total CPU dominates because of frequency, not inefficiency per run
total_logical_reads is the most reliable indicator of missing index impact — each logical read is a page pulled from the buffer pool or disk
CROSS APPLY sys.dm_exec_sql_text retrieves the query text from the plan cache for each stats row

Missing Index Recommendations

SQL Server's query optimiser records every time it estimates that an index would have significantly improved a query it just compiled. These estimates accumulate in sys.dm_db_missing_index_details and related DMVs. The missing index DMVs do not tell you to blindly create every suggested index — they are a starting point. Cross-reference with sys.dm_db_index_usage_stats to confirm that similar indexes are not already unused before adding new ones.

-- Missing index recommendations ordered by estimated improvement impact.
-- The impact score is a rough guide -- investigate each suggestion before creating.
SELECT TOP 10
    ROUND(migs.avg_total_user_cost *
          migs.avg_user_impact *
         (migs.user_seeks + migs.user_scans), 0)  AS ImpactScore,
    migs.user_seeks                               AS Seeks,
    migs.user_scans                               AS Scans,
    ROUND(migs.avg_user_impact, 1)                AS AvgImpactPct,
    mid.statement                                 AS TableName,
    mid.equality_columns                          AS EqualityColumns,
    mid.inequality_columns                        AS InequalityColumns,
    mid.included_columns                          AS IncludedColumns,
    -- generate the CREATE INDEX statement automatically
    'CREATE INDEX IX_' +
        REPLACE(REPLACE(mid.statement, '[', ''), ']', '') +
        '_' + REPLACE(ISNULL(mid.equality_columns,'') +
              ISNULL('_' + mid.inequality_columns,''), ', ', '_') +
        ' ON ' + mid.statement +
        ' (' + ISNULL(mid.equality_columns, '') +
        CASE WHEN mid.inequality_columns IS NOT NULL
             THEN ISNULL(', ' + mid.inequality_columns, '') ELSE '' END + ')' +
        ISNULL(' INCLUDE (' + mid.included_columns + ')', '')
                                                  AS SuggestedCreateStatement
FROM sys.dm_db_missing_index_groups  mig
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details    mid  ON mig.index_handle        = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY ImpactScore DESC;

ImpactScore | Seeks | Scans | AvgImpactPct | TableName | EqualityColumns | InequalityColumns | IncludedColumns
------------|-------|-------|--------------|------------------|-----------------|-------------------|----------------
9842100 | 8420 | 0 | 94.2 | [dbo].[Orders] | [UserId] | [OrderDate] | [Total],[Status]
1240800 | 1240 | 0 | 82.4 | [dbo].[OrderItems]| [ProductId] | NULL | [Quantity],[Price]
384200 | 384 | 12 | 71.8 | [dbo].[Products] | [Category] | [Price] | [Name],[Stock]
(3 rows shown)

The top suggestion has an impact score of 9,842,100 — the optimiser estimates adding this index on Orders(UserId, OrderDate) would reduce query cost by 94.2% on average
The generated CREATE INDEX statement is a starting template — review it, rename it to your convention, and test it in a non-production environment first
Do not create every suggested index blindly — each index adds write overhead and maintenance cost; a table with 20 indexes may write slower than one with 5 well-chosen indexes
Combine missing index data with sys.dm_db_index_usage_stats to ensure you are not adding an index that duplicates an existing unused one

Detecting Blocking and Deadlocks

Blocking occurs when one session holds a lock and another session needs the same lock to proceed. The waiting session is suspended until the blocking session commits or rolls back. In a busy OLTP system, short blocks lasting a few hundred milliseconds are normal. Blocks lasting seconds or minutes indicate transactions that are too long, isolation levels that are too strict, or missing indexes that cause lock escalation. sys.dm_exec_requests combined with sys.dm_os_waiting_tasks identifies active blocking chains in real time.

-- Real-time blocking chain: who is blocking whom and for how long?
-- blocking_session_id > 0 means the session is waiting for a lock.
SELECT
    r.session_id                            AS BlockedSession,
    r.blocking_session_id                   AS BlockingSession,
    r.wait_type,
    r.wait_time / 1000.0                    AS WaitSecs,
    r.status,
    SUBSTRING(bt.text, 1, 200)              AS BlockedStatement,
    s.login_name                            AS BlockedLogin,
    bs.login_name                           AS BlockingLogin,
    bs.last_request_start_time              AS BlockingStarted
FROM sys.dm_exec_requests          r
JOIN sys.dm_exec_sessions          s  ON r.session_id          = s.session_id
JOIN sys.dm_exec_sessions          bs ON r.blocking_session_id = bs.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) bt
WHERE r.blocking_session_id > 0
ORDER BY r.wait_time DESC;
-- Kill a blocking session (use with care -- rolls back its open transaction):
-- KILL 55;   -- terminates session 55 and rolls back any uncommitted work

BlockedSession | BlockingSession | wait_type | WaitSecs | BlockedStatement | BlockedLogin | BlockingLogin | BlockingStarted
---------------|-----------------|-----------|----------|--------------------------|--------------|---------------|------------------
62 | 55 | LCK_M_S | 28.4 | SELECT * FROM dbo.Orders | app_readonly | DPLX\svc_app | 2024-07-01 14:22:08
71 | 55 | LCK_M_U | 15.2 | UPDATE dbo.Orders SET... | DPLX\svc_app | DPLX\svc_app | 2024-07-01 14:22:08
(2 rows affected)

-- Session 55 (DPLX\svc_app) is blocking both session 62 and session 71
-- It has been running since 14:22:08 -- check what transaction it has open

Session 55 is the head blocker — it is holding locks that both session 62 and session 71 need
LCK_M_S is a shared lock wait (SELECT blocked) and LCK_M_U is an update lock wait — both are waiting for session 55 to release its transaction
KILL 55 would terminate session 55, roll back its open transaction, and unblock both waiting sessions — use only when the blocking is severe and the business impact of the rollback is understood
Repeated blocking on the same table usually points to a missing index causing lock escalation or a transaction that opens a lock and then waits on slow application logic before committing

Lesson Summary

DMV / Concept	What It Shows	Key Use
sys.dm_os_sys_info	Server hardware, uptime, CPU count, memory	First stop health snapshot
sys.dm_exec_sessions	All active connections — login, host, program, status	Who is connected right now
sys.dm_exec_requests	Currently executing requests — wait type, CPU, elapsed time	What is running and what is it waiting for
sys.dm_os_wait_stats	Cumulative wait totals since last restart — by wait type	Identify the dominant bottleneck category
PAGEIOLATCH_SH	Waiting to read a data page from disk	Missing indexes or I/O subsystem pressure
LCK_M_S / LCK_M_U	Waiting for a shared or update lock	Lock contention between concurrent sessions
sys.dm_exec_query_stats	Cumulative resource usage per cached query plan	Find the most expensive queries by CPU or reads
sys.dm_exec_sql_text	Retrieves T-SQL text from a sql_handle	CROSS APPLY to get query text alongside stats
sys.dm_db_missing_index_details	Indexes the optimiser wished existed during query compilation	Starting point for index tuning — not a definitive list
blocking_session_id	Non-zero on a request means it is waiting for another session's lock	Filter `sys.dm_exec_requests` to find active blocking chains
KILL	Terminates a session and rolls back its open transaction	Last resort to break a severe blocking chain

Practice Questions

Practice 1. Which DMV shows the cumulative wait totals across all wait types since the last SQL Server restart?

Practice 2. A wait type of PAGEIOLATCH_SH dominates your wait statistics. What is the most likely cause?

Practice 3. How do you determine the average CPU cost per execution of a cached query using sys.dm_exec_query_stats?

Practice 4. What does a non-zero blocking_session_id on a row in sys.dm_exec_requests mean?

Practice 5. Why should you not create every index suggested by the missing index DMVs?

Quiz

Quiz 1. You run a health check and see BlockedRequests = 8. Which DMV and filter do you query next to identify the head blocker?

sys.dm_os_wait_stats WHERE wait_type = 'LCK_M_S'
sys.dm_exec_query_stats ORDER BY total_worker_time DESC
sys.dm_exec_requests WHERE blocking_session_id > 0 — join to find the head blocker
sys.dm_db_missing_index_details WHERE database_id = DB_ID()

Quiz 2. A query runs 5,000 times per day with an average of 2ms CPU. Another runs 3 times with an average of 8,000ms CPU. Which should you investigate first for tuning?

The high-frequency query — it runs more often so it must use more resources overall
The 8,000ms query — high average CPU per execution strongly suggests a missing index or bad plan
Neither — 8,000ms is within acceptable limits for complex queries
The query with higher total_logical_reads regardless of execution count

Quiz 3. DMV data for wait statistics and query stats resets when SQL Server restarts. What does this mean for long-term monitoring?

DMVs are unreliable and should not be used for production monitoring
DMVs only reflect the period since the last restart — capture snapshots regularly or use Query Store, which persists data across restarts
Restarts only clear data older than 30 days from DMVs
DMV data persists in tempdb across restarts automatically

Quiz 4. What T-SQL operator is used to retrieve query text from sys.dm_exec_sql_text alongside rows from sys.dm_exec_query_stats?

INNER JOIN
LEFT JOIN
CROSS APPLY — it calls the table-valued function once per row and joins the result inline
OUTER APPLY

Quiz 5. You issue KILL 55 to break a blocking chain. What happens to session 55's open transaction?

It is committed automatically to preserve the work already done
It remains open until the application reconnects and commits
It is rolled back automatically — KILL terminates the session and SQL Server rolls back any uncommitted work to maintain data consistency
SQL Server raises an error and refuses to kill a session with an open transaction

Next up — SQL Server in the Cloud (Azure SQL) — How Azure SQL Database differs from on-premises SQL Server, its deployment options, and how to migrate and manage databases in the cloud.

← Previous Course Index Next →

MS SQL Server