MS SQL Lesson 26 – Subqueries | Dataplexa

Subqueries

A subquery is a SELECT statement nested inside another SQL statement. It lets you use the result of one query as input to another — as a filter value, as a derived table, as a calculated column, or as the basis of an existence check. Subqueries are one of the most expressive tools in T-SQL: they make it possible to answer questions that cannot be expressed in a single flat SELECT, and they allow complex logic to be broken into readable, layered steps. Understanding where subqueries can appear, how they relate to the outer query, and when a JOIN or CTE is a better alternative gives you the full range of options for structuring any query correctly and efficiently.

Scalar Subquery — Single Value in SELECT or WHERE

A scalar subquery returns exactly one row and one column — a single value. It can appear anywhere a single value is valid: in the SELECT list as a computed column, in WHERE as a comparison value, or in SET within an UPDATE. If a scalar subquery returns more than one row SQL Server raises an error at runtime. If it returns no rows it returns NULL. Scalar subqueries are self-contained — they can reference the outer query's tables using a correlated reference, or they can be completely independent. Independent scalar subqueries are evaluated once and their result is reused; correlated ones are re-evaluated for every row of the outer query.

-- Scalar subquery in SELECT — compare each product price to the overall average
USE DataplexaStore;

SELECT
    ProductName,
    Category,
    Price,
    (SELECT AVG(Price) FROM Products)                   AS avg_all_products,
    Price - (SELECT AVG(Price) FROM Products)           AS diff_from_avg,
    CASE
        WHEN Price > (SELECT AVG(Price) FROM Products)  THEN 'Above Average'
        WHEN Price < (SELECT AVG(Price) FROM Products)  THEN 'Below Average'
        ELSE 'Average'
    END                                                 AS price_band
FROM   Products
ORDER BY Price DESC;

-- Scalar subquery in WHERE — products priced above the average
SELECT ProductName, Category, Price
FROM   Products
WHERE  Price > (SELECT AVG(Price) FROM Products)
ORDER BY Price DESC;

ProductName Category Price avg_all_products diff_from_avg price_band
──────────────────── ─────────── ────── ──────────────── ───────────── ────────────
Monitor 27 inch Electronics 359.99 109.87 250.12 Above Average
Standing Desk Home Office 367.49 109.87 257.62 Above Average
Ergonomic Chair Home Office 262.49 109.87 152.62 Above Average
Mechanical Keyboard Electronics 80.99 109.87 -28.88 Below Average
Webcam HD Electronics 71.99 109.87 -37.88 Below Average
USB-C Hub Electronics 44.99 109.87 -64.88 Below Average
Desk Lamp Home Office 36.74 109.87 -73.13 Below Average
Wireless Mouse Electronics 29.69 109.87 -80.18 Below Average
Notebook A5 Stationery 12.99 109.87 -96.88 Below Average
Ballpoint Pen Set Stationery 8.99 109.87 -100.88 Below Average

-- Products above average price:
ProductName Category Price
─────────────── ─────────── ──────
Standing Desk Home Office 367.49
Monitor 27 inch Electronics 359.99
Ergonomic Chair Home Office 262.49

The scalar subquery (SELECT AVG(Price) FROM Products) is evaluated once and its result (109.87) is reused in all three places it appears in the SELECT — SQL Server is smart enough not to run it three times per row
Only three products are priced above the average — this is expected for any right-skewed price distribution where a few premium items pull the mean well above the majority of products
Prices reflect the discounts applied in L19 (10% on Electronics) and the 5% increase applied to Home Office in L17 — the DataplexaStore dataset carries the cumulative effect of all changes made across the lesson series

Subquery in FROM — Derived Table

A subquery in the FROM clause is called a derived table or inline view. It acts exactly like a regular table — you can SELECT from it, JOIN it to other tables, filter it with WHERE, and aggregate it. The subquery must be given an alias. Derived tables are evaluated once and their result is used as the data source for the outer query. They are the correct tool when you need to aggregate data before joining it, pre-filter rows before a GROUP BY, or apply a transformation that cannot be expressed in a single level of SELECT.

-- Derived table — aggregate OrderItems before joining to Orders and Users

SELECT
    u.FullName,
    o.OrderID,
    o.Status,
    oi_summary.line_count,
    oi_summary.calculated_total
FROM       Orders  o
JOIN       Users   u            ON o.UserID    = u.UserID
JOIN (
    -- This subquery runs first, producing one row per OrderID
    SELECT
        OrderID,
        COUNT(*)                AS line_count,
        SUM(Quantity * UnitPrice) AS calculated_total
    FROM   OrderItems
    GROUP BY OrderID
) AS oi_summary                 ON o.OrderID = oi_summary.OrderID
ORDER BY o.OrderID;

-- Derived table to find the top-spending user per country
SELECT
    country_totals.Country,
    country_totals.FullName,
    country_totals.total_spent
FROM (
    SELECT
        u.Country,
        u.FullName,
        SUM(o.TotalAmount)      AS total_spent,
        RANK() OVER (PARTITION BY u.Country ORDER BY SUM(o.TotalAmount) DESC) AS country_rank
    FROM   Users  u
    JOIN   Orders o ON u.UserID = o.UserID
    GROUP BY u.Country, u.FullName
) AS country_totals
WHERE country_totals.country_rank = 1
ORDER BY country_totals.total_spent DESC;

-- Orders with line item summary:
FullName OrderID Status line_count calculated_total
──────────── ─────── ───────── ────────── ────────────────
Bob Williams 2 delivered 1 89.99
Clara Davis 3 shipped 2 159.91
David Kim 5 shipped 2 42.98
Bob Williams 6 delivered 1 80.99

-- Top spender per country:
Country FullName total_spent
─────── ──────────── ───────────
UK Bob Williams 512.92
Canada Clara Davis 159.91
South Korea David Kim 42.98

The derived table oi_summary aggregates OrderItems into one row per OrderID before the JOIN — without this aggregation the JOIN would produce one row per order item, requiring GROUP BY on the outer query and making the query harder to read
The second derived table uses RANK() OVER (PARTITION BY Country) — a window function inside the subquery, which cannot be used directly in a WHERE clause; the derived table wrapper is the standard technique to filter on window function results
Only three countries have users who have placed orders — the remaining six users have never ordered, so they do not appear in the country_totals subquery

Correlated Subquery

A correlated subquery references a column from the outer query inside the subquery — it is re-evaluated once for every row the outer query processes. This makes correlated subqueries powerful but potentially expensive: on a table with one million rows, a correlated subquery runs one million times. They are the correct tool for row-by-row calculations that genuinely depend on the current outer row — finding each product's rank within its own category, or checking whether a related record exists. When a correlated subquery can be replaced by a JOIN or a window function it usually should be, for performance. But for EXISTS checks — where you simply need to know whether any matching row exists — correlated subqueries are often the cleanest and most readable solution.

-- Correlated subquery — each product compared to its own category average
-- The inner query references p.Category from the outer query

SELECT
    p.ProductName,
    p.Category,
    p.Price,
    (
        SELECT AVG(Price)
        FROM   Products
        WHERE  Category = p.Category    -- correlated reference to outer query's row
    )                                   AS category_avg_price,
    p.Price - (
        SELECT AVG(Price)
        FROM   Products
        WHERE  Category = p.Category
    )                                   AS diff_from_category_avg
FROM   Products p
ORDER BY p.Category, p.Price DESC;

ProductName Category Price category_avg_price diff_from_category_avg
──────────────────── ─────────── ────── ────────────────── ──────────────────────
Monitor 27 inch Electronics 359.99 117.53 242.46
Mechanical Keyboard Electronics 80.99 117.53 -36.54
Webcam HD Electronics 71.99 117.53 -45.54
USB-C Hub Electronics 44.99 117.53 -72.54
Wireless Mouse Electronics 29.69 117.53 -87.84
Standing Desk Home Office 367.49 222.24 145.25
Ergonomic Chair Home Office 262.49 222.24 40.25
Desk Lamp Home Office 36.74 222.24 -185.50
Notebook A5 Stationery 12.99 10.99 2.00
Ballpoint Pen Set Stationery 8.99 10.99 -2.00

Each row uses its own category to compute a different average — Electronics average is 117.53, Home Office is 222.24, Stationery is 10.99; a non-correlated subquery could not produce three different averages in a single pass
This exact result is more efficiently produced with a window function: AVG(Price) OVER (PARTITION BY Category) — but the correlated subquery version is shown here to illustrate the concept; window functions are covered in a dedicated lesson
The correlated subquery runs once per product row (10 times total here) — on a small table this is imperceptible; on a table with millions of rows the repeated execution becomes a significant cost

EXISTS and NOT EXISTS

EXISTS is a boolean operator that returns true if a correlated subquery produces at least one row, and false if it produces no rows. It does not care about column values — only whether any row matches. This makes EXISTS faster than IN for existence checks on large datasets because SQL Server can stop scanning the inner query the moment the first matching row is found. NOT EXISTS returns true when the subquery produces no rows — it is the cleanest and most NULL-safe alternative to NOT IN, avoiding the NULL trap that makes NOT IN unreliable when the subquery can return NULL values.

-- EXISTS — users who have placed at least one order
SELECT u.UserID, u.FullName, u.Country
FROM   Users u
WHERE  EXISTS (
    SELECT 1                    -- the value selected does not matter — only row existence
    FROM   Orders o
    WHERE  o.UserID = u.UserID  -- correlated reference
)
ORDER BY u.FullName;

-- NOT EXISTS — users who have never placed an order (NULL-safe anti-join)
SELECT u.UserID, u.FullName, u.Country, u.MembershipTier
FROM   Users u
WHERE  NOT EXISTS (
    SELECT 1
    FROM   Orders o
    WHERE  o.UserID = u.UserID
)
ORDER BY u.FullName;

-- EXISTS vs IN — functionally equivalent for non-NULL sets
-- IN form:
SELECT FullName FROM Users WHERE UserID IN (SELECT UserID FROM Orders);

-- EXISTS form (preferred when inner query is large or nullable):
SELECT FullName FROM Users
WHERE  EXISTS (SELECT 1 FROM Orders WHERE Orders.UserID = Users.UserID);

-- Users with orders (EXISTS):
UserID FullName Country
────── ──────────── ───────────
2 Bob Williams UK
3 Clara Davis Canada
4 David Kim South Korea

-- Users without orders (NOT EXISTS):
UserID FullName Country MembershipTier
────── ──────────── ───────── ──────────────
5 Eva Martinez Spain Premium
6 Frank Chen China Standard
7 Grace Patel India VIP
8 Henry Osei Ghana Standard
9 Isla Nguyen Vietnam Premium
10 James Wilson Australia Standard

-- Both IN and EXISTS return identical results:
FullName
────────────
Bob Williams
Clara Davis
David Kim

SELECT 1 inside EXISTS is conventional — EXISTS only checks whether any row was returned, never what values those rows contain; SELECT 1, SELECT *, and SELECT NULL all produce the same EXISTS result
NOT EXISTS is NULL-safe because it tests for the absence of rows, not the absence of a value — a NULL UserID in Orders would cause NOT IN to return zero results (the NULL trap), but NOT EXISTS would still correctly identify the unmatched users
For this small dataset IN and EXISTS produce identical results — at scale EXISTS can be faster because SQL Server can short-circuit on the first matching row, while IN must materialise the entire inner result set

Subquery vs JOIN vs CTE — Choosing the Right Tool

Subqueries, JOINs, and CTEs (Common Table Expressions) often produce the same result but differ in readability, reusability, and sometimes performance. A JOIN is the right choice when you need columns from both tables in the result. A subquery is right when the inner query result is used as a filter or scalar value and its columns do not need to appear in the output. A CTE is right when the subquery would be repeated, deeply nested, or complex enough to benefit from being named and written once. SQL Server's query optimiser often rewrites subqueries as joins internally, so the performance difference between equivalent forms is frequently zero — but readability is a real and lasting cost.

-- Three equivalent ways to find products ordered more than once
-- Each produces the same result — compare the readability trade-offs

-- Option 1: Subquery in WHERE with IN
SELECT ProductName, Category, Price
FROM   Products
WHERE  ProductID IN (
    SELECT   ProductID
    FROM     OrderItems
    GROUP BY ProductID
    HAVING   COUNT(*) > 1
)
ORDER BY ProductName;

-- Option 2: JOIN to a derived table
SELECT p.ProductName, p.Category, p.Price
FROM   Products p
JOIN (
    SELECT   ProductID, COUNT(*) AS order_lines
    FROM     OrderItems
    GROUP BY ProductID
    HAVING   COUNT(*) > 1
) AS multi_ordered ON p.ProductID = multi_ordered.ProductID
ORDER BY p.ProductName;

-- Option 3: CTE (Common Table Expression) — named, readable, reusable
WITH MultiOrdered AS (
    SELECT   ProductID, COUNT(*) AS order_lines
    FROM     OrderItems
    GROUP BY ProductID
    HAVING   COUNT(*) > 1
)
SELECT p.ProductName, p.Category, p.Price, m.order_lines
FROM   Products      p
JOIN   MultiOrdered  m ON p.ProductID = m.ProductID
ORDER BY p.ProductName;

-- Option 1 (IN subquery):
ProductName Category Price
──────────── ─────────── ─────
Notebook A5 Stationery 12.99
Monitor 27 inch Electronics 359.99
Wireless Mouse Electronics 29.69
Mechanical Keyboard Electronics 80.99
USB-C Hub Electronics 44.99
Desk Lamp Home Office 36.74

-- Option 2 (derived table JOIN) — identical result

-- Option 3 (CTE) — adds order_lines column:
ProductName Category Price order_lines
──────────────────── ─────────── ────── ───────────
Desk Lamp Home Office 36.74 2
Mechanical Keyboard Electronics 80.99 2
Monitor 27 inch Electronics 359.99 2
Notebook A5 Stationery 12.99 2
USB-C Hub Electronics 44.99 2
Wireless Mouse Electronics 29.69 2

All three produce the same filtered product list — the CTE form additionally surfaces the order_lines count because the named CTE allows its columns to be selected, unlike the IN subquery which only uses the inner result for filtering
The CTE is the most readable of the three for a reader encountering the query for the first time — the WITH block names the logic before it is used, reading top-to-bottom rather than inside-out
SQL Server's optimiser frequently rewrites subqueries and CTEs as joins internally — always check the execution plan if performance matters rather than assuming one syntax is faster than another

Summary Table

Subquery Type	Where It Appears	Returns	Key Point
Scalar	SELECT list, WHERE, SET	Exactly one value	Error if more than one row returned
Derived Table	FROM clause	A result set (table)	Must be aliased — aggregate before joining
Correlated	SELECT list, WHERE	Value per outer row	Re-evaluated per row — expensive at scale
EXISTS / NOT EXISTS	WHERE clause	TRUE or FALSE	NULL-safe — preferred over NOT IN
IN subquery	WHERE clause	A set of values	Avoid NOT IN when inner query can return NULL

Practice Questions

Practice 1. What happens if a scalar subquery returns more than one row?

Practice 2. What is the difference between a correlated and a non-correlated subquery?

Practice 3. Why is NOT EXISTS preferred over NOT IN when the inner query might return NULL values?

Practice 4. Why must a derived table in the FROM clause be given an alias?

Practice 5. When should you prefer a CTE over a subquery?

Quiz

Quiz 1. A scalar subquery returns no rows. What value does SQL Server substitute?

NULL
0
SQL Server raises an error
An empty string

Quiz 2. What does SELECT 1 inside an EXISTS subquery signify?

The column value is irrelevant — only row existence matters
It returns the count of matching rows
It limits the subquery to one row
It checks that the first column equals 1

Quiz 3. Which subquery type is re-evaluated once for every row in the outer query?

Correlated subquery
Scalar subquery
Derived table
EXISTS subquery

Quiz 4. A derived table in the FROM clause must have what?

An alias
An ORDER BY clause
A primary key column
A WHERE clause

Quiz 5. NOT IN (SELECT UserID FROM Orders) returns zero rows if the Orders table contains a NULL UserID. Why?

Any comparison with NULL produces unknown — the whole condition becomes unknown and the row is excluded
NULL causes a runtime error
SQL Server ignores NULLs in NOT IN
NOT IN filters NULLs automatically

Next up - Views - Save your most important queries as named database objects and control exactly what different users and applications can see.

← Previous Course Index Next →

MS SQL Server