MS SQL Server
Subqueries
A subquery is a SELECT statement nested inside another SQL statement. It lets you use the result of one query as input to another — as a filter value, as a derived table, as a calculated column, or as the basis of an existence check. Subqueries are one of the most expressive tools in T-SQL: they make it possible to answer questions that cannot be expressed in a single flat SELECT, and they allow complex logic to be broken into readable, layered steps. Understanding where subqueries can appear, how they relate to the outer query, and when a JOIN or CTE is a better alternative gives you the full range of options for structuring any query correctly and efficiently.
Scalar Subquery — Single Value in SELECT or WHERE
A scalar subquery returns exactly one row and one column — a single value. It can appear anywhere a single value is valid: in the SELECT list as a computed column, in WHERE as a comparison value, or in SET within an UPDATE. If a scalar subquery returns more than one row SQL Server raises an error at runtime. If it returns no rows it returns NULL. Scalar subqueries are self-contained — they can reference the outer query's tables using a correlated reference, or they can be completely independent. Independent scalar subqueries are evaluated once and their result is reused; correlated ones are re-evaluated for every row of the outer query.
-- Scalar subquery in SELECT — compare each product price to the overall average
USE DataplexaStore;
SELECT
ProductName,
Category,
Price,
(SELECT AVG(Price) FROM Products) AS avg_all_products,
Price - (SELECT AVG(Price) FROM Products) AS diff_from_avg,
CASE
WHEN Price > (SELECT AVG(Price) FROM Products) THEN 'Above Average'
WHEN Price < (SELECT AVG(Price) FROM Products) THEN 'Below Average'
ELSE 'Average'
END AS price_band
FROM Products
ORDER BY Price DESC;
-- Scalar subquery in WHERE — products priced above the average
SELECT ProductName, Category, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products)
ORDER BY Price DESC;──────────────────── ─────────── ────── ──────────────── ───────────── ────────────
Monitor 27 inch Electronics 359.99 109.87 250.12 Above Average
Standing Desk Home Office 367.49 109.87 257.62 Above Average
Ergonomic Chair Home Office 262.49 109.87 152.62 Above Average
Mechanical Keyboard Electronics 80.99 109.87 -28.88 Below Average
Webcam HD Electronics 71.99 109.87 -37.88 Below Average
USB-C Hub Electronics 44.99 109.87 -64.88 Below Average
Desk Lamp Home Office 36.74 109.87 -73.13 Below Average
Wireless Mouse Electronics 29.69 109.87 -80.18 Below Average
Notebook A5 Stationery 12.99 109.87 -96.88 Below Average
Ballpoint Pen Set Stationery 8.99 109.87 -100.88 Below Average
-- Products above average price:
ProductName Category Price
─────────────── ─────────── ──────
Standing Desk Home Office 367.49
Monitor 27 inch Electronics 359.99
Ergonomic Chair Home Office 262.49
- The scalar subquery
(SELECT AVG(Price) FROM Products)is evaluated once and its result (109.87) is reused in all three places it appears in the SELECT — SQL Server is smart enough not to run it three times per row - Only three products are priced above the average — this is expected for any right-skewed price distribution where a few premium items pull the mean well above the majority of products
- Prices reflect the discounts applied in L19 (10% on Electronics) and the 5% increase applied to Home Office in L17 — the DataplexaStore dataset carries the cumulative effect of all changes made across the lesson series
Subquery in FROM — Derived Table
A subquery in the FROM clause is called a derived table or inline view. It acts exactly like a regular table — you can SELECT from it, JOIN it to other tables, filter it with WHERE, and aggregate it. The subquery must be given an alias. Derived tables are evaluated once and their result is used as the data source for the outer query. They are the correct tool when you need to aggregate data before joining it, pre-filter rows before a GROUP BY, or apply a transformation that cannot be expressed in a single level of SELECT.
-- Derived table — aggregate OrderItems before joining to Orders and Users
SELECT
u.FullName,
o.OrderID,
o.Status,
oi_summary.line_count,
oi_summary.calculated_total
FROM Orders o
JOIN Users u ON o.UserID = u.UserID
JOIN (
-- This subquery runs first, producing one row per OrderID
SELECT
OrderID,
COUNT(*) AS line_count,
SUM(Quantity * UnitPrice) AS calculated_total
FROM OrderItems
GROUP BY OrderID
) AS oi_summary ON o.OrderID = oi_summary.OrderID
ORDER BY o.OrderID;
-- Derived table to find the top-spending user per country
SELECT
country_totals.Country,
country_totals.FullName,
country_totals.total_spent
FROM (
SELECT
u.Country,
u.FullName,
SUM(o.TotalAmount) AS total_spent,
RANK() OVER (PARTITION BY u.Country ORDER BY SUM(o.TotalAmount) DESC) AS country_rank
FROM Users u
JOIN Orders o ON u.UserID = o.UserID
GROUP BY u.Country, u.FullName
) AS country_totals
WHERE country_totals.country_rank = 1
ORDER BY country_totals.total_spent DESC;FullName OrderID Status line_count calculated_total
──────────── ─────── ───────── ────────── ────────────────
Bob Williams 2 delivered 1 89.99
Clara Davis 3 shipped 2 159.91
David Kim 5 shipped 2 42.98
Bob Williams 6 delivered 1 80.99
-- Top spender per country:
Country FullName total_spent
─────── ──────────── ───────────
UK Bob Williams 512.92
Canada Clara Davis 159.91
South Korea David Kim 42.98
- The derived table oi_summary aggregates OrderItems into one row per OrderID before the JOIN — without this aggregation the JOIN would produce one row per order item, requiring GROUP BY on the outer query and making the query harder to read
- The second derived table uses RANK() OVER (PARTITION BY Country) — a window function inside the subquery, which cannot be used directly in a WHERE clause; the derived table wrapper is the standard technique to filter on window function results
- Only three countries have users who have placed orders — the remaining six users have never ordered, so they do not appear in the country_totals subquery
Correlated Subquery
A correlated subquery references a column from the outer query inside the subquery — it is re-evaluated once for every row the outer query processes. This makes correlated subqueries powerful but potentially expensive: on a table with one million rows, a correlated subquery runs one million times. They are the correct tool for row-by-row calculations that genuinely depend on the current outer row — finding each product's rank within its own category, or checking whether a related record exists. When a correlated subquery can be replaced by a JOIN or a window function it usually should be, for performance. But for EXISTS checks — where you simply need to know whether any matching row exists — correlated subqueries are often the cleanest and most readable solution.
-- Correlated subquery — each product compared to its own category average
-- The inner query references p.Category from the outer query
SELECT
p.ProductName,
p.Category,
p.Price,
(
SELECT AVG(Price)
FROM Products
WHERE Category = p.Category -- correlated reference to outer query's row
) AS category_avg_price,
p.Price - (
SELECT AVG(Price)
FROM Products
WHERE Category = p.Category
) AS diff_from_category_avg
FROM Products p
ORDER BY p.Category, p.Price DESC;──────────────────── ─────────── ────── ────────────────── ──────────────────────
Monitor 27 inch Electronics 359.99 117.53 242.46
Mechanical Keyboard Electronics 80.99 117.53 -36.54
Webcam HD Electronics 71.99 117.53 -45.54
USB-C Hub Electronics 44.99 117.53 -72.54
Wireless Mouse Electronics 29.69 117.53 -87.84
Standing Desk Home Office 367.49 222.24 145.25
Ergonomic Chair Home Office 262.49 222.24 40.25
Desk Lamp Home Office 36.74 222.24 -185.50
Notebook A5 Stationery 12.99 10.99 2.00
Ballpoint Pen Set Stationery 8.99 10.99 -2.00
- Each row uses its own category to compute a different average — Electronics average is 117.53, Home Office is 222.24, Stationery is 10.99; a non-correlated subquery could not produce three different averages in a single pass
- This exact result is more efficiently produced with a window function:
AVG(Price) OVER (PARTITION BY Category)— but the correlated subquery version is shown here to illustrate the concept; window functions are covered in a dedicated lesson - The correlated subquery runs once per product row (10 times total here) — on a small table this is imperceptible; on a table with millions of rows the repeated execution becomes a significant cost
EXISTS and NOT EXISTS
EXISTS is a boolean operator that returns true if a correlated subquery produces at least one row, and false if it produces no rows. It does not care about column values — only whether any row matches. This makes EXISTS faster than IN for existence checks on large datasets because SQL Server can stop scanning the inner query the moment the first matching row is found. NOT EXISTS returns true when the subquery produces no rows — it is the cleanest and most NULL-safe alternative to NOT IN, avoiding the NULL trap that makes NOT IN unreliable when the subquery can return NULL values.
-- EXISTS — users who have placed at least one order
SELECT u.UserID, u.FullName, u.Country
FROM Users u
WHERE EXISTS (
SELECT 1 -- the value selected does not matter — only row existence
FROM Orders o
WHERE o.UserID = u.UserID -- correlated reference
)
ORDER BY u.FullName;
-- NOT EXISTS — users who have never placed an order (NULL-safe anti-join)
SELECT u.UserID, u.FullName, u.Country, u.MembershipTier
FROM Users u
WHERE NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.UserID = u.UserID
)
ORDER BY u.FullName;
-- EXISTS vs IN — functionally equivalent for non-NULL sets
-- IN form:
SELECT FullName FROM Users WHERE UserID IN (SELECT UserID FROM Orders);
-- EXISTS form (preferred when inner query is large or nullable):
SELECT FullName FROM Users
WHERE EXISTS (SELECT 1 FROM Orders WHERE Orders.UserID = Users.UserID);UserID FullName Country
────── ──────────── ───────────
2 Bob Williams UK
3 Clara Davis Canada
4 David Kim South Korea
-- Users without orders (NOT EXISTS):
UserID FullName Country MembershipTier
────── ──────────── ───────── ──────────────
5 Eva Martinez Spain Premium
6 Frank Chen China Standard
7 Grace Patel India VIP
8 Henry Osei Ghana Standard
9 Isla Nguyen Vietnam Premium
10 James Wilson Australia Standard
-- Both IN and EXISTS return identical results:
FullName
────────────
Bob Williams
Clara Davis
David Kim
SELECT 1inside EXISTS is conventional — EXISTS only checks whether any row was returned, never what values those rows contain; SELECT 1, SELECT *, and SELECT NULL all produce the same EXISTS result- NOT EXISTS is NULL-safe because it tests for the absence of rows, not the absence of a value — a NULL UserID in Orders would cause NOT IN to return zero results (the NULL trap), but NOT EXISTS would still correctly identify the unmatched users
- For this small dataset IN and EXISTS produce identical results — at scale EXISTS can be faster because SQL Server can short-circuit on the first matching row, while IN must materialise the entire inner result set
Subquery vs JOIN vs CTE — Choosing the Right Tool
Subqueries, JOINs, and CTEs (Common Table Expressions) often produce the same result but differ in readability, reusability, and sometimes performance. A JOIN is the right choice when you need columns from both tables in the result. A subquery is right when the inner query result is used as a filter or scalar value and its columns do not need to appear in the output. A CTE is right when the subquery would be repeated, deeply nested, or complex enough to benefit from being named and written once. SQL Server's query optimiser often rewrites subqueries as joins internally, so the performance difference between equivalent forms is frequently zero — but readability is a real and lasting cost.
-- Three equivalent ways to find products ordered more than once
-- Each produces the same result — compare the readability trade-offs
-- Option 1: Subquery in WHERE with IN
SELECT ProductName, Category, Price
FROM Products
WHERE ProductID IN (
SELECT ProductID
FROM OrderItems
GROUP BY ProductID
HAVING COUNT(*) > 1
)
ORDER BY ProductName;
-- Option 2: JOIN to a derived table
SELECT p.ProductName, p.Category, p.Price
FROM Products p
JOIN (
SELECT ProductID, COUNT(*) AS order_lines
FROM OrderItems
GROUP BY ProductID
HAVING COUNT(*) > 1
) AS multi_ordered ON p.ProductID = multi_ordered.ProductID
ORDER BY p.ProductName;
-- Option 3: CTE (Common Table Expression) — named, readable, reusable
WITH MultiOrdered AS (
SELECT ProductID, COUNT(*) AS order_lines
FROM OrderItems
GROUP BY ProductID
HAVING COUNT(*) > 1
)
SELECT p.ProductName, p.Category, p.Price, m.order_lines
FROM Products p
JOIN MultiOrdered m ON p.ProductID = m.ProductID
ORDER BY p.ProductName;ProductName Category Price
──────────── ─────────── ─────
Notebook A5 Stationery 12.99
Monitor 27 inch Electronics 359.99
Wireless Mouse Electronics 29.69
Mechanical Keyboard Electronics 80.99
USB-C Hub Electronics 44.99
Desk Lamp Home Office 36.74
-- Option 2 (derived table JOIN) — identical result
-- Option 3 (CTE) — adds order_lines column:
ProductName Category Price order_lines
──────────────────── ─────────── ────── ───────────
Desk Lamp Home Office 36.74 2
Mechanical Keyboard Electronics 80.99 2
Monitor 27 inch Electronics 359.99 2
Notebook A5 Stationery 12.99 2
USB-C Hub Electronics 44.99 2
Wireless Mouse Electronics 29.69 2
- All three produce the same filtered product list — the CTE form additionally surfaces the order_lines count because the named CTE allows its columns to be selected, unlike the IN subquery which only uses the inner result for filtering
- The CTE is the most readable of the three for a reader encountering the query for the first time — the WITH block names the logic before it is used, reading top-to-bottom rather than inside-out
- SQL Server's optimiser frequently rewrites subqueries and CTEs as joins internally — always check the execution plan if performance matters rather than assuming one syntax is faster than another
Summary Table
| Subquery Type | Where It Appears | Returns | Key Point |
|---|---|---|---|
| Scalar | SELECT list, WHERE, SET | Exactly one value | Error if more than one row returned |
| Derived Table | FROM clause | A result set (table) | Must be aliased — aggregate before joining |
| Correlated | SELECT list, WHERE | Value per outer row | Re-evaluated per row — expensive at scale |
| EXISTS / NOT EXISTS | WHERE clause | TRUE or FALSE | NULL-safe — preferred over NOT IN |
| IN subquery | WHERE clause | A set of values | Avoid NOT IN when inner query can return NULL |
Practice Questions
Practice 1. What happens if a scalar subquery returns more than one row?
Practice 2. What is the difference between a correlated and a non-correlated subquery?
Practice 3. Why is NOT EXISTS preferred over NOT IN when the inner query might return NULL values?
Practice 4. Why must a derived table in the FROM clause be given an alias?
Practice 5. When should you prefer a CTE over a subquery?
Quiz
Quiz 1. A scalar subquery returns no rows. What value does SQL Server substitute?
Quiz 2. What does SELECT 1 inside an EXISTS subquery signify?
Quiz 3. Which subquery type is re-evaluated once for every row in the outer query?
Quiz 4. A derived table in the FROM clause must have what?
Quiz 5. NOT IN (SELECT UserID FROM Orders) returns zero rows if the Orders table contains a NULL UserID. Why?
Next up - Views - Save your most important queries as named database objects and control exactly what different users and applications can see.