PostgreSQL
Subqueries
A subquery is a SELECT statement nested inside another SQL statement. Subqueries let you use the result of one query as the input to another — filtering rows based on a computed list, comparing values to aggregates, or building multi-step logic all in a single statement. They are one of the most powerful tools in SQL and show up in WHERE clauses, FROM clauses, SELECT lists, and HAVING clauses. This lesson covers every major subquery pattern with real Dataplexa Store examples.
Subquery in WHERE — Filtering with a Computed List
The most common subquery pattern is using one in the WHERE clause. The inner query runs first, produces a result, and the outer query uses that result to filter its rows. This lets you express complex filters without needing a JOIN.
-- Find customers who have placed at least one order
SELECT first_name, last_name, email
FROM customers
WHERE id IN (
SELECT DISTINCT customer_id FROM orders
)
ORDER BY last_name;
-- Find products that have NEVER been ordered
SELECT name, price, category
FROM products
WHERE id NOT IN (
SELECT DISTINCT product_id FROM order_items
)
ORDER BY name;
-- Employees earning above the company average salary
SELECT first_name, last_name, salary, department
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
)
ORDER BY salary DESC;
first_name | last_name | salary | department
------------+-----------+----------+-------------
Sofia | Taylor | 92000.00 | Engineering
Marcus | Reed | 85000.00 | Engineering
James | Carter | 65000.00 | Sales
-- Never ordered products:
name | price | category
--------------+-------+----------
Old Keyboard | 12.99 | Accessories
Scalar Subquery — A Single Value
A scalar subquery returns exactly one row and one column — a single value. You can use it anywhere a single value is valid: in SELECT for a computed column, in WHERE for a comparison, or in SET inside an UPDATE. If the subquery returns more than one row, PostgreSQL throws an error.
-- Show each product's price alongside the overall average price
SELECT name,
price,
ROUND((SELECT AVG(price) FROM products), 2) AS avg_price,
price - ROUND((SELECT AVG(price) FROM products), 2) AS diff_from_avg
FROM products
ORDER BY diff_from_avg DESC;
-- Show total revenue alongside each order (scalar subquery in SELECT)
SELECT id,
total_amount,
ROUND((SELECT SUM(total_amount) FROM orders WHERE status != 'cancelled'), 2) AS company_total,
ROUND(total_amount / (SELECT SUM(total_amount) FROM orders WHERE status != 'cancelled') * 100, 2) AS pct_of_total
FROM orders
WHERE status = 'delivered'
ORDER BY pct_of_total DESC
LIMIT 5;
name | price | avg_price | diff_from_avg
----------------+--------+-----------+--------------
Standing Desk | 349.99 | 97.49 | 252.50
Ergonomic Chair| 299.99 | 97.49 | 202.50
USB Hub | 24.99 | 97.49 | -72.50
-- % of total revenue:
id | total_amount | company_total | pct_of_total
-----+--------------+---------------+--------------
178 | 598.50 | 23127.44 | 2.59
Subquery in FROM — Inline Views
When you put a subquery in the FROM clause, the result becomes a temporary table that the outer query can SELECT from, JOIN to, or filter. This is called a derived table or inline view. It is the technique to use when you need to aggregate first, then filter or aggregate again on the result.
-- Aggregate orders per customer, then filter on the summary
SELECT customer_summary.customer_id,
customer_summary.order_count,
customer_summary.total_spent
FROM (
SELECT customer_id,
COUNT(*) AS order_count,
ROUND(SUM(total_amount), 2) AS total_spent
FROM orders
WHERE status != 'cancelled'
GROUP BY customer_id
) AS customer_summary
WHERE customer_summary.total_spent > 300
ORDER BY customer_summary.total_spent DESC;
-- Find categories where average product price exceeds the store-wide average
SELECT category, avg_price
FROM (
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
) AS cat_avg
WHERE avg_price > (SELECT AVG(price) FROM products)
ORDER BY avg_price DESC;
customer_id | order_count | total_spent
-------------+-------------+-------------
3 | 8 | 892.45
12 | 7 | 741.20
-- Categories above store average price:
category | avg_price
------------+-----------
Furniture | 274.99
Electronics | 62.44
EXISTS and NOT EXISTS — Checking for Existence
EXISTS returns true if the subquery returns at least one row. It is often faster than IN for large datasets because PostgreSQL stops evaluating as soon as it finds the first matching row. NOT EXISTS is the reverse — it returns true only if the subquery returns no rows. The subquery in EXISTS is a correlated subquery — it references a column from the outer query using an alias.
-- EXISTS: customers who have placed at least one order
SELECT c.first_name, c.last_name, c.email
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id -- correlated: references outer query's c.id
)
ORDER BY c.last_name;
-- NOT EXISTS: customers who have NEVER ordered
SELECT c.first_name, c.last_name, c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
)
ORDER BY c.last_name;
-- EXISTS: products that appear in at least one order
SELECT p.name, p.price
FROM products p
WHERE EXISTS (
SELECT 1 FROM order_items oi
WHERE oi.product_id = p.id
)
ORDER BY p.name;
first_name | last_name | email
------------+-----------+------------------------
Carol | Davis | carol.davis@example.com
David | Lee | david.lee@example.com
(5 rows)
Correlated Subqueries
A correlated subquery references a column from the outer query. This means it runs once for every row the outer query processes — making it powerful but potentially slow on large tables. Use EXISTS as the correlated pattern when performance matters, or use a JOIN-based approach for large datasets.
-- For each customer, show their most recent order date (correlated)
SELECT c.first_name,
c.last_name,
(SELECT MAX(o.order_date)
FROM orders o
WHERE o.customer_id = c.id) AS last_order_date
FROM customers c
ORDER BY last_order_date DESC NULLS LAST
LIMIT 5;
-- For each product, show how many times it has been ordered (correlated)
SELECT p.name,
p.price,
(SELECT SUM(oi.quantity)
FROM order_items oi
WHERE oi.product_id = p.id) AS total_units_sold
FROM products p
ORDER BY total_units_sold DESC NULLS LAST
LIMIT 5;
first_name | last_name | last_order_date
------------+-----------+-----------------
Alice | Morgan | 2024-05-30
Noah | Martinez | 2024-05-28
Carol | Davis | NULL
-- Units sold per product:
name | price | total_units_sold
--------------+-------+------------------
USB Hub | 24.99 | 142
Wireless Mouse| 29.99 | 118
Subquery vs JOIN — When to Use Each
Subqueries and JOINs can often achieve the same result. Subqueries tend to be more readable for filtering and existence checks. JOINs are generally better for performance when you need columns from both tables in the output, or when working with very large datasets. Modern PostgreSQL optimises many subqueries into equivalent join plans automatically, but understanding both gives you the flexibility to choose the right tool.
-- Same result — subquery vs JOIN
-- Subquery approach: customers who have ordered
SELECT first_name, last_name FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);
-- JOIN approach: customers who have ordered
SELECT DISTINCT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.id = o.customer_id;
-- For existence checks, NOT EXISTS is typically cleaner than NOT IN
-- NOT IN has a NULL trap: if any value in the subquery is NULL,
-- the entire NOT IN returns false for all rows
-- Safe: NOT EXISTS
SELECT name FROM products
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = products.id);
-- Risky if product_id can be NULL: NOT IN
SELECT name FROM products
WHERE id NOT IN (SELECT product_id FROM order_items); -- may miss rows if NULL exists
-- NOT EXISTS is safer when NULLs may be present in the subquery result
Subquery Quick Reference
| Pattern | Location | Use For |
|---|---|---|
| WHERE id IN (SELECT ...) | WHERE clause | Filter rows matching a computed list |
| WHERE val > (SELECT AVG(...)) | WHERE clause | Compare to a single aggregate value |
| SELECT (SELECT ...) AS col | SELECT list | Scalar computed column per row |
| FROM (SELECT ...) AS alias | FROM clause | Aggregate then filter — inline view |
| WHERE EXISTS (SELECT 1 ...) | WHERE clause | Check if any matching row exists |
| WHERE NOT EXISTS (SELECT 1 ...) | WHERE clause | Check that no matching row exists (safer than NOT IN) |
🧪 Practice Questions
Write these queries against the Dataplexa Store database.
1. Write the WHERE clause to find employees earning above the company average salary.
2. Which operator checks whether a correlated subquery returns at least one row — and stops as soon as it finds a match?
3. What syntax puts a subquery in the FROM clause as a temporary table?
4. Which is safer for finding rows with no match in another table — NOT IN or NOT EXISTS — and why?
5. A subquery in the SELECT list returns one value per outer row. What is this type of subquery called?
🎯 Quiz — Test Your Understanding
Q1. What makes a correlated subquery different from a regular subquery?
Q2. Why is NOT EXISTS generally safer than NOT IN when checking for unmatched rows?
Q3. You want to find customers whose total spending exceeds $500 — but you cannot use HAVING because you need to JOIN to the customers table for their names first. What is the best approach?
Q4. What is the defining characteristic of a scalar subquery?
Q5. Inside an EXISTS subquery, why do developers conventionally write SELECT 1 rather than SELECT *?
Next up: Common Table Expressions (CTEs) — cleaner, reusable subqueries with the WITH clause.