PostgreSQL Lesson 30 – Subqueries | Dataplexa

Subqueries

A subquery is a SELECT statement nested inside another SQL statement. Subqueries let you use the result of one query as the input to another — filtering rows based on a computed list, comparing values to aggregates, or building multi-step logic all in a single statement. They are one of the most powerful tools in SQL and show up in WHERE clauses, FROM clauses, SELECT lists, and HAVING clauses. This lesson covers every major subquery pattern with real Dataplexa Store examples.

Subquery in WHERE — Filtering with a Computed List

The most common subquery pattern is using one in the WHERE clause. The inner query runs first, produces a result, and the outer query uses that result to filter its rows. This lets you express complex filters without needing a JOIN.

-- Find customers who have placed at least one order
SELECT first_name, last_name, email
FROM customers
WHERE id IN (
  SELECT DISTINCT customer_id FROM orders
)
ORDER BY last_name;

-- Find products that have NEVER been ordered
SELECT name, price, category
FROM products
WHERE id NOT IN (
  SELECT DISTINCT product_id FROM order_items
)
ORDER BY name;

-- Employees earning above the company average salary
SELECT first_name, last_name, salary, department
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
)
ORDER BY salary DESC;

Scalar Subquery — A Single Value

A scalar subquery returns exactly one row and one column — a single value. You can use it anywhere a single value is valid: in SELECT for a computed column, in WHERE for a comparison, or in SET inside an UPDATE. If the subquery returns more than one row, PostgreSQL throws an error.

-- Show each product's price alongside the overall average price
SELECT name,
       price,
       ROUND((SELECT AVG(price) FROM products), 2)  AS avg_price,
       price - ROUND((SELECT AVG(price) FROM products), 2) AS diff_from_avg
FROM products
ORDER BY diff_from_avg DESC;

-- Show total revenue alongside each order (scalar subquery in SELECT)
SELECT id,
       total_amount,
       ROUND((SELECT SUM(total_amount) FROM orders WHERE status != 'cancelled'), 2) AS company_total,
       ROUND(total_amount / (SELECT SUM(total_amount) FROM orders WHERE status != 'cancelled') * 100, 2) AS pct_of_total
FROM orders
WHERE status = 'delivered'
ORDER BY pct_of_total DESC
LIMIT 5;

-- Price vs average:
      name       | price  | avg_price | diff_from_avg
----------------+--------+-----------+--------------
Standing Desk  | 349.99 |     97.49 |      252.50
Ergonomic Chair| 299.99 |     97.49 |      202.50
USB Hub        | 24.99 |     97.49 |     -72.50

-- % of total revenue:
id  | total_amount | company_total | pct_of_total
-----+--------------+---------------+--------------
178 |      598.50 |     23127.44 |        2.59

Subquery in FROM — Inline Views

When you put a subquery in the FROM clause, the result becomes a temporary table that the outer query can SELECT from, JOIN to, or filter. This is called a derived table or inline view. It is the technique to use when you need to aggregate first, then filter or aggregate again on the result.

-- Aggregate orders per customer, then filter on the summary
SELECT customer_summary.customer_id,
       customer_summary.order_count,
       customer_summary.total_spent
FROM (
  SELECT customer_id,
         COUNT(*)                    AS order_count,
         ROUND(SUM(total_amount), 2) AS total_spent
  FROM orders
  WHERE status != 'cancelled'
  GROUP BY customer_id
) AS customer_summary
WHERE customer_summary.total_spent > 300
ORDER BY customer_summary.total_spent DESC;

-- Find categories where average product price exceeds the store-wide average
SELECT category, avg_price
FROM (
  SELECT category, ROUND(AVG(price), 2) AS avg_price
  FROM products
  GROUP BY category
) AS cat_avg
WHERE avg_price > (SELECT AVG(price) FROM products)
ORDER BY avg_price DESC;

EXISTS and NOT EXISTS — Checking for Existence

EXISTS returns true if the subquery returns at least one row. It is often faster than IN for large datasets because PostgreSQL stops evaluating as soon as it finds the first matching row. NOT EXISTS is the reverse — it returns true only if the subquery returns no rows. The subquery in EXISTS is a correlated subquery — it references a column from the outer query using an alias.

-- EXISTS: customers who have placed at least one order
SELECT c.first_name, c.last_name, c.email
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id   -- correlated: references outer query's c.id
)
ORDER BY c.last_name;

-- NOT EXISTS: customers who have NEVER ordered
SELECT c.first_name, c.last_name, c.email
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
)
ORDER BY c.last_name;

-- EXISTS: products that appear in at least one order
SELECT p.name, p.price
FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi
  WHERE oi.product_id = p.id
)
ORDER BY p.name;

Correlated Subqueries

A correlated subquery references a column from the outer query. This means it runs once for every row the outer query processes — making it powerful but potentially slow on large tables. Use EXISTS as the correlated pattern when performance matters, or use a JOIN-based approach for large datasets.

-- For each customer, show their most recent order date (correlated)
SELECT c.first_name,
       c.last_name,
       (SELECT MAX(o.order_date)
        FROM orders o
        WHERE o.customer_id = c.id) AS last_order_date
FROM customers c
ORDER BY last_order_date DESC NULLS LAST
LIMIT 5;

-- For each product, show how many times it has been ordered (correlated)
SELECT p.name,
       p.price,
       (SELECT SUM(oi.quantity)
        FROM order_items oi
        WHERE oi.product_id = p.id) AS total_units_sold
FROM products p
ORDER BY total_units_sold DESC NULLS LAST
LIMIT 5;

Subquery vs JOIN — When to Use Each

Subqueries and JOINs can often achieve the same result. Subqueries tend to be more readable for filtering and existence checks. JOINs are generally better for performance when you need columns from both tables in the output, or when working with very large datasets. Modern PostgreSQL optimises many subqueries into equivalent join plans automatically, but understanding both gives you the flexibility to choose the right tool.

-- Same result — subquery vs JOIN

-- Subquery approach: customers who have ordered
SELECT first_name, last_name FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);

-- JOIN approach: customers who have ordered
SELECT DISTINCT c.first_name, c.last_name
FROM customers c
JOIN orders o ON c.id = o.customer_id;

-- For existence checks, NOT EXISTS is typically cleaner than NOT IN
-- NOT IN has a NULL trap: if any value in the subquery is NULL,
-- the entire NOT IN returns false for all rows

-- Safe: NOT EXISTS
SELECT name FROM products
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = products.id);

-- Risky if product_id can be NULL: NOT IN
SELECT name FROM products
WHERE id NOT IN (SELECT product_id FROM order_items);  -- may miss rows if NULL exists

-- Both return the same rows
-- NOT EXISTS is safer when NULLs may be present in the subquery result

Subquery Quick Reference

Pattern	Location	Use For
WHERE id IN (SELECT ...)	WHERE clause	Filter rows matching a computed list
WHERE val > (SELECT AVG(...))	WHERE clause	Compare to a single aggregate value
SELECT (SELECT ...) AS col	SELECT list	Scalar computed column per row
FROM (SELECT ...) AS alias	FROM clause	Aggregate then filter — inline view
WHERE EXISTS (SELECT 1 ...)	WHERE clause	Check if any matching row exists
WHERE NOT EXISTS (SELECT 1 ...)	WHERE clause	Check that no matching row exists (safer than NOT IN)

🧪 Practice Questions

Write these queries against the Dataplexa Store database.

1. Write the WHERE clause to find employees earning above the company average salary.

2. Which operator checks whether a correlated subquery returns at least one row — and stops as soon as it finds a match?

3. What syntax puts a subquery in the FROM clause as a temporary table?

4. Which is safer for finding rows with no match in another table — NOT IN or NOT EXISTS — and why?

5. A subquery in the SELECT list returns one value per outer row. What is this type of subquery called?

Next up: Common Table Expressions (CTEs) — cleaner, reusable subqueries with the WITH clause.

← Previous Course Index Next →

PostgreSQL