PostgreSQL
Advanced Joins
Lesson 23 covered the four main join types. This lesson goes deeper — self joins for hierarchical data, CROSS JOINs for generating combinations, joining on multiple conditions, non-equi joins, and the common pitfalls that produce wrong results silently. These techniques come up regularly in real-world reporting and data modelling work.
Self Join — Joining a Table to Itself
A self join is when you join a table to itself. This sounds unusual but it is the natural solution for hierarchical data — like an employees table where each employee has a manager who is also an employee. Without a self join you cannot get the manager's name from the same table in a single query.
-- Employees table has a manager_id column pointing to another employee's id
-- Self join to get each employee's manager name
SELECT e.first_name || ' ' || e.last_name AS employee,
e.job_title,
m.first_name || ' ' || m.last_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY manager NULLS LAST, employee;
-- Find all employees who earn more than their manager
SELECT e.first_name || ' ' || e.last_name AS employee,
e.salary AS employee_salary,
m.first_name || ' ' || m.last_name AS manager,
m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary
ORDER BY e.salary DESC;
employee | job_title | manager
--------------+--------------------+---------------
James Carter | Account Manager | Sofia Taylor
Sarah Reed | Sales Rep | Sofia Taylor
Sofia Taylor | Head of Sales | NULL (CEO)
-- Earns more than manager:
employee | emp_salary | manager | mgr_salary
--------------+------------+--------------+------------
Marcus Reed | 85000.00 | James Carter | 65000.00
CROSS JOIN — Every Combination
A CROSS JOIN produces the Cartesian product of two tables — every row from the left table paired with every row from the right table. If the left table has 5 rows and the right has 4, the result has 20 rows. There is no ON condition — it is simply all combinations. This is rarely needed in transactional queries but useful for generating test data, size matrices, or scheduling grids.
-- Generate all combinations of product categories and months
-- Useful for building a complete grid with no gaps for reporting
SELECT c.category, m.month_name
FROM (SELECT DISTINCT category FROM products) c
CROSS JOIN (
VALUES ('January'), ('February'), ('March'),
('April'), ('May'), ('June')
) AS m(month_name)
ORDER BY c.category, m.month_name;
-- How many combinations does this produce?
SELECT COUNT(DISTINCT category) * 6 AS total_combinations
FROM products;
-------------+------------
Accessories | April
Accessories | February
Accessories | January
Electronics | April
Electronics | February
... (24 rows total — 4 categories × 6 months)
total_combinations
--------------------
24
Joining on Multiple Conditions
An ON clause can have multiple conditions joined with AND. This is necessary when a match requires more than one column to agree — for example matching both a customer ID and a date range, or confirming both a product and a warehouse match.
-- Join orders to a promotions table matching customer AND date range
-- Only apply the promotion if the order falls within the promotion window
SELECT o.id AS order_id,
o.customer_id,
o.order_date,
o.total_amount,
pr.name AS promotion,
pr.discount_pct
FROM orders o
LEFT JOIN promotions pr
ON o.customer_id = pr.customer_id -- customer must match
AND o.order_date BETWEEN pr.start_date -- and order must fall in window
AND pr.end_date
WHERE o.status = 'delivered'
ORDER BY o.order_date DESC
LIMIT 5;
----------+-------------+------------+--------------+--------------+--------------
199 | 5 | 2024-05-30 | 74.99 | Summer Sale | 15.00
198 | 22 | 2024-05-28 | 312.50 | NULL | NULL
Non-Equi Joins — Joining on Ranges
Most joins use equality — table_a.id = table_b.id. But the ON clause accepts any boolean expression, including inequalities and ranges. A non-equi join matches rows based on a range condition rather than an exact match. This is useful for salary bands, price tiers, and date range lookups.
-- Assign a commission tier to each employee based on their salary range
-- salary_tiers is a lookup table: tier_name, min_salary, max_salary
SELECT e.first_name, e.last_name, e.salary,
t.tier_name,
t.commission_pct
FROM employees e
JOIN salary_tiers t
ON e.salary BETWEEN t.min_salary AND t.max_salary
ORDER BY e.salary DESC;
-- Find all orders where total_amount falls within a discount bracket
SELECT o.id, o.total_amount, d.discount_label, d.discount_pct
FROM orders o
JOIN discount_brackets d
ON o.total_amount >= d.min_order_value
AND o.total_amount < d.max_order_value
WHERE o.status = 'pending'
ORDER BY o.total_amount DESC
LIMIT 5;
------------+-----------+----------+------------+----------------
Sofia | Taylor | 92000.00 | Senior | 8.00
Marcus | Reed | 85000.00 | Senior | 8.00
James | Carter | 65000.00 | Mid | 5.00
The Duplicate Row Trap — When Joins Multiply Data
One of the most common join mistakes is silently producing more rows than you expect. This happens when the right table has multiple rows matching a single row from the left table — each match produces a separate output row. The result looks correct but numbers like SUM become inflated because rows are counted multiple times.
-- A customer has 5 orders. Joining customers to orders gives 5 rows per customer.
-- SUM(o.total_amount) is fine here — we want total spent per customer
SELECT c.first_name, COUNT(o.id) AS orders, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.first_name
LIMIT 3;
-- But this is a trap — joining orders to order_items inflates order total
-- Each order appears once per line item in order_items
-- SUM(total_amount) without DISTINCT would count the order total for every line item
SELECT o.id,
o.total_amount AS declared_total,
SUM(oi.quantity * oi.unit_price) AS calculated_total -- correct
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, o.total_amount
ORDER BY o.id
LIMIT 3;
first_name | orders | total_spent
------------+--------+-------------
Liam | 8 | 741.20
Emma | 7 | 612.40
-- Order totals (calculated correctly from line items):
id | declared_total | calculated_total
-----+----------------+------------------
101 | 149.97 | 169.94
102 | 89.98 | 89.98
JOIN with Aggregation — The Right Pattern
Combining joins with GROUP BY and aggregates is the bread and butter of reporting queries. The key is understanding what level you are aggregating at — per customer, per product, per month — and making sure your GROUP BY reflects that correctly.
-- Top 5 best-selling products by units sold
SELECT p.name,
p.category,
SUM(oi.quantity) AS units_sold,
ROUND(SUM(oi.quantity * oi.unit_price), 2) AS revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name, p.category
ORDER BY units_sold DESC
LIMIT 5;
-- Revenue by customer with their name — joined aggregation
SELECT c.first_name || ' ' || c.last_name AS customer,
c.state,
COUNT(DISTINCT o.id) AS orders,
ROUND(SUM(o.total_amount), 2) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status != 'cancelled'
GROUP BY c.id, c.first_name, c.last_name, c.state
ORDER BY total_spent DESC
LIMIT 5;
name | category | units_sold | revenue
--------------+-------------+------------+---------
USB Hub | Electronics | 142 | 3548.58
Wireless Mouse| Electronics | 118 | 3538.82
Desk Lamp | Office | 96 | 3359.04
-- Top customers:
customer | state | orders | total_spent
--------------+-------+--------+-------------
Liam Brown | PA | 8 | 741.20
Emma Wilson | AZ | 7 | 612.40
Advanced Joins Quick Reference
| Technique | When to Use | Key Point |
|---|---|---|
| Self Join | Hierarchical data in one table | Must use two different aliases for the same table |
| CROSS JOIN | Generate all combinations | No ON clause — result is rows_a × rows_b |
| Multi-condition ON | Match on multiple columns | Use AND inside the ON clause |
| Non-equi Join | Range or inequality matching | ON uses BETWEEN, >=, < instead of = |
| Duplicate row trap | One-to-many relationships | Aggregate at the right level — check row counts |
🧪 Practice Questions
Answer based on what you learned in this lesson.
1. An employees table has a manager_id that references another row in the same table. What type of join retrieves the manager's name alongside each employee?
2. Which join type produces every possible combination of rows from two tables with no ON condition?
3. In an ON clause, which keyword lets you add a second matching condition?
4. You want to join employees to a salary_tiers table using BETWEEN to assign each employee a tier. What type of join is this?
5. When writing a self join, what must you give the table to distinguish the two references to it?
🎯 Quiz — Test Your Understanding
Q1. You want to show each employee alongside their manager's name. Both are in the employees table. What is the correct approach?
Q2. You join orders to order_items (one order has 5 line items). You then SUM(orders.total_amount) without grouping correctly. What goes wrong?
Q3. You want to join employees to salary_tiers where the employee salary falls within the tier's range. Which ON clause is correct?
Q4. A CROSS JOIN between a table of 6 months and a table of 4 categories produces how many rows?
Q5. In a self join for managers, some employees have no manager (the CEO). Which join preserves those employees in the results?
Next up: Subqueries — writing queries inside queries for powerful single-statement logic.