PostgreSQL
GROUP BY & HAVING
Lesson 21 introduced GROUP BY and HAVING alongside the aggregate functions. This lesson goes deeper — covering multi-column grouping, grouping rules that catch developers out, ordering grouped results, combining WHERE and HAVING, and building the kind of real reporting queries you will use every day against the Dataplexa Store.
GROUP BY Recap — The Core Rule
Every column in your SELECT list must either be inside an aggregate function or appear in the GROUP BY clause. This rule is enforced strictly by PostgreSQL — if you list a column in SELECT that is neither aggregated nor grouped, you get an error. This rule exists because PostgreSQL does not know which value to show for that column when multiple rows are collapsed into one group.
-- Correct — category is in both SELECT and GROUP BY
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;
-- Error — name is in SELECT but not in GROUP BY and not aggregated
SELECT category, name, COUNT(*) AS product_count
FROM products
GROUP BY category; -- fails: column "name" must appear in GROUP BY or aggregate
-- Fixed — either aggregate name or add it to GROUP BY
SELECT category, COUNT(*) AS product_count, MAX(name) AS sample_product
FROM products
GROUP BY category;
category | product_count
-------------+---------------
Electronics | 9
Office | 8
Furniture | 4
Accessories | 9
-- Error:
ERROR: column "products.name" must appear in the GROUP BY clause
or be used in an aggregate function
Grouping by Multiple Columns
You can group by any number of columns. PostgreSQL creates one group for each unique combination of the grouped column values. This lets you produce breakdowns like revenue by status and month, products by category and active status, or employees by department and job title.
-- Order count and revenue by status AND month
SELECT
status,
EXTRACT(MONTH FROM order_date) AS order_month,
COUNT(*) AS order_count,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
GROUP BY status, EXTRACT(MONTH FROM order_date)
ORDER BY order_month, status;
-- Products grouped by category and active status
SELECT category, is_active,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category, is_active
ORDER BY category, is_active DESC;
-- Employees grouped by department and job title
SELECT department, job_title,
COUNT(*) AS headcount,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;
status | order_month | order_count | revenue
------------+-------------+-------------+---------
delivered | 1 | 14 | 1823.40
pending | 1 | 9 | 987.20
shipped | 1 | 4 | 412.80
-- By category and active:
category | is_active | product_count | avg_price
-------------+-----------+---------------+-----------
Accessories | t | 8 | 43.22
Accessories | f | 1 | 12.99
Electronics | t | 8 | 62.44
Grouping with Expressions
You do not have to group by raw column values — you can group by expressions like EXTRACT, DATE_TRUNC, or CASE. This lets you roll data up by year, by month, by quarter, or by any custom bucket you define.
-- Revenue by year and month using DATE_TRUNC
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
-- Group customers into tiers using CASE
SELECT
CASE
WHEN joined_date >= CURRENT_DATE - INTERVAL '1 year' THEN 'New'
WHEN joined_date >= CURRENT_DATE - INTERVAL '3 years' THEN 'Established'
ELSE 'Loyal'
END AS customer_tier,
COUNT(*) AS customer_count
FROM customers
GROUP BY customer_tier
ORDER BY customer_count DESC;
-- Products grouped by price band
SELECT
CASE
WHEN price < 30 THEN 'Budget (under $30)'
WHEN price < 100 THEN 'Mid-range ($30–$99)'
ELSE 'Premium ($100+)'
END AS price_band,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY price_band
ORDER BY avg_price;
month | orders | revenue
---------------------+--------+---------
2024-01-01 00:00:00 | 32 | 3912.40
2024-02-01 00:00:00 | 28 | 3341.80
-- Price bands:
price_band | product_count | avg_price
---------------------+---------------+-----------
Budget (under $30) | 8 | 23.74
Mid-range ($30–$99) | 16 | 54.18
Premium ($100+) | 8 | 262.49
HAVING — Filtering Groups with Multiple Conditions
HAVING supports the same operators as WHERE — AND, OR, NOT, BETWEEN, IN, and more. You can combine multiple aggregate conditions in a single HAVING clause to build precise group filters.
-- Categories with more than 5 products AND average price above $40
SELECT category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5 AND AVG(price) > 40
ORDER BY avg_price DESC;
-- Customers who placed between 3 and 8 orders
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) BETWEEN 3 AND 8
ORDER BY order_count DESC;
-- States with 2 to 5 customers
SELECT state, COUNT(*) AS customer_count
FROM customers
GROUP BY state
HAVING COUNT(*) BETWEEN 2 AND 5
ORDER BY customer_count DESC, state;
category | product_count | avg_price
-------------+---------------+-----------
Electronics | 9 | 62.44
Accessories | 9 | 43.22
-- Between 3 and 8 orders:
customer_id | order_count
-------------+-------------
3 | 8
12 | 7
28 | 5
WHERE and HAVING Together
Using WHERE and HAVING in the same query is a very common and powerful pattern. WHERE removes individual rows before they enter the grouping — shrinking the data set first. HAVING then filters the resulting groups. This means WHERE runs on every row in the table and HAVING only runs on the summarised groups.
-- Active products only, grouped by category, categories with avg > $40
SELECT category,
COUNT(*) AS active_products,
ROUND(AVG(price), 2) AS avg_price,
SUM(stock_qty) AS total_stock
FROM products
WHERE is_active = TRUE -- row filter first
GROUP BY category
HAVING AVG(price) > 40 -- group filter after
ORDER BY avg_price DESC;
-- Delivered orders in 2024, customers with more than $300 total
SELECT customer_id,
COUNT(*) AS order_count,
ROUND(SUM(total_amount), 2) AS total_spent
FROM orders
WHERE status = 'delivered' -- only delivered orders
AND order_date >= '2024-01-01' -- only 2024
GROUP BY customer_id
HAVING SUM(total_amount) > 300 -- only big spenders
ORDER BY total_spent DESC;
category | active_products | avg_price | total_stock
-------------+-----------------+-----------+-------------
Furniture | 4 | 274.99 | 23
Electronics | 8 | 62.44 | 240
Accessories | 8 | 43.22 | 420
-- Big spenders in 2024 delivered orders:
customer_id | order_count | total_spent
-------------+-------------+-------------
3 | 6 | 612.40
12 | 5 | 498.20
Aliasing Aggregate Columns and Using Them in ORDER BY
You can give aggregate result columns a clean alias with AS and then reference that alias in ORDER BY. Note that you cannot use the alias in HAVING — PostgreSQL evaluates HAVING before the SELECT aliases are finalised, so you must repeat the full aggregate expression in HAVING.
-- Alias works in ORDER BY but not in HAVING
SELECT
customer_id,
COUNT(*) AS order_count,
ROUND(SUM(total_amount), 2) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 200 -- must repeat the expression in HAVING
ORDER BY total_spent DESC -- can use the alias in ORDER BY
LIMIT 10;
-------------+-------------+-------------
3 | 8 | 892.45
12 | 7 | 741.20
28 | 5 | 530.80
(10 rows)
GROUP BY and HAVING — Full Clause Order
| Step | Clause | What Happens |
|---|---|---|
| 1 | FROM | Load the full table |
| 2 | WHERE | Remove rows that do not match the condition |
| 3 | GROUP BY | Collapse matching rows into groups |
| 4 | HAVING | Remove groups that do not meet the aggregate condition |
| 5 | SELECT | Compute the output columns and aliases |
| 6 | ORDER BY | Sort the final results (aliases available here) |
| 7 | LIMIT / OFFSET | Return only the requested slice of rows |
🧪 Practice Questions
Write these queries against the Dataplexa Store database.
1. Write a query to count how many employees are in each department.
2. You group orders by customer_id. Which clause do you use to keep only customers with more than 4 orders?
3. A column appears in SELECT but is not in GROUP BY and is not aggregated. What does PostgreSQL do?
4. Which function groups timestamps by month so you can produce a monthly revenue report?
5. You alias SUM(total_amount) as total_spent. Can you use that alias in the HAVING clause?
🎯 Quiz — Test Your Understanding
Q1. In a query with both WHERE and HAVING, which runs first?
Q2. You want one row per unique combination of category and is_active. Which GROUP BY is correct?
Q3. You want to group products into price bands — Budget, Mid-range, Premium. Which approach is correct?
Q4. You want categories with more than 5 products AND average price above $40. Which HAVING clause is correct?
Q5. Why can you use a SELECT alias in ORDER BY but not in HAVING?
Next up: Joins — combining data from multiple tables with INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.