PostgreSQL
Window Functions
Window functions are one of the most powerful features in PostgreSQL. Like aggregate functions, they perform calculations across a set of rows — but unlike GROUP BY, they do not collapse those rows into one. Every row keeps its own identity while gaining a new computed column based on a sliding window of related rows. This lets you calculate running totals, rankings, moving averages, row comparisons, and percentile breakdowns — all without losing any data. This lesson covers the full window function toolkit with real Dataplexa Store examples.
How Window Functions Work
A window function uses the OVER() clause to define the set of rows it looks at — the "window". Without any arguments, OVER() means the entire result set. With PARTITION BY, the window is scoped to groups. With ORDER BY, the window becomes a running calculation. The key difference from GROUP BY: every input row still produces an output row.
-- GROUP BY collapses rows — one row per category
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;
-- Window function keeps all rows — adds avg per category alongside each product
SELECT name,
category,
price,
ROUND(AVG(price) OVER (PARTITION BY category), 2) AS category_avg
FROM products
ORDER BY category, price;
category | avg_price
-------------+-----------
Accessories | 43.22
Electronics | 62.44
-- Window function (all rows preserved):
name | category | price | category_avg
----------------+-------------+-------+--------------
Monitor Arm | Accessories | 49.99 | 43.22
USB Hub | Accessories | 24.99 | 43.22
Wireless Mouse | Electronics | 29.99 | 62.44
Webcam HD | Electronics | 79.99 | 62.44
ROW_NUMBER, RANK, DENSE_RANK — Ranking Rows
These three functions assign a position number to each row within a window. They differ in how they handle ties. ROW_NUMBER always gives unique numbers — no ties. RANK gives the same number to ties but skips the next number (1, 2, 2, 4). DENSE_RANK gives the same number to ties but does not skip (1, 2, 2, 3). All three require ORDER BY inside the OVER clause.
-- Rank products by price within each category
SELECT name,
category,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank
FROM products
ORDER BY category, price DESC;
-- Get the top-ranked product in each category
SELECT name, category, price
FROM (
SELECT name, category, price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rnk
FROM products
) ranked
WHERE rnk = 1
ORDER BY category;
name | category | price | row_num | rank | dense_rank
----------------+-------------+-------+---------+------+------------
Webcam HD | Electronics | 79.99 | 1 | 1 | 1
Wireless Mouse | Electronics | 29.99 | 2 | 2 | 2
USB Hub | Electronics | 24.99 | 3 | 3 | 3
-- Most expensive per category:
name | category | price
--------------+-------------+--------
Monitor Arm | Accessories | 49.99
Webcam HD | Electronics | 79.99
Standing Desk | Furniture | 349.99
Running Totals with SUM OVER
Adding ORDER BY to an aggregate window function turns it into a running (cumulative) calculation. As PostgreSQL moves through the ordered rows, it accumulates the total up to and including the current row. This is perfect for running revenue, cumulative units sold, and progressive totals.
-- Running revenue total ordered by date
SELECT id AS order_id,
order_date,
total_amount,
ROUND(SUM(total_amount) OVER (ORDER BY order_date, id), 2) AS running_total
FROM orders
WHERE status = 'delivered'
ORDER BY order_date, id
LIMIT 8;
-- Running total resetting per month (PARTITION BY month)
SELECT id,
order_date,
total_amount,
ROUND(SUM(total_amount) OVER (
PARTITION BY DATE_TRUNC('month', order_date)
ORDER BY order_date, id
), 2) AS monthly_running_total
FROM orders
WHERE status = 'delivered'
ORDER BY order_date, id
LIMIT 8;
order_id | order_date | total_amount | running_total
----------+------------+--------------+---------------
101 | 2024-01-14 | 149.97 | 149.97
104 | 2024-01-22 | 89.99 | 239.96
107 | 2024-01-28 | 234.50 | 474.46
112 | 2024-02-03 | 74.99 | 549.45
LAG and LEAD — Comparing Adjacent Rows
LAG looks at the previous row's value. LEAD looks at the next row's value. Both accept an optional offset (how many rows back/forward) and a default value to return when there is no previous or next row. These are essential for period-over-period comparisons — revenue this month vs last month, today's orders vs yesterday's.
-- Month-over-month revenue comparison
WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', order_date) AS month,
ROUND(SUM(total_amount), 2) AS revenue
FROM orders
WHERE status = 'delivered'
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 2) AS change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100
, 1) AS pct_change
FROM monthly_revenue
ORDER BY month;
---------------------+----------+------------+--------+------------
2024-01-01 00:00:00 | 1823.40 | NULL | NULL | NULL
2024-02-01 00:00:00 | 2103.80 | 1823.40 | 280.40 | 15.4
2024-03-01 00:00:00 | 1944.20 | 2103.80 |-159.60 | -7.6
NTILE — Distributing Rows into Buckets
NTILE(n) divides the rows into n equal-sized buckets and assigns each row a bucket number. This is how you compute quartiles (NTILE(4)), deciles (NTILE(10)), or any percentile breakdown. Rows are assigned to buckets in ORDER BY order.
-- Divide customers into 4 spending quartiles
WITH customer_spending AS (
SELECT customer_id,
ROUND(SUM(total_amount), 2) AS total_spent
FROM orders
WHERE status != 'cancelled'
GROUP BY customer_id
)
SELECT cs.customer_id,
c.first_name || ' ' || c.last_name AS customer,
cs.total_spent,
NTILE(4) OVER (ORDER BY cs.total_spent DESC) AS spending_quartile
FROM customer_spending cs
JOIN customers c ON cs.customer_id = c.id
ORDER BY cs.total_spent DESC
LIMIT 10;
-------------+--------------+-------------+------------------
3 | Liam Brown | 892.45 | 1
12 | Emma Wilson | 741.20 | 1
28 | Noah Martinez| 530.80 | 1
7 | Alice Morgan | 423.50 | 2
FIRST_VALUE and LAST_VALUE — Anchoring to the Window Edge
FIRST_VALUE returns the first value in the window frame and LAST_VALUE returns the last. Combined with PARTITION BY, these let you compare each row to the best or worst performer in its group without a self-join.
-- Compare each product's price to the most expensive in its category
SELECT name,
category,
price,
FIRST_VALUE(price) OVER (
PARTITION BY category ORDER BY price DESC
) AS category_max_price,
ROUND(price / FIRST_VALUE(price) OVER (
PARTITION BY category ORDER BY price DESC
) * 100, 1) AS pct_of_max
FROM products
ORDER BY category, price DESC;
----------------+-------------+-------+--------------------+------------
Webcam HD | Electronics | 79.99 | 79.99 | 100.0
Wireless Mouse | Electronics | 29.99 | 79.99 | 37.5
USB Hub | Electronics | 24.99 | 79.99 | 31.2
Window Function Quick Reference
| Function | What It Does | Needs ORDER BY in OVER? |
|---|---|---|
| ROW_NUMBER() | Unique sequential number per row | Yes |
| RANK() | Rank with gaps on ties | Yes |
| DENSE_RANK() | Rank without gaps on ties | Yes |
| SUM() OVER | Running or partitioned total | Yes for running — optional for partition total |
| AVG() OVER | Running or partitioned average | Optional |
| LAG(col, n) | Value from n rows before current | Yes |
| LEAD(col, n) | Value from n rows after current | Yes |
| NTILE(n) | Assigns rows to n equal buckets | Yes |
| FIRST_VALUE(col) | First value in the window frame | Yes |
| LAST_VALUE(col) | Last value in the window frame | Yes |
🧪 Practice Questions
Answer based on what you learned in this lesson.
1. What clause defines the window in a window function?
2. What is the difference between RANK() and DENSE_RANK() when rows tie?
3. Which window function returns the value from the previous row?
4. Which keyword inside OVER() scopes a window function to a group — like category — so the calculation resets per group?
5. Which window function divides customers into spending quartiles?
🎯 Quiz — Test Your Understanding
Q1. What is the key difference between a window function and GROUP BY with an aggregate?
Q2. You want to number products 1, 2, 3 within each category by price descending — with no ties. Which expression is correct?
Q3. Which window expression computes a running total of order amounts ordered by date?
Q4. You want to compare each month's revenue to the previous month's revenue. Which expression fetches the prior month value?
Q5. You use ROW_NUMBER() to rank products and want to keep only rank = 1. You cannot filter window functions in WHERE directly. What is the solution?
Next up: String functions — manipulating, formatting, and searching text data in PostgreSQL.