PostgreSQL Lesson 32 – Materialized Views | Dataplexa

Window Functions

Window functions are one of the most powerful features in PostgreSQL. Like aggregate functions, they perform calculations across a set of rows — but unlike GROUP BY, they do not collapse those rows into one. Every row keeps its own identity while gaining a new computed column based on a sliding window of related rows. This lets you calculate running totals, rankings, moving averages, row comparisons, and percentile breakdowns — all without losing any data. This lesson covers the full window function toolkit with real Dataplexa Store examples.

How Window Functions Work

A window function uses the OVER() clause to define the set of rows it looks at — the "window". Without any arguments, OVER() means the entire result set. With PARTITION BY, the window is scoped to groups. With ORDER BY, the window becomes a running calculation. The key difference from GROUP BY: every input row still produces an output row.

-- GROUP BY collapses rows — one row per category
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;

-- Window function keeps all rows — adds avg per category alongside each product
SELECT name,
       category,
       price,
       ROUND(AVG(price) OVER (PARTITION BY category), 2) AS category_avg
FROM products
ORDER BY category, price;

ROW_NUMBER, RANK, DENSE_RANK — Ranking Rows

These three functions assign a position number to each row within a window. They differ in how they handle ties. ROW_NUMBER always gives unique numbers — no ties. RANK gives the same number to ties but skips the next number (1, 2, 2, 4). DENSE_RANK gives the same number to ties but does not skip (1, 2, 2, 3). All three require ORDER BY inside the OVER clause.

-- Rank products by price within each category
SELECT name,
       category,
       price,
       ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num,
       RANK()       OVER (PARTITION BY category ORDER BY price DESC) AS rank,
       DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank
FROM products
ORDER BY category, price DESC;

-- Get the top-ranked product in each category
SELECT name, category, price
FROM (
  SELECT name, category, price,
         RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rnk
  FROM products
) ranked
WHERE rnk = 1
ORDER BY category;

Running Totals with SUM OVER

Adding ORDER BY to an aggregate window function turns it into a running (cumulative) calculation. As PostgreSQL moves through the ordered rows, it accumulates the total up to and including the current row. This is perfect for running revenue, cumulative units sold, and progressive totals.

-- Running revenue total ordered by date
SELECT id AS order_id,
       order_date,
       total_amount,
       ROUND(SUM(total_amount) OVER (ORDER BY order_date, id), 2) AS running_total
FROM orders
WHERE status = 'delivered'
ORDER BY order_date, id
LIMIT 8;

-- Running total resetting per month (PARTITION BY month)
SELECT id,
       order_date,
       total_amount,
       ROUND(SUM(total_amount) OVER (
         PARTITION BY DATE_TRUNC('month', order_date)
         ORDER BY order_date, id
       ), 2) AS monthly_running_total
FROM orders
WHERE status = 'delivered'
ORDER BY order_date, id
LIMIT 8;

-- Running total:
order_id | order_date | total_amount | running_total
----------+------------+--------------+---------------
      101 | 2024-01-14 |      149.97 |       149.97
      104 | 2024-01-22 |       89.99 |       239.96
      107 | 2024-01-28 |      234.50 |       474.46
      112 | 2024-02-03 |       74.99 |       549.45

LAG and LEAD — Comparing Adjacent Rows

LAG looks at the previous row's value. LEAD looks at the next row's value. Both accept an optional offset (how many rows back/forward) and a default value to return when there is no previous or next row. These are essential for period-over-period comparisons — revenue this month vs last month, today's orders vs yesterday's.

-- Month-over-month revenue comparison
WITH monthly_revenue AS (
  SELECT DATE_TRUNC('month', order_date) AS month,
         ROUND(SUM(total_amount), 2) AS revenue
  FROM orders
  WHERE status = 'delivered'
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT month,
       revenue,
       LAG(revenue)  OVER (ORDER BY month)  AS prev_month_revenue,
       ROUND(revenue - LAG(revenue) OVER (ORDER BY month), 2) AS change,
       ROUND(
         (revenue - LAG(revenue) OVER (ORDER BY month))
         / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100
       , 1) AS pct_change
FROM monthly_revenue
ORDER BY month;

       month       | revenue | prev_month | change | pct_change
---------------------+----------+------------+--------+------------
2024-01-01 00:00:00 | 1823.40  |       NULL |   NULL |       NULL
2024-02-01 00:00:00 | 2103.80  |   1823.40 | 280.40 |       15.4
2024-03-01 00:00:00 | 1944.20  |   2103.80 |-159.60 |       -7.6

NTILE — Distributing Rows into Buckets

NTILE(n) divides the rows into n equal-sized buckets and assigns each row a bucket number. This is how you compute quartiles (NTILE(4)), deciles (NTILE(10)), or any percentile breakdown. Rows are assigned to buckets in ORDER BY order.

-- Divide customers into 4 spending quartiles
WITH customer_spending AS (
  SELECT customer_id,
         ROUND(SUM(total_amount), 2) AS total_spent
  FROM orders
  WHERE status != 'cancelled'
  GROUP BY customer_id
)
SELECT cs.customer_id,
       c.first_name || ' ' || c.last_name AS customer,
       cs.total_spent,
       NTILE(4) OVER (ORDER BY cs.total_spent DESC) AS spending_quartile
FROM customer_spending cs
JOIN customers c ON cs.customer_id = c.id
ORDER BY cs.total_spent DESC
LIMIT 10;

customer_id |    customer    | total_spent | spending_quartile
-------------+--------------+-------------+------------------
          3 | Liam Brown   |      892.45 |                 1
         12 | Emma Wilson  |      741.20 |                 1
         28 | Noah Martinez|      530.80 |                 1
          7 | Alice Morgan |      423.50 |                 2

FIRST_VALUE and LAST_VALUE — Anchoring to the Window Edge

FIRST_VALUE returns the first value in the window frame and LAST_VALUE returns the last. Combined with PARTITION BY, these let you compare each row to the best or worst performer in its group without a self-join.

-- Compare each product's price to the most expensive in its category
SELECT name,
       category,
       price,
       FIRST_VALUE(price) OVER (
         PARTITION BY category ORDER BY price DESC
       ) AS category_max_price,
       ROUND(price / FIRST_VALUE(price) OVER (
         PARTITION BY category ORDER BY price DESC
       ) * 100, 1) AS pct_of_max
FROM products
ORDER BY category, price DESC;

      name       |  category   | price | category_max_price | pct_of_max
----------------+-------------+-------+--------------------+------------
Webcam HD      | Electronics | 79.99 |               79.99 |      100.0
Wireless Mouse | Electronics | 29.99 |               79.99 |       37.5
USB Hub        | Electronics | 24.99 |               79.99 |       31.2

Window Function Quick Reference

Function	What It Does	Needs ORDER BY in OVER?
ROW_NUMBER()	Unique sequential number per row	Yes
RANK()	Rank with gaps on ties	Yes
DENSE_RANK()	Rank without gaps on ties	Yes
SUM() OVER	Running or partitioned total	Yes for running — optional for partition total
AVG() OVER	Running or partitioned average	Optional
LAG(col, n)	Value from n rows before current	Yes
LEAD(col, n)	Value from n rows after current	Yes
NTILE(n)	Assigns rows to n equal buckets	Yes
FIRST_VALUE(col)	First value in the window frame	Yes
LAST_VALUE(col)	Last value in the window frame	Yes

🧪 Practice Questions

Answer based on what you learned in this lesson.

1. What clause defines the window in a window function?

2. What is the difference between RANK() and DENSE_RANK() when rows tie?

3. Which window function returns the value from the previous row?

4. Which keyword inside OVER() scopes a window function to a group — like category — so the calculation resets per group?

5. Which window function divides customers into spending quartiles?

Next up: String functions — manipulating, formatting, and searching text data in PostgreSQL.

← Previous Course Index Next →

PostgreSQL