MS SQL Lesson 20 – GROUP BY | Dataplexa

GROUP BY

What Is GROUP BY?

GROUP BY collapses multiple rows that share a common value into a single summary row. Instead of seeing every individual order, you see one row per customer. Instead of every product sale, you see one row per category. You always pair GROUP BY with an aggregate function — COUNT(), SUM(), AVG(), MIN(), or MAX() — which calculates a single value across all the rows in each group.

Without GROUP BY, aggregate functions collapse the entire result into one row. With GROUP BY, they calculate one value per group. This is the foundation of almost every business report — totals per category, counts per customer, averages per month.

GROUP BY defines the grouping — every distinct value in that column becomes one output row
Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function
NULL values are treated as a group of their own — all NULLs land in the same bucket
ORDER BY can be added after GROUP BY to sort the summarised result

-- The simplest GROUP BY — count how many products exist per category.
-- Without GROUP BY, COUNT(*) would return just one number for the whole table.
-- With GROUP BY Category, we get one count per category instead.
SELECT
    Category,          -- the grouping column — one row per distinct value
    COUNT(*) AS TotalProducts  -- aggregate: how many rows fall into each group
FROM Products
GROUP BY Category      -- collapse all rows with the same Category into one
ORDER BY TotalProducts DESC;  -- show busiest categories first

Category | TotalProducts
-------------|-------------
Electronics | 18
Stationery | 14
Furniture | 9
Accessories | 7
Clothing | 5
(5 rows affected)

Five categories exist in the Products table — one output row per distinct Category value
COUNT(*) counts every row in each group — including rows where other columns are NULL
ORDER BY TotalProducts DESC sorts by the aggregate result — Electronics has the most products
If you removed GROUP BY Category the query would fail — Category must either be grouped or aggregated

Grouping with Multiple Aggregate Functions

You can include as many aggregate functions as you need in a single GROUP BY query. Each one runs independently across the same groups. This lets you build a complete summary in one statement — count, total, average, minimum, and maximum all at once.

-- Full category summary — multiple aggregates in one query.
-- Each aggregate calculates independently across the same groups.
SELECT
    Category,
    COUNT(*)              AS TotalProducts,   -- how many products in the category
    SUM(Stock)            AS TotalStock,      -- combined units across all products
    ROUND(AVG(Price), 2)  AS AvgPrice,        -- average price, rounded to 2 decimal places
    MIN(Price)            AS CheapestPrice,   -- lowest price in this category
    MAX(Price)            AS MostExpensive    -- highest price in this category
FROM Products
GROUP BY Category
ORDER BY AvgPrice DESC;   -- most expensive categories at the top

Category | TotalProducts | TotalStock | AvgPrice | CheapestPrice | MostExpensive
------------|---------------|------------|----------|---------------|---------------
Furniture | 9 | 184 | 289.99 | 49.99 | 899.99
Electronics | 18 | 1842 | 87.45 | 8.99 | 1299.99
Clothing | 5 | 310 | 64.80 | 19.99 | 129.99
Accessories | 7 | 523 | 38.20 | 5.99 | 89.99
Stationery | 14 | 3108 | 14.75 | 1.99 | 49.99
(5 rows affected)

All five aggregates run across the same five category groups — the GROUP BY is evaluated once and shared
ROUND(AVG(Price), 2) nests a function inside an aggregate — AVG runs first, then ROUND trims the result to 2 decimal places
Furniture has only 9 products but the highest average price — grouping reveals patterns invisible in raw row data
Electronics has the widest price range ($8.99 to $1,299.99) — MIN and MAX together show spread, not just the average

Grouping by Multiple Columns

You can GROUP BY more than one column. The engine creates a group for every combination of values across all the grouped columns. If you group by Category and Brand, you get one row per unique Category-Brand pair — not one per Category and not one per Brand.

-- Group by two columns — one row per unique Category + Brand combination.
-- This drills deeper than grouping by Category alone.
SELECT
    Category,               -- first grouping column
    Brand,                  -- second grouping column
    COUNT(*)   AS Products, -- count of products for this specific brand+category pair
    SUM(Stock) AS Stock,    -- total stock for this brand in this category
    MIN(Price) AS MinPrice, -- cheapest product from this brand in this category
    MAX(Price) AS MaxPrice  -- most expensive product from this brand in this category
FROM Products
WHERE Brand IS NOT NULL     -- exclude products with no brand assigned
GROUP BY Category, Brand    -- group on the combination of both columns
ORDER BY Category, Products DESC; -- sort by category then most products first

Category | Brand | Products | Stock | MinPrice | MaxPrice
------------|------------|----------|-------|----------|----------
Electronics | LogiTech | 5 | 487 | 24.99 | 89.99
Electronics | ViewMax | 4 | 127 | 199.99 | 1299.99
Electronics | TechLink | 4 | 312 | 8.99 | 49.99
Electronics | KeyMaster | 3 | 184 | 69.99 | 149.99
Electronics | SoundWave | 2 | 96 | 29.99 | 79.99
Furniture | DeskPro | 5 | 98 | 149.99 | 899.99
Furniture | ChairCo | 4 | 86 | 49.99 | 449.99
Stationery | PaperCo | 9 | 1840 | 1.99 | 24.99
Stationery | WriteRight | 5 | 1268 | 2.99 | 49.99
(9 rows affected)

Nine rows — one per unique Category and Brand combination found in the data
The order of columns in GROUP BY does not change the result — GROUP BY Brand, Category produces identical groups
ViewMax has only 4 Electronics products but the highest MaxPrice — useful for spotting premium niche brands
WHERE Brand IS NOT NULL runs before grouping — rows with no brand are excluded before groups are formed

GROUP BY with COUNT DISTINCT

COUNT(*) counts every row in the group. COUNT(DISTINCT column) counts only the unique values in that column within each group. Use it when you want to know "how many different X does each group have" rather than "how many rows does each group have".

-- How many distinct customers placed orders per country?
-- COUNT(*) would count orders. COUNT(DISTINCT UserId) counts unique customers.
SELECT
    u.Country,                          -- group by the customer's country
    COUNT(DISTINCT o.UserId)  AS UniqueCustomers,  -- distinct customers who ordered
    COUNT(*)                  AS TotalOrders,      -- total number of orders placed
    SUM(o.Total)              AS Revenue,          -- total revenue from this country
    ROUND(AVG(o.Total), 2)    AS AvgOrderValue     -- average order size
FROM Orders  o                          -- o is an alias for the Orders table
JOIN Users   u ON o.UserId = u.UserId   -- join to get the customer's country
GROUP BY u.Country
ORDER BY Revenue DESC;                  -- highest revenue countries first

Country | UniqueCustomers | TotalOrders | Revenue | AvgOrderValue
---------------|-----------------|-------------|-----------|---------------
United States | 38 | 142 | 18420.50 | 129.72
United Kingdom | 22 | 89 | 11240.00 | 126.29
Germany | 15 | 61 | 7830.75 | 128.37
Canada | 12 | 48 | 5920.20 | 123.34
Australia | 9 | 34 | 4210.00 | 123.82
France | 8 | 29 | 3480.60 | 120.02
(6 rows affected)

COUNT(DISTINCT o.UserId) counts unique customers — the US has 38 unique buyers, not just 142 order rows
COUNT(*) alongside it gives total orders — the ratio reveals how often customers reorder
The JOIN runs before GROUP BY — rows are joined first, then collapsed into groups
Germany has a higher average order value than Canada despite fewer customers — grouping reveals this without manual calculation

GROUP BY with WHERE and ORDER BY

GROUP BY works with WHERE and ORDER BY in a specific sequence: WHERE filters rows first, then GROUP BY forms the groups from the remaining rows, then ORDER BY sorts the grouped result. Understanding this order is essential — you cannot filter on an aggregate value using WHERE. That requires HAVING, which is covered in the next lesson.

-- Full clause order: SELECT → FROM → WHERE → GROUP BY → ORDER BY
-- WHERE filters BEFORE grouping. It cannot reference aggregate results.
-- Monthly order summary for year 2024 only
SELECT
    YEAR(OrderDate)   AS OrderYear,    -- extract the year from the date
    MONTH(OrderDate)  AS OrderMonth,   -- extract the month number
    COUNT(*)          AS TotalOrders,  -- orders placed in this month
    SUM(Total)        AS Revenue,      -- total revenue for the month
    ROUND(AVG(Total), 2) AS AvgOrder   -- average order value for the month
FROM Orders
WHERE YEAR(OrderDate) = 2024           -- WHERE runs first — only 2024 rows enter the group
GROUP BY YEAR(OrderDate), MONTH(OrderDate)  -- group by year and month together
ORDER BY OrderYear, OrderMonth;             -- sort chronologically

OrderYear | OrderMonth | TotalOrders | Revenue | AvgOrder
----------|------------|-------------|----------|----------
2024 | 1 | 38 | 4820.40 | 126.85
2024 | 2 | 31 | 3940.20 | 127.10
2024 | 3 | 44 | 5610.75 | 127.52
2024 | 4 | 29 | 3680.00 | 126.90
2024 | 5 | 35 | 4420.50 | 126.30
2024 | 6 | 41 | 5180.25 | 126.35
(6 rows affected)

WHERE YEAR(OrderDate) = 2024 removes all non-2024 rows before any grouping happens — only 2024 rows are grouped
YEAR() and MONTH() are T-SQL date functions — they extract parts of a date value
Both YEAR(OrderDate) and MONTH(OrderDate) must appear in GROUP BY because they appear in SELECT without an aggregate
March (month 3) had the highest revenue at $5,610.75 — this kind of monthly trend is invisible without GROUP BY

GROUP BY with JOINs

GROUP BY is most powerful when combined with JOINs. You join tables to bring in the columns you want to group by, then aggregate across the joined result. The JOIN always runs before the GROUP BY — you are grouping the already-joined rows.

-- Which products generated the most revenue?
-- OrderItems holds the line items. We need to join Products to get the name.
-- Then GROUP BY product to get total revenue and units sold per product.
SELECT
    p.Name                        AS Product,       -- product name from Products table
    SUM(oi.Quantity)              AS UnitsSold,     -- total units sold across all orders
    SUM(oi.Quantity * oi.Price)   AS Revenue,       -- total revenue: qty × price per line
    ROUND(AVG(oi.Price), 2)       AS AvgSalePrice,  -- average price it actually sold at
    COUNT(DISTINCT oi.OrderId)    AS OrdersContaining  -- how many orders included this product
FROM OrderItems oi                        -- oi = alias for OrderItems
JOIN Products   p ON oi.ProductId = p.ProductId  -- join to get the product name
GROUP BY p.ProductId, p.Name              -- group by ID and name together
ORDER BY Revenue DESC;                    -- highest revenue products first

Product | UnitsSold | Revenue | AvgSalePrice | OrdersContaining
---------------------|-----------|-----------|--------------|------------------
Monitor 27-inch | 312 | 93556.80 | 299.86 | 289
Standing Desk | 187 | 65425.00 | 349.87 | 182
Mechanical Keyboard | 894 | 80388.60 | 89.92 | 876
Wireless Mouse | 1240 | 37189.60 | 29.99 | 1198
USB-C Hub | 983 | 24566.17 | 24.99 | 956
Notebook A5 | 2140 | 26750.00 | 12.50 | 1890
Ballpoint Pens | 3820 | 34291.80 | 8.98 | 2104
(7 rows affected)

Grouping by both p.ProductId and p.Name is the safe pattern — ProductId ensures uniqueness, Name is included so it appears in SELECT
SUM(oi.Quantity * oi.Price) calculates revenue per line item first, then sums across all lines for the product
Ballpoint Pens sold the most units (3,820) but Monitor 27-inch generated the most revenue ($93,556) — unit volume and revenue tell different stories
COUNT(DISTINCT oi.OrderId) shows how many separate orders contained each product — a spread metric, not just a volume metric

Common GROUP BY Mistakes

The most common error with GROUP BY is putting a column in SELECT that is not in GROUP BY and not wrapped in an aggregate function. SQL Server will reject this immediately with a clear error. Every non-aggregate column in SELECT must appear in GROUP BY — no exceptions.

-- MISTAKE: Brand is in SELECT but not in GROUP BY and not aggregated.
-- SQL Server will return: "Column 'Products.Brand' is invalid in the select list
-- because it is not contained in either an aggregate function or the GROUP BY clause."
-- WRONG -- do not run
SELECT Category, Brand, COUNT(*) AS Total  -- Brand is not in GROUP BY
FROM Products
GROUP BY Category;   -- only Category is grouped -- Brand has no home
-- FIX 1: Add Brand to GROUP BY (if you want one row per Category+Brand)
SELECT Category, Brand, COUNT(*) AS Total
FROM Products
GROUP BY Category, Brand;    -- now Brand is part of the grouping
-- FIX 2: Aggregate Brand instead (if you want one row per Category only)
SELECT
    Category,
    COUNT(DISTINCT Brand) AS UniqueBrands,  -- count how many distinct brands per category
    COUNT(*)              AS TotalProducts
FROM Products
GROUP BY Category;

-- FIX 1: GROUP BY Category, Brand
Category | Brand | Total
------------|------------|------
Electronics | KeyMaster | 3
Electronics | LogiTech | 5
Electronics | TechLink | 4
Electronics | ViewMax | 4
Stationery | PaperCo | 9
Stationery | WriteRight | 5
Furniture | ChairCo | 4
Furniture | DeskPro | 5
(8 rows affected)

-- FIX 2: Aggregate Brand
Category | UniqueBrands | TotalProducts
------------|--------------|---------------
Electronics | 5 | 18
Furniture | 2 | 9
Stationery | 2 | 14
Accessories | 3 | 7
Clothing | 4 | 5
(5 rows affected)

The rule is strict — every SELECT column must either be in GROUP BY or inside an aggregate function
Fix 1 makes Brand part of the group — you get more rows but each row is more specific
Fix 2 aggregates Brand — you keep one row per Category but lose the individual brand names
Which fix to use depends on the question you are answering — both are valid for different purposes

Lesson Summary

Concept	What It Does	Example
GROUP BY	Collapses rows sharing a value into one summary row	`GROUP BY Category` — one row per category
*COUNT()**	Counts all rows in the group including NULLs	`COUNT(*) AS TotalOrders`
COUNT(DISTINCT col)	Counts only unique values in that column per group	`COUNT(DISTINCT UserId)` — unique customers
SUM()	Adds up all values in a column for the group	`SUM(Total)` — total revenue per group
AVG()	Calculates the mean of a column for the group	`AVG(Price)` — average price per category
MIN() / MAX()	Returns the smallest or largest value per group	`MIN(Price)`, `MAX(Price)`
Multiple columns	Creates a group per unique combination of all listed columns	`GROUP BY Category, Brand`
Clause order	WHERE → GROUP BY → ORDER BY — WHERE filters before grouping	`WHERE Year = 2024 GROUP BY Month`
The rule	Every SELECT column must be in GROUP BY or inside an aggregate	Violating this causes an immediate SQL Server error
YEAR() / MONTH()	T-SQL date functions that extract parts of a date for grouping	`GROUP BY YEAR(OrderDate), MONTH(OrderDate)`

Practice Questions

Practice 1. What aggregate function counts every row in a group, including rows where other columns are NULL?

Practice 2. You write SELECT Category, Brand, COUNT(*) FROM Products GROUP BY Category — what happens when you run it?

Practice 3. What is the difference between COUNT(*) and COUNT(DISTINCT UserId)?

Practice 4. In the clause order WHERE → GROUP BY → ORDER BY, at what point does WHERE filter rows — before or after grouping?

Practice 5. Which T-SQL function extracts the month number from a date column?

Quiz

Quiz 1. You want one row per customer showing how many orders they placed. Which query is correct?

SELECT UserId, COUNT(*) AS Orders FROM Orders
SELECT UserId, COUNT(*) AS Orders FROM Orders GROUP BY UserId
SELECT COUNT(*) AS Orders FROM Orders WHERE UserId IS NOT NULL
SELECT UserId, Orders FROM Orders GROUP BY UserId

Quiz 2. GROUP BY Category, Brand produces one row per — what?

Unique Category value only
Unique Brand value only
Unique combination of Category and Brand
Every row in the table

Quiz 3. You want to filter rows to only 2024 orders before grouping by month. Which clause do you use?

HAVING
WHERE
GROUP BY
ORDER BY

Quiz 4. What does SUM(oi.Quantity * oi.Price) calculate when used with GROUP BY ProductId?

The average price of each product
Total revenue per product — multiplies quantity by price for each line then sums them per product
The total quantity sold across all products
The number of order lines per product

Quiz 5. Which statement about NULL values and GROUP BY is correct?

Rows with NULL in the grouping column are automatically excluded
NULL values are grouped together — all rows with NULL in the grouping column form their own group
GROUP BY treats NULL as zero in aggregate functions
Using GROUP BY on a nullable column causes an error

Next up — HAVING Clause - How to filter groups after aggregation — the step WHERE cannot do.

← Previous Course Index Next →

MS SQL Server