MS SQL Server
GROUP BY
What Is GROUP BY?
GROUP BY collapses multiple rows that share a common value into a single summary row. Instead of seeing every individual order, you see one row per customer. Instead of every product sale, you see one row per category. You always pair GROUP BY with an aggregate function — COUNT(), SUM(), AVG(), MIN(), or MAX() — which calculates a single value across all the rows in each group.
Without GROUP BY, aggregate functions collapse the entire result into one row. With GROUP BY, they calculate one value per group. This is the foundation of almost every business report — totals per category, counts per customer, averages per month.
- GROUP BY defines the grouping — every distinct value in that column becomes one output row
- Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function
- NULL values are treated as a group of their own — all NULLs land in the same bucket
- ORDER BY can be added after GROUP BY to sort the summarised result
-- The simplest GROUP BY — count how many products exist per category.
-- Without GROUP BY, COUNT(*) would return just one number for the whole table.
-- With GROUP BY Category, we get one count per category instead.
SELECT
Category, -- the grouping column — one row per distinct value
COUNT(*) AS TotalProducts -- aggregate: how many rows fall into each group
FROM Products
GROUP BY Category -- collapse all rows with the same Category into one
ORDER BY TotalProducts DESC; -- show busiest categories first-------------|-------------
Electronics | 18
Stationery | 14
Furniture | 9
Accessories | 7
Clothing | 5
(5 rows affected)
- Five categories exist in the Products table — one output row per distinct
Categoryvalue COUNT(*)counts every row in each group — including rows where other columns are NULLORDER BY TotalProducts DESCsorts by the aggregate result — Electronics has the most products- If you removed
GROUP BY Categorythe query would fail —Categorymust either be grouped or aggregated
Grouping with Multiple Aggregate Functions
You can include as many aggregate functions as you need in a single GROUP BY query. Each one runs independently across the same groups. This lets you build a complete summary in one statement — count, total, average, minimum, and maximum all at once.
-- Full category summary — multiple aggregates in one query.
-- Each aggregate calculates independently across the same groups.
SELECT
Category,
COUNT(*) AS TotalProducts, -- how many products in the category
SUM(Stock) AS TotalStock, -- combined units across all products
ROUND(AVG(Price), 2) AS AvgPrice, -- average price, rounded to 2 decimal places
MIN(Price) AS CheapestPrice, -- lowest price in this category
MAX(Price) AS MostExpensive -- highest price in this category
FROM Products
GROUP BY Category
ORDER BY AvgPrice DESC; -- most expensive categories at the top------------|---------------|------------|----------|---------------|---------------
Furniture | 9 | 184 | 289.99 | 49.99 | 899.99
Electronics | 18 | 1842 | 87.45 | 8.99 | 1299.99
Clothing | 5 | 310 | 64.80 | 19.99 | 129.99
Accessories | 7 | 523 | 38.20 | 5.99 | 89.99
Stationery | 14 | 3108 | 14.75 | 1.99 | 49.99
(5 rows affected)
- All five aggregates run across the same five category groups — the GROUP BY is evaluated once and shared
ROUND(AVG(Price), 2)nests a function inside an aggregate — AVG runs first, then ROUND trims the result to 2 decimal places- Furniture has only 9 products but the highest average price — grouping reveals patterns invisible in raw row data
- Electronics has the widest price range ($8.99 to $1,299.99) — MIN and MAX together show spread, not just the average
Grouping by Multiple Columns
You can GROUP BY more than one column. The engine creates a group for every combination of values across all the grouped columns. If you group by Category and Brand, you get one row per unique Category-Brand pair — not one per Category and not one per Brand.
-- Group by two columns — one row per unique Category + Brand combination.
-- This drills deeper than grouping by Category alone.
SELECT
Category, -- first grouping column
Brand, -- second grouping column
COUNT(*) AS Products, -- count of products for this specific brand+category pair
SUM(Stock) AS Stock, -- total stock for this brand in this category
MIN(Price) AS MinPrice, -- cheapest product from this brand in this category
MAX(Price) AS MaxPrice -- most expensive product from this brand in this category
FROM Products
WHERE Brand IS NOT NULL -- exclude products with no brand assigned
GROUP BY Category, Brand -- group on the combination of both columns
ORDER BY Category, Products DESC; -- sort by category then most products first------------|------------|----------|-------|----------|----------
Electronics | LogiTech | 5 | 487 | 24.99 | 89.99
Electronics | ViewMax | 4 | 127 | 199.99 | 1299.99
Electronics | TechLink | 4 | 312 | 8.99 | 49.99
Electronics | KeyMaster | 3 | 184 | 69.99 | 149.99
Electronics | SoundWave | 2 | 96 | 29.99 | 79.99
Furniture | DeskPro | 5 | 98 | 149.99 | 899.99
Furniture | ChairCo | 4 | 86 | 49.99 | 449.99
Stationery | PaperCo | 9 | 1840 | 1.99 | 24.99
Stationery | WriteRight | 5 | 1268 | 2.99 | 49.99
(9 rows affected)
- Nine rows — one per unique Category and Brand combination found in the data
- The order of columns in GROUP BY does not change the result —
GROUP BY Brand, Categoryproduces identical groups - ViewMax has only 4 Electronics products but the highest MaxPrice — useful for spotting premium niche brands
WHERE Brand IS NOT NULLruns before grouping — rows with no brand are excluded before groups are formed
GROUP BY with COUNT DISTINCT
COUNT(*) counts every row in the group. COUNT(DISTINCT column) counts only the unique values in that column within each group. Use it when you want to know "how many different X does each group have" rather than "how many rows does each group have".
-- How many distinct customers placed orders per country?
-- COUNT(*) would count orders. COUNT(DISTINCT UserId) counts unique customers.
SELECT
u.Country, -- group by the customer's country
COUNT(DISTINCT o.UserId) AS UniqueCustomers, -- distinct customers who ordered
COUNT(*) AS TotalOrders, -- total number of orders placed
SUM(o.Total) AS Revenue, -- total revenue from this country
ROUND(AVG(o.Total), 2) AS AvgOrderValue -- average order size
FROM Orders o -- o is an alias for the Orders table
JOIN Users u ON o.UserId = u.UserId -- join to get the customer's country
GROUP BY u.Country
ORDER BY Revenue DESC; -- highest revenue countries first---------------|-----------------|-------------|-----------|---------------
United States | 38 | 142 | 18420.50 | 129.72
United Kingdom | 22 | 89 | 11240.00 | 126.29
Germany | 15 | 61 | 7830.75 | 128.37
Canada | 12 | 48 | 5920.20 | 123.34
Australia | 9 | 34 | 4210.00 | 123.82
France | 8 | 29 | 3480.60 | 120.02
(6 rows affected)
COUNT(DISTINCT o.UserId)counts unique customers — the US has 38 unique buyers, not just 142 order rowsCOUNT(*)alongside it gives total orders — the ratio reveals how often customers reorder- The JOIN runs before GROUP BY — rows are joined first, then collapsed into groups
- Germany has a higher average order value than Canada despite fewer customers — grouping reveals this without manual calculation
GROUP BY with WHERE and ORDER BY
GROUP BY works with WHERE and ORDER BY in a specific sequence: WHERE filters rows first, then GROUP BY forms the groups from the remaining rows, then ORDER BY sorts the grouped result. Understanding this order is essential — you cannot filter on an aggregate value using WHERE. That requires HAVING, which is covered in the next lesson.
-- Full clause order: SELECT → FROM → WHERE → GROUP BY → ORDER BY
-- WHERE filters BEFORE grouping. It cannot reference aggregate results.
-- Monthly order summary for year 2024 only
SELECT
YEAR(OrderDate) AS OrderYear, -- extract the year from the date
MONTH(OrderDate) AS OrderMonth, -- extract the month number
COUNT(*) AS TotalOrders, -- orders placed in this month
SUM(Total) AS Revenue, -- total revenue for the month
ROUND(AVG(Total), 2) AS AvgOrder -- average order value for the month
FROM Orders
WHERE YEAR(OrderDate) = 2024 -- WHERE runs first — only 2024 rows enter the group
GROUP BY YEAR(OrderDate), MONTH(OrderDate) -- group by year and month together
ORDER BY OrderYear, OrderMonth; -- sort chronologically----------|------------|-------------|----------|----------
2024 | 1 | 38 | 4820.40 | 126.85
2024 | 2 | 31 | 3940.20 | 127.10
2024 | 3 | 44 | 5610.75 | 127.52
2024 | 4 | 29 | 3680.00 | 126.90
2024 | 5 | 35 | 4420.50 | 126.30
2024 | 6 | 41 | 5180.25 | 126.35
(6 rows affected)
WHERE YEAR(OrderDate) = 2024removes all non-2024 rows before any grouping happens — only 2024 rows are groupedYEAR()andMONTH()are T-SQL date functions — they extract parts of a date value- Both
YEAR(OrderDate)andMONTH(OrderDate)must appear in GROUP BY because they appear in SELECT without an aggregate - March (month 3) had the highest revenue at $5,610.75 — this kind of monthly trend is invisible without GROUP BY
GROUP BY with JOINs
GROUP BY is most powerful when combined with JOINs. You join tables to bring in the columns you want to group by, then aggregate across the joined result. The JOIN always runs before the GROUP BY — you are grouping the already-joined rows.
-- Which products generated the most revenue?
-- OrderItems holds the line items. We need to join Products to get the name.
-- Then GROUP BY product to get total revenue and units sold per product.
SELECT
p.Name AS Product, -- product name from Products table
SUM(oi.Quantity) AS UnitsSold, -- total units sold across all orders
SUM(oi.Quantity * oi.Price) AS Revenue, -- total revenue: qty × price per line
ROUND(AVG(oi.Price), 2) AS AvgSalePrice, -- average price it actually sold at
COUNT(DISTINCT oi.OrderId) AS OrdersContaining -- how many orders included this product
FROM OrderItems oi -- oi = alias for OrderItems
JOIN Products p ON oi.ProductId = p.ProductId -- join to get the product name
GROUP BY p.ProductId, p.Name -- group by ID and name together
ORDER BY Revenue DESC; -- highest revenue products first---------------------|-----------|-----------|--------------|------------------
Monitor 27-inch | 312 | 93556.80 | 299.86 | 289
Standing Desk | 187 | 65425.00 | 349.87 | 182
Mechanical Keyboard | 894 | 80388.60 | 89.92 | 876
Wireless Mouse | 1240 | 37189.60 | 29.99 | 1198
USB-C Hub | 983 | 24566.17 | 24.99 | 956
Notebook A5 | 2140 | 26750.00 | 12.50 | 1890
Ballpoint Pens | 3820 | 34291.80 | 8.98 | 2104
(7 rows affected)
- Grouping by both
p.ProductIdandp.Nameis the safe pattern —ProductIdensures uniqueness,Nameis included so it appears in SELECT SUM(oi.Quantity * oi.Price)calculates revenue per line item first, then sums across all lines for the product- Ballpoint Pens sold the most units (3,820) but Monitor 27-inch generated the most revenue ($93,556) — unit volume and revenue tell different stories
COUNT(DISTINCT oi.OrderId)shows how many separate orders contained each product — a spread metric, not just a volume metric
Common GROUP BY Mistakes
The most common error with GROUP BY is putting a column in SELECT that is not in GROUP BY and not wrapped in an aggregate function. SQL Server will reject this immediately with a clear error. Every non-aggregate column in SELECT must appear in GROUP BY — no exceptions.
-- MISTAKE: Brand is in SELECT but not in GROUP BY and not aggregated.
-- SQL Server will return: "Column 'Products.Brand' is invalid in the select list
-- because it is not contained in either an aggregate function or the GROUP BY clause."
-- WRONG -- do not run
SELECT Category, Brand, COUNT(*) AS Total -- Brand is not in GROUP BY
FROM Products
GROUP BY Category; -- only Category is grouped -- Brand has no home
-- FIX 1: Add Brand to GROUP BY (if you want one row per Category+Brand)
SELECT Category, Brand, COUNT(*) AS Total
FROM Products
GROUP BY Category, Brand; -- now Brand is part of the grouping
-- FIX 2: Aggregate Brand instead (if you want one row per Category only)
SELECT
Category,
COUNT(DISTINCT Brand) AS UniqueBrands, -- count how many distinct brands per category
COUNT(*) AS TotalProducts
FROM Products
GROUP BY Category;Category | Brand | Total
------------|------------|------
Electronics | KeyMaster | 3
Electronics | LogiTech | 5
Electronics | TechLink | 4
Electronics | ViewMax | 4
Stationery | PaperCo | 9
Stationery | WriteRight | 5
Furniture | ChairCo | 4
Furniture | DeskPro | 5
(8 rows affected)
-- FIX 2: Aggregate Brand
Category | UniqueBrands | TotalProducts
------------|--------------|---------------
Electronics | 5 | 18
Furniture | 2 | 9
Stationery | 2 | 14
Accessories | 3 | 7
Clothing | 4 | 5
(5 rows affected)
- The rule is strict — every SELECT column must either be in GROUP BY or inside an aggregate function
- Fix 1 makes Brand part of the group — you get more rows but each row is more specific
- Fix 2 aggregates Brand — you keep one row per Category but lose the individual brand names
- Which fix to use depends on the question you are answering — both are valid for different purposes
Lesson Summary
| Concept | What It Does | Example |
|---|---|---|
| GROUP BY | Collapses rows sharing a value into one summary row | GROUP BY Category — one row per category |
| COUNT(*) | Counts all rows in the group including NULLs | COUNT(*) AS TotalOrders |
| COUNT(DISTINCT col) | Counts only unique values in that column per group | COUNT(DISTINCT UserId) — unique customers |
| SUM() | Adds up all values in a column for the group | SUM(Total) — total revenue per group |
| AVG() | Calculates the mean of a column for the group | AVG(Price) — average price per category |
| MIN() / MAX() | Returns the smallest or largest value per group | MIN(Price), MAX(Price) |
| Multiple columns | Creates a group per unique combination of all listed columns | GROUP BY Category, Brand |
| Clause order | WHERE → GROUP BY → ORDER BY — WHERE filters before grouping | WHERE Year = 2024 GROUP BY Month |
| The rule | Every SELECT column must be in GROUP BY or inside an aggregate | Violating this causes an immediate SQL Server error |
| YEAR() / MONTH() | T-SQL date functions that extract parts of a date for grouping | GROUP BY YEAR(OrderDate), MONTH(OrderDate) |
Practice Questions
Practice 1. What aggregate function counts every row in a group, including rows where other columns are NULL?
Practice 2. You write SELECT Category, Brand, COUNT(*) FROM Products GROUP BY Category — what happens when you run it?
Practice 3. What is the difference between COUNT(*) and COUNT(DISTINCT UserId)?
Practice 4. In the clause order WHERE → GROUP BY → ORDER BY, at what point does WHERE filter rows — before or after grouping?
Practice 5. Which T-SQL function extracts the month number from a date column?
Quiz
Quiz 1. You want one row per customer showing how many orders they placed. Which query is correct?
Quiz 2. GROUP BY Category, Brand produces one row per — what?
Quiz 3. You want to filter rows to only 2024 orders before grouping by month. Which clause do you use?
Quiz 4. What does SUM(oi.Quantity * oi.Price) calculate when used with GROUP BY ProductId?
Quiz 5. Which statement about NULL values and GROUP BY is correct?
Next up — HAVING Clause - How to filter groups after aggregation — the step WHERE cannot do.