GROUP BY Clause| Dataplexa

GROUP BY Clause

So far, we have used aggregate functions like COUNT, SUM, and AVG on the entire table.

The GROUP BY clause allows us to apply aggregate functions per category. This is one of the most powerful features in SQL.


Why GROUP BY is Needed

Consider this question:

  • What is the average salary per department?

Using only aggregate functions, we can calculate the overall average salary — but not department-wise averages.

This is exactly what GROUP BY solves.


What Does GROUP BY Do?

The GROUP BY clause:

  • Groups rows that share the same value
  • Applies aggregate functions to each group
  • Returns one row per group

Basic syntax:

SELECT column_name, AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY column_name;
  

Example Table

We will continue using the employees table:

+----+----------+------------+--------+
| id | name     | department | salary |
+----+----------+------------+--------+
| 1  | Alice    | IT         | 75000  |
| 2  | Bob      | IT         | 72000  |
| 3  | Charlie  | HR         | 60000  |
| 4  | Diana    | Finance    | 70000  |
| 5  | Evan     | HR         | 58000  |
+----+----------+------------+--------+
  

GROUP BY in Action

Example: Average salary per department.

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
  

How this works internally:

  1. Rows are grouped by department
  2. AVG(salary) is calculated for each group
  3. One row is returned per department

GROUP BY with COUNT

Example: Number of employees in each department.

SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
  

This tells you how many employees work in each department.


GROUP BY with SUM

Example: Total salary paid per department.

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department;
  

Important Rule: SELECT vs GROUP BY

When using GROUP BY:

  • Every column in SELECT must either be:
    • Included in GROUP BY, or
    • Used inside an aggregate function

❌ Incorrect:

SELECT name, AVG(salary)
FROM employees
GROUP BY department;
  

This is invalid because name is neither grouped nor aggregated.

✅ Correct:

SELECT department, AVG(salary)
FROM employees
GROUP BY department;
  

GROUP BY Multiple Columns

You can group by more than one column.

Example: Count employees by department and salary.

SELECT department, salary, COUNT(*) AS count
FROM employees
GROUP BY department, salary;
  

This creates groups based on unique combinations of department and salary.


Execution Order (Important)

SQL processes queries in this logical order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. SELECT
  5. ORDER BY

This explains why WHERE filters rows before grouping.


Common Beginner Mistakes

  • Forgetting to include columns in GROUP BY
  • Using GROUP BY without aggregate functions
  • Confusing WHERE and GROUP BY
  • Assuming GROUP BY sorts data (it does not)

Why GROUP BY is Powerful

GROUP BY allows you to:

  • Create summaries and reports
  • Analyze trends per category
  • Build dashboards and KPIs
  • Answer complex business questions

What’s Next?

In the next lesson, we will learn the HAVING clause, which allows us to filter grouped results.