Excel Course
Data Cleaning in Power Query
Data cleaning is where most analysts spend the majority of their time — and it is the area where Power Query delivers the most dramatic improvement over manual Excel work. Handling nulls, fixing inconsistent values, removing duplicates, unpivoting wide tables, and reshaping messy exports are all tasks that used to require formulas, manual editing, or VBA. In Power Query, every cleaning operation is a recorded step that runs automatically on every refresh. Fix the data once, and it stays fixed.
Handling Null Values
Power Query represents missing values as null — not blank cells, not zeros, not dashes. Null is a distinct state meaning "no value present." You can see nulls clearly in the preview pane as grey italic null labels. There are three main strategies for dealing with them: remove the rows that contain them, replace them with a default value, or fill them down from the previous non-null value above.
Use when: A null in that column means the entire row is invalid or incomplete.
Use when: A missing value has a known default — zero for missing sales, "Unknown" for missing category.
Use when: Nulls appear because a grouped label was only entered once — fill repeats it for every row in the group.
| Department | Name | Salary |
|---|---|---|
| Finance | Priya | £61,000 |
| null | James | £67,000 |
| null | Maria | £58,000 |
| Operations | Sarah | £48,000 |
| null | David | £52,000 |
Fill → Down
| Department | Name | Salary |
|---|---|---|
| Finance | Priya | £61,000 |
| Finance | James | £67,000 |
| Finance | Maria | £58,000 |
| Operations | Sarah | £48,000 |
| Operations | David | £52,000 |
Removing Duplicates
Power Query can remove duplicate rows based on all columns or based on specific key columns. Removing based on all columns is the equivalent of Excel's Remove Duplicates feature. Removing based on key columns keeps the first occurrence of each unique key value — useful for deduplicating a table where the same ID appears multiple times and you want to keep one record per ID.
Remove duplicate rows (all columns must match to be a duplicate):
Home → Remove Rows → Remove Duplicates
Remove duplicates based on a key column only:
Select the key column (e.g. Employee ID)
Right-click → Remove Duplicates
Power Query keeps the first occurrence of each unique key value
Keep only duplicates (find rows that DO repeat):
Home → Remove Rows → Keep Duplicates
Useful for auditing — finding IDs or values that appear more than once
A common scenario: a data extract contains one row per transaction, but multiple transactions share the same customer ID. Removing duplicates on the Customer ID column keeps one row per customer — the first one in the data. If you need the most recent transaction rather than the first, sort by date descending first, then remove duplicates on the ID column.
Unpivoting — Turning Wide Tables Tall
One of the most powerful and frequently needed transformations in Power Query is unpivoting. Wide tables — where months, categories, or regions are spread across multiple columns — need to be restructured into tall tables before they can be analysed properly in PivotTables, charts, or formulas. Unpivoting collapses those column headers into rows.
| Rep | Jan | Feb | Mar | Apr |
|---|---|---|---|---|
| Priya | £32,000 | £29,000 | £31,000 | £27,000 |
| James | £28,000 | £31,000 | £33,000 | £25,000 |
| Rep | Month | Sales |
|---|---|---|
| Priya | Jan | £32,000 |
| Priya | Feb | £29,000 |
| Priya | Mar | £31,000 |
| Priya | Apr | £27,000 |
| James | Jan | £28,000 |
| James | Feb | £31,000 |
Unpivot Other Columns — The Safer Method
There are three unpivot options and it is worth knowing the difference. The safest for future-proofing is "Unpivot Other Columns" — select the columns you want to keep as rows (like Rep), then right-click and choose Unpivot Other Columns. This way, if new month columns are added to the source in the future, they are automatically unpivoted too without any changes to the query.
Three unpivot options — right-click a column header:
Unpivot Columns
Unpivots the columns you selected. New columns added to the source later
will NOT be unpivoted automatically.
Unpivot Other Columns ← recommended
Select the columns to KEEP as identifiers (Rep, Product etc),
then use this option. Everything else is unpivoted.
New columns added to the source ARE automatically unpivoted.
Unpivot Only Selected Columns
Unpivots exactly the selected columns regardless of what else exists.
Best when you want to unpivot a specific fixed set only.
The Unpivot Other Columns pattern is the right default for any report that involves time series data — monthly sales, weekly metrics, quarterly results — because the source almost always gains new period columns over time. Build the query to handle that growth from the start and you will never need to edit it when January turns to February turns to March.
Fixing Inconsistent Text Values
Inconsistent casing and naming is one of the most common data quality issues in business data — especially in category columns that are typed manually. "North", "north", "NORTH", and " North" are four different values to Excel but should all be the same. Power Query gives you transformation options for all of these in one place.
| Operation | What it does | Example |
|---|---|---|
| Lowercase | Converts all text to lower case | NORTH → north |
| Uppercase | Converts all text to upper case | north → NORTH |
| Capitalize Each Word | Title-cases the value (like PROPER in Excel) | north east → North East |
| Trim | Removes leading and trailing spaces | " North " → "North" |
| Clean | Removes non-printable characters | Removes hidden control characters |
| Replace Values | Substitutes a specific value for another | "N/A" → null, "Nrth" → "North" |
The order matters when applying multiple text fixes. The recommended sequence for a messy text column is: Clean first (remove hidden characters) → Trim (remove spaces) → Capitalize Each Word or Lowercase (standardise casing) → Replace Values (fix any specific known bad values). Applying them in this order means each step builds on already-cleaner data.
Conditional Column — Power Query's IF Statement
Conditional Column is Power Query's point-and-click equivalent of a nested IF formula. You define rules — if this column equals this value, put this result — and Power Query builds the M code for you. It is in Add Column → Conditional Column.
Pivoting — Tall Back to Wide
Pivoting is the reverse of unpivoting. If you have tall data with a column of category labels you want to spread out into separate columns, use Transform → Pivot Column. You choose which column becomes the new headers and which column provides the values to fill those headers.
Tall data:
Rep | Quarter | Sales
Priya | Q1 | £32,000
Priya | Q2 | £29,000
James | Q1 | £28,000
James | Q2 | £31,000
Pivot: Select the Quarter column → Transform → Pivot Column
Values column: Sales
Aggregate: Sum (or Don't Aggregate for 1:1 data)
Result:
Rep | Q1 | Q2
Priya | £32,000 | £29,000
James | £28,000 | £31,000
Pivot and Unpivot are mirror operations and you will use them in different situations. Unpivot is far more common in data cleaning work because most raw data arrives wide and needs to go tall for analysis. Pivot is useful when building output tables or summary views that people want to read in a wide format.
🟠 Practice
Q1. Your Department column has nulls wherever the department name was only entered for the first row of each group. What Power Query operation fills those nulls from the value above?
Q2. You have a wide table with columns: Rep, Jan, Feb, Mar, Apr, May, Jun. You want to unpivot the month columns into rows, and you want new months to be included automatically in future. Which unpivot option should you use and how?
Q3. The Region column contains a mix of "north", "NORTH", and "North". What is the quickest way in Power Query to standardise all values to "North"?
🟣 Quiz
Q1. What is the key advantage of "Unpivot Other Columns" over "Unpivot Columns"?
Q2. You want to remove duplicate rows from a query but keep only the most recent entry for each Customer ID. The data has a Date column. What is the correct sequence of steps?
Q3. What does null mean in Power Query, and why is it important to handle it before loading data to Excel?
Next up — Merging Tables in Power Query, where you will learn how to join two queries together using match columns — the Power Query equivalent of VLOOKUP, but far more powerful and fully automated on every refresh.