Excel Lesson 28 – Data Cleaning | Dataplexa
Lesson 28 · Power Query Practical

Data Cleaning in Power Query

Data cleaning is where most analysts spend the majority of their time — and it is the area where Power Query delivers the most dramatic improvement over manual Excel work. Handling nulls, fixing inconsistent values, removing duplicates, unpivoting wide tables, and reshaping messy exports are all tasks that used to require formulas, manual editing, or VBA. In Power Query, every cleaning operation is a recorded step that runs automatically on every refresh. Fix the data once, and it stays fixed.

Handling Null Values

Power Query represents missing values as null — not blank cells, not zeros, not dashes. Null is a distinct state meaning "no value present." You can see nulls clearly in the preview pane as grey italic null labels. There are three main strategies for dealing with them: remove the rows that contain them, replace them with a default value, or fill them down from the previous non-null value above.

Three Ways to Handle Null Values
Remove Rows with Null
Right-click column → Remove Empty or use Home → Remove Rows → Remove Blank Rows.

Use when: A null in that column means the entire row is invalid or incomplete.
Replace Null with a Value
Right-click column → Replace Values → Value to find: null → Replace with: 0 (or "Unknown" etc).

Use when: A missing value has a known default — zero for missing sales, "Unknown" for missing category.
Fill Down
Select column → Transform → Fill → Down (or Up).

Use when: Nulls appear because a grouped label was only entered once — fill repeats it for every row in the group.
Fill Down — Fixing a Grouped Export
❌ Before — Department only entered once per group
Department Name Salary
FinancePriya£61,000
nullJames£67,000
nullMaria£58,000
OperationsSarah£48,000
nullDavid£52,000
Transform →
Fill → Down
✅ After — every row has its department
Department Name Salary
FinancePriya£61,000
FinanceJames£67,000
FinanceMaria£58,000
OperationsSarah£48,000
OperationsDavid£52,000

Removing Duplicates

Power Query can remove duplicate rows based on all columns or based on specific key columns. Removing based on all columns is the equivalent of Excel's Remove Duplicates feature. Removing based on key columns keeps the first occurrence of each unique key value — useful for deduplicating a table where the same ID appears multiple times and you want to keep one record per ID.

Remove duplicate rows (all columns must match to be a duplicate):
  Home → Remove Rows → Remove Duplicates

Remove duplicates based on a key column only:
  Select the key column (e.g. Employee ID)
  Right-click → Remove Duplicates
  Power Query keeps the first occurrence of each unique key value

Keep only duplicates (find rows that DO repeat):
  Home → Remove Rows → Keep Duplicates
  Useful for auditing — finding IDs or values that appear more than once
Remove Duplicates on Employee ID: 8 rows with 2 duplicate IDs → 6 unique rows retained (first occurrence of each ID kept)

A common scenario: a data extract contains one row per transaction, but multiple transactions share the same customer ID. Removing duplicates on the Customer ID column keeps one row per customer — the first one in the data. If you need the most recent transaction rather than the first, sort by date descending first, then remove duplicates on the ID column.

Unpivoting — Turning Wide Tables Tall

One of the most powerful and frequently needed transformations in Power Query is unpivoting. Wide tables — where months, categories, or regions are spread across multiple columns — need to be restructured into tall tables before they can be analysed properly in PivotTables, charts, or formulas. Unpivoting collapses those column headers into rows.

Unpivot — Wide to Tall
❌ Wide format — months as columns (cannot PivotTable or chart cleanly)
Rep Jan Feb Mar Apr
Priya£32,000£29,000£31,000£27,000
James£28,000£31,000£33,000£25,000
Select Jan, Feb, Mar, Apr → Right-click → Unpivot Columns
✅ Tall format — one row per rep per month, ready for PivotTables and charts
Rep Month Sales
PriyaJan£32,000
PriyaFeb£29,000
PriyaMar£31,000
PriyaApr£27,000
JamesJan£28,000
JamesFeb£31,000
2 reps × 4 months = 8 rows total · Month column auto-named "Attribute" → rename to "Month"

Unpivot Other Columns — The Safer Method

There are three unpivot options and it is worth knowing the difference. The safest for future-proofing is "Unpivot Other Columns" — select the columns you want to keep as rows (like Rep), then right-click and choose Unpivot Other Columns. This way, if new month columns are added to the source in the future, they are automatically unpivoted too without any changes to the query.

Three unpivot options — right-click a column header:

Unpivot Columns
  Unpivots the columns you selected. New columns added to the source later
  will NOT be unpivoted automatically.

Unpivot Other Columns  ← recommended
  Select the columns to KEEP as identifiers (Rep, Product etc),
  then use this option. Everything else is unpivoted.
  New columns added to the source ARE automatically unpivoted.

Unpivot Only Selected Columns
  Unpivots exactly the selected columns regardless of what else exists.
  Best when you want to unpivot a specific fixed set only.

The Unpivot Other Columns pattern is the right default for any report that involves time series data — monthly sales, weekly metrics, quarterly results — because the source almost always gains new period columns over time. Build the query to handle that growth from the start and you will never need to edit it when January turns to February turns to March.

Fixing Inconsistent Text Values

Inconsistent casing and naming is one of the most common data quality issues in business data — especially in category columns that are typed manually. "North", "north", "NORTH", and " North" are four different values to Excel but should all be the same. Power Query gives you transformation options for all of these in one place.

Text Cleaning Operations — Transform → Format
Operation What it does Example
Lowercase Converts all text to lower case NORTH → north
Uppercase Converts all text to upper case north → NORTH
Capitalize Each Word Title-cases the value (like PROPER in Excel) north east → North East
Trim Removes leading and trailing spaces " North " → "North"
Clean Removes non-printable characters Removes hidden control characters
Replace Values Substitutes a specific value for another "N/A" → null, "Nrth" → "North"

The order matters when applying multiple text fixes. The recommended sequence for a messy text column is: Clean first (remove hidden characters) → Trim (remove spaces) → Capitalize Each Word or Lowercase (standardise casing) → Replace Values (fix any specific known bad values). Applying them in this order means each step builds on already-cleaner data.

Conditional Column — Power Query's IF Statement

Conditional Column is Power Query's point-and-click equivalent of a nested IF formula. You define rules — if this column equals this value, put this result — and Power Query builds the M code for you. It is in Add Column → Conditional Column.

Conditional Column — Region Grouping
Add Column → Conditional Column
Column Name
Operator
Value
Output
Region
equals
North
UK
Region
equals
East
UK
Region
equals
West
US
Otherwise:
Other

Pivoting — Tall Back to Wide

Pivoting is the reverse of unpivoting. If you have tall data with a column of category labels you want to spread out into separate columns, use Transform → Pivot Column. You choose which column becomes the new headers and which column provides the values to fill those headers.

Tall data:
  Rep   | Quarter | Sales
  Priya | Q1      | £32,000
  Priya | Q2      | £29,000
  James | Q1      | £28,000
  James | Q2      | £31,000

Pivot: Select the Quarter column → Transform → Pivot Column
  Values column: Sales
  Aggregate: Sum (or Don't Aggregate for 1:1 data)

Result:
  Rep   | Q1      | Q2
  Priya | £32,000 | £29,000
  James | £28,000 | £31,000
Pivot on Quarter column → Q1 and Q2 become column headers with Sales values filled in per rep

Pivot and Unpivot are mirror operations and you will use them in different situations. Unpivot is far more common in data cleaning work because most raw data arrives wide and needs to go tall for analysis. Pivot is useful when building output tables or summary views that people want to read in a wide format.

💡 Teacher's Note
The unpivot transformation is genuinely one of the most valuable things you can learn in Power Query — and it is something that cannot be done cleanly with formulas at all. If you have worked with any kind of business reporting data, you will have seen the wide-format problem: a spreadsheet with months or products spread across dozens of columns that you need to turn into a proper database-style table. In Power Query this takes about four clicks and thirty seconds. Understanding when data is in the wrong shape — and knowing that Unpivot Other Columns is the fix — is one of those skills that immediately makes you look like an expert to anyone watching.

🟠 Practice

Q1. Your Department column has nulls wherever the department name was only entered for the first row of each group. What Power Query operation fills those nulls from the value above?




Q2. You have a wide table with columns: Rep, Jan, Feb, Mar, Apr, May, Jun. You want to unpivot the month columns into rows, and you want new months to be included automatically in future. Which unpivot option should you use and how?




Q3. The Region column contains a mix of "north", "NORTH", and "North". What is the quickest way in Power Query to standardise all values to "North"?



🟣 Quiz

Q1. What is the key advantage of "Unpivot Other Columns" over "Unpivot Columns"?







Q2. You want to remove duplicate rows from a query but keep only the most recent entry for each Customer ID. The data has a Date column. What is the correct sequence of steps?







Q3. What does null mean in Power Query, and why is it important to handle it before loading data to Excel?






Next up — Merging Tables in Power Query, where you will learn how to join two queries together using match columns — the Power Query equivalent of VLOOKUP, but far more powerful and fully automated on every refresh.