Power BI Lesson 23 – Advanced Data Cleaning | Dataplexa
Power Query · Lesson 23

Advanced Cleaning Techniques

Basic cleaning removes blank rows and fixes column names. Advanced cleaning handles the subtler problems — cell-level errors that break type changes, null values that silently corrupt calculations, inconsistent text that prevents accurate grouping, and source data that is messy in different ways every time it arrives. This lesson gives you the complete toolkit for building Power Query queries that stay robust under real-world conditions.

Understanding Cell-Level Errors

An error at the cell level is different from a step-level error. A step-level error breaks the entire query — nothing loads. A cell-level error is contained to one cell in one row — the rest of the column loads fine, but that cell shows the word "Error" in red. Both need to be handled, but in different ways.

Preview grid — cell-level errors visible in Revenue column
OrderID | CustomerName | Revenue | Status
1001 | Alice Brown | 1,200 | Shipped
1002 | Bob Singh | Error | Delivered
1003 | Carol Lee | 430 | Processing
1004 | David Kim | Error | Shipped
1005 | Eve Patel | 1,540 | Delivered
2 errors in Revenue column — likely caused by text values ("N/A", "$-") that could not convert to Decimal Number

To see the exact error message for any red cell, click the cell in the preview. Power Query shows the error type and message below the grid. The three most common causes of cell errors are type conversion failures, division by zero, and out-of-range values.

Error type | Common cause | Example
DataFormat.Error | Type conversion failed — the value cannot be parsed as the target type | Revenue column contains "N/A" — cannot convert to Decimal Number
Expression.Error | A calculation produced an invalid result — often division by zero | Custom column divides Revenue by Quantity — some Quantity values are 0
Number.FromText failure (surfaces as an Expression.Error) | Text-to-number conversion on a cell containing non-numeric characters | Revenue cell contains "$1,200" — the $ and comma prevent numeric conversion
Formula.Firewall | Privacy level mismatch between two queries being combined — unlike the others, this blocks the whole step rather than a single cell | A Private-level query is referenced by an Organisational-level query
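You can reproduce a conversion failure yourself in a blank query to see what sits behind a red cell. A minimal sketch — type it into the formula bar of an empty query:

```
// A failed text-to-number conversion — the same failure that
// produces a red "Error" cell during a Changed Type step
= Number.FromText("N/A")
// Raises a conversion error; the message below the grid tells
// you which value could not be parsed as a number
```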

Three Ways to Handle Cell Errors

Option 1 — Replace Errors

Replace Errors substitutes a fixed value for every error in the selected column. The replacement value becomes part of the loaded data — DAX measures will use this value in calculations. Choose null when you want errors to be ignored by aggregations, or 0 when you need a numeric placeholder.

Before — errors in Revenue
OrderID | Revenue
1001 | 1,200
1002 | Error
1003 | 430
1004 | Error

After — Replace Errors with null
OrderID | Revenue
1001 | 1,200
1002 | null
1003 | 430
1004 | null
How: Select the column → Transform tab → Replace Errors (or right-click the column header → Replace Errors) → enter the replacement value → OK. For null, leave the value field empty.
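Behind the scenes, the UI writes a single Table.ReplaceErrorValues step. A minimal sketch of what appears in the Advanced Editor — the step names are illustrative and will match whatever your previous step is called:

```
// Replace every error in Revenue with null — errors in other
// columns are left untouched
#"Replaced Errors" = Table.ReplaceErrorValues(
    #"Changed Type",       // previous step name — yours will differ
    {{"Revenue", null}}    // one {column, replacement} pair per column
)
```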

Option 2 — Remove Errors

Remove Errors deletes the entire row for any row that contains an error in the selected column. Use this when error rows are genuinely invalid and should not be included in any calculation — for example, rows where the primary key is an error and the row is meaningless without it.

Warning: Remove Errors silently discards rows. If the source data later has fewer errors (someone fixed the data), fewer rows are removed and your totals change. Always document why rows were removed. Replacing with null is usually safer than removing — the row stays in the table and can be investigated.
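Remove Errors is generated as a Table.RemoveRowsWithErrors step. A sketch, with illustrative step names — note that the column list scopes which errors trigger removal:

```
// Remove any row that has an error in the listed columns only —
// omit the column list to drop rows with an error in any column
#"Removed Errors" = Table.RemoveRowsWithErrors(
    #"Changed Type",   // previous step name — yours will differ
    {"OrderID"}        // only rows whose key is broken are dropped
)
```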

Option 3 — Fix the Source of Errors

The cleanest approach is to fix the root cause before the error occurs. If Revenue contains "N/A" strings that fail type conversion, add a Replace Values step before the Changed Type step — replace "N/A" with null first, then change the type. The type change now succeeds because null converts cleanly to any type.

// Robust pattern — fix before the type change, not after

// Step 1 — Replace Values: clean known bad text values first
// Transform tab → Replace Values
#"Replaced N/A" = Table.ReplaceValue(
    #"Promoted Headers",
    "N/A", null,          // find "N/A", replace with null
    Replacer.ReplaceValue,
    {"Revenue"}           // in the Revenue column only
),

// Step 2 — Replace Values: clean currency symbols
#"Removed Dollar Sign" = Table.ReplaceValue(
    #"Replaced N/A",
    "$", "",              // find "$", replace with empty string
    Replacer.ReplaceText, // ReplaceText for partial matches
    {"Revenue"}
),

// Step 3 — Remove commas from number formatting
#"Removed Comma" = Table.ReplaceValue(
    #"Removed Dollar Sign",
    ",", "",
    Replacer.ReplaceText,
    {"Revenue"}
),

// Step 4 — Now change type safely — no errors will occur
#"Changed Type" = Table.TransformColumnTypes(
    #"Removed Comma",
    {{"Revenue", type number}}
)
Revenue column — before and after cleaning pipeline
RAW VALUES (text) → AFTER CLEANING (number)
"1200" → 1200
"N/A" → null
"430" → 430
"$1,540" → 1540
"850" → 850
No errors — nulls are clean · SUM([Revenue]) = 4020 · null rows excluded from aggregation automatically

Handling Null Values

Null is Power Query's representation of a missing or unknown value. It is not zero, not an empty string, and not the word "null" — it is the complete absence of a value. Nulls behave differently depending on context, and understanding those differences prevents silent calculation errors.

Context | How null behaves | Practical implication
In SUM / AVERAGE | Null rows are excluded from the calculation | SUM ignores nulls correctly. AVERAGE divides by the count of non-null rows, not total rows — which is usually what you want.
In COUNT | COUNT counts non-blank numeric values, COUNTA counts non-blank values of any type, COUNTROWS counts all rows including those with nulls | COUNT([Revenue]) returns 3 if 2 of 5 rows have null Revenue. COUNTROWS(Orders) returns 5 regardless of nulls.
In text concatenation | Concatenating null with text using & produces an error in M code | [FirstName] & " " & [LastName] errors if either is null. Wrap with: (if [FirstName] = null then "" else [FirstName])
In relationships | Null key values in a fact table do not join to any row in the dimension table | Orders with null CustomerID have no customer in the model — they appear as "(Blank)" in visuals. Replace nulls with a known default value if you want them to map to a specific row.
In filters | Null is treated as a distinct value — filter dropdowns show "(blank)" as a separate option | When filtering out invalid rows, add a specific condition for null: filter where [Status] <> null and [Status] <> "Cancelled".
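The concatenation case is worth seeing as code. Two null-safe custom-column formulas — a sketch assuming FirstName and LastName text columns; the second form relies on Text.Combine skipping nulls in its input list:

```
// Errors when either field is null:
// [FirstName] & " " & [LastName]

// Null-safe version 1 — guard each field explicitly
(if [FirstName] = null then "" else [FirstName]) & " " &
(if [LastName] = null then "" else [LastName])

// Null-safe version 2 — Text.Combine ignores null list items
Text.Combine({[FirstName], [LastName]}, " ")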

Filling Nulls — Fill Down and Fill Up

A common pattern in exported spreadsheets is a merged cell that was exported as one value followed by nulls. The Department column might say "Sales" in the first row of a group, then null for all subsequent rows in that group. Fill Down replaces each null with the value from the row above — reconstructing the implied grouping.

Before — Department has nulls from merged cells
Department | Employee
Sales | Alice
null | Bob
null | Carol
Engineering | David
null | Eve

After — Fill Down applied to Department
Department | Employee
Sales | Alice
Sales | Bob
Sales | Carol
Engineering | David
Engineering | Eve
How: Select the column → Transform tab → Fill → Down (or Up). Fill Up is the mirror — it fills each null with the value from the row below, useful when group labels appear at the bottom of each group rather than the top.
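Both commands generate a one-line M step. A sketch with illustrative step names:

```
// Fill Down — each null takes the value from the row above
#"Filled Down" = Table.FillDown(#"Promoted Headers", {"Department"})

// Fill Up — the mirror: each null takes the value from the row below
#"Filled Up" = Table.FillUp(#"Promoted Headers", {"Department"})
```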

Standardising Inconsistent Text

Inconsistent text in dimension columns silently breaks grouping, relationships, and filters. "New York", "new york", "NEW YORK", and "New York " (trailing space) are four different values in Power Query — they will not group together and will not match across tables. Fixing text consistency is one of the highest-impact cleaning steps you can do.

Problem | Fix in Power Query | M function
Mixed case | Transform → Format → Capitalize Each Word (or Lowercase / Uppercase depending on your standard) | Text.Proper([City])
Leading/trailing spaces | Transform → Format → Trim. Apply to every text column by default — invisible spaces are the most common cause of merge failures between tables. | Text.Trim([City])
Double spaces | Replace Values: find two spaces "  " → replace with one space " ". Repeat until no double spaces remain. Text.Trim handles the edges only — it does not collapse internal spaces, so use Replace Values (or a Custom Column that splits and rejoins) for internals. | Text.Trim does not fix this — use Replace Values
Abbreviations vs full names | Replace Values: "NY" → "New York", "CA" → "California". Create a lookup table and merge it if there are many abbreviations to standardise. | Table.ReplaceValue(...)
Non-printable characters | Transform → Format → Clean. Removes characters below ASCII 32 — tab characters, line breaks, null bytes — that are invisible in the preview but break matching. | Text.Clean([Column])
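For the internal-spaces case, a split-and-rejoin custom column handles any number of repeated spaces in one pass. A sketch, assuming a City text column:

```
// Split on spaces, drop the empty pieces that repeated spaces
// produce, then rejoin with a single space.
// "New   York " → {"New","","","York",""} → {"New","York"} → "New York"
Text.Combine(
    List.Select(Text.Split([City], " "), each _ <> ""),
    " "
)
```

This also removes leading and trailing spaces as a side effect, since those produce empty pieces too.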

Column Quality — Finding Problems Fast

Before writing any cleaning steps, turn on Column Quality in the View tab. It shows a mini data quality bar below each column header — the percentage of values that are valid, in error, or empty. This gives you an immediate picture of which columns need attention and how severe the problems are.

View tab → Column Quality enabled

OrderID — 100% valid
Revenue — 60% valid, 40% error
CustomerName — 80% valid, 20% empty
Status — 100% valid

1001 | 1,200 | Alice Brown | Shipped
1002 | Error | null | Delivered

Revenue column has 40% errors → immediate priority for cleaning. CustomerName has 20% empty → investigate before proceeding.
Also useful: Column Distribution (View tab) shows a histogram of value frequencies — instantly reveals whether a text column has many near-duplicate values from inconsistent capitalisation or spacing. Column Profile (View tab) shows min, max, distinct count, and value distribution statistics for the selected column.
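The same statistics are available in M via Table.Profile, which is handy when you want an audit table you can load alongside the data. A minimal sketch — run it in a separate query pointed at your source:

```
// Returns one row per column with statistics such as Min, Max,
// Average, Count, NullCount and DistinctCount — a quick
// data-quality audit of the whole table
= Table.Profile(Source)
```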

A Complete Robust Cleaning Template

// Robust cleaning sequence — apply to every new data source

// 1. Fix structure first
#"Removed Top Rows"    = Table.Skip(Source, 2),
#"Promoted Headers"    = Table.PromoteHeaders(#"Removed Top Rows", ...),
#"Removed Bottom Rows" = Table.RemoveLastN(#"Promoted Headers", 1),

// 2. Clean text columns BEFORE type changes
//    Trim removes leading/trailing spaces
//    Clean removes non-printable characters
#"Trimmed Text" = Table.TransformColumns(
    #"Removed Bottom Rows",
    {
        {"CustomerName", Text.Trim},
        {"City",         Text.Trim},
        {"Status",       Text.Trim}
    }
),
#"Cleaned Text" = Table.TransformColumns(
    #"Trimmed Text",
    {
        {"CustomerName", Text.Clean},
        {"City",         Text.Clean},
        {"Status",       Text.Clean}
    }
),

// 3. Standardise case
#"Proper Case" = Table.TransformColumns(
    #"Cleaned Text",
    {
        {"CustomerName", Text.Proper},
        {"City",         Text.Proper},
        {"Status",       Text.Proper}
    }
),

// 4. Replace known bad values before type changes
#"Replaced N/A"      = Table.ReplaceValue(#"Proper Case", "N/A", null, Replacer.ReplaceValue, {"Revenue"}),
#"Removed Dollar"    = Table.ReplaceValue(#"Replaced N/A", "$", "", Replacer.ReplaceText, {"Revenue"}),
#"Removed Comma"     = Table.ReplaceValue(#"Removed Dollar", ",", "", Replacer.ReplaceText, {"Revenue"}),

// 5. Now set types safely
#"Changed Type" = Table.TransformColumnTypes(
    #"Removed Comma",
    {{"OrderID", Int64.Type}, {"OrderDate", type date},
     {"Revenue", type number}, {"CustomerName", type text}}
),

// 6. Handle any remaining errors after type change
#"Replaced Errors" = Table.ReplaceErrorValues(
    #"Changed Type",
    {{"Revenue", null}}   // replace any remaining Revenue errors with null
),

// 7. Fill down merged-cell nulls
#"Filled Down Department" = Table.FillDown(#"Replaced Errors", {"Department"}),

// 8. Remove rows where key columns are null
#"Removed Null Orders" = Table.SelectRows(
    #"Filled Down Department",
    each [OrderID] <> null
)
What each phase achieves
Phase 1 — Structure
Junk rows removed, real headers promoted, footer row gone. The table shape is correct before any cleaning begins.
Phase 2 — Text cleaning
Every text column trimmed and cleaned before type changes. This prevents the most common source of DataFormat.Error — invisible characters that block numeric conversion.
Phase 3 — Casing
Consistent proper case across all dimension text columns. "alice brown" and "Alice Brown" are now the same value. Grouping and relationships work correctly.
Phase 4 — Replace before type
Known bad values replaced before type conversion. N/A → null, $ → empty, , → empty. No DataFormat.Error in step 5.
Phase 5-8 — Type + safety net
Types set cleanly. Any unexpected remaining errors caught and replaced with null. Merged-cell nulls filled. Rows with null primary keys removed as genuinely invalid.

Teacher's Note: The order of cleaning steps matters more than the steps themselves. The pattern that breaks the fewest queries in practice is: structure → clean text → replace bad values → change types → catch remaining errors. Every beginner's instinct is to change types first because it is the most visible step. Resist that instinct. Changing types before cleaning text is like painting a wall before plastering it — the errors underneath ruin the result. Clean first, type-convert second, catch errors third.

Practice

Practice 1 of 3

A Department column has "Sales" in the first row of each group and null in all subsequent rows — a result of merged cells in the original Excel file. To propagate the group label to every row, you select the column and apply Transform → Fill → ___.

Practice 2 of 3

To see at a glance which columns have errors, which are empty, and what percentage of values in each column are valid — all without clicking individual cells — you enable ___ Quality in the View tab.

Practice 3 of 3

A Revenue column contains the value "$1,200" as text. To convert it to a number without errors, you need to remove the $ and comma using Replace Values steps ___ the Changed Type step in the Applied Steps list.

Lesson Quiz

Quiz 1 of 3

You use Replace Errors with the value 0 on a Revenue column that has 15 error cells. Later you notice the average revenue per order is artificially low. What is the cause and what should you have used instead?

Quiz 2 of 3

A City column contains "new york", "New York", "NEW YORK", and "New York " (with a trailing space). After applying Trim and then Capitalize Each Word, how many distinct City values remain?

Quiz 3 of 3

You build a Custom Column with the formula [FirstName] & " " & [LastName]. Some rows have null in LastName. What happens to those rows and how do you fix it?

Next up — Lesson 24 covers Replacing Errors and Handling Exceptions — a deeper look at error detection, conditional error handling with try…otherwise in M code, and building queries that recover gracefully when source data is unpredictable.