Power BI Lesson 23 – Advanced Data Cleaning | Dataplexa
Power Query · Lesson 23

Advanced Cleaning Techniques

Basic cleaning removes blank rows and fixes column names. Advanced cleaning handles the subtler problems — cell-level errors that break type changes, null values that silently corrupt calculations, inconsistent text that prevents accurate grouping, and source data that is messy in different ways every time it arrives. This lesson gives you the complete toolkit for building Power Query queries that stay robust under real-world conditions.

Understanding Cell-Level Errors

An error at the cell level is different from a step-level error. A step-level error breaks the entire query — nothing loads. A cell-level error is contained to one cell in one row — the rest of the column loads fine, but that cell shows the word "Error" in red. Both need to be handled, but in different ways.

Preview grid — cell-level errors visible in Revenue column
OrderID | CustomerName | Revenue | Status
1001 | Alice Brown | 1,200 | Shipped
1002 | Bob Singh | Error | Delivered
1003 | Carol Lee | 430 | Processing
1004 | David Kim | Error | Shipped
1005 | Eve Patel | 1,540 | Delivered
2 errors in Revenue column — likely caused by text values ("N/A", "$-") that could not convert to Decimal Number

To see the exact error message for any red cell, click the cell in the preview. Power Query shows the error type and message below the grid. The three most common causes of cell errors are type conversion failures, division by zero, and out-of-range values.

Error type | Common cause | Example
DataFormat.Error | Type conversion failed — the value cannot be parsed as the target type | Revenue column contains "N/A" — cannot convert to Decimal Number
Expression.Error | A calculation produced an invalid result — often division by zero | Custom column divides Revenue by Quantity — some Quantity values are 0
Number.FromText failure (surfaces as an Expression.Error) | Text-to-number conversion on a cell containing non-numeric characters | Revenue cell contains "$1,200" — the $ and comma prevent numeric conversion
Formula.Firewall | Privacy level mismatch between two queries being combined — unlike the others, this blocks the whole step rather than a single cell | A Private-level query is referenced by an Organisational-level query
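You can reproduce a conversion failure yourself in a blank query to see what sits behind a red cell. A minimal sketch — type it into the formula bar of an empty query:

```
// A failed text-to-number conversion — the same failure that
// produces a red "Error" cell during a Changed Type step
= Number.FromText("N/A")
// Raises a conversion error; the message below the grid tells
// you which value could not be parsed as a number
```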

Three Ways to Handle Cell Errors

Option 1 — Replace Errors

Replace Errors substitutes a fixed value for every error in the selected column. The replacement value becomes part of the loaded data — DAX measures will use this value in calculations. Choose null when you want errors to be ignored by aggregations, or 0 when you need a numeric placeholder.

Before — errors in Revenue
OrderID | Revenue
1001 | 1,200
1002 | Error
1003 | 430
1004 | Error

After — Replace Errors with null
OrderID | Revenue
1001 | 1,200
1002 | null
1003 | 430
1004 | null
How: Select the column → Transform tab → Replace Errors (or right-click the column header → Replace Errors) → enter the replacement value → OK. For null, leave the value field empty.
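Behind the scenes, the UI writes a single Table.ReplaceErrorValues step. A minimal sketch of what appears in the Advanced Editor — the step names are illustrative and will match whatever your previous step is called:

```
// Replace every error in Revenue with null — errors in other
// columns are left untouched
#"Replaced Errors" = Table.ReplaceErrorValues(
    #"Changed Type",       // previous step name — yours will differ
    {{"Revenue", null}}    // one {column, replacement} pair per column
)
```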

Option 2 — Remove Errors

Remove Errors deletes the entire row for any row that contains an error in the selected column. Use this when error rows are genuinely invalid and should not be included in any calculation — for example, rows where the primary key is an error and the row is meaningless without it.

Warning: Remove Errors silently discards rows. If the source data later has fewer errors (someone fixed the data), fewer rows are removed and your totals change. Always document why rows were removed. Replacing with null is usually safer than removing — the row stays in the table and can be investigated.
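Remove Errors is generated as a Table.RemoveRowsWithErrors step. A sketch, with illustrative step names — note that the column list scopes which errors trigger removal:

```
// Remove any row that has an error in the listed columns only —
// omit the column list to drop rows with an error in any column
#"Removed Errors" = Table.RemoveRowsWithErrors(
    #"Changed Type",   // previous step name — yours will differ
    {"OrderID"}        // only rows whose key is broken are dropped
)
```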

Option 3 — Fix the Source of Errors

The cleanest approach is to fix the root cause before the error occurs. If Revenue contains "N/A" strings that fail type conversion, add a Replace Values step before the Changed Type step — replace "N/A" with null first, then change the type. The type change now succeeds because null converts cleanly to any type.

// Robust pattern — fix before the type change, not after

// Step 1 — Replace Values: clean known bad text values first
// Transform tab → Replace Values
#"Replaced N/A" = Table.ReplaceValue(
    #"Promoted Headers",
    "N/A", null,          // find "N/A", replace with null
    Replacer.ReplaceValue,
    {"Revenue"}           // in the Revenue column only
),

// Step 2 — Replace Values: clean currency symbols
#"Removed Dollar Sign" = Table.ReplaceValue(
    #"Replaced N/A",
    "$", "",              // find "$", replace with empty string
    Replacer.ReplaceText, // ReplaceText for partial matches
    {"Revenue"}
),

// Step 3 — Remove commas from number formatting
#"Removed Comma" = Table.ReplaceValue(
    #"Removed Dollar Sign",
    ",", "",
    Replacer.ReplaceText,
    {"Revenue"}
),

// Step 4 — Now change type safely — no errors will occur
#"Changed Type" = Table.TransformColumnTypes(
    #"Removed Comma",
    {{"Revenue", type number}}
)
Revenue column — before and after cleaning pipeline
RAW VALUES (text) → AFTER CLEANING (number)
"1200" → 1200
"N/A" → null
"430" → 430
"$1,540" → 1540
"850" → 850
No errors — nulls are clean · SUM([Revenue]) = 4020 · null rows excluded from aggregation automatically

Handling Null Values

Null is Power Query's representation of a missing or unknown value. It is not zero, not an empty string, and not the word "null" — it is the complete absence of a value. Nulls behave differently depending on context, and understanding those differences prevents silent calculation errors.

Context | How null behaves | Practical implication
In SUM / AVERAGE | Null rows are excluded from the calculation | SUM ignores nulls correctly. AVERAGE divides by the count of non-null rows, not total rows — which is usually what you want.
In COUNT | COUNT counts non-blank numeric values, COUNTA counts non-blank values of any type, COUNTROWS counts all rows including those with nulls | COUNT([Revenue]) returns 3 if 2 of 5 rows have null Revenue. COUNTROWS(Orders) returns 5 regardless of nulls.
In text concatenation | Concatenating null with text using & produces an error in M code | [FirstName] & " " & [LastName] errors if either is null. Wrap with: (if [FirstName] = null then "" else [FirstName])
In relationships | Null key values in a fact table do not join to any row in the dimension table | Orders with null CustomerID have no customer in the model — they appear as "(Blank)" in visuals. Replace nulls with a known default value if you want them to map to a specific row.
In filters | Null is treated as a distinct value — filter dropdowns show "(blank)" as a separate option | When filtering out invalid rows, add a specific condition for null: filter where [Status] <> null and [Status] <> "Cancelled".
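The concatenation case is worth seeing as code. Two null-safe custom-column formulas — a sketch assuming FirstName and LastName text columns; the second form relies on Text.Combine skipping nulls in its input list:

```
// Errors when either field is null:
// [FirstName] & " " & [LastName]

// Null-safe version 1 — guard each field explicitly
(if [FirstName] = null then "" else [FirstName]) & " " &
(if [LastName] = null then "" else [LastName])

// Null-safe version 2 — Text.Combine ignores null list items
Text.Combine({[FirstName], [LastName]}, " ")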

Filling Nulls — Fill Down and Fill Up

A common pattern in exported spreadsheets is a merged cell that was exported as one value followed by nulls. The Department column might say "Sales" in the first row of a group, then null for all subsequent rows in that group. Fill Down replaces each null with the value from the row above — reconstructing the implied grouping.

Before — Department has nulls from merged cells
Department | Employee
Sales | Alice
null | Bob
null | Carol
Engineering | David
null | Eve

After — Fill Down applied to Department
Department | Employee
Sales | Alice
Sales | Bob
Sales | Carol
Engineering | David
Engineering | Eve
How: Select the column → Transform tab → Fill → Down (or Up). Fill Up is the mirror — it fills each null with the value from the row below, useful when group labels appear at the bottom of each group rather than the top.
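Both commands generate a one-line M step. A sketch with illustrative step names:

```
// Fill Down — each null takes the value from the row above
#"Filled Down" = Table.FillDown(#"Promoted Headers", {"Department"})

// Fill Up — the mirror: each null takes the value from the row below
#"Filled Up" = Table.FillUp(#"Promoted Headers", {"Department"})
```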

Standardising Inconsistent Text

Inconsistent text in dimension columns silently breaks grouping, relationships, and filters. "New York", "new york", "NEW YORK", and "New York " (trailing space) are four different values in Power Query — they will not group together and will not match across tables. Fixing text consistency is one of the highest-impact cleaning steps you can do.

Problem | Fix in Power Query | M function
Mixed case | Transform → Format → Capitalize Each Word (or Lowercase / Uppercase depending on your standard) | Text.Proper([City])
Leading/trailing spaces | Transform → Format → Trim. Apply to every text column by default — invisible spaces are the most common cause of merge failures between tables. | Text.Trim([City])
Double spaces | Replace Values: find two spaces "  " → replace with one space " ". Repeat until no double spaces remain. Text.Trim handles the edges only — it does not collapse internal spaces, so use Replace Values (or a Custom Column that splits and rejoins) for internals. | Text.Trim does not fix this — use Replace Values
Abbreviations vs full names | Replace Values: "NY" → "New York", "CA" → "California". Create a lookup table and merge it if there are many abbreviations to standardise. | Table.ReplaceValue(...)
Non-printable characters | Transform → Format → Clean. Removes characters below ASCII 32 — tab characters, line breaks, null bytes — that are invisible in the preview but break matching. | Text.Clean([Column])
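For the internal-spaces case, a split-and-rejoin custom column handles any number of repeated spaces in one pass. A sketch, assuming a City text column:

```
// Split on spaces, drop the empty pieces that repeated spaces
// produce, then rejoin with a single space.
// "New   York " → {"New","","","York",""} → {"New","York"} → "New York"
Text.Combine(
    List.Select(Text.Split([City], " "), each _ <> ""),
    " "
)
```

This also removes leading and trailing spaces as a side effect, since those produce empty pieces too.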

Column Quality — Finding Problems Fast

Before writing any cleaning steps, turn on Column Quality in the View tab. It shows a mini data quality bar below each column header — the percentage of values that are valid, in error, or empty. This gives you an immediate picture of which columns need attention and how severe the problems are.

View tab → Column Quality enabled

OrderID — 100% valid
Revenue — 60% valid, 40% error
CustomerName — 80% valid, 20% empty
Status — 100% valid

1001 | 1,200 | Alice Brown | Shipped
1002 | Error | null | Delivered

Revenue column has 40% errors → immediate priority for cleaning. CustomerName has 20% empty → investigate before proceeding.
Also useful: Column Distribution (View tab) shows a histogram of value frequencies — instantly reveals whether a text column has many near-duplicate values from inconsistent capitalisation or spacing. Column Profile (View tab) shows min, max, distinct count, and value distribution statistics for the selected column.
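The same statistics are available in M via Table.Profile, which is handy when you want an audit table you can load alongside the data. A minimal sketch — run it in a separate query pointed at your source:

```
// Returns one row per column with statistics such as Min, Max,
// Average, Count, NullCount and DistinctCount — a quick
// data-quality audit of the whole table
= Table.Profile(Source)
```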

A Complete Robust Cleaning Template

// Robust cleaning sequence — apply to every new data source

// 1. Fix structure first
#"Removed Top Rows"    = Table.Skip(Source, 2),
#"Promoted Headers"    = Table.PromoteHeaders(#"Removed Top Rows", ...),
#"Removed Bottom Rows" = Table.RemoveLastN(#"Promoted Headers", 1),

// 2. Clean text columns BEFORE type changes
//    Trim removes leading/trailing spaces
//    Clean removes non-printable characters
#"Trimmed Text" = Table.TransformColumns(
    #"Removed Bottom Rows",
    {
        {"CustomerName", Text.Trim},
        {"City",         Text.Trim},
        {"Status",       Text.Trim}
    }
),
#"Cleaned Text" = Table.TransformColumns(
    #"Trimmed Text",
    {
        {"CustomerName", Text.Clean},
        {"City",         Text.Clean},
        {"Status",       Text.Clean}
    }
),

// 3. Standardise case
#"Proper Case" = Table.TransformColumns(
    #"Cleaned Text",
    {
        {"CustomerName", Text.Proper},
        {"City",         Text.Proper},
        {"Status",       Text.Proper}
    }
),

// 4. Replace known bad values before type changes
#"Replaced N/A"      = Table.ReplaceValue(#"Proper Case", "N/A", null, Replacer.ReplaceValue, {"Revenue"}),
#"Removed Dollar"    = Table.ReplaceValue(#"Replaced N/A", "$", "", Replacer.ReplaceText, {"Revenue"}),
#"Removed Comma"     = Table.ReplaceValue(#"Removed Dollar", ",", "", Replacer.ReplaceText, {"Revenue"}),

// 5. Now set types safely
#"Changed Type" = Table.TransformColumnTypes(
    #"Removed Comma",
    {{"OrderID", Int64.Type}, {"OrderDate", type date},
     {"Revenue", type number}, {"CustomerName", type text}}
),

// 6. Handle any remaining errors after type change
#"Replaced Errors" = Table.ReplaceErrorValues(
    #"Changed Type",
    {{"Revenue", null}}   // replace any remaining Revenue errors with null
),

// 7. Fill down merged-cell nulls
#"Filled Down Department" = Table.FillDown(#"Replaced Errors", {"Department"}),

// 8. Remove rows where key columns are null
#"Removed Null Orders" = Table.SelectRows(
    #"Filled Down Department",
    each [OrderID] <> null
)
What each phase achieves
Phase 1 — Structure
Junk rows removed, real headers promoted, footer row gone. The table shape is correct before any cleaning begins.
Phase 2 — Text cleaning
Every text column trimmed and cleaned before type changes. This prevents the most common source of DataFormat.Error — invisible characters that block numeric conversion.
Phase 3 — Casing
Consistent proper case across all dimension text columns. "alice brown" and "Alice Brown" are now the same value. Grouping and relationships work correctly.
Phase 4 — Replace before type
Known bad values replaced before type conversion. N/A → null, $ → empty, , → empty. No DataFormat.Error in step 5.
Phase 5-8 — Type + safety net
Types set cleanly. Any unexpected remaining errors caught and replaced with null. Merged-cell nulls filled. Rows with null primary keys removed as genuinely invalid.

Teacher's Note: The order of cleaning steps matters more than the steps themselves. The pattern that breaks the fewest queries in practice is: structure → clean text → replace bad values → change types → catch remaining errors. Every beginner's instinct is to change types first because it is the most visible step. Resist that instinct. Changing types before cleaning text is like painting a wall before plastering it — the errors underneath ruin the result. Clean first, type-convert second, catch errors third.

Practice

Practice 1 of 3

A Department column has "Sales" in the first row of each group and null in all subsequent rows — a result of merged cells in the original Excel file. To propagate the group label to every row, you select the column and apply Transform → Fill → ___.

Practice 2 of 3

To see at a glance which columns have errors, which are empty, and what percentage of values in each column are valid — all without clicking individual cells — you enable ___ Quality in the View tab.

Practice 3 of 3

A Revenue column contains the value "$1,200" as text. To convert it to a number without errors, you need to remove the $ and comma using Replace Values steps ___ the Changed Type step in the Applied Steps list.

Lesson Quiz

Quiz 1 of 3

You use Replace Errors with the value 0 on a Revenue column that has 15 error cells. Later you notice the average revenue per order is artificially low. What is the cause and what should you have used instead?

Quiz 2 of 3

A City column contains "new york", "New York", "NEW YORK", and "New York " (with a trailing space). After applying Trim and then Capitalize Each Word, how many distinct City values remain?

Quiz 3 of 3

You build a Custom Column with the formula [FirstName] & " " & [LastName]. Some rows have null in LastName. What happens to those rows and how do you fix it?

Next up — Lesson 24 covers Replacing Errors and Handling Exceptions — a deeper look at error detection, conditional error handling with try…otherwise in M code, and building queries that recover gracefully when source data is unpredictable.