Pandas Lesson 7 – Missing Values | Dataplexa

Handling Missing Values in Pandas

In real-world data, missing values are unavoidable. They appear due to incomplete data collection, system errors, or manual entry mistakes.

If missing values are ignored, they can lead to wrong calculations, incorrect averages, and misleading analysis.


What Are Missing Values?

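Pandas represents missing data using:

  • NaN (Not a Number), the default marker in numeric columns
  • None, which Pandas converts to NaN in numeric columns
  • Empty cells in CSV or Excel files, which are read in as NaN

Pandas treats all of these as missing values, so the same detection and cleaning methods work for each. An empty string ("") inside an existing column is ordinary text and is not counted as missing.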
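
To make the list above concrete, here is a minimal sketch of how each form shows up once the data is in Pandas. The tiny CSV text and the variable names are made up for illustration and are not part of the lesson dataset.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row CSV; the second row has an empty Sales cell.
csv_text = "Region,Sales\nEast,100\nWest,\nNorth,250\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# None and np.nan in a numeric Series both end up as NaN.
s = pd.Series([1.0, None, np.nan])

print(from_csv["Sales"].isna().tolist())  # [False, True, False]
print(s.isna().tolist())                  # [False, True, True]

# An empty string in an existing column is NOT treated as missing.
text = pd.Series(["East", ""])
print(text.isna().tolist())               # [False, False]
```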


Loading the Dataset

We continue using the same dataset introduced earlier. Make sure the file is downloaded from the datasets page.

import pandas as pd

df = pd.read_csv("dataplexa_pandas_sales.csv")

Detecting Missing Values

Before fixing missing data, you must identify it.

Use isnull() to check for missing values. isna() is an equivalent alias.

df.isnull()

This returns a DataFrame of the same shape as df, with True wherever a value is missing and False everywhere else.


Counting Missing Values

To see how many missing values exist in each column:

df.isnull().sum()

This step helps decide which columns need cleaning.
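
As a rough sketch of detection and counting together, the example below uses a small made-up DataFrame (the column names Region and Sales are assumed to resemble the lesson dataset). It also shows the share of missing values per column, which is often easier to judge than a raw count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the sales dataset.
sample = pd.DataFrame({
    "Region": ["East", None, "North", "South"],
    "Sales": [100.0, np.nan, np.nan, 400.0],
})

missing_count = sample.isnull().sum()         # missing values per column
missing_share = sample.isnull().mean() * 100  # percentage missing per column

# Rows that contain at least one missing value.
rows_with_gaps = sample[sample.isnull().any(axis=1)]

print(missing_count)
print(missing_share)
print(rows_with_gaps)
```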


Removing Rows with Missing Values

If missing values are few, removing rows may be acceptable.

df.dropna()

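This returns a new DataFrame without any row that contains at least one missing value. The original df is not changed, so assign the result (for example, df = df.dropna()) if you want to keep it.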
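
Dropping every incomplete row can discard more data than intended. The sketch below, on a made-up DataFrame, compares a plain dropna() with the subset parameter, which only looks at the columns you name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one row lacks Region, another lacks Sales.
sample = pd.DataFrame({
    "Region": ["East", None, "North"],
    "Sales": [100.0, 200.0, np.nan],
})

complete_rows = sample.dropna()              # drop rows with any missing value
has_sales = sample.dropna(subset=["Sales"])  # only require Sales to be present

print(len(sample), len(complete_rows), len(has_sales))  # 3 1 2
```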


Removing Columns with Missing Values

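If a column contains too many missing values, it may be better to remove the column.

df.dropna(axis=1)

Here, axis=1 refers to columns. Note that this call drops every column that contains at least one missing value, not only the columns that are mostly empty. To keep columns that still hold enough data, pass the thresh parameter, which sets the minimum number of non-missing values a column must have to be kept.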
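
The following sketch contrasts a plain dropna(axis=1) with the thresh parameter on a made-up DataFrame. The 50% cutoff and the Discount column are arbitrary assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Region has one gap, Discount is mostly empty.
sample = pd.DataFrame({
    "Region": ["East", "West", None, "South"],
    "Sales": [100.0, 200.0, 300.0, 400.0],
    "Discount": [np.nan, np.nan, np.nan, 5.0],
})

# Drops every column that has at least one missing value.
all_complete = sample.dropna(axis=1)

# Keeps columns with at least 50% non-missing values (2 of 4 rows).
min_non_missing = int(len(sample) * 0.5)
mostly_filled = sample.dropna(axis=1, thresh=min_non_missing)

print(list(all_complete.columns))   # ['Sales']
print(list(mostly_filled.columns))  # ['Region', 'Sales']
```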


Filling Missing Values

Instead of deleting data, you can replace missing values. This preserves the dataset size.

Example: Replace missing sales values with zero.

df["Sales"] = df["Sales"].fillna(0)

fillna() returns a new Series, so the result is assigned back to the column.

Filling with Mean or Median

For numeric columns, filling with statistical values is often more meaningful.

Example: Fill missing sales with the average value.

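df["Sales"] = df["Sales"].fillna(df["Sales"].mean())

If the data contains outliers, use the median instead (df["Sales"].median()), because a few extreme values can pull the mean far away from the typical value.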
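
To illustrate why outliers matter, here is a small sketch with made-up sales figures that include one extreme value. The numbers are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales values with one outlier (5000) and one gap.
sales = pd.Series([100.0, 120.0, np.nan, 110.0, 5000.0])

print(sales.mean())    # 1332.5, pulled up by the outlier
print(sales.median())  # 115.0, close to the typical value

# mean() and median() skip missing values by default.
filled = sales.fillna(sales.median())
print(filled.tolist())  # [100.0, 120.0, 115.0, 110.0, 5000.0]
```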


Filling Missing Values in the Entire Dataset

You can fill missing values across all columns at once. The same value is applied to every column, so text columns also receive it.

df.fillna(0)
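
When one fill value does not suit every column, fillna() also accepts a dictionary that maps column names to fill values. The sketch below assumes a made-up DataFrame with a text column and a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a gap in each column.
sample = pd.DataFrame({
    "Region": ["East", None],
    "Sales": [100.0, np.nan],
})

# A different fill value per column.
filled = sample.fillna({"Region": "Unknown", "Sales": 0})

print(filled)
```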

Forward Fill and Backward Fill

For ordered or time-based data, Pandas can use nearby values.

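Forward fill copies the last valid value forward into each gap.

df.ffill()

Backward fill copies the next valid value backward into each gap.

df.bfill()

These methods replace the older fillna(method="ffill") and fillna(method="bfill") forms, which recent Pandas versions deprecate.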
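
Here is a short sketch of both directions using the ffill() and bfill() methods on a made-up daily series. The dates and values are invented. The limit parameter caps how many consecutive gaps are filled, which helps avoid carrying a stale value too far.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps.
daily = pd.Series(
    [10.0, np.nan, np.nan, 40.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5),
)

forward = daily.ffill()          # 10, 10, 10, 40, 40
backward = daily.bfill()         # 10, 40, 40, 40, NaN (nothing after the last gap)
limited = daily.ffill(limit=1)   # 10, 10, NaN, 40, 40

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```

Note that a gap at the very start (for forward fill) or the very end (for backward fill) stays missing, because there is no neighboring value to copy.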

Choosing the Right Strategy

There is no single correct method. Choose based on what the data means and how the column will be used.

  • Drop rows if missing values are rare
  • Fill values if data loss is risky
  • Use mean or median for numeric columns
  • Use forward/backward fill for ordered data

Practice Exercise

Using the dataset:

  • Identify columns with missing values
  • Remove rows with missing values
  • Fill missing sales values using the mean
  • Fill missing region values with "Unknown"

What’s Next?

Now that missing values are handled, the next lesson focuses on data cleaning techniques to make datasets consistent and reliable.