Pandas Lesson 7 – Missing Values | Dataplexa

Handling Missing Values in Pandas

In real-world data, missing values are unavoidable. They appear due to incomplete data collection, system errors, or manual entry mistakes.

If missing values are ignored, they can lead to wrong calculations, incorrect averages, and misleading analysis.

What Are Missing Values?

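Pandas represents missing data using several markers, most commonly NaN (Not a Number, for numeric data), None (Python's null object), and NaT (Not a Time, for datetime data).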

Internally, Pandas treats all of these as missing values.
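As a small illustration, the sketch below builds two Series that contain NaN, None, and NaT, then checks that Pandas flags each of them as missing. The sample values are hypothetical and do not come from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the lesson's dataset.
values = pd.Series([10.0, np.nan, None, 25.0])
dates = pd.Series(pd.to_datetime(["2024-01-01", None]))

# isna() flags NaN and None alike; NaT is the datetime equivalent.
value_flags = values.isna().tolist()
date_flags = dates.isna().tolist()

print(value_flags)
print(date_flags)
```

In a numeric Series, None is stored as NaN, which is why both entries are reported the same way.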

We continue using the same dataset introduced earlier. Make sure the file is downloaded from the datasets page.

import pandas as pd

df = pd.read_csv("dataplexa_pandas_sales.csv")

Before fixing missing data, you must identify it.

Use isnull() to check missing values.

df.isnull()

This returns a DataFrame of the same shape as the original, with True wherever a value is missing and False everywhere else.

To see how many missing values exist in each column:

df.isnull().sum()

This step helps decide which columns need cleaning.
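To make the counting step concrete, here is a minimal sketch on a small in-memory DataFrame. The frame and its column names (Product, Sales) are assumptions standing in for the lesson's CSV file. It also shows the share of missing values per column, which can be easier to judge than raw counts.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's CSV; column names are assumptions.
df = pd.DataFrame({
    "Product": ["A", "B", None, "D"],
    "Sales": [100.0, np.nan, 250.0, np.nan],
})

# Number of missing cells in each column.
missing_counts = df.isnull().sum()

# The mean of a True/False column is the fraction of True values,
# so this gives the percentage of rows missing in each column.
missing_share = df.isnull().mean() * 100

print(missing_counts)
print(missing_share)
```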

If missing values are few, removing rows may be acceptable.

df.dropna()

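This removes every row that contains at least one missing value. dropna() returns a new DataFrame and leaves df unchanged, so assign the result to a variable if you want to keep it.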
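The following sketch shows row removal on a small hypothetical frame (the column names are assumptions). It also uses the subset parameter of dropna(), which limits the check to chosen columns so that fewer rows are lost.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's dataset.
df = pd.DataFrame({
    "Product": ["A", "B", None, "D"],
    "Sales": [100.0, np.nan, 250.0, 80.0],
})

# Drop rows with a missing value in any column (rows 1 and 2 go).
cleaned = df.dropna()

# Drop rows only when Sales is missing (only row 1 goes).
sales_known = df.dropna(subset=["Sales"])

print(len(df), len(cleaned), len(sales_known))
```

The original df still has all four rows afterwards, because neither call modifies it.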

If an entire column contains too many missing values, it may be better to remove the column.

df.dropna(axis=1)

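Here, axis=1 refers to columns. Note that this call drops every column containing at least one missing value, not only the columns that are mostly empty. To keep columns that have enough data, pass the thresh parameter, which sets the minimum number of non-missing values a column needs to survive.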
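A minimal sketch of the difference between dropping any column with a gap and dropping only mostly-empty columns, using the thresh parameter of dropna(). The frame, the column names, and the 50% cutoff are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's dataset.
df = pd.DataFrame({
    "Sales": [100.0, np.nan, 250.0, 80.0],       # 1 of 4 missing
    "Discount": [np.nan, np.nan, np.nan, 5.0],   # 3 of 4 missing
    "Region": ["N", "S", "E", "W"],              # complete
})

# Drops every column that has at least one missing value.
any_missing_dropped = df.dropna(axis=1)

# Keeps columns with at least half of their values present
# (the 50% cutoff is an arbitrary choice for this example).
min_non_missing = int(len(df) * 0.5)
mostly_complete = df.dropna(axis=1, thresh=min_non_missing)

print(any_missing_dropped.columns.tolist())
print(mostly_complete.columns.tolist())
```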

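Instead of deleting data, you can replace missing values. This preserves the dataset size. fillna() returns a new object rather than changing the original, so assign the result back (for example, to the same column) to keep the change.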

Example: Replace missing sales values with zero.

df["Sales"].fillna(0)

For numeric columns, filling with statistical values is often more meaningful.

Example: Fill missing sales with the average value.

df["Sales"].fillna(df["Sales"].mean())

The median is often a better choice if the data contains outliers, because a few extreme values can pull the mean far from a typical value.
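The effect of an outlier on the fill value can be seen in a short sketch. The sales figures below are hypothetical and include one deliberately extreme value.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures with one extreme value (5000).
sales = pd.Series([100.0, 110.0, np.nan, 90.0, 5000.0])

# mean() and median() skip missing values by default.
filled_mean = sales.fillna(sales.mean())
filled_median = sales.fillna(sales.median())

print(filled_mean[2])    # fill value driven by the outlier
print(filled_median[2])  # fill value close to the typical sales
```

Here the mean-based fill is 1325.0 while the median-based fill is 105.0, which is much closer to the other ordinary values.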

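You can fill missing values across all columns at once. Keep in mind that the same value is written into every column, including text and date columns, so this suits datasets where one fill value makes sense everywhere.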

df.fillna(0)
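When one value does not fit every column, fillna() also accepts a dictionary that maps column names to fill values. The sketch below assumes a hypothetical frame with a text column and a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's dataset.
df = pd.DataFrame({
    "Product": ["A", None, "C"],
    "Sales": [100.0, np.nan, 250.0],
})

# A different fill value per column: a label for text, zero for numbers.
filled = df.fillna({"Product": "Unknown", "Sales": 0})

print(filled)
```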

For ordered or time-based data, Pandas can use nearby values.

Forward fill uses the previous value.

df.ffill()

Backward fill uses the next available value.

df.bfill()
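The sketch below applies both directions to a small, hypothetical daily series using the ffill() and bfill() methods. It also shows an edge case: backward fill cannot fill a gap at the end of the data, because no later value exists (and forward fill has the same limit at the start).

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings indexed by date.
daily = pd.Series(
    [10.0, np.nan, np.nan, 40.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Carry the last known value forward.
forward = daily.ffill()

# Pull the next known value backward; the final gap stays missing.
backward = daily.bfill()

print(forward.tolist())
print(backward.tolist())
```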

There is no single correct method. Choose based on what a missing value actually means in your data and how the column will be used later.

Using the dataset:

Now that missing values are handled, the next lesson focuses on data cleaning techniques to make datasets consistent and reliable.