NumPy Lesson 21 – Handling Missing Values | Dataplexa

Handling Missing Values in NumPy

In real-world datasets, missing values are very common. They can appear due to data collection errors, incomplete records, or system issues.

NumPy provides efficient tools to detect, analyze, and handle missing values properly.

What Are Missing Values?

Missing values represent unavailable or undefined data. In NumPy, missing values are usually represented as np.nan (Not a Number), a special floating-point value.

Handling missing values correctly is critical for accurate analysis.
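
Two properties of np.nan are easy to trip over, and the short sketch below illustrates both. The variable names are illustrative only and are not used elsewhere in this lesson.

```python
import numpy as np

# NaN is a floating-point value, so an array that contains it
# is stored as float even if the other entries are integers.
values = np.array([1, 2, np.nan])
print(values.dtype)          # float64

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to find missing entries.
print(np.nan == np.nan)      # False
equality_mask = values == np.nan
print(equality_mask.any())   # False
```

This is why a dedicated function is needed to detect missing values rather than the == operator.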

Let us create a NumPy array that contains missing values.

import numpy as np

data = np.array([10, 20, np.nan, 40, np.nan, 60])
print(data)

Output:

[10. 20. nan 40. nan 60.]
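
As a quick illustration of why these entries matter, the sketch below compares an ordinary reduction with its NaN-aware counterpart on the same array. The names plain_mean and safe_mean are chosen for this example only.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, np.nan, 60])

# Ordinary reductions propagate NaN: a single missing entry
# makes the whole result NaN.
plain_mean = np.mean(data)
print(plain_mean)            # nan

# The nan-aware variant skips missing entries:
# (10 + 20 + 40 + 60) / 4
safe_mean = np.nanmean(data)
print(safe_mean)             # 32.5
```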

Use np.isnan() to identify missing values.

missing_mask = np.isnan(data)
print(missing_mask)

Output:

[False False  True False  True False]

Each True indicates a missing value.
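
If you need the positions of the missing values rather than a mask, one possible approach is np.flatnonzero(), sketched here. The name missing_idx is illustrative.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, np.nan, 60])
missing_mask = np.isnan(data)

# Indices at which the mask is True.
missing_idx = np.flatnonzero(missing_mask)
print(missing_idx)           # [2 4]

# Quick yes/no check before running a NaN-sensitive computation.
print(missing_mask.any())    # True
```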

You can count missing values using np.sum(). Because each True in the mask counts as 1, the sum of the mask is the number of missing values.

missing_count = np.sum(np.isnan(data))
print(missing_count)

Output:

2

To remove missing values, use boolean indexing.

clean_data = data[~np.isnan(data)]
print(clean_data)

Output:

[10. 20. 40. 60.]

This method returns a new, shorter array that contains only the valid entries. The original array is left unchanged.
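
When a second array is paired with the data, the same mask should be applied to both so that they stay aligned. The sketch below assumes a hypothetical labels array with one label per measurement; it is not part of the lesson's dataset.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, np.nan, 60])
# Hypothetical labels, one per entry in `data`.
labels = np.array(["a", "b", "c", "d", "e", "f"])

# True where the measurement is present.
keep = ~np.isnan(data)

# Index both arrays with the same mask.
clean_data = data[keep]
clean_labels = labels[keep]

print(clean_data)      # [10. 20. 40. 60.]
print(clean_labels)    # ['a' 'b' 'd' 'f']
```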

Instead of removing data, you may want to replace missing values. np.nan_to_num() replaces each NaN with a value you choose, here 0.

filled_data = np.nan_to_num(data, nan=0)
print(filled_data)

Output:

[10. 20.  0. 40.  0. 60.]
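
Note that np.nan_to_num() also replaces infinities, not only NaN. The sketch below shows the default behavior and the posinf and neginf keyword arguments, which, like nan, are available in NumPy 1.17 and later. The raw array and the ±1e6 replacement values are arbitrary choices for illustration.

```python
import numpy as np

raw = np.array([1.0, np.nan, np.inf, -np.inf])

# With only `nan=` given, infinities are replaced by the largest
# and most negative finite float64 values.
default_fill = np.nan_to_num(raw, nan=0.0)
print(default_fill[2] == np.finfo(np.float64).max)   # True
print(default_fill[3] == np.finfo(np.float64).min)   # True

# Explicit replacements for all three special values.
explicit_fill = np.nan_to_num(raw, nan=0.0, posinf=1e6, neginf=-1e6)
print(explicit_fill)
```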

A common strategy is replacing missing values with the mean.

mean_value = np.nanmean(data)
filled_mean = np.where(np.isnan(data), mean_value, data)
print(filled_mean)

Output:

[10.  20.  32.5 40.  32.5 60. ]

Mean imputation leaves the array's mean unchanged (still 32.5), but it reduces the variance because every filled entry sits exactly at the mean.
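
The same pattern works with other statistics. As one possible variation, the sketch below fills with the median via np.nanmedian(), which may be preferable when the data contains outliers. The names median_value and filled_median are illustrative.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, np.nan, 60])

# Median of the valid entries [10, 20, 40, 60] is (20 + 40) / 2.
median_value = np.nanmedian(data)

# Take the median where data is NaN, the original value elsewhere.
filled_median = np.where(np.isnan(data), median_value, data)
print(filled_median)   # [10. 20. 30. 40. 30. 60.]
```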

Missing values often appear in tabular data.

matrix = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9]
])

print(matrix)

Output:

[[ 1.  2. nan]
 [ 4. nan  6.]
 [ 7.  8.  9.]]
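
Detection and removal extend to two dimensions through the axis argument. The following is a minimal sketch that counts missing values per column and keeps only complete rows; the names nan_mask, missing_per_column, and complete_rows are illustrative.

```python
import numpy as np

matrix = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9]
])

nan_mask = np.isnan(matrix)

# Number of missing values in each column (axis=0 collapses the rows).
missing_per_column = nan_mask.sum(axis=0)
print(missing_per_column)    # [0 1 1]

# Keep only the rows that contain no missing value at all.
complete_rows = matrix[~nan_mask.any(axis=1)]
print(complete_rows)         # [[7. 8. 9.]]
```

Dropping rows here discards two of the three records, which is one reason filling is often considered instead.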

Calculate column-wise means while ignoring missing values:

col_means = np.nanmean(matrix, axis=0)
print(col_means)

Output:

[4.  5.  7.5]
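
The column means can then be used to fill each missing cell with the mean of its own column. The sketch below combines np.where() with broadcasting; filled_matrix is an illustrative name.

```python
import numpy as np

matrix = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9]
])

# One mean per column, computed from the valid entries only.
col_means = np.nanmean(matrix, axis=0)

# col_means has shape (3,), so it broadcasts across the rows and
# each missing cell receives the mean of its own column.
filled_matrix = np.where(np.isnan(matrix), col_means, matrix)
print(filled_matrix)
```

Here the missing cell in the second column becomes 5.0 and the one in the third column becomes 7.5.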

In the next lesson, you will learn about performance optimization techniques in NumPy to make your code faster and more efficient.