NumPy Lesson 6 – Boolean Indexing | Dataplexa

Boolean Indexing in NumPy

In real data analysis, you often need to filter data based on conditions: for example, selecting values greater than a threshold, removing invalid values, or extracting rows that meet specific criteria.

NumPy provides boolean indexing to perform this kind of filtering efficiently, without writing an explicit Python loop.


What Is Boolean Indexing?

Boolean indexing uses an array of boolean values (True or False), usually produced by a condition, to select elements from an array.

The condition is applied element-wise, and only the elements where it evaluates to True are returned, as a new array.
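To see that the selector really is just an array of booleans, the minimal sketch below builds a mask by hand instead of from a comparison. The names `values`, `keep` and `selected` are illustrative only; the one requirement assumed here is that the mask has the same length as the array it indexes.

```python
import numpy as np

values = np.array([5, 12, 18, 25, 30])

# A hand-written mask: one True/False per element of `values`
keep = np.array([True, False, True, False, True])

# Only the positions marked True are returned, in their original order
selected = values[keep]
print(selected)  # [ 5 18 30]
```

In recent NumPy versions, a mask whose length does not match the array raises an IndexError rather than silently selecting something.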


Basic Boolean Indexing Example

Let’s start with a simple numeric array.

import numpy as np

arr = np.array([5, 12, 18, 25, 30])
print(arr[arr > 15])

Output:

[18 25 30]

Here, the condition arr > 15 selects only the values greater than 15.


Understanding the Boolean Mask

Behind the scenes, the expression arr > 15 produces a boolean array, called a mask, with the same shape as arr.

mask = arr > 15
print(mask)

Output:

[False False  True  True  True]

NumPy then uses this mask to extract only the values where the condition is True.
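Because the mask is an ordinary array, it can be stored in a variable and reused. The sketch below also checks two properties worth knowing: `np.nonzero(mask)` reports the positions where the mask is True, and the array returned by boolean indexing is a copy, so changing it leaves the original untouched. The variable names are illustrative.

```python
import numpy as np

arr = np.array([5, 12, 18, 25, 30])
mask = arr > 15

# np.nonzero returns one index array per dimension; [0] takes the only one here
positions = np.nonzero(mask)[0]
print(positions)  # [2 3 4]

# Boolean indexing returns a new array (a copy), not a view
subset = arr[mask]
subset[0] = -1
print(subset)  # [-1 25 30]
print(arr)     # [ 5 12 18 25 30]  (unchanged)
```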


Using Multiple Conditions

You can combine multiple conditions using NumPy's element-wise logical operators (Python's and, or and not keywords do not work on whole arrays):

  • & – AND
  • | – OR
  • ~ – NOT

print(arr[(arr > 10) & (arr < 30)])

Output:

[12 18 25]

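Parentheses around each condition are mandatory: the operators &, | and ~ bind more tightly than comparisons such as > and <, so arr > 10 & arr < 30 is not parsed the way it reads and raises an error.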
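The snippet below tries the other two operators from the list, plus `np.logical_and`, which is NumPy's function form of `&` for boolean arrays. The final `try` block exists only to show the error raised when the parentheses are left out; the variable names are illustrative.

```python
import numpy as np

arr = np.array([5, 12, 18, 25, 30])

# OR: values below 10 or above 25
outer = arr[(arr < 10) | (arr > 25)]
print(outer)  # [ 5 30]

# NOT: invert a mask to keep values that are NOT greater than 15
not_big = arr[~(arr > 15)]
print(not_big)  # [ 5 12]

# Function form of &, equivalent to (arr > 10) & (arr < 30)
middle = arr[np.logical_and(arr > 10, arr < 30)]
print(middle)  # [12 18 25]

# Without parentheses, & binds first: arr > (10 & arr) < 30 becomes a
# chained comparison that asks Python for the truth value of a whole array
try:
    arr[arr > 10 & arr < 30]
except ValueError as err:
    print("ValueError:", err)
```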


Boolean Indexing with Two-Dimensional Arrays

Boolean indexing works the same way with 2D arrays.

matrix = np.array([[10, 20, 30],
                   [5, 15, 25],
                   [0, 8, 40]])

print(matrix[matrix >= 20])

Output:

[20 30 25 40]

The condition is applied to every element across all rows and columns, and the matching values are returned as a flat one-dimensional array in row order.
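Extracting whole rows that meet a criterion works slightly differently: you index with a one-dimensional mask along the first axis instead of a mask with the full shape. The sketch below reuses the same `matrix` and, as an illustrative criterion, keeps each row whose first-column value is at least 5; the result stays two-dimensional.

```python
import numpy as np

matrix = np.array([[10, 20, 30],
                   [5, 15, 25],
                   [0, 8, 40]])

# One True/False per row, based on the first column
row_mask = matrix[:, 0] >= 5
print(row_mask)  # [ True  True False]

# Indexing with a row mask keeps whole rows, so the result is still 2D
selected_rows = matrix[row_mask]
print(selected_rows)
# [[10 20 30]
#  [ 5 15 25]]
```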


Replacing Values Using Boolean Indexing

Boolean indexing is also useful for modifying data: placing a mask on the left side of an assignment changes the matching elements in place.

arr[arr < 15] = 0
print(arr)

Output:

[ 0  0 18 25 30]

This technique is commonly used in data cleaning and preprocessing.
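Two related patterns come up often in cleaning. If you want a modified copy instead of changing the array in place, `np.where(condition, x, y)` takes values from `x` where the condition is True and from `y` elsewhere. And because a missing floating-point value (`NaN`) never compares equal to anything, it is usually located with `np.isnan` rather than `==`. The sketch starts again from the original `arr` values, and `readings` is made-up illustrative data.

```python
import numpy as np

arr = np.array([5, 12, 18, 25, 30])

# Non-mutating alternative: build a new array and leave `arr` unchanged
clipped = np.where(arr < 15, 0, arr)
print(clipped)  # [ 0  0 18 25 30]
print(arr)      # [ 5 12 18 25 30]

# Replace missing values (NaN) in place using a mask from np.isnan
readings = np.array([1.5, np.nan, 3.0, np.nan])
readings[np.isnan(readings)] = 0.0
print(readings.tolist())  # [1.5, 0.0, 3.0, 0.0]
```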


Real-World Use Case Example

Imagine a dataset of exam scores in which any score below 40 counts as a fail.

scores = np.array([78, 45, 32, 90, 55, 28])

passed = scores[scores >= 40]
failed = scores[scores < 40]

print(passed)
print(failed)

Output:

[78 45 90 55]
[32 28]

This shows how Boolean indexing simplifies conditional data selection.
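Masks are also handy for summarizing, not just selecting. Since True counts as 1 and False as 0, summing or averaging a mask counts the matching elements. The sketch below reuses the `scores` array; the pass count, pass rate and average are only illustrations of what you might compute.

```python
import numpy as np

scores = np.array([78, 45, 32, 90, 55, 28])
passed_mask = scores >= 40

num_passed = np.count_nonzero(passed_mask)   # same as passed_mask.sum()
pass_rate = passed_mask.mean()               # fraction of True values
average_passed = scores[passed_mask].mean()  # mean of passing scores only

print(num_passed)           # 4
print(round(pass_rate, 2))  # 0.67
print(average_passed)       # 67.0
```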


Practice Exercise

Exercise

Create a NumPy array of 10 random integers between 1 and 100 and:

  • Select values greater than 50
  • Select values between 20 and 70
  • Replace values below 30 with 0

Expected Outcome

You should be able to filter and modify arrays using conditions confidently.
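One possible approach to the exercise is sketched below. It assumes "between 1 and 100" includes both ends, so it calls `Generator.integers(1, 101, size=10)` (the upper bound is exclusive), and it treats "between 20 and 70" as inclusive; adjust the comparisons if you read the bounds differently. The fixed seed only makes the run repeatable, and a different seed gives different numbers.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.integers(1, 101, size=10)  # 10 random integers in [1, 100]
print("data:", data)

# 1. Values greater than 50
print("> 50:", data[data > 50])

# 2. Values between 20 and 70 (inclusive); parentheses are required
print("20..70:", data[(data >= 20) & (data <= 70)])

# 3. Replace values below 30 with 0 (modifies `data` in place)
data[data < 30] = 0
print("cleaned:", data)
```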


What’s Next?

In the next lesson, you will learn about Basic Operations, including arithmetic operations and comparisons in NumPy.