Pandas Lesson 6 – Filtering Data | Dataplexa

Filtering Data in Pandas

In the previous lesson, you learned how to select rows and columns. Selection lets you choose data by position or label.

Filtering goes one step further. It allows you to extract rows based on conditions, just like applying rules to your dataset.


Why Filtering Is Important

In real-world datasets, you often need to:

  • Find records that meet specific criteria
  • Remove irrelevant or unwanted rows
  • Analyze subsets of data
  • Prepare clean data for reporting and visualization

Filtering helps you focus only on meaningful data.


Loading the Dataset

We continue using the same dataset used in earlier lessons.

import pandas as pd

df = pd.read_csv("dataplexa_pandas_sales.csv")

Filtering Rows Using Conditions

The most common way to filter data is by using conditions inside square brackets.

Example: Filter rows where sales are greater than 500.

df[df["Sales"] > 500]

This returns only the rows where the Sales value is greater than 500.


Filtering with Multiple Conditions

You can combine conditions using logical operators:

  • & for AND
  • | for OR

Example: Sales greater than 500 AND quantity greater than 5.

df[(df["Sales"] > 500) & (df["Quantity"] > 5)]

Always wrap each condition in parentheses when combining them.


Filtering Using OR Condition

To filter rows that match at least one condition, use OR.

df[(df["Region"] == "East") | (df["Region"] == "West")]

This returns rows from either the East or West region.


Filtering Using isin()

When checking against multiple values, isin() makes filtering easier.

df[df["Category"].isin(["Electronics", "Furniture"])]

This selects rows where the category matches any value in the list.


Filtering Using String Conditions

For text columns, Pandas provides string methods.

Example: Filter products that contain the word "Laptop".

df[df["Product"].str.contains("Laptop")]

This is useful for searching patterns in text data.


Filtering with Not Equal Conditions

You can exclude values using the != operator.

df[df["Region"] != "South"]

This removes all rows where the region is South.


Chaining Filters with Selection

You can combine filtering and column selection together.

df[df["Sales"] > 500][["Product", "Sales", "Region"]]

This first filters rows and then selects specific columns.


Practice Exercise

Exercise

Using the dataset:

  • Filter rows where quantity is greater than 3
  • Filter rows where sales are less than 300
  • Select only product and sales columns for filtered rows
  • Filter rows where region is not "North"

What’s Next?

Now that you can filter data, the next lesson will focus on handling missing values, an essential step before any serious analysis.