Filtering Data in Pandas
In the previous lesson, you learned how to select rows and columns. Selection lets you choose data by position or label.
Filtering goes one step further. It allows you to extract rows based on conditions, just like applying rules to your dataset.
Why Filtering Is Important
In real-world datasets, you often need to:
- Find records that meet specific criteria
- Remove irrelevant or unwanted rows
- Analyze subsets of data
- Prepare clean data for reporting and visualization
Filtering helps you focus only on meaningful data.
Loading the Dataset
We continue using the same dataset used in earlier lessons.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Filtering Rows Using Conditions
The most common way to filter data is by using conditions inside square brackets.
Example: Filter rows where sales are greater than 500.
df[df["Sales"] > 500]
This returns only the rows where the Sales value
is greater than 500.
Filtering with Multiple Conditions
You can combine conditions using logical operators:
&for AND|for OR
Example: Sales greater than 500 AND quantity greater than 5.
df[(df["Sales"] > 500) & (df["Quantity"] > 5)]
Always wrap each condition in parentheses when combining them.
Filtering Using OR Condition
To filter rows that match at least one condition, use OR.
df[(df["Region"] == "East") | (df["Region"] == "West")]
This returns rows from either the East or West region.
Filtering Using isin()
When checking against multiple values,
isin() makes filtering easier.
df[df["Category"].isin(["Electronics", "Furniture"])]
This selects rows where the category matches any value in the list.
Filtering Using String Conditions
For text columns, Pandas provides string methods.
Example: Filter products that contain the word "Laptop".
df[df["Product"].str.contains("Laptop")]
This is useful for searching patterns in text data.
Filtering with Not Equal Conditions
You can exclude values using the != operator.
df[df["Region"] != "South"]
This removes all rows where the region is South.
Chaining Filters with Selection
You can combine filtering and column selection together.
df[df["Sales"] > 500][["Product", "Sales", "Region"]]
This first filters rows and then selects specific columns.
Practice Exercise
Exercise
Using the dataset:
- Filter rows where quantity is greater than 3
- Filter rows where sales are less than 300
- Select only product and sales columns for filtered rows
- Filter rows where region is not "North"
What’s Next?
Now that you can filter data, the next lesson will focus on handling missing values, an essential step before any serious analysis.