String Operations in Pandas
In real datasets, text data is everywhere: customer names, product names, categories, cities, and more.
Pandas provides powerful string operations that allow you to clean, format, search, and transform text data efficiently.
Loading the Dataset
We continue using the same dataset used throughout this Pandas course.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Why String Operations Matter
String operations are essential when:
- Cleaning inconsistent text data
- Standardizing names and categories
- Extracting information from text fields
- Filtering rows based on text patterns
Accessing String Methods with .str
Pandas string methods are accessed using the .str accessor.
Example: converting all product names to lowercase.
df["product_name"] = df["product_name"].str.lower()
This gives every product name the same casing, so variants such as "Laptop" and "LAPTOP" are treated as the same value.
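As a minimal sketch of how the accessor behaves, the snippet below uses a small hand-made DataFrame (df_demo and its values are hypothetical stand-ins for the course dataset). It illustrates that a .str method returns a new Series and leaves the column untouched until the result is assigned back.

```python
import pandas as pd

# Hypothetical sample data standing in for the course dataset
df_demo = pd.DataFrame({"product_name": ["Laptop PRO", "Wireless Mouse"]})

# .str.lower() is applied to every element and returns a new Series
lowered = df_demo["product_name"].str.lower()

# The column itself has not changed yet
before_assignment = df_demo["product_name"].tolist()

# Assigning the result back is what updates the DataFrame
df_demo["product_name"] = lowered
after_assignment = df_demo["product_name"].tolist()

print(before_assignment)  # ['Laptop PRO', 'Wireless Mouse']
print(after_assignment)   # ['laptop pro', 'wireless mouse']
```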
Changing Text Case
Common text transformations include:
- lower(): convert to lowercase
- upper(): convert to uppercase
- title(): capitalize the first letter of each word
df["product_name"].str.upper()
df["product_name"].str.title()
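To make the difference between the case methods concrete, here is a small sketch on a hypothetical Series (the values are made up for illustration, not taken from the course dataset).

```python
import pandas as pd

# Hypothetical product names with mixed casing
names = pd.Series(["laptop PRO", "wireless mouse"])

upper = names.str.upper()  # every letter uppercase
title = names.str.title()  # first letter of each word uppercase, the rest lowercase

print(upper.tolist())  # ['LAPTOP PRO', 'WIRELESS MOUSE']
print(title.tolist())  # ['Laptop Pro', 'Wireless Mouse']
```

Note that title() also lowercases the remaining letters of each word, so an acronym such as "PRO" becomes "Pro".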
Removing Extra Spaces
Text data often contains unwanted spaces.
Use strip() to trim both ends, lstrip() to trim only the left side, and rstrip() to trim only the right side.
df["product_name"] = df["product_name"].str.strip()
This removes leading and trailing whitespace (spaces, tabs, and newlines). Whitespace inside the text is left untouched.
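The following sketch, on hypothetical messy values, shows what each trimming method removes. It also shows one possible way to collapse repeated spaces inside the text using a regular expression, which strip() alone does not do.

```python
import pandas as pd

# Hypothetical values with stray whitespace at the ends and in the middle
messy = pd.Series(["  Wireless   Mouse ", "\tUSB Cable\n"])

trimmed = messy.str.strip()        # both ends
left_only = messy.str.lstrip()     # leading whitespace only
right_only = messy.str.rstrip()    # trailing whitespace only

# Collapse any run of whitespace inside the text to a single space
collapsed = trimmed.str.replace(r"\s+", " ", regex=True)

print(trimmed.tolist())    # ['Wireless   Mouse', 'USB Cable']
print(collapsed.tolist())  # ['Wireless Mouse', 'USB Cable']
```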
Replacing Text Values
Use str.replace() to modify specific parts of text.
Example: replace hyphens with spaces.
# regex=False treats "-" as literal text rather than a regular expression
df["product_name"] = df["product_name"].str.replace("-", " ", regex=False)
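str.replace() can match either literal text or a regular expression, depending on the regex argument. Its default has changed between pandas versions, so passing it explicitly avoids surprises. A short sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical product codes containing hyphens and a dot
codes = pd.Series(["usb-c-cable", "wi-fi.router"])

# Literal replacement: only the exact text "-" is replaced
literal = codes.str.replace("-", " ", regex=False)

# Regular-expression replacement: the character class matches "-" or "."
pattern = codes.str.replace(r"[-.]", " ", regex=True)

print(literal.tolist())  # ['usb c cable', 'wi fi.router']
print(pattern.tolist())  # ['usb c cable', 'wi fi router']
```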
Checking if Text Contains a Pattern
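The str.contains() method returns a Boolean Series that can be used to filter rows.
Example: find products whose name contains the text "pro", ignoring case.
# na=False treats missing names as non-matches so the mask contains only True/False
df[df["product_name"].str.contains("pro", case=False, na=False)]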
This is commonly used in searching and filtering operations.
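A few details of str.contains() are easy to miss: it matches substrings rather than whole words, it interprets the pattern as a regular expression by default, and it returns a missing value for missing input unless na is set. The sketch below illustrates these points on hypothetical names.

```python
import pandas as pd

# Hypothetical names, including a missing value and regex special characters
names = pd.Series(["Laptop Pro", "Protein Bar", None, "Mouse", "C++ Guide"])

# Substring match, case-insensitive; missing values count as False
substring_mask = names.str.contains("pro", case=False, na=False)

# Whole-word match using regex word boundaries
word_mask = names.str.contains(r"\bpro\b", case=False, na=False)

# Literal match for text that contains regex special characters
literal_mask = names.str.contains("C++", regex=False, na=False)

print(substring_mask.tolist())  # [True, True, False, False, False]
print(word_mask.tolist())       # [True, False, False, False, False]
print(literal_mask.tolist())    # [False, False, False, False, True]
```

Without na=False, the mask would contain a missing value, and using it to index a DataFrame raises an error.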
Splitting Text into Columns
You can split text into multiple parts using str.split().
Example: keep the part of each category code that comes before the first underscore.
df["category_main"] = df["category"].str.split("_").str[0]
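When more than one part is needed, passing expand=True to str.split() returns the parts as separate columns. The sketch below assumes category codes of the form main_sub; the sample values and the column name category_sub are hypothetical.

```python
import pandas as pd

# Hypothetical category codes; the last one has no underscore
cats = pd.Series(["electronics_audio", "home_kitchen", "toys"])

# First part only, as in the example above
main = cats.str.split("_").str[0]

# Split at most once and spread the parts across columns
parts = cats.str.split("_", n=1, expand=True)
parts.columns = ["category_main", "category_sub"]

print(main.tolist())  # ['electronics', 'home', 'toys']
print(parts)
```

Rows with fewer parts than columns get a missing value in the extra columns, as happens here for "toys".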
Extracting Substrings
Use str.slice() to extract text by character position, or str.extract() to extract text with a regular expression.
df["product_code"] = df["product_id"].str.slice(0, 5)
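The sketch below contrasts the two approaches. The ID format (uppercase letters, a hyphen, then digits) is an assumption made for illustration; the real product_id values in the dataset may look different.

```python
import pandas as pd

# Hypothetical IDs; the last one does not follow the assumed format
ids = pd.Series(["ELEC-00123", "HOME-00045", "bad id"])

# Position-based: always returns the first four characters
by_position = ids.str.slice(0, 4)

# Pattern-based: named groups become column names in the result
extracted = ids.str.extract(r"^(?P<prefix>[A-Z]+)-(?P<number>\d+)$")

print(by_position.tolist())  # ['ELEC', 'HOME', 'bad ']
print(extracted)
```

str.slice() returns whatever sits at the given positions, while str.extract() returns missing values for rows that do not match the pattern, which makes malformed IDs easier to spot.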
Handling Missing Text Values
String methods do not raise an error on missing values; they skip them and return a missing value in the result. It is still good practice to fill them explicitly.
df["product_name"] = df["product_name"].fillna("Unknown")
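A small sketch of this behavior on a hypothetical Series with one missing entry:

```python
import pandas as pd

# Hypothetical names with a missing value in the middle
names = pd.Series(["laptop", None, "mouse"])

# The missing value passes through the string method unchanged
upper = names.str.upper()

# Filling first gives every row a real string to work with
filled_upper = names.fillna("Unknown").str.upper()

print(upper.isna().tolist())   # [False, True, False]
print(filled_upper.tolist())   # ['LAPTOP', 'UNKNOWN', 'MOUSE']
```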
Combining String Columns
You can combine text columns with the + operator. If either value in a row is missing, the result for that row is also missing.
df["product_label"] = df["product_name"] + " (" + df["category"] + ")"
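Concatenating with + produces a missing label whenever one of the inputs is missing. The sketch below shows two possible ways around that, using a hypothetical two-row DataFrame: filling the gap first, or using str.cat() with its na_rep argument. The placeholder text "Uncategorized" is an arbitrary choice.

```python
import pandas as pd

# Hypothetical data with one missing category
df_demo = pd.DataFrame({
    "product_name": ["Laptop", "Mouse"],
    "category": ["Electronics", None],
})

# Plain concatenation: the second label is missing
plain = df_demo["product_name"] + " (" + df_demo["category"] + ")"

# Option 1: fill the missing category before concatenating
filled = (
    df_demo["product_name"]
    + " ("
    + df_demo["category"].fillna("Uncategorized")
    + ")"
)

# Option 2: str.cat() joins element-wise and substitutes na_rep for gaps
joined = df_demo["product_name"].str.cat(
    df_demo["category"], sep=" | ", na_rep="Uncategorized"
)

print(plain.isna().tolist())  # [False, True]
print(filled.tolist())        # ['Laptop (Electronics)', 'Mouse (Uncategorized)']
print(joined.tolist())        # ['Laptop | Electronics', 'Mouse | Uncategorized']
```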
Practice Exercise
Using the dataset:
- Convert all product names to title case
- Remove extra spaces
- Search for a keyword in product names
- Create a combined label column
What’s Next?
In the next lesson, you will learn how to work with date and time data in Pandas, including parsing dates and time-based operations.