Pandas Lesson 15 – Strings | Dataplexa

String Operations in Pandas

In real datasets, text data is everywhere: customer names, product names, categories, cities, and more.

Pandas provides vectorized string methods that clean, format, search, and transform an entire column of text at once, without writing explicit loops.


Loading the Dataset

We continue with the same dataset used throughout this Pandas course.

import pandas as pd

df = pd.read_csv("dataplexa_pandas_sales.csv")

Why String Operations Matter

String operations are essential when:

  • Cleaning inconsistent text data
  • Standardizing names and categories
  • Extracting information from text fields
  • Filtering rows based on text patterns

Accessing String Methods with .str

Pandas string methods are accessed using the .str accessor.

Example: converting all product names to lowercase.

df["product_name"] = df["product_name"].str.lower()

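Lowercasing every value means that names differing only in capitalization, such as "Laptop" and "laptop", are treated as the same value when you compare, group, or count them.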
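A minimal sketch of how the .str accessor behaves, using a few hypothetical values in place of the real product_name column: the method runs on every element, missing entries stay missing, and the accessor refuses to work on a non-text column such as the hypothetical prices Series below.

```python
import pandas as pd

# Hypothetical sample values standing in for the product_name column
names = pd.Series(["Laptop PRO", "Wireless Mouse", None])

# .str applies lower() to every element; the missing entry stays missing
lowered = names.str.lower()
print(lowered.tolist())

# .str only works on text: a numeric column raises AttributeError
prices = pd.Series([499.0, 25.5])
try:
    prices.str.lower()
    str_on_numbers_failed = False
except AttributeError:
    str_on_numbers_failed = True
print("AttributeError raised:", str_on_numbers_failed)
```

If a column holds numbers that should be treated as text, convert it first with astype(str).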


Changing Text Case

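Common text transformations include:

  • lower() – convert to lowercase
  • upper() – convert to uppercase
  • title() – capitalize the first letter of each word

df["product_name"].str.upper()
df["product_name"].str.title()

These calls return a new Series and leave the column unchanged; assign the result back to a column to keep it.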
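The short sketch below, built on two hypothetical product names, shows that the case methods return new values and that title() lowercases the rest of each word, which changes acronyms such as "USB".

```python
import pandas as pd

# Hypothetical product names for illustration
names = pd.Series(["wireless mouse", "USB-C cable"])

upper = names.str.upper()    # 'WIRELESS MOUSE', 'USB-C CABLE'
titled = names.str.title()   # 'Wireless Mouse', 'Usb-C Cable'

# The original Series is still lowercase: the methods returned new Series
print(names[0])

# Assign the result back to keep the change
names = titled
print(names.tolist())
```

If your data contains acronyms, check the output of title() before overwriting the original column.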

Removing Extra Spaces

Text data often contains unwanted leading or trailing whitespace. strip() removes it from both ends, lstrip() from the start only, and rstrip() from the end only.

df["product_name"] = df["product_name"].str.strip()

This removes leading and trailing whitespace (spaces, tabs, and newlines) from every product name. Spaces inside the text are left untouched.
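A small sketch comparing the three strip methods on one hypothetical padded value, plus a regex replace that collapses repeated spaces inside the text, which the strip methods do not touch.

```python
import pandas as pd

# A hypothetical value with padding on both sides (two spaces, then a tab)
s = pd.Series(["  laptop stand \t"])

both = s.str.strip()    # 'laptop stand'
left = s.str.lstrip()   # 'laptop stand \t'
right = s.str.rstrip()  # '  laptop stand'

# strip() only trims the ends; collapse runs of inner whitespace with a regex
inner = pd.Series(["laptop   stand"]).str.replace(r"\s+", " ", regex=True)

print(both[0], "|", inner[0])
```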


Replacing Text Values

Use str.replace() to modify specific parts of text.

Example: replace hyphens with spaces.

df["product_name"] = df["product_name"].str.replace("-", " ", regex=False)  # regex=False: treat "-" as plain text
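The default of the regex parameter has changed across pandas versions (it is False since pandas 2.0), so passing it explicitly avoids surprises. This sketch, on hypothetical codes, contrasts a literal replacement with a regex character class that replaces either "-" or "_".

```python
import pandas as pd

# Hypothetical codes mixing hyphens and underscores
codes = pd.Series(["usb-c_cable", "hdmi-cable"])

# regex=False: the pattern is plain text, so characters like "." or "+" need no escaping
literal = codes.str.replace("-", " ", regex=False)

# regex=True: a character class replaces either "-" or "_"
either = codes.str.replace(r"[-_]", " ", regex=True)

print(literal.tolist())
print(either.tolist())
```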

Checking if Text Contains a Pattern

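The contains() method returns a boolean Series (True where the pattern is found), which you can use to filter rows.

Example: find products whose names contain the text "pro", ignoring case.

df[df["product_name"].str.contains("pro", case=False, na=False)]

This is commonly used in search and filtering operations. By default the pattern is treated as a regular expression and matches anywhere in the text. na=False treats missing names as non-matches; without it, missing values would make the boolean filter fail.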
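Because contains() matches substrings, "pro" also finds words like "Projector". The sketch below, on hypothetical names, compares a plain substring search with a whole-word search that uses the regex word boundary \b.

```python
import pandas as pd

# Hypothetical product names, including a missing one
names = pd.Series(["Laptop Pro", "Projector", "Mouse", None])

# Substring match: "pro" also matches the start of "Projector"
any_pro = names.str.contains("pro", case=False, na=False)

# Whole-word match: \b marks a word boundary in the regex
word_pro = names.str.contains(r"\bpro\b", case=False, na=False)

df = pd.DataFrame({"product_name": names})
print(df[word_pro])
```

To search for text containing regex special characters (for example "c++"), pass regex=False instead.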


Splitting Text into Columns

You can split text into multiple parts using str.split().

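Example: assuming category values use an underscore as a separator (for example, "electronics_laptops"), keep only the part before the first underscore.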

df["category_main"] = df["category"].str.split("_").str[0]
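To split into several columns at once, str.split() accepts expand=True. This sketch assumes hypothetical category codes of the form "main_sub"; category_sub is an illustrative column name, not part of the dataset.

```python
import pandas as pd

# Hypothetical category codes; "toys" has no underscore
df = pd.DataFrame({"category": ["electronics_laptops", "home_kitchen", "toys"]})

# expand=True returns a DataFrame with one column per part;
# n=1 splits only at the first underscore
parts = df["category"].str.split("_", n=1, expand=True)

df["category_main"] = parts[0]
df["category_sub"] = parts[1]   # missing where there was no underscore

print(df)
```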

Extracting Substrings

Use str.slice() to take characters by position, or str.extract() to pull out the parts of the text that match a regular expression.

df["product_code"] = df["product_id"].str.slice(0, 5)

This keeps the first five characters (positions 0 to 4) of each product ID.
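A sketch comparing slicing with extraction, assuming hypothetical IDs like "ELEC-00123" stored as text (if product_id is numeric, convert it with astype(str) first, since .str only works on strings). The group names prefix and number are illustrative.

```python
import pandas as pd

# Hypothetical product IDs stored as text
ids = pd.Series(["ELEC-00123", "HOME-00456"])

# slice(0, 5) and the shorthand str[:5] give the same result
first5 = ids.str.slice(0, 5)
same = ids.str[:5]

# extract() returns one column per capture group; named groups become column names
parts = ids.str.extract(r"(?P<prefix>[A-Z]+)-(?P<number>\d+)")

print(first5.tolist())
print(parts)
```

Slicing is simple but blind to structure (here it keeps the hyphen), while extract() picks out exactly the pieces the pattern describes.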

Handling Missing Text Values

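String methods skip missing values (NaN): the result for those rows stays missing instead of raising an error. Filling them with a placeholder first keeps later steps, such as filtering and concatenation, predictable.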

df["product_name"] = df["product_name"].fillna("Unknown")
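A minimal sketch of both behaviours on two hypothetical names: without filling, the missing entry passes through strip() unchanged; after fillna(), every row holds a real string.

```python
import pandas as pd

# Hypothetical names, one missing
names = pd.Series(["  laptop ", None])

# The missing value stays missing; no error is raised
stripped = names.str.strip()
print(stripped.isna().tolist())

# Filling first gives every row a string to work with
filled = names.fillna("Unknown").str.strip()
print(filled.tolist())
```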

Combining String Columns

You can concatenate text columns with the + operator. If either value in a row is missing, the result for that row is missing too.

df["product_label"] = df["product_name"] + " (" + df["category"] + ")"
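The sketch below uses two hypothetical rows, one without a category, to show how + propagates missing values and two ways to keep every label: fillna() before concatenating, or str.cat() with its na_rep parameter.

```python
import pandas as pd

# Hypothetical rows; the second has no category
df = pd.DataFrame({
    "product_name": ["Laptop", "Mouse"],
    "category": ["electronics", None],
})

# With +, a missing value on either side makes that row's result missing
plain = df["product_name"] + " (" + df["category"] + ")"

# Filling the gap first keeps every label
filled = df["product_name"] + " (" + df["category"].fillna("Unknown") + ")"

# str.cat() joins with a separator; na_rep replaces missing values
joined = df["product_name"].str.cat(df["category"], sep=" | ", na_rep="Unknown")

print(plain.tolist())
print(filled.tolist())
print(joined.tolist())
```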

Practice Exercise

Using the dataset:

  • Convert all product names to title case
  • Remove extra spaces
  • Search for a keyword in product names
  • Create a combined label column

What’s Next?

In the next lesson, you will learn how to work with date and time data in Pandas, including parsing dates and time-based operations.