Regular Expressions (RegEx) in Python

Regular Expressions, commonly called RegEx, are a tool for searching, matching, and manipulating text. They let you find specific patterns inside large amounts of text, which is useful in data cleaning, validation, form processing, and automation. If you have ever needed to check whether an email is valid, extract dates, or find phone numbers inside documents, RegEx is well suited to the job.

Python provides RegEx support through the built-in re module, which includes functions for searching, matching, replacing, and splitting text. RegEx syntax can look cryptic at first, but it is built from a small set of rules, and once you know them most patterns become readable.

What Is a Pattern?

A pattern is a sequence of characters that describes what you want to find inside a text. For example, a pattern can represent digits, letters, repeated characters, or even a complete format like an email or date. Instead of checking characters one by one, RegEx lets you describe the format and automatically find matches.
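As a small illustration, the sketch below describes a date written as YYYY-MM-DD with a single pattern instead of checking each character by hand. The sample sentence and the variable names are invented for this example.

```python
import re

# Hypothetical sample text, made up for illustration
text = "The invoice was issued on 2025-03-14 and paid later."

# Four digits, a hyphen, two digits, a hyphen, two digits
date_pattern = r"\d{4}-\d{2}-\d{2}"

found = re.search(date_pattern, text)
print(found.group())  # 2025-03-14
```

The pattern describes only the shape of a date, so it would also accept an impossible date such as 2025-99-99.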

Importing the re Module

To use RegEx in Python, you must import the built-in re module. This module contains the functions for searching, replacing, and splitting text using patterns. If you call a function such as re.search() without importing the module first, Python raises a NameError.

import re

The re.search() Function

The re.search() function looks for the first location in the entire text where the pattern appears. If the pattern is found, it returns a match object; otherwise, it returns None. This is commonly used when you want to verify whether a text contains something specific, such as a keyword or number.

Example: Search for a Word

import re

text = "Welcome to Dataplexa Python Course"
match = re.search("Python", text)
print(match)

Printing the match object shows the span of the match (its start and end index) and the matched text; here it prints <re.Match object; span=(21, 27), match='Python'>. If the word were not in the text, print would show None.
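In practice you usually read values from the match object rather than printing it whole. The sketch below reuses the same text and shows the group(), start(), end(), and span() methods, along with a check for the None case. The variable name missing is chosen only for this example.

```python
import re

text = "Welcome to Dataplexa Python Course"
match = re.search("Python", text)

if match:                   # a match object is truthy, None is falsy
    print(match.group())    # the matched text: Python
    print(match.start())    # index where the match begins: 21
    print(match.end())      # index just past the match: 27
    print(match.span())     # (21, 27)
else:
    print("No match found")

# A pattern that does not occur returns None
missing = re.search("Java", text)
print(missing)              # None
```

Checking the result with if before calling group() avoids an AttributeError when nothing was found.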

The re.findall() Function

The re.findall() function returns a list of every match found in the text. This is useful when a pattern repeats multiple times, such as finding all numbers, all words, or all email IDs in a document. Instead of manually looping through the text, RegEx finds everything at once.

Example: Find All Numbers in Text

import re

text = "Order numbers: 120, 450, 882"
numbers = re.findall(r"\d+", text)
print(numbers)

Here \d+ means “one or more digits,” so the result is ['120', '450', '882']. Each item is a string, not an integer. This is useful when cleaning data that contains numeric IDs, phone numbers, or codes.
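Because re.findall() returns strings, a common follow-up step is converting the matches to numbers. A minimal sketch, with order_ids as an illustrative name:

```python
import re

text = "Order numbers: 120, 450, 882"
numbers = re.findall(r"\d+", text)   # a list of strings

# Convert each matched string to an integer
order_ids = [int(n) for n in numbers]
print(order_ids)        # [120, 450, 882]
print(sum(order_ids))   # 1452
```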

The re.sub() Function

The re.sub() function replaces all occurrences of a pattern with a new value. It is widely used in data cleaning and automation pipelines to remove unwanted symbols, mask sensitive data, or convert text from one format to another.

Example: Remove All Digits From Text

import re

text = "User123 joined in 2025"
cleaned = re.sub(r"\d", "", text)
print(cleaned)

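The pattern \d represents any single digit, and replacing each one with an empty string removes every digit from the text. The result is "User joined in " (note the trailing space left behind where 2025 used to be).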
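The same function can mask data instead of deleting it. The sketch below replaces every digit with an asterisk and then uses the optional count argument of re.sub() to limit how many replacements are made. The card number is a made-up placeholder.

```python
import re

# Hypothetical text containing a made-up card number
text = "Card 1234-5678-9012-3456 was charged"

# Replace every digit with '*'
masked = re.sub(r"\d", "*", text)
print(masked)       # Card ****-****-****-**** was charged

# count=1 replaces only the first match
first_only = re.sub(r"\d+", "X", "a1 b22 c333", count=1)
print(first_only)   # aX b22 c333
```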

Basic RegEx Symbols You Must Know

RegEx uses special characters to describe patterns. These symbols allow you to match digits, words, spaces, boundaries, and repeated characters efficiently. Learning these core symbols will make all RegEx operations easier to understand.

\d → Matches any digit (0–9)
\w → Matches a word character: a letter, digit, or underscore
\s → Matches any whitespace character (space, tab, newline)
+ → One or more repetitions
* → Zero or more repetitions
{n} → Exactly n repetitions
{n,m} → Between n and m repetitions
^ → Start of string
$ → End of string

These are the building blocks of RegEx. By combining them, you can match complex patterns like phone numbers, passwords, filenames, and much more.
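To see how the symbols combine, here is a sketch that checks a phone number against one assumed format: three digits, a hyphen, three digits, a hyphen, and four digits. The format and the sample values are illustrative, not a general phone number rule.

```python
import re

# Assumed format for this example: 555-123-4567
# ^ and $ tie the pattern to the start and end of the string,
# \d{3} and \d{4} require exactly three and four digits
phone_pattern = r"^\d{3}-\d{3}-\d{4}$"

candidates = ["555-123-4567", "5551234567", "555-123-45678"]

# Map each candidate to True or False depending on whether it fits
results = {c: bool(re.search(phone_pattern, c)) for c in candidates}

for candidate, is_valid in results.items():
    print(candidate, is_valid)
```

Only the first candidate fits: the second has no hyphens, and the third has one digit too many before the end of the string.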

Example: Validate an Email Format

Email validation is one of the most common uses of RegEx. It helps check whether a string follows standard email rules (characters, @ symbol, domain name). Even though email patterns can be very complex, here is a beginner-friendly version.

import re

email = "alex@example.com"
pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"

if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")

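This pattern checks the essential structure of an email: one or more allowed characters, an @ symbol, a domain, a dot, and a final part such as com. re.match() only looks for a match at the start of the string, and the ^ and $ anchors make the pattern cover the whole string. It is a loose check that accepts some invalid addresses (for example, ones with consecutive dots), so it suits basic applications rather than strict validation.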
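One detail worth knowing is that $ also matches just before a newline at the very end of the string, so re.match() with this pattern accepts an address followed by "\n". A possible alternative is re.fullmatch(), which requires the entire string to fit the pattern. The helper name looks_like_email below is hypothetical.

```python
import re

anchored = r"^[\w\.-]+@[\w\.-]+\.\w+$"   # the pattern from the example above
pattern = r"[\w.-]+@[\w.-]+\.\w+"        # same structure, without ^ and $

def looks_like_email(value):
    """Return True if the whole string fits the simple pattern."""
    return re.fullmatch(pattern, value) is not None

print(looks_like_email("alex@example.com"))     # True
print(looks_like_email("alex@example"))         # False, no dot after the domain
print(looks_like_email("alex@example.com\n"))   # False, trailing newline rejected

# For comparison, re.match with $ accepts the trailing newline
print(bool(re.match(anchored, "alex@example.com\n")))   # True
```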

Example: Extract All Words Starting With Capital Letters

This is commonly required when processing names, titles, or locations from text data. Capital letters often represent proper nouns, and RegEx helps extract them easily.

import re

text = "Alex visited New York and met Emma"
result = re.findall(r"[A-Z][a-z]+", text)
print(result)

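This returns ['Alex', 'New', 'York', 'Emma']: each match is an uppercase letter followed by one or more lowercase letters. Because the pattern has no word boundary, it can also match inside a word, and a single capital letter on its own (such as "I") is not matched.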
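The sketch below shows the effect of adding \b, which matches at a word boundary. The second sentence is invented to include a word with a capital letter in the middle.

```python
import re

# Hypothetical sentence containing a capital letter inside a word
text = "Alex bought an iPhone in New York"

# Without \b, the pattern also picks up "Phone" from inside "iPhone"
plain = re.findall(r"[A-Z][a-z]+", text)
print(plain)     # ['Alex', 'Phone', 'New', 'York']

# With \b on both sides, only whole capitalized words are returned
bounded = re.findall(r"\b[A-Z][a-z]+\b", text)
print(bounded)   # ['Alex', 'New', 'York']
```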

Real-World Uses of Regular Expressions

Removing unwanted characters from data
Validating inputs such as emails, passwords, and phone numbers
Extracting dates, names, and numbers from documents
Log file analysis and pattern detection
Text cleaning before training machine learning models
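As one illustration of log file analysis, the sketch below pulls the date and message out of error lines. The log format and the lines themselves are invented. The pattern also uses two symbols not covered above: parentheses, which capture part of a match so it can be read with group(1) and group(2), and a dot, which matches any character except a newline.

```python
import re

# Hypothetical log lines, invented for illustration
log_lines = [
    "2025-01-10 ERROR Disk full on /dev/sda1",
    "2025-01-10 INFO Backup completed",
    "2025-01-11 ERROR Connection timed out",
]

# Group 1 captures the date, group 2 captures the message of ERROR lines
error_pattern = r"^(\d{4}-\d{2}-\d{2}) ERROR (.+)$"

errors = []
for line in log_lines:
    m = re.search(error_pattern, line)
    if m:
        errors.append((m.group(1), m.group(2)))

for date, message in errors:
    print(date, "->", message)
```

The INFO line is skipped because the literal word ERROR in the pattern does not match it.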

📝 Practice Exercises

Exercise 1

Extract all numbers from the text: "Room 402 will be cleaned at 9 PM"

Exercise 2

Find all words that start with the letter “D” in the text "Dataplexa develops digital learning tools"

Exercise 3

Replace all spaces in the sentence "Python makes coding fun" with hyphens (-)

Exercise 4

Validate whether "emma.smith@company.org" follows email format

✅ Practice Answers

Answer 1

import re
text = "Room 402 will be cleaned at 9 PM"
digits = re.findall(r"\d+", text)
print(digits)  # ['402', '9']

Answer 2

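import re
text = "Dataplexa develops digital learning tools"
# \b marks a word boundary; re.IGNORECASE lets D match both "D" and "d"
# Without the flag, only 'Dataplexa' would match
words = re.findall(r"\bD\w*", text, flags=re.IGNORECASE)
print(words)  # ['Dataplexa', 'develops', 'digital']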

Answer 3

import re
sentence = "Python makes coding fun"
result = re.sub(r"\s", "-", sentence)
print(result)  # Python-makes-coding-fun

Answer 4

import re
email = "emma.smith@company.org"
pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"
print("Valid" if re.match(pattern, email) else "Invalid")
