Python Lesson 23 – Advanced Sets | Dataplexa

Advanced Sets

You were introduced to sets in Lesson 13 — unordered collections of unique items. Now it is time to go deeper. Sets in Python are not just a way to remove duplicates. They come with a full suite of mathematical operations — unions, intersections, differences, and more — that let you compare and combine collections with remarkable speed and clarity.

This lesson covers every set operation you will encounter professionally, explains the performance advantages that make sets the right tool for membership testing, and shows you set comprehensions as a natural extension of what you already know.

Quick Recap — What Makes a Set Unique

Before diving into operations, a quick grounding in what sets are and are not:

  • Sets store only unique values — duplicates are silently discarded
  • Sets are unordered — you cannot index or slice them
  • Set elements must be hashable: strings, numbers, and tuples of hashable values work; lists, dicts, and other sets do not
  • Sets are mutable — you can add and remove items after creation
  • A frozenset is the immutable version — hashable itself, usable as a dict key
# Creating sets and observing uniqueness

tags = {"python", "data", "python", "code", "data"}
print(tags)           # duplicates removed automatically; display order may vary

nums = set([1, 2, 2, 3, 3, 3])   # set from a list
print(nums)

empty = set()         # correct way — {} creates an empty dict, not a set
print(type(empty))
{'python', 'data', 'code'}
{1, 2, 3}
<class 'set'>
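
The hashability rule from the recap can be checked directly. The sketch below is only an illustration: is_hashable is a hypothetical helper name, not something the lesson or the standard library defines. It relies on the built-in hash(), which raises TypeError for unhashable values.

```python
def is_hashable(value):
    """Return True if value could be stored in a set (hypothetical helper)."""
    try:
        hash(value)          # hash() raises TypeError for unhashable objects
    except TypeError:
        return False
    return True

print(is_hashable("python"))          # True:  strings are hashable
print(is_hashable((1, 2)))            # True:  a tuple of numbers is hashable
print(is_hashable([1, 2]))            # False: lists are mutable
print(is_hashable((1, [2, 3])))       # False: a tuple holding a list is not hashable
print(is_hashable(frozenset({1})))    # True:  frozensets are hashable
```

Trying to build a set such as {[1, 2]} fails with the same TypeError, which is why lists usually need to be converted to tuples before being stored in a set.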

Adding and Removing Elements

Sets are mutable, so you can grow and shrink them after creation. Python provides several methods depending on whether you want to raise an error or fail silently when the item is not present.

# Mutating a set — add, remove, discard, pop, clear

fruits = {"apple", "banana", "cherry"}

fruits.add("mango")        # add one item (ignored if already present)
print(fruits)

fruits.remove("banana")    # remove — raises KeyError if not found
print(fruits)

fruits.discard("grape")    # discard — silent if not found (no error)
print(fruits)

item = fruits.pop()        # remove and return an arbitrary item
print("Popped:", item)

fruits.clear()             # remove all items
print(fruits)
{'apple', 'banana', 'cherry', 'mango'}
{'apple', 'cherry', 'mango'}
{'apple', 'cherry', 'mango'}
Popped: apple
set()
  • Use add() for a single item, update() to add multiple items from any iterable
  • Prefer discard() over remove() when you are not certain the item exists
  • pop() removes an arbitrary element — sets have no guaranteed order

1. Union — Combining Two Sets

A union produces a new set containing every element that appears in either set, with no duplicates. This answers the question: "give me everything from both."

Real-world use: merging the email subscriber lists from two different campaigns into one master list with no repeated addresses.

# Union — all elements from both sets

a = {"python", "sql", "excel"}
b = {"python", "tableau", "power bi"}

# Method syntax
combined = a.union(b)
print(combined)

# Operator syntax — identical result
combined2 = a | b
print(combined2)
{'python', 'sql', 'excel', 'tableau', 'power bi'}
{'python', 'sql', 'excel', 'tableau', 'power bi'}
  • a.union(b) and a | b produce the same result — choose the style that reads more clearly
  • Neither original set is modified — a new set is always returned
  • You can chain multiple sets: a | b | c or a.union(b, c)
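
One practical difference between the two styles is worth a quick sketch: the union() method accepts any iterable, while the | operator expects sets on both sides. The variable names below are illustrative.

```python
a = {"python", "sql"}
b = {"python", "tableau"}
c = ["excel", "sql"]                   # a list, not a set

# The method form accepts any iterable, and several at once
all_three = a.union(b, c)
print(all_three == {"python", "sql", "tableau", "excel"})   # True

# The operator form requires both operands to be sets
try:
    a | c
except TypeError:
    print("| does not accept a list; use a.union(c) or a | set(c)")

# In-place union: |= modifies a, like a.update(b)
a |= b
print(a == {"python", "sql", "tableau"})                    # True
```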

2. Intersection — What Two Sets Share

An intersection produces only the elements that exist in both sets. This answers: "what do these two groups have in common?"

Real-world use: finding which users have both a free account and a paid subscription — the overlap between two user ID sets.

# Intersection — only elements present in both sets

skills_needed  = {"python", "sql", "machine learning", "excel"}
skills_you_have = {"python", "excel", "power bi", "tableau"}

# What skills do you already have that the job needs?
match = skills_needed.intersection(skills_you_have)
print(match)

# Operator syntax
match2 = skills_needed & skills_you_have
print(match2)
{'python', 'excel'}
{'python', 'excel'}
  • a.intersection(b) and a & b are equivalent
  • If the sets share nothing, the result is an empty set set()
  • a.intersection_update(b) modifies a in place instead of returning a new set
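
The last two points can be sketched with the user-ID scenario mentioned earlier. The ID values here are made up for illustration.

```python
free_users  = {101, 102, 103, 104}
paid_users  = {103, 104, 105}
trial_users = {900, 901}

# No shared elements: the result is an empty set
print(free_users & trial_users)        # set()

# intersection_update() changes the set it is called on,
# so work on a copy if the original is still needed
both = free_users.copy()
both.intersection_update(paid_users)
print(both == {103, 104})              # True
print(len(free_users))                 # 4 (original untouched)
```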

3. Difference — What One Set Has That the Other Does Not

A difference returns elements in the first set that are not present in the second. Order matters here — a - b is not the same as b - a.

Real-world use: finding which required skills you are still missing — the skills the job needs that you do not yet have.

# Difference — elements in a but not in b

skills_needed   = {"python", "sql", "machine learning", "excel"}
skills_you_have = {"python", "excel", "power bi", "tableau"}

# What skills do you still need to learn?
gaps = skills_needed.difference(skills_you_have)
print("Skills to learn:", gaps)

# Operator syntax
gaps2 = skills_needed - skills_you_have
print(gaps2)

# Reversed — what skills you have that the job doesn't require
extras = skills_you_have - skills_needed
print("Extra skills:", extras)
Skills to learn: {'sql', 'machine learning'}
{'sql', 'machine learning'}
Extra skills: {'power bi', 'tableau'}
  • a - b gives what is in a but not b — direction matters
  • a.difference_update(b) removes the overlapping elements from a in place
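
A short sketch of the in-place form, assuming a hypothetical list of completed courses. Like the other method forms, difference_update() accepts any iterable, not only a set.

```python
skills_needed = {"python", "sql", "machine learning", "excel"}
courses_done  = ["python", "excel"]    # a plain list works as the argument

# Copy first so the original requirement set is preserved
remaining = set(skills_needed)
remaining.difference_update(courses_done)

print(remaining == {"sql", "machine learning"})   # True
print(len(skills_needed))                         # 4 (unchanged)
```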

4. Symmetric Difference — What They Do Not Share

The symmetric difference returns every element that is in exactly one of the two sets: in one or the other, but not in both. Put another way, it is the union with the intersection removed.

Real-world use: finding which products were sold in one store but not the other — items that are unique to each location.

# Symmetric difference — in one or the other, but not both

store_a = {"apple", "banana", "cherry", "mango"}
store_b = {"banana", "mango", "grape", "peach"}

# Items exclusive to one store (not stocked by both)
unique = store_a.symmetric_difference(store_b)
print(unique)

# Operator syntax
unique2 = store_a ^ store_b
print(unique2)
{'apple', 'cherry', 'grape', 'peach'}
{'apple', 'cherry', 'grape', 'peach'}
  • a ^ b and a.symmetric_difference(b) are equivalent
  • The result is the union minus the intersection: a ^ b equals (a | b) - (a & b)
  • Unlike difference, symmetric difference is commutative: a ^ b == b ^ a
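
The relationships between symmetric difference and the other operations can be verified directly. This sketch reuses the store example and only checks equalities, so set display order does not matter.

```python
store_a = {"apple", "banana", "cherry", "mango"}
store_b = {"banana", "mango", "grape", "peach"}

sym = store_a ^ store_b

# Union minus intersection
print(sym == (store_a | store_b) - (store_a & store_b))   # True

# Both one-way differences combined
print(sym == (store_a - store_b) | (store_b - store_a))   # True

# Operand order does not matter
print(sym == store_b ^ store_a)                           # True
```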

Subset and Superset Checks

Python lets you test whether one set is entirely contained within another — or contains another entirely — using simple comparison methods or operators.

# Subset and superset testing

basics   = {"html", "css"}
frontend = {"html", "css", "javascript", "react"}

# Is basics a subset of frontend? (all of basics inside frontend?)
print(basics.issubset(frontend))     # True
print(basics <= frontend)            # True — operator form

# Is frontend a superset of basics? (frontend contains all of basics?)
print(frontend.issuperset(basics))   # True
print(frontend >= basics)            # True

# Proper subset — subset but not equal
print(basics < frontend)             # True  (basics != frontend)
print(frontend < frontend)           # False (a set is not a proper subset of itself)
True
True
True
True
True
False
  • a <= b means a is a subset of b (a could equal b)
  • a < b means a is a proper subset of b (a is inside b but not equal)
  • a.isdisjoint(b) returns True if the sets share no elements at all
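
A brief sketch of isdisjoint(), plus one detail that is easy to miss: two sets can overlap without either being a subset of the other, so both <= comparisons can be False at the same time. The skill sets below are illustrative.

```python
frontend  = {"html", "css", "javascript"}
backend   = {"python", "sql"}
fullstack = {"html", "python"}

print(frontend.isdisjoint(backend))      # True:  nothing in common
print(frontend.isdisjoint(fullstack))    # False: both contain "html"

# Overlapping sets where neither contains the other
print(frontend <= fullstack)             # False
print(fullstack <= frontend)             # False
```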

Membership Testing — Why Sets Are Faster

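One of the most important practical reasons to use a set over a list is membership testing speed. Checking whether a value is in a set takes constant time on average, regardless of how many elements the set holds, because the value's hash points directly to where it would be stored. Checking a list means comparing against elements one by one, and in the worst case (the value is last or absent) that scans the entire list.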

Real-world use: a spam filter holds millions of known-spam domains in a set. Every incoming email domain is checked against it in roughly the same time, no matter how large the set grows.

# Membership testing — set vs list

blocked_domains_list = ["spam.com", "junk.net", "phish.io"]   # list
blocked_domains_set  = {"spam.com", "junk.net", "phish.io"}   # set

email = "user@spam.com"
domain = email.split("@")[1]   # extract domain part

# Both work, but set lookup is O(1) on average while list lookup is O(n)
print(domain in blocked_domains_list)   # True
print(domain in blocked_domains_set)    # True — much faster at scale
True
True
  • List in check: O(n), so it gets slower as the list grows
  • Set in check: O(1) on average, independent of size, thanks to hash tables
  • When you only need to answer "is this value present?" and never need ordering or indexing, a set is almost always the better data structure
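
To see the difference rather than take it on faith, a rough timing sketch with the standard timeit module is shown below. The collection size, repeat count, and variable names are arbitrary choices for illustration, and the exact numbers will vary from machine to machine.

```python
import timeit

n = 100_000
haystack_list = list(range(n))
haystack_set  = set(haystack_list)
missing = -1          # absent value: forces the list to scan every element

# Time 100 membership checks against each structure
list_time = timeit.timeit(lambda: missing in haystack_list, number=100)
set_time  = timeit.timeit(lambda: missing in haystack_set,  number=100)

print(f"list: {list_time:.5f} s")
print(f"set:  {set_time:.7f} s")
```

On typical hardware the set figure is smaller by several orders of magnitude, and the gap widens as n grows. Building the set has its own one-time cost, so the conversion pays off when the same collection is checked many times.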

Set Comprehensions

Just like list and dictionary comprehensions, Python supports set comprehensions — a concise way to build a set from any iterable, with an optional filter condition. The syntax is identical to a list comprehension but uses curly braces.

# Set comprehension — unique squares, unique first letters

nums = [1, 2, 2, 3, 3, 3, 4]

# Build a set of squares — duplicates handled automatically
unique_squares = {n ** 2 for n in nums}
print(unique_squares)

# Extract unique first letters from a list of words
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
first_letters = {w[0] for w in words}
print(first_letters)
{1, 4, 9, 16}
{'a', 'b', 'c'}
  • Structure: {expression for item in iterable} — same as list comprehension but with {}
  • Duplicate results in the expression are automatically collapsed — no extra deduplication needed
  • Add a filter: {x for x in data if x > 0} keeps only positive values
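
A slightly larger sketch combining a transformation and a filter, using made-up email addresses: it collects the unique, lowercased domains and skips strings that are not addresses.

```python
emails = [
    "ann@example.com",
    "BOB@Example.com",
    "not-an-email",
    "cy@test.org",
    "ann@example.com",
]

# Expression: lowercase domain; filter: only strings containing "@"
domains = {e.split("@")[1].lower() for e in emails if "@" in e}

print(domains == {"example.com", "test.org"})   # True
print(len(domains))                             # 2
```

Note that empty braces with no comprehension inside, {}, still create a dict. An empty set always needs set().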

frozenset — The Immutable Set

A frozenset is a set that cannot be changed after creation. Because it is immutable and hashable, it can be used as a dictionary key or stored inside another set — something a regular set cannot do.

# frozenset — immutable, hashable set

roles_admin = frozenset({"read", "write", "delete"})
roles_guest = frozenset({"read"})

# Use frozensets as dictionary keys
permissions = {
    roles_admin: "full access",
    roles_guest: "read only"
}

print(permissions[roles_admin])
print(permissions[roles_guest])

# frozensets support all read operations but not mutation
print("write" in roles_admin)   # True
# roles_admin.add("execute")    # would raise AttributeError
full access
read only
True
  • frozenset() accepts any iterable as its argument
  • Supports all read operations: in, len(), iteration, comparisons, and the non-mutating set operators (|, &, -, ^)
  • Cannot use add(), remove(), or any mutating method
  • Useful when you need a set to be a stable, hashable key in a dictionary
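
The "set inside another set" case mentioned above can be sketched as follows. The permission names are illustrative. The example also shows that set operators applied to two frozensets return a new frozenset.

```python
# A regular set cannot contain another regular set
try:
    nested = {{"read"}}
except TypeError:
    print("a regular set is unhashable, so it cannot be a set element")

# frozensets can be nested
groups = {frozenset({"read"}), frozenset({"read", "write"})}
print(frozenset({"write", "read"}) in groups)    # True: element order is irrelevant

# Set math on frozensets returns a new frozenset
admin = frozenset({"read", "write", "delete"})
guest = frozenset({"read"})
extra = admin - guest
print(type(extra).__name__)                      # frozenset
print(extra == {"write", "delete"})              # True: compares equal to a regular set
```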

Summary Table

Operation        Method                       Operator   Returns
Union            a.union(b)                   a | b      All elements from both
Intersection     a.intersection(b)            a & b      Only shared elements
Difference       a.difference(b)              a - b      In a but not b
Symmetric Diff   a.symmetric_difference(b)    a ^ b      In one but not both
Subset           a.issubset(b)                a <= b     True if a inside b
Superset         a.issuperset(b)              a >= b     True if a contains b
Disjoint         a.isdisjoint(b)              (none)     True if no overlap

Practice Questions

Practice 1. What operator is used for the union of two sets?

Practice 2. Which method removes an element from a set without raising an error if the element is not found?

Practice 3. What is the time complexity of membership testing in a set?

Practice 4. What operator returns all elements that are in one set or the other, but not in both?

Practice 5. What is the immutable version of a set called?

Quiz

Quiz 1. What does {1, 2, 3} & {2, 3, 4} return?

Quiz 2. What does {1, 2, 3} - {2, 3, 4} return?

Quiz 3. Which of the following correctly creates an empty set?

Quiz 4. What does {1, 2}.issubset({1, 2, 3}) return?

Quiz 5. Why can a frozenset be used as a dictionary key but a regular set cannot?

Next up — Regular Expressions teaches you how to search, match, and extract text patterns using Python's re module, one of the most powerful tools for working with real-world string data.