Python Course
Advanced Sets
You were introduced to sets in Lesson 13 — unordered collections of unique items. Now it is time to go deeper. Sets in Python are not just a way to remove duplicates. They come with a full suite of mathematical operations — unions, intersections, differences, and more — that let you compare and combine collections with remarkable speed and clarity.
This lesson covers every set operation you will encounter professionally, explains the performance advantages that make sets the right tool for membership testing, and shows you set comprehensions as a natural extension of what you already know.
Quick Recap — What Makes a Set Unique
Before diving into operations, a quick grounding in what sets are and are not:
- Sets store only unique values — duplicates are silently discarded
- Sets are unordered — you cannot index or slice them
- Set elements must be hashable — strings, numbers, and tuples work; lists and dicts do not
- Sets are mutable — you can add and remove items after creation
- A frozenset is the immutable version — hashable itself, usable as a dict key
# Creating sets and observing uniqueness
tags = {"python", "data", "python", "code", "data"}
print(tags) # duplicates removed automatically
nums = set([1, 2, 2, 3, 3, 3]) # set from a list
print(nums)
empty = set() # correct way — {} creates an empty dict, not a set
print(type(empty))
Output (element order may vary):
{'python', 'data', 'code'}
{1, 2, 3}
<class 'set'>
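The hashability rule from the recap can be checked directly. The sketch below is a minimal illustration; the names `points` and `error_message` are made up for the example, and the exact wording of the error text can differ between Python versions.

```python
# Tuples are hashable, so they can be set elements.
points = {(0, 0), (1, 2), (0, 0)}   # the duplicate tuple is discarded
print(len(points))                   # 2

# Lists are mutable and unhashable, so adding one raises TypeError.
error_message = None
try:
    points.add([3, 4])
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```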
Adding and Removing Elements
Sets are mutable, so you can grow and shrink them after creation. Python provides several methods depending on whether you want to raise an error or fail silently when the item is not present.
# Mutating a set — add, remove, discard, pop, clear
fruits = {"apple", "banana", "cherry"}
fruits.add("mango") # add one item (ignored if already present)
print(fruits)
fruits.remove("banana") # remove — raises KeyError if not found
print(fruits)
fruits.discard("grape") # discard — silent if not found (no error)
print(fruits)
item = fruits.pop() # remove and return an arbitrary item
print("Popped:", item)
fruits.clear() # remove all items
print(fruits)
Output (element order may vary, and pop() may return a different item):
{'apple', 'banana', 'cherry', 'mango'}
{'apple', 'cherry', 'mango'}
{'apple', 'cherry', 'mango'}
Popped: apple
set()
- Use `add()` for a single item, `update()` to add multiple items from any iterable
- Prefer `discard()` over `remove()` when you are not certain the item exists
- `pop()` removes an arbitrary element, because sets have no guaranteed order
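The difference between `update()`, `remove()`, and `discard()` is easiest to see side by side. This is a small sketch with illustrative data; the flag name `removed` is an assumption for the example.

```python
fruits = {"apple", "banana"}

# update() accepts one or more iterables and adds every element.
fruits.update(["cherry", "mango"], ("apple",))
print(sorted(fruits))   # ['apple', 'banana', 'cherry', 'mango']

# remove() raises KeyError for a missing item...
try:
    fruits.remove("grape")
    removed = True
except KeyError:
    removed = False
print(removed)          # False

# ...while discard() leaves the set unchanged and raises nothing.
fruits.discard("grape")
print(len(fruits))      # 4
```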
1. Union — Combining Two Sets
A union produces a new set containing every element that appears in either set, with no duplicates. This answers the question: "give me everything from both."
Real-world use: merging the email subscriber lists from two different campaigns into one master list with no repeated addresses.
# Union — all elements from both sets
a = {"python", "sql", "excel"}
b = {"python", "tableau", "power bi"}
# Method syntax
combined = a.union(b)
print(combined)
# Operator syntax — identical result
combined2 = a | b
print(combined2)
Output (element order may vary):
{'python', 'sql', 'excel', 'tableau', 'power bi'}
{'python', 'sql', 'excel', 'tableau', 'power bi'}
- `a.union(b)` and `a | b` produce the same result for two sets; choose the style that reads more clearly
- Neither original set is modified; a new set is always returned
- You can chain multiple sets: `a | b | c` or `a.union(b, c)`
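One practical difference between the two spellings is worth a quick check: the method accepts any iterable, while the operator requires sets on both sides. The sketch below uses illustrative data, and `operator_accepts_list` is a name invented for the example.

```python
a = {"python", "sql"}
b = {"python", "tableau"}
c = {"excel"}

# Chaining with the operator and passing several sets to the method agree.
assert a | b | c == a.union(b, c)

# The method accepts any iterable, such as a list.
from_list = a.union(["r", "sql"])
print(sorted(from_list))      # ['python', 'r', 'sql']

# The operator raises TypeError when the right-hand side is not a set.
try:
    a | ["r"]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False

# The original set is untouched by all of the above.
print(sorted(a))              # ['python', 'sql']
```

To modify a set in place instead, `a |= b` and `a.update(b)` are the in-place counterparts.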
2. Intersection — What Two Sets Share
An intersection produces only the elements that exist in both sets. This answers: "what do these two groups have in common?"
Real-world use: finding which users have both a free account and a paid subscription — the overlap between two user ID sets.
# Intersection — only elements present in both sets
skills_needed = {"python", "sql", "machine learning", "excel"}
skills_you_have = {"python", "excel", "power bi", "tableau"}
# What skills do you already have that the job needs?
match = skills_needed.intersection(skills_you_have)
print(match)
# Operator syntax
match2 = skills_needed & skills_you_have
print(match2)
Output (element order may vary):
{'python', 'excel'}
{'python', 'excel'}
- `a.intersection(b)` and `a & b` are equivalent
- If the sets share nothing, the result is an empty set, `set()`
- `a.intersection_update(b)` modifies `a` in place instead of returning a new set
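The empty-result case and the in-place variant can be sketched with the user-account scenario mentioned earlier. The user IDs here are hypothetical.

```python
free_users = {101, 102, 103}
paid_users = {103, 104}

both = free_users & paid_users
print(both)                              # {103}

# No overlap yields an empty set, which is falsy.
nothing = free_users & {999}
print(nothing == set(), bool(nothing))   # True False

# intersection_update() changes the set it is called on and returns None.
result = free_users.intersection_update(paid_users)
print(result, free_users)                # None {103}
```

Because the in-place method returns `None`, assigning its result to a variable is a common mistake worth watching for.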
3. Difference — What One Set Has That the Other Does Not
A difference returns elements in the first set that are not present in the second. Order matters here — a - b is not the same as b - a.
Real-world use: finding which required skills you are still missing — the skills the job needs that you do not yet have.
# Difference — elements in a but not in b
skills_needed = {"python", "sql", "machine learning", "excel"}
skills_you_have = {"python", "excel", "power bi", "tableau"}
# What skills do you still need to learn?
gaps = skills_needed.difference(skills_you_have)
print("Skills to learn:", gaps)
# Operator syntax
gaps2 = skills_needed - skills_you_have
print(gaps2)
# Reversed — what skills you have that the job doesn't require
extras = skills_you_have - skills_needed
print("Extra skills:", extras)
Output (element order may vary):
Skills to learn: {'sql', 'machine learning'}
{'sql', 'machine learning'}
Extra skills: {'power bi', 'tableau'}
- `a - b` gives what is in `a` but not in `b`; direction matters
- `a.difference_update(b)` removes the overlapping elements from `a` in place
4. Symmetric Difference — What They Do Not Share
The symmetric difference returns all elements that are in one set or the other, but not in both — the opposite of an intersection.
Real-world use: finding which products were sold in one store but not the other — items that are unique to each location.
# Symmetric difference — in one or the other, but not both
store_a = {"apple", "banana", "cherry", "mango"}
store_b = {"banana", "mango", "grape", "peach"}
# Items exclusive to one store (not stocked by both)
unique = store_a.symmetric_difference(store_b)
print(unique)
# Operator syntax
unique2 = store_a ^ store_b
print(unique2)
Output (element order may vary):
{'apple', 'cherry', 'grape', 'peach'}
{'apple', 'cherry', 'grape', 'peach'}
- `a ^ b` and `a.symmetric_difference(b)` are equivalent
- The result is everything in either set except the intersection, that is, `(a | b) - (a & b)`
- Unlike difference, symmetric difference is commutative: `a ^ b == b ^ a`
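These identities can be verified directly with the two store inventories. The variable name `exclusive` is chosen for this sketch only.

```python
store_a = {"apple", "banana", "cherry", "mango"}
store_b = {"banana", "mango", "grape", "peach"}

exclusive = store_a ^ store_b

# Same result built from union and intersection.
assert exclusive == (store_a | store_b) - (store_a & store_b)

# Also the union of the two one-way differences.
assert exclusive == (store_a - store_b) | (store_b - store_a)

# Commutative: operand order does not matter.
assert store_a ^ store_b == store_b ^ store_a

print(sorted(exclusive))   # ['apple', 'cherry', 'grape', 'peach']
```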
Subset and Superset Checks
Python lets you test whether one set is entirely contained within another — or contains another entirely — using simple comparison methods or operators.
# Subset and superset testing
basics = {"html", "css"}
frontend = {"html", "css", "javascript", "react"}
# Is basics a subset of frontend? (all of basics inside frontend?)
print(basics.issubset(frontend)) # True
print(basics <= frontend) # True — operator form
# Is frontend a superset of basics? (frontend contains all of basics?)
print(frontend.issuperset(basics)) # True
print(frontend >= basics) # True
# Proper subset — subset but not equal
print(basics < frontend) # True (basics != frontend)
print(frontend < frontend) # False (a set is not a proper subset of itself)
Output:
True
True
True
True
True
False
- `a <= b` means a is a subset of b (a could equal b)
- `a < b` means a is a proper subset of b (a is inside b but not equal)
- `a.isdisjoint(b)` returns `True` if the sets share no elements at all
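`isdisjoint()` is mentioned above without an example, so here is a minimal sketch. The `backend` and `fullstack` sets are invented for illustration.

```python
frontend = {"html", "css", "javascript"}
backend = {"python", "sql"}
fullstack = frontend | backend

print(frontend.isdisjoint(backend))     # True
print(frontend.isdisjoint(fullstack))   # False

# Subset comparison is only a partial order:
# for disjoint sets, both directions are False.
print(frontend <= backend, backend <= frontend)   # False False

# Equality implies subset but rules out proper subset.
print(frontend <= frontend, frontend < frontend)  # True False
```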
Membership Testing — Why Sets Are Faster
One of the most important practical reasons to use a set over a list is membership testing speed. Checking whether a value is in a set takes constant time on average, regardless of size, because sets are backed by hash tables. Checking a list may require scanning every element.
Real-world use: a spam filter holds millions of known-spam domains in a set. Every incoming email domain is checked against it instantly, no matter how large the set grows.
# Membership testing — set vs list
blocked_domains_list = ["spam.com", "junk.net", "phish.io"] # list
blocked_domains_set = {"spam.com", "junk.net", "phish.io"} # set
email = "user@spam.com"
domain = email.split("@")[1] # extract domain part
# Both work — but set lookup is O(1), list lookup is O(n)
print(domain in blocked_domains_list) # True
print(domain in blocked_domains_set) # True, and much faster at scale
Output:
True
True
- List `in` check: O(n), so it gets slower as the list grows
- Set `in` check: O(1) on average, independent of size, thanks to hash tables
- When you only need to answer "is this value present?" and never need ordering or indexing, a set is almost always the better data structure
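To see the gap on your own machine, a rough measurement with the standard library's `timeit` module might look like the sketch below. The domain names are made-up test data, and the absolute timings depend on hardware and Python version, so no expected numbers are shown.

```python
import timeit

# Hypothetical data: 100,000 made-up domain names.
domains_list = [f"site{i}.example" for i in range(100_000)]
domains_set = set(domains_list)
target = "site99999.example"   # last element: worst case for the list scan

# Both containers give the same answer.
assert (target in domains_list) == (target in domains_set)

# Time 200 lookups against each container.
list_time = timeit.timeit(lambda: target in domains_list, number=200)
set_time = timeit.timeit(lambda: target in domains_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The target is deliberately placed at the end of the list so the list scan has to visit every element, which is the case the O(n) bound describes.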
Set Comprehensions
Just like list and dictionary comprehensions, Python supports set comprehensions — a concise way to build a set from any iterable, with an optional filter condition. The syntax is identical to a list comprehension but uses curly braces.
# Set comprehension — unique squares, unique first letters
nums = [1, 2, 2, 3, 3, 3, 4]
# Build a set of squares — duplicates handled automatically
unique_squares = {n ** 2 for n in nums}
print(unique_squares)
# Extract unique first letters from a list of words
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
first_letters = {w[0] for w in words}
print(first_letters)
Output (element order may vary):
{1, 4, 9, 16}
{'a', 'b', 'c'}
- Structure: `{expression for item in iterable}`, the same as a list comprehension but with `{}`
- Duplicate results of the expression are automatically collapsed, so no extra deduplication is needed
- Add a filter: `{x for x in data if x > 0}` keeps only positive values
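A filter and a transformation can be combined in one comprehension. The sketch below uses an invented `emails` list, and also shows how a colon inside the braces turns the expression into a dict comprehension instead.

```python
emails = ["Ann@Example.com", "bob@test.org", "ann@example.com", "no-at-sign"]

# Normalise case, skip malformed entries, and keep each domain once.
domains = {e.split("@")[1].lower() for e in emails if "@" in e}
print(sorted(domains))    # ['example.com', 'test.org']

# Braces with key: value pairs build a dict, so the colon matters.
lengths = {d: len(d) for d in domains}
print(type(domains).__name__, type(lengths).__name__)   # set dict
```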
frozenset — The Immutable Set
A frozenset is a set that cannot be changed after creation. Because it is immutable and hashable, it can be used as a dictionary key or stored inside another set — something a regular set cannot do.
# frozenset — immutable, hashable set
roles_admin = frozenset({"read", "write", "delete"})
roles_guest = frozenset({"read"})
# Use frozensets as dictionary keys
permissions = {
    roles_admin: "full access",
    roles_guest: "read only"
}
print(permissions[roles_admin])
print(permissions[roles_guest])
# frozensets support all read operations but not mutation
print("write" in roles_admin) # True
# roles_admin.add("execute") # would raise AttributeError
Output:
full access
read only
True
- `frozenset()` accepts any iterable as its argument
- Supports all read operations: `in`, `len()`, iteration, and all set math operators
- Cannot use `add()`, `remove()`, or any mutating method
- Useful when you need a set to be a stable, hashable key in a dictionary
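Storing a set inside another set is the other use case mentioned above. This minimal sketch contrasts frozensets with regular sets; the names `role_sets`, `nested_ok`, and `extra` are illustrative.

```python
admin = frozenset({"read", "write", "delete"})
guest = frozenset({"read"})

# A frozenset can be an element of another set.
role_sets = {admin, guest}
print(len(role_sets))             # 2

# A regular set cannot: it is unhashable, so this raises TypeError.
try:
    {{"read"}, {"write"}}
    nested_ok = True
except TypeError:
    nested_ok = False
print(nested_ok)                  # False

# Operators return a new frozenset; the originals never change.
extra = admin - guest
print(type(extra).__name__, sorted(extra))   # frozenset ['delete', 'write']
```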
Summary Table
| Operation | Method | Operator | Returns |
|---|---|---|---|
| Union | `a.union(b)` | `a \| b` | All elements from both |
| Intersection | `a.intersection(b)` | `a & b` | Only shared elements |
| Difference | `a.difference(b)` | `a - b` | In a but not b |
| Symmetric Diff | `a.symmetric_difference(b)` | `a ^ b` | In one but not both |
| Subset | `a.issubset(b)` | `a <= b` | True if a inside b |
| Superset | `a.issuperset(b)` | `a >= b` | True if a contains b |
| Disjoint | `a.isdisjoint(b)` | (none) | True if no overlap |
Practice Questions
Practice 1. What operator is used for the union of two sets?
Practice 2. Which method removes an element from a set without raising an error if the element is not found?
Practice 3. What is the time complexity of membership testing in a set?
Practice 4. What operator returns all elements that are in one set or the other, but not in both?
Practice 5. What is the immutable version of a set called?
Quiz
Quiz 1. What does {1, 2, 3} & {2, 3, 4} return?
Quiz 2. What does {1, 2, 3} - {2, 3, 4} return?
Quiz 3. Which of the following correctly creates an empty set?
Quiz 4. What does {1, 2}.issubset({1, 2, 3}) return?
Quiz 5. Why can a frozenset be used as a dictionary key but a regular set cannot?
Next up — Regular Expressions teaches you how to search, match, and extract text patterns using Python's re module, one of the most powerful tools for working with real-world string data.