Python Lesson 13 – Sets | Dataplexa

Sets in Python

In this lesson, you will learn about Sets — a data structure that automatically removes duplicates and makes comparing collections of data incredibly easy. Sets are used in real programs for filtering unique data, finding common items between groups, and performing fast membership checks. By the end of this lesson, you will use sets with full confidence.


1. What is a Set?

A set is a collection of items where every item is unique — no duplicates are allowed. If you try to add the same item twice, Python simply ignores the second one. Sets are also unordered, which means Python does not guarantee that items will appear in the same order you added them.

Think of a set like a bag of unique tokens. You can throw tokens in and take them out, but you can never have two identical tokens in the same bag.
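The definition above depends on one rule that is worth stating outright: every item in a set must be hashable. In practice, that means immutable values such as numbers, strings, and tuples of such values. Mutable containers like lists cannot be set items. Here is a minimal sketch (the variable names are only illustrative):

```python
# Tuples are immutable and hashable, so they can be set items
points = {(0, 0), (1, 2), (0, 0)}   # the repeated (0, 0) is kept only once
print(len(points))                  # 2

# Lists are mutable and unhashable, so Python refuses to put them in a set
raised = False
try:
    bad = {[1, 2], [3, 4]}
except TypeError as error:
    raised = True
    print("TypeError:", error)      # the message reports an unhashable list
print("Lists rejected:", raised)    # Lists rejected: True
```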

Why does a set exist? In real programs, you often need to work with collections where duplicates must be eliminated. Imagine collecting all unique visitors to a website, finding skills that two candidates share, or checking which items from one list are missing from another. Doing all of this with lists would require many loops. Sets do it in one clean step.

Where are sets used in real life?

  • Removing duplicate entries from a database or user list
  • Finding common tags between two articles or posts
  • Checking which students are enrolled in both courses
  • Fast "does this item exist?" checks in large collections
  • Comparing permissions between two user roles

2. Creating a Set

A set is created using curly braces {} — just like a dictionary, but without key-value pairs. You can also use the built-in set() function to create one from a list. If you pass in duplicate values, Python automatically keeps only one copy of each.

# Creating a set of programming languages
# Notice: "Python" and "Java" each appear twice, and the duplicates will be removed
languages = {"Python", "Java", "Python", "C++", "Java", "Go"}

print("Languages set:", languages)
print("Total unique languages:", len(languages))

# Creating a set from a list using set()
# This is the most common way to remove duplicates from a list
scores_list = [85, 90, 85, 78, 90, 92, 78]
unique_scores = set(scores_list)

print("\nOriginal list:", scores_list)
print("After removing duplicates:", unique_scores)
Languages set: {'Go', 'Java', 'C++', 'Python'}
Total unique languages: 4

Original list: [85, 90, 85, 78, 90, 92, 78]
After removing duplicates: {90, 92, 85, 78}

What just happened?

  • Even though "Python" and "Java" were added twice, the set kept only one copy of each — that is the core behaviour of a set.
  • The output order may look different from the input order — sets are unordered, so Python decides the arrangement internally.
  • set(scores_list) is the shortest way to eliminate duplicates from a list. The trade-off is that the result is a set, so the list's original order is not preserved.
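Because sets are unordered, set(scores_list) returns the unique values but loses the list's original order. If you need to keep that order, a common alternative is dict.fromkeys(). It is a dictionary technique rather than a set feature, and it works because dictionary keys are unique and dictionaries keep insertion order in Python 3.7 and later. Here is a short sketch that uses the same scores:

```python
scores_list = [85, 90, 85, 78, 90, 92, 78]

# set(): duplicates removed, but the original order is not guaranteed
unique_scores = set(scores_list)

# dict.fromkeys(): duplicates removed AND first-seen order kept
# (dict keys are unique, and dicts preserve insertion order in Python 3.7+)
ordered_unique = list(dict.fromkeys(scores_list))

print(sorted(unique_scores))   # [78, 85, 90, 92]
print(ordered_unique)          # [85, 90, 78, 92]
```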

3. Creating an Empty Set — Important Rule

This is a very common beginner mistake. If you write {} with nothing inside, Python creates an empty dictionary, not an empty set. To create an empty set, you must use set() with no arguments.

# This looks like an empty set — but it is NOT
wrong = {}
print("Type of {}:", type(wrong))        # Shows: dict

# This is the correct way to create an empty set
correct = set()
print("Type of set():", type(correct))   # Shows: set

# You can then add items to it
correct.add("Python")
correct.add("Data Science")
print("Set after adding items:", correct)
Type of {}: <class 'dict'>
Type of set(): <class 'set'>
Set after adding items: {'Python', 'Data Science'}

What just happened?

  • {} creates an empty dictionary — Python interprets it that way because curly braces are also used for dicts.
  • set() creates a true empty set — always use this when you want to start with an empty set and add items later.
  • This is one of those Python quirks that trips up even intermediate developers — memorise it now.

4. Adding and Removing Items

Sets are mutable — you can add and remove items after creation. However, since sets are unordered, you cannot access items by index. You just add or remove by value.

# A set of enrolled courses
courses = {"Math", "Science", "English"}
print("Starting courses:", courses)

# add() — adds a single item
courses.add("History")
print("After add:", courses)

# add() ignores duplicates — no error, no change
courses.add("Math")
print("After adding duplicate Math:", courses)

# remove() — removes an item (raises KeyError if item not found)
courses.remove("English")
print("After remove:", courses)

# discard() — removes an item safely (no error if item not found)
courses.discard("Art")       # "Art" is not in the set — no crash
print("After discard missing item:", courses)

# pop() removes and returns an arbitrary item (which one is unpredictable)
removed = courses.pop()
print("Randomly removed:", removed)
print("Set after pop:", courses)
Starting courses: {'Math', 'Science', 'English'}
After add: {'Math', 'Science', 'English', 'History'}
After adding duplicate Math: {'Math', 'Science', 'English', 'History'}
After remove: {'Math', 'Science', 'History'}
After discard missing item: {'Math', 'Science', 'History'}
Randomly removed: Math
Set after pop: {'Science', 'History'}

What just happened?

  • add("Math") was called a second time but the set did not change — duplicates are silently ignored.
  • remove() works but will crash with a KeyError if the item is not found. Use it only when you are sure the item exists.
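  • discard() is the safer version because it does nothing if the item is missing. Use it when a missing item is not an error. Use remove() when a missing item means something has gone wrong and you want to find out.
  • pop() removes and returns an arbitrary item, because sets have no order. Do not rely on which item gets removed.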
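If you want to use remove() but cannot be sure the item is present, you can check with in first or catch the KeyError. pop() has a similar edge case: calling it on an empty set also raises KeyError. Here is a sketch with illustrative course names:

```python
courses = {"Math", "Science"}

# Option 1: check first, then remove only if present
if "Art" in courses:
    courses.remove("Art")

# Option 2: try the removal and handle the KeyError
try:
    courses.remove("Art")
except KeyError:
    print("'Art' was not in the set")

# pop() returns an arbitrary item each time; empty the set completely
removed = []
while courses:
    removed.append(courses.pop())
print(sorted(removed))   # ['Math', 'Science']

# pop() on the now-empty set raises KeyError
try:
    courses.pop()
except KeyError:
    print("Cannot pop from an empty set")
```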

5. Membership Testing — Checking if an Item Exists

One of the biggest advantages of sets over lists is the speed of membership testing. On average, checking whether an item is in a set takes about the same time whether the set holds ten items or ten million. This is because sets are built on a hash table: Python computes the item's hash and jumps straight to the place where it would be stored. A list has no such shortcut. The in operator has to compare items one by one until it finds a match or reaches the end, so it gets slower as the list grows.

# A set of allowed usernames
allowed_users = {"admin", "editor", "moderator", "reviewer"}

# Check if a username is in the set
username = "editor"
if username in allowed_users:
    print(f"'{username}' has access.")
else:
    print(f"'{username}' is not allowed.")

# Check a username that does not exist
guest = "guest"
if guest not in allowed_users:
    print(f"'{guest}' does not have access.")

# Compare: same check using a list (works but slower for large data)
allowed_list = ["admin", "editor", "moderator", "reviewer"]
print("\nList check:", "admin" in allowed_list)   # Works, but slower
print("Set check: ", "admin" in allowed_users)   # Faster
'editor' has access.
'guest' does not have access.

List check: True
Set check: True

What just happened?

  • in and not in work exactly the same as with lists — the syntax is identical.
  • The difference is performance. For small collections it does not matter, but for thousands or millions of items, set lookups are dramatically faster than list lookups.
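  • When you have a large collection and need to check membership repeatedly, prefer a set over a list. Building the set takes one pass over the data, so if you only need a single check against a list you already have, using in on the list directly is fine.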
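You can measure this difference yourself with the standard-library timeit module. In the rough timing sketch below, the usernames are made-up test data. Absolute timings depend on your machine and Python version, so no exact output is given, but the set lookups should finish far faster than the list scans.

```python
import timeit

# Made-up test data: 100,000 usernames stored as a list and as a set
names_list = [f"user_{i}" for i in range(100_000)]
names_set = set(names_list)   # building the set costs one pass over the data

# The last username is the worst case for a list: every other item is compared first
target = "user_99999"
print(target in names_list, target in names_set)   # True True

# Time 100 lookups of each kind (results vary by machine)
list_time = timeit.timeit(lambda: target in names_list, number=100)
set_time = timeit.timeit(lambda: target in names_set, number=100)

print(f"list: {list_time:.6f} s")
print(f"set:  {set_time:.6f} s")
```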

6. Set Operations — Union, Intersection, Difference

This is where sets become truly powerful. Python lets you compare two sets using mathematical set operations — the same ones you may have studied in school. These operations let you find what is common between two groups, what is unique to one group, or everything combined — all in a single line of code.

Imagine you have two groups of students — one enrolled in a Python course, one in a Data Science course. You want to know who is in both, who is in only one, or who is in either. Set operations answer all of these questions instantly.

# Students enrolled in two different courses
python_course   = {"Amit", "Priya", "Kiran", "Sneha", "Rahul"}
data_sci_course = {"Priya", "Sneha", "Arjun", "Meera", "Rahul"}

# UNION — all students in EITHER course (no duplicates)
all_students = python_course | data_sci_course
print("Union (all students):", all_students)

# INTERSECTION — students enrolled in BOTH courses
both_courses = python_course & data_sci_course
print("Intersection (in both):", both_courses)

# DIFFERENCE — students in Python course but NOT in Data Science
only_python = python_course - data_sci_course
print("Difference (only Python):", only_python)

# SYMMETRIC DIFFERENCE — students in ONE course but NOT both
either_not_both = python_course ^ data_sci_course
print("Symmetric diff (one only):", either_not_both)
Union (all students): {'Amit', 'Priya', 'Kiran', 'Sneha', 'Rahul', 'Arjun', 'Meera'}
Intersection (in both): {'Priya', 'Sneha', 'Rahul'}
Difference (only Python): {'Amit', 'Kiran'}
Symmetric diff (one only): {'Amit', 'Kiran', 'Arjun', 'Meera'}

What just happened?

  • Union | — combines both sets and removes any duplicates. All 7 unique students appear once.
  • Intersection & — only keeps items that appear in both sets. Priya, Sneha, and Rahul are in both courses.
  • Difference - — keeps items from the first set that are not in the second set. Amit and Kiran are only in the Python course.
  • Symmetric Difference ^ keeps items that are in one set or the other, but not in both. In other words, it is the union minus the intersection: Amit and Kiran (only Python) plus Arjun and Meera (only Data Science).
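You can check the relationships between these operations directly in code. The sketch below reuses the two course sets from the example above. It confirms that symmetric difference equals the union minus the intersection, and that the result of a difference depends on which set comes first.

```python
python_course   = {"Amit", "Priya", "Kiran", "Sneha", "Rahul"}
data_sci_course = {"Priya", "Sneha", "Arjun", "Meera", "Rahul"}

sym = python_course ^ data_sci_course

# Symmetric difference = everything in either set, minus what is in both
print(sym == ((python_course | data_sci_course) - (python_course & data_sci_course)))  # True

# It is also the two one-sided differences combined
print(sym == ((python_course - data_sci_course) | (data_sci_course - python_course)))  # True

# Difference is NOT symmetric: the order of the operands matters
print(sorted(python_course - data_sci_course))   # ['Amit', 'Kiran']
print(sorted(data_sci_course - python_course))   # ['Arjun', 'Meera']
```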

7. Set Methods for the Same Operations

Every set operation above can also be done using named methods instead of symbols. Some developers prefer the method style because the names are more descriptive and readable, especially for beginners.

# Same two groups as before
python_course   = {"Amit", "Priya", "Kiran", "Sneha"}
data_sci_course = {"Priya", "Sneha", "Arjun", "Meera"}

# Method versions — same results as the symbol operators
print("union()              :", python_course.union(data_sci_course))
print("intersection()       :", python_course.intersection(data_sci_course))
print("difference()         :", python_course.difference(data_sci_course))
print("symmetric_difference :", python_course.symmetric_difference(data_sci_course))
union()              : {'Amit', 'Priya', 'Kiran', 'Sneha', 'Arjun', 'Meera'}
intersection()       : {'Priya', 'Sneha'}
difference()         : {'Amit', 'Kiran'}
symmetric_difference : {'Amit', 'Kiran', 'Arjun', 'Meera'}

What just happened?

  • The methods produce identical results to the symbols — | and .union() are the same thing, just different styles.
  • Use symbols (|, &, -, ^) for short, compact code. Use methods when you want code that is easier for others to read.
  • Both styles are correct — choose the one that feels more natural to you.
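The two styles have one practical difference. The methods accept any iterable, such as a list, tuple, or string, but the operators require both sides to be sets. Here is a small sketch, in which new_signups is a hypothetical list:

```python
python_course = {"Amit", "Priya", "Kiran", "Sneha"}
new_signups = ["Kiran", "Zara", "Zara"]   # hypothetical list, not a set

# Methods accept any iterable and handle the conversion for you
print(sorted(python_course.union(new_signups)))
# ['Amit', 'Kiran', 'Priya', 'Sneha', 'Zara']

# Operators need a set on both sides; a list raises TypeError
operator_failed = False
try:
    python_course | new_signups
except TypeError as error:
    operator_failed = True
    print("TypeError:", error)

# Converting first makes the operator version work
print(sorted(python_course | set(new_signups)))
```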

8. Subset and Superset

Sometimes you need to check if one set is completely contained inside another. For example, do all required skills for a job exist in a candidate's skill set? This is called a subset check. The reverse — checking if one set contains all items of another — is a superset check.

# Required skills for a job
required_skills  = {"Python", "SQL", "Git"}

# Candidate A has all the required skills and more
candidate_a = {"Python", "SQL", "Git", "Django", "Docker"}

# Candidate B is missing some required skills
candidate_b = {"Python", "HTML", "CSS"}

# issubset() — checks if required_skills is fully inside candidate's skills
print("Candidate A meets all requirements:", required_skills.issubset(candidate_a))
print("Candidate B meets all requirements:", required_skills.issubset(candidate_b))

# issuperset() — checks if a set contains all items of another set
print("Candidate A is a superset:", candidate_a.issuperset(required_skills))

# isdisjoint() — checks if two sets share NO common items at all
set_a = {"cat", "dog", "bird"}
set_b = {"fish", "snake", "frog"}
print("No common animals:", set_a.isdisjoint(set_b))
Candidate A meets all requirements: True
Candidate B meets all requirements: False
Candidate A is a superset: True
No common animals: True

What just happened?

  • issubset() returns True if every item in the calling set exists inside the other set. Candidate A has all required skills, so it returns True. Candidate B is missing SQL and Git, so it returns False.
  • issuperset() is the reverse — it checks if the calling set contains all items of the other set.
  • isdisjoint() returns True when two sets share absolutely no items in common — they are completely separate.
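Python also provides operator forms of these checks. The <= and >= operators match issubset() and issuperset(). The < and > operators test for a proper subset or superset, which means the two sets must also be different. This sketch reuses the skills example and adds a difference that lists what Candidate B is missing:

```python
required_skills = {"Python", "SQL", "Git"}
candidate_a = {"Python", "SQL", "Git", "Django", "Docker"}

# <= is the operator form of issubset(), >= of issuperset()
print(required_skills <= candidate_a)      # True
print(candidate_a >= required_skills)      # True

# < and > are proper subset/superset checks: the sets must also differ
print(required_skills < candidate_a)       # True (candidate_a has extra skills)
print(required_skills < required_skills)   # False (not a proper subset of itself)
print(required_skills <= required_skills)  # True

# Which required skills is Candidate B missing? A plain difference answers it
candidate_b = {"Python", "HTML", "CSS"}
print(sorted(required_skills - candidate_b))   # ['Git', 'SQL']
```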

9. Updating a Set In-Place

The set operations you have used so far (union, intersection, difference) all return a new set and leave the originals unchanged. Sometimes, though, you want to modify an existing set directly. For that, Python provides in-place versions of these operations, which update the set itself instead of creating a new one.

# Starting skills of a developer
my_skills = {"Python", "SQL", "Git"}
print("Before:", my_skills)

# update() — adds all items from another set INTO this set (in-place union)
new_skills = {"Docker", "Django", "Python"}   # Python already exists — no duplicate
my_skills.update(new_skills)
print("After update():", my_skills)

# intersection_update() — keeps ONLY items that exist in both sets (in-place)
team_skills = {"Python", "SQL", "Docker"}
my_skills.intersection_update(team_skills)
print("After intersection_update():", my_skills)

# difference_update() — removes all items found in another set (in-place)
to_remove = {"SQL"}
my_skills.difference_update(to_remove)
print("After difference_update():", my_skills)
Before: {'Python', 'SQL', 'Git'}
After update(): {'Python', 'SQL', 'Git', 'Docker', 'Django'}
After intersection_update(): {'Python', 'SQL', 'Docker'}
After difference_update(): {'Python', 'Docker'}

What just happened?

  • update() adds all items from the new set directly into my_skills — it modifies the original instead of creating a new one.
  • intersection_update() shrinks my_skills to keep only items that also exist in team_skills.
  • difference_update() removes any items from my_skills that are found in the given set. SQL was removed because it was in to_remove.
  • Use the in-place versions when you do not need the original set anymore and want to save memory.
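Each in-place method also has an augmented-assignment operator: |=, &=, -=, and ^=. The last one corresponds to symmetric_difference_update(), which the example above does not use. The sketch below repeats the same steps with operators. Note that these operators require a set on the right-hand side, whereas the methods accept any iterable.

```python
my_skills = {"Python", "SQL", "Git"}

my_skills |= {"Docker", "Django", "Python"}  # like update()
my_skills &= {"Python", "SQL", "Docker"}     # like intersection_update()
my_skills -= {"SQL"}                         # like difference_update()
print(sorted(my_skills))                     # ['Docker', 'Python']

# ^= (like symmetric_difference_update()) keeps items in exactly one of the two sets
my_skills ^= {"Python", "Rust"}
print(sorted(my_skills))                     # ['Docker', 'Rust']

# In-place means the same object changes, so every name referring to it sees the update
alias = my_skills
my_skills |= {"Go"}
print(alias is my_skills, sorted(alias))     # True ['Docker', 'Go', 'Rust']
```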

10. Looping Through a Set

You can loop through a set using a for loop exactly like a list. Just remember — the order is not guaranteed, so do not write code that depends on items appearing in a specific sequence.

# A set of unique website visitors
visitors = {"user_101", "user_205", "user_307", "user_412"}

# Simple loop through each item
print("Today's unique visitors:")
for visitor in visitors:
    print(" -", visitor)

# Practical use: check each item during a loop
blocked_users = {"user_205", "user_999"}

print("\nAccess check:")
for visitor in visitors:
    if visitor in blocked_users:
        print(f"  {visitor} → BLOCKED")
    else:
        print(f"  {visitor} → Allowed")
Today's unique visitors:
 - user_101
 - user_205
 - user_307
 - user_412

Access check:
  user_101 → Allowed
  user_205 → BLOCKED
  user_307 → Allowed
  user_412 → Allowed

What just happened?

  • The for loop visits every item in the set — the syntax is identical to looping through a list.
  • The second loop combines iteration with membership testing by checking each visitor against the blocked_users set. This pattern is very common in access-control code.
  • Your output order may differ slightly — sets are unordered, so the loop visits items in Python's internal arrangement.
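If you only need the final groups rather than a line-by-line report, the set operations from earlier can replace the loop entirely. You can then use sorted() to print the results in a predictable order. This sketch uses the same visitors and blocked_users sets:

```python
visitors = {"user_101", "user_205", "user_307", "user_412"}
blocked_users = {"user_205", "user_999"}

# Set difference answers "who is allowed?" without an explicit if/else loop
allowed = visitors - blocked_users

# sorted() gives a predictable order when printing
for visitor in sorted(allowed):
    print(" -", visitor)

# Intersection answers "which of today's visitors are blocked?"
print(sorted(visitors & blocked_users))   # ['user_205']
```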

11. Converting Between Set, List, and Tuple

You can convert freely between sets, lists, and tuples. The most common use case is converting a list to a set to remove duplicates, then converting back to a list if you need indexing or ordering again.

# A list with many duplicate entries
raw_tags = ["python", "coding", "python", "tutorial", "coding", "beginner"]
print("Original list:", raw_tags)
print("Length:", len(raw_tags))

# Convert to set → duplicates are removed automatically
unique_tags = set(raw_tags)
print("\nAfter set conversion:", unique_tags)
print("Unique count:", len(unique_tags))

# Convert back to list if you need to sort or index
sorted_tags = sorted(list(unique_tags))    # sorted() returns a sorted list
print("Sorted unique tags:", sorted_tags)

# Convert to tuple if you want the result to be immutable
frozen_tags = tuple(unique_tags)
print("As tuple:", frozen_tags)
Original list: ['python', 'coding', 'python', 'tutorial', 'coding', 'beginner']
Length: 6

After set conversion: {'coding', 'python', 'tutorial', 'beginner'}
Unique count: 4
Sorted unique tags: ['beginner', 'coding', 'python', 'tutorial']
As tuple: ('coding', 'python', 'tutorial', 'beginner')

What just happened?

  • The list had 6 items with 2 pairs of duplicates. After converting to a set, only 4 unique items remained.
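  • sorted(list(unique_tags)) converts the set to a list and sorts it alphabetically, which gives you a sorted, duplicate-free list. Because sorted() accepts any iterable and always returns a new list, sorted(unique_tags) produces the same result without the list() call.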
  • Converting to a tuple at the end creates an immutable, duplicate-free, fixed collection — useful when you want to protect the data from being changed.

12. Frozenset — An Immutable Set

A frozenset is exactly like a set, but it is immutable — you cannot add or remove items after creation. Because it is immutable, a frozenset is hashable and can be used as a dictionary key or stored inside another set — something a regular set cannot do.

# Creating a frozenset
permissions = frozenset({"read", "write", "execute"})
print("Frozenset:", permissions)
print("Type:", type(permissions))

# Membership testing works just like a regular set
print("Has read access:", "read" in permissions)

# Trying to add or remove will cause an error
try:
    permissions.add("delete")
except AttributeError as e:
    print("Error:", e)

# Practical use: frozenset as a dictionary key
# (regular sets cannot be dictionary keys — they are unhashable)
role_permissions = {
    frozenset({"read"})               : "viewer",
    frozenset({"read", "write"})      : "editor",
    frozenset({"read", "write", "execute"}): "admin"
}

my_access = frozenset({"read", "write"})
print("\nYour role:", role_permissions[my_access])
Frozenset: frozenset({'read', 'write', 'execute'})
Type: <class 'frozenset'>
Has read access: True
Error: 'frozenset' object has no attribute 'add'

Your role: editor

What just happened?

  • A frozenset looks and behaves like a set for reading and membership testing.
  • Calling .add() raises an AttributeError because frozensets have no methods to modify them.
  • The dictionary example shows a real-world use — mapping a fixed set of permissions to a role. Since the permission sets are frozen (will never change), they can safely be used as dictionary keys.
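A frozenset can also be stored inside another set, as the introduction to this section mentioned. The short sketch below shows this, along with what happens if you try the same thing with a regular set. The team names are only illustrative.

```python
# frozensets are hashable, so they can be items of a regular set
teams = {
    frozenset({"Amit", "Priya"}),
    frozenset({"Priya", "Amit"}),   # equal to the first team, so it is dropped
    frozenset({"Kiran"}),
}
print(len(teams))   # 2

# A regular set is unhashable, so it cannot go inside another set
nested_failed = False
try:
    nested = {{"Amit", "Priya"}}
except TypeError as error:
    nested_failed = True
    print("TypeError:", error)

# Read-only operations still work and return new frozensets
combined = frozenset({"read"}) | frozenset({"write"})
print(type(combined).__name__, sorted(combined))   # frozenset ['read', 'write']
```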

13. Real-World Example — Finding Common and Exclusive Items

Here is a complete practical example that puts multiple set operations together — the kind of code you would write in a real application to analyse and compare data between two groups.

# Products available in two different warehouses
warehouse_a = {"Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"}
warehouse_b = {"Mouse", "Keyboard", "Headphones", "Webcam", "Desk Lamp"}

# Products available in BOTH warehouses
common = warehouse_a & warehouse_b
print("Available in both  :", common)

# Products available ONLY in Warehouse A (not in B)
only_in_a = warehouse_a - warehouse_b
print("Only in Warehouse A:", only_in_a)

# Products available ONLY in Warehouse B (not in A)
only_in_b = warehouse_b - warehouse_a
print("Only in Warehouse B:", only_in_b)

# ALL unique products across both warehouses
all_products = warehouse_a | warehouse_b
print("Total unique items  :", all_products)
print("Total product count :", len(all_products))
Available in both  : {'Mouse', 'Keyboard', 'Webcam'}
Only in Warehouse A: {'Laptop', 'Monitor'}
Only in Warehouse B: {'Headphones', 'Desk Lamp'}
Total unique items  : {'Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones', 'Desk Lamp'}
Total product count : 7

What just happened?

  • Four real business questions were answered in four lines using set operations — no loops, no conditional checks.
  • This exact pattern is used in inventory management systems, e-commerce stock comparison tools, and data reconciliation pipelines.
  • Notice how clean and readable the code is — the symbols &, -, and | communicate the intent instantly.
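If you compare groups like this often, you could wrap the four operations in a small function. compare_groups below is a hypothetical helper written for this lesson, not a built-in. Because it converts its inputs to sets, it also works with lists.

```python
def compare_groups(first, second):
    """Hypothetical helper: summarise how two collections overlap.

    Both arguments can be any iterables; they are converted to sets.
    """
    a, b = set(first), set(second)
    return {
        "common": a & b,   # in both
        "only_a": a - b,   # only in the first
        "only_b": b - a,   # only in the second
        "all": a | b,      # everything, no duplicates
    }


warehouse_a = {"Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"}
warehouse_b = {"Mouse", "Keyboard", "Headphones", "Webcam", "Desk Lamp"}

report = compare_groups(warehouse_a, warehouse_b)
for label, items in report.items():
    # sorted() makes the printed order predictable
    print(f"{label:<7}: {sorted(items)}")
```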

14. Lesson Summary

Here is your complete reference for everything covered in Lesson 13. Use this as a cheat sheet while practising.

Concept           Syntax / Example      What It Does
Create a set      s = {1, 2, 3}         Unordered, unique items only
Empty set         s = set()             Must use set(), not {}
From list         set([1,2,2,3])        Removes duplicates automatically
Add item          s.add("x")            Adds one item; ignores duplicates
Remove (strict)   s.remove("x")         Removes item; KeyError if missing
Remove (safe)     s.discard("x")        Removes item; no error if missing
Remove (any)      s.pop()               Removes and returns an arbitrary item
Membership        "x" in s              Fast True/False check
Union             a | b                 All items from both sets
Intersection      a & b                 Items in BOTH sets
Difference        a - b                 Items in a but NOT in b
Symmetric diff    a ^ b                 Items in one but NOT both
Subset check      a.issubset(b)         True if all of a is inside b
Superset check    a.issuperset(b)       True if a contains all of b
Disjoint check    a.isdisjoint(b)       True if no items are shared
In-place merge    a.update(b)           Adds b into a directly
Frozenset         frozenset({1,2,3})    Immutable set; usable as dict key
Set to list       list(s)               Converts set to a list

🧪 Practice Questions

Answer based on what you learned in this lesson.

1. What is the correct way to create an empty set in Python?

2. Which method removes an item from a set WITHOUT raising an error if the item is not found?

3. Which operator is used to find items that exist in BOTH sets (intersection)?

4. What type of set is immutable and can be used as a dictionary key?

5. Sets are __________, meaning you cannot rely on items appearing in a specific sequence.

🎯 Quiz — Test Your Understanding

Q1. What is the output of set([1, 2, 2, 3, 3, 3])?

Q2. What error does s.remove("x") raise if "x" is not in the set?

Q3. What is the result of {1, 2, 3} & {2, 3, 4}?

Q4. What does {1, 2}.issubset({1, 2, 3, 4}) return?

Q5. What is the result of {1, 2, 3} ^ {2, 3, 4}?