Collision Handling
In the previous lesson, we introduced Hash Tables and understood how hash functions allow near constant-time insertion and search.
However, we also discovered an unavoidable problem — collisions.
In this lesson, we will deeply understand why collisions happen and how professional systems handle them efficiently.
Why Collisions Are Inevitable
A hash table has a limited number of slots.
But the number of possible keys can be extremely large.
This means two different keys can produce the same hash value, even with a good hash function.
This situation is called a collision.
What Happens If We Ignore Collisions?
If collisions are not handled properly:
One value may overwrite another, or search operations may return incorrect results.
In real systems, this would be disastrous.
Main Collision Handling Techniques
There are two major families of collision resolution techniques:
- Chaining
- Open Addressing
Let us understand each one clearly.
Chaining (Separate Chaining)
In chaining, each index of the hash table stores a list (or linked list) of values.
When multiple keys hash to the same index, they are stored together in that list.
table_size = 10
hash_table = [[] for _ in range(table_size)]
def insert(key, value):
index = key % table_size
hash_table[index].append((key, value))
This approach is simple and very reliable.
Searching with Chaining
def search(key):
index = key % table_size
for k, v in hash_table[index]:
if k == key:
return v
return None
Search time depends on how many items are stored in each chain.
Time Complexity of Chaining
Average case:
O(1) when the table is well-distributed.
Worst case:
O(n) when all keys fall into one bucket.
Open Addressing
In open addressing, all values are stored directly inside the hash table.
If a collision occurs, we search for another empty slot using a predefined probing sequence.
Linear Probing
Linear probing checks the next available slot sequentially.
def insert_linear(key, value):
index = key % table_size
while hash_table[index] is not None:
index = (index + 1) % table_size
hash_table[index] = (key, value)
This method is simple but can cause clustering.
Primary Clustering Problem
When values cluster together, future collisions take longer to resolve.
This reduces performance significantly.
Quadratic Probing (Brief)
Instead of checking sequentially, quadratic probing jumps in increasing steps.
This reduces clustering but is more complex.
Chaining vs Open Addressing
Chaining:
- Easier to implement
- Handles high load factors well
Open Addressing:
- Uses memory efficiently
- Performance degrades faster with load
Real-World Systems
Most modern languages use chaining or hybrid approaches:
- Python dictionaries
- Java HashMap
- Database hash indexes
Exercises
Exercise 1:
Why does chaining avoid overwriting data?
Exercise 2:
What is the main drawback of linear probing?
Exercise 3:
Which approach handles high load factor better?
Quick Quiz
Q1. What causes collisions?
Q2. Which collision method stores lists?
In the next lesson, we will explore hashing applications and see how hash tables are used in real systems.