Algorithms Lesson 43 – Collision Handling | Dataplexa

Collision Handling

In the previous lesson, we introduced Hash Tables and understood how hash functions allow near constant-time insertion and search.

However, we also discovered an unavoidable problem — collisions.

In this lesson, we will deeply understand why collisions happen and how professional systems handle them efficiently.


Why Collisions Are Inevitable

A hash table has a limited number of slots.

But the number of possible keys can be extremely large.

This means two different keys can produce the same hash value, even with a good hash function.

This situation is called a collision.


What Happens If We Ignore Collisions?

If collisions are not handled properly:

One value may overwrite another, or search operations may return incorrect results.

In real systems, this would be disastrous.


Main Collision Handling Techniques

There are two major families of collision resolution techniques:

  • Chaining
  • Open Addressing

Let us understand each one clearly.


Chaining (Separate Chaining)

In chaining, each index of the hash table stores a list (or linked list) of values.

When multiple keys hash to the same index, they are stored together in that list.

table_size = 10
hash_table = [[] for _ in range(table_size)]

def insert(key, value):
    index = key % table_size
    hash_table[index].append((key, value))

This approach is simple and very reliable.


Searching with Chaining

def search(key):
    index = key % table_size
    for k, v in hash_table[index]:
        if k == key:
            return v
    return None

Search time depends on how many items are stored in each chain.


Time Complexity of Chaining

Average case:

O(1) when the table is well-distributed.

Worst case:

O(n) when all keys fall into one bucket.


Open Addressing

In open addressing, all values are stored directly inside the hash table.

If a collision occurs, we search for another empty slot using a predefined probing sequence.


Linear Probing

Linear probing checks the next available slot sequentially.

def insert_linear(key, value):
    index = key % table_size
    while hash_table[index] is not None:
        index = (index + 1) % table_size
    hash_table[index] = (key, value)

This method is simple but can cause clustering.


Primary Clustering Problem

When values cluster together, future collisions take longer to resolve.

This reduces performance significantly.


Quadratic Probing (Brief)

Instead of checking sequentially, quadratic probing jumps in increasing steps.

This reduces clustering but is more complex.


Chaining vs Open Addressing

Chaining:

  • Easier to implement
  • Handles high load factors well

Open Addressing:

  • Uses memory efficiently
  • Performance degrades faster with load

Real-World Systems

Most modern languages use chaining or hybrid approaches:

  • Python dictionaries
  • Java HashMap
  • Database hash indexes

Exercises

Exercise 1:
Why does chaining avoid overwriting data?

Because multiple values are stored as a list at the same index.

Exercise 2:
What is the main drawback of linear probing?

Primary clustering that slows down operations.

Exercise 3:
Which approach handles high load factor better?

Chaining.

Quick Quiz

Q1. What causes collisions?

Different keys mapping to the same index.

Q2. Which collision method stores lists?

Chaining.

In the next lesson, we will explore hashing applications and see how hash tables are used in real systems.