Huffman Coding
In the previous lesson, we studied the Fractional Knapsack Problem and saw how greedy choices based on ratios lead to optimal solutions.
In this lesson, we explore another powerful greedy algorithm: Huffman Coding.
Huffman Coding is widely used in data compression and is a core building block of formats like ZIP (via DEFLATE), JPEG, and MP3.
What Is Huffman Coding?
Huffman Coding is an algorithm used to compress data by assigning variable-length binary codes to characters.
The key idea is simple:
Characters that appear more frequently get shorter codes, while rare characters get longer codes.
This reduces the total number of bits needed to represent the data.
Why Compression Matters
Imagine sending a large text file over the internet.
If we can reduce the file size, we save:
• Storage space
• Transmission time
• Network bandwidth
Huffman Coding achieves this without losing any information.
Greedy Strategy Behind Huffman Coding
Huffman Coding uses a greedy approach based on frequency.
At each step:
We merge the two nodes with the smallest frequencies (initially single characters, later whole subtrees).
This local greedy decision leads to a globally optimal prefix code.
Important Property: Prefix Codes
Huffman Codes are prefix-free.
This means:
No character’s code is a prefix of another character’s code.
Because of this property, the encoded data can be decoded uniquely without ambiguity.
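A minimal sketch of why this works, using a hypothetical three-character code table (the codes here are illustrative, not produced by a Huffman tree): decoding is a single left-to-right scan that accumulates bits until they match a code, emits the character, and resets.

```python
# Hypothetical prefix-free code table: no code is a prefix of another.
codes = {"A": "0", "B": "10", "C": "11"}
decode_table = {bits: ch for ch, bits in codes.items()}

def decode(bitstring):
    """Scan left to right; emit a character as soon as the buffer matches a code."""
    out, buffer = [], ""
    for bit in bitstring:
        buffer += bit
        if buffer in decode_table:   # unambiguous because codes are prefix-free
            out.append(decode_table[buffer])
            buffer = ""
    return "".join(out)

print(decode("010100"))  # "0" + "10" + "10" + "0" -> "ABBA"
```

Because no code is a prefix of another, the first match in the buffer is always the right one, so no backtracking is ever needed.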
Step-by-Step Construction
Let us walk through the process conceptually.
Step 1: Calculate the frequency of each character.
Step 2: Create a leaf node for each character and insert them into a min-heap based on frequency.
Step 3: Remove the two nodes with the smallest frequencies.
Step 4: Create a new internal node with frequency equal to their sum and insert it back into the heap.
Step 5: Repeat until only one node remains.
This final node becomes the root of the Huffman Tree.
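Steps 2 through 5 can be sketched with Python's heapq, tracking only the frequencies (a toy frequency set, chosen just to show the merge order):

```python
import heapq

# Toy frequencies; we record only the merge order, not the tree itself.
freqs = [2, 3, 5]
heapq.heapify(freqs)                 # Step 2: build the min-heap

merges = []
while len(freqs) > 1:
    a = heapq.heappop(freqs)         # Step 3: the two smallest frequencies
    b = heapq.heappop(freqs)
    merges.append((a, b))
    heapq.heappush(freqs, a + b)     # Step 4: insert their sum back

print(merges)    # [(2, 3), (5, 5)]
print(freqs[0])  # Step 5: one node remains; its frequency is the total, 10
```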
Simple Example
Suppose we have the following characters:
A: 5
B: 9
C: 12
D: 13
E: 16
F: 45
Characters with smaller frequencies end up deeper in the tree, receiving longer binary codes.
Frequent characters stay near the root and get shorter codes.
Pseudocode Overview
create a min-heap of nodes keyed by frequency
while heap size > 1:
    left  = extract-min from the heap
    right = extract-min from the heap
    create a new internal node newNode
    newNode.frequency = left.frequency + right.frequency
    newNode.left  = left
    newNode.right = right
    insert newNode into the heap
return the remaining node (the root of the tree)
Once the tree is built, binary codes are generated by traversing left (0) and right (1).
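The whole pipeline can be sketched as a short Python program, run here on the example frequencies from above. (Representation choices such as the tie-breaking counter are implementation details; different implementations can assign different bit patterns, but the code lengths they produce for these frequencies agree.)

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree and return {character: bitstring}."""
    tick = count()  # tie-breaker so heapq never has to compare tree nodes
    # Heap entries are (frequency, tie-breaker, tree); a leaf is a character,
    # an internal node is a (left, right) pair.
    heap = [(f, next(tick), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")      # left edge emits 0
            walk(node[1], prefix + "1")      # right edge emits 1
        else:                                # leaf: record the accumulated bits
            codes[node] = prefix
    walk(root, "")
    return codes

freq = {"A": 5, "B": 9, "C": 12, "D": 13, "E": 16, "F": 45}
codes = huffman_codes(freq)
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
print({ch: len(codes[ch]) for ch in sorted(freq)})  # code length per character
print(total_bits)
```

For these frequencies the most common character, F, ends up with a 1-bit code while the rarest, A and B, get 4-bit codes.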
Time Complexity
Let n be the number of characters.
Building the initial heap takes O(n).
The loop then runs n − 1 times, and each iteration performs a constant number of heap extractions and insertions, each costing O(log n).
Overall time complexity:
O(n log n)
Why Greedy Works Here
By always combining the two least frequent nodes first, we push the rarest characters deepest into the tree, so the longest codes go to the characters used least.
This minimizes the weighted total length of the encoding, that is, the sum of frequency × code length over all characters, which is exactly the quantity an optimal prefix code must minimize.
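A quick arithmetic check under an assumed scenario: the six example characters above occur 100 times in total, so a fixed-length code would need 3 bits per character, while code lengths derived from an optimal tree for those frequencies (between 1 and 4 bits) give a smaller weighted total.

```python
freq = {"A": 5, "B": 9, "C": 12, "D": 13, "E": 16, "F": 45}  # 100 symbols total
# Code lengths an optimal Huffman tree yields for these frequencies.
huffman_len = {"A": 4, "B": 4, "C": 3, "D": 3, "E": 3, "F": 1}

fixed_bits = sum(freq.values()) * 3   # 3 bits suffice for 6 distinct symbols
huffman_bits = sum(freq[ch] * huffman_len[ch] for ch in freq)
print(fixed_bits, huffman_bits)       # 300 224 -> roughly 25% fewer bits
```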
Mini Practice
Think about this:
- Why would assigning long codes to frequent characters be inefficient?
- How does prefix-free coding simplify decoding?
Exercises
Exercise 1:
What is the greedy choice in Huffman Coding?
Exercise 2:
Why must Huffman codes be prefix-free?
Exercise 3:
Is Huffman Coding lossy or lossless?
Quick Quiz
Q1. What data structure is used to build Huffman Trees?
Q2. What is the main advantage of Huffman Coding?
Huffman Coding shows how greedy strategies can be applied to real-world systems like file compression and media storage.
In the next lesson, we will explore graph-based greedy algorithms starting with Dijkstra’s Shortest Path Algorithm.