Algorithms Lesson 35 – MST-Kruskal| Dataplexa

Minimum Spanning Tree – Kruskal’s Algorithm

In the previous lesson, we studied Prim’s algorithm, which builds a Minimum Spanning Tree by growing outward from a chosen starting node.

In this lesson, we explore another powerful approach — Kruskal’s Algorithm.

While Prim focuses on vertices, Kruskal focuses on edges. Understanding both gives you strong control over graph optimization problems.


Recap: What Is a Minimum Spanning Tree?

A Minimum Spanning Tree (MST) connects all vertices of a weighted, undirected graph such that:

  • No cycles are formed
  • All vertices are connected
  • Total edge weight is minimized

Multiple MSTs may exist, but all have the same minimum total cost.


The Core Idea of Kruskal’s Algorithm

Kruskal’s algorithm follows a simple greedy idea:

Always pick the cheapest edge available that does not create a cycle.

Instead of growing from a single node, Kruskal builds the MST by combining smaller components until the entire graph becomes connected.


How Kruskal’s Algorithm Works

The algorithm proceeds step by step:

  • Sort all edges by increasing weight
  • Pick the smallest edge
  • Add it if it does not form a cycle
  • Repeat until all vertices are connected

Cycle detection is the key challenge here.


Why Cycle Detection Is Important

Adding an edge that forms a cycle means adding unnecessary cost.

To efficiently detect cycles, Kruskal’s algorithm uses a data structure called Disjoint Set Union (Union-Find).


Union-Find (Disjoint Set) Concept

Union-Find helps answer two questions efficiently:

  • Are two vertices already connected?
  • If not, connect them

This allows Kruskal’s algorithm to remain fast even for large graphs.


Example Graph

We will use the same weighted graph as before:

edges = [
    ('A', 'B', 2),
    ('A', 'C', 3),
    ('B', 'C', 1),
    ('B', 'D', 1),
    ('C', 'D', 4)
]

Kruskal’s Algorithm – Implementation

Below is a clean implementation using Union-Find.

parent = {}

def find(node):
    if parent[node] != node:
        parent[node] = find(parent[node])
    return parent[node]

def union(a, b):
    rootA = find(a)
    rootB = find(b)
    if rootA != rootB:
        parent[rootB] = rootA

def kruskal(vertices, edges):
    for v in vertices:
        parent[v] = v

    mst_cost = 0
    edges.sort(key=lambda x: x[2])

    for u, v, weight in edges:
        if find(u) != find(v):
            union(u, v)
            mst_cost += weight

    return mst_cost

vertices = {'A', 'B', 'C', 'D'}
print(kruskal(vertices, edges))

This algorithm guarantees the minimum total edge weight.


How Kruskal Thinks

Kruskal’s algorithm does not care where the edge is located.

It only asks:

  • Is this the cheapest edge available?
  • Will adding it create a cycle?

If the answer to the second question is no, the edge is accepted.


Time Complexity

The dominant cost comes from sorting edges.

Time complexity:

O(E log E)

Union-Find operations are nearly constant time.


Kruskal vs Prim

Both algorithms solve the same problem, but their behavior differs.

Prim grows a single tree from a start node.

Kruskal builds multiple small trees and merges them gradually.

In sparse graphs, Kruskal is often preferred.


Real-World Interpretation

Imagine connecting cities with roads.

Kruskal’s approach:

Build the cheapest roads first, regardless of where they are, as long as they don’t create loops.

Eventually, all cities are connected with minimum total cost.


Exercises

Exercise 1:
Why must edges be sorted in Kruskal’s algorithm?

To ensure the algorithm always picks the lowest-cost edge first.

Exercise 2:
What happens if cycle detection is skipped?

The result may contain cycles and will not be a valid MST.

Exercise 3:
Which data structure helps Kruskal avoid cycles?

Union-Find (Disjoint Set Union).

Quick Quiz

Q1. Is Kruskal’s algorithm greedy?

Yes. It always chooses the cheapest valid edge.

Q2. Does Kruskal require a starting node?

No. It works globally on sorted edges.

In the next lesson, we will move from MSTs to shortest path algorithms and begin with the Bellman–Ford algorithm.