Pandas Lesson 26 – MultiIndex | Dataplexa

MultiIndex (Hierarchical Indexing) in Pandas

As datasets grow more complex, a single index is often not enough. In real-world data, information is usually grouped by multiple levels, such as year and month, country and city, or category and sub-category.

Pandas solves this using MultiIndex, also called hierarchical indexing.


What Is a MultiIndex?

A MultiIndex allows a DataFrame or Series to have more than one index level.

Think of it like an Excel pivot table, where rows are grouped by multiple fields.

Example scenarios:

  • Sales by year and month
  • Revenue by country and city
  • Scores by class and student

Creating a MultiIndex Using GroupBy

The most common way to create a MultiIndex is by grouping data using multiple columns.

Example: Group sales by region and product.

import pandas as pd

sales = pd.read_csv("dataplexa_pandas_sales.csv")

grouped = sales.groupby(["region", "product"])["sales_amount"].sum()

The result is a Series with two index levels:

  • Level 0 → region
  • Level 1 → product

Viewing MultiIndex Levels

You can inspect the levels of a MultiIndex.

grouped.index

This shows all index levels and their unique values.


Selecting Data from a MultiIndex

To access specific values, you can use tuples inside loc.

Example: Get sales for North region and Laptop product.

grouped.loc[("North", "Laptop")]

Selecting One Level Only

You can slice one level of the index while keeping the other.

Example: All products in the South region.

grouped.loc["South"]

Using Index Slicing

For advanced slicing, Pandas provides IndexSlice.

idx = pd.IndexSlice

grouped.loc[idx[:, "Phone"]]

This selects all regions where the product is Phone.


Converting MultiIndex to Columns

Sometimes MultiIndex is hard to work with. You can convert it back into columns.

grouped.reset_index()

This turns index levels back into regular columns.


Setting a MultiIndex Manually

You can also create a MultiIndex by setting multiple columns as index.

sales.set_index(["region", "product"], inplace=True)

Sorting a MultiIndex

Sorting improves performance and readability.

sales.sort_index(inplace=True)

Why MultiIndex Matters

MultiIndex allows you to:

  • Represent complex relationships cleanly
  • Perform advanced slicing and aggregation
  • Analyze grouped data efficiently

However, avoid MultiIndex when simple columns are enough. Clarity is always more important than cleverness.


Practice Exercise

Using the dataset:

  • Group sales by region and product
  • Extract data for one region
  • Reset the index back to columns
  • Sort the MultiIndex

What’s Next?

In the next lesson, you will learn Advanced Filtering, including complex conditions and logical operations.