MultiIndex (Hierarchical Indexing) in Pandas
As datasets grow more complex, a single index is often not enough. In real-world data, information is usually grouped by multiple levels, such as year and month, country and city, or category and sub-category.
Pandas solves this using MultiIndex, also called hierarchical indexing.
What Is a MultiIndex?
A MultiIndex allows a DataFrame or Series to have more than one index level.
Think of it like an Excel pivot table, where rows are grouped by multiple fields.
Example scenarios:
- Sales by year and month
- Revenue by country and city
- Scores by class and student
Creating a MultiIndex Using GroupBy
The most common way to create a MultiIndex is by grouping data using multiple columns.
Example: Group sales by region and product.
import pandas as pd
sales = pd.read_csv("dataplexa_pandas_sales.csv")
grouped = sales.groupby(["region", "product"])["sales_amount"].sum()
The result is a Series with two index levels:
- Level 0 → region
- Level 1 → product
Viewing MultiIndex Levels
You can inspect the levels of a MultiIndex.
grouped.index
This shows all index levels and their unique values.
Selecting Data from a MultiIndex
To access specific values,
you can use tuples inside loc.
Example: Get sales for North region and Laptop product.
grouped.loc[("North", "Laptop")]
Selecting One Level Only
You can slice one level of the index while keeping the other.
Example: All products in the South region.
grouped.loc["South"]
Using Index Slicing
For advanced slicing, Pandas provides IndexSlice.
idx = pd.IndexSlice
grouped.loc[idx[:, "Phone"]]
This selects all regions where the product is Phone.
Converting MultiIndex to Columns
Sometimes MultiIndex is hard to work with. You can convert it back into columns.
grouped.reset_index()
This turns index levels back into regular columns.
Setting a MultiIndex Manually
You can also create a MultiIndex by setting multiple columns as index.
sales.set_index(["region", "product"], inplace=True)
Sorting a MultiIndex
Sorting improves performance and readability.
sales.sort_index(inplace=True)
Why MultiIndex Matters
MultiIndex allows you to:
- Represent complex relationships cleanly
- Perform advanced slicing and aggregation
- Analyze grouped data efficiently
However, avoid MultiIndex when simple columns are enough. Clarity is always more important than cleverness.
Practice Exercise
Using the dataset:
- Group sales by region and product
- Extract data for one region
- Reset the index back to columns
- Sort the MultiIndex
What’s Next?
In the next lesson, you will learn Advanced Filtering, including complex conditions and logical operations.