Working with Indexes in Pandas
In Pandas, an index holds the labels that identify the rows of a DataFrame. While it may look like just row numbers, the index plays a critical role in data selection, alignment, and performance. Index labels are usually unique, but Pandas does not require them to be.
In this lesson, you will learn how to view, set, reset, and work effectively with indexes.
Loading the Dataset
We continue with the dataset used in previous lessons.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Understanding the Default Index
By default, Pandas assigns an integer-based index starting from 0.
df.head()
The numbers on the left side of the DataFrame are the index values.
Viewing Index Information
You can inspect the index directly using:
df.index
For the default index, this returns a RangeIndex showing its start, stop, and step. The stop value equals the number of rows.
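As a minimal sketch of inspecting the default index, the example below uses a small hypothetical DataFrame in place of the lesson's CSV file. The values and the amount column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV; values and the "amount" column are assumptions.
sales = pd.DataFrame({
    "order_id": [1003, 1001, 1005],
    "amount": [250.0, 120.5, 99.9],
})

idx = sales.index
print(type(idx).__name__)             # kind of index currently in use
print(idx.start, idx.stop, idx.step)  # range covered by the default index
print(len(idx))                       # number of rows
```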
Setting a Column as Index
Often, a column like an order ID or date is better suited as an index.
Example: Set order_id as the index.
df.set_index("order_id", inplace=True)
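Now each row is identified by its order_id, and order_id is no longer a regular column. Rows are uniquely identified only if the order_id values themselves are unique.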
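The sketch below shows one way to check uniqueness before setting the index, and to get a new DataFrame rather than modifying the original. The data is hypothetical and stands in for the lesson's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV; values and the "amount" column are assumptions.
sales = pd.DataFrame({
    "order_id": [1003, 1001, 1005],
    "amount": [250.0, 120.5, 99.9],
})

# Confirm the column can act as a unique row identifier before using it.
print(sales["order_id"].is_unique)

# Without inplace=True, set_index returns a new DataFrame and leaves `sales` unchanged.
# verify_integrity=True raises a ValueError if the new index contains duplicates.
indexed = sales.set_index("order_id", verify_integrity=True)
print(indexed)
```

Assigning the result to a new name keeps the original DataFrame available, which can make it easier to undo the change while experimenting.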
Accessing Rows Using the Index
Once a column becomes the index, you can access rows by label using .loc.
df.loc[1005]
This retrieves the row whose index label equals 1005. If no row has that label, Pandas raises a KeyError.
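To make the difference between label-based and position-based access concrete, here is a short sketch on hypothetical data. The index labels and the amount column are assumptions.

```python
import pandas as pd

# Hypothetical data already indexed by order_id.
indexed = pd.DataFrame(
    {"amount": [250.0, 120.5, 99.9]},
    index=pd.Index([1003, 1001, 1005], name="order_id"),
)

by_label = indexed.loc[1005]       # row whose index label is 1005
by_position = indexed.iloc[0]      # first row, whatever its label is
has_label = 9999 in indexed.index  # membership test avoids a KeyError

print(by_label)
print(by_position)
print(has_label)
```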
Resetting the Index
If you no longer want a custom index, you can reset it back to default.
df.reset_index(inplace=True)
The old index becomes a regular column again.
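A brief sketch of this behavior, using hypothetical data that is already indexed by order_id:

```python
import pandas as pd

# Hypothetical data already indexed by order_id.
indexed = pd.DataFrame(
    {"amount": [250.0, 120.5, 99.9]},
    index=pd.Index([1003, 1001, 1005], name="order_id"),
)

# reset_index moves the index back into a column and restores the default index.
restored = indexed.reset_index()
print(restored.columns.tolist())
print(restored.index)
```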
Dropping the Index While Resetting
Sometimes you don’t need the old index at all.
df.reset_index(drop=True, inplace=True)
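This discards the old index values instead of keeping them as a column. The DataFrame still has an index: the default integer index starting from 0.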
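A common case for drop=True is renumbering rows after filtering. The sketch below assumes a small hypothetical DataFrame and compares the result with and without drop=True.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV; values and the "amount" column are assumptions.
sales = pd.DataFrame({
    "order_id": [1003, 1001, 1005],
    "amount": [250.0, 50.0, 99.9],
})

# Filtering keeps the original labels, so the index now has a gap.
large = sales[sales["amount"] > 90]
print(large.index.tolist())

# drop=True discards the old labels and renumbers the rows from 0.
renumbered = large.reset_index(drop=True)
print(renumbered.index.tolist())

# Without drop=True, the old labels are kept in a new column named "index".
kept = large.reset_index()
print(kept.columns.tolist())
```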
Renaming the Index
You can give the index a meaningful name.
df.index.name = "row_number"
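As a sketch on hypothetical data, the index name can be set by assignment or with rename_axis, which returns a new DataFrame. The name row_id below is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's CSV; values and the "amount" column are assumptions.
sales = pd.DataFrame({
    "order_id": [1003, 1001, 1005],
    "amount": [250.0, 120.5, 99.9],
})

# Assigning to index.name changes the DataFrame in place.
sales.index.name = "row_number"

# rename_axis returns a renamed copy and leaves `sales` unchanged.
renamed = sales.rename_axis("row_id")

# The index name becomes the column name when the index is reset.
print(sales.reset_index().columns.tolist())
print(renamed.index.name)
```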
Sorting by Index
Rows can be sorted by their index labels, just as they can be sorted by column values.
df.sort_index(inplace=True)
This is especially useful for time-series or ID-based data.
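The following sketch sorts a hypothetical DataFrame by its order_id index in both directions. The data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data indexed by order_id, deliberately out of order.
indexed = pd.DataFrame(
    {"amount": [250.0, 120.5, 99.9]},
    index=pd.Index([1003, 1001, 1005], name="order_id"),
)

ascending = indexed.sort_index()                  # smallest label first
descending = indexed.sort_index(ascending=False)  # largest label first

print(ascending)
print(descending)
```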
Why Indexes Matter
- Faster label-based lookups, especially when the index is unique or sorted
- Cleaner row identification
- Automatic alignment on index labels in arithmetic and joins
- Essential for time-series analysis
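To illustrate alignment, the sketch below adds two hypothetical Series whose rows are stored in different orders. The names amount and shipping are assumptions. Pandas matches values by index label rather than by position.

```python
import pandas as pd

# Two hypothetical Series keyed by order_id, stored in different row orders.
amount = pd.Series(
    [100, 200, 300],
    index=pd.Index([1001, 1003, 1005], name="order_id"),
)
shipping = pd.Series(
    [30, 10, 20],
    index=pd.Index([1005, 1001, 1003], name="order_id"),
)

# Values are matched by label, not by position.
total = amount + shipping
print(total)
```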
Common Index Mistakes
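- Using non-unique values as the index without realizing it, so that .loc returns several rows for one label
- Forgetting to reset the index after filtering, which leaves gaps in the row numbers
- Dropping important columns unintentionally, for example by calling reset_index with drop=True when the index holds data you need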
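The first mistake can be spotted with a quick check. This sketch assumes a hypothetical DataFrame in which one order_id appears twice.

```python
import pandas as pd

# Hypothetical data where the label 1001 appears twice in the index.
dupes = pd.DataFrame(
    {"amount": [10.0, 20.0, 30.0]},
    index=pd.Index([1001, 1001, 1002], name="order_id"),
)

print(dupes.index.is_unique)

# With a repeated label, .loc returns a DataFrame of all matching rows.
rows = dupes.loc[1001]
print(rows)

# List the labels that occur more than once.
repeated = dupes.index[dupes.index.duplicated()]
print(repeated.tolist())
```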
Practice Exercise
Using the dataset:
- Set an ID column as the index
- Access a row using .loc
- Reset the index
- Rename the index
What’s Next?
In the next lesson, you will learn how to detect and handle duplicate data to keep datasets clean and accurate.