Sorting and Ranking Data in Pandas
After cleaning the dataset, the next step is organizing the data in a meaningful order.
Sorting and ranking help you quickly identify top values, lowest values, trends, and priorities.
Why Sorting and Ranking Matter
Sorting and ranking allow you to:
- Find highest and lowest values
- Identify top-performing products or regions
- Prepare data for reporting and visualization
- Make comparisons easier
Loading the Dataset
We continue using the cleaned dataset from previous lessons.
import pandas as pd
df = pd.read_csv("dataplexa_pandas_sales.csv")
Sorting Data by One Column
To sort data by a single column,
use the sort_values() method.
Example: Sort rows by sales in ascending order.
df.sort_values(by="sales")
This places the lowest sales values at the top.
Sorting in Descending Order
To view the highest values first,
set ascending=False.
df.sort_values(by="sales", ascending=False)
This is commonly used to find top-performing records.
Sorting by Multiple Columns
Sometimes one column is not enough. You can sort using multiple columns.
Example: Sort by region first,
then by sales within each region.
df.sort_values(by=["region", "sales"], ascending=[True, False])
This groups regions alphabetically and sorts sales from highest to lowest inside each group.
Sorting Rows by Index
You can also sort rows using the DataFrame index.
df.sort_index()
Index sorting is useful after filtering or merging datasets.
Ranking Data
Ranking assigns a numerical rank to values, based on their size.
Example: Rank sales values from lowest to highest.
df["sales_rank"] = df["sales"].rank()
Each row receives a rank number.
Ranking in Descending Order
To rank highest values as number one:
df["sales_rank_desc"] = df["sales"].rank(ascending=False)
This is useful for leaderboards and performance reports.
Handling Ties in Ranking
When two values are equal, Pandas offers different ranking methods.
average– average rank (default)min– lowest rankmax– highest rankdense– no gaps in ranking
Example using dense ranking:
df["dense_rank"] = df["sales"].rank(method="dense", ascending=False)
Selecting Top Records
After sorting, you often want only the top rows.
Example: Top 5 sales records.
df.sort_values(by="sales", ascending=False).head(5)
This is commonly used in dashboards and reports.
Practice Exercise
Using the dataset:
- Sort sales in descending order
- Find the top 5 highest sales
- Create a ranking column for sales
- Use dense ranking to handle ties
What’s Next?
In the next lesson, you will learn how to rename columns to make datasets cleaner and more readable.