Matplotlib
Master Python's foundational plotting library and build publication-ready charts from scratch
Matplotlib is the granddaddy of Python visualization. Many other Python plotting libraries either build on it or compete with it. Think of it as the Excel charts of Python: powerful and flexible, but requiring more setup than newer alternatives.
Honestly, Matplotlib gets a bad rap for being verbose. But that verbosity gives you control. Need to adjust the exact position of a legend? Change the color of specific data points? Matplotlib lets you tweak everything.
The library operates on a simple principle: figure and axes. A figure is your canvas. Axes are the individual plots on that canvas. Master this concept and everything else clicks.
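The figure/axes split is easier to see in code. The following is a minimal sketch with made-up numbers: one figure holds two axes, and each axes object is drawn on independently. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt

# One figure (the canvas) holding two axes (the individual plots)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Each axes object has its own data, title, and labels
axes[0].plot([1, 2, 3, 4], [10, 12, 9, 14])
axes[0].set_title("Left axes")

axes[1].bar(["A", "B", "C"], [5, 3, 7])
axes[1].set_title("Right axes")

# Figure-level calls apply to the whole canvas
fig.suptitle("One figure, two axes")
fig.tight_layout()

print(len(fig.axes))  # number of axes attached to this figure
```

Anything called on fig affects the canvas as a whole, while anything called on axes[0] or axes[1] affects only that plot.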
Essential Chart Types
Matplotlib supports every chart type you'll need in business analysis. But some perform better than others in real-world scenarios. Here's what actually gets used:
Line Charts
Time series, trends, continuous data. Perfect for revenue over time.
Bar Charts
Categories, comparisons. Sales by city or product category.
Scatter Plots
Correlations, relationships. Price vs rating patterns.
Pie Charts
Proportions, market share. Use sparingly — bars often work better.
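As a rough side-by-side of the four chart types listed above, the sketch below draws each one on its own axes. All numbers and city names are invented for illustration and do not come from the lesson's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Invented illustrative data
months = [1, 2, 3, 4, 5, 6]
revenue = [4.1, 4.6, 4.4, 5.2, 5.8, 6.1]
cities = ["Delhi", "Mumbai", "Pune"]
city_sales = [12, 18, 7]
prices = [200, 450, 800, 1200, 2500, 4000]
ratings = [3.9, 4.1, 4.3, 4.2, 4.0, 4.4]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(months, revenue, marker="o")   # trend over time
axes[0, 0].set_title("Line: revenue over time")

axes[0, 1].bar(cities, city_sales)             # comparison across categories
axes[0, 1].set_title("Bar: sales by city")

axes[1, 0].scatter(prices, ratings)            # relationship between two variables
axes[1, 0].set_title("Scatter: price vs rating")

axes[1, 1].pie(city_sales, labels=cities, autopct="%1.0f%%")  # proportions
axes[1, 1].set_title("Pie: share by city")

fig.tight_layout()
```

Note that the bar and pie panels show the same three numbers, which makes it easy to judge which one reads more clearly.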
Common Mistake: Overcomplicating Simple Charts
New users add too many colors, labels, and decorations. Start simple. A clean bar chart beats a fancy mess every time.
Setting Up Your First Plot
The scenario: You're analyzing Myntra's sales data and need to show revenue by product category. Your manager wants it in 30 minutes for a board presentation.
# Import the plotting library
import matplotlib.pyplot as plt
# Import pandas to load our data
import pandas as pd
# Load the ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')
Successfully imported matplotlib.pyplot and pandas Dataset loaded: 10,000 rows × 11 columns
What just happened?
We imported matplotlib.pyplot as plt — the standard alias everyone uses. The dataset contains our ecommerce transaction data with columns like revenue, product_category, and customer_age. Try this: Check df.columns to see all available fields.
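A quick way to get oriented in a freshly loaded DataFrame is sketched below. Since dataplexa_ecommerce.csv is not available here, the example builds a tiny stand-in frame with three of the column names mentioned above; the values are invented.

```python
import pandas as pd

# Tiny stand-in for the real dataset (the real file has more columns and rows)
df = pd.DataFrame({
    "product_category": ["Books", "Electronics", "Clothing"],
    "revenue": [1200.0, 45000.0, 3500.0],
    "customer_age": [34, 27, 41],
})

print(df.columns.tolist())  # column names
print(df.shape)             # (rows, columns)
print(df.dtypes)            # data type of each column
print(df.head())            # first few rows
```

The same four calls work unchanged on the frame returned by pd.read_csv().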
Now we need to summarize revenue by product category. This is basic pandas work before we plot anything:
# Group by product category and sum the revenue
category_sales = df.groupby('product_category')['revenue'].sum()
# Convert to millions for easier reading
category_sales = category_sales / 1000000
# Check what we have
print(category_sales)
product_category
Books           4.3
Clothing       19.2
Electronics    28.4
Food            8.7
Home           11.5
Name: revenue, dtype: float64
What just happened?
We grouped our 10,000 transactions by product_category and summed revenue for each. Electronics dominates at ₹28.4 million, followed by Clothing at ₹19.2 million. Books perform worst at ₹4.3 million. Try this: Use category_sales.sort_values(ascending=False) to see rankings clearly.
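The group-sum-rank pattern can be tried on a small frame first. This sketch uses five invented transactions rather than the lesson's 10,000 rows, so the totals are easy to check by hand.

```python
import pandas as pd

# Five invented transactions (not the lesson's dataset)
df = pd.DataFrame({
    "product_category": ["Books", "Electronics", "Books", "Clothing", "Electronics"],
    "revenue": [1_000_000, 5_000_000, 500_000, 2_000_000, 3_000_000],
})

# Sum revenue per category, then express it in millions
category_sales = df.groupby("product_category")["revenue"].sum() / 1_000_000

# Rank categories from highest to lowest revenue
ranked = category_sales.sort_values(ascending=False)
print(ranked)
```

groupby() returns the categories in alphabetical order, so sort_values() is what turns the summary into a ranking.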
Creating Your First Bar Chart
Time to visualize this data. Bar charts work best for categorical comparisons — perfect for our product categories:
# Create a figure and axis
fig, ax = plt.subplots(figsize=(10, 6))
# Create the bar chart
ax.bar(category_sales.index, category_sales.values)
# Add title and labels
ax.set_title('Revenue by Product Category (₹ Millions)')
ax.set_xlabel('Product Category')
ax.set_ylabel('Revenue (Millions INR)')
[Bar chart displayed] Title: Revenue by Product Category (₹ Millions) 5 bars in index order: Books (4.3), Clothing (19.2), Electronics (28.4), Food (8.7), Home (11.5)
Electronics and Clothing dominate revenue — focus inventory and marketing here
The chart immediately shows Electronics generating roughly 39% of total revenue (₹28.4M of ₹72.1M), with Clothing second at about 27%. Books, at ₹4.3M, clearly need attention: either better marketing or discontinuing slow-moving titles.
This visualization supports a clear business decision: allocate more marketing budget to Electronics and Clothing, investigate why Books underperform, and consider expanding the Home category that shows solid middle performance.
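Share-of-total figures like these are worth computing rather than estimating from bar heights. A minimal sketch, re-entering the category totals printed earlier by hand:

```python
import pandas as pd

# Category totals in ₹ millions, copied from the printed output above
category_sales = pd.Series(
    {"Books": 4.3, "Clothing": 19.2, "Electronics": 28.4, "Food": 8.7, "Home": 11.5},
    name="revenue",
)

# Each category's percentage of total revenue
share_pct = (category_sales / category_sales.sum() * 100).round(1)
print(share_pct.sort_values(ascending=False))
```

With the real data, category_sales would come straight from the groupby step instead of being typed in.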
What just happened?
We used plt.subplots() to create a figure and an axes object. The ax.bar() method drew the bars, using category names for the x-axis and revenue values as heights. The figsize=(10, 6) made the chart wide enough to read category names clearly. Try this: Add ax.tick_params(axis='x', labelrotation=45) to rotate labels if they overlap.
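If category labels overlap, they can also be rotated through the axes object itself, which keeps the whole chart in the fig/ax style. The sketch below is one possible variant, not the lesson's exact chart: it re-enters the category totals by hand and also sorts the bars from largest to smallest, which is a presentation choice added here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Category totals in ₹ millions, copied from the printed output above
category_sales = pd.Series(
    {"Books": 4.3, "Clothing": 19.2, "Electronics": 28.4, "Food": 8.7, "Home": 11.5}
)

# Sort so the tallest bar comes first
ordered = category_sales.sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(ordered.index, ordered.values)
ax.set_title("Revenue by Product Category (₹ Millions)")
ax.set_ylabel("Revenue (₹ Millions)")

# Rotate only the x-axis tick labels
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```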
Line Charts for Trends
Your Flipkart analytics team needs to track daily revenue trends over the past month. Line charts excel at showing how metrics change over time:
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])
# Group by date and sum revenue
daily_sales = df.groupby('date')['revenue'].sum() / 1000000
# Get first 10 days for cleaner visualization
daily_sales = daily_sales.head(10)
date
2023-01-05    7.2
2023-01-06    8.1
2023-01-07    6.8
2023-01-08    9.3
2023-01-09    7.9
2023-01-10    8.7
2023-01-11    7.4
2023-01-12    8.9
2023-01-13    6.5
2023-01-14    9.1
Name: revenue, dtype: float64
# Create the line chart
plt.figure(figsize=(12, 6))
plt.plot(daily_sales.index, daily_sales.values, marker='o')
plt.title('Daily Revenue Trend')
plt.xlabel('Date')
plt.ylabel('Revenue (₹ Millions)')
plt.xticks(rotation=45)
[Line chart displayed] X-axis: Dates from 2023-01-05 to 2023-01-14 Y-axis: Revenue ranging from 6.5 to 9.3 million Line with circle markers showing daily fluctuations
Revenue fluctuates between ₹6.5-9.3M daily — identify patterns for better inventory planning
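The line chart reveals revenue volatility: some days hit ₹9+ million while others drop to ₹6.5 million. Notice the peaks on January 8th, 12th, and 14th. They fall on three different weekdays (a Sunday, a Thursday, and a Saturday), so ten days of data is too short to tell whether certain days of the week perform better. A longer window is needed to confirm any weekly pattern.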
This trend analysis helps with staffing decisions and inventory management. If Mondays consistently underperform, adjust marketing campaigns or customer service hours accordingly.
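Whether a particular weekday over- or underperforms can be checked directly instead of read off the chart. A minimal sketch, re-entering the ten daily totals printed above by hand:

```python
import pandas as pd

# The ten daily totals (₹ millions) from the printed output above
dates = pd.date_range("2023-01-05", periods=10, freq="D")
daily_sales = pd.Series(
    [7.2, 8.1, 6.8, 9.3, 7.9, 8.7, 7.4, 8.9, 6.5, 9.1],
    index=dates,
    name="revenue",
)

# Average revenue for each day of the week
by_weekday = daily_sales.groupby(daily_sales.index.day_name()).mean()
print(by_weekday.sort_values(ascending=False))
```

With only ten days, each weekday has one or two observations, so these averages are a starting point rather than evidence. Running the same grouping on several months of data would give a far more reliable picture.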
Scatter Plots for Relationships
Swiggy wants to understand if higher prices correlate with better ratings. Scatter plots reveal relationships between two continuous variables:
# Take a sample for cleaner visualization
sample_data = df.sample(200, random_state=42)
# Create scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(sample_data['unit_price'], sample_data['rating'], alpha=0.6)
plt.title('Price vs Rating Relationship')
plt.xlabel('Unit Price (₹)')
plt.ylabel('Customer Rating')
[Scatter plot displayed] 200 points scattered across the chart X-axis: Unit prices from ₹500 to ₹200,000 Y-axis: Ratings from 1.0 to 5.0 Points show scattered relationship pattern
No clear price-rating correlation — focus on quality over pricing strategy
The scatter plot shows no strong correlation between price and rating. Products across all price ranges receive similar ratings between 3.5-4.5. This breaks the common assumption that expensive equals better quality.
For business strategy, this means competing on price alone won't improve customer satisfaction. Focus on product quality, customer service, and delivery experience instead of premium pricing.
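A scatter plot gives a visual impression, and a correlation coefficient puts a number on it. The sketch below uses synthetic data as a stand-in for the lesson's dataset: prices and ratings are generated independently, so the coefficient should land near zero. The column names match those used above, but the values are random and prove nothing about the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: price and rating drawn independently of each other
rng = np.random.default_rng(42)
n = 2000
sample_data = pd.DataFrame({
    "unit_price": rng.uniform(500, 200_000, size=n),
    "rating": rng.uniform(1.0, 5.0, size=n).round(1),
})

# Pearson correlation (the default for Series.corr): -1 to +1, 0 means no linear relationship
r = sample_data["unit_price"].corr(sample_data["rating"])
print(f"Pearson r = {r:.3f}")
```

On the real data the same one-liner would be df['unit_price'].corr(df['rating']). A value close to zero would support the "no clear correlation" reading of the chart.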
📊 Data Insight
Products priced at ₹800-3000 maintain 4.2+ average ratings, suggesting this is the sweet spot for customer satisfaction without premium pricing pressure.
Customizing Your Charts
Basic plots work for exploration, but presentations need polish. Colors, fonts, and spacing matter when your CEO reviews quarterly results:
# Create professional-looking chart
plt.figure(figsize=(12, 7))
colors = ['#0f766e', '#1d4ed8', '#7c3aed', '#dc2626', '#d97706']
bars = plt.bar(category_sales.index, category_sales.values, color=colors)
# Add value labels on top of bars
for bar, value in zip(bars, category_sales.values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
             f'₹{value:.1f}M', ha='center', va='bottom', fontweight='bold')
[Enhanced bar chart displayed] Each bar now has a different color Value labels appear above each bar: ₹28.4M, ₹19.2M, ₹11.5M, ₹8.7M, ₹4.3M Professional color scheme applied
# Add professional styling
plt.title('Q1 2023 Revenue by Product Category', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Product Category', fontsize=12, fontweight='bold')
plt.ylabel('Revenue (₹ Millions)', fontsize=12, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
[Final polished chart displayed] Title: "Q1 2023 Revenue by Product Category" in bold Horizontal gridlines for easier value reading Tight layout removes excess whitespace Professional presentation-ready appearance
What just happened?
We added value labels using plt.text(), centered horizontally on each bar and placed just above its top. The colors list assigned a different color to each category. plt.grid(axis='y', alpha=0.3) added subtle horizontal lines for easier value reading. Try this: Call plt.style.use('seaborn-v0_8') before plotting for an instant professional look (on Matplotlib versions older than 3.6 the style is named 'seaborn').
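On Matplotlib 3.4 or newer, value labels can also be added without computing coordinates by hand. The following sketch is an alternative to the plt.text() loop, not the lesson's own method. It uses Axes.bar_label() and re-enters the category totals manually.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Category totals in ₹ millions, copied from the printed output above
category_sales = pd.Series(
    {"Books": 4.3, "Clothing": 19.2, "Electronics": 28.4, "Food": 8.7, "Home": 11.5}
)
colors = ["#0f766e", "#1d4ed8", "#7c3aed", "#dc2626", "#d97706"]

fig, ax = plt.subplots(figsize=(12, 7))
bars = ax.bar(category_sales.index, category_sales.values, color=colors)

# bar_label places one label per bar; padding is the gap in points above the bar
labels = [f"₹{v:.1f}M" for v in category_sales.values]
texts = ax.bar_label(bars, labels=labels, padding=3, fontweight="bold")

ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)  # draw gridlines behind the bars
fig.tight_layout()
```

Because the padding is given in points rather than data units, the labels keep the same gap above the bars even if the y-axis scale changes.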
Pro Tip: Save charts directly to files using plt.savefig('chart.png', dpi=300, bbox_inches='tight') before plt.show(). High DPI ensures crisp images for presentations.
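A minimal sketch of the save step, using a small invented chart and writing to the system temp directory (the path is an assumption for illustration; any writable location works):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["Books", "Food", "Home"], [4.3, 8.7, 11.5])
ax.set_title("Revenue by Product Category")

# Save before showing or closing the figure
out_path = os.path.join(tempfile.gettempdir(), "chart.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)  # free the figure's memory once it is on disk

print(os.path.getsize(out_path) > 0)
```

Calling fig.savefig() on the figure object instead of plt.savefig() makes it explicit which figure is written, which matters once a script creates more than one.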
Quiz
1. You're creating a revenue dashboard for Zomato. What's the key difference between using plt.bar() and ax.bar() in matplotlib?
2. Your Paytm analytics team wants to analyze if customer age correlates with transaction amount. Which matplotlib function should you use?
3. Your matplotlib bar chart shows product categories but the labels overlap and are unreadable. What's the best fix for presentation to HDFC Bank executives?