Data Visualization · Lesson 27

Matplotlib

Master Python's foundational plotting library and build publication-ready charts from scratch

Matplotlib is the granddaddy of Python visualization. Every other plotting library either builds on it or competes with it. Think of it as the Excel charts of Python — powerful, flexible, but requiring more setup than newer alternatives.

Honestly, Matplotlib gets a bad rap for being verbose. But that verbosity gives you control. Need to adjust the exact position of a legend? Change the color of specific data points? Matplotlib lets you tweak everything.

The library operates on a simple principle: figure and axes. A figure is your canvas. Axes are the individual plots on that canvas. Master this concept and everything else clicks.
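To make the figure-versus-axes idea concrete, here is a minimal sketch that puts two axes on one figure. The numbers are invented for illustration and the variable names (ax_left, ax_right) are just labels chosen for this example.

```python
import matplotlib.pyplot as plt

# One figure (the canvas) holding two axes (the individual plots), side by side
fig, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Illustrative numbers only, not taken from the lesson dataset
x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

# Each axes object is drawn on and labelled independently
ax_left.plot(x, y)
ax_left.set_title('Line on the left axes')

ax_right.bar(x, y)
ax_right.set_title('Bars on the right axes')

# The figure owns both axes, so figure-level calls affect the whole canvas
fig.suptitle('One figure, two axes')
fig.tight_layout()
```

Call plt.show() afterwards to display the result. Notice that titles belong to each axes, while the overall heading and the layout belong to the figure.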

Essential Chart Types

Matplotlib supports every chart type you'll need in business analysis. But some perform better than others in real-world scenarios. Here's what actually gets used:

Line Charts

Time series, trends, continuous data. Perfect for revenue over time.

Bar Charts

Categories, comparisons. Sales by city or product category.

Scatter Plots

Correlations, relationships. Price vs rating patterns.

Pie Charts

Proportions, market share. Use sparingly — bars often work better.

Common Mistake: Overcomplicating Simple Charts

New users add too many colors, labels, and decorations. Start simple. A clean bar chart beats a fancy mess every time.

Setting Up Your First Plot

The scenario: You're analyzing Myntra's sales data and need to show revenue by product category. Your manager wants it in 30 minutes for a board presentation.

# Import the plotting library
import matplotlib.pyplot as plt
# Import pandas to load our data
import pandas as pd
# Load the ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')

What just happened?

We imported matplotlib.pyplot as plt — the standard alias everyone uses. The dataset contains our ecommerce transaction data with columns like revenue, product_category, and customer_age. Try this: Check df.columns to see all available fields.

Now we need to summarize revenue by product category. This is basic pandas work before we plot anything:

# Group by product category and sum the revenue
category_sales = df.groupby('product_category')['revenue'].sum()
# Convert to millions for easier reading
category_sales = category_sales / 1000000
# Check what we have
print(category_sales)

What just happened?

We grouped our 10,000 transactions by product_category and summed revenue for each. Electronics dominates at ₹28.4 million, followed by Clothing at ₹19.2 million. Books perform worst at ₹4.3 million. Try this: Use category_sales.sort_values(ascending=False) to see rankings clearly.
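If you want to see the group-sum-rank pattern in isolation, it can be tried on a tiny hand-made table first. The rows below are invented for illustration; only the column names product_category and revenue come from the lesson, and the names toy and toy_sales are made up for this sketch.

```python
import pandas as pd

# Hypothetical mini-dataset: values are invented, column names follow the lesson
toy = pd.DataFrame({
    'product_category': ['Electronics', 'Books', 'Clothing', 'Electronics', 'Books'],
    'revenue': [3_000_000, 500_000, 2_000_000, 1_000_000, 250_000],
})

# Sum revenue per category, convert to millions, rank from highest to lowest
toy_sales = (
    toy.groupby('product_category')['revenue'].sum()
    .div(1_000_000)
    .sort_values(ascending=False)
)
print(toy_sales)
```

Sorting before plotting also means the bars appear in ranked order, which usually makes a category comparison easier to read.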

Creating Your First Bar Chart

Time to visualize this data. Bar charts work best for categorical comparisons — perfect for our product categories:

# Create a figure and axes
fig, ax = plt.subplots(figsize=(10, 6))
# Create the bar chart
ax.bar(category_sales.index, category_sales.values)
# Add title and labels
ax.set_title('Revenue by Product Category (₹ Millions)')
ax.set_xlabel('Product Category')
ax.set_ylabel('Revenue (₹ Millions)')
# Display the chart
plt.show()

Electronics and Clothing dominate revenue — focus inventory and marketing here

The chart immediately shows Electronics generating 40% of total revenue, with Clothing in second place. Books clearly need attention: either market them better or consider discontinuing slow-moving titles.

This visualization supports a clear business decision: allocate more marketing budget to Electronics and Clothing, investigate why Books underperform, and consider expanding the Home category that shows solid middle performance.

What just happened?

We used plt.subplots() to create a figure and an axes object. The ax.bar() method drew the bars, using category names for the x-axis and revenue values for the heights. The figsize=(10, 6) argument made the chart wide enough to read category names clearly. Try this: Add ax.tick_params(axis='x', labelrotation=45) to rotate labels if they overlap.
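Because the chart was built through ax, label rotation can also be applied on the axes object itself rather than through plt. Here is a small sketch of that approach; the category names and values are hypothetical, chosen only to be long enough to overlap.

```python
import matplotlib.pyplot as plt

# Hypothetical category names and values, invented for illustration
categories = ['Electronics', 'Clothing', 'Home & Kitchen', 'Sports & Outdoors', 'Books']
revenue_millions = [5.0, 4.0, 3.0, 2.0, 1.0]

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(categories, revenue_millions)

# Rotate only the x-axis tick labels of this specific axes
ax.tick_params(axis='x', labelrotation=45)

# Recompute margins so the rotated labels are not clipped
fig.tight_layout()
```

Working on ax keeps the change tied to one plot, which matters once a figure holds several axes.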

Line Charts for Trends

Your Flipkart analytics team needs to track daily revenue trends over the past month. Line charts excel at showing how metrics change over time:

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])
# Group by date and sum revenue
daily_sales = df.groupby('date')['revenue'].sum() / 1000000
# Get first 10 days for cleaner visualization
daily_sales = daily_sales.head(10)
# Create the line chart
plt.figure(figsize=(12, 6))
plt.plot(daily_sales.index, daily_sales.values, marker='o')
plt.title('Daily Revenue Trend')
plt.xlabel('Date')
plt.ylabel('Revenue (₹ Millions)')
plt.xticks(rotation=45)
# Keep the rotated date labels inside the figure, then display it
plt.tight_layout()
plt.show()

Revenue fluctuates between ₹6.5-9.3M daily — identify patterns for better inventory planning

The line chart reveals revenue volatility: some days exceed ₹9 million while others drop to ₹6.5 million. Peaks fall on January 8th, 12th, and 14th, which may point to a day-of-week effect worth checking against a longer date range.

This trend analysis helps with staffing decisions and inventory management. If Mondays consistently underperform, adjust marketing campaigns or customer service hours accordingly.
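Rather than eyeballing peaks, a weekday pattern can be checked directly by averaging revenue per day of the week. The sketch below uses invented daily values (in ₹ millions) over two weeks; toy_daily and by_weekday are hypothetical names, and the same grouping would apply to a date-indexed series such as daily_sales.

```python
import pandas as pd

# Hypothetical daily revenue in ₹ millions for 14 consecutive days (invented values)
dates = pd.date_range('2023-01-02', periods=14, freq='D')  # 2023-01-02 is a Monday
toy_daily = pd.Series(
    [6.0, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5,
     6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5],
    index=dates,
)

# Average revenue per weekday name
by_weekday = toy_daily.groupby(toy_daily.index.day_name()).mean()

# Put the weekdays in calendar order instead of alphabetical order
order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
by_weekday = by_weekday.reindex(order)
print(by_weekday)
```

With only a handful of days per weekday the averages are noisy, so treat any pattern as a lead to investigate rather than a conclusion.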

Scatter Plots for Relationships

Swiggy wants to understand if higher prices correlate with better ratings. Scatter plots reveal relationships between two continuous variables:

# Take a sample for cleaner visualization
sample_data = df.sample(200, random_state=42)
# Create scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(sample_data['unit_price'], sample_data['rating'], alpha=0.6)
plt.title('Price vs Rating Relationship')
plt.xlabel('Unit Price (₹)')
plt.ylabel('Customer Rating')
# Display the chart
plt.show()

No clear price-rating correlation — focus on quality over pricing strategy

The scatter plot shows no strong correlation between price and rating. Products across all price ranges receive similar ratings, mostly between 3.5 and 4.5. In this sample, at least, that challenges the common assumption that expensive equals better quality.

For business strategy, this means competing on price alone won't improve customer satisfaction. Focus on product quality, customer service, and delivery experience instead of premium pricing.
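A scatter plot gives a visual impression; a correlation coefficient puts a number on it. The sketch below computes Pearson's r with pandas on synthetic data in which price and rating are generated independently, so it illustrates the "no relationship" case. The column names unit_price and rating follow the lesson; the values and the name toy are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: prices and ratings drawn independently, so no real relationship
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    'unit_price': rng.uniform(100, 5000, size=200),
    'rating': rng.uniform(3.5, 4.5, size=200),
})

# Pearson correlation: near 0 means no linear relationship, near ±1 a strong one
r = toy['unit_price'].corr(toy['rating'])
print(f'Pearson r = {r:.2f}')
```

On the real data, the equivalent call would be sample_data['unit_price'].corr(sample_data['rating']). Keep in mind that Pearson's r only measures linear relationships, so a curved pattern can still hide behind a value near zero.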

📊 Data Insight

Products priced at ₹800-3000 maintain 4.2+ average ratings, suggesting this is the sweet spot for customer satisfaction without premium pricing pressure.

Customizing Your Charts

Basic plots work for exploration, but presentations need polish. Colors, fonts, and spacing matter when your CEO reviews quarterly results:

# Create professional-looking chart
plt.figure(figsize=(12, 7))
colors = ['#0f766e', '#1d4ed8', '#7c3aed', '#dc2626', '#d97706']
bars = plt.bar(category_sales.index, category_sales.values, color=colors)
# Add value labels on top of bars
for bar, value in zip(bars, category_sales.values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
             f'₹{value:.1f}M', ha='center', va='bottom', fontweight='bold')
# Add professional styling
plt.title('Q1 2023 Revenue by Product Category', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Product Category', fontsize=12, fontweight='bold')
plt.ylabel('Revenue (₹ Millions)', fontsize=12, fontweight='bold')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

What just happened?

We added value labels using plt.text() positioned at the center of each bar. The colors list assigned a different color to each category. plt.grid(axis='y', alpha=0.3) added subtle horizontal lines for easier value reading. Try this: Use plt.style.use('seaborn-v0_8') before plotting for an instant professional look. (The style was called 'seaborn' before Matplotlib 3.6; check plt.style.available to see which names your version supports.)

Pro Tip: Save charts directly to files using plt.savefig('chart.png', dpi=300, bbox_inches='tight') before plt.show(). High DPI ensures crisp images for presentations.
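Here is a minimal sketch of the save step when charts are generated by a script with no display attached, for example on a server. The Agg backend and the temporary output directory are choices made for this example, and the chart data is a placeholder.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files without opening a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(['A', 'B', 'C'], [3, 2, 1])  # placeholder data, invented for illustration
ax.set_title('Placeholder chart')

# Write into a temporary directory so the sketch does not clutter the working directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'chart.png')

# dpi=300 gives a print-quality image; bbox_inches='tight' trims surrounding whitespace
fig.savefig(out_path, dpi=300, bbox_inches='tight')

# Free the figure's memory once it has been written
plt.close(fig)
```

Calling fig.savefig() on the figure object, rather than plt.savefig(), makes it explicit which figure is being written when several are open.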

Quiz

1. You're creating a revenue dashboard for Zomato. What's the key difference between using plt.bar() and ax.bar() in matplotlib?


2. Your Paytm analytics team wants to analyze if customer age correlates with transaction amount. Which matplotlib function should you use?


3. Your matplotlib bar chart shows product categories but the labels overlap and are unreadable. What's the best fix for presentation to HDFC Bank executives?


Up Next

Seaborn

Build statistical visualizations with less code using Seaborn's high-level interface that makes matplotlib charts beautiful by default.