Data Science Lesson 29 – Plotly | Dataplexa
Visualization · Lesson 29

Plotly

Build interactive visualizations that let users hover, zoom, and filter your ecommerce data in real time.

Why Plotly Changes Everything

Plotly creates charts that respond to user interaction. Click on a legend item — that data series disappears. Hover over a bar — see exact values. Zoom into a time period — the chart updates instantly. This isn't just pretty visualization. It's data exploration that business users actually understand.

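The magic happens through web-based rendering. Plotly draws charts in the browser with its JavaScript library, plotly.js, so any chart can be saved as an HTML file you can share, embed in dashboards, or publish online, with hover and zoom still working. A static PNG export, by contrast, freezes a single view of the data.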

Electronics dominates revenue at ₹28.4L, nearly 50% more than Clothing

Electronics clearly drives the business with ₹28.4 lakh in revenue, almost 1.5x the next category. But here's what many analysts miss: Clothing at ₹19.2L shows consistent demand across seasons, making it more predictable for inventory planning. The long tail of Food, Books, and Home adds up to ₹24.5L combined, about 86% of Electronics' total. That suggests cross-category bundling opportunities that many ecommerce teams overlook.

Setting Up Your First Interactive Chart

The scenario: BigBasket's analytics team needs to show city-wise revenue trends to regional managers. Static charts don't work — managers want to drill down into specific months and compare cities dynamically.

# Install plotly if not already available (Jupyter magic; in a terminal use: pip install plotly)
%pip install plotly
# Import the high-level plotting API
import plotly.express as px
# Import for data manipulation
import pandas as pd

What just happened?

Plotly Express (imported as px) wraps complex charts in single function calls. The pip output shows which version was installed (plotly-5.17.0 when this lesson was written; yours may be newer). Try this: in a Jupyter notebook, Plotly charts appear directly inline in the output cell, not as separate windows.

# Load the ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')
# Check the first few rows to understand structure
print(df.head())
# Verify we have the expected columns
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

What just happened?

Our dataset has 15,000 orders with all the columns we need for visualization. The revenue column holds INR values from ₹350 to ₹25,000 per order. Try this: run df.info() to see the data types. The dates are stored as strings, which we'll need to convert.
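If you'd rather have dates parsed at load time, pandas can do the conversion inside read_csv via parse_dates. This sketch uses two made-up rows in an in-memory string instead of dataplexa_ecommerce.csv, and assumes the file's date column is named date, as in the code later in this lesson.

```python
import io
import pandas as pd

# Two hypothetical rows shaped like the lesson's CSV (column names assumed)
csv_text = """date,city,product_category,revenue
2024-01-05,Mumbai,Electronics,18500
2024-01-06,Delhi,Clothing,1450
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes['date'])      # plain strings, not datetimes

parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
print(parsed.dtypes['date'])   # a datetime64 dtype, ready for a time axis
```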

Building Interactive Scatter Plots

Scatter plots reveal relationships between variables. But static scatter plots hide the story — which city drives high-value orders? What age groups show the strongest patterns? Interactive scatter plots let users explore these connections directly.

# Create interactive scatter plot for age vs revenue
fig = px.scatter(df, 
                x='customer_age',     # Age on x-axis
                y='revenue',          # Revenue on y-axis
                color='city',         # Different colors for each city
                size='quantity',      # Bubble size shows quantity
                hover_data=['product_category', 'rating'])  # Extra info on hover

What just happened?

We created a bubble chart where each point represents one order. The size='quantity' makes bigger bubbles for higher quantities, and hover_data shows category and rating when you hover. Try this: The chart object is stored but not displayed yet — we need one more step.

Age 35-45 customers in Mumbai and Delhi generate the highest revenue orders

The scatter pattern reveals something crucial: customers aged 35-45 consistently place higher-value orders across all cities. But Mumbai shows the tightest clustering in the high-revenue zone, suggesting better product-market fit. Delhi customers spread across a wider age range for high-value orders, indicating different purchasing behaviors. This insight drives targeted marketing: focus premium products on the 35-45 age group in Mumbai and use broader targeting in Delhi.

# Display the interactive chart
fig.show()
# Alternative: Save as HTML file for sharing
fig.write_html("age_revenue_analysis.html")
print("Chart saved as HTML file")

What just happened?

fig.show() renders the interactive chart in your notebook with full hover and zoom capabilities. The write_html() creates a standalone file you can email or embed anywhere. Try this: Open the HTML file in your browser — it works without Python or Jupyter.

Time Series with Interactive Filtering

The scenario: Myntra's operations team needs to track daily revenue trends but also drill down into specific product categories. Traditional charts require a separate plot for each category; an interactive time series lets users toggle categories on and off from the legend.

# Convert date column to datetime for proper time series
df['date'] = pd.to_datetime(df['date'])
# Group by date and category to get daily revenue
daily_revenue = df.groupby(['date', 'product_category'])['revenue'].sum().reset_index()
# Check the structure of our aggregated data
print(daily_revenue.head())

What just happened?

We converted the strings to datetime objects so Plotly can build a proper time axis. The groupby aggregated individual orders into daily totals per category. Try this: look at the January 5 rows in the output, where Electronics leads daily revenue at about ₹1.25L.
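Here is the same aggregation on a few hypothetical orders, so the result can be checked by hand. One detail worth knowing: groupby only creates rows for date/category pairs that actually have orders. A category with no sales on some day simply has no row, and px.line will draw straight across that gap. Unstacking with fill_value=0 and melting back to long format is one way to make those zeros explicit.

```python
import pandas as pd

# Hypothetical orders (not the lesson's dataset)
orders = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-05', '2024-01-05', '2024-01-06']),
    'product_category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
    'revenue': [12000, 8000, 1500, 2500],
})

# Daily totals per category, as in the lesson
daily_revenue = orders.groupby(['date', 'product_category'])['revenue'].sum().reset_index()
print(daily_revenue)   # 3 rows: there is no Electronics row for 2024-01-06

# Wide table (one column per category) with explicit zeros for missing pairs
wide = (orders.groupby(['date', 'product_category'])['revenue']
              .sum()
              .unstack(fill_value=0))

# Back to long format, ready for px.line(..., color='product_category')
daily_filled = wide.reset_index().melt(id_vars='date',
                                       var_name='product_category',
                                       value_name='revenue')
print(daily_filled)    # 4 rows, including Electronics = 0 on 2024-01-06
```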

# Create interactive line chart with category filtering
fig_time = px.line(daily_revenue,
                  x='date',                    # Time on x-axis
                  y='revenue',                 # Revenue on y-axis  
                  color='product_category',    # Separate line per category
                  title='Daily Revenue Trends by Category',
                  labels={'revenue': 'Revenue (INR)', 'date': 'Date'})

Electronics maintains consistent ₹1.2-1.4L daily revenue while other categories show more variation

Electronics dominates with steady ₹1.2-1.4L daily revenue, showing minimal fluctuation. This stability makes it the cash cow category — predictable income that funds business growth and new product experiments. Clothing shows the most interesting pattern with ₹45-61K daily range. The wider variation suggests promotional sensitivity or seasonal trends that operations teams need to account for in inventory planning.

# Add range selector buttons for different time periods
fig_time.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=[
                dict(count=7, label="7D", step="day", stepmode="backward"),
                dict(count=30, label="30D", step="day", stepmode="backward"),
                dict(step="all", label="All")
            ]
        ),
        type="date"    # Treat x values as dates on the axis
    )
)
# Display the chart with its range selector
fig_time.show()

What just happened?

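We added rangeselector buttons that let users switch instantly between 7-day, 30-day, and all-time views. With stepmode="backward", each button keeps the end of the visible range (initially the most recent date) and moves the start back by count × step. Try this: users can also click and drag across the chart to zoom into a custom date range, then double-click to reset the view.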

📊 Data Insight

Electronics generates 58% of total revenue but shows only 12% variation day-to-day. Clothing contributes 26% with 31% variation, making it the growth opportunity with proper demand forecasting.
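The lesson doesn't say exactly how "share of revenue" and "day-to-day variation" were computed. One reasonable reading, assumed here, is each category's share of total revenue and the coefficient of variation (standard deviation divided by mean) of its daily revenue. This sketch uses three made-up days per category so the numbers can be checked by hand.

```python
import pandas as pd

# Hypothetical daily totals (₹ thousand) for two categories
daily_revenue = pd.DataFrame({
    'product_category': ['Electronics'] * 3 + ['Clothing'] * 3,
    'revenue': [100, 110, 90, 40, 60, 50],
})

# 'std' is the sample standard deviation (pandas default ddof=1)
stats = daily_revenue.groupby('product_category')['revenue'].agg(['sum', 'mean', 'std'])
stats['share'] = stats['sum'] / stats['sum'].sum()   # fraction of total revenue
stats['cv'] = stats['std'] / stats['mean']           # assumed meaning of "variation"
print(stats[['share', 'cv']].round(3))
```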

Chart Type Selection Guide

Choosing the wrong chart type kills insights. Different data stories need different interactive approaches. Here's a practical decision framework:

| Data story | Plotly function | Interactive feature | Business impact |
|---|---|---|---|
| Category comparison | px.bar | Hover for exact values | Budget allocation decisions |
| Time trends | px.line | Zoom, range selection | Seasonal pattern detection |
| Correlations | px.scatter | Color and size encode extra variables | Customer segmentation |
| Part-of-whole | px.pie | Hover for shares; click legend entries to hide slices | Market share analysis |
| Multi-variable | px.scatter with size= (bubble chart) | hover_data adds extra fields to tooltips | Product positioning |
| Multi-metric comparison | px.line_polar with line_close=True (radar chart) | Toggle data series from the legend | Performance benchmarking |

Common Mistake: Wrong Chart for Data Type

Using px.line for categorical data draws lines between unrelated categories, implying a trend or order that doesn't exist. Use px.bar for categories and save line charts for ordered, continuous x-values such as the dates in a time series.

Advanced Interactive Features

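The scenario: Paytm's product team needs to analyze customer ratings across different cities and categories. Eventually they want one dashboard where clicking a city filters every related chart. A single Plotly figure can't link charts that way on its own (that takes a dashboard framework such as Dash), so the first building block is a per-city comparison chart with informative tooltips.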

# Aggregate per-city metrics (assumes 'returned' is stored as 0/1 or True/False)
city_metrics = df.groupby('city').agg({
    'revenue': 'mean',          # Average order value
    'rating': 'mean',           # Customer satisfaction
    'quantity': 'mean',         # Items per order
    'returned': 'mean'          # Return rate (share of orders returned)
}).round(2)
# Radar-style line_polar chart: one point per city, radius = average rating
fig_polar = px.line_polar(city_metrics.reset_index(),
                          r='rating',              # Radius shows rating
                          theta='city',            # Angle shows city
                          line_close=True,         # Connect last point to first
                          custom_data=['revenue', 'returned'],  # Values used by the hover template
                          title='City Performance Radar')
# Customize the hover information display
fig_polar.update_traces(
    hovertemplate="<b>%{theta}</b><br>"
                  "Rating: %{r}<br>"
                  "Revenue: ₹%{customdata[0]:,.0f}<br>"
                  "Return Rate: %{customdata[1]:.1%}"
                  "<extra></extra>"   # Hide the secondary box with the trace name
)
fig_polar.show()

What just happened?

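The hovertemplate builds custom tooltips with proper formatting: a ₹ prefix with thousands separators for revenue and a percentage for the return rate. The %{customdata[0]} and %{customdata[1]} placeholders only have values when those columns (here revenue and returned) are passed to the chart through px's custom_data argument. The trailing <extra></extra> removes the secondary box that normally shows the trace name. Try this: the template understands basic HTML tags such as <b> for bold text and <br> for line breaks.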

Pro tip: Always format currency and percentages in hover templates. Users expect ₹8,450 not 8450.0, and 12% not 0.12. These small details separate professional dashboards from amateur ones.

Deployment and Sharing

Interactive charts only create value when the right people can access them. Plotly offers multiple deployment options — from simple HTML files to full dashboard platforms.

# Export chart as static image for presentations
# (static export needs the kaleido package: pip install kaleido)
fig.write_image("revenue_analysis.png", width=1200, height=600)
# Save as interactive HTML for sharing via email
fig.write_html("revenue_dashboard.html",
               include_plotlyjs='cdn',    # Smaller file: plotly.js loads from a CDN
               config={'displayModeBar': False})  # Hide the toolbar (modebar)

What just happened?

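We created both static and interactive versions. include_plotlyjs='cdn' makes the HTML file load plotly.js from a CDN instead of embedding it, which keeps the file small (kilobytes instead of several megabytes). displayModeBar: False hides the modebar, the toolbar with zoom, pan, and download buttons, for a cleaner presentation; users can still drag to zoom and hover for values. Try this: open the HTML file in a browser. Because of the CDN setting it needs an internet connection; for fully offline viewing, keep the default include_plotlyjs=True, which embeds the library in the file.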

Static PNG works for PowerPoint presentations where you need consistent formatting. But HTML files let stakeholders explore data themselves — that's where real business insights happen. Users discover patterns you never thought to look for.

Quiz

1. Your ecommerce team wants to analyze the relationship between customer age and revenue, with bubble size showing quantity ordered and hover showing product category. Which Plotly approach works best?


2. You have daily sales data from January to December in string format. How do you create an interactive time series that lets users filter to specific months?


3. When sharing an interactive Plotly chart as HTML file with business stakeholders who need clean presentation, what configuration ensures optimal user experience?


Up Next

Dashboard Basics

Transform your individual Plotly charts into comprehensive business dashboards that update automatically and guide decision-making.