Data Science Lesson 31 – Advanced Visualizations | Dataplexa

Advanced Visualizations

Build multi-dimensional charts that reveal hidden patterns in complex datasets using advanced Chart.js techniques.

Chart Selection Strategy

What separates a data analyst from a data storyteller? Choosing the right chart for the number of dimensions in your data. Most analysts stick to basic bar charts, but when you analyze customer behavior across age, location, and purchase patterns simultaneously, you need more sophisticated approaches.

1. Identify Data Dimensions
2. Match Chart Type to Variables
3. Configure Advanced Options
4. Test Business Impact

| Chart Type | Variables | Best For | Avoid When |
|---|---|---|---|
| Scatter | 2 continuous | Price vs Rating correlation | Categories only |
| Bubble | 3 continuous | Age vs Revenue vs Quantity | More than 50 points |
| Radar | 5+ attributes | Customer segment profiles | Time series data |
| Stacked Bar | Category + subcategory | City-wise gender breakdown | Too many subcategories |
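The selection table can be read as a handful of rules of thumb. The sketch below encodes them in a hypothetical helper, suggest_chart (not part of any library). The 50-point bubble limit and the 5+ attribute radar threshold come from the table; the rule order and the "bar" fallback are simplifying assumptions.

```python
def suggest_chart(n_continuous: int = 0, n_points: int = 0, n_attributes: int = 0,
                  has_subcategory: bool = False, is_time_series: bool = False) -> str:
    """Suggest a chart type from the selection table's rules of thumb.

    Hypothetical helper for illustration; checks run from most to least specific.
    """
    if has_subcategory:
        return "stacked bar"  # category + subcategory breakdowns
    if n_attributes >= 5 and not is_time_series:
        return "radar"  # profile comparison across 5+ attributes
    if n_continuous == 3 and n_points <= 50:
        return "bubble"  # third variable becomes bubble size
    if n_continuous >= 2:
        return "scatter"  # pairwise relationships, also the fallback for crowded bubble data
    return "bar"  # single measure per category


print(suggest_chart(n_continuous=3, n_points=40))   # bubble
print(suggest_chart(n_continuous=3, n_points=500))  # scatter
print(suggest_chart(n_attributes=6))                # radar
```

Treat the result as a starting point for step 2, not a final answer; steps 3 and 4 still apply.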

Three-Variable Bubble Analysis

Bubble charts solve the "I need three dimensions but only have two axes" problem. The scenario: Myntra's product team wants to see how customer age relates to average order value and purchase frequency at the same time, to optimize their recommendation engine.

First, load the dataset and check its structure.
# Import pandas for data manipulation
import pandas as pd
# Read our ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')
# Check the first few rows to understand structure
print(df.head())

What just happened?

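We loaded the dataset. Alongside order_id, customer_age, and revenue, it contains the rating, city, quantity, gender, product_category, and returned columns used later in this lesson. Notice that revenue ranges from ₹650 to ₹90,000. Try this: check df.shape to see the total number of rows.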

The scenario: now aggregate the data by customer age so that bubble size can represent purchase frequency, measured here as the number of orders placed by customers of each age.
# Group by customer age to get aggregate metrics
bubble_data = df.groupby('customer_age').agg({
    'revenue': 'mean',  # Average order value (y-axis)
    'order_id': 'count',  # Purchase frequency (bubble size)
    'rating': 'mean'  # Average satisfaction
}).reset_index()
# Round values for cleaner display
bubble_data = bubble_data.round(0)

What just happened?

We created three variables: customer_age (x-axis), mean revenue (y-axis), and the order_id count (bubble size); mean rating is kept as an extra column. Customers aged 22 spend ₹28,400 on average. Try this: call bubble_data.describe() to see the distribution.
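The lesson's charts are drawn with Chart.js, whose bubble charts expect each point as an object with x, y, and r, where r is a radius in pixels that Chart.js does not rescale. A minimal sketch, assuming you export the aggregated columns as a Chart.js config in JSON; the helper names, the 20-pixel maximum radius, and the toy values are illustrative assumptions, and the option names follow Chart.js v3+.

```python
import json
import math


def to_bubble_points(ages, avg_revenue, order_counts, max_r=20.0, min_r=2.0):
    """Turn aggregated columns into Chart.js bubble points: {x, y, r}.

    The radius grows with the square root of the count, so bubble AREA is
    proportional to the number of orders (min_r keeps tiny groups visible).
    """
    biggest = max(order_counts)
    points = []
    for age, revenue, count in zip(ages, avg_revenue, order_counts):
        r = max_r * math.sqrt(count / biggest) if biggest > 0 else min_r
        points.append({"x": age, "y": revenue, "r": round(max(r, min_r), 1)})
    return points


def bubble_config(points, label="Orders by customer age"):
    """Wrap the points in a minimal Chart.js bubble config with axis titles."""
    return {
        "type": "bubble",
        "data": {"datasets": [{"label": label, "data": points}]},
        "options": {
            "scales": {
                "x": {"title": {"display": True, "text": "Customer age"}},
                "y": {"title": {"display": True, "text": "Average order value (₹)"}},
            }
        },
    }


# Toy values for illustration only (not the lesson's data)
points = to_bubble_points([22, 28, 45], [100, 200, 300], [25, 100, 16])
print(points)  # r values: 10.0, 20.0, 8.0
print(json.dumps(bubble_config(points), ensure_ascii=False))
```

With the real data, pass bubble_data['customer_age'].tolist(), bubble_data['revenue'].tolist(), and bubble_data['order_id'].tolist(); .tolist() turns NumPy integers into plain Python numbers that json.dumps accepts. Square-root scaling is a design choice: scaling the radius linearly would make a group with twice the orders look four times as large.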

Bubble size represents purchase frequency. Customers aged 28 show the highest purchase frequency with good revenue.

Reading this bubble chart: each bubble's position shows age (x) against average spending (y). Bubble size shows purchase frequency: bigger bubbles mean more orders from that age group (the order count does not distinguish repeat customers from one-time buyers). The sweet spot? Customers aged 28, with high frequency and decent revenue.

Business impact: target customers aged 25-30 for retention campaigns, since they combine high purchase frequency with growing revenue. Customers above 45 spend more per order but shop less often, which makes them good candidates for premium product recommendations.

Multi-Attribute Radar Comparison

Radar charts excel when you need to compare multiple attributes at once. Think of each shape as a segment's "fingerprint": every axis represents a different behavior metric. They are well suited to comparing city-wise customer profiles across many dimensions.

The scenario: BigBasket's regional manager needs to compare customer behavior patterns across Mumbai, Delhi, and Bangalore to optimize inventory distribution.
# Calculate city-wise metrics for radar chart
city_metrics = df.groupby('city').agg({
    'revenue': 'mean',  # Average spending power
    'rating': 'mean',   # Customer satisfaction
    'quantity': 'mean', # Average items per order
    'customer_age': 'mean'  # Average customer age
}).round(1)

What just happened?

Mumbai shows the highest average revenue (₹35,420) and rating (4.3). Delhi follows with ₹31,200 average revenue. Pune has the lowest satisfaction at 3.9. Try this: add a return rate (the mean of the returned column) to compare city loyalty.
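One way to add that return rate is pandas named aggregation, which computes every metric in a single groupby call and names the output columns directly. A minimal sketch on toy rows standing in for the lesson's df, assuming returned holds True/False values; the output column names are illustrative.

```python
import pandas as pd

# Toy rows for illustration only; the lesson's df uses the same column names
df = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Pune", "Pune"],
    "revenue": [40000, 30000, 20000, 26000],
    "rating": [4.0, 5.0, 3.0, 4.0],
    "returned": [False, True, True, True],  # assumed boolean
})

city_metrics = df.groupby("city").agg(
    avg_revenue=("revenue", "mean"),
    avg_rating=("rating", "mean"),
    return_rate=("returned", "mean"),  # mean of True/False = share of orders returned
).round(2)
print(city_metrics)
```

A lower return_rate suggests customers in that city keep what they buy, which is one reasonable proxy for loyalty.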

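The scenario: normalize these metrics to a 0-100 scale so they display properly on the radar chart axes. Average revenue runs into tens of thousands of rupees while ratings are single-digit values, so without rescaling the revenue axis would dwarf every other metric.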
# Normalize metrics to 0-100 scale for radar display
from sklearn.preprocessing import MinMaxScaler
# Initialize scaler to transform values to 0-100 range
scaler = MinMaxScaler(feature_range=(0, 100))
# Apply scaling to all metrics
normalized = scaler.fit_transform(city_metrics)
# Convert back to DataFrame with original index
radar_df = pd.DataFrame(normalized, 
                       columns=city_metrics.columns, 
                       index=city_metrics.index)

What just happened?

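Min-max scaling maps the highest city on each metric to 100 and the lowest to 0, so these scores are relative to the cities in the dataset, not absolute grades. Mumbai scores 100.00 on every metric (the top city on each one). Chennai and Pune score 0.00 on some metrics (the lowest cities there). Delhi sits in the 60-75 range. Try this: check radar_df.describe() to see the spread.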
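For each column, MinMaxScaler(feature_range=(0, 100)) computes (x - min) / (max - min) * 100. Writing that formula out in pandas makes the "relative to the other cities" behaviour explicit. A minimal sketch; the helper name and toy numbers are assumptions, and constant columns are mapped to 0 here to avoid dividing by zero.

```python
import pandas as pd


def minmax_0_100(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to 0-100 with (x - min) / (max - min) * 100."""
    lo, hi = frame.min(), frame.max()
    span = (hi - lo).replace(0, 1)  # constant columns: avoid 0/0, result is 0
    return (frame - lo) / span * 100


# Toy city metrics for illustration only
city_metrics = pd.DataFrame(
    {"revenue": [35000, 31000, 27000], "rating": [4.5, 4.0, 3.5]},
    index=["Mumbai", "Delhi", "Pune"],
)
print(minmax_0_100(city_metrics))
# Mumbai -> 100 on both, Pune -> 0 on both, Delhi -> 50 on both
```

Because the top city always lands on 100, adding or removing a city can change every other city's score.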

Mumbai dominates all metrics. Delhi shows balanced performance. Bangalore needs improvement in quantity per order.

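Reading this radar: the farther a point sits from the center, the higher that city's value on that metric relative to the other cities. Farther is not automatically better: a higher average customer age, for example, is a profile trait rather than a performance gain. Mumbai's shape is nearly full, with high revenue, ratings, quantity, and customer age. Delhi shows a balanced diamond shape. Bangalore has an uneven pattern.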
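To draw radar_df in Chart.js, each city becomes one dataset and each metric one axis label. A minimal sketch using Chart.js v3+ option names, where the radial scale is called r; radar_config and the toy scores are hypothetical. Pinning the axis to 0-100 keeps shapes comparable across cities.

```python
import json


def radar_config(scores: dict, metrics: list) -> dict:
    """Build a Chart.js radar config: one dataset per city, one axis per metric.

    `scores` maps city -> {metric: 0-100 score}, the shape returned by
    radar_df.to_dict(orient="index").
    """
    datasets = [
        {"label": city, "data": [round(float(values[m]), 1) for m in metrics]}
        for city, values in scores.items()
    ]
    return {
        "type": "radar",
        "data": {"labels": metrics, "datasets": datasets},
        # Fix the radial axis to the normalized range
        "options": {"scales": {"r": {"min": 0, "max": 100}}},
    }


# Toy normalized scores for illustration only
scores = {"Mumbai": {"revenue": 100.0, "rating": 100.0},
          "Pune": {"revenue": 0.0, "rating": 40.0}}
config = radar_config(scores, ["revenue", "rating"])
print(json.dumps(config))
```

With the real data: radar_config(radar_df.to_dict(orient="index"), list(radar_df.columns)).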

Business decision: Invest more in Mumbai inventory since all metrics are strong. Focus on increasing order quantity in Bangalore through bundle offers. Delhi needs moderate inventory expansion across all categories.

📊 Data Insight

Mumbai customers average ₹35,420 per order with a 4.3 rating, while Pune averages only ₹26,890 with a 3.9 rating. Mumbai's average order value is about 32% higher than Pune's, which suggests Mumbai needs a premium inventory focus.
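A quick check of that arithmetic. The 32% uses Pune as the baseline; measured against Mumbai, the same ₹8,530 gap is about 24%, so it helps to state which city is the baseline. pct_difference is an illustrative helper, not a library function.

```python
def pct_difference(a: float, b: float) -> float:
    """Percentage by which a differs from b, using b as the baseline."""
    return (a - b) / b * 100


mumbai_avg, pune_avg = 35420, 26890  # average order values from the Data Insight
print(f"Mumbai vs Pune: {pct_difference(mumbai_avg, pune_avg):+.1f}%")  # +31.7%
print(f"Pune vs Mumbai: {pct_difference(pune_avg, mumbai_avg):+.1f}%")  # -24.1%
```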

Stacked Analysis Patterns

Stacked charts show how the composition of a total changes across categories: think gender distribution across age groups, or product categories within cities. The trick? Make sure your subcategories add meaningful business value instead of just looking fancy.

The scenario: Flipkart's category manager wants to see how product preferences vary by gender across different cities to plan targeted advertising campaigns.
# Sum revenue by city and product category, separately for each gender
male_purchases = df[df['gender'] == 'Male'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
female_purchases = df[df['gender'] == 'Female'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
# Pivot to get cities as columns, categories as rows
male_matrix = male_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)
female_matrix = female_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)

What just happened?

Male customers spend ₹224,800 on Electronics in Mumbai vs ₹142,800 by female customers. But female customers spend ₹67,800 on Clothing vs ₹42,300 by male customers. Clear gender preferences emerge. Try this: calculate the percentage differences between genders.
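For that percentage comparison, a single pivot_table call can replace the two groupby-plus-pivot pairs: gender and city both go into the columns, and selecting matrix["Male"] or matrix["Female"] gives the same category-by-city layout as male_matrix and female_matrix. A minimal sketch on toy rows standing in for the lesson's df; the cell-by-cell percentage gap treats female revenue as the baseline, which is an assumption you may want to flip.

```python
import pandas as pd

# Toy rows for illustration only; the lesson's df uses the same column names
df = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Mumbai", "Mumbai"],
    "product_category": ["Electronics", "Electronics", "Clothing", "Clothing"],
    "gender": ["Male", "Female", "Male", "Female"],
    "revenue": [200, 100, 40, 80],
})

# Rows = categories, columns = (gender, city); missing combinations become 0
matrix = df.pivot_table(index="product_category", columns=["gender", "city"],
                        values="revenue", aggfunc="sum", fill_value=0)

male, female = matrix["Male"], matrix["Female"]
# Percentage by which male revenue exceeds female revenue (NaN where female revenue is 0)
pct_gap = (male - female) / female.replace(0, float("nan")) * 100
print(pct_gap.round(1))
```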

Male electronics spending dominates across all cities. Female clothing spending consistently ranks second.

Reading stacked bars: Each stack shows total city revenue, with segments revealing gender-category combinations. Mumbai's total height is highest, but the internal proportions show males dominate electronics while females lead clothing purchases.
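In Chart.js, a stacked bar chart is an ordinary bar chart with stacked: true on both the x and y scales (Chart.js v3+ option names). A minimal sketch that builds such a config in Python, assuming one dataset per gender-category segment and one bar per city; stacked_bar_config and the toy numbers are hypothetical.

```python
import json


def stacked_bar_config(cities, segments):
    """Build a Chart.js stacked bar config.

    `segments` maps a segment label (e.g. "Male / Electronics") to
    {city: revenue}; cities missing from a segment are drawn as 0.
    """
    datasets = [
        {"label": label, "data": [values.get(city, 0) for city in cities]}
        for label, values in segments.items()
    ]
    return {
        "type": "bar",
        "data": {"labels": cities, "datasets": datasets},
        # Stacking is switched on for both axes
        "options": {"scales": {"x": {"stacked": True},
                               "y": {"stacked": True, "beginAtZero": True}}},
    }


# Toy revenue figures for illustration only
config = stacked_bar_config(
    ["Mumbai", "Delhi"],
    {"Male / Electronics": {"Mumbai": 200, "Delhi": 150},
     "Female / Clothing": {"Mumbai": 80}},
)
print(json.dumps(config))
```

To feed it from the lesson's matrices, something like {f"Male / {cat}": {city: float(v) for city, v in row.items()} for cat, row in male_matrix.iterrows()} builds the segments mapping; the float() conversion keeps json.dumps happy with NumPy values.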

Campaign strategy: Target male customers with electronics ads in Mumbai and Delhi (highest revenue potential). Focus female clothing campaigns in Mumbai, Delhi, and Bangalore. Electronics ads for females should emphasize different features than male-targeted campaigns.

Common Mistake: Too Many Stack Segments

Adding 8+ categories in stacked charts creates visual chaos. Stick to 2-4 meaningful segments max. Group smaller categories into "Others" if needed. Your audience should read the chart in 5 seconds, not 5 minutes.
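Grouping the long tail into "Others" takes only a few lines of pandas. A minimal sketch, assuming a category-by-city matrix shaped like male_matrix; collapse_to_others is a hypothetical helper that ranks categories by their total across all cities.

```python
import pandas as pd


def collapse_to_others(matrix: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep the n rows with the largest totals and sum the rest into one 'Others' row."""
    order = matrix.sum(axis=1).sort_values(ascending=False).index
    top, rest = matrix.loc[order[:n]], matrix.loc[order[n:]]
    if rest.empty:
        return top  # nothing left to group
    others = rest.sum().to_frame(name="Others").T  # one row, same city columns
    return pd.concat([top, others])


# Toy category-by-city revenue matrix for illustration only
toy = pd.DataFrame(
    {"Mumbai": [500, 300, 50, 30], "Delhi": [400, 200, 20, 10]},
    index=["Electronics", "Clothing", "Books", "Toys"],
)
print(collapse_to_others(toy, n=2))
```

With n=2 plus the "Others" row, the chart stays within the 2-4 segment guideline.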

Correlation Discovery with Scatter

Scatter plots expose relationships your summary statistics miss. Revenue vs rating correlation seems obvious, but what about customer age vs return rate? Or quantity vs satisfaction? These hidden patterns drive real business decisions.

The scenario: Zomato's product team suspects that customer age affects both order satisfaction and return likelihood, but needs visual proof for the executive presentation.
# Calculate correlation between age and rating
age_rating_corr = df['customer_age'].corr(df['rating'])
print(f"Age-Rating Correlation: {age_rating_corr:.3f}")
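# Convert returns to 0/1 (assumes 'returned' holds True/False; map text values such as 'Yes'/'No' to 1/0 first)
df['return_numeric'] = df['returned'].astype(int)
# Calculate correlation between age and return rate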
age_return_corr = df['customer_age'].corr(df['return_numeric'])
print(f"Age-Return Correlation: {age_return_corr:.3f}")

What just happened?

The positive correlation of 0.248 is weak: older customers tend to give slightly higher ratings. The negative correlation of -0.312 is weak to moderate: older customers tend to return items less often. Try this: look for pairs whose absolute correlation exceeds 0.5 (positive or negative) to find stronger relationships.
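Scanning for strong relationships is easier with the full correlation matrix than with one .corr() call per pair. Pearson's r between a numeric column and a 0/1 column such as return_numeric is the point-biserial correlation, so the binary column can sit in the same matrix. A minimal sketch; strong_pairs is a hypothetical helper and the toy columns are made up.

```python
import pandas as pd


def strong_pairs(frame: pd.DataFrame, threshold: float = 0.5) -> dict:
    """Return {(col_a, col_b): r} for column pairs with |Pearson r| > threshold."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = {}
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only: skip self-pairs and duplicates
            r = corr.loc[a, b]
            if abs(r) > threshold:
                pairs[(a, b)] = round(float(r), 3)
    return pairs


# Toy numeric columns for illustration only
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "w": [1, -1, 0, -1, 1],
})
print(strong_pairs(toy))  # {('x', 'y'): 1.0, ('x', 'z'): -0.8, ('y', 'z'): -0.8}
```

On the lesson's data, call it on df[['customer_age', 'rating', 'quantity', 'revenue', 'return_numeric']].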

Why does this matter? Because older customers (35+) tend to combine higher satisfaction with lower return rates. That gives two profit levers at once: better ratings boost organic reach, while fewer returns reduce operational costs.

📊 Data Insight

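Customer age correlates positively with rating (r = 0.248) and negatively with returns (r = -0.312). These are correlation coefficients, not percentages: r = -0.312 does not mean customers aged 35+ are 31.2% less likely to return items. Measure the return-rate gap between age groups directly before focusing acquisition campaigns on this demographic.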
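The direct comparison is a groupby on an age band. A minimal sketch on toy rows, assuming returned holds True/False as in the lesson; it reports the gap both in percentage points and as a relative drop, since the two are easy to confuse.

```python
import pandas as pd

# Toy rows for illustration only; assumes `returned` holds True/False as in the lesson
df = pd.DataFrame({
    "customer_age": [22, 27, 31, 36, 41, 52],
    "returned": [True, True, False, False, True, False],
})

# Split at the 35+ threshold used in the insight
df["age_band"] = (df["customer_age"] >= 35).map({True: "35+", False: "under 35"})
rates = df.groupby("age_band")["returned"].mean()  # share of orders returned per band

gap_points = (rates["under 35"] - rates["35+"]) * 100  # absolute gap, percentage points
relative_drop = (rates["under 35"] - rates["35+"]) / rates["under 35"] * 100  # relative gap, %
print(rates.round(3))
print(f"{gap_points:.1f} percentage points lower, {relative_drop:.0f}% lower relative to under-35s")
```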

Chart Configuration Mastery

The difference between amateur and professional visualizations? Configuration details that most analysts ignore. Color psychology, axis formatting, legend positioning, responsive design — these micro-decisions determine whether your insights get implemented or ignored.

✅ Recommended

Use color-blind-friendly palettes. Limit charts to 5 colors max. Always include data labels for exact values. Set beginAtZero: true on the value axis of bar charts.

❌ Avoid

Random color schemes. 3D effects. Pie charts with 8+ slices. Missing axis titles. Default Chart.js colors.
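Several of these recommendations map directly onto Chart.js options. A minimal sketch that builds a bar config in Python with Chart.js v3+ option names: colors come from the Okabe-Ito palette, a widely used color-blind-friendly set (five of its colors, skipping the yellow that is hard to see on white). It also sets beginAtZero, axis titles, and a bottom legend. bar_config is hypothetical. Chart.js core does not draw value labels on bars; that usually needs a plugin such as chartjs-plugin-datalabels, which this sketch does not configure.

```python
import json

# Five Okabe-Ito colours (orange, sky blue, bluish green, blue, vermillion)
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00"]


def bar_config(labels, series, x_title, y_title):
    """Build a Chart.js bar config; `series` maps dataset label -> list of values."""
    if len(series) > len(OKABE_ITO):
        raise ValueError("More than 5 series: group extras into 'Others' first")
    datasets = [
        {"label": name, "data": values, "backgroundColor": OKABE_ITO[i]}
        for i, (name, values) in enumerate(series.items())
    ]
    return {
        "type": "bar",
        "data": {"labels": labels, "datasets": datasets},
        "options": {
            "responsive": True,  # Chart.js default, stated explicitly
            "plugins": {"legend": {"position": "bottom"}},
            "scales": {
                "x": {"title": {"display": True, "text": x_title}},
                "y": {"beginAtZero": True, "title": {"display": True, "text": y_title}},
            },
        },
    }


# Toy values for illustration only
cfg = bar_config(["Mumbai", "Delhi"], {"2023": [30, 25], "2024": [35, 31]},
                 "City", "Average revenue (₹ thousands)")
print(json.dumps(cfg, ensure_ascii=False))
```

On the page, the serialized object is what you pass as the config argument of new Chart(ctx, config).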

Honestly, configuration is underrated. Spend 20% of your time getting the data right, 80% making it impossible to misinterpret. Executive decisions happen in 30 seconds of looking at your chart — make those seconds count.

Pro tip: Always test your charts on mobile devices. 60% of dashboard views happen on phones during commutes. If your bubble chart is unreadable on a 5-inch screen, it's useless regardless of how insightful the data is.

Quiz

1. Your manager at Swiggy wants to analyze delivery time vs customer rating vs order frequency for different restaurants. Which chart type and configuration would be most effective?


2. You need to create a stacked bar chart showing male vs female revenue across product categories for Myntra. What Chart.js configuration is required?


3. A Flipkart analyst finds a correlation of -0.312 between customer age and return rate, and wants to compare customer behavior profiles across 6 cities using 5 different metrics. What's the best approach?

