Advanced Visualizations
Build multi-dimensional charts that reveal hidden patterns in complex datasets using advanced Chart.js techniques.
Chart Selection Strategy
The difference between a data analyst and a data storyteller? Choosing the right chart for your data's dimension count. Most analysts stick to basic bar charts. But when you're analyzing customer behavior across age, location, and purchase patterns simultaneously, you need more sophisticated approaches.
| Chart Type | Variables | Best For | Avoid When |
|---|---|---|---|
| Scatter | 2 continuous | Price vs Rating correlation | Categories only |
| Bubble | 3 continuous | Age vs Revenue vs Quantity | More than 50 points |
| Radar | 5+ attributes | Customer segment profiles | Time series data |
| Stacked Bar | Category + subcategory | City-wise gender breakdown | Too many subcategories |
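The table can be read as a rough decision rule. Here is a minimal sketch of that rule, assuming a hypothetical helper named `suggest_chart` (it is not part of any library) and treating the table's thresholds as rules of thumb rather than hard limits:

```python
def suggest_chart(n_continuous: int, n_categorical: int = 0, n_points: int = 0) -> str:
    """Map a dataset's shape to a chart type, following the table's rules of thumb."""
    if n_categorical >= 2:
        # Category + subcategory: show composition
        return "stacked bar"
    if n_continuous >= 5:
        # Many attributes per entity: compare profiles
        return "radar"
    if n_continuous == 3:
        # The third variable becomes bubble size; crowded plots get hard to read
        return "bubble" if n_points <= 50 else "bubble (aggregate the data first)"
    if n_continuous == 2:
        return "scatter"
    return "basic bar"


print(suggest_chart(3, n_points=40))
print(suggest_chart(1, n_categorical=2))
```

The order of the checks is a design choice: composition questions are tested first because a category-plus-subcategory layout rules out the continuous-only chart types.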
Three-Variable Bubble Analysis
Bubble charts solve the "I need three dimensions but only have two axes" problem by mapping the third variable to bubble size.
The scenario: Myntra's product team wants to see how customer age relates to average order value and purchase frequency at the same time, so they can optimize their recommendation engine.
# Import pandas for data manipulation
import pandas as pd
# Read our ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')
# Check the first few rows to understand structure
print(df.head())
   order_id        date  customer_age  gender       city product_category  quantity  unit_price  revenue  rating  returned
0      1001  2023-01-05            28    Male     Mumbai      Electronics         2     45000.0  90000.0     4.2     False
1      1002  2023-01-05            34  Female      Delhi         Clothing         1      2500.0   2500.0     3.8     False
2      1003  2023-01-06            45    Male  Bangalore             Food         3       850.0   2550.0     4.5     False
3      1004  2023-01-06            22  Female    Chennai            Books         1       650.0    650.0     4.0     False
4      1005  2023-01-07            31    Male       Pune             Home         2      8500.0  17000.0     4.1     False
What just happened?
We loaded the dataset and printed its first five rows. The columns we use next are order_id, customer_age, and revenue. In these five rows alone, revenue ranges from ₹650 to ₹90,000. Try this: Check df.shape to see the total number of rows and columns.
# Group by customer age to get aggregate metrics
bubble_data = df.groupby('customer_age').agg({
    'revenue': 'mean',    # Average order value (y-axis)
    'order_id': 'count',  # Purchase frequency (bubble size)
    'rating': 'mean'      # Average satisfaction
}).reset_index()
# Round revenue to whole rupees and rating to one decimal for cleaner display
bubble_data = bubble_data.round({'revenue': 0, 'rating': 1})
   customer_age  revenue  order_id  rating
0            18  12500.0         3     4.1
1            19  18750.0         4     3.9
2            20  22100.0         5     4.2
3            21  15800.0         8     4.0
4            22  28400.0         6     4.3
What just happened?
We created three variables: customer_age (x-axis), revenue mean (y-axis), and order_id count (bubble size). Age 22 customers spend ₹28,400 on average. Try this: Add .describe() to see distribution.
Bubble size represents purchase frequency. Age 28 customers show the highest purchase frequency with good revenue.
Reading this bubble chart: Each bubble position shows age (x) vs average spending (y). Bubble size reveals purchase frequency — bigger bubbles mean more repeat customers. The sweet spot? Age 28 customers with high frequency and decent revenue.
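To connect the aggregated table to the chart, here is a minimal sketch that turns rows into a Chart.js bubble dataset. Chart.js bubble points take `x`, `y`, and `r` (radius in pixels). The `MAX_RADIUS_PX` constant, the square-root scaling, and the dataset label are assumptions for illustration; `bubble_data.to_dict("records")` would give rows in the same shape as the hard-coded list.

```python
import json
import math

# First rows of the aggregated output shown above
rows = [
    {"customer_age": 18, "revenue": 12500.0, "order_id": 3},
    {"customer_age": 19, "revenue": 18750.0, "order_id": 4},
    {"customer_age": 20, "revenue": 22100.0, "order_id": 5},
    {"customer_age": 21, "revenue": 15800.0, "order_id": 8},
    {"customer_age": 22, "revenue": 28400.0, "order_id": 6},
]

MAX_RADIUS_PX = 30  # assumed upper bound for bubble radius; tune per layout
max_orders = max(row["order_id"] for row in rows)


def to_bubble_point(row):
    # Scale the radius by the square root so bubble *area* tracks order count
    radius = MAX_RADIUS_PX * math.sqrt(row["order_id"] / max_orders)
    return {"x": row["customer_age"], "y": row["revenue"], "r": round(radius, 1)}


config = {
    "type": "bubble",
    "data": {
        "datasets": [
            {
                "label": "Average order value by age",
                "data": [to_bubble_point(row) for row in rows],
            }
        ]
    },
}

print(json.dumps(config, indent=2))
```

Scaling the radius linearly would exaggerate large groups, because the eye compares areas, not radii.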
Business impact: Target customers aged 25-30 for retention campaigns since they combine high frequency with growing revenue. Customers above 45 spend more per order but shop less frequently — perfect for premium product recommendations.
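Recommendations like these are easier to check once ages are grouped into bands. This sketch uses `pd.cut` for that; the rows are illustrative values, not the real dataset, and the band edges are assumptions chosen to match the groups mentioned above.

```python
import pandas as pd

# Illustrative rows only; column names match the dataset above
orders = pd.DataFrame({
    "customer_age": [22, 26, 28, 28, 29, 34, 47, 52],
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "revenue": [650.0, 2500.0, 3000.0, 4200.0, 3800.0, 9000.0, 45000.0, 38000.0],
})

# Band edges are assumptions; each bin includes its right edge, e.g. (24, 30]
bins = [17, 24, 30, 45, 100]
labels = ["18-24", "25-30", "31-45", "46+"]
orders["age_band"] = pd.cut(orders["customer_age"], bins=bins, labels=labels)

band_summary = orders.groupby("age_band", observed=False).agg(
    avg_order_value=("revenue", "mean"),   # spending per order
    order_count=("order_id", "count"),     # purchase frequency
)

print(band_summary)
```

Run the same grouping on the full `df` to see whether the 25-30 and 46+ bands really behave differently before building a campaign on them.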
Multi-Attribute Radar Comparison
Radar charts excel when you need to compare multiple attributes simultaneously. Think of it as a customer's "fingerprint" — each point represents a different behavior metric. Perfect for comparing city-wise customer profiles across 5+ dimensions.
The scenario: BigBasket's regional manager needs to compare customer behavior patterns across Mumbai, Delhi, and Bangalore to optimize inventory distribution.
# Calculate city-wise metrics for radar chart
city_metrics = df.groupby('city').agg({
    'revenue': 'mean',       # Average spending power
    'rating': 'mean',        # Customer satisfaction
    'quantity': 'mean',      # Average items per order
    'customer_age': 'mean'   # Average customer age
}).round(1)
           revenue  rating  quantity  customer_age
city
Bangalore  28750.3     4.1       2.3          32.4
Chennai    24650.8     4.0       2.1          30.8
Delhi      31200.5     4.2       2.5          33.1
Mumbai     35420.2     4.3       2.7          34.2
Pune       26890.4     3.9       2.2          31.6
What just happened?
Mumbai shows highest revenue (₹35,420) and rating (4.3). Delhi follows with ₹31,200 average revenue. Pune has lowest satisfaction at 3.9. Try this: Add return_rate to compare city loyalty.
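One way to follow the "Try this" suggestion is to average the boolean `returned` column per city. This is a sketch on illustrative rows, not the real dataset, and the `return_rate` column name is an assumption.

```python
import pandas as pd

# Illustrative rows only; `returned` is the boolean column from the dataset
sample = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Delhi", "Delhi", "Pune", "Pune"],
    "revenue": [90000.0, 17000.0, 2500.0, 650.0, 2550.0, 8500.0],
    "returned": [False, False, True, False, True, True],
})

city_with_returns = sample.groupby("city").agg(
    revenue=("revenue", "mean"),
    # The mean of a boolean column is the share of True values
    return_rate=("returned", "mean"),
).round(2)

print(city_with_returns)
```

A lower `return_rate` is better, so you may want to invert it before plotting it on a radar chart where "farther out" reads as stronger.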
# Normalize metrics to 0-100 scale for radar display
from sklearn.preprocessing import MinMaxScaler
# Initialize scaler to transform values to 0-100 range
scaler = MinMaxScaler(feature_range=(0, 100))
# Apply scaling to all metrics
normalized = scaler.fit_transform(city_metrics)
# Convert back to DataFrame with original index
radar_df = pd.DataFrame(normalized,
                        columns=city_metrics.columns,
                        index=city_metrics.index)
           revenue  rating  quantity  customer_age
city
Bangalore    38.07   50.00     33.33         47.06
Chennai       0.00   25.00      0.00          0.00
Delhi        60.82   75.00     66.67         67.65
Mumbai      100.00  100.00    100.00        100.00
Pune         20.80    0.00     16.67         23.53
What just happened?
Mumbai scores 100.00 across all metrics (highest performer). Chennai and Pune show 0.00 in some areas (lowest performers). Delhi sits at 60-75 range. Try this: Check radar_df.describe() for distribution spread.
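MinMaxScaler applies the formula (x - min) / (max - min) to each column, then stretches the result to the requested range. Here is a sketch that reproduces the rating and quantity columns by hand, using the city-level values from the aggregated output above; the variable name `manual` is just for illustration.

```python
import pandas as pd

# City-level values copied from the aggregated output above
city_metrics = pd.DataFrame(
    {
        "rating": [4.1, 4.0, 4.2, 4.3, 3.9],
        "quantity": [2.3, 2.1, 2.5, 2.7, 2.2],
    },
    index=["Bangalore", "Chennai", "Delhi", "Mumbai", "Pune"],
)

# Min-max scaling per column: (x - min) / (max - min) * 100
col_min = city_metrics.min()
col_max = city_metrics.max()
manual = (city_metrics - col_min) / (col_max - col_min) * 100

print(manual.round(2))
```

Because every column is scaled against the cities in the table, a score of 0 means "lowest of these five", not zero revenue or zero satisfaction.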
Mumbai dominates all metrics. Delhi shows balanced performance. Bangalore needs improvement in quantity per order.
Reading this radar: The farther a point sits from the center, the higher that city's normalized score on that metric. Mumbai's shape fills the whole chart because it leads on revenue, rating, quantity, and customer age. Delhi shows a balanced diamond shape. Bangalore has an uneven pattern.
Business decision: Invest more in Mumbai inventory since all metrics are strong. Focus on increasing order quantity in Bangalore through bundle offers. Delhi needs moderate inventory expansion across all categories.
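For the chart itself, here is a sketch of a Chart.js radar configuration built as a Python dict and serialized to JSON. It assumes Chart.js 3 or later, where the radial axis sits under `scales.r`. Only Mumbai and Chennai are included to keep the example short.

```python
import json

metrics = ["revenue", "rating", "quantity", "customer_age"]

# Normalized 0-100 scores for two cities from the scaled table above
scores = {
    "Mumbai": [100.0, 100.0, 100.0, 100.0],
    "Chennai": [0.0, 25.0, 0.0, 0.0],
}

radar_config = {
    "type": "radar",
    "data": {
        "labels": metrics,
        "datasets": [
            {"label": city, "data": values} for city, values in scores.items()
        ],
    },
    # Fix the radial axis to the normalized range so shapes are comparable
    "options": {"scales": {"r": {"min": 0, "max": 100}}},
}

print(json.dumps(radar_config, indent=2))
```

With the full table, `radar_df.loc[city].tolist()` would supply each city's `data` list in the same column order as `labels`.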
📊 Data Insight
Mumbai customers average ₹35,420 per order with 4.3 rating, while Pune averages only ₹26,890 with 3.9 rating. The 32% revenue difference suggests Mumbai needs premium inventory focus.
Stacked Analysis Patterns
Stacked charts reveal composition changes over categories. Think gender distribution across age groups or product categories within cities. The trick? Making sure your subcategories actually add meaningful business value instead of just looking fancy.
The scenario: Flipkart's category manager wants to see how product preferences vary by gender across different cities to plan targeted advertising campaigns.
# Create crosstab for city vs product category by gender
male_purchases = df[df['gender'] == 'Male'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
female_purchases = df[df['gender'] == 'Female'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
# Pivot to get cities as columns, categories as rows
male_matrix = male_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)
female_matrix = female_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)
Male Revenue by City and Category:
city              Bangalore  Chennai   Delhi  Mumbai    Pune
product_category
Books                 12500     8750   15200   18400    9650
Clothing              28400    22100   35600   42300   25800
Electronics          156200   128500  185400  224800  142600
Food                  18750    14200   22100   26800   16500
Home                  45600    38200   52400   61200   39800

Female Revenue by City and Category:
city              Bangalore  Chennai   Delhi  Mumbai    Pune
product_category
Books                 15800    11200   18600   22500   12400
Clothing              45200    38900   56300   67800   42100
Electronics           98400    82600  118500  142800   95200
Food                  22100    16800   25400   31200   19500
Home                  38900    32500   45200   54600   35400
What just happened?
Males spend ₹224,800 on Electronics in Mumbai vs ₹142,800 by females. But females spend ₹67,800 on Clothing vs ₹42,300 by males. Clear gender preferences emerge. Try this: Calculate percentage differences between genders.
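The percentage difference from the "Try this" prompt can be sketched as below, using the Mumbai column of both matrices. Taking female revenue as the baseline is an assumption; swap the denominator if you want male revenue as the reference.

```python
import pandas as pd

categories = ["Books", "Clothing", "Electronics", "Food", "Home"]

# Mumbai column from the two revenue matrices above
male_mumbai = pd.Series([18400, 42300, 224800, 26800, 61200], index=categories)
female_mumbai = pd.Series([22500, 67800, 142800, 31200, 54600], index=categories)

# Positive values: males spend more; negative values: females spend more
pct_diff = ((male_mumbai - female_mumbai) / female_mumbai * 100).round(1)

print(pct_diff)
```

The same expression works on the full matrices, `(male_matrix - female_matrix) / female_matrix * 100`, because pandas aligns rows and columns by label.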
Male electronics dominance is clear across all cities. Female clothing spending consistently ranks second among female categories.
Reading stacked bars: Each stack shows total city revenue, with segments revealing gender-category combinations. Mumbai's total height is highest, but the internal proportions show males dominate electronics while females lead clothing purchases.
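Here is a sketch of the matching Chart.js configuration, again built as a Python dict. It assumes Chart.js 3 or later, where stacking is switched on per axis with `stacked: true`. Only the Electronics and Clothing rows are included to keep the stack readable, and the dataset labels are illustrative.

```python
import json

cities = ["Bangalore", "Chennai", "Delhi", "Mumbai", "Pune"]

# Electronics and Clothing rows from the two matrices above
segments = {
    "Male - Electronics": [156200, 128500, 185400, 224800, 142600],
    "Female - Electronics": [98400, 82600, 118500, 142800, 95200],
    "Male - Clothing": [28400, 22100, 35600, 42300, 25800],
    "Female - Clothing": [45200, 38900, 56300, 67800, 42100],
}

stacked_config = {
    "type": "bar",
    "data": {
        "labels": cities,
        "datasets": [
            {"label": label, "data": values} for label, values in segments.items()
        ],
    },
    "options": {
        "scales": {
            # Both axes must be stacked for segments to pile up per city
            "x": {"stacked": True},
            "y": {"stacked": True, "beginAtZero": True},
        }
    },
}

# Height of each city's stack is the sum of its segments
stack_totals = [sum(values) for values in zip(*segments.values())]

print(json.dumps(stacked_config, indent=2))
print(dict(zip(cities, stack_totals)))
```

Four segments per bar stays within the 2-4 segment guideline; adding all five categories for both genders would give ten.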
Campaign strategy: Target male customers with electronics ads in Mumbai and Delhi (highest revenue potential). Focus female clothing campaigns in Mumbai, Delhi, and Bangalore. Electronics ads for females should emphasize different features than male-targeted campaigns.
Common Mistake: Too Many Stack Segments
Adding 8+ categories in stacked charts creates visual chaos. Stick to 2-4 meaningful segments max. Group smaller categories into "Others" if needed. Your audience should read the chart in 5 seconds, not 5 minutes.
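Grouping small categories can be done before the data reaches the chart. This is a minimal sketch, assuming a hypothetical helper `collapse_small_categories` and a cutoff of three segments:

```python
import pandas as pd


def collapse_small_categories(totals: pd.Series, keep: int = 3,
                              other_label: str = "Others") -> pd.Series:
    """Keep the `keep` largest categories and sum the rest into one segment."""
    ranked = totals.sort_values(ascending=False)
    top = ranked.iloc[:keep]
    remainder = ranked.iloc[keep:].sum()
    if remainder > 0:
        top = pd.concat([top, pd.Series({other_label: remainder})])
    return top


# Male revenue in Mumbai by category, from the matrix above
mumbai_male = pd.Series({
    "Books": 18400,
    "Clothing": 42300,
    "Electronics": 224800,
    "Food": 26800,
    "Home": 61200,
})

collapsed = collapse_small_categories(mumbai_male, keep=3)
print(collapsed)
```

The total is preserved, so the bar height does not change; only the number of segments does.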
Correlation Discovery with Scatter
Scatter plots expose relationships your summary statistics miss. Revenue vs rating correlation seems obvious, but what about customer age vs return rate? Or quantity vs satisfaction? These hidden patterns drive real business decisions.
The scenario: Zomato's product team suspects that customer age affects both order satisfaction and return likelihood, but needs visual proof for the executive presentation.
# Calculate correlation between age and rating
age_rating_corr = df['customer_age'].corr(df['rating'])
print(f"Age-Rating Correlation: {age_rating_corr:.3f}")
# Calculate correlation between age and return rate
df['return_numeric'] = df['returned'].astype(int)
age_return_corr = df['customer_age'].corr(df['return_numeric'])
print(f"Age-Return Correlation: {age_return_corr:.3f}")
Age-Rating Correlation: 0.248
Age-Return Correlation: -0.312
What just happened?
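A positive correlation of 0.248 means older customers tend to give slightly higher ratings. A negative correlation of -0.312 means older customers tend to return items less often. Both relationships are weak to moderate. Try this: Look for column pairs whose correlation is above 0.5 in absolute value to find stronger relationships.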
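To scan for stronger relationships, you can compute the full correlation matrix and keep only the pairs above a threshold. This is a sketch on illustrative rows, so the numbers will differ from the real dataset; the 0.5 cutoff comes from the prompt above and `pairs` is a name chosen for the example.

```python
import pandas as pd

# Illustrative rows only; real results depend on the full dataset
sample = pd.DataFrame({
    "customer_age": [22, 28, 31, 34, 45, 52],
    "quantity": [1, 2, 2, 1, 3, 3],
    "rating": [4.0, 4.2, 4.1, 3.8, 4.5, 4.4],
    "returned": [True, False, False, True, False, False],
})
sample["return_numeric"] = sample["returned"].astype(int)

# Pearson correlation (the pandas default) across the numeric columns
corr = sample.drop(columns="returned").corr()

# Keep each pair once (upper triangle) and filter by absolute strength
pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.5
]

for a, b, r in pairs:
    print(f"{a} vs {b}: {r}")
```

Filtering on the absolute value matters: a correlation of -0.6 is as strong as +0.6, it just points the other way.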
Why does this matter? Because in this dataset older customers (35+) tend to rate orders higher and return them less often. That gives you two profit levers: better ratings boost organic reach, while fewer returns reduce operational costs.
📊 Data Insight
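Customer age correlates positively with rating (r = 0.248) and negatively with returns (r = -0.312). These are correlation coefficients, not probabilities, so -0.312 does not mean 31.2% fewer returns. Compare actual return rates for customers aged 35+ against younger customers before sizing acquisition campaigns around this demographic.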
Chart Configuration Mastery
The difference between amateur and professional visualizations? Configuration details that most analysts ignore. Color psychology, axis formatting, legend positioning, responsive design — these micro-decisions determine whether your insights get implemented or ignored.
✅ Recommended
Use color-blind friendly palettes. Limit to 5 colors max. Always include data labels for exact values. Set beginAtZero: true for bar charts.
❌ Avoid
Random color schemes. 3D effects. Pie charts with 8+ slices. Missing axis titles. Default Chart.js colors.
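Several of these recommendations translate directly into Chart.js options. The sketch below builds them as Python dicts and assumes Chart.js 3 or later. The palette is the first five Okabe-Ito colors, a common color-blind friendly choice; `bar_options` and `colored_datasets` are hypothetical helpers.

```python
import json

# First five Okabe-Ito colors, a widely used color-blind friendly palette
PALETTE = ["#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00"]


def bar_options(x_title: str, y_title: str) -> dict:
    """Options for a bar chart: axis titles, zero baseline, responsive sizing."""
    return {
        "responsive": True,
        "maintainAspectRatio": False,  # let the container control the height
        "scales": {
            "x": {"title": {"display": True, "text": x_title}},
            "y": {
                "beginAtZero": True,
                "title": {"display": True, "text": y_title},
            },
        },
        "plugins": {"legend": {"position": "bottom"}},
    }


def colored_datasets(series: dict) -> list:
    """Assign one palette color per series, refusing more than five series."""
    if len(series) > len(PALETTE):
        raise ValueError("More series than palette colors; group the rest into 'Others'")
    return [
        {"label": label, "data": data, "backgroundColor": color}
        for (label, data), color in zip(series.items(), PALETTE)
    ]


options = bar_options("City", "Revenue (INR)")
datasets = colored_datasets({"Male": [224800, 185400], "Female": [142800, 118500]})
print(json.dumps({"options": options, "datasets": datasets}, indent=2))
```

Raising an error on a sixth series enforces the five-color limit in code instead of leaving it to whoever builds the chart.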
Honestly, configuration is underrated. Spend 20% of your time getting the data right, 80% making it impossible to misinterpret. Executive decisions happen in 30 seconds of looking at your chart — make those seconds count.
Pro tip: Always test your charts on mobile devices. 60% of dashboard views happen on phones during commutes. If your bubble chart is unreadable on a 5-inch screen, it's useless regardless of how insightful the data is.
Quiz
1. Your manager at Swiggy wants to analyze delivery time vs customer rating vs order frequency for different restaurants. Which chart type and configuration would be most effective?
2. You need to create a stacked bar chart showing male vs female revenue across product categories for Myntra. What Chart.js configuration is required?
3. A Flipkart analyst finds a correlation of -0.312 between customer age and return rate, and wants to compare customer behavior profiles across 6 cities using 5 different metrics. What's the best approach?