Data Science Lesson 31 – Advanced Visualizations | Dataplexa


Advanced Visualizations

Build multi-dimensional charts that reveal hidden patterns in complex datasets using advanced Chart.js techniques.

Chart Selection Strategy

The difference between a data analyst and a data storyteller? Choosing the right chart for the number of dimensions in your data. Many analysts default to basic bar charts. But when you're analyzing customer behavior across age, location, and purchase patterns at the same time, you need more sophisticated approaches.

1. Identify Data Dimensions
2. Match Chart Type to Variables
3. Configure Advanced Options
4. Test Business Impact

Chart Type	Variables	Best For	Avoid When
Scatter	2 continuous	Price vs Rating correlation	Categories only
Bubble	3 continuous	Age vs Revenue vs Quantity	More than 50 points
Radar	5+ attributes	Customer segment profiles	Time series data
Stacked Bar	Category + subcategory	City-wise gender breakdown	Too many subcategories
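The selection table can also be written down as a small rule function. This is a minimal sketch, not part of the lesson's tooling: `suggest_chart` is a hypothetical helper, and its cut-offs (50 points for bubbles, 4 subcategories for stacked bars) come from the table and from the stacked-bar advice in this lesson rather than from any standard.

```python
def suggest_chart(n_continuous=0, n_categorical=0, n_points=0,
                  n_subcategories=0, is_time_series=False):
    """Hypothetical helper encoding the chart-selection table.

    Returns a chart name, or None when no row of the table fits.
    """
    # Radar: profiles built from 5+ attributes; the table says avoid for time series
    if n_continuous >= 5 and not is_time_series:
        return "radar"
    # Bubble: 3 continuous variables; avoid above ~50 points
    if n_continuous == 3 and n_points <= 50:
        return "bubble"
    # Scatter: 2 continuous variables
    if n_continuous == 2:
        return "scatter"
    # Stacked bar: category + subcategory, kept to a few segments
    if n_categorical == 2 and n_subcategories <= 4:
        return "stacked bar"
    # No row of the table fits; fall back to judgment
    return None


print(suggest_chart(n_continuous=3, n_points=40))         # bubble
print(suggest_chart(n_continuous=3, n_points=500))        # None
print(suggest_chart(n_categorical=2, n_subcategories=3))  # stacked bar
```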

Three-Variable Bubble Analysis

Bubble charts solve the "I need three dimensions but only have two axes" problem: the x- and y-axes carry two variables and the bubble size carries the third. The scenario: Myntra's analytics team wants to see how customer age relates to average order value and purchase frequency at the same time, to help tune their recommendation engine.

# Import pandas for data manipulation
import pandas as pd
# Read our ecommerce dataset
df = pd.read_csv('dataplexa_ecommerce.csv')
# Check the first few rows to understand structure
print(df.head())

   order_id        date  customer_age gender       city product_category  quantity  unit_price   revenue  rating  returned
0      1001  2023-01-05            28   Male     Mumbai      Electronics         2     45000.0   90000.0     4.2     False
1      1002  2023-01-05            34 Female      Delhi         Clothing         1      2500.0    2500.0     3.8     False
2      1003  2023-01-06            45   Male  Bangalore         Food             3       850.0    2550.0     4.5     False
3      1004  2023-01-06            22 Female    Chennai         Books            1       650.0     650.0     4.0     False
4      1005  2023-01-07            31   Male       Pune         Home             2      8500.0   17000.0     4.1     False

What just happened?

We loaded the dataset: 11 columns, including order_id, customer_age, revenue, rating, and returned. In these first five rows, revenue ranges from ₹650 to ₹90,000. Try this: check df.shape to see the total number of rows.
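If you don't have dataplexa_ecommerce.csv at hand, you can rebuild the five rows printed above to follow along. This sketch types them in by hand, so it is not the full dataset (its shape is (5, 11), not the real row count). It also checks an assumption worth confirming on the full file: that revenue equals quantity × unit_price.

```python
import pandas as pd

# The five rows printed above, typed in by hand so this runs without the CSV
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004, 1005],
    "date": pd.to_datetime(["2023-01-05", "2023-01-05", "2023-01-06",
                            "2023-01-06", "2023-01-07"]),
    "customer_age": [28, 34, 45, 22, 31],
    "gender": ["Male", "Female", "Male", "Female", "Male"],
    "city": ["Mumbai", "Delhi", "Bangalore", "Chennai", "Pune"],
    "product_category": ["Electronics", "Clothing", "Food", "Books", "Home"],
    "quantity": [2, 1, 3, 1, 2],
    "unit_price": [45000.0, 2500.0, 850.0, 650.0, 8500.0],
    "revenue": [90000.0, 2500.0, 2550.0, 650.0, 17000.0],
    "rating": [4.2, 3.8, 4.5, 4.0, 4.1],
    "returned": [False] * 5,
})

print(sample.shape)  # (rows, columns)
# Sanity check: does revenue equal quantity * unit_price on every row?
print((sample["quantity"] * sample["unit_price"] == sample["revenue"]).all())
print(sample["revenue"].agg(["min", "max"]))
```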

The scenario: Now we need to aggregate data by customer age to create meaningful bubble sizes representing purchase frequency.

# Group by customer age to get aggregate metrics
bubble_data = df.groupby('customer_age').agg({
    'revenue': 'mean',    # Average order value (y-axis)
    'order_id': 'count',  # Number of orders per age (bubble size)
    'rating': 'mean'      # Average satisfaction
}).reset_index()
# Round for cleaner display: whole rupees for revenue, one decimal for rating
bubble_data = bubble_data.round({'revenue': 0, 'rating': 1})
print(bubble_data.head())

   customer_age   revenue  order_id  rating
0            18   12500.0         3     4.1
1            19   18750.0         4     3.9
2            20   22100.0         5     4.2
3            21   15800.0         8     4.0
4            22   28400.0         6     4.3

What just happened?

We created three variables: customer_age (x-axis), mean revenue (y-axis), and the order count stored in the order_id column (bubble size). Customers aged 22 spend ₹28,400 per order on average. Try this: run bubble_data.describe() to see the distribution.

Bubble size represents purchase frequency. Age 28 customers show highest purchase frequency with good revenue.

Reading this bubble chart: each bubble's position shows age (x) against average spending per order (y). Bubble size shows order count: bigger bubbles mean more orders from that age group. (The dataset has no customer ID column, so it cannot separate repeat customers from new ones.) The sweet spot? Age 28 customers, with high order volume and decent revenue.
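Because the page renders its charts with Chart.js, one way to hand bubble_data to the browser is to turn each age group into an {x, y, r} point. A minimal sketch, assuming Chart.js v3 or later: Chart.js draws r as a radius in pixels, so scaling r by the square root of the order count keeps bubble area proportional to order count. MAX_RADIUS, the axis titles, and the helper name to_bubble_points are our own choices, and the input is just the five aggregated rows shown above.

```python
import json
import math

import pandas as pd

# Aggregated rows printed above (first five ages only)
bubble_data = pd.DataFrame({
    "customer_age": [18, 19, 20, 21, 22],
    "revenue": [12500.0, 18750.0, 22100.0, 15800.0, 28400.0],
    "order_id": [3, 4, 5, 8, 6],  # order count per age
    "rating": [4.1, 3.9, 4.2, 4.0, 4.3],
})

MAX_RADIUS = 20  # pixels for the largest bubble (assumed; tune for your layout)


def to_bubble_points(data, max_radius=MAX_RADIUS):
    """Map each age group to a Chart.js bubble point {x, y, r}.

    r grows with the square root of the order count, so bubble *area*
    (not radius) is proportional to the number of orders.
    """
    max_count = data["order_id"].max()
    return [
        {
            "x": int(row.customer_age),
            "y": float(row.revenue),
            "r": round(max_radius * math.sqrt(row.order_id / max_count), 2),
        }
        for row in data.itertuples(index=False)
    ]


config = {
    "type": "bubble",
    "data": {"datasets": [{"label": "Orders by customer age",
                           "data": to_bubble_points(bubble_data)}]},
    "options": {"scales": {
        "x": {"title": {"display": True, "text": "Customer age"}},
        "y": {"title": {"display": True, "text": "Average order value (INR)"}},
    }},
}
print(json.dumps(config, indent=2))
```

Without the square root, a group with twice the orders would get four times the area, which exaggerates the differences between ages.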

Business impact: Target customers aged 25-30 for retention campaigns since they combine high frequency with growing revenue. Customers above 45 spend more per order but shop less frequently — perfect for premium product recommendations.
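To check claims like "customers aged 25-30" or "customers above 45" against the data, it helps to bin ages into explicit bands. A sketch using pd.cut on the five sample rows from the start of the lesson; only the 25-30 and above-45 boundaries come from the text, and the other band edges (including the upper limit of 120) are assumptions.

```python
import pandas as pd

# The five sample orders shown at the start of the lesson
orders = pd.DataFrame({
    "customer_age": [28, 34, 45, 22, 31],
    "revenue": [90000.0, 2500.0, 2550.0, 650.0, 17000.0],
})

# Right-inclusive edges: (17, 24], (24, 30], (30, 44], (44, 120]
bins = [17, 24, 30, 44, 120]
labels = ["18-24", "25-30", "31-44", "45+"]
orders["age_band"] = pd.cut(orders["customer_age"], bins=bins, labels=labels)

# Order count and average order value per band
band_summary = orders.groupby("age_band", observed=False).agg(
    orders=("revenue", "size"),
    avg_order_value=("revenue", "mean"),
)
print(band_summary)
```

On the full dataset, the orders column of this summary is what tells you whether a band really combines high frequency with high spending.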

Multi-Attribute Radar Comparison

Radar charts excel when you need to compare several attributes at once. Think of each shape as a customer segment's "fingerprint": each axis represents a different behavior metric. They work well for comparing city-wise customer profiles across many dimensions.

The scenario: BigBasket's regional manager needs to compare customer behavior patterns across cities such as Mumbai, Delhi, and Bangalore to optimize inventory distribution.

# Calculate city-wise metrics for radar chart
city_metrics = df.groupby('city').agg({
    'revenue': 'mean',  # Average spending power
    'rating': 'mean',   # Customer satisfaction
    'quantity': 'mean', # Average items per order
    'customer_age': 'mean'  # Average customer age
}).round(1)

              revenue  rating  quantity  customer_age
city                                                
Bangalore     28750.3     4.1       2.3          32.4
Chennai       24650.8     4.0       2.1          30.8
Delhi         31200.5     4.2       2.5          33.1
Mumbai        35420.2     4.3       2.7          34.2
Pune          26890.4     3.9       2.2          31.6

What just happened?

Mumbai shows the highest average revenue (₹35,420) and rating (4.3). Delhi follows with ₹31,200 average revenue. Pune has the lowest satisfaction at 3.9. Try this: add a return rate ('returned': 'mean') to compare customer loyalty across cities.
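For the return-rate exercise: the dataset has no return_rate column, but the mean of the boolean returned column is exactly the share of orders returned. This sketch uses named aggregation on a hypothetical six-order frame (made-up values, chosen so the cities differ), not the real data.

```python
import pandas as pd

# Hypothetical orders (made-up values, not from dataplexa_ecommerce.csv)
orders = pd.DataFrame({
    "city": ["Mumbai", "Mumbai", "Delhi", "Delhi", "Pune", "Pune"],
    "revenue": [40000.0, 30000.0, 32000.0, 30000.0, 27000.0, 26000.0],
    "rating": [4.4, 4.2, 4.1, 4.3, 3.8, 4.0],
    "quantity": [3, 2, 2, 3, 2, 2],
    "customer_age": [35, 33, 34, 32, 31, 32],
    "returned": [False, True, False, False, True, True],
})

# Named aggregation: new_column=(source_column, function)
city_metrics = orders.groupby("city").agg(
    revenue=("revenue", "mean"),
    rating=("rating", "mean"),
    quantity=("quantity", "mean"),
    customer_age=("customer_age", "mean"),
    return_rate=("returned", "mean"),  # share of orders returned, 0.0 to 1.0
).round(2)
print(city_metrics)
```

Keep in mind that a lower return rate is better, the opposite direction from revenue and rating, when you read it alongside the other metrics.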

The scenario: We need to normalize these metrics to a 0-100 scale so that metrics measured in very different units (rupees, stars, items, years) can share the same radar axes.

# Normalize metrics to 0-100 scale for radar display
from sklearn.preprocessing import MinMaxScaler
# Initialize scaler to transform values to 0-100 range
scaler = MinMaxScaler(feature_range=(0, 100))
# Apply scaling to all metrics
normalized = scaler.fit_transform(city_metrics)
# Convert back to DataFrame with original index
radar_df = pd.DataFrame(normalized, 
                       columns=city_metrics.columns, 
                       index=city_metrics.index)

             revenue     rating   quantity  customer_age
city                                                    
Bangalore      38.07      50.00      33.33         47.06
Chennai         0.00      25.00       0.00          0.00
Delhi          60.82      75.00      66.67         67.65
Mumbai        100.00     100.00     100.00        100.00
Pune           20.80       0.00      16.67         23.53

What just happened?

Mumbai scores 100.00 on every metric because it has the highest value on each one. Chennai and Pune score 0.00 on some metrics (the lowest values). Delhi sits in the 60-75 range. Try this: run radar_df.describe() to see the spread.
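MinMaxScaler is doing a simple per-column calculation, (x - min) / (max - min), scaled here to 0-100. This sketch reproduces it in plain pandas on the city averages printed earlier, which is handy when scikit-learn isn't installed and shows why the top city on every axis lands at exactly 100. min_max_0_100 is a hypothetical helper; its zero-range handling (treat the range as 1, so a constant column becomes 0) mirrors what we understand scikit-learn to do.

```python
import pandas as pd

# City averages as printed above (already rounded to one decimal)
city_metrics = pd.DataFrame(
    {
        "revenue": [28750.3, 24650.8, 31200.5, 35420.2, 26890.4],
        "rating": [4.1, 4.0, 4.2, 4.3, 3.9],
        "quantity": [2.3, 2.1, 2.5, 2.7, 2.2],
        "customer_age": [32.4, 30.8, 33.1, 34.2, 31.6],
    },
    index=pd.Index(["Bangalore", "Chennai", "Delhi", "Mumbai", "Pune"], name="city"),
)


def min_max_0_100(frame):
    """Rescale each column to 0-100 with (x - min) / (max - min) * 100."""
    lo, hi = frame.min(), frame.max()
    # A constant column would divide by zero; use a span of 1 so it maps to 0
    span = (hi - lo).replace(0, 1)
    return (frame - lo) / span * 100


radar_df = min_max_0_100(city_metrics).round(2)
print(radar_df)
```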

Mumbai dominates all metrics. Delhi shows balanced performance. Bangalore needs improvement in quantity per order.

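Reading this radar: the farther a point sits from the center, the higher that city's value relative to the other cities (after min-max scaling, the top city on each axis lands at 100 and the bottom city at 0). Mumbai's shape fills the outer edge: the highest revenue, rating, quantity, and average customer age. Note that a larger customer_age value means older customers, not better performance. Delhi shows a balanced diamond shape. Bangalore's pattern is uneven.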
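To draw the radar in Chart.js, each normalized column becomes an axis label and each city becomes a dataset. A sketch assuming Chart.js v3 or later, where the radial axis is configured under options.scales.r. radar_config is a hypothetical helper, and the two input rows are the Mumbai and Chennai scores from the normalized table.

```python
import json

import pandas as pd

# Normalized (0-100) scores for two cities from the table above
radar_df = pd.DataFrame(
    {"revenue": [100.0, 0.0], "rating": [100.0, 25.0],
     "quantity": [100.0, 0.0], "customer_age": [100.0, 0.0]},
    index=pd.Index(["Mumbai", "Chennai"], name="city"),
)


def radar_config(scores):
    """Chart.js radar config: one axis per column, one dataset per row."""
    return {
        "type": "radar",
        "data": {
            "labels": list(scores.columns),
            "datasets": [
                {"label": city, "data": [float(v) for v in values]}
                for city, values in zip(scores.index, scores.to_numpy())
            ],
        },
        "options": {
            # Pin the radial axis to the 0-100 range produced by min-max scaling
            "scales": {"r": {"min": 0, "max": 100}},
        },
    }


config = radar_config(radar_df)
print(json.dumps(config, indent=2))
```

Pinning min and max stops Chart.js from auto-scaling the radial axis, so shapes stay comparable across charts.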

Business decision: Invest more in Mumbai inventory since all metrics are strong. Focus on increasing order quantity in Bangalore through bundle offers. Delhi needs moderate inventory expansion across all categories.

📊 Data Insight

Mumbai customers average ₹35,420 per order with a 4.3 rating, while Pune averages only ₹26,890 with a 3.9 rating. Mumbai's roughly 32% higher average order value suggests it needs a premium inventory focus.

Stacked Analysis Patterns

Stacked charts reveal composition changes over categories. Think gender distribution across age groups or product categories within cities. The trick? Making sure your subcategories actually add meaningful business value instead of just looking fancy.

The scenario: Flipkart's category manager wants to see how product preferences vary by gender across different cities to plan targeted advertising campaigns.

# Sum revenue per (city, product_category), separately for each gender
male_purchases = df[df['gender'] == 'Male'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
female_purchases = df[df['gender'] == 'Female'].groupby(['city', 'product_category'])['revenue'].sum().reset_index()
# Pivot to get categories as rows and cities as columns (missing combinations become 0)
male_matrix = male_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)
female_matrix = female_purchases.pivot(index='product_category', columns='city', values='revenue').fillna(0)
print("Male Revenue by City and Category:")
print(male_matrix)
print("\nFemale Revenue by City and Category:")
print(female_matrix)

Male Revenue by City and Category:
city              Bangalore    Chennai      Delhi     Mumbai       Pune
product_category                                                        
Books                 12500      8750     15200      18400      9650
Clothing              28400     22100     35600      42300     25800
Electronics          156200    128500    185400     224800    142600
Food                  18750     14200     22100      26800     16500
Home                  45600     38200     52400      61200     39800

Female Revenue by City and Category:
city              Bangalore    Chennai      Delhi     Mumbai       Pune  
product_category                                                        
Books                 15800     11200     18600      22500     12400
Clothing              45200     38900     56300      67800     42100
Electronics           98400     82600    118500     142800     95200
Food                  22100     16800     25400      31200     19500
Home                  38900     32500     45200      54600     35400

What just happened?

Males spend ₹224,800 on Electronics in Mumbai vs ₹142,800 by females. But females spend ₹67,800 on Clothing vs ₹42,300 by males. Clear gender preferences emerge. Try this: Calculate percentage differences between genders.
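For the percentage-difference exercise, divide the gender gap by female revenue in each cell. This sketch types in the two tables printed above; positive values mean men spend more, negative values mean women spend more. Cells where female revenue is 0 (possible after fillna(0)) become NaN instead of dividing by zero.

```python
import pandas as pd

categories = ["Books", "Clothing", "Electronics", "Food", "Home"]
cities = ["Bangalore", "Chennai", "Delhi", "Mumbai", "Pune"]

# Revenue tables printed above (rows = category, columns = city)
male_matrix = pd.DataFrame(
    [[12500, 8750, 15200, 18400, 9650],
     [28400, 22100, 35600, 42300, 25800],
     [156200, 128500, 185400, 224800, 142600],
     [18750, 14200, 22100, 26800, 16500],
     [45600, 38200, 52400, 61200, 39800]],
    index=categories, columns=cities,
)
female_matrix = pd.DataFrame(
    [[15800, 11200, 18600, 22500, 12400],
     [45200, 38900, 56300, 67800, 42100],
     [98400, 82600, 118500, 142800, 95200],
     [22100, 16800, 25400, 31200, 19500],
     [38900, 32500, 45200, 54600, 35400]],
    index=categories, columns=cities,
)

# (male - female) / female * 100; zero denominators become NaN
denominator = female_matrix.replace(0, float("nan"))
pct_diff = ((male_matrix - female_matrix) / denominator * 100).round(1)
print(pct_diff)
```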

Male electronics spending dominates in every city. Clothing is consistently women's second-largest category.

Reading stacked bars: Each stack shows total city revenue, with segments revealing gender-category combinations. Mumbai's total height is highest, but the internal proportions show males dominate electronics while females lead clothing purchases.

Campaign strategy: Target male customers with electronics ads in Mumbai and Delhi (highest revenue potential). Focus female clothing campaigns in Mumbai, Delhi, and Bangalore. Electronics ads for females should emphasize different features than male-targeted campaigns.

Common Mistake: Too Many Stack Segments

Adding 8+ categories in stacked charts creates visual chaos. Stick to 2-4 meaningful segments max. Group smaller categories into "Others" if needed. Your audience should read the chart in 5 seconds, not 5 minutes.
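One way to apply the "Others" rule: keep the top few categories by total revenue and sum everything else into a single row. top_n_plus_others is a hypothetical helper; the example uses the Delhi and Mumbai columns of the male revenue table above and n=3, in line with the 2-4 segment guideline.

```python
import pandas as pd

# Delhi and Mumbai columns of the male revenue table above
male_two_cities = pd.DataFrame(
    {"Delhi": [15200, 35600, 185400, 22100, 52400],
     "Mumbai": [18400, 42300, 224800, 26800, 61200]},
    index=["Books", "Clothing", "Electronics", "Food", "Home"],
)


def top_n_plus_others(matrix, n=3):
    """Keep the n rows with the largest totals; sum the remaining rows into 'Others'."""
    totals = matrix.sum(axis=1).sort_values(ascending=False)
    keep = totals.index[:n]
    folded = matrix.loc[keep].copy()
    rest = matrix.drop(index=keep)
    if not rest.empty:
        folded.loc["Others"] = rest.sum()  # one row per city, summed over the rest
    return folded


print(top_n_plus_others(male_two_cities, n=3))
```

Each city's column total (its stack height) is unchanged; only the number of segments shrinks.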

Correlation Discovery with Scatter

Scatter plots expose relationships your summary statistics miss. Revenue vs rating correlation seems obvious, but what about customer age vs return rate? Or quantity vs satisfaction? These hidden patterns drive real business decisions.

The scenario: Zomato's product team suspects that customer age affects both order satisfaction and return likelihood, but needs visual proof for the executive presentation.

# Calculate correlation between age and rating
age_rating_corr = df['customer_age'].corr(df['rating'])
print(f"Age-Rating Correlation: {age_rating_corr:.3f}")
# Calculate correlation between age and return rate
df['return_numeric'] = df['returned'].astype(int)
age_return_corr = df['customer_age'].corr(df['return_numeric'])
print(f"Age-Return Correlation: {age_return_corr:.3f}")

Age-Rating Correlation: 0.248
Age-Return Correlation: -0.312

What just happened?

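The positive correlation of 0.248 means older customers tend to give slightly higher ratings, though the relationship is weak. The negative correlation of -0.312 means older customers tend to return items less often (a weak-to-moderate relationship). Try this: compute df.corr(numeric_only=True) and look for pairs whose absolute value exceeds 0.5, which indicates a stronger relationship.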
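To scan for stronger relationships without eyeballing the whole matrix, you can list every column pair whose correlation exceeds a threshold in absolute value. A sketch on the five sample rows from the start of the lesson; strong_pairs is a hypothetical helper, and with only five rows the coefficients are illustrative, not reliable. On the real data, pass df restricted to the numeric columns you care about.

```python
from itertools import combinations

import pandas as pd

# Numeric columns of the five sample rows shown at the start of the lesson
sample = pd.DataFrame({
    "customer_age": [28, 34, 45, 22, 31],
    "quantity": [2, 1, 3, 1, 2],
    "unit_price": [45000.0, 2500.0, 850.0, 650.0, 8500.0],
    "revenue": [90000.0, 2500.0, 2550.0, 650.0, 17000.0],
    "rating": [4.2, 3.8, 4.5, 4.0, 4.1],
})


def strong_pairs(frame, threshold=0.5):
    """Return (col_a, col_b, r) for every pair with |r| > threshold, strongest first."""
    corr = frame.corr()  # Pearson correlation by default
    pairs = []
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        if abs(r) > threshold:  # NaN compares False, so constant columns drop out
            pairs.append((a, b, round(float(r), 3)))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


for a, b, r in strong_pairs(sample):
    print(f"{a} vs {b}: {r}")
```

Expect unit_price and revenue to show up: revenue is computed from unit_price, so a high correlation between them is built in rather than a discovery.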

Why does this matter? Targeting older customers (35+) may bring both higher satisfaction and lower return rates. That's two profit levers at once: better ratings boost organic reach, while fewer returns reduce operational costs.

📊 Data Insight

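Age correlates weakly with higher ratings (r = 0.248) and with fewer returns (r = -0.312). These are correlation coefficients, not percentages: r = -0.312 does not mean customers aged 35+ return 31.2% less. Before shifting acquisition campaigns toward this demographic, compare the actual return rates and ratings of customers aged 35+ with those of younger customers.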
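To get a real percentage, compute the return rate in each age group directly and compare them. return_rate_by_age is a hypothetical helper, and the six orders are made-up values that only demonstrate the mechanics; the 35 cut-off comes from the text.

```python
import pandas as pd


def return_rate_by_age(frame, cutoff=35):
    """Share of orders returned, for customers below vs at/above `cutoff`."""
    band = frame["customer_age"].ge(cutoff).map(
        {True: f"{cutoff}+", False: f"under {cutoff}"}
    ).rename("age_band")
    # Mean of a True/False column = fraction of True values
    return frame.groupby(band)["returned"].mean()


# Hypothetical orders, only to show the mechanics (not from the dataset)
toy = pd.DataFrame({
    "customer_age": [22, 27, 31, 36, 41, 52],
    "returned": [True, False, True, False, False, True],
})

rates = return_rate_by_age(toy)
print(rates)

# Relative difference: the number a "% lower return probability" claim needs
relative = (rates["35+"] - rates["under 35"]) / rates["under 35"] * 100
print(f"Relative difference for 35+: {relative:+.1f}%")
```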

Chart Configuration Mastery

The difference between amateur and professional visualizations? Configuration details that most analysts ignore. Color psychology, axis formatting, legend positioning, responsive design — these micro-decisions determine whether your insights get implemented or ignored.

✅ Recommended

Use color-blind friendly palettes. Limit to 5 colors max. Always include data labels for exact values. Set beginAtZero: true for bar charts.

❌ Avoid

Random color schemes. 3D effects. Pie charts with 8+ slices. Missing axis titles. Default Chart.js colors.
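Here is one way the recommended settings could look as a Chart.js bar config built from Python. A sketch assuming Chart.js v3 or later: it uses five colors from the Okabe-Ito color-blind-safe palette, sets beginAtZero on the y-axis, adds axis titles, and keeps the chart responsive. bar_config is a hypothetical helper; the example values are the city averages from the radar section. Data labels are not drawn by Chart.js core; they usually come from a plugin such as chartjs-plugin-datalabels, which this sketch does not configure.

```python
import json

# First five colors of the Okabe-Ito palette, designed to stay
# distinguishable for common forms of color blindness
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2"]


def bar_config(labels, values, title, y_label, x_label):
    """Build a Chart.js bar config with one color per bar (5 bars max)."""
    if len(values) > len(OKABE_ITO):
        raise ValueError("More than 5 bars: group the extras into 'Others' first")
    return {
        "type": "bar",
        "data": {
            "labels": list(labels),
            "datasets": [{
                "label": title,
                "data": list(values),
                "backgroundColor": OKABE_ITO[:len(values)],
            }],
        },
        "options": {
            "responsive": True,  # resize with the container, phones included
            "plugins": {
                "title": {"display": True, "text": title},
                "legend": {"position": "bottom"},
            },
            "scales": {
                # Start bars at zero so small gaps don't look huge
                "y": {"beginAtZero": True,
                      "title": {"display": True, "text": y_label}},
                "x": {"title": {"display": True, "text": x_label}},
            },
        },
    }


cities = ["Bangalore", "Chennai", "Delhi", "Mumbai", "Pune"]
avg_revenue = [28750.3, 24650.8, 31200.5, 35420.2, 26890.4]
cfg = bar_config(cities, avg_revenue, "Average order value by city",
                 "INR per order", "City")
print(json.dumps(cfg, indent=2))
```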

Honestly, configuration is underrated. Spend 20% of your time getting the data right, 80% making it impossible to misinterpret. Executive decisions happen in 30 seconds of looking at your chart — make those seconds count.

Pro tip: Always test your charts on mobile devices. Many dashboard views happen on phones, often during commutes. If your bubble chart is unreadable on a 5-inch screen, it's useless no matter how insightful the data is.


