DS Case Studies
Exploring Ecommerce User Behaviour
A visitor landing on your store is not the same as a buyer. Between arrival and purchase sits the entire customer journey — and most of that journey ends without a transaction. Understanding where it breaks down, and for which users, is the difference between a store that grows and one that leaks revenue quietly.
You are a data analyst at CartFlow, a direct-to-consumer ecommerce brand. The growth team has two concerns heading into the next quarter: their paid acquisition spend is rising but conversion rate is flat, and they suspect mobile users are converting at a much lower rate than desktop. You have a session-level export covering 12 user sessions — your job is to find out where users are dropping off and which traffic sources and devices are actually delivering buying customers.
What This Case Study Covers
Ecommerce analytics is built on a single concept — the funnel. Every user enters at the top (a session) and either exits without buying or converts at the bottom (a purchase). The analyst's job is to measure drop-off at each stage, compare conversion rates across dimensions like device and traffic source, and identify which combinations of factors produce buyers versus browsers.
This case study covers four layers: overall conversion rate and funnel drop-off analysis to understand where users exit, device comparison — desktop vs mobile vs tablet — to quantify the platform gap in conversion, traffic source analysis to find which acquisition channels deliver the highest-quality visitors, and average order value by device and source to understand not just who buys but how much they spend.
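The funnel mechanics described above can be sketched in a few lines of pandas. The stage counts here are illustrative placeholders, not the CartFlow export:

```python
import pandas as pd

# Hypothetical counts of sessions that reached each stage at least
# (made-up numbers for illustration only).
reached = pd.Series(
    {"Session": 1000, "Product View": 620, "Add to Cart": 280,
     "Checkout": 150, "Purchase": 90}
)

# Overall conversion: purchases as a fraction of all sessions.
overall_conversion = reached["Purchase"] / reached["Session"]

# Stage-to-stage retention: each stage divided by the one before it.
step_retention = (reached / reached.shift(1)).dropna().round(2)

print(f"Overall conversion: {overall_conversion:.1%}")
print(step_retention.to_string())
```

Stage-to-stage retention is what localises the leak: a funnel can have a healthy overall rate while one specific step quietly loses most of its visitors.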
The Ecommerce EDA Toolkit
Funnel Stage Analysis
Track the drop-off percentage at each stage of the funnel — session, product view, add-to-cart, checkout, purchase. Each drop-off point has a different business interpretation and a different team responsible for fixing it.
Conversion Rate by Device
Group by device type and compute conversion rate. Mobile sessions that don't convert may reflect a UX problem — small buttons, slow load times, or a checkout form not optimised for touch. Desktop conversion tells you what the ceiling looks like.
Traffic Source Quality
Not all traffic is equal. Organic search visitors typically have higher purchase intent than social media visitors. Grouping by traffic source and computing conversion rate separates high-quality acquisition from high-volume but low-intent traffic.
Average Order Value Analysis
Divide total revenue from converted sessions by the number of conversions. AOV tells you how much each buying customer spends — a critical lever for revenue growth independent of conversion rate. Higher AOV from one channel may justify its higher acquisition cost.
Session Duration and Page Depth
Longer sessions and more pages viewed generally signal higher engagement and purchase intent. Comparing these by device and source reveals whether low conversion on mobile is an intent problem (short sessions) or a friction problem (long sessions that still don't convert).
Dataset Overview
CartFlow's session export contains 12 user session records — one row per session — covering device type, traffic source, funnel stage reached, session duration, and order value where applicable. Built with pd.DataFrame().
| session_id | device | traffic_source | pages_viewed | session_mins | funnel_stage | order_value |
|---|---|---|---|---|---|---|
| S001 | Desktop | Organic | 8 | 12.4 | Purchase | $142 |
| S002 | Mobile | Social | 3 | 2.1 | Product View | — |
| S003 | Desktop | Paid | 6 | 9.8 | Purchase | $89 |
| S004 | Mobile | Organic | 4 | 3.5 | Add to Cart | — |
| S005 | Tablet | Email | 7 | 11.2 | Purchase | $198 |
Showing first 5 of 12 rows · 7 columns
session_id — Unique session reference. One row per visit — a single user may appear multiple times across sessions.
device — Desktop, Mobile, or Tablet. The primary dimension for the platform gap analysis.
traffic_source — Organic, Paid, Social, or Email. Indicates how the user arrived — each channel has a different purchase intent profile.
pages_viewed — Number of pages viewed during the session. Higher page depth generally signals stronger purchase intent.
session_mins — Total session duration in minutes. Cross-referenced with conversion to distinguish intent-driven vs friction-driven non-conversions.
funnel_stage — The deepest funnel stage reached: Session, Product View, Add to Cart, Checkout, or Purchase. The primary outcome variable.
order_value — Revenue from the session. Only populated for Purchase sessions — null for all non-converting sessions.
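The claim that order_value is null exactly for non-converting sessions can be verified mechanically rather than taken on trust. A sketch on a toy slice of the export (assumed column names match the real file):

```python
import pandas as pd

# Toy slice of the session export — not the full 12 rows.
df = pd.DataFrame({
    "funnel_stage": ["Purchase", "Product View", "Add to Cart", "Purchase"],
    "order_value":  [142.0, None, None, 89.0],
})

# order_value should be present exactly when the session ended in a purchase.
is_purchase = df["funnel_stage"] == "Purchase"
has_value = df["order_value"].notna()
pattern_ok = bool((is_purchase == has_value).all())
print("order_value nulls align with funnel stage:", pattern_ok)
```

If this check ever fails on a real export, the nulls stop being "expected" and become a data quality question worth raising with whoever produced the file.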
Business Questions
The growth team needs these five answers to decide where to focus the next sprint — paid acquisition optimisation, mobile UX, or email channel investment.
What is the overall session-to-purchase conversion rate — and at which funnel stage do most users drop off?
Does conversion rate differ significantly by device — and is mobile underperforming relative to desktop?
Which traffic source delivers the highest conversion rate — and which has the highest average order value?
Do converting sessions show meaningfully longer duration and higher page depth than non-converting ones?
Which device-source combination produces the highest average order value — where should the growth team focus upsell efforts?
Step-by-Step Analysis
The scenario:
The session export arrived this morning. The growth team's weekly stand-up is tomorrow and they want three concrete recommendations before then. You have the data and a clear brief — find the leaks in the funnel and identify where the highest-quality traffic is coming from.
The first question in any ecommerce analysis is always the same: what fraction of sessions result in a purchase? We compute the overall conversion rate, then break down how many sessions reached each funnel stage to find where users are exiting.
import pandas as pd
import numpy as np
# Build the CartFlow session dataset
df = pd.DataFrame({
"session_id": ["S001","S002","S003","S004","S005","S006",
"S007","S008","S009","S010","S011","S012"],
"device": ["Desktop","Mobile","Desktop","Mobile","Tablet","Desktop",
"Mobile","Desktop","Tablet","Mobile","Desktop","Mobile"],
"traffic_source": ["Organic","Social","Paid","Organic","Email","Organic",
"Paid","Email","Social","Social","Paid","Organic"],
"pages_viewed": [8, 3, 6, 4, 7, 9, 2, 11, 6, 2, 7, 5],
"session_mins": [12.4, 2.1, 9.8, 3.5, 11.2, 14.6, 1.8, 16.3,
8.9, 1.5, 10.1, 4.2],
"funnel_stage": ["Purchase","Product View","Purchase","Add to Cart",
"Purchase","Purchase","Session","Purchase",
"Checkout","Session","Purchase","Add to Cart"],
"order_value": [142, None, 89, None, 198, 115, None, 224,
None, None, 167, None]
})
# Inspect
print("Shape:", df.shape)
print("Missing order_value:", df["order_value"].isnull().sum(),
"(expected — only purchases have a value)")
# Overall conversion rate
df["converted"] = (df["funnel_stage"] == "Purchase").astype(int)
conv_rate = df["converted"].mean()
print(f"\nOverall conversion rate: {conv_rate:.1%}")
# Funnel stage distribution — how many sessions reached each stage?
stage_order = ["Session","Product View","Add to Cart","Checkout","Purchase"]
stage_counts = df["funnel_stage"].value_counts().reindex(stage_order).fillna(0)
stage_counts = stage_counts.astype(int)
print("\nSessions by deepest funnel stage reached:")
for stage, count in stage_counts.items():
pct = count / len(df) * 100
print(f" {stage:<15} {count:>2} sessions ({pct:.0f}%)")
Shape: (12, 7)
Missing order_value: 6 (expected — only purchases have a value)

Overall conversion rate: 50.0%

Sessions by deepest funnel stage reached:
  Session          2 sessions (17%)
  Product View     1 sessions (8%)
  Add to Cart      2 sessions (17%)
  Checkout         1 sessions (8%)
  Purchase         6 sessions (50%)
What just happened?
Method — .value_counts().reindex()
.value_counts() counts rows per unique value but returns them sorted by frequency — not in the logical funnel order we want. Chaining .reindex(stage_order) reorders the result to match our manually defined stage list. .fillna(0) handles any stages that appear in our order list but have no sessions in the data — without it, missing stages would show as NaN rather than 0.
The order_value column has intentional nulls — non-purchasing sessions have no order value. We confirm this upfront with .isnull().sum() and note it is expected, not a data quality problem. This is an important distinction — not all missing values are errors. Six nulls out of 12 rows means 6 purchases, which should match our conversion count.
The overall conversion rate is 50% — unusually high for real traffic, but note this is a 12-session sample. The funnel breakdown reveals the biggest drop-off: 17% of sessions exit without viewing a single product — these users bounce at the landing page or homepage. A further 17% add to cart but never reach checkout. Only 8% reach checkout without converting — so cart abandonment is not the primary problem; it is the early-funnel bounce that needs addressing first.
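One subtlety: funnel_stage records only the deepest stage each session reached, so the number of sessions that reached at least a given stage has to be recovered by cumulating counts from the bottom of the funnel upwards. A sketch using the stage counts from the coded dataset:

```python
import pandas as pd

# Counts of sessions by DEEPEST stage reached (from the CartFlow data).
stage_order = ["Session", "Product View", "Add to Cart", "Checkout", "Purchase"]
deepest = pd.Series([2, 1, 2, 1, 6], index=stage_order)

# Sessions reaching AT LEAST each stage = cumulative sum from the bottom up.
reached_at_least = deepest[::-1].cumsum()[::-1]

# Step conversion: each stage's reach divided by the previous stage's reach.
step_conv = (reached_at_least / reached_at_least.shift(1)).dropna()

print(reached_at_least.to_string())
print((1 - step_conv).round(2).to_string())  # per-step drop-off share
```

This "reached at least" view is the one most funnel dashboards plot, and it makes per-step drop-off percentages directly comparable across stages.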
The growth team's hypothesis is that mobile underperforms desktop. We test this by grouping on device and computing not just conversion rate but also average session duration and page depth — to determine whether mobile non-conversion is an intent problem or a friction problem.
# Group by device — conversion rate, session quality, and order value
device_analysis = df.groupby("device").agg(
sessions = ("session_id", "count"),
conv_rate = ("converted", "mean"),
avg_pages = ("pages_viewed", "mean"),
avg_duration = ("session_mins", "mean"),
avg_order_val = ("order_value", "mean") # NaN for non-buyers ignored by default
).round(2).reset_index()
device_analysis["conv_rate_pct"] = (
device_analysis["conv_rate"] * 100
).round(1)
device_analysis = device_analysis.sort_values(
"conv_rate", ascending=False
)
print("Device performance summary:")
print(device_analysis[["device","sessions","conv_rate_pct",
"avg_pages","avg_duration",
"avg_order_val"]].to_string(index=False))
# Gap between best and worst converting device
best = device_analysis.iloc[0]
worst = device_analysis.iloc[-1]
gap = best["conv_rate_pct"] - worst["conv_rate_pct"]
print(f"\nConversion gap ({best['device']} vs {worst['device']}): {gap:.1f} pp")
Device performance summary:
 device  sessions  conv_rate_pct  avg_pages  avg_duration  avg_order_val
Desktop         5          100.0        8.2         12.64          147.4
 Tablet         2           50.0        6.5         10.05          198.0
 Mobile         5            0.0        3.2          2.62            NaN

Conversion gap (Desktop vs Mobile): 100.0 pp
What just happened?
Method — .mean() on nullable column
When .mean() is applied to order_value through .agg(), pandas automatically skips NaN values and computes the mean of the non-null entries only. This means avg_order_val shows the average order value among purchasing sessions in each device group — not a mean that is dragged down by zeros. This default behaviour (skipna=True) is usually what you want for nullable metrics like revenue.
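A two-line illustration of that default, and the contrast with filling nulls first — which turns the same column into revenue per session rather than average order value:

```python
import pandas as pd

# Two purchases and two non-converting sessions (NaN order values).
s = pd.Series([142.0, None, 89.0, None])

# Default skipna=True: mean over the two non-null entries only.
print(s.mean())            # (142 + 89) / 2 = 115.5  — average order value

# Filling nulls with 0 first averages over ALL sessions.
print(s.fillna(0).mean())  # (142 + 0 + 89 + 0) / 4 = 57.75 — revenue per session
```

Both numbers are legitimate metrics; the point is to know which one an aggregation is actually computing before putting it in a report.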
The device gap is absolute — Desktop converts at 100% while Mobile converts at 0%. But the session quality data reveals why: mobile sessions average just 3.2 pages and 2.6 minutes versus desktop's 8.2 pages and 12.6 minutes. Mobile users are not getting stuck in a broken checkout — they are leaving almost immediately after arriving. This is a traffic quality or landing page problem, not a checkout UX problem. The growth team should review what mobile users land on first and whether it matches the ad creative that brought them there.
Conversion rate and average order value together tell the full traffic quality story. A source with high conversion but low AOV generates volume. A source with low conversion but high AOV generates value. The growth team needs to know which is which before setting the next acquisition budget.
# Group by traffic source
source_analysis = df.groupby("traffic_source").agg(
sessions = ("session_id", "count"),
conversions = ("converted", "sum"),
conv_rate = ("converted", "mean"),
avg_pages = ("pages_viewed", "mean"),
avg_order_val = ("order_value", "mean")
).round(2).reset_index()
source_analysis["conv_rate_pct"] = (
source_analysis["conv_rate"] * 100
).round(0)
# Revenue per session — total revenue contribution per channel
# (conversions * avg_order_val) / total sessions = revenue per session
source_analysis["rev_per_session"] = (
    source_analysis["conversions"] * source_analysis["avg_order_val"]
    / source_analysis["sessions"]
).fillna(0).round(2)  # channels with no buyers have NaN AOV — treat their revenue as 0
source_analysis = source_analysis.sort_values(
"conv_rate", ascending=False
)
print("Traffic source performance:")
print(source_analysis[["traffic_source","sessions","conv_rate_pct",
"avg_order_val","rev_per_session"]].to_string(index=False))
# Best source by conversion
best_conv = source_analysis.iloc[0]["traffic_source"]
best_aov = source_analysis.loc[
source_analysis["avg_order_val"].idxmax(), "traffic_source"
]
print(f"\nBest conversion source: {best_conv}")
print(f"Best AOV source: {best_aov}")
Traffic source performance:
traffic_source  sessions  conv_rate_pct  avg_order_val  rev_per_session
         Email         2          100.0         211.00           211.00
          Paid         3           67.0         128.00            85.33
       Organic         4           50.0         128.50            64.25
        Social         3            0.0            NaN             0.00
Best conversion source: Email
Best AOV source: Email
What just happened?
Method — revenue per session derived metric
We computed revenue per session by multiplying conversions by average order value to get total channel revenue, then dividing by total sessions. This normalises for channel volume — a channel with 100 sessions and 10 conversions at $50 AOV generates $5 revenue per session, while a channel with 2 sessions and 2 conversions at $200 AOV generates $200 per session. Revenue per session is the single best metric for comparing acquisition channel efficiency when channels have different volumes.
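The arithmetic in that worked example, written out explicitly in plain Python (the channel figures are the illustrative numbers from the paragraph, not CartFlow data):

```python
# Volume normalisation: illustrative channel figures from the text.
channels = {
    "high_volume": {"sessions": 100, "conversions": 10, "aov": 50.0},
    "low_volume":  {"sessions": 2,   "conversions": 2,  "aov": 200.0},
}

rev_per_session = {}
for name, c in channels.items():
    total_revenue = c["conversions"] * c["aov"]          # back out channel revenue
    rev_per_session[name] = total_revenue / c["sessions"]

print(rev_per_session)  # {'high_volume': 5.0, 'low_volume': 200.0}
```

The 40x gap between the two channels is invisible if you only look at conversion counts, which is why the normalised metric matters.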
Business Insight
Email is the standout channel — 100% conversion rate and the highest AOV at $211, giving $211 revenue per session. Every email visitor bought something and spent the most. Paid converts at 67% with a $128 AOV, worth $85.33 per session; Organic converts at 50% with a $128.50 AOV, worth $64.25 per session. Both are serviceable, but neither comes close to email's efficiency. Social delivers zero conversions across 3 sessions — all social visitors left without purchasing. The budget reallocation case writes itself: cut social, increase email.
We compare session duration and page depth between converting and non-converting sessions — then cross-tabulate device against traffic source to find which combinations produce buyers.
# Session quality: converting vs non-converting
quality = df.groupby("converted")[
["pages_viewed","session_mins"]
].mean().round(2)
quality.index = quality.index.map({1:"Converting", 0:"Non-converting"})
print("Session quality by conversion outcome:")
print(quality.to_string())
# Gap between groups
pages_gap = quality.loc["Converting","pages_viewed"] - \
quality.loc["Non-converting","pages_viewed"]
mins_gap = quality.loc["Converting","session_mins"] - \
quality.loc["Non-converting","session_mins"]
print(f"\nConverting sessions view {pages_gap:.1f} more pages")
print(f"Converting sessions last {mins_gap:.1f} minutes longer")
# Cross-tabulate device vs traffic source — count of sessions per combination
cross = pd.crosstab(df["device"], df["traffic_source"])
print("\nSessions by device × traffic source:")
print(cross.to_string())
# Which device-source combo has the highest AOV among buyers?
buyers = df[df["converted"] == 1].copy()
combo_aov = buyers.groupby(["device","traffic_source"])["order_value"].mean().round(2)
print("\nAOV by device × source (buyers only):")
print(combo_aov.sort_values(ascending=False).to_string())
Session quality by conversion outcome:
                pages_viewed  session_mins
Non-converting          3.67          3.67
Converting              8.00         12.40

Converting sessions view 4.3 more pages
Converting sessions last 8.7 minutes longer
Sessions by device × traffic source:
traffic_source Email Organic Paid Social
device
Desktop 1 2 2 0
Mobile 0 2 1 2
Tablet 1 0 0 1
AOV by device × source (buyers only):
device   traffic_source
Desktop  Email             224.0
Tablet   Email             198.0
Desktop  Organic           128.5
         Paid              128.0
What just happened?
Method — index renaming after groupby
After grouping by converted (a 0/1 integer column), the result has 0 and 1 as index labels. We used .map({1:"Converting", 0:"Non-converting"}) on the index to replace the numeric labels with readable strings — making the output immediately clear without needing a comment. The .map() method works on both Series and Index objects.
We filtered to df[df["converted"] == 1] before the device-source AOV groupby to ensure we only aggregate over purchasing sessions. Grouping the full DataFrame and taking mean of order_value would also skip nulls — but filtering first makes the intent explicit and avoids any ambiguity about what the aggregation represents.
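A minimal, self-contained illustration of that index relabelling, separate from the CartFlow data:

```python
import pandas as pd

# Group by a 0/1 flag: the result carries integer index labels, sorted 0 then 1.
df = pd.DataFrame({"converted": [1, 0, 1, 0], "pages": [8, 3, 6, 4]})
out = df.groupby("converted")["pages"].mean()
print(out.index.tolist())

# .map() on the Index swaps the numeric labels for readable strings.
out.index = out.index.map({0: "Non-converting", 1: "Converting"})
print(out.to_string())
```

An equivalent one-liner is out.rename(index={0: "Non-converting", 1: "Converting"}); .map() generalises better when the relabelling is a function rather than a fixed dictionary.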
Converting sessions view 4.3 more pages and last 8.7 minutes longer than non-converting ones — a clear engagement signal. The cross-tabulation confirms the mobile problem: all 5 mobile sessions come from Social (2), Organic (2), and Paid (1) — and none converted. The highest AOV combination is Desktop Email at $224, followed by Tablet Email at $198. Email visitors on any device are the highest-value buyers in CartFlow's current traffic mix.
Checkpoint: Add a revenue per session by traffic source bar chart to the growth team's report — compute it as df.groupby("traffic_source").apply(lambda g: g["order_value"].sum() / len(g)). This single number, compared against the cost per session for each channel, gives you the ROI calculation the growth team needs to justify the budget reallocation.
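The .apply() formula in the checkpoint works, though recent pandas versions emit a deprecation warning when apply operates on a frame that still contains the grouping column. A vectorised equivalent, sketched on toy figures rather than the full export:

```python
import pandas as pd

# Toy sessions — two email buyers, one paid buyer, plus non-converters.
df = pd.DataFrame({
    "traffic_source": ["Email", "Email", "Paid", "Paid", "Paid", "Social"],
    "order_value":    [198.0, 224.0, 89.0, None, 167.0, None],
})

# Total revenue per channel (sum skips NaN) divided by session count per
# channel (value_counts counts every row); division aligns on the index.
rev_per_session = (
    df.groupby("traffic_source")["order_value"].sum()
    / df["traffic_source"].value_counts()
).round(2)
print(rev_per_session.sort_values(ascending=False).to_string())
```

This produces the same numbers as the apply-based version while staying warning-free, and it reads as exactly what it is: revenue divided by sessions.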
Key Findings
The biggest funnel drop-off is at the very top — 17% of sessions exit without viewing a single product. This is a landing page or ad-message mismatch problem, not a product or checkout problem. It should be the first thing the product team investigates.
Desktop converts at 100%, mobile at 0%. But mobile sessions average only 2.6 minutes and 3.2 pages — users are leaving almost immediately, not failing at checkout. This is a landing experience problem on mobile, not a checkout UX problem.
Email is the highest-quality channel — 100% conversion rate, the highest AOV at $211, and $211 revenue per session. Paid converts at 67% ($128 AOV, $85.33 per session) and Organic at 50% ($128.50 AOV, $64.25 per session) — both solid but far behind email, and both comfortably ahead of social.
Social delivers zero conversions across 3 sessions. All social visitors arrived on mobile, viewed 2–3 pages, and left within 2 minutes. Social traffic is currently generating cost with zero return — immediate budget review required.
Converting sessions are dramatically more engaged — 8.0 pages viewed vs 3.7 for non-converting, and 12.4 minutes vs 3.7. The gap is large enough that page depth and session duration could serve as real-time purchase intent signals for a personalisation engine.
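As a sketch of how that intent signal could be operationalised, here is a hypothetical high-intent flag. The cutoffs are illustrative placeholders sitting roughly between the two group means, not values from the case study, and would need tuning on real traffic:

```python
import pandas as pd

# Incoming live sessions (toy data).
sessions = pd.DataFrame({
    "pages_viewed": [9, 2, 6, 3],
    "session_mins": [11.0, 1.5, 8.0, 2.0],
})

# Assumed thresholds, placed between converter and non-converter averages.
PAGES_CUTOFF, MINS_CUTOFF = 5, 7.0

# Flag sessions showing converter-like engagement on both dimensions.
sessions["high_intent"] = (
    (sessions["pages_viewed"] >= PAGES_CUTOFF)
    & (sessions["session_mins"] >= MINS_CUTOFF)
)
print(sessions)
```

A personalisation engine could use such a flag to trigger interventions (free-shipping banner, live chat) only for sessions already showing buying behaviour.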
Ecommerce EDA Decision Guide
Every ecommerce session dataset asks the same analytical questions. Here is the framework for approaching them systematically:
| Question | Metric | pandas Method | Watch Out For |
|---|---|---|---|
| Overall conversion rate? | Mean of binary converted flag | (df["stage"]=="Purchase").mean() | Encode to 0/1 before any numeric ops |
| Funnel drop-off? | Session count per stage | value_counts().reindex(order) | Use reindex to preserve logical stage order |
| Device performance? | Conversion rate + AOV per device | groupby("device").agg() | Mean of nullable AOV skips NaN by default |
| Best traffic source? | Revenue per session by source | groupby("source").agg() | Conversion rate alone misses AOV differences |
| Intent vs friction? | Pages + duration by outcome | groupby("converted").mean() | Short sessions = intent problem, long = friction |
What Would Come Next?
A senior analyst would build a multi-touch attribution model to account for the full customer journey across sessions, a cohort analysis tracking weekly conversion rate trends, and commission session recordings via Hotjar or FullStory to identify exactly what mobile users see before abandoning.
Limitations of This Analysis
Twelve sessions cannot represent a real traffic mix. The 100% desktop conversion rate is not realistic at scale — well-optimised stores typically convert 3–8% of sessions. Session-level data also cannot identify returning users across visits.
Business Decisions This Could Drive
Pause social media ad spend until the mobile landing experience is fixed, invest in growing the email list given its $211 revenue per session, and commission a mobile-specific landing page test with a streamlined single-page checkout.
Practice Questions
1. Which traffic source had both the highest conversion rate and the highest average order value in the CartFlow dataset?
2. Which pandas method was used to reorder the funnel stage counts into logical funnel order rather than frequency order?
3. Mobile sessions have low page depth and short duration but still don't convert. Is this a friction problem or an intent problem — and why?
Quiz
1. When comparing acquisition channels with different session volumes, which metric best captures overall channel efficiency?
2. What is the correct pandas sequence to count sessions per funnel stage in logical funnel order?
3. After grouping by a 0/1 integer column, how did we make the output rows show "Converting" and "Non-converting" instead of 1 and 0?
Up Next · Case Study 10
Analysing Employee Attrition Data
You are handed an HR dataset covering 12 employees. Which departments have the highest turnover? Does salary band predict who stays? And which combination of factors most strongly signals an employee is about to leave?