DS Case Studies
Exploring Ecommerce User Behaviour
A visitor landing on your store is not the same as a buyer. Between arrival and purchase sits the entire customer journey — and most of that journey ends without a transaction. Understanding where it breaks down, and for which users, is the difference between a store that grows and one that leaks revenue quietly.
You are a data analyst at CartFlow, a direct-to-consumer ecommerce brand. The growth team has two concerns heading into the next quarter: their paid acquisition spend is rising but conversion rate is flat, and they suspect mobile users are converting at a much lower rate than desktop. You have a session-level export covering 12 user sessions — your job is to find out where users are dropping off and which traffic sources and devices are actually delivering buying customers.
What This Case Study Covers
Ecommerce analytics is built on a single concept — the funnel. Every user enters at the top (a session) and either exits without buying or converts at the bottom (a purchase). The analyst's job is to measure drop-off at each stage, compare conversion rates across dimensions like device and traffic source, and identify which combinations of factors produce buyers versus browsers.
This case study covers four layers: overall conversion rate and funnel drop-off analysis to understand where users exit, device comparison — desktop vs mobile vs tablet — to quantify the platform gap in conversion, traffic source analysis to find which acquisition channels deliver the highest-quality visitors, and average order value by device and source to understand not just who buys but how much they spend.
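The funnel mechanics described above can be sketched in a few lines of pandas. The stage counts here are illustrative placeholders, not the CartFlow export:

```python
import pandas as pd

# Hypothetical counts of sessions that reached each stage at least
# (made-up numbers for illustration only).
reached = pd.Series(
    {"Session": 1000, "Product View": 620, "Add to Cart": 280,
     "Checkout": 150, "Purchase": 90}
)

# Overall conversion: purchases as a fraction of all sessions.
overall_conversion = reached["Purchase"] / reached["Session"]

# Stage-to-stage retention: each stage divided by the one before it.
step_retention = (reached / reached.shift(1)).dropna().round(2)

print(f"Overall conversion: {overall_conversion:.1%}")
print(step_retention.to_string())
```

Stage-to-stage retention is what localises the leak: a funnel can have a healthy overall rate while one specific step quietly loses most of its visitors.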
The Ecommerce EDA Toolkit
Funnel Stage Analysis
Track the drop-off percentage at each stage of the funnel — session, product view, add-to-cart, checkout, purchase. Each drop-off point has a different business interpretation and a different team responsible for fixing it.
Conversion Rate by Device
Group by device type and compute conversion rate. Mobile sessions that don't convert may reflect a UX problem — small buttons, slow load times, or a checkout form not optimised for touch. Desktop conversion tells you what the ceiling looks like.
Traffic Source Quality
Not all traffic is equal. Organic search visitors typically have higher purchase intent than social media visitors. Grouping by traffic source and computing conversion rate separates high-quality acquisition from high-volume but low-intent traffic.
Average Order Value Analysis
Divide total revenue from converted sessions by the number of conversions. AOV tells you how much each buying customer spends — a critical lever for revenue growth independent of conversion rate. Higher AOV from one channel may justify its higher acquisition cost.
Session Duration and Page Depth
Longer sessions and more pages viewed generally signal higher engagement and purchase intent. Comparing these by device and source reveals whether low conversion on mobile is an intent problem (short sessions) or a friction problem (long sessions that still don't convert).
Dataset Overview
CartFlow's session export contains 12 user session records — one row per session — covering device type, traffic source, funnel stage reached, session duration, and order value where applicable. Built with pd.DataFrame().
| session_id | device | traffic_source | pages_viewed | session_mins | funnel_stage | order_value |
|---|---|---|---|---|---|---|
| S001 | Desktop | Organic | 8 | 12.4 | Purchase | $142 |
| S002 | Mobile | Social | 3 | 2.1 | Product View | — |
| S003 | Desktop | Paid | 6 | 9.8 | Purchase | $89 |
| S004 | Mobile | Organic | 4 | 3.5 | Add to Cart | — |
| S005 | Tablet | Email | 7 | 11.2 | Purchase | $198 |
Showing first 5 of 12 rows · 7 columns
session_id — Unique session reference. One row per visit — a single user may appear multiple times across sessions.
device — Desktop, Mobile, or Tablet. The primary dimension for the platform gap analysis.
traffic_source — Organic, Paid, Social, or Email. Indicates how the user arrived — each channel has a different purchase intent profile.
pages_viewed — Number of pages viewed during the session. Higher page depth generally signals stronger purchase intent.
session_mins — Total session duration in minutes. Cross-referenced with conversion to distinguish intent-driven vs friction-driven non-conversions.
funnel_stage — The deepest funnel stage reached: Session, Product View, Add to Cart, Checkout, or Purchase. The primary outcome variable.
order_value — Revenue from the session. Only populated for Purchase sessions — null for all non-converting sessions.
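The claim that order_value is null exactly for non-converting sessions can be verified mechanically rather than taken on trust. A sketch on a toy slice of the export (assumed column names match the real file):

```python
import pandas as pd

# Toy slice of the session export — not the full 12 rows.
df = pd.DataFrame({
    "funnel_stage": ["Purchase", "Product View", "Add to Cart", "Purchase"],
    "order_value":  [142.0, None, None, 89.0],
})

# order_value should be present exactly when the session ended in a purchase.
is_purchase = df["funnel_stage"] == "Purchase"
has_value = df["order_value"].notna()
pattern_ok = bool((is_purchase == has_value).all())
print("order_value nulls align with funnel stage:", pattern_ok)
```

If this check ever fails on a real export, the nulls stop being "expected" and become a data quality question worth raising with whoever produced the file.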
Business Questions
The growth team needs these five answers to decide where to focus the next sprint — paid acquisition optimisation, mobile UX, or email channel investment.
What is the overall session-to-purchase conversion rate — and at which funnel stage do most users drop off?
Does conversion rate differ significantly by device — and is mobile underperforming relative to desktop?
Which traffic source delivers the highest conversion rate — and which has the highest average order value?
Do converting sessions show meaningfully longer duration and higher page depth than non-converting ones?
Which device-source combination produces the highest average order value — where should the growth team focus upsell efforts?
Step-by-Step Analysis
The scenario:
The session export arrived this morning. The growth team's weekly stand-up is tomorrow and they want three concrete recommendations before then. You have the data and a clear brief — find the leaks in the funnel and identify where the highest-quality traffic is coming from.
The first question in any ecommerce analysis is always the same: what fraction of sessions result in a purchase? We compute the overall conversion rate, then break down how many sessions reached each funnel stage to find where users are exiting.
import pandas as pd
import numpy as np
# Build the CartFlow session dataset
df = pd.DataFrame({
"session_id": ["S001","S002","S003","S004","S005","S006",
"S007","S008","S009","S010","S011","S012"],
"device": ["Desktop","Mobile","Desktop","Mobile","Tablet","Desktop",
"Mobile","Desktop","Tablet","Mobile","Desktop","Mobile"],
"traffic_source": ["Organic","Social","Paid","Organic","Email","Organic",
"Paid","Email","Social","Social","Paid","Organic"],
"pages_viewed": [8, 3, 6, 4, 7, 9, 2, 11, 6, 2, 7, 5],
"session_mins": [12.4, 2.1, 9.8, 3.5, 11.2, 14.6, 1.8, 16.3,
8.9, 1.5, 10.1, 4.2],
"funnel_stage": ["Purchase","Product View","Purchase","Add to Cart",
"Purchase","Purchase","Session","Purchase",
"Checkout","Session","Purchase","Add to Cart"],
"order_value": [142, None, 89, None, 198, 115, None, 224,
None, None, 167, None]
})
# Inspect
print("Shape:", df.shape)
print("Missing order_value:", df["order_value"].isnull().sum(),
"(expected — only purchases have a value)")
# Overall conversion rate
df["converted"] = (df["funnel_stage"] == "Purchase").astype(int)
conv_rate = df["converted"].mean()
print(f"\nOverall conversion rate: {conv_rate:.1%}")
# Funnel stage distribution — how many sessions reached each stage?
stage_order = ["Session","Product View","Add to Cart","Checkout","Purchase"]
stage_counts = df["funnel_stage"].value_counts().reindex(stage_order).fillna(0)
stage_counts = stage_counts.astype(int)
print("\nSessions by deepest funnel stage reached:")
for stage, count in stage_counts.items():
pct = count / len(df) * 100
print(f" {stage:<15} {count:>2} sessions ({pct:.0f}%)")
Shape: (12, 7)
Missing order_value: 6 (expected — only purchases have a value)

Overall conversion rate: 50.0%

Sessions by deepest funnel stage reached:
  Session          2 sessions (17%)
  Product View     1 sessions (8%)
  Add to Cart      2 sessions (17%)
  Checkout         1 sessions (8%)
  Purchase         6 sessions (50%)
What just happened?
Method — .value_counts().reindex()
.value_counts() counts rows per unique value but returns them sorted by frequency — not in the logical funnel order we want. Chaining .reindex(stage_order) reorders the result to match our manually defined stage list. .fillna(0) handles any stages that appear in our order list but have no sessions in the data — without it, missing stages would show as NaN rather than 0.
The order_value column has intentional nulls — non-purchasing sessions have no order value. We confirm this upfront with .isnull().sum() and note it is expected, not a data quality problem. This is an important distinction — not all missing values are errors. Six nulls out of 12 rows means 6 purchases, which should match our conversion count.
The overall conversion rate is 50% — unusually high for real traffic, but note this is a 12-session sample. The funnel breakdown reveals the biggest drop-off: 17% of sessions exit without viewing a single product — these users bounce at the landing page or homepage. A further 17% add to cart but never reach checkout. Only 8% reach checkout without converting — so cart abandonment is not the primary problem; it is the early-funnel bounce that needs addressing first.
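One subtlety: funnel_stage records only the deepest stage each session reached, so the number of sessions that reached at least a given stage has to be recovered by cumulating counts from the bottom of the funnel upwards. A sketch using the stage counts from the coded dataset:

```python
import pandas as pd

# Counts of sessions by DEEPEST stage reached (from the CartFlow data).
stage_order = ["Session", "Product View", "Add to Cart", "Checkout", "Purchase"]
deepest = pd.Series([2, 1, 2, 1, 6], index=stage_order)

# Sessions reaching AT LEAST each stage = cumulative sum from the bottom up.
reached_at_least = deepest[::-1].cumsum()[::-1]

# Step conversion: each stage's reach divided by the previous stage's reach.
step_conv = (reached_at_least / reached_at_least.shift(1)).dropna()

print(reached_at_least.to_string())
print((1 - step_conv).round(2).to_string())  # per-step drop-off share
```

This "reached at least" view is the one most funnel dashboards plot, and it makes per-step drop-off percentages directly comparable across stages.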
The growth team's hypothesis is that mobile underperforms desktop. We test this by grouping on device and computing not just conversion rate but also average session duration and page depth — to determine whether mobile non-conversion is an intent problem or a friction problem.
# Group by device — conversion rate, session quality, and order value
device_analysis = df.groupby("device").agg(
sessions = ("session_id", "count"),
conv_rate = ("converted", "mean"),
avg_pages = ("pages_viewed", "mean"),
avg_duration = ("session_mins", "mean"),
avg_order_val = ("order_value", "mean") # NaN for non-buyers ignored by default
).round(2).reset_index()
device_analysis["conv_rate_pct"] = (
device_analysis["conv_rate"] * 100
).round(1)
device_analysis = device_analysis.sort_values(
"conv_rate", ascending=False
)
print("Device performance summary:")
print(device_analysis[["device","sessions","conv_rate_pct",
"avg_pages","avg_duration",
"avg_order_val"]].to_string(index=False))
# Gap between best and worst converting device
best = device_analysis.iloc[0]
worst = device_analysis.iloc[-1]
gap = best["conv_rate_pct"] - worst["conv_rate_pct"]
print(f"\nConversion gap ({best['device']} vs {worst['device']}): {gap:.1f} pp")
Device performance summary:
 device  sessions  conv_rate_pct  avg_pages  avg_duration  avg_order_val
Desktop         5          100.0        8.2         12.64          147.4
 Tablet         2           50.0        6.5         10.05          198.0
 Mobile         5            0.0        3.2          2.62            NaN

Conversion gap (Desktop vs Mobile): 100.0 pp
What just happened?
Method — .mean() on nullable column
When .mean() is applied to order_value through .agg(), pandas automatically skips NaN values and computes the mean of the non-null entries only. This means avg_order_val shows the average order value among purchasing sessions in each device group — not a mean that is dragged down by zeros. This default behaviour (skipna=True) is usually what you want for nullable metrics like revenue.
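A two-line illustration of that default, and the contrast with filling nulls first — which turns the same column into revenue per session rather than average order value:

```python
import pandas as pd

# Two purchases and two non-converting sessions (NaN order values).
s = pd.Series([142.0, None, 89.0, None])

# Default skipna=True: mean over the two non-null entries only.
print(s.mean())            # (142 + 89) / 2 = 115.5  — average order value

# Filling nulls with 0 first averages over ALL sessions.
print(s.fillna(0).mean())  # (142 + 0 + 89 + 0) / 4 = 57.75 — revenue per session
```

Both numbers are legitimate metrics; the point is to know which one an aggregation is actually computing before putting it in a report.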
The device gap is absolute — Desktop converts at 100% while Mobile converts at 0%. But the session quality data reveals why: mobile sessions average just 3.2 pages and 2.6 minutes versus desktop's 8.2 pages and 12.6 minutes. Mobile users are not getting stuck in a broken checkout — they are leaving almost immediately after arriving. This is a traffic quality or landing page problem, not a checkout UX problem. The growth team should review what mobile users land on first and whether it matches the ad creative that brought them there.
Conversion rate and average order value together tell the full traffic quality story. A source with high conversion but low AOV generates volume. A source with low conversion but high AOV generates value. The growth team needs to know which is which before setting the next acquisition budget.
# Group by traffic source
source_analysis = df.groupby("traffic_source").agg(
sessions = ("session_id", "count"),
conversions = ("converted", "sum"),
conv_rate = ("converted", "mean"),
avg_pages = ("pages_viewed", "mean"),
avg_order_val = ("order_value", "mean")
).round(2).reset_index()
source_analysis["conv_rate_pct"] = (
source_analysis["conv_rate"] * 100
).round(0)
# Revenue per session — total revenue contribution per channel
# (conversions * avg_order_val) / total sessions = revenue per session
source_analysis["rev_per_session"] = (
    source_analysis["conversions"] * source_analysis["avg_order_val"]
    / source_analysis["sessions"]
).fillna(0).round(2)  # channels with no buyers have NaN AOV — treat their revenue as 0
source_analysis = source_analysis.sort_values(
"conv_rate", ascending=False
)
print("Traffic source performance:")
print(source_analysis[["traffic_source","sessions","conv_rate_pct",
"avg_order_val","rev_per_session"]].to_string(index=False))
# Best source by conversion
best_conv = source_analysis.iloc[0]["traffic_source"]
best_aov = source_analysis.loc[
source_analysis["avg_order_val"].idxmax(), "traffic_source"
]
print(f"\nBest conversion source: {best_conv}")
print(f"Best AOV source: {best_aov}")
Traffic source performance:
traffic_source  sessions  conv_rate_pct  avg_order_val  rev_per_session
         Email         2          100.0         211.00           211.00
          Paid         3           67.0         128.00            85.33
       Organic         4           50.0         128.50            64.25
        Social         3            0.0            NaN             0.00
Best conversion source: Email
Best AOV source: Email
What just happened?
Method — revenue per session derived metric
We computed revenue per session by multiplying conversions by average order value to get total channel revenue, then dividing by total sessions. This normalises for channel volume — a channel with 100 sessions and 10 conversions at $50 AOV generates $5 revenue per session, while a channel with 2 sessions and 2 conversions at $200 AOV generates $200 per session. Revenue per session is the single best metric for comparing acquisition channel efficiency when channels have different volumes.
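The arithmetic in that worked example, written out explicitly in plain Python (the channel figures are the illustrative numbers from the paragraph, not CartFlow data):

```python
# Volume normalisation: illustrative channel figures from the text.
channels = {
    "high_volume": {"sessions": 100, "conversions": 10, "aov": 50.0},
    "low_volume":  {"sessions": 2,   "conversions": 2,  "aov": 200.0},
}

rev_per_session = {}
for name, c in channels.items():
    total_revenue = c["conversions"] * c["aov"]          # back out channel revenue
    rev_per_session[name] = total_revenue / c["sessions"]

print(rev_per_session)  # {'high_volume': 5.0, 'low_volume': 200.0}
```

The 40x gap between the two channels is invisible if you only look at conversion counts, which is why the normalised metric matters.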
Business Insight
Email is the standout channel — 100% conversion rate and the highest AOV at $211, giving $211 revenue per session. Every email visitor bought something and spent the most. Paid converts at 67% with a $128 AOV, worth $85.33 per session; Organic converts at 50% with a $128.50 AOV, worth $64.25 per session. Both are serviceable, but neither comes close to email's efficiency. Social delivers zero conversions across 3 sessions — all social visitors left without purchasing. The budget reallocation case writes itself: cut social, increase email.
We compare session duration and page depth between converting and non-converting sessions — then cross-tabulate device against traffic source to find which combinations produce buyers.
# Session quality: converting vs non-converting
quality = df.groupby("converted")[
["pages_viewed","session_mins"]
].mean().round(2)
quality.index = quality.index.map({1:"Converting", 0:"Non-converting"})
print("Session quality by conversion outcome:")
print(quality.to_string())
# Gap between groups
pages_gap = quality.loc["Converting","pages_viewed"] - \
quality.loc["Non-converting","pages_viewed"]
mins_gap = quality.loc["Converting","session_mins"] - \
quality.loc["Non-converting","session_mins"]
print(f"\nConverting sessions view {pages_gap:.1f} more pages")
print(f"Converting sessions last {mins_gap:.1f} minutes longer")
# Cross-tabulate device vs traffic source — count of sessions per combination
cross = pd.crosstab(df["device"], df["traffic_source"])
print("\nSessions by device × traffic source:")
print(cross.to_string())
# Which device-source combo has the highest AOV among buyers?
buyers = df[df["converted"] == 1].copy()
combo_aov = buyers.groupby(["device","traffic_source"])["order_value"].mean().round(2)
print("\nAOV by device × source (buyers only):")
print(combo_aov.sort_values(ascending=False).to_string())
Session quality by conversion outcome:
                pages_viewed  session_mins
Non-converting          3.67          3.67
Converting              8.00         12.40

Converting sessions view 4.3 more pages
Converting sessions last 8.7 minutes longer
Sessions by device × traffic source:
traffic_source Email Organic Paid Social
device
Desktop 1 2 2 0
Mobile 0 2 1 2
Tablet 1 0 0 1
AOV by device × source (buyers only):
device   traffic_source
Desktop  Email             224.0
Tablet   Email             198.0
Desktop  Organic           128.5
         Paid              128.0
What just happened?
Method — index renaming after groupby
After grouping by converted (a 0/1 integer column), the result has 0 and 1 as index labels. We used .map({1:"Converting", 0:"Non-converting"}) on the index to replace the numeric labels with readable strings — making the output immediately clear without needing a comment. The .map() method works on both Series and Index objects.
We filtered to df[df["converted"] == 1] before the device-source AOV groupby to ensure we only aggregate over purchasing sessions. Grouping the full DataFrame and taking mean of order_value would also skip nulls — but filtering first makes the intent explicit and avoids any ambiguity about what the aggregation represents.
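A minimal, self-contained illustration of that index relabelling, separate from the CartFlow data:

```python
import pandas as pd

# Group by a 0/1 flag: the result carries integer index labels, sorted 0 then 1.
df = pd.DataFrame({"converted": [1, 0, 1, 0], "pages": [8, 3, 6, 4]})
out = df.groupby("converted")["pages"].mean()
print(out.index.tolist())

# .map() on the Index swaps the numeric labels for readable strings.
out.index = out.index.map({0: "Non-converting", 1: "Converting"})
print(out.to_string())
```

An equivalent one-liner is out.rename(index={0: "Non-converting", 1: "Converting"}); .map() generalises better when the relabelling is a function rather than a fixed dictionary.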
Converting sessions view 4.3 more pages and last 8.7 minutes longer than non-converting ones — a clear engagement signal. The cross-tabulation confirms the mobile problem: all 5 mobile sessions come from Social (2), Organic (2), and Paid (1) — and none converted. The highest AOV combination is Desktop Email at $224, followed by Tablet Email at $198. Email visitors on any device are the highest-value buyers in CartFlow's current traffic mix.
Checkpoint: Add a revenue per session by traffic source bar chart to the growth team's report — compute it as df.groupby("traffic_source").apply(lambda g: g["order_value"].sum() / len(g)). This single number, compared against the cost per session for each channel, gives you the ROI calculation the growth team needs to justify the budget reallocation.
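The .apply() formula in the checkpoint works, though recent pandas versions emit a deprecation warning when apply operates on a frame that still contains the grouping column. A vectorised equivalent, sketched on toy figures rather than the full export:

```python
import pandas as pd

# Toy sessions — two email buyers, one paid buyer, plus non-converters.
df = pd.DataFrame({
    "traffic_source": ["Email", "Email", "Paid", "Paid", "Paid", "Social"],
    "order_value":    [198.0, 224.0, 89.0, None, 167.0, None],
})

# Total revenue per channel (sum skips NaN) divided by session count per
# channel (value_counts counts every row); division aligns on the index.
rev_per_session = (
    df.groupby("traffic_source")["order_value"].sum()
    / df["traffic_source"].value_counts()
).round(2)
print(rev_per_session.sort_values(ascending=False).to_string())
```

This produces the same numbers as the apply-based version while staying warning-free, and it reads as exactly what it is: revenue divided by sessions.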
Key Findings
The biggest funnel drop-off is at the very top — 17% of sessions exit without viewing a single product. This is a landing page or ad-message mismatch problem, not a product or checkout problem. It should be the first thing the product team investigates.
Desktop converts at 100%, mobile at 0%. But mobile sessions average only 2.6 minutes and 3.2 pages — users are leaving almost immediately, not failing at checkout. This is a landing experience problem on mobile, not a checkout UX problem.
Email is the highest-quality channel — 100% conversion rate, the highest AOV at $211, and $211 revenue per session. Paid converts at 67% ($128 AOV, $85.33 per session) and Organic at 50% ($128.50 AOV, $64.25 per session) — both solid but far behind email, and both comfortably ahead of social.
Social delivers zero conversions across 3 sessions. All social visitors arrived on mobile, viewed 2–3 pages, and left within 2 minutes. Social traffic is currently generating cost with zero return — immediate budget review required.
Converting sessions are dramatically more engaged — 8.0 pages viewed vs 3.7 for non-converting, and 12.4 minutes vs 3.7. The gap is large enough that page depth and session duration could serve as real-time purchase intent signals for a personalisation engine.
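As a sketch of how that intent signal could be operationalised, here is a hypothetical high-intent flag. The cutoffs are illustrative placeholders sitting roughly between the two group means, not values from the case study, and would need tuning on real traffic:

```python
import pandas as pd

# Incoming live sessions (toy data).
sessions = pd.DataFrame({
    "pages_viewed": [9, 2, 6, 3],
    "session_mins": [11.0, 1.5, 8.0, 2.0],
})

# Assumed thresholds, placed between converter and non-converter averages.
PAGES_CUTOFF, MINS_CUTOFF = 5, 7.0

# Flag sessions showing converter-like engagement on both dimensions.
sessions["high_intent"] = (
    (sessions["pages_viewed"] >= PAGES_CUTOFF)
    & (sessions["session_mins"] >= MINS_CUTOFF)
)
print(sessions)
```

A personalisation engine could use such a flag to trigger interventions (free-shipping banner, live chat) only for sessions already showing buying behaviour.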
Ecommerce EDA Decision Guide
Every ecommerce session dataset asks the same analytical questions. Here is the framework for approaching them systematically:
| Question | Metric | pandas Method | Watch Out For |
|---|---|---|---|
| Overall conversion rate? | Mean of binary converted flag | (df["stage"]=="Purchase").mean() | Encode to 0/1 before any numeric ops |
| Funnel drop-off? | Session count per stage | value_counts().reindex(order) | Use reindex to preserve logical stage order |
| Device performance? | Conversion rate + AOV per device | groupby("device").agg() | Mean of nullable AOV skips NaN by default |
| Best traffic source? | Revenue per session by source | groupby("source").agg() | Conversion rate alone misses AOV differences |
| Intent vs friction? | Pages + duration by outcome | groupby("converted").mean() | Short sessions = intent problem, long = friction |
What Would Come Next?
A senior analyst would build a multi-touch attribution model to account for the full customer journey across sessions, a cohort analysis tracking weekly conversion rate trends, and commission session recordings via Hotjar or FullStory to identify exactly what mobile users see before abandoning.
Limitations of This Analysis
Twelve sessions cannot represent a real traffic mix. The 100% desktop conversion rate is not realistic at scale — well-optimised stores typically convert 3–8% of sessions. Session-level data also cannot identify returning users across visits.
Business Decisions This Could Drive
Pause social media ad spend until the mobile landing experience is fixed, invest in growing the email list given its $211 revenue per session, and commission a mobile-specific landing page test with a streamlined single-page checkout.
Practice Questions
1. Which traffic source had both the highest conversion rate and the highest average order value in the CartFlow dataset?
2. Which pandas method was used to reorder the funnel stage counts into logical funnel order rather than frequency order?
3. Mobile sessions have low page depth and short duration but still don't convert. Is this a friction problem or an intent problem — and why?
Quiz
1. When comparing acquisition channels with different session volumes, which metric best captures overall channel efficiency?
2. What is the correct pandas sequence to count sessions per funnel stage in logical funnel order?
3. After grouping by a 0/1 integer column, how did we make the output rows show "Converting" and "Non-converting" instead of 1 and 0?
Up Next · Case Study 10
Analysing Employee Attrition Data
You are handed an HR dataset covering 12 employees. Which departments have the highest turnover? Does salary band predict who stays? And which combination of factors most strongly signals an employee is about to leave?