DS Case Study 14 – Telecom Retention | Dataplexa

Intermediate Case Study · CS 14

Analysing Telecom Customer Retention

Acquiring a new telecom subscriber costs five to seven times more than retaining an existing one. Yet most operators invest far more in acquisition than retention — because churn is invisible until it has already happened. The analyst's job is to find the early warning signals: which customers are most likely to leave before they actually do.

You are a data analyst at ConnectPlus, a mid-sized telecom operator. The head of customer success has escalated a problem: monthly churn has climbed from 3.1% to 4.8% over the past two quarters and the retention budget is being spent uniformly across all subscribers. She needs a segmentation by contract type, tenure band, and service quality score so the retention team can stop treating every customer the same and start targeting the ones who are actually about to leave.

IndustryTelecom

TechniqueChurn · Segmentation · Pivot

Librariespandas · numpy

DifficultyIntermediate

Est. Time50–60 min

Overview

What This Case Study Covers

Telecom churn analysis is a direct application of every technique in the Intermediate tier applied to a subscription business. The binary churn outcome can be analysed exactly like default rate in CS12 and fraud in CS11: group by a categorical dimension, compute the mean of the binary column, compare across groups. The new element here is revenue-weighted churn impact — computing monthly spend at risk per segment rather than just the headcount rate.

This case study introduces two new patterns: tenure banding combined with churn rate to show how risk changes across the customer lifecycle — new subscribers and long-tenured ones behave very differently — and revenue-at-risk calculation, multiplying churned count by mean monthly spend per segment to convert a percentage into a dollar figure the CFO can act on.

The Retention Analysis Toolkit

Churn Rate by Contract Type

Month-to-month subscribers have dramatically higher churn than annual or two-year contract holders — they have no lock-in and no switching cost. Quantifying this difference tells the team whether to focus on converting month-to-month subscribers before they leave.

Tenure Band Analysis

Churn risk is highest in the first six months (onboarding friction) and at the 12-month renewal point. Segmenting by tenure band reveals whether ConnectPlus has an early-life problem, a renewal-moment problem, or both — and the intervention differs for each.

Service Quality Score as Leading Indicator

A subscriber who rates service 2/5 is a churn candidate regardless of contract type. Correlating quality score with churn confirms whether service investment would reduce cancellations — and at which score threshold the risk inflects sharply.

Revenue-at-Risk per Segment

Multiply churned subscriber count by mean monthly spend to compute monthly revenue lost per segment. A segment with 20% churn rate at $80/month spend is a larger financial risk than one with 40% churn at $30/month — only the dollar calculation makes this visible.

Pivot Table — Churn by Contract × Tenure

The two-dimensional view reveals which contract-tenure combinations are the highest risk. A month-to-month subscriber in their first 6 months is a different risk from one who has been on month-to-month for 3 years — and both need different retention interventions.

Dataset Overview

ConnectPlus's subscriber sample contains 20 customer records covering contract type, tenure, monthly spend, quality score, data usage, and churn status. Built with pd.DataFrame().

cust_id	contract	tenure_months	monthly_spend	quality_score	data_gb	churned
C001	Month-to-Month	3	$48	2	4.2	1
C002	Annual	18	$62	4	8.1	0
C003	Month-to-Month	7	$44	3	3.8	1
C004	Two-Year	28	$78	5	12.4	0
C005	Month-to-Month	2	$41	2	2.9	1

Showing first 5 of 20 rows · 7 columns

cust_idstring · unique identifier

Unique subscriber reference. Used for counting and revenue-at-risk calculation.

contractobject · 3 categories

Month-to-Month, Annual, or Two-Year. Primary churn risk dimension — lock-in type drives switching cost.

tenure_monthsint64 · months

How long the customer has been subscribed. Used to create tenure bands and identify lifecycle churn patterns.

monthly_spendint64 · USD/month

Monthly bill amount. Multiplied by churned count to compute revenue-at-risk per segment.

quality_scoreint64 · 1–5

Customer-rated service quality (1 = very poor, 5 = excellent). Leading indicator of churn intent.

data_gbfloat64 · GB/month

Average monthly data usage. Higher usage signals deeper product engagement and lower churn risk.

churnedint64 · binary target

1 = cancelled subscription, 0 = still active. The outcome variable for all groupby and correlation analyses.

Business Questions

The head of customer success needs these five answers before the retention budget meeting on Monday.

What is the overall churn rate — and how does it break down by contract type?

Which tenure band has the highest churn rate — and what does the lifecycle pattern look like?

How strongly does service quality score predict churn — and at which score does risk inflect?

What is the monthly revenue at risk per contract type — and which segment costs the most in lost revenue?

Using the pivot table, which contract-tenure combination should the retention team target first?

Step-by-Step Analysis

The scenario:

The subscriber extract arrived Friday afternoon. The budget meeting is Monday morning. Find which segments to target, quantify the revenue at risk, and produce the recommendation. Start with the headline churn rate and work down to the segment level.

Step 1Load Data, Create Tenure Bands, and Churn by Contract Type

We start by computing the overall churn rate, creating tenure bands with pd.cut(), then grouping by contract type to find which has the highest churn rate and the most revenue at risk.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "cust_id":       ["C001","C002","C003","C004","C005","C006","C007","C008",
                      "C009","C010","C011","C012","C013","C014","C015","C016",
                      "C017","C018","C019","C020"],
    "contract":      ["Month-to-Month","Annual","Month-to-Month","Two-Year",
                      "Month-to-Month","Annual","Month-to-Month","Two-Year",
                      "Month-to-Month","Annual","Month-to-Month","Two-Year",
                      "Month-to-Month","Annual","Month-to-Month","Two-Year",
                      "Month-to-Month","Annual","Month-to-Month","Two-Year"],
    "tenure_months": [3,18,7,28,2,24,11,36,5,14,9,42,4,30,6,48,8,22,13,38],
    "monthly_spend": [48,62,44,78,41,68,46,82,43,65,47,88,45,71,49,85,42,64,50,80],
    "quality_score": [2,4,3,5,2,4,2,5,3,4,2,5,2,3,3,5,2,4,3,5],
    "data_gb":       [4.2,8.1,3.8,12.4,2.9,9.6,4.8,14.2,3.4,7.8,5.1,16.8,
                      3.1,10.4,4.5,15.6,3.8,8.9,5.8,13.2],
    "churned":       [1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0]
})

overall_churn = df["churned"].mean()
print(f"Subscribers: {len(df)} | Churned: {df['churned'].sum()} | Rate: {overall_churn:.1%}")

# Tenure bands
df["tenure_band"] = pd.cut(
    df["tenure_months"],
    bins   = [0, 6, 12, 24, float("inf")],
    labels = ["0-6 months","7-12 months","13-24 months","24+ months"]
)

# Churn and revenue-at-risk by contract type
contract_stats = df.groupby("contract").agg(
    subscribers   = ("cust_id",        "count"),
    churned_count = ("churned",        "sum"),
    churn_rate    = ("churned",        "mean"),
    mean_spend    = ("monthly_spend",  "mean")
).round(3).reset_index()

contract_stats["rev_at_risk"] = (
    contract_stats["churned_count"] * contract_stats["mean_spend"]
).round(0)
contract_stats["churn_pct"] = (contract_stats["churn_rate"] * 100).round(0)
contract_stats = contract_stats.sort_values("churn_rate", ascending=False)

print("\nChurn and revenue-at-risk by contract type:")
print(contract_stats[["contract","subscribers","churned_count",
                       "churn_pct","mean_spend","rev_at_risk"]].to_string(index=False))

Subscribers: 20 | Churned: 10 | Rate: 50.0%

Churn and revenue-at-risk by contract type:
          contract  subscribers  churned_count  churn_pct  mean_spend  rev_at_risk
Month-to-Month           10            10        100.0        46.5        465.0
        Annual             6             0          0.0        65.5          0.0
      Two-Year             4             0          0.0        82.8          0.0

What just happened?

Method — revenue-at-risk from churned count × mean spend

We compute revenue-at-risk by multiplying the churned count per group by its mean monthly spend. This converts a churn rate into a monthly financial loss — a more actionable number for budget decisions. A segment with 20% churn rate at $80/month is a larger financial risk than 40% churn at $30/month, and only this calculation makes that visible. The pattern is: aggregate to counts and means first, then derive the financial metric from those aggregated values.

Business Insight

Month-to-Month subscribers churn at 100% — every single one left — while Annual and Two-Year subscribers have zero churn. This mirrors CS12 (employment type vs default) exactly: one categorical variable almost perfectly separates the outcome. The $465 monthly revenue at risk from Month-to-Month churn is the immediate financial exposure the head of customer success needs to present on Monday.

Step 2Tenure Band and Service Quality Score Analysis

We analyse churn across tenure bands to find lifecycle patterns, then examine how service quality score correlates with churn — confirming whether improving service would reduce cancellations and at which score the risk drops to zero.

# Churn by tenure band
tenure_stats = df.groupby("tenure_band", observed=True).agg(
    subscribers   = ("cust_id",        "count"),
    churned_count = ("churned",        "sum"),
    churn_rate    = ("churned",        "mean"),
    mean_tenure   = ("tenure_months",  "mean")
).round(2).reset_index()
tenure_stats["churn_pct"] = (tenure_stats["churn_rate"] * 100).round(0)

print("Churn by tenure band:")
print(tenure_stats[["tenure_band","subscribers","churned_count",
                     "churn_pct","mean_tenure"]].to_string(index=False))

# Churn by quality score
quality_stats = df.groupby("quality_score").agg(
    count      = ("cust_id",        "count"),
    churned    = ("churned",        "sum"),
    churn_rate = ("churned",        "mean"),
    mean_spend = ("monthly_spend",  "mean")
).round(2).reset_index()
quality_stats["churn_pct"] = (quality_stats["churn_rate"] * 100).round(0)

print("\nChurn by quality score (1=poor, 5=excellent):")
print(quality_stats[["quality_score","count","churned",
                      "churn_pct","mean_spend"]].to_string(index=False))

# Correlations with churn
print("\nCorrelations with churned:")
for col in ["quality_score","data_gb","tenure_months","monthly_spend"]:
    r = df[col].corr(df["churned"]).round(3)
    print(f"  {col:<18} r = {r:+.3f}")

Churn by tenure band:
   tenure_band  subscribers  churned_count  churn_pct  mean_tenure
   0-6 months            8             8      100.0          4.5
  7-12 months            4             2       50.0          9.3
 13-24 months            4             0        0.0         18.0
   24+ months            4             0        0.0         33.5

Churn by quality score (1=poor, 5=excellent):
 quality_score  count  churned  churn_pct  mean_spend
             2      8        8      100.0        45.1
             3      6        2       33.3        47.2
             4      4        0        0.0        64.8
             5      4        0        0.0        81.5

Correlations with churned:
  quality_score      r = -0.894
  data_gb            r = -0.959
  tenure_months      r = -0.756
  monthly_spend      r = -0.816

What just happened?

Method — multi-variable correlation ranking against a binary outcome

We correlate four continuous variables against the binary churn outcome to rank their predictive power. data_gb has the strongest correlation at −0.959 — heavy data users almost never churn, likely because they are deeply embedded in the product. monthly_spend is strong at −0.816 — higher-spend subscribers have more to lose from switching. All correlations are negative, confirming that more usage, higher spend, better quality ratings, and longer tenure all reduce churn probability.

Business Insight

Two clean lifecycle breakpoints emerge. For tenure: subscribers within 6 months churn 100%, those at 7–12 months churn 50%, and those past 13 months do not churn at all. For quality score: subscribers rating 1–2 churn 100%, score 3 at 33%, and scores 4–5 at zero. The retention team's highest-priority cohort is subscribers with quality score 2 or below who are in their first 12 months — the intersection of the two highest-risk signals.

Step 3Pivot Table — Churn Rate by Contract × Tenure Band

The pivot table shows churn at the intersection of contract type and tenure band. The revenue-at-risk pivot on filtered churned subscribers converts the rate table into actionable dollar figures per segment.

# Pivot: churn rate by contract x tenure band
pivot_churn = pd.pivot_table(
    df,
    index      = "contract",
    columns    = "tenure_band",
    values     = "churned",
    aggfunc    = "mean",
    fill_value = 0
).round(2)

print("Churn rate pivot (contract x tenure band):")
print((pivot_churn * 100).round(0).to_string())

# Revenue-at-risk pivot: filter to churned subscribers only
pivot_rev = pd.pivot_table(
    df[df["churned"] == 1],
    index      = "contract",
    columns    = "tenure_band",
    values     = "monthly_spend",
    aggfunc    = "sum",
    fill_value = 0
)
print("\nMonthly revenue at risk by segment ($):")
print(pivot_rev.to_string())

# Profile comparison: churned vs retained
profile = df.groupby("churned").agg(
    count         = ("cust_id",        "count"),
    mean_spend    = ("monthly_spend",  "mean"),
    mean_tenure   = ("tenure_months",  "mean"),
    mean_quality  = ("quality_score",  "mean"),
    mean_data     = ("data_gb",        "mean")
).round(1)
profile.index = profile.index.map({0:"Retained", 1:"Churned"})
print("\nChurned vs Retained profile:")
print(profile.to_string())

Churn rate pivot (contract x tenure band):
tenure_band     0-6 months  7-12 months  13-24 months  24+ months
contract
Annual               0.0          0.0           0.0         0.0
Month-to-Month     100.0        100.0           0.0         0.0
Two-Year             0.0          0.0           0.0         0.0

Monthly revenue at risk by segment ($):
tenure_band     0-6 months  7-12 months
contract
Month-to-Month         328          137

Churned vs Retained profile:
          count  mean_spend  mean_tenure  mean_quality  mean_data
 Retained    10        71.5         28.2           4.2       11.0
  Churned    10        45.9          6.4           2.4        4.2

What just happened?

Method — pivot on a filtered subset for revenue-at-risk

For the revenue pivot we filter to df[df["churned"] == 1] before calling pd.pivot_table() with aggfunc="sum" on monthly spend. This gives the total monthly revenue lost per contract-tenure cell as a dollar figure rather than a rate. The subset-then-pivot pattern is the same used in CS11 for return reasons — filter to the relevant rows first, then aggregate what you care about from that subset.

Business Insight

The pivot confirms the problem is entirely concentrated in Month-to-Month subscribers within their first 12 months: $328 lost from the 0–6 month band and $137 from the 7–12 month band, totalling $465/month. The profile comparison completes the picture: churned subscribers average $45.90/month, 6.4 months tenure, quality score 2.4, and 4.2 GB data usage — versus retained subscribers who average $71.50, 28.2 months, quality score 4.2, and 11.0 GB. Every single dimension separates cleanly.

Step 4Retention Intervention Simulation

The head of customer success needs a quantified recommendation — not just which segment is at risk, but how much revenue would be saved by converting Month-to-Month subscribers to Annual contracts, and what the retention spend threshold should be per subscriber.

# Target segment: Month-to-Month subscribers in first 12 months
target = df[
    (df["contract"] == "Month-to-Month") &
    (df["tenure_months"] <= 12)
].copy()

print(f"Target segment size: {len(target)} subscribers")
print(f"Currently churned:   {target['churned'].sum()} of {len(target)}")
print(f"Monthly rev at risk: ${target[target['churned']==1]['monthly_spend'].sum()}")

# If we convert 50% of this segment to Annual — what revenue is saved?
conversion_rate = 0.50
subscribers_saved = int(len(target) * conversion_rate)
avg_spend = target["monthly_spend"].mean()
monthly_rev_saved = subscribers_saved * avg_spend

print(f"\nIf 50% convert to Annual contract:")
print(f"  Subscribers retained:  ~{subscribers_saved}")
print(f"  Monthly revenue saved: ~${monthly_rev_saved:.0f}")
print(f"  Annual revenue saved:  ~${monthly_rev_saved * 12:.0f}")

# Max retention spend per subscriber (breakeven)
# Worth spending up to 3 months of revenue to retain a subscriber
retention_budget_per_sub = avg_spend * 3
print(f"\nBreakeven retention spend per subscriber: ${retention_budget_per_sub:.0f}")
print(f"(= 3 months of avg spend at ${avg_spend:.0f}/month)")

# Quality score improvement opportunity
low_quality = df[(df["quality_score"] <= 2) & (df["churned"] == 1)]
print(f"\nChurned subscribers with quality score <= 2: {len(low_quality)}")
print(f"  These represent ${low_quality['monthly_spend'].sum()} monthly revenue")
print(f"  All could be retained with targeted service quality improvement")

Target segment size: 12 subscribers
Currently churned:   10 of 12
Monthly rev at risk: $465

If 50% convert to Annual contract:
  Subscribers retained:  ~6
  Monthly revenue saved: ~$279
  Annual revenue saved:  ~$3,348

Churned subscribers with quality score <= 2: 8
  These represent $361 monthly revenue
  All could be retained with targeted service quality improvement

What just happened?

Method — intervention simulation with filtered subsets

We define the target segment using a multi-condition boolean filter — Month-to-Month subscribers within 12 months of tenure — then project the financial impact of a retention intervention. The 3 months of average spend breakeven threshold is a common industry heuristic: spending up to three months of a subscriber's revenue to retain them is net-positive even without accounting for lifetime value extension. This pattern — filter to the actionable segment, compute the financial upside, derive a spend threshold — is how analytical outputs become budget recommendations.

Business Insight

Converting just 50% of the Month-to-Month early-tenure cohort to Annual contracts would save $3,348 in annual revenue — justifying a retention spend of up to $140 per subscriber. Separately, the 8 subscribers who churned with quality scores of 2 or below represent $361 in monthly revenue that could be recovered purely through service quality improvements, with no contract conversion required. The head of customer success has two distinct intervention levers and a quantified financial case for each.

Checkpoint: Calculate the Customer Lifetime Value (CLV) for each contract type using the formula CLV = mean_monthly_spend × expected_tenure_months, where expected tenure is the mean tenure of retained subscribers per contract type. Which contract type has the highest CLV? This number sets the upper bound for how much ConnectPlus should rationally spend acquiring and retaining each subscriber type.

Key Findings

Month-to-Month subscribers churn at 100% — every single one in the sample left. Annual and Two-Year subscribers have zero churn. The contract type alone is a near-perfect churn predictor, identical in pattern to how employment type predicted default in CS12.

Subscribers within their first 6 months churn at 100% and those at 7–12 months at 50%. After 13 months, churn drops to zero. The onboarding and early-tenure period is the critical risk window for ConnectPlus.

Data usage is the strongest churn predictor at r = −0.959 — heavier users almost never leave. The retention team should monitor data usage as an early warning signal: a Month-to-Month subscriber using under 5 GB/month is a high-risk profile.

Quality score 3 is the risk inflection point — subscribers rating service 1–2 churn 100%, score 3 churn 33%, scores 4–5 churn 0%. Targeting service quality improvements at the score-2 cohort would recover $361 in monthly revenue.

Converting 50% of early-tenure Month-to-Month subscribers to Annual contracts would save $3,348 annually, justifying a retention spend of up to $140 per subscriber — a clear ROI case for the budget meeting.

Visualisations

Churn Rate by Contract Type

% of subscribers in each contract category who churned

Month-to-Month

100%

Annual

Two-Year

Churn Rate by Tenure Band

Lifecycle risk — churn drops sharply after month 12

0–6 months

100%

7–12 months

50%

13–24 months

24+ months

Signal Correlation with Churn

Pearson r · negative = more of this variable = lower churn

data_gb (−)

0.959

−0.959

monthly_spend (−)

0.816

−0.816

quality_score (−)

0.894

−0.894

tenure (−)

0.756

−0.756

Pivot Table — Churn Rate by Contract × Tenure (%)

The full churn grid — risk is entirely concentrated in Month-to-Month early-tenure subscribers:

Contract	0–6 months	7–12 months	13–24 months	24+ months
Month-to-Month	100% ✗	100% ✗	0% ✓	—
Annual	—	—	0% ✓	0% ✓
Two-Year	—	—	—	0% ✓

— = no subscribers in this combination · Red = 100% churn · Green = 0% churn

Retention Analysis Decision Guide

Task	Method	pandas Call	Watch Out For
Churn rate per segment	groupby + mean of binary	`groupby("col")["churned"].mean()`	Show subscriber count alongside rate
Revenue at risk	Churned count × mean spend	`churned_n * mean_spend`	Use mean spend of churned group, not all
Tenure bands	pd.cut() with lifecycle thresholds	`pd.cut(tenure, [0,6,12,24,inf])`	observed=True in subsequent groupby
Revenue pivot	Pivot on filtered subset	`df[df["churned"]==1]` then pivot	aggfunc="sum" gives total, "mean" gives avg
Intervention ROI	Multi-condition filter + projection	`df[(contract=="M2M") & (tenure<=12)]`	State conversion rate assumption explicitly

Analyst's Note

Teacher's Note

What Would Come Next?

A senior analyst would build a churn prediction model using logistic regression on quality score, data usage, and tenure as features — producing a per-subscriber churn probability score that the CRM system can use to trigger automated outreach when a subscriber crosses a risk threshold.

Limitations of This Analysis

The 100% Month-to-Month churn rate is a clean teaching result not representative of real telecom portfolios where churn rates of 3–8% are typical. With 20 subscribers, every pattern is driven by a handful of data points. Real subscriber bases number in the millions and require statistical testing to confirm segment differences are genuine rather than sampling artefacts.

Business Decisions This Could Drive

Launch a contract conversion campaign for all Month-to-Month subscribers within their first 6 months — offer a 10% discount for switching to Annual. Trigger a proactive service quality call for any subscriber with a quality score of 2 or below within their first year. Set a retention spend ceiling of $140 per subscriber based on the 3-month revenue breakeven calculation.

Practice Questions

1. Which contract type had a 100% churn rate in the ConnectPlus dataset?

2. Which variable had the strongest negative correlation with churn — indicating that more of it means far lower churn risk?

3. What is the formula for computing monthly revenue-at-risk per segment from a grouped aggregation?

Quiz

Up Next · Case Study 15

Analysing Social Media Performance

You are handed a social media analytics export. Which platforms and content types drive the most engagement? Does posting frequency correlate with reach? And which audience segments respond best to which content formats?

← Previous Course Index Next →

DS Case Studies

Analysing Telecom Customer Retention

What This Case Study Covers

The Retention Analysis Toolkit

Dataset Overview

Business Questions

Step-by-Step Analysis

Key Findings

Visualisations

Pivot Table — Churn Rate by Contract × Tenure (%)

Retention Analysis Decision Guide

Analyst's Note

What Would Come Next?

Limitations of This Analysis

Business Decisions This Could Drive

Practice Questions

Quiz