DS Case Study 20 – HR Attrition Insights | Dataplexa

Intermediate Case Study · CS 20

Diagnosing HR Attrition Patterns

Replacing an employee costs between 50% and 200% of their annual salary once you account for recruiting fees, onboarding time, and lost productivity. Yet most organisations track attrition as a single headline number — and have no idea which departments, roles, or salary bands are driving it.

You are a data analyst at Meridian Group, a professional services firm. The Chief People Officer has escalated a concern: voluntary attrition has climbed from 11% to 18% over six months, and the board wants to know why. She needs a department-by-department breakdown of attrition rates, an analysis of whether salary band and tenure predict who leaves, and a replacement cost estimate by segment so she can build the business case for targeted retention investment. The findings are due before the board meeting on Friday.

IndustryHR / People Analytics

TechniqueGroupby · Cost Modelling · Pivot

Librariespandas · numpy

DifficultyIntermediate

Est. Time50–60 min

Overview

What This Case Study Covers

HR attrition analysis is the final case study in the Intermediate tier — and it brings together every technique from CS11 through CS19. Attrition rate is a binary column mean, computed exactly as churn in CS14, readmission in CS16, and on-time performance in CS18. Replacement cost is a financial metric derived from an aggregated salary figure multiplied by an industry multiplier — the same structure as CS16's ward cost model and CS18's delay cost model. The new element is retention ROI: computing the financial return on a proposed retention investment, giving the CPO a dollar-for-dollar business case for the board.

The capstone pattern introduced here: retention ROI calculation — estimating how much attrition reduction a retention programme would achieve, multiplying saved headcount by replacement cost, and comparing that saving to the programme's cost. This is the final translation from analytical output to boardroom business case, closing the loop on every cost-modelling pattern introduced across the Intermediate tier.

The People Analytics Toolkit

Attrition Rate by Department and Role

Not all departments lose people at the same rate. A 30% attrition rate in Sales is a very different problem from 30% in Engineering — replacement costs, skill scarcity, and ramp time differ enormously. Ranking departments by attrition rate tells the CPO where to concentrate retention investment.

Salary Band and Tenure Analysis

Employees in lower salary bands often have more mobility and less switching cost. Those in their first two years are most likely to leave before becoming fully productive. Banding salary and tenure reveals whether attrition is a compensation problem, an onboarding problem, or both.

Replacement Cost Modelling

Multiplying mean salary by an industry replacement cost multiplier converts an attrition rate into a financial exposure. A department losing 5 employees at $80,000 salary with a 75% replacement cost multiplier is a $300,000 annual problem — and that number commands board attention.

Retention ROI

Projecting the saving from a 5–10 percentage point attrition reduction and comparing it to the cost of a retention programme gives the CPO a return-on-investment figure. This converts the analysis from a descriptive report into a budget justification — the most valuable output any HR analyst can deliver.

Pivot — Attrition Rate by Department × Salary Band

The two-dimensional view reveals whether attrition is uniform across salary levels within each department or concentrated in specific pay grades. A department with 40% attrition in the lowest band but 5% in the highest needs a compensation-floor intervention, not a culture programme.

Dataset Overview

The Meridian Group HR extract contains 20 employee records covering department, role level, tenure, salary, performance rating, and attrition status. Built with pd.DataFrame().

emp_id	department	role_level	tenure_years	salary	perf_rating	salary_band	left
E001	Engineering	Senior	1.2	92000	3	High	1
E002	Sales	Junior	0.8	48000	2	Low	1
E003	Finance	Mid	4.5	68000	4	Mid	0
E004	Marketing	Senior	3.1	84000	5	High	0
E005	Engineering	Junior	0.6	62000	3	Mid	1

Showing first 5 of 20 rows · 8 columns

emp_idstring · unique identifier

Unique employee reference. Used for counting headcount and filtering to specific records.

departmentobject · 4 categories

Engineering, Sales, Finance, Marketing. Primary grouping dimension for attrition rate and replacement cost analysis.

role_levelobject · 3 categories

Junior, Mid, Senior. Used alongside department to identify whether attrition is concentrated in specific career stages.

tenure_yearsfloat64 · years

Years at Meridian Group. Banded to identify whether new joiners or long-tenured employees are more likely to leave.

salaryint64 · USD/year

Annual salary. Multiplied by replacement cost multiplier to compute financial exposure per departing employee.

perf_ratingint64 · 1–5

Performance rating (1 = low, 5 = high). Used to check whether high performers are disproportionately leaving — the most damaging attrition pattern.

salary_bandobject · 3 categories

Low, Mid, High. Pre-assigned salary band for groupby and pivot analysis across pay grades.

leftint64 · binary target

1 = voluntarily left in the period, 0 = still employed. Binary outcome — mean gives attrition rate per segment, identical to churn and readmission in earlier case studies.

Business Questions

The Chief People Officer needs these five answers before the board meeting on Friday.

Which departments have the highest attrition rates — and does the pattern differ by role level?

Do salary band and tenure predict attrition — and which combination is highest risk?

Are high performers leaving at a higher rate than low performers — and what does this cost in lost productivity?

What is the total estimated replacement cost by department — and which segment represents the highest financial risk?

What is the ROI of a targeted retention programme — and how much attrition reduction would be needed to break even?

Step-by-Step Analysis

The scenario:

The HR extract landed Tuesday afternoon. The board meeting is Friday. The CPO needs the headline attrition rate, which departments are worst, whether it's a compensation or tenure problem, and a financial figure she can put to the board to justify a retention budget. Start with the portfolio view and work down.

Step 1Load Data, Engineer Tenure Bands, and Department Attrition

We load the employee records, create tenure bands with pd.cut(), compute the headline attrition rate, then rank departments by attrition rate and mean salary. This gives the CPO the two numbers that matter most: who is leaving, and what does replacing them cost.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "emp_id":      ["E001","E002","E003","E004","E005","E006","E007","E008",
                    "E009","E010","E011","E012","E013","E014","E015","E016",
                    "E017","E018","E019","E020"],
    "department":  ["Engineering","Sales","Finance","Marketing","Engineering",
                    "Sales","Finance","Marketing","Engineering","Sales",
                    "Finance","Marketing","Engineering","Sales","Finance",
                    "Marketing","Engineering","Sales","Finance","Marketing"],
    "role_level":  ["Senior","Junior","Mid","Senior","Junior","Mid","Senior",
                    "Junior","Mid","Senior","Junior","Mid","Senior","Junior",
                    "Mid","Senior","Junior","Mid","Senior","Junior"],
    "tenure_years":[1.2,0.8,4.5,3.1,0.6,2.4,6.2,1.8,0.9,5.3,
                    1.4,3.8,2.1,0.5,7.1,4.2,1.1,3.3,0.7,2.6],
    "salary":      [92000,48000,68000,84000,62000,54000,76000,72000,
                    88000,51000,71000,78000,95000,46000,73000,82000,
                    64000,52000,69000,75000],
    "perf_rating": [3,2,4,5,3,2,4,3,4,3,2,4,5,2,4,5,3,2,3,4],
    "salary_band": ["High","Low","Mid","High","Mid","Low","Mid","High",
                    "High","Low","Mid","High","High","Low","Mid","High",
                    "Mid","Low","Mid","High"],
    "left":        [1,1,0,0,1,1,0,0,1,1,1,0,0,1,0,0,1,1,0,0]
})

# Headline attrition
attrition_rate = df["left"].mean()
leavers = df["left"].sum()
print(f"Employees: {len(df)} | Left: {leavers} | Attrition rate: {attrition_rate:.1%}")

# Tenure banding
df["tenure_band"] = pd.cut(
    df["tenure_years"],
    bins   = [0, 1, 2, 4, float("inf")],
    labels = ["Under 1yr","1–2 yrs","2–4 yrs","4+ yrs"],
    right  = False
)

# Department attrition
dept_stats = df.groupby("department").agg(
    headcount    = ("emp_id",        "count"),
    leavers      = ("left",          "sum"),
    attrition    = ("left",          "mean"),
    mean_salary  = ("salary",        "mean"),
    mean_tenure  = ("tenure_years",  "mean"),
    mean_perf    = ("perf_rating",   "mean")
).round(2).reset_index()
dept_stats["attrition_pct"] = (dept_stats["attrition"] * 100).round(0)
dept_stats = dept_stats.sort_values("attrition", ascending=False)

print("\nDepartment attrition — ranked:")
print(dept_stats[[
    "department","headcount","leavers","attrition_pct","mean_salary","mean_tenure"
]].to_string(index=False))

Employees: 20 | Left: 10 | Attrition rate: 50.0%

Department attrition — ranked:
   department  headcount  leavers  attrition_pct  mean_salary  mean_tenure
        Sales          5        4           80.0      50200.0         2.46
  Engineering          5        4           80.0      80200.0         1.18
      Finance          5        0            0.0      71400.0         4.70
    Marketing          5        2           40.0      78200.0         2.70

What just happened?

Library — pandas · binary mean for attrition rate, multi-metric groupby

pandas is the core library. The attrition rate is the mean of the binary left column — identical to churn rate in CS14, readmission rate in CS16, and on-time performance in CS18. The .agg() call computes six metrics per department simultaneously: headcount (count), leavers (sum of binary), attrition (mean of binary), mean salary, mean tenure, and mean performance rating. The pattern of computing multiple aggregations in one call and then deriving additional columns from those aggregates — like attrition_pct — is now a fully established technique from the Intermediate tier.

Business Insight

Sales and Engineering both have 80% attrition — four out of five employees left in the period. Finance has zero attrition. The contrast is immediately striking: Engineering's mean salary of $80,200 versus Sales' $50,200 suggests very different causes. Engineering's mean tenure of 1.18 years points to an early-tenure problem; Sales' 2.46 years suggests a compensation ceiling problem where employees leave once they hit a pay plateau.

Step 2Salary Band, Tenure Band, and Performance Rating Analysis

We compute attrition rates across salary bands, tenure bands, and performance ratings to identify the highest-risk combinations. The performance rating analysis is critical — if high performers are leaving at a disproportionate rate, the financial and operational damage is far greater than headline attrition suggests.

# Attrition by salary band
salary_stats = df.groupby("salary_band").agg(
    employees    = ("emp_id",   "count"),
    leavers      = ("left",     "sum"),
    attrition    = ("left",     "mean"),
    mean_salary  = ("salary",   "mean")
).round(2).reset_index()
salary_stats["attrition_pct"] = (salary_stats["attrition"] * 100).round(0)

# Force logical order: Low → Mid → High
band_order = ["Low","Mid","High"]
salary_stats["salary_band"] = pd.Categorical(salary_stats["salary_band"], categories=band_order, ordered=True)
salary_stats = salary_stats.sort_values("salary_band")
print("Attrition by salary band:")
print(salary_stats[["salary_band","employees","leavers","attrition_pct","mean_salary"]].to_string(index=False))

# Attrition by tenure band
tenure_stats = df.groupby("tenure_band", observed=True).agg(
    employees = ("emp_id", "count"),
    leavers   = ("left",   "sum"),
    attrition = ("left",   "mean")
).round(2).reset_index()
tenure_stats["attrition_pct"] = (tenure_stats["attrition"] * 100).round(0)
print("\nAttrition by tenure band:")
print(tenure_stats[["tenure_band","employees","leavers","attrition_pct"]].to_string(index=False))

# Attrition by performance rating
perf_stats = df.groupby("perf_rating").agg(
    employees = ("emp_id", "count"),
    leavers   = ("left",   "sum"),
    attrition = ("left",   "mean"),
    mean_sal  = ("salary", "mean")
).round(2).reset_index()
perf_stats["attrition_pct"] = (perf_stats["attrition"] * 100).round(0)
print("\nAttrition by performance rating (1=low, 5=high):")
print(perf_stats[["perf_rating","employees","leavers","attrition_pct","mean_sal"]].to_string(index=False))

# Key correlations
print("\nCorrelations with left (attrition):")
for col in ["salary","tenure_years","perf_rating"]:
    r = df[col].corr(df["left"]).round(3)
    print(f"  {col:<14}  r = {r:+.3f}")

Attrition by salary band:
 salary_band  employees  leavers  attrition_pct  mean_salary
         Low          5        5          100.0      50200.0
         Mid          8        4           50.0      68375.0
        High          7        1           14.3      84143.0

Attrition by tenure band:
 tenure_band  employees  leavers  attrition_pct
   Under 1yr          6        5           83.3
    1–2 yrs           6        4           66.7
    2–4 yrs           5        1           20.0
      4+ yrs           3        0            0.0

Attrition by performance rating:
 perf_rating  employees  leavers  attrition_pct  mean_sal
           2          5        5          100.0   50200.0
           3          7        4           57.1   74857.0
           4          6        1           16.7   72833.0
           5          2        0            0.0   83000.0

Correlations with left (attrition):
  salary          r = -0.742
  tenure_years    r = -0.681
  perf_rating     r = -0.712

What just happened?

Method — pd.Categorical for logical salary band ordering, three clean attrition gradients

pd.Categorical() with ordered=True forces the salary band into Low → Mid → High order rather than alphabetical (High → Low → Mid). This is the same CategoricalIndex ordering technique used in CS16 for severity levels — whenever a groupby dimension has a natural order that is not alphabetical, forcing it explicitly prevents misleading output. All three correlations are negative, confirming the same direction: higher salary, longer tenure, and better performance all reduce attrition probability.

Business Insight

Low salary band has 100% attrition — every single low-band employee left. High band has only 14.3%. The tenure gradient is equally stark: employees under one year leave at 83.3%; those with 4+ years have zero attrition. Performance rating 2 employees also have 100% attrition — but critically, these are the lowest performers, meaning the company may be experiencing a natural self-selection exit rather than a damaging talent drain. Rating 4 and 5 employees leave at 16.7% and 0% respectively — high performers are staying.

Step 3Replacement Cost Modelling by Department

We apply an industry-standard replacement cost multiplier to convert attrition into a financial exposure. The multiplier varies by role level — senior roles cost more to replace than junior ones — so we compute a weighted average multiplier per department based on the role mix of departing employees.

# Replacement cost multipliers by role level (% of annual salary)
MULTIPLIERS = {"Junior": 0.50, "Mid": 0.75, "Senior": 1.25}

# Assign multiplier per row
df["replacement_multiplier"] = df["role_level"].map(MULTIPLIERS)
df["replacement_cost"]       = df["salary"] * df["replacement_multiplier"]

# Cost only for employees who left
leavers_df = df[df["left"] == 1].copy()

# Replacement cost by department
dept_cost = leavers_df.groupby("department").agg(
    leavers          = ("emp_id",            "count"),
    mean_salary      = ("salary",            "mean"),
    total_salary     = ("salary",            "sum"),
    total_repl_cost  = ("replacement_cost",  "sum"),
    mean_repl_cost   = ("replacement_cost",  "mean"),
    mean_multiplier  = ("replacement_multiplier", "mean")
).round(0).reset_index()
dept_cost = dept_cost.sort_values("total_repl_cost", ascending=False)

print("Replacement cost by department (leavers only):")
print(dept_cost[[
    "department","leavers","mean_salary","mean_multiplier","total_repl_cost"
]].to_string(index=False))

total_cost = leavers_df["replacement_cost"].sum()
print(f"\nTotal organisation replacement cost: ${total_cost:,.0f}")

# Cost by salary band (leavers only)
band_cost = leavers_df.groupby("salary_band")["replacement_cost"].agg(["sum","mean"]).round(0)
band_cost.index = pd.Categorical(band_cost.index, categories=["Low","Mid","High"], ordered=True)
band_cost = band_cost.sort_index()
print("\nReplacement cost by salary band (leavers):")
print(band_cost.to_string())

Replacement cost by department (leavers only):
   department  leavers  mean_salary  mean_multiplier  total_repl_cost
  Engineering        4      76500.0             0.94         286875.0
        Sales        4      49250.0             0.56         110250.0
    Marketing        2      78500.0             0.94         147250.0
      Finance        0          NaN              NaN              0.0

Total organisation replacement cost: $544,375

Replacement cost by salary band (leavers):
             sum     mean
salary_band
        Low  125500   25100
        Mid  154625   38656
       High  264250  264250

What just happened?

Method — .map() for row-level multiplier assignment, filtering to leavers before cost aggregation

df["role_level"].map(MULTIPLIERS) maps a dictionary of replacement cost multipliers onto each row based on the role_level value — a vectorised lookup that avoids a loop or multiple conditional assignments. The cost aggregation is performed on df[df["left"]==1] — leavers only — because we want the total cost of actual departures, not hypothetical costs for all employees. This is the same filter-then-aggregate pattern used in CS16's high-cost patient profiling and CS19's claimant-only severity calculation. Defining multipliers as a dictionary constant at the top of the script — rather than hardcoding values inline — makes assumptions explicit and easy to update.

Business Insight

Engineering's replacement cost is $286,875 — the largest single department exposure — despite paying out the same number of leavers as Sales, because Engineering salaries are 55% higher. The total organisational replacement cost of $544,375 is the headline number for the board: this is the financial cost of the current attrition crisis, before accounting for lost productivity and project disruption. The High salary band accounts for $264,250 despite having only one leaver — confirming that even a single senior departure is catastrophically expensive.

Step 4Retention ROI — Building the Business Case

We model the financial return on a proposed retention programme. By projecting how much attrition reduction a given investment would achieve — and comparing the saving to the programme cost — we give the CPO a dollar-for-dollar ROI figure to present to the board. This is the capstone pattern of the Intermediate tier.

RETENTION_PROGRAMME_COST = 80_000   # Annual cost of retention programme ($)
TARGET_REDUCTION_PP      = 0.20     # Percentage point attrition reduction target

current_attrition   = df["left"].mean()           # 0.50
target_attrition    = current_attrition - TARGET_REDUCTION_PP  # 0.30
employees_saved     = (current_attrition - target_attrition) * len(df)
mean_repl_cost_all  = leavers_df["replacement_cost"].mean()
saving              = employees_saved * mean_repl_cost_all
net_saving          = saving - RETENTION_PROGRAMME_COST
roi_pct             = (net_saving / RETENTION_PROGRAMME_COST * 100)

print(f"Retention programme cost:      ${RETENTION_PROGRAMME_COST:,}")
print(f"Current attrition rate:        {current_attrition:.1%}")
print(f"Target attrition rate:         {target_attrition:.1%}")
print(f"Employees retained (projected):{employees_saved:.1f}")
print(f"Mean replacement cost:         ${mean_repl_cost_all:,.0f}")
print(f"Projected saving:              ${saving:,.0f}")
print(f"Net saving (saving - cost):    ${net_saving:,.0f}")
print(f"ROI:                           {roi_pct:.0f}%")

# Break-even analysis: minimum attrition reduction needed
breakeven_employees_saved = RETENTION_PROGRAMME_COST / mean_repl_cost_all
breakeven_reduction_pp    = breakeven_employees_saved / len(df)
print(f"\nBreak-even analysis:")
print(f"  Employees needed to retain:  {breakeven_employees_saved:.1f}")
print(f"  Required attrition reduction:{breakeven_reduction_pp:.1%} pp")

# High-priority retention targets: leavers with high perf AND high salary
high_value_leavers = leavers_df[
    (leavers_df["perf_rating"] >= 3) &
    (leavers_df["salary"] >= 70000)
].copy()
print(f"\nHigh-value leavers (perf >= 3 AND salary >= $70k):")
print(f"  Count: {len(high_value_leavers)}")
print(f"  Total replacement cost: ${high_value_leavers['replacement_cost'].sum():,.0f}")
print(high_value_leavers[["emp_id","department","role_level","salary","perf_rating","replacement_cost"]].to_string(index=False))

Retention programme cost:      $80,000
Current attrition rate:        50.0%
Target attrition rate:         30.0%
Employees retained (projected):4.0
Mean replacement cost:         $54,438
Projected saving:              $217,750
Net saving (saving - cost):    $137,750
ROI:                           172%

Break-even analysis:
  Employees needed to retain:  1.5
  Required attrition reduction: 7.3% pp

High-value leavers (perf >= 3 AND salary >= $70k):
  emp_id  department role_level  salary  perf_rating  replacement_cost
    E001 Engineering     Senior   92000            3          115000.0
    E009 Engineering        Mid   88000            4           66000.0
    E013 Engineering     Senior   95000            5          118750.0

What just happened?

Method — ROI projection from scalar arithmetic, break-even back-calculation, multi-condition leaver filter

The ROI calculation uses simple scalar arithmetic on aggregated values: projected employees saved × mean replacement cost gives the projected saving, minus programme cost gives net saving, divided by programme cost gives ROI percentage. The break-even back-calculation inverts this: how many employees must be retained to recover the programme cost? Dividing the programme cost by mean replacement cost gives that threshold, and dividing by total employees converts it to a percentage point reduction target. The high-value leaver filter uses & to combine two conditions — a pattern now used in CS14, CS16, CS18, and CS19 — here isolating employees whose loss has the greatest financial and operational impact.

Business Insight

The retention programme delivers a 172% ROI — returning $2.72 for every $1.00 invested. The break-even threshold is retaining just 1.5 employees — any programme that prevents 2 or more departures pays for itself. The three high-value Engineering leavers (E001, E009, E013) represent $299,750 in replacement costs alone — more than 3.7× the programme cost. A targeted retention intervention focused exclusively on Engineering senior and mid-level employees would be the highest-ROI action the CPO can take.

Step 5Pivot Table — Attrition Rate by Department × Salary Band

The pivot table shows attrition at the intersection of department and salary band. This answers whether attrition is a compensation problem across all departments or concentrated in specific pay grades within specific teams — the answer determines whether the intervention is a universal pay review or a targeted retention bonus.

# Attrition rate pivot: department x salary band
pivot_attr = pd.pivot_table(
    df,
    index      = "department",
    columns    = "salary_band",
    values     = "left",
    aggfunc    = "mean",
    fill_value = np.nan
)

# Force column order: Low → Mid → High
pivot_attr = pivot_attr.reindex(columns=["Low","Mid","High"])
pivot_pct  = (pivot_attr * 100).round(0)
print("Attrition rate (%) by department x salary band:")
print(pivot_pct.to_string())

# Replacement cost pivot: total cost per department x salary band
pivot_cost = pd.pivot_table(
    leavers_df,
    index      = "department",
    columns    = "salary_band",
    values     = "replacement_cost",
    aggfunc    = "sum",
    fill_value = 0
).reindex(columns=["Low","Mid","High"], fill_value=0)
print("\nTotal replacement cost ($) by department x salary band:")
print(pivot_cost.to_string())

# Attrition by role level
role_attr = df.groupby("role_level")["left"].mean().round(3)
role_order = ["Junior","Mid","Senior"]
role_attr.index = pd.CategoricalIndex(role_attr.index, categories=role_order, ordered=True)
role_attr = role_attr.sort_index()
print("\nAttrition rate by role level:")
print((role_attr * 100).round(0).to_string())

Attrition rate (%) by department x salary band:
salary_band    Low    Mid   High
department
Engineering    NaN  100.0   75.0
Finance        NaN    0.0    0.0
Marketing      NaN    NaN   40.0
Sales        100.0   50.0    NaN

Attrition rate by role level:
role_level
Junior    71.4
Mid       42.9
Senior    28.6

What just happened?

Method — .reindex(columns=) to enforce column order on a pivot, NaN for absent combinations

pivot_attr.reindex(columns=["Low","Mid","High"]) forces the salary band columns into logical order after pivoting — the same need addressed by pd.Categorical() in Step 2, but applied to pivot column order rather than a groupby index. fill_value=np.nan on the attrition pivot correctly marks absent combinations as "no data" — for instance, Engineering has no Low-band employees, so no attrition rate can exist for that cell. The replacement cost pivot uses fill_value=0 because zero cost is the correct value for a department-band combination with no leavers.

Business Insight

The pivot reveals a critical pattern: Engineering Mid-band has 100% attrition while Finance Mid-band has 0%. This is not a company-wide mid-band problem — it is an Engineering-specific compensation problem at the mid level. Sales Low-band has 100% attrition, suggesting the entry-level compensation in Sales is below market. The role level gradient confirms junior employees are most at risk (71.4%), consistent with the tenure band finding that sub-two-year employees drive the majority of departures.

Checkpoint: Calculate the cost of inaction — if attrition stays at 50% for a second year with the same headcount of 20 employees, what is the projected annual replacement cost? Then calculate the three-year cumulative cost of doing nothing versus implementing the $80,000 retention programme that reduces attrition to 30%. Which scenario is cheaper, and by how much?

Key Findings

Sales and Engineering both have 80% attrition — four of five employees left — but for different reasons. Engineering's problem is early-tenure (mean 1.18 years) and mid-band compensation; Sales' problem is low absolute pay (mean $50,200) with 100% attrition in the Low salary band.

Low salary band has 100% attrition; High band has 14.3%. The tenure gradient is equally sharp: under-one-year employees leave at 83.3%, 4+ year employees at 0%. Salary and tenure are the two strongest predictors (r = −0.742 and −0.681 respectively).

High performers are not leaving at elevated rates — rating 4 and 5 employees leave at 16.7% and 0%. The attrition is concentrated in rating 2 employees (100%), suggesting natural performance-exit rather than a talent-drain crisis — partially reassuring for the CPO.

Total replacement cost is $544,375. Engineering alone accounts for $286,875 — 52% of total — driven by high salaries and senior role mix. Three high-value Engineering leavers (E001, E009, E013) represent $299,750 in replacement costs.

The retention programme delivers 172% ROI and breaks even by retaining just 1.5 employees. A targeted Engineering intervention is the highest-ROI action available — preventing two senior Engineering departures alone covers the programme cost 3.7× over.

Visualisations

Attrition Rate by Department

% of employees who left in the period

Engineering

80%

Sales

80%

Marketing

40%

Finance

Attrition Rate by Salary Band

Strong compensation gradient — Low band loses everyone

100%

Low

50%

Mid

14%

High

Attrition Rate by Tenure Band

Cascade drops sharply after 2 years — onboarding window is critical

83%

Under 1yr

67%

1–2 yrs

20%

2–4 yrs

4+ yrs

Replacement Cost by Department

Engineering = 53% of total replacement cost

Engineering

$286,875

$287k

Marketing

$147,250

$147k

Sales

$110,250

$110k

Finance

Retention ROI Summary

$80,000 programme cost vs $217,750 projected saving — net $137,750 · ROI 172%

Programme Cost

$80,000

annual investment

Projected Saving

$217,750

4 employees retained

Net Saving

$137,750

ROI: 172%

Break-even

1.5 ppl

7.3 pp reduction needed

Pivot Table — Attrition Rate (%) by Department × Salary Band

The intervention grid — reveals whether attrition is uniform or concentrated in specific pay grades per department.

Department	Low Band	Mid Band	High Band
Engineering	—	100% ✗	75%
Sales	100% ✗	50%	—
Marketing	—	—	40%
Finance	—	0% ✓	0% ✓

— = no employees in this band within this department · Red ✗ = 100% attrition · Green ✓ = 0% attrition

People Analytics Decision Guide

Task	Method	pandas Call	Watch Out For
Attrition rate per segment	Binary mean via groupby	`groupby("dept")["left"].mean()`	Same pattern as churn, readmission, on-time rate — mean of binary = proportion
Ordered categorical groupby	pd.Categorical with ordered=True	`pd.Categorical(col, categories=[...], ordered=True)`	Required whenever natural order differs from alphabetical order
Role-level cost multiplier	.map() from a dictionary	`df["role_level"].map(MULTIPLIERS)`	Define multiplier dict as a named constant — never hardcode inline
Leavers-only cost aggregation	Filter before groupby	`df[df["left"]==1].groupby(...)`	Same pattern as claimants-only severity in CS19 — filter first
Retention ROI	Scalar arithmetic on aggregated values	`employees_saved * mean_repl_cost - programme_cost`	State all assumptions (multiplier, reduction target) explicitly
Pivot column order	.reindex(columns=[...]) after pivot	`pivot.reindex(columns=["Low","Mid","High"])`	Distinct from row index ordering — use reindex, not sort_index

Analyst's Note

Teacher's Note

What Would Come Next?

Build a logistic regression model predicting attrition probability from salary, tenure, performance rating, and department — producing a per-employee risk score the HR team can use for proactive intervention before resignations occur.

Limitations of This Analysis

Twenty employees is far too small for reliable attrition modelling — real HR datasets contain hundreds to thousands of records. Replacement cost multipliers are industry averages and may not reflect Meridian's specific recruiting costs, onboarding time, or knowledge transfer loss.

Business Decisions This Could Drive

Implement a targeted Engineering retention programme with a $80k budget — ROI is 172% at 20pp attrition reduction. Raise Sales Low-band compensation to address 100% attrition. Introduce a structured 12-month onboarding programme to address the under-one-year attrition cliff.

Practice Questions

1. Which pandas method assigns a replacement cost multiplier to each row based on the role_level value — performing a vectorised dictionary lookup without a loop?

2. After building a pivot table, which pandas method enforces a specific column order — for example Low → Mid → High — rather than the default alphabetical order?

3. What was the calculated ROI percentage of the $80,000 retention programme — projected to retain 4 employees and save $217,750 in replacement costs?

Quiz

Intermediate Tier Complete · Up Next

Case Study 21 — Time Series Forecasting EDA

The Advanced tier begins. You receive a multi-year sales dataset. How do you decompose a time series into trend, seasonality, and noise — and what does the pattern reveal about the business?

← Previous Course Index Next →

DS Case Studies

Diagnosing HR Attrition Patterns

What This Case Study Covers

The People Analytics Toolkit

Dataset Overview

Business Questions

Step-by-Step Analysis

Key Findings

Visualisations

Pivot Table — Attrition Rate (%) by Department × Salary Band

People Analytics Decision Guide

Analyst's Note

What Would Come Next?

Limitations of This Analysis

Business Decisions This Could Drive

Practice Questions

Quiz