DS Case Studies
Uncovering Healthcare Diagnosis Trends
Hospitals generate enormous volumes of patient data — but without structured analysis, that data sits in records systems while clinical directors make resourcing decisions on intuition. The question is never "do we have the data?" It's "has anyone actually looked at it?"
You are a data analyst at Meridian General Hospital. The chief medical officer has requested an analysis ahead of the annual capacity planning review. She needs to know which diagnoses are most frequent by age group and department, whether readmission rates differ meaningfully across conditions, and which patient segments are driving the highest length-of-stay costs. The findings will directly inform ward staffing levels and specialist resource allocation for the next financial year.
What This Case Study Covers
Healthcare analytics applies the same groupby-and-aggregate pattern from previous case studies to a domain where the stakes are clinical rather than commercial. Readmission rate is a binary column analysed exactly like churn in CS14 — group by diagnosis, compute the mean of the binary indicator, compare across groups. The new element is length-of-stay cost modelling: multiplying bed-days by a daily ward cost rate to convert a clinical metric into a financial one, making resource demands directly comparable across departments and conditions.
Two patterns are introduced here: age banding combined with diagnosis frequency to reveal whether a condition is primarily affecting one demographic — which determines which specialist team carries the load — and cost-per-admission modelling, converting mean length of stay into an estimated cost per patient episode that the finance team can use for budget planning.
The Clinical Analytics Toolkit
Diagnosis Frequency by Age Band
Not all conditions distribute evenly across age groups. Cardiac events may dominate in patients over 60 while respiratory conditions are more evenly spread. Knowing which demographic carries which diagnosis load tells the CMO whether the problem is a population health trend or a department capacity issue.Readmission Rate by Diagnosis
A high readmission rate on a specific condition signals that discharge protocols may be too early, post-care follow-up is inadequate, or the condition is inherently relapsing. Quantifying this by diagnosis gives the clinical governance team specific protocols to review.Length of Stay — Department and Diagnosis
Mean length of stay is the primary driver of bed occupancy and staffing cost. A department with a mean LOS of 8 days per patient needs fundamentally different staffing than one averaging 2 days — and a single high-LOS diagnosis within a department can skew the entire ward's resource profile.Cost per Admission Modelling
Multiplying mean LOS by a daily ward cost rate converts bed-days into a dollar figure. This is the revenue management equivalent of CS14's revenue-at-risk calculation — a clinical metric converted into a financial one that the CFO can act on during budget planning.Pivot — Readmission Rate by Department × Diagnosis
The two-dimensional view reveals which department-diagnosis combinations have the worst readmission outcomes. A cardiology patient readmitted within 30 days costs the hospital twice — and carries clinical governance risk that the pivot makes visible at a glance.Dataset Overview
The Meridian General patient extract contains 20 admission records covering diagnosis, department, age, length of stay, readmission status, and insurance type. Built with pd.DataFrame().
| patient_id | diagnosis | department | age | los_days | readmitted | insurance | severity |
|---|---|---|---|---|---|---|---|
| PT001 | Cardiac | Cardiology | 68 | 7 | 1 | Medicare | High |
| PT002 | Respiratory | Pulmonology | 45 | 4 | 0 | Private | Medium |
| PT003 | Diabetes | Endocrinology | 55 | 3 | 1 | Medicare | Medium |
| PT004 | Orthopaedic | Orthopaedics | 72 | 5 | 0 | Private | Low |
| PT005 | Cardiac | Cardiology | 61 | 9 | 1 | Medicare | High |
Showing first 5 of 20 rows · 8 columns
Unique patient reference per admission episode. One patient can appear multiple times if readmitted.
Primary diagnosis: Cardiac, Respiratory, Diabetes, Orthopaedic, Neurological. Main grouping dimension for frequency and readmission analysis.
Admitting department. Determines ward staffing allocation and specialist resource requirements.
Patient age at admission. Banded into age groups to identify demographic-specific diagnosis patterns.
Length of stay in days. Primary driver of bed occupancy, ward cost, and staffing demand per admission.
1 = readmitted within 30 days of discharge, 0 = not readmitted. Binary outcome analysed identically to churn rate in CS14.
Medicare or Private. Insurance type often correlates with age and condition severity — a confounding variable to be aware of.
Clinical severity classification: High, Medium, Low. Higher severity correlates with longer LOS and higher readmission probability.
Business Questions
The chief medical officer needs these five answers before the capacity planning review.
Which diagnoses are most frequent overall — and how does that frequency break down by age band?
Which diagnosis has the highest 30-day readmission rate — and which department carries the most readmission risk?
What is the mean length of stay by department and by severity level — and how does LOS vary by diagnosis?
What is the estimated cost per admission by diagnosis, and which segment drives the highest total ward cost?
Using the pivot table, which department-diagnosis combination should be prioritised for clinical protocol review?
Step-by-Step Analysis
The scenario:
The patient extract landed in your inbox Monday morning. The capacity planning meeting is Wednesday. The CMO wants a one-page brief covering diagnosis load, readmission hotspots, and the top cost drivers. Start with the headline frequency counts, band by age, then drill into readmission and LOS cost modelling.
We load the patient records, create age bands using pd.cut(), then compute diagnosis frequency overall and broken down by age group using pd.crosstab(). This tells the CMO whether the hospital's busiest conditions skew towards elderly patients — which has direct implications for Medicare billing and geriatric specialist demand.
import pandas as pd
import numpy as np
df = pd.DataFrame({
"patient_id": ["PT001","PT002","PT003","PT004","PT005","PT006","PT007","PT008",
"PT009","PT010","PT011","PT012","PT013","PT014","PT015","PT016",
"PT017","PT018","PT019","PT020"],
"diagnosis": ["Cardiac","Respiratory","Diabetes","Orthopaedic","Cardiac",
"Neurological","Respiratory","Cardiac","Diabetes","Orthopaedic",
"Cardiac","Neurological","Respiratory","Cardiac","Diabetes",
"Orthopaedic","Neurological","Cardiac","Respiratory","Diabetes"],
"department": ["Cardiology","Pulmonology","Endocrinology","Orthopaedics","Cardiology",
"Neurology","Pulmonology","Cardiology","Endocrinology","Orthopaedics",
"Cardiology","Neurology","Pulmonology","Cardiology","Endocrinology",
"Orthopaedics","Neurology","Cardiology","Pulmonology","Endocrinology"],
"age": [68,45,55,72,61,58,39,74,62,66,70,52,48,65,59,78,55,67,42,60],
"los_days": [7,4,3,5,9,6,3,8,4,4,10,5,3,7,5,6,7,8,4,3],
"readmitted": [1,0,1,0,1,1,0,1,1,0,1,0,0,1,1,0,1,1,0,0],
"insurance": ["Medicare","Private","Medicare","Private","Medicare",
"Private","Private","Medicare","Medicare","Private",
"Medicare","Private","Private","Medicare","Medicare",
"Private","Medicare","Medicare","Private","Medicare"],
"severity": ["High","Medium","Medium","Low","High","Medium","Low","High",
"Medium","Low","High","Medium","Low","High","Medium",
"Low","Medium","High","Low","Medium"]
})
# Overall diagnosis frequency
diag_freq = df["diagnosis"].value_counts().reset_index()
diag_freq.columns = ["diagnosis","count"]
diag_freq["pct"] = (diag_freq["count"] / len(df) * 100).round(1)
print("Diagnosis frequency (all patients):")
print(diag_freq.to_string(index=False))
# Age banding
df["age_band"] = pd.cut(
df["age"],
bins = [0, 45, 60, 70, float("inf")],
labels = ["Under 45","45-60","61-70","Over 70"]
)
# Diagnosis frequency by age band — cross-tabulation
age_diag = pd.crosstab(df["age_band"], df["diagnosis"])
print("\nDiagnosis count by age band:")
print(age_diag.to_string())
Diagnosis frequency (all patients):
diagnosis count pct
Cardiac 7 35.0
Respiratory 4 20.0
Diabetes 4 20.0
Orthopaedic 3 15.0
Neurological 2 10.0
Diagnosis count by age band:
diagnosis Cardiac Diabetes Neurological Orthopaedic Respiratory
age_band
Under 45 0 0 0 0 2
45-60 1 3 3 0 2
61-70 4 1 0 1 0
Over 70 2 0 0 2 0What just happened?
Library — pandas · pd.crosstab() for two-way frequency tablespandas is the core data manipulation library. pd.crosstab() counts the frequency of every combination of two categorical variables — here, age band and diagnosis — producing a two-way frequency table in a single call without requiring a groupby-pivot sequence. pd.cut() divides a continuous variable (age) into discrete labelled bins; the float("inf") upper bound captures all patients over 70 regardless of their exact age.
Cardiac is the most frequent diagnosis at 35% of all admissions, concentrated almost entirely in patients aged 61 and above (6 of 7 cases). Neurological and Diabetes cluster in the 45–60 band. This tells the CMO that Cardiology's resource demand is driven by an elderly Medicare population — with direct implications for geriatric care protocols and billing volumes.
We compute the 30-day readmission rate per diagnosis and per department, then cross-reference with severity to understand whether high readmission rates are driven by condition complexity or discharge quality. The analysis mirrors CS14's churn-by-contract-type groupby exactly — same method, different domain.
# Readmission rate by diagnosis
readmit_diag = df.groupby("diagnosis").agg(
patients = ("patient_id", "count"),
readmissions = ("readmitted", "sum"),
readmit_rate = ("readmitted", "mean"),
mean_los = ("los_days", "mean"),
mean_age = ("age", "mean")
).round(2).reset_index()
readmit_diag["readmit_pct"] = (readmit_diag["readmit_rate"] * 100).round(0)
readmit_diag = readmit_diag.sort_values("readmit_rate", ascending=False)
print("Readmission rate by diagnosis:")
print(readmit_diag[["diagnosis","patients","readmissions",
"readmit_pct","mean_los","mean_age"]].to_string(index=False))
# Readmission rate by department
readmit_dept = df.groupby("department").agg(
patients = ("patient_id", "count"),
readmit_rate = ("readmitted", "mean"),
mean_los = ("los_days", "mean")
).round(2).reset_index()
readmit_dept["readmit_pct"] = (readmit_dept["readmit_rate"] * 100).round(0)
readmit_dept = readmit_dept.sort_values("readmit_rate", ascending=False)
print("\nReadmission rate by department:")
print(readmit_dept[["department","patients","readmit_pct","mean_los"]].to_string(index=False))
# Readmission rate by severity — force logical sort order
readmit_sev = df.groupby("severity")["readmitted"].mean().round(3)
readmit_sev.index = pd.CategoricalIndex(
readmit_sev.index, categories=["High","Medium","Low"], ordered=True
)
readmit_sev = readmit_sev.sort_index()
print("\nReadmission rate by severity:")
print((readmit_sev * 100).round(0).to_string())
Readmission rate by diagnosis:
diagnosis patients readmissions readmit_pct mean_los mean_age
Neurological 2 2 100.0 6.0 53.5
Cardiac 7 6 86.0 8.3 67.9
Diabetes 4 3 75.0 3.8 59.0
Orthopaedic 3 0 0.0 5.0 72.0
Respiratory 4 0 0.0 3.5 43.5
Readmission rate by department:
department patients readmit_pct mean_los
Neurology 2 100.0 6.0
Cardiology 7 85.7 8.3
Endocrinology 4 75.0 3.8
Orthopaedics 3 0.0 5.0
Pulmonology 4 0.0 3.5
Readmission rate by severity:
High 87.5
Medium 50.0
Low 0.0What just happened?
Method — binary mean for readmission rate, CategoricalIndex for ordered severity displayTaking the mean of a binary (0/1) column gives the proportion of 1s — here the readmission rate. This is identical to how CS14 computed churn rate: groupby().agg(mean) on a binary column always returns the proportion. The pd.CategoricalIndex trick forces severity into a logical order (High → Medium → Low) rather than alphabetical — a small but important presentational choice that makes the output readable in a clinical briefing.
Neurological patients readmit at 100% and Cardiac at 86% — both far above acceptable clinical benchmarks. Orthopaedic and Respiratory have zero readmissions, suggesting those discharge protocols are working. The severity correlation is clean: High = 87.5%, Medium = 50%, Low = 0% — confirming severity scoring is clinically valid and a reliable discharge-readiness gate.
We compute mean LOS by department and diagnosis, then model cost per admission using a standard daily ward rate. Comparing mean and median LOS reveals whether averages are being pulled by outlier long-stay patients — a common pattern in acute care that masks the typical patient experience.
DAILY_WARD_COST = 950 # USD per bed-day — standard acute ward rate
# LOS statistics by diagnosis
los_diag = df.groupby("diagnosis").agg(
patients = ("patient_id", "count"),
mean_los = ("los_days", "mean"),
median_los = ("los_days", "median"),
max_los = ("los_days", "max"),
total_los = ("los_days", "sum")
).round(1).reset_index()
los_diag["cost_per_admission"] = (los_diag["mean_los"] * DAILY_WARD_COST).round(0)
los_diag["total_ward_cost"] = (los_diag["total_los"] * DAILY_WARD_COST).round(0)
los_diag = los_diag.sort_values("total_ward_cost", ascending=False)
print(f"Daily ward cost assumption: ${DAILY_WARD_COST}/bed-day")
print("\nLOS and cost by diagnosis:")
print(los_diag[[
"diagnosis","patients","mean_los","median_los",
"cost_per_admission","total_ward_cost"
]].to_string(index=False))
# LOS by department — mean, median, std, max in one call
los_dept = df.groupby("department")["los_days"].agg(
["mean","median","std","max"]
).round(2).sort_values("mean", ascending=False)
print("\nLOS distribution by department:")
print(los_dept.to_string())
# Total ward cost by department
dept_cost = df.groupby("department")["los_days"].sum() * DAILY_WARD_COST
dept_cost = dept_cost.sort_values(ascending=False)
print("\nTotal ward cost by department ($):")
print(dept_cost.to_string())
Daily ward cost assumption: $950/bed-day
LOS and cost by diagnosis:
diagnosis patients mean_los median_los cost_per_admission total_ward_cost
Cardiac 7 8.3 8.0 7885 55195
Neurological 2 6.0 6.0 5700 11400
Orthopaedic 3 5.0 5.0 4750 14250
Diabetes 4 3.8 4.0 3610 14440
Respiratory 4 3.5 3.5 3325 13300
LOS distribution by department:
mean median std max
Cardiology 8.3 8.0 1.11 10
Neurology 6.0 6.0 1.00 7
Orthopaedics 5.0 5.0 0.50 6
Endocrinology 3.8 4.0 0.83 5
Pulmonology 3.5 3.5 0.58 4
Total ward cost by department ($):
Cardiology 55195
Endocrinology 14440
Orthopaedics 14250
Pulmonology 13300
Neurology 11400What just happened?
Method — multiple aggregation functions in a single .agg() call, then derived cost columnsPassing a list of strings to .agg(["mean","median","std","max"]) computes all four statistics simultaneously in one call. The std column reveals how consistent LOS is within each department — Cardiology's std of 1.11 is low relative to its mean of 8.3, meaning cardiac stays are predictably long. The cost model multiplies aggregated mean LOS by a constant daily rate — the same "aggregate first, derive from the aggregate" pattern used in CS14's revenue-at-risk calculation.
Cardiology drives $55,195 in total ward costs — 47% of all spend — despite being only 35% of admissions. Its mean and median LOS are nearly identical (8.3 vs 8.0 days), confirming no outlier long-stay patients distort the average — cardiac patients genuinely need extended care. Endocrinology (Diabetes) has the widest LOS spread relative to its mean, suggesting inconsistent discharge timing the clinical governance team could address.
The pivot reveals readmission hotspots at the intersection of department and diagnosis. A composite risk score — readmission rate multiplied by mean LOS — ranks which segments need the most urgent protocol review, capturing both frequency and clinical burden in a single number.
# Pivot: readmission rate by department x diagnosis
pivot_readmit = pd.pivot_table(
df,
index = "department",
columns = "diagnosis",
values = "readmitted",
aggfunc = "mean",
fill_value = np.nan
).round(2)
pivot_pct = (pivot_readmit * 100).round(0)
print("Readmission rate (%) by department x diagnosis:")
print(pivot_pct.to_string())
# Composite risk score: readmit_rate * mean_los
segment_risk = df.groupby(["department","diagnosis"]).agg(
readmit_rate = ("readmitted", "mean"),
mean_los = ("los_days", "mean"),
patients = ("patient_id", "count")
).reset_index()
segment_risk["risk_score"] = (
segment_risk["readmit_rate"] * segment_risk["mean_los"]
).round(2)
top_risk = segment_risk.nlargest(5, "risk_score")[
["department","diagnosis","patients","readmit_rate","mean_los","risk_score"]
]
print("\nTop 5 highest-risk department-diagnosis segments:")
print(top_risk.to_string(index=False))
Readmission rate (%) by department x diagnosis:
diagnosis Cardiac Diabetes Neurological Orthopaedic Respiratory
department
Cardiology 86.0 NaN NaN NaN NaN
Endocrinology NaN 75.0 NaN NaN NaN
Neurology NaN NaN 100.0 NaN NaN
Orthopaedics NaN NaN NaN 0.0 NaN
Pulmonology NaN NaN NaN NaN 0.0
Top 5 highest-risk department-diagnosis segments:
department diagnosis patients readmit_rate mean_los risk_score
Cardiology Cardiac 7 0.857 8.3 7.11
Neurology Neurological 2 1.000 6.0 6.00
Endocrinology Diabetes 4 0.750 3.8 2.85
Orthopaedics Orthopaedic 3 0.000 5.0 0.00
Pulmonology Respiratory 4 0.000 3.5 0.00What just happened?
Method — composite risk score by multiplying two aggregated metricsThe composite risk score multiplies readmission rate by mean LOS per segment. This captures both dimensions: a segment with 100% readmission but 2-day stays is less burdensome than one with 86% readmission and 8-day stays — only the product makes this visible. This is the same principle as CS14's revenue-at-risk: churn rate alone was less informative than churn rate multiplied by monthly spend. The .nlargest(5, "risk_score") call returns the five worst segments without sorting the full DataFrame.
Cardiology scores 7.11 on the composite risk index — the highest in the hospital — combining 85.7% readmission with 8.3-day mean LOS. Every Cardiology readmission costs a further $7,885 in ward costs alone. The CMO's first two protocol review priorities are clear: Cardiology discharge criteria and Neurology post-care follow-up pathways.
Medicare and Private patients have different clinical profiles. We compare LOS, readmission rates, and cost exposure across insurance types, then isolate the high-cost patient group using a percentile threshold filter — identifying exactly where the most financially intensive care is concentrated.
DAILY_WARD_COST = 950
# Insurance type comparison
insurance_stats = df.groupby("insurance").agg(
patients = ("patient_id", "count"),
mean_age = ("age", "mean"),
mean_los = ("los_days", "mean"),
readmit_rate = ("readmitted", "mean"),
total_los = ("los_days", "sum")
).round(2).reset_index()
insurance_stats["total_cost"] = insurance_stats["total_los"] * DAILY_WARD_COST
insurance_stats["cost_per_patient"] = (insurance_stats["mean_los"] * DAILY_WARD_COST).round(0)
insurance_stats["readmit_pct"] = (insurance_stats["readmit_rate"] * 100).round(0)
print("Insurance type comparison:")
print(insurance_stats[[
"insurance","patients","mean_age","mean_los",
"readmit_pct","cost_per_patient","total_cost"
]].to_string(index=False))
# High-cost patient: LOS > 75th percentile AND readmitted
los_75th = df["los_days"].quantile(0.75)
high_cost = df[(df["los_days"] > los_75th) & (df["readmitted"] == 1)].copy()
high_cost["episode_cost"] = high_cost["los_days"] * DAILY_WARD_COST
print(f"\nHigh-cost threshold: LOS > {los_75th:.0f} days AND readmitted")
print(f"Patients meeting criteria: {len(high_cost)} of {len(df)}")
print(f"Total episode cost: ${high_cost['episode_cost'].sum():,}")
print(f"Share of total ward spend: {high_cost['episode_cost'].sum()/(df['los_days'].sum()*DAILY_WARD_COST)*100:.1f}%")
print("\nHigh-cost patient breakdown:")
print(high_cost[["patient_id","diagnosis","department","age",
"los_days","insurance","episode_cost"]].to_string(index=False))
Insurance type comparison:
insurance patients mean_age mean_los readmit_pct cost_per_patient total_cost
Medicare 12 64.3 6.8 83.0 6460 73720
Private 8 50.8 3.9 12.5 3705 29640
High-cost threshold: LOS > 7 days AND readmitted
Patients meeting criteria: 4 of 20
Total episode cost: $33,250
Share of total ward spend: 32.0%
High-cost patient breakdown:
patient_id diagnosis department age los_days insurance episode_cost
PT005 Cardiac Cardiology 61 9 Medicare 8550
PT008 Cardiac Cardiology 74 8 Medicare 7600
PT011 Cardiac Cardiology 70 10 Medicare 9500
PT018 Cardiac Cardiology 67 8 Medicare 7600What just happened?
Method — multi-condition boolean filter to isolate high-cost patient segmentsThe high-cost patient filter combines two conditions with the & operator: LOS above the 75th percentile (7 days) AND readmitted = 1. This is the same multi-condition filter used in CS14 to isolate Month-to-Month early-tenure subscribers — filter to the specific intersection of risk factors, then compute financial exposure. The episode_cost column multiplies each patient's actual LOS by the daily ward rate, giving per-episode cost rather than a mean — because high-cost patients are precisely those whose individual costs need to be visible.
Medicare patients readmit at 83% versus 12.5% for Private, generating $6,460 average episode cost versus $3,705. All four high-cost patients are Medicare Cardiac cases in Cardiology, accounting for 32% of total ward spend despite being only 20% of admissions. For the finance team, this is the core message: Medicare Cardiac is the hospital's single highest-cost patient segment.
Checkpoint: Calculate the potential annual saving if Cardiology's readmission rate were reduced from 85.7% to the clinical target of 10%. Assume the 20-patient dataset represents one month of admissions, and each prevented readmission saves one full episode cost of $7,885. How much would Meridian General save annually?
Key Findings
Cardiac is the most frequent diagnosis at 35% of all admissions, almost exclusively in patients over 60. Six of seven Cardiac patients are in the 61+ age bands, directly tying Cardiology's resource demand to the hospital's elderly Medicare population.
Neurology has a 100% readmission rate and Cardiology 85.7% — both far above the clinical benchmark of 10%. Orthopaedics and Pulmonology have zero readmissions, suggesting those discharge protocols are functioning and could serve as internal benchmarks.
Cardiology drives $55,195 in total ward costs — 47% of all spend — with a cost-per-admission of $7,885, more than double Respiratory's $3,325. The Cardiology composite risk score of 7.11 is the highest in the hospital.
Medicare patients readmit at 83% versus 12.5% for Private and generate $6,460 in average episode cost versus $3,705. All four high-cost patients are Medicare Cardiac cases, accounting for 32% of total ward spend.
Severity is a clean readmission predictor — High = 87.5%, Medium = 50%, Low = 0%. This confirms severity scoring is clinically valid and could be used as a formal discharge readiness gate to reduce early-release readmissions.
Visualisations
Pivot Table — Readmission Rate by Department × Diagnosis (%)
The readmission hotspot grid. Cardiology and Neurology are the clinical governance priorities.
| Department | Cardiac | Diabetes | Neurological | Orthopaedic | Respiratory | Risk Score |
|---|---|---|---|---|---|---|
| Cardiology | 86% ✗ | — | — | — | — | 7.11 |
| Neurology | — | — | 100% ✗ | — | — | 6.00 |
| Endocrinology | — | 75% | — | — | — | 2.85 |
| Orthopaedics | — | — | — | 0% ✓ | — | 0.00 |
| Pulmonology | — | — | — | — | 0% ✓ | 0.00 |
— = diagnosis not seen in this department · Risk Score = readmit_rate × mean_los
Clinical Analytics Decision Guide
| Task | Method | pandas Call | Watch Out For |
|---|---|---|---|
| Diagnosis frequency cross-tab | pd.crosstab() — two categorical variables | pd.crosstab(df["age_band"], df["diagnosis"]) | Use normalize=True for proportions instead of counts |
| Readmission rate per segment | groupby + mean of binary column | groupby("diagnosis")["readmitted"].mean() | Always show patient count alongside rate — small n = unreliable |
| LOS multi-statistic summary | Pass list to .agg() | .agg(["mean","median","std","max"]) | Mean vs median gap reveals outlier long-stay patients |
| Cost modelling | Multiply aggregated mean LOS by rate constant | mean_los * DAILY_WARD_COST | Define the rate constant once at top of script — never hardcode inline |
| Composite risk score | Multiply two aggregated metrics per segment | readmit_rate * mean_los | Normalise both inputs first if they have very different scales |
| High-cost patient filter | Multi-condition boolean with quantile threshold | df[(df["los_days"] > los_75th) & (df["readmitted"]==1)] | State the percentile threshold explicitly in any report |
Analyst's Note
Teacher's Note
What Would Come Next?
Build a readmission prediction model using logistic regression on age, severity, diagnosis, and LOS — producing a per-patient risk score for discharge planning software. Run a Welch's t-test comparing Medicare vs Private LOS to confirm the difference is statistically significant.
Limitations of This Analysis
With only 2–7 patients per diagnosis, readmission rates carry high uncertainty. The flat $950/day ward rate ignores ICU premium costs and specialist procedure charges — a rigorous model would apply variable rates by department and severity.
Business Decisions This Could Drive
Launch a Cardiology discharge protocol review with mandatory 48-hour post-discharge telephone follow-up. Use severity score as a formal discharge readiness gate. Present the Medicare Cardiac cost profile as justification for a dedicated cardiac care coordinator role.
Practice Questions
1. Which pandas function produces a two-way frequency table counting co-occurrences of two categorical variables — used here to count diagnoses per age band?
2. Which department had the highest composite risk score — combining readmission rate and mean length of stay — making it the top priority for clinical protocol review?
3. What is the formula for computing estimated cost per admission from the aggregated LOS statistics?
Quiz
1. How is a 30-day readmission rate computed from a binary column — and which previous case study used the identical method for a different outcome variable?
2. Cardiology's mean LOS is 8.3 days and its median is 8.0 days. What does this near-identical mean-median gap tell you analytically?
3. Why does the composite risk score multiply readmission rate by mean LOS rather than using either metric alone?
Up Next · Case Study 17
Traffic Accident Analytics
You have a city-level accident dataset. Which road types and weather conditions produce the most severe collisions? Do accidents cluster by time of day and day of week? And which locations should the council's road safety budget target first?