DS Case Studies
Analysing Traffic Accident Patterns
Road safety budgets are finite. A council that spreads speed cameras and lighting improvements equally across every road type will save fewer lives than one that targets the highest-severity locations with precision. That targeting requires data — and that data is sitting in the accident log.
You are a data analyst at Westmoor City Council's road safety unit. The head of transport policy has been asked by the mayor's office to justify next year's £2.4m road safety budget. She needs a breakdown of accident severity by road type and weather condition, a time-of-day and day-of-week analysis, and a ranked priority list of the locations that should receive investment first. The findings are due before Thursday's cabinet meeting.
What This Case Study Covers
Traffic accident analytics introduces a new metric type: the weighted severity index. Raw accident counts treat a near-miss fender-bender the same as a fatal collision — which is analytically misleading and policy-dangerous. Weighting by severity converts a count into a harm score, making locations and conditions directly comparable on a single scale that reflects their actual human cost.
Two patterns are introduced: weighted severity scoring — mapping categorical severity levels to numeric weights with .map() then summing per segment — and time-of-day banding, grouping accident hour into named periods to reveal whether danger concentrates at specific windows rather than distributing evenly across the day.
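The arithmetic in miniature: the Westmoor log's severity mix (7 Slight, 7 Serious, 6 Fatal) collapses into a single harm score once each level carries a weight. A minimal sketch:

```python
# Severity mix from the Westmoor log and the 1/3/5 policy weights.
counts = {"Slight": 7, "Serious": 7, "Fatal": 6}
weights = {"Slight": 1, "Serious": 3, "Fatal": 5}

# Weighted severity index: each accident contributes its weight, not 1.
harm = sum(counts[level] * weights[level] for level in counts)
print(harm)  # 7*1 + 7*3 + 6*5 = 58
```

A raw count would report 20 accidents; the weighted index reports 58 harm points, dominated by the six fatal collisions.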
The Road Safety Analytics Toolkit
Severity-Weighted Road Type Analysis
Motorways, A-roads, residential streets, and junctions have different speed profiles and collision dynamics. Equal accident counts across road types can hide drastically different harm levels — weighted scores reveal where the actual damage is concentrated.
Weather Condition Severity Uplift
Rain, fog, and ice multiply accident severity — but not equally. Quantifying the severity uplift per condition tells the council whether road surface treatments, variable speed limits, or weather-triggered signage would have the highest impact per pound spent.
Time-of-Day and Day-of-Week Clustering
Morning and evening rush hours concentrate high-speed, high-volume traffic. Weekend nights add alcohol-related risk. Knowing which windows produce the most severe accidents tells the team when to deploy mobile enforcement and whether time-restricted speed limits are justified.
Road Type × Weather Severity Pivot
The two-dimensional heatmap reveals which road-condition combination creates the most harm. This is the direct input to the budget model — highest severity score gets the largest investment share.
Budget Impact Simulation
Simulating a 30% severity reduction at the top-ranked location converts the analysis into a financial case — how many harm-weighted accidents are prevented, and what is the cost per prevented accident for each intervention type.
Dataset Overview
The Westmoor accident log contains 20 collision records spanning road type, weather, hour, day, severity, vehicles involved, casualties, and speed limit. Built with pd.DataFrame().
| accident_id | road_type | weather | hour | day_of_week | severity | vehicles | casualties | speed_limit |
|---|---|---|---|---|---|---|---|---|
| A001 | Motorway | Wet | 8 | Monday | Serious | 2 | 1 | 70 |
| A002 | A-Road | Dry | 17 | Friday | Slight | 2 | 1 | 60 |
| A003 | Residential | Dry | 14 | Wednesday | Slight | 1 | 0 | 30 |
| A004 | Junction | Fog | 7 | Tuesday | Fatal | 3 | 2 | 50 |
| A005 | Motorway | Ice | 22 | Saturday | Fatal | 4 | 3 | 70 |
Showing first 5 of 20 rows · 9 columns
Unique reference per collision. Used for counting incidents and filtering to specific records.
Motorway, A-Road, Residential, Junction. Primary dimension for severity and location priority analysis.
Dry, Wet, Fog, Ice. Condition at time of accident. Key risk multiplier for both frequency and severity.
Hour of day the accident occurred. Banded into time periods to reveal rush-hour and late-night risk patterns.
Day of the week. Used to identify weekday vs weekend clustering and peak risk days.
Slight, Serious, Fatal. Converted to weights (1/3/5) for comparable harm indexing across all road types.
Number of vehicles involved. Higher counts indicate multi-vehicle pile-ups with amplified harm potential.
People injured or killed. Combined with severity weight to compute a total harm score per segment.
Posted speed limit at the location. Correlates with severity — higher speeds produce worse outcomes.
Business Questions
The head of transport policy needs these five answers before Thursday's cabinet meeting.
Which road type has the highest weighted severity score — and how does accident count compare to harm score?
Which weather condition produces the most severe accidents — and what is the severity uplift versus dry conditions?
Which time-of-day period and day of week concentrate the most severe collisions?
Which road-type × weather combination is the single highest-risk scenario — and what is its total harm score?
How many harm-weighted accidents could be prevented if the top-ranked location reduces severity by 30%?
Step-by-Step Analysis
The scenario:
The accident log export landed Tuesday afternoon. Cabinet meets Thursday morning. Start by engineering the severity weight column — without it, every analysis treats a fatal crash the same as a scraped bumper. Then move through road type, weather, time clustering, and the pivot that produces the investment priority ranking.
We map the categorical severity column to numeric weights using .map(), create time-of-day bands with pd.cut(), then compute the weighted severity score and total casualties per road type — revealing the true harm distribution hidden behind equal accident counts.
import pandas as pd
import numpy as np
df = pd.DataFrame({
"accident_id": ["A001","A002","A003","A004","A005","A006","A007","A008",
"A009","A010","A011","A012","A013","A014","A015","A016",
"A017","A018","A019","A020"],
"road_type": ["Motorway","A-Road","Residential","Junction","Motorway",
"A-Road","Residential","Junction","Motorway","A-Road",
"Residential","Junction","Motorway","A-Road","Residential",
"Junction","Motorway","A-Road","Residential","Junction"],
"weather": ["Wet","Dry","Dry","Fog","Ice","Wet","Dry","Ice",
"Dry","Fog","Wet","Dry","Ice","Wet","Fog","Dry",
"Wet","Ice","Dry","Fog"],
"hour": [8,17,14,7,22,8,13,19,6,16,11,9,23,17,7,14,8,21,15,7],
"day_of_week": ["Monday","Friday","Wednesday","Tuesday","Saturday",
"Monday","Thursday","Friday","Sunday","Wednesday",
"Tuesday","Monday","Saturday","Thursday","Tuesday",
"Wednesday","Friday","Saturday","Monday","Sunday"],
"severity": ["Serious","Slight","Slight","Fatal","Fatal",
"Serious","Slight","Serious","Fatal","Serious",
"Slight","Slight","Fatal","Serious","Fatal",
"Slight","Serious","Fatal","Slight","Serious"],
"vehicles": [2,2,1,3,4,2,1,2,3,2,1,2,4,2,3,1,2,4,1,3],
"casualties": [1,1,0,2,3,1,0,1,2,1,0,0,3,1,2,0,1,3,0,2],
"speed_limit": [70,60,30,50,70,60,30,50,70,60,30,50,70,60,30,50,60,70,30,50]
})
# Map severity to numeric weights: Slight=1, Serious=3, Fatal=5
severity_weights = {"Slight": 1, "Serious": 3, "Fatal": 5}
df["severity_weight"] = df["severity"].map(severity_weights)
# Time-of-day banding using pd.cut()
df["time_period"] = pd.cut(
df["hour"],
bins = [0, 6, 9, 15, 19, 24],
labels = ["Night (0-5)","Morning Rush (6-8)",
"Midday (9-14)","Eve Rush (15-18)","Late Eve (19-23)"],
right = False
)
print(f"Accidents: {len(df)} | Fatal: {(df['severity']=='Fatal').sum()} "
f"| Serious: {(df['severity']=='Serious').sum()} "
f"| Slight: {(df['severity']=='Slight').sum()}")
print(f"Total casualties: {df['casualties'].sum()} | Total severity score: {df['severity_weight'].sum()}")
# Road type: count vs weighted severity score
road_stats = df.groupby("road_type").agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
mean_severity = ("severity_weight", "mean"),
total_casualties = ("casualties", "sum"),
fatal_count = ("severity", lambda x: (x == "Fatal").sum())
).round(2).reset_index()
road_stats = road_stats.sort_values("severity_score", ascending=False)
print("\nRoad type — count vs weighted severity score:")
print(road_stats.to_string(index=False))
Accidents: 20 | Fatal: 6 | Serious: 7 | Slight: 7
Total casualties: 24 | Total severity score: 58
Road type — count vs weighted severity score:
road_type accidents severity_score mean_severity total_casualties fatal_count
Motorway 5 21.0 4.2 10.0 3
A-Road 5 15.0 3.0 7.0 1
Junction 5 13.0 2.6 5.0 1
Residential 5 9.0 1.8 2.0 1
What just happened?
Library — pandas · .map() for categorical-to-numeric encoding
pandas is the core library. .map(severity_weights) takes a dictionary and replaces every value in the severity column with its corresponding numeric weight — vectorised across all 20 rows simultaneously, faster and cleaner than a loop. The weights (1, 3, 5) are deliberately non-linear: a Fatal collision is categorically different from Serious, reflecting the disproportionate human cost. pd.cut() then bins the continuous hour column into named time periods with right=False, meaning each bin includes its left boundary and excludes its right — so hour 6 falls into the Morning Rush band, while hour 9 falls into Midday rather than Morning Rush.
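A standalone demo (toy hours, not the case-study data) of how right=False assigns the boundary hours:

```python
import pandas as pd

# With right=False each bin is closed on the left, open on the right:
# [0,6), [6,9), [9,15), [15,19), [19,24).
hours = pd.Series([5, 6, 8, 9, 19])
bands = pd.cut(
    hours,
    bins=[0, 6, 9, 15, 19, 24],
    labels=["Night (0-5)", "Morning Rush (6-8)", "Midday (9-14)",
            "Eve Rush (15-18)", "Late Eve (19-23)"],
    right=False,
)
print(bands.tolist())
# hour 6 -> Morning Rush, hour 9 -> Midday, hour 19 -> Late Eve
```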
All four road types have exactly 5 accidents — but Motorway's severity score of 21 is more than double Residential's 9. Raw counts are misleading here. A policy maker looking only at counts would treat all road types equally. The weighted score reveals that Motorway collisions are disproportionately fatal — 3 of 5 were Fatal — making it the clear top investment priority before even running another line of analysis.
We compute mean severity per weather condition and calculate the uplift ratio versus dry baseline — quantifying how much more dangerous each adverse condition makes the roads. We then measure what share of total harm occurs in non-dry conditions.
# Weather severity analysis
weather_stats = df.groupby("weather").agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
mean_severity = ("severity_weight", "mean"),
total_casualties = ("casualties", "sum"),
fatal_count = ("severity", lambda x: (x == "Fatal").sum())
).round(2).reset_index()
# Severity uplift vs dry baseline
dry_mean = weather_stats.loc[weather_stats["weather"] == "Dry", "mean_severity"].values[0]
weather_stats["uplift_vs_dry"] = (weather_stats["mean_severity"] / dry_mean).round(2)
weather_stats = weather_stats.sort_values("mean_severity", ascending=False)
print("Weather condition severity — ranked by mean harm score:")
print(weather_stats[["weather","accidents","mean_severity",
"fatal_count","uplift_vs_dry"]].to_string(index=False))
# Share of total harm in adverse vs dry weather
adverse = df[df["weather"] != "Dry"]
total_score = df["severity_weight"].sum()
adverse_score = adverse["severity_weight"].sum()
print(f"\nAdverse weather: {len(adverse)} accidents ({len(adverse)/len(df)*100:.0f}% of total)")
print(f"Adverse weather share of total severity score: {adverse_score}/{total_score} = {adverse_score/total_score*100:.0f}%")
print(f"Adverse weather casualties: {adverse['casualties'].sum()}/{df['casualties'].sum()} = {adverse['casualties'].sum()/df['casualties'].sum()*100:.0f}%")
Weather condition severity — ranked by mean harm score:
weather accidents mean_severity fatal_count uplift_vs_dry
Ice 4 4.50 3 2.87
Fog 4 4.00 2 2.55
Wet 5 2.60 0 1.66
Dry 7 1.57 1 1.00
Adverse weather: 13 accidents (65% of total)
Adverse weather share of total severity score: 47/58 = 81%
Adverse weather casualties: 21/24 = 88%
What just happened?
Method — severity uplift ratio using .loc[] baseline extraction
The uplift ratio divides each condition's mean severity by the dry baseline, using .loc[] to extract the dry row's mean severity value. This is the same normalisation pattern as CS15's engagement rate — dividing by a baseline to make conditions comparable. The result is intuitive: Ice roads produce roughly 2.9x the mean harm of dry roads, not just "more accidents." A ratio is a more actionable policy number than a raw score because it directly quantifies the risk multiplier of each condition.
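The baseline-ratio pattern in isolation, using just the dry and ice rows from the weather table:

```python
import pandas as pd

# Two rows from the weather table: the dry baseline and the ice mean.
stats = pd.DataFrame({"weather": ["Dry", "Ice"], "mean_severity": [1.57, 4.5]})

# Extract the baseline with .loc[], then divide every mean by it.
base = stats.loc[stats["weather"] == "Dry", "mean_severity"].values[0]
stats["uplift_vs_dry"] = (stats["mean_severity"] / base).round(2)
print(stats)  # Dry -> 1.00, Ice -> 2.87
```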
Adverse weather accounts for 65% of accidents but 81% of all harm — confirming that bad weather doesn't just cause more collisions, it causes disproportionately severe ones. Ice produces a 2.9x severity uplift, Fog 2.6x. One dry-weather fatality in the dataset versus five in adverse conditions makes the investment case for weather-responsive interventions unambiguous.
We group by the banded time periods and by day of week to find when the most severe accidents cluster. Knowing the peak risk window tells the team when to deploy mobile enforcement and whether time-restricted speed limits would be cost-effective.
# Severity by time period
time_stats = df.groupby("time_period", observed=True).agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
mean_severity = ("severity_weight", "mean"),
casualties = ("casualties", "sum")
).round(2).reset_index()
time_stats = time_stats.sort_values("severity_score", ascending=False)
print("Severity by time period:")
print(time_stats.to_string(index=False))
# Day of week severity
day_order = ["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
day_stats = df.groupby("day_of_week").agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
mean_severity = ("severity_weight", "mean"),
casualties = ("casualties", "sum")
).round(2).reindex(day_order).reset_index()
print("\nSeverity by day of week:")
print(day_stats.to_string(index=False))
# Weekend vs weekday comparison
df["is_weekend"] = df["day_of_week"].isin(["Saturday","Sunday"])
weekend_comp = df.groupby("is_weekend").agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
mean_severity = ("severity_weight", "mean"),
casualties = ("casualties", "sum")
).round(2)
weekend_comp.index = weekend_comp.index.map({False: "Weekday", True: "Weekend"})
print("\nWeekend vs Weekday:")
print(weekend_comp.to_string())
Severity by time period:
time_period accidents severity_score mean_severity casualties
Morning Rush (6-8) 7 27.0 3.86 11.0
Late Eve (19-23) 4 18.0 4.50 10.0
Eve Rush (15-18) 4 8.0 2.00 3.0
Midday (9-14) 5 5.0 1.00 0.0
Severity by day of week:
day_of_week accidents severity_score mean_severity casualties
Monday 4 8.0 2.00 2.0
Tuesday 3 11.0 3.67 4.0
Wednesday 3 5.0 1.67 1.0
Thursday 2 4.0 2.00 1.0
Friday 3 7.0 2.33 3.0
Saturday 3 15.0 5.00 9.0
Sunday 2 8.0 4.00 4.0
Weekend vs Weekday:
accidents severity_score mean_severity casualties
Weekday 15 35.0 2.33 11.0
Weekend 5 23.0 4.60 13.0
What just happened?
Method — observed=True in groupby after pd.cut(), boolean weekend flag
After pd.cut(), the time_period column is a Categorical dtype. Passing observed=True to groupby() suppresses unobserved category rows — here the Night (0-5) band has no accidents, so it is dropped from the output rather than appearing as a row of zeros. The is_weekend flag is created with .isin(["Saturday","Sunday"]) — a clean boolean mask that avoids string matching errors. .reindex(day_order) forces the day-of-week groupby output into calendar order rather than alphabetical.
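A toy demonstration of what observed=True changes, independent of the accident data:

```python
import pandas as pd

# A categorical grouping key with an unused category "C".
key = pd.Series(pd.Categorical(["A", "B", "A"], categories=["A", "B", "C"]))
vals = pd.Series([1, 2, 3])

# observed=False keeps the empty "C" group; observed=True drops it.
with_empty = vals.groupby(key, observed=False).sum()
no_empty = vals.groupby(key, observed=True).sum()
print(len(with_empty), len(no_empty))  # 3 2
```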
Morning Rush (6–8am) produces the highest total severity score at 27, but Late Evening (19–23) has the highest mean severity per accident at 4.5 — the average accident in that window sits between Serious and Fatal. Saturday has the highest single-day severity score at 15 with a mean of 5.0 (all three Saturday collisions were Fatal), and weekends average 4.6 mean severity versus 2.33 on weekdays. The intervention mix is clear: enforcement cameras for morning rush volume, alcohol-related deterrents for late-evening and Saturday severity.
The pivot table shows total severity score at every road-type and weather-condition intersection — producing the investment priority heatmap that the transport team will present to cabinet. Missing combinations are marked NaN rather than zero to avoid misrepresenting untested scenarios.
# Pivot: total severity score by road_type x weather
pivot_sev = pd.pivot_table(
df,
index = "road_type",
columns = "weather",
values = "severity_weight",
aggfunc = "sum",
fill_value = np.nan
).round(1)
print("Total severity score by road type x weather:")
print(pivot_sev.to_string())
# Pivot: accident count
pivot_count = pd.pivot_table(
df,
index = "road_type",
columns = "weather",
values = "accident_id",
aggfunc = "count",
fill_value = 0
)
print("\nAccident count by road type x weather:")
print(pivot_count.to_string())
# Find the single worst combination
worst = pivot_sev.stack().idxmax()
worst_score = pivot_sev.stack().max()
print(f"\nHighest-risk combination: {worst[0]} in {worst[1]} weather — score {worst_score}")
# Row and column totals for ranking
print("\nRow totals (road type severity ranking):")
print(pivot_sev.sum(axis=1).sort_values(ascending=False).to_string())
print("\nColumn totals (weather severity ranking):")
print(pivot_sev.sum(axis=0).sort_values(ascending=False).to_string())
Total severity score by road type x weather:
weather      Dry  Fog   Ice  Wet
road_type
A-Road       1.0  3.0   5.0  6.0
Junction     2.0  8.0   3.0  NaN
Motorway     5.0  NaN  10.0  6.0
Residential  3.0  5.0   NaN  1.0

Accident count by road type x weather:
weather      Dry  Fog  Ice  Wet
road_type
A-Road         1    1    1    2
Junction       2    2    1    0
Motorway       1    0    2    2
Residential    3    1    0    1

Highest-risk combination: Motorway in Ice weather — score 10.0

Row totals (road type severity ranking):
road_type
Motorway       21.0
A-Road         15.0
Junction       13.0
Residential     9.0

Column totals (weather severity ranking):
weather
Ice    18.0
Fog    16.0
Wet    13.0
Dry    11.0
What just happened?
Method — pivot with sum aggfunc, then .stack().idxmax() to find the worst cell
aggfunc="sum" totals severity weights per cell rather than counting accidents or averaging — giving the cumulative harm score at each road-weather intersection. fill_value=np.nan preserves the distinction between "zero accidents" and "this combination never occurred in the data." pivot_sev.stack() collapses the 2D table into a Series with a multi-level (road_type, weather) index, and .idxmax() returns the label of the highest cell — the same pattern used in CS15 to find the best-performing platform-format combination, applied here to find the worst-risk road-condition pair.
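The stack-then-idxmax step on a tiny hand-built table (illustrative values, not the full pivot):

```python
import pandas as pd
import numpy as np

# 2x2 severity table with one never-observed combination (NaN).
tbl = pd.DataFrame({"Dry": [1.0, 2.0], "Ice": [10.0, np.nan]},
                   index=["Motorway", "Residential"])

flat = tbl.stack()      # Series keyed by (road_type, weather) pairs
print(flat.idxmax())    # ('Motorway', 'Ice') — the NaN cell never wins
```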
Motorway × Ice is the single highest-risk combination with a severity score of 10 — driven by two accidents, both rated Fatal. Ice is also the most dangerous weather column overall (total score 18), and Motorway the most dangerous road type (total 21). The investment priority ranking writes itself: Motorway Ice intervention is first, Junction Fog second, with Motorway Wet and A-Road Wet tied third.
We simulate the harm reduction from a 30% severity improvement at the top-ranked location (Motorway Ice), compute the cost per prevented harm-point at different intervention budgets, and build the investment priority table that the transport team will present to cabinet.
TOTAL_BUDGET = 2_400_000 # £2.4m road safety budget
# Current harm scores by priority segment
priority_segments = df.groupby(["road_type","weather"]).agg(
accidents = ("accident_id", "count"),
severity_score = ("severity_weight", "sum"),
casualties = ("casualties", "sum")
).reset_index()
priority_segments = priority_segments.sort_values("severity_score", ascending=False).head(5)
priority_segments["budget_share_pct"] = (
priority_segments["severity_score"] / priority_segments["severity_score"].sum() * 100
).round(1)
priority_segments["allocated_budget"] = (
priority_segments["severity_score"] / priority_segments["severity_score"].sum() * TOTAL_BUDGET
).round(0).astype(int)
print("Top 5 priority segments — severity-proportional budget allocation:")
print(priority_segments.to_string(index=False))
# Simulate 30% severity reduction at Motorway Ice (top segment)
top_score = 10.0  # Motorway Ice severity score from the pivot
reduction_30 = top_score * 0.30
points_saved = round(reduction_30, 1)
cost_per_point = TOTAL_BUDGET / priority_segments["severity_score"].sum()
print(f"\nSimulation: 30% severity reduction at Motorway Ice")
print(f" Current severity score: {top_score}")
print(f" Harm points prevented: {points_saved}")
print(f" Proportional budget share: £{int(priority_segments[priority_segments['road_type']=='Motorway']['allocated_budget'].values[0]):,}")
print(f" Cost per harm point removed: £{cost_per_point:,.0f}")
print(f"\nTotal severity score currently: {df['severity_weight'].sum()}")
print(f"After intervention: {df['severity_weight'].sum() - points_saved:.0f} "
f"({points_saved/df['severity_weight'].sum()*100:.1f}% reduction)")
Top 5 priority segments — severity-proportional budget allocation:
road_type weather accidents severity_score casualties budget_share_pct allocated_budget
Motorway Ice 2 10.0 6 28.6 685714
Junction Fog 2 8.0 4 22.9 548571
A-Road Wet 2 6.0 2 17.1 411429
Motorway Wet 2 6.0 2 17.1 411429
A-Road Ice 1 5.0 3 14.3 342857
Simulation: 30% severity reduction at Motorway Ice
  Current severity score: 10.0
  Harm points prevented: 3.0
  Proportional budget share: £685,714
  Cost per harm point removed: £68,571
Total severity score currently: 58
After intervention: 55 (5.2% reduction)
What just happened?
Method — severity-proportional budget allocation using normalised score shares
The budget allocation divides each segment's severity score by the total severity score of the top 5 segments, then multiplies by the total budget — the same proportional weighting pattern used in CS14's revenue-at-risk calculation. Each segment's budget share is directly proportional to its harm contribution, making the allocation methodology transparent and defensible to a cabinet audience. The cost-per-harm-point metric converts the abstract severity score into a financial efficiency measure — roughly £68,571 per harm point removed — which can be compared across different intervention types to find the best-value spend.
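The proportional-share arithmetic on its own, with illustrative scores and a round budget (not the case-study figures):

```python
# Each segment gets budget * score / total_score, so shares are
# proportional to harm and sum back to the full budget.
scores = {"seg_a": 10, "seg_b": 8, "seg_c": 6}
budget = 1_000_000
total = sum(scores.values())

alloc = {seg: round(budget * s / total) for seg, s in scores.items()}
print(alloc)                 # seg_a gets 10/24 of the pot
print(sum(alloc.values()))   # 1000000
```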
Business Insight
Targeting £685,714 at Motorway Ice interventions — gritting schedules, variable speed limits, weather-warning signage — and achieving a 30% severity reduction would remove 3 harm points from the city's score, a 5.2% overall reduction from a single segment. Spreading that same sum uniformly across all road types would produce a fraction of this impact. The severity-proportional allocation model is the core argument for targeted rather than uniform budget deployment.
Checkpoint: Compute the casualties-per-accident ratio for each road type using total_casualties / accidents. Which road type causes the most casualties per collision? Compare this to the severity score ranking — do they tell the same story, or do they diverge? A divergence would suggest that some road types cause many casualties per accident but at lower severity levels, which has different intervention implications.
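A starting point for the checkpoint, rebuilt here on just two illustrative rows of the road-type table:

```python
import pandas as pd

# Two rows of the road_stats table from the earlier step.
road_stats = pd.DataFrame({
    "road_type": ["Motorway", "Residential"],
    "accidents": [5, 5],
    "total_casualties": [10, 2],
})

# Casualties per collision: divergence from the severity ranking would
# flag road types with many lower-severity casualties per accident.
road_stats["casualties_per_accident"] = (
    road_stats["total_casualties"] / road_stats["accidents"]
)
print(road_stats[["road_type", "casualties_per_accident"]])
```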
Key Findings
All four road types have equal accident counts (5 each), but Motorway's severity score is 21 — more than double Residential's 9. Raw counts are misleading; weighted severity is the only valid basis for budget allocation decisions.
Ice roads produce roughly 2.9x the mean harm of dry roads. Adverse weather accounts for 65% of accidents but 81% of total severity score and 88% of all casualties — confirming that bad weather disproportionately amplifies harm, not just frequency.
Late Evening (19–23) has the highest mean severity per accident at 4.5 — the average accident in that window sits between Serious and Fatal. Morning Rush produces the highest total severity score at 27, driven by volume rather than per-accident severity.
Motorway × Ice is the single highest-risk combination with severity score 10. The pivot heatmap confirms that the top-priority interventions involve either Motorway or Ice — the two dominant risk factors reinforce each other at their intersection.
A 30% severity reduction at Motorway Ice using a £685,714 share of the £2.4m budget would remove 3 harm points — a 5.2% city-wide reduction from a single targeted intervention, versus near-zero impact if the same funds were spread uniformly.
Visualisations
Pivot — Severity Score Heatmap: Road Type × Weather
Investment priority grid. Red = highest harm, Green = lowest, Grey = not observed in data.
| Road Type | Dry | Fog | Ice | Wet | Total Score |
|---|---|---|---|---|---|
| Motorway | 5 | — | 10 ★ | 6 | 21 |
| Junction | 2 | 8 | 3 | — | 13 |
| A-Road | 1 | 3 | 5 | 6 | 15 |
| Residential | 3 | 5 | — | 1 | 9 |
— = no accidents recorded for this combination · ★ = highest-risk cell
Road Safety Analytics Decision Guide
| Task | Method | pandas Call | Watch Out For |
|---|---|---|---|
| Severity weighting | Map categorical to numeric with dict | df["severity"].map({"Slight":1,"Serious":3,"Fatal":5}) | Weights are a policy choice — document them explicitly |
| Time banding | pd.cut() on hour column | pd.cut(df["hour"], bins=[0,6,9,15,19,24], right=False) | right=False includes the left bin edge — hour 6 lands in Morning Rush, hour 9 in Midday |
| Severity uplift ratio | Mean severity / dry baseline using .loc[] | mean_sev / weather_stats.loc[weather=="Dry","mean_severity"] | Use mean severity for uplift, not total — normalises for count differences |
| Severity pivot heatmap | pivot_table with aggfunc="sum" | pd.pivot_table(..., aggfunc="sum", fill_value=np.nan) | NaN not 0 for missing cells — "not observed" and "zero harm" are different |
| Worst cell identification | stack().idxmax() | pivot_sev.stack().idxmax() | stack() drops NaN by default — correct behaviour here |
| Budget allocation | Proportional share of severity score | segment_score / total_score * total_budget | State clearly this is severity-proportional, not equal-split |
What Would Come Next?
Add a geospatial layer — mapping accident coordinates to identify specific road segments within the Motorway Ice category that cluster spatially, enabling precise infrastructure investment rather than corridor-wide treatment.
Limitations of This Analysis
With 20 accidents and some cells having only 1–2 incidents, severity scores are highly sensitive to individual events. A single additional Fatal collision would dramatically shift the pivot — real road safety analysis requires hundreds of incidents per segment before results are reliable.
Business Decisions This Could Drive
Allocate £685,714 to Motorway Ice gritting, variable speed limits, and weather-warning signage. Deploy Saturday night enforcement cameras targeting the 19–23 window. Commission a Motorway wet-weather surface treatment tender for the A1 and M-ring corridors.
Practice Questions
1. What is the name of the derived column created by mapping Slight=1, Serious=3, Fatal=5 onto the severity column before any groupby analysis is run?
2. Which road type had the highest total severity score despite having the same number of accidents as every other road type?
3. What fill_value should be used for missing road-type × weather combinations in the severity pivot — so that "never occurred" is not confused with "zero harm"?
Quiz
1. Why is a weighted severity score more useful than raw accident counts for budget allocation decisions?
2. When creating time-of-day bands with pd.cut(), why is right=False important for the bin boundaries?
3. Morning Rush has the highest total severity score but Late Evening has the highest mean severity per accident. What does this distinction imply for intervention strategy?
Up Next · Case Study 18
Airline Delay Prediction
You have a flight operations dataset. Which routes and airlines have the worst delay profiles? Do weather and carrier delays compound? And which departure windows should operations teams flag as highest-risk for cascading disruption?