DS Case Study 17 – Traffic Accident Analytics | Dataplexa
Intermediate Case Study · CS 17

Analysing Traffic Accident Patterns

Road safety budgets are finite. A council that spreads speed cameras and lighting improvements equally across every road type will save fewer lives than one that targets the highest-severity locations with precision. That targeting requires data — and that data is sitting in the accident log.

You are a data analyst at Westmoor City Council's road safety unit. The head of transport policy has been asked by the mayor's office to justify next year's £2.4m road safety budget. She needs a breakdown of accident severity by road type and weather condition, a time-of-day and day-of-week analysis, and a ranked priority list of the locations that should receive investment first. The findings are due before Thursday's cabinet meeting.

Industry: Public Safety
Technique: Severity Index · Groupby · Pivot
Libraries: pandas · numpy
Difficulty: Intermediate
Est. Time: 50–60 min
Overview

What This Case Study Covers

Traffic accident analytics introduces a new metric type: the weighted severity index. Raw accident counts treat a minor fender-bender the same as a fatal collision — which is analytically misleading and policy-dangerous. Weighting by severity converts a count into a harm score, making locations and conditions directly comparable on a single scale that reflects their actual human cost.

Two patterns are introduced: weighted severity scoring — mapping categorical severity levels to numeric weights with .map() then summing per segment — and time-of-day banding, grouping accident hour into named periods to reveal whether danger concentrates at specific windows rather than distributing evenly across the day.
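Both patterns can be previewed on a three-row toy frame (illustrative values only; the case study's full dataset is built in Step 1):

```python
import pandas as pd

toy = pd.DataFrame({
    "severity": ["Slight", "Fatal", "Serious"],
    "hour":     [8, 22, 14],
})

# Pattern 1: categorical-to-numeric encoding with .map()
toy["severity_weight"] = toy["severity"].map({"Slight": 1, "Serious": 3, "Fatal": 5})

# Pattern 2: time-of-day banding with pd.cut() (left-closed bins)
toy["time_period"] = pd.cut(
    toy["hour"],
    bins=[0, 7, 10, 16, 20, 24],
    labels=["Night", "Morning Rush", "Midday", "Eve Rush", "Late Eve"],
    right=False,
)

# severity_weight: [1, 5, 3] · time_period: Morning Rush, Late Eve, Midday
print(toy)
```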

The Road Safety Analytics Toolkit

1

Severity-Weighted Road Type Analysis

Motorways, A-roads, residential streets, and junctions have different speed profiles and collision dynamics. Equal accident counts across road types can hide drastically different harm levels — weighted scores reveal where the actual damage is concentrated.
2

Weather Condition Severity Uplift

Rain, fog, and ice multiply accident severity — but not equally. Quantifying the severity uplift per condition tells the council whether road surface treatments, variable speed limits, or weather-triggered signage would have the highest impact per pound spent.
3

Time-of-Day and Day-of-Week Clustering

Morning and evening rush hours concentrate high-speed, high-volume traffic. Weekend nights add alcohol-related risk. Knowing which windows produce the most severe accidents tells the team when to deploy mobile enforcement and whether time-restricted speed limits are justified.
4

Road Type × Weather Severity Pivot

The two-dimensional heatmap reveals which road-condition combination creates the most harm. This is the direct input to the budget model — highest severity score gets the largest investment share.
5

Budget Impact Simulation

Simulating a 30% severity reduction at the top-ranked location converts the analysis into a financial case — how many harm-weighted accidents are prevented, and what is the cost per prevented accident for each intervention type.
01

Dataset Overview

The Westmoor accident log contains 20 collision records spanning road type, weather, hour, day, severity, vehicles involved, casualties, and speed limit. Built with pd.DataFrame().

accident_id  road_type    weather  hour  day_of_week  severity  vehicles  casualties  speed_limit
A001         Motorway     Wet         8  Monday       Serious          2           1           70
A002         A-Road       Dry        17  Friday       Slight           2           1           60
A003         Residential  Dry        14  Wednesday    Slight           1           0           30
A004         Junction     Fog         7  Tuesday      Fatal            3           2           50
A005         Motorway     Ice        22  Saturday     Fatal            4           3           70

Showing first 5 of 20 rows · 9 columns

accident_id: string · unique identifier

Unique reference per collision. Used for counting incidents and filtering to specific records.

road_type: object · 4 categories

Motorway, A-Road, Residential, Junction. Primary dimension for severity and location priority analysis.

weather: object · 4 categories

Dry, Wet, Fog, Ice. Condition at time of accident. Key risk multiplier for both frequency and severity.

hour: int64 · 0–23

Hour of day the accident occurred. Banded into time periods to reveal rush-hour and late-night risk patterns.

day_of_week: object · day name

Day of the week. Used to identify weekday vs weekend clustering and peak risk days.

severity: object · 3 categories

Slight, Serious, Fatal. Converted to weights (1/3/5) for comparable harm indexing across all road types.

vehicles: int64 · count

Number of vehicles involved. Higher counts indicate multi-vehicle pile-ups with amplified harm potential.

casualties: int64 · count

People injured or killed. Combined with severity weight to compute a total harm score per segment.

speed_limit: int64 · mph

Posted speed limit at the location. Correlates with severity — higher speeds produce worse outcomes.

02

Business Questions

The head of transport policy needs these five answers before Thursday's cabinet meeting.

1

Which road type has the highest weighted severity score — and how does accident count compare to harm score?

2

Which weather condition produces the most severe accidents — and what is the severity uplift versus dry conditions?

3

Which time-of-day period and day of week concentrate the most severe collisions?

4

Which road-type × weather combination is the single highest-risk scenario — and what is its total harm score?

5

How many harm-weighted accidents could be prevented if the top-ranked location reduces severity by 30%?

03

Step-by-Step Analysis

The scenario:

The accident log export landed Tuesday afternoon. Cabinet meets Thursday morning. Start by engineering the severity weight column — without it, every analysis treats a fatal crash the same as a scraped bumper. Then move through road type, weather, time clustering, and the pivot that produces the investment priority ranking.

Step 1: Load Data, Engineer Severity Weights, and Road Type Analysis

We map the categorical severity column to numeric weights using .map(), create time-of-day bands with pd.cut(), then compute the weighted severity score and total casualties per road type — revealing the true harm distribution hidden behind equal accident counts.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "accident_id": ["A001","A002","A003","A004","A005","A006","A007","A008",
                    "A009","A010","A011","A012","A013","A014","A015","A016",
                    "A017","A018","A019","A020"],
    "road_type":   ["Motorway","A-Road","Residential","Junction","Motorway",
                    "A-Road","Residential","Junction","Motorway","A-Road",
                    "Residential","Junction","Motorway","A-Road","Residential",
                    "Junction","Motorway","A-Road","Residential","Junction"],
    "weather":     ["Wet","Dry","Dry","Fog","Ice","Wet","Dry","Ice",
                    "Dry","Fog","Wet","Dry","Ice","Wet","Fog","Dry",
                    "Wet","Ice","Dry","Fog"],
    "hour":        [8,17,14,7,22,8,13,19,6,16,11,9,23,17,7,14,8,21,15,7],
    "day_of_week": ["Monday","Friday","Wednesday","Tuesday","Saturday",
                    "Monday","Thursday","Friday","Sunday","Wednesday",
                    "Tuesday","Monday","Saturday","Thursday","Tuesday",
                    "Wednesday","Friday","Saturday","Monday","Sunday"],
    "severity":    ["Serious","Slight","Slight","Fatal","Fatal",
                    "Serious","Slight","Serious","Fatal","Serious",
                    "Slight","Slight","Fatal","Serious","Fatal",
                    "Slight","Serious","Fatal","Slight","Serious"],
    "vehicles":    [2,2,1,3,4,2,1,2,3,2,1,2,4,2,3,1,2,4,1,3],
    "casualties":  [1,1,0,2,3,1,0,1,2,1,0,0,3,1,2,0,1,3,0,2],
    "speed_limit": [70,60,30,50,70,60,30,50,70,60,30,50,70,60,30,50,60,70,30,50]
})

# Map severity to numeric weights: Slight=1, Serious=3, Fatal=5
severity_weights = {"Slight": 1, "Serious": 3, "Fatal": 5}
df["severity_weight"] = df["severity"].map(severity_weights)

# Time-of-day banding with pd.cut() — right=False makes each bin
# left-closed, so [7, 10) captures hours 7-9 as Morning Rush
df["time_period"] = pd.cut(
    df["hour"],
    bins   = [0, 7, 10, 16, 20, 24],
    labels = ["Night (0-6)","Morning Rush (7-9)",
              "Midday (10-15)","Eve Rush (16-19)","Late Eve (20-23)"],
    right  = False
)

print(f"Accidents: {len(df)} | Fatal: {(df['severity']=='Fatal').sum()} "
      f"| Serious: {(df['severity']=='Serious').sum()} "
      f"| Slight: {(df['severity']=='Slight').sum()}")
print(f"Total casualties: {df['casualties'].sum()} | Total severity score: {df['severity_weight'].sum()}")

# Road type: count vs weighted severity score
road_stats = df.groupby("road_type").agg(
    accidents        = ("accident_id",     "count"),
    severity_score   = ("severity_weight", "sum"),
    mean_severity    = ("severity_weight", "mean"),
    total_casualties = ("casualties",      "sum"),
    fatal_count      = ("severity",        lambda x: (x == "Fatal").sum())
).round(2).reset_index()
road_stats = road_stats.sort_values("severity_score", ascending=False)
print("\nRoad type — count vs weighted severity score:")
print(road_stats.to_string(index=False))
Accidents: 20 | Fatal: 6 | Serious: 7 | Slight: 7
Total casualties: 24 | Total severity score: 58

Road type — count vs weighted severity score:
   road_type  accidents  severity_score  mean_severity  total_casualties  fatal_count
    Motorway          5              21            4.2                10            3
      A-Road          5              15            3.0                 7            1
    Junction          5              13            2.6                 5            1
 Residential          5               9            1.8                 2            1

What just happened?

Library — pandas · .map() for categorical-to-numeric encoding

pandas is the core library. .map(severity_weights) takes a dictionary and replaces every value in the severity column with its corresponding numeric weight — vectorised across all 20 rows, faster and cleaner than a loop. The weights (1, 3, 5) are a policy choice rather than a measurement: the jump from Serious to Fatal is deliberately large to reflect the disproportionate human cost, and whatever values are chosen should be documented explicitly. pd.cut() then bins the continuous hour column into named time periods. With right=False each bin is closed on the left and open on the right, so [7, 10) captures hours 7–9 — hour 9 falls into Morning Rush, and hour 10 starts Midday.
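The boundary behaviour is easy to verify on a few edge hours (toy bins and labels, purely illustrative):

```python
import pandas as pd

hours  = pd.Series([7, 9, 10])
bins   = [0, 7, 10, 16, 20, 24]
labels = ["Night", "Morning Rush", "Midday", "Eve Rush", "Late Eve"]

# right=False: left-closed bins, so [7, 10) holds hours 7-9
left_closed = pd.cut(hours, bins=bins, labels=labels, right=False)
print(left_closed.tolist())   # ['Morning Rush', 'Morning Rush', 'Midday']

# right=True (the default): right-closed bins, so (7, 10] holds hours 8-10
# and hour 7 drops into the (0, 7] Night bin
right_closed = pd.cut(hours, bins=bins, labels=labels)
print(right_closed.tolist())  # ['Night', 'Morning Rush', 'Morning Rush']
```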

Business Insight

All four road types have exactly 5 accidents — but Motorway's severity score of 21 is more than double Residential's 9. Raw counts are misleading here: a policy maker looking only at counts would treat all road types equally. The weighted score reveals that Motorway collisions are disproportionately fatal — 3 of 5 — making it the clear top investment priority before even running another line of analysis.

Step 2: Weather Condition Severity Uplift Analysis

We compute mean severity per weather condition and calculate the uplift ratio versus dry baseline — quantifying how much more dangerous each adverse condition makes the roads. We then measure what share of total harm occurs in non-dry conditions.

# Weather severity analysis
weather_stats = df.groupby("weather").agg(
    accidents        = ("accident_id",     "count"),
    severity_score   = ("severity_weight", "sum"),
    mean_severity    = ("severity_weight", "mean"),
    total_casualties = ("casualties",      "sum"),
    fatal_count      = ("severity",        lambda x: (x == "Fatal").sum())
).round(2).reset_index()

# Severity uplift vs dry baseline
dry_mean = weather_stats.loc[weather_stats["weather"] == "Dry", "mean_severity"].values[0]
weather_stats["uplift_vs_dry"] = (weather_stats["mean_severity"] / dry_mean).round(2)
weather_stats = weather_stats.sort_values("mean_severity", ascending=False)

print("Weather condition severity — ranked by mean harm score:")
print(weather_stats[["weather","accidents","mean_severity",
                      "fatal_count","uplift_vs_dry"]].to_string(index=False))

# Share of total harm in adverse vs dry weather
adverse = df[df["weather"] != "Dry"]
total_score = df["severity_weight"].sum()
adverse_score = adverse["severity_weight"].sum()
print(f"\nAdverse weather: {len(adverse)} accidents ({len(adverse)/len(df)*100:.0f}% of total)")
print(f"Adverse weather share of total severity score: {adverse_score}/{total_score} = {adverse_score/total_score*100:.0f}%")
print(f"Adverse weather casualties: {adverse['casualties'].sum()}/{df['casualties'].sum()} = {adverse['casualties'].sum()/df['casualties'].sum()*100:.0f}%")
Weather condition severity — ranked by mean harm score:
 weather  accidents  mean_severity  fatal_count  uplift_vs_dry
     Ice          4           4.50            3           2.87
     Fog          4           4.00            2           2.55
     Wet          5           2.60            0           1.66
     Dry          7           1.57            1           1.00

Adverse weather: 13 accidents (65% of total)
Adverse weather share of total severity score: 47/58 = 81%
Adverse weather casualties: 21/24 = 88%

What just happened?

Method — severity uplift ratio using .loc[] baseline extraction

The uplift ratio divides each condition's mean severity by the dry baseline, using .loc[] with a boolean mask to extract the dry row's mean severity value. This is the same normalisation pattern as CS15's engagement rate — dividing by a baseline to make conditions comparable. The result is intuitive: icy roads produce 2.87x the mean harm of dry roads, not just "more accidents." A ratio is a more actionable policy number than a raw score because it directly quantifies the risk multiplier of each condition.
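The baseline-extraction mechanics in isolation, with made-up severity numbers:

```python
import pandas as pd

stats = pd.DataFrame({
    "weather":       ["Dry", "Wet", "Ice"],
    "mean_severity": [1.5, 2.4, 4.5],   # hypothetical values
})

# .loc[] with a boolean mask returns a Series; .values[0] unwraps the scalar
baseline = stats.loc[stats["weather"] == "Dry", "mean_severity"].values[0]
stats["uplift_vs_dry"] = (stats["mean_severity"] / baseline).round(2)

print(stats["uplift_vs_dry"].tolist())  # [1.0, 1.6, 3.0]
```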

Business Insight

Adverse weather accounts for 65% of accidents but 81% of all harm — confirming that bad weather doesn't just cause more collisions, it causes disproportionately severe ones. Ice produces a 2.87x severity uplift, Fog 2.55x. Five of the six fatalities occurred in adverse conditions — only one on a dry road — making the investment case for weather-responsive interventions unambiguous.

Step 3: Time-of-Day and Day-of-Week Clustering

We group by the banded time periods and by day of week to find when the most severe accidents cluster. Knowing the peak risk window tells the team when to deploy mobile enforcement and whether time-restricted speed limits would be cost-effective.

# Severity by time period
time_stats = df.groupby("time_period", observed=True).agg(
    accidents      = ("accident_id",     "count"),
    severity_score = ("severity_weight", "sum"),
    mean_severity  = ("severity_weight", "mean"),
    casualties     = ("casualties",      "sum")
).round(2).reset_index()
time_stats = time_stats.sort_values("severity_score", ascending=False,
                                    kind="stable")  # stable sort keeps tied rows in band order

print("Severity by time period:")
print(time_stats.to_string(index=False))

# Day of week severity
day_order = ["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
day_stats = df.groupby("day_of_week").agg(
    accidents      = ("accident_id",     "count"),
    severity_score = ("severity_weight", "sum"),
    mean_severity  = ("severity_weight", "mean"),
    casualties     = ("casualties",      "sum")
).round(2).reindex(day_order).reset_index()

print("\nSeverity by day of week:")
print(day_stats.to_string(index=False))

# Weekend vs weekday comparison
df["is_weekend"] = df["day_of_week"].isin(["Saturday","Sunday"])
weekend_comp = df.groupby("is_weekend").agg(
    accidents      = ("accident_id",     "count"),
    severity_score = ("severity_weight", "sum"),
    mean_severity  = ("severity_weight", "mean"),
    casualties     = ("casualties",      "sum")
).round(2)
weekend_comp.index = weekend_comp.index.map({False: "Weekday", True: "Weekend"})
print("\nWeekend vs Weekday:")
print(weekend_comp.to_string())
Severity by time period:
        time_period  accidents  severity_score  mean_severity  casualties
 Morning Rush (7-9)          7              23           3.29           9
   Late Eve (20-23)          3              15           5.00           9
   Eve Rush (16-19)          4              10           2.50           4
        Night (0-6)          1               5           5.00           2
     Midday (10-15)          5               5           1.00           0

Severity by day of week:
 day_of_week  accidents  severity_score  mean_severity  casualties
      Monday          4               8           2.00           2
     Tuesday          3              11           3.67           4
   Wednesday          3               5           1.67           1
    Thursday          2               4           2.00           1
      Friday          3               7           2.33           3
    Saturday          3              15           5.00           9
      Sunday          2               8           4.00           4

Weekend vs Weekday:
          accidents  severity_score  mean_severity  casualties
 Weekday         15              35           2.33          11
 Weekend          5              23           4.60          13

What just happened?

Method — observed=True in groupby after pd.cut(), boolean weekend flag

After pd.cut(), the time_period column is a Categorical dtype. Passing observed=True to groupby() suppresses empty category rows — without it, time periods with zero accidents would appear as blank rows in the output. The is_weekend flag is created with .isin(["Saturday","Sunday"]) — a clean boolean mask that avoids string matching errors. .reindex(day_order) forces the day-of-week groupby output into calendar order rather than alphabetical.
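A toy illustration of what observed=True changes (hypothetical band names and scores):

```python
import pandas as pd

s = pd.DataFrame({"hour": [8, 9, 22], "score": [3, 1, 5]})
s["band"] = pd.cut(s["hour"], bins=[0, 7, 10, 24],
                   labels=["Night", "Morning", "Rest of day"], right=False)

# observed=False keeps every category, even bands with no rows
all_cats = s.groupby("band", observed=False)["score"].sum()
print(all_cats.to_dict())   # {'Night': 0, 'Morning': 4, 'Rest of day': 5}

# observed=True drops empty bands — no blank "Night" line in the output
seen = s.groupby("band", observed=True)["score"].sum()
print(seen.to_dict())       # {'Morning': 4, 'Rest of day': 5}
```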

Business Insight

Morning Rush (7–9) produces the highest total severity score at 23, but Late Evening (20–23) has the highest mean severity among multi-accident windows at 5.0 — all three late-evening collisions were Fatal. Saturday posts the highest single-day severity score at 15, and weekends average 4.60 mean severity versus 2.33 on weekdays. The intervention mix is clear: enforcement cameras for morning-rush volume, alcohol-related deterrents for late-evening and Saturday severity.

Step 4: Pivot — Road Type × Weather Severity Heatmap

The pivot table shows total severity score at every road-type and weather-condition intersection — producing the investment priority heatmap that the transport team will present to cabinet. Missing combinations are marked NaN rather than zero to avoid misrepresenting untested scenarios.

# Pivot: total severity score by road_type x weather.
# Missing combinations stay NaN (pandas' default) — "never occurred"
# is not the same as a severity score of zero.
pivot_sev = pd.pivot_table(
    df,
    index   = "road_type",
    columns = "weather",
    values  = "severity_weight",
    aggfunc = "sum"
).astype(float).round(1)
print("Total severity score by road type x weather:")
print(pivot_sev.to_string())

# Pivot: accident count
pivot_count = pd.pivot_table(
    df,
    index      = "road_type",
    columns    = "weather",
    values     = "accident_id",
    aggfunc    = "count",
    fill_value = 0
)
print("\nAccident count by road type x weather:")
print(pivot_count.to_string())

# Find the single worst combination
worst = pivot_sev.stack().idxmax()
worst_score = pivot_sev.stack().max()
print(f"\nHighest-risk combination: {worst[0]} in {worst[1]} weather — score {worst_score}")

# Row and column totals for ranking
print("\nRow totals (road type severity ranking):")
print(pivot_sev.sum(axis=1).sort_values(ascending=False).to_string())
print("\nColumn totals (weather severity ranking):")
print(pivot_sev.sum(axis=0).sort_values(ascending=False).to_string())
Total severity score by road type x weather:
weather      Dry  Fog   Ice  Wet
road_type
A-Road       1.0  3.0   5.0  6.0
Junction     2.0  8.0   3.0  NaN
Motorway     5.0  NaN  10.0  6.0
Residential  3.0  5.0   NaN  1.0

Accident count by road type x weather:
weather      Dry  Fog  Ice  Wet
road_type
A-Road         1    1    1    2
Junction       2    2    1    0
Motorway       1    0    2    2
Residential    3    1    0    1

Highest-risk combination: Motorway in Ice weather — score 10.0

Row totals (road type severity ranking):
road_type
Motorway       21.0
A-Road         15.0
Junction       13.0
Residential     9.0

Column totals (weather severity ranking):
weather
Ice    18.0
Fog    16.0
Wet    13.0
Dry    11.0

What just happened?

Method — pivot with sum aggfunc, then .stack().idxmax() to find the worst cell

aggfunc="sum" totals severity weights per cell rather than counting accidents or averaging — giving the cumulative harm score at each road-weather intersection. Missing cells are left as NaN (pandas' default for pivot_table), preserving the distinction between "zero harm" and "this combination never occurred in the data." pivot_sev.stack() collapses the 2D table into a Series with a multi-level (road_type, weather) index, dropping the NaN cells, and .idxmax() returns the label of the highest cell — the same pattern used in CS15 to find the best-performing platform-format combination, applied here to find the worst-risk road-condition pair.
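The stack-then-idxmax pattern on a minimal hand-built pivot (made-up values):

```python
import pandas as pd
import numpy as np

pivot = pd.DataFrame(
    {"Dry": [3.0, 1.0], "Ice": [np.nan, 9.0]},   # hypothetical scores
    index=pd.Index(["A-Road", "Motorway"], name="road_type"),
)
pivot.columns.name = "weather"

# stack() collapses the 2D table to a Series keyed by (road_type, weather);
# idxmax() / max() skip NaN, so unobserved cells can never "win"
stacked = pivot.stack()
print(stacked.idxmax())   # ('Motorway', 'Ice')
print(stacked.max())      # 9.0
```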

Business Insight

Motorway × Ice is the single highest-risk combination with a severity score of 10 — two accidents, both rated Fatal. Ice is also the most dangerous weather column overall (total score 18), and Motorway the most dangerous road type (total 21). The priority ranking follows directly: Motorway Ice first, Junction Fog second, then the two Wet segments on Motorway and A-Road.

Step 5: Budget Impact Simulation

We simulate the harm reduction from a 30% severity improvement at the top-ranked location (Motorway Ice), compute the cost per prevented harm-point at different intervention budgets, and build the investment priority table that the transport team will present to cabinet.

TOTAL_BUDGET = 2_400_000  # £2.4m road safety budget

# Current harm scores by priority segment
priority_segments = df.groupby(["road_type","weather"]).agg(
    accidents      = ("accident_id",     "count"),
    severity_score = ("severity_weight", "sum"),
    casualties     = ("casualties",      "sum")
).reset_index()
priority_segments = priority_segments.sort_values(
    "severity_score", ascending=False, kind="stable"  # stable sort keeps tied scores in alphabetical order
).head(5)
priority_segments["budget_share_pct"] = (
    priority_segments["severity_score"] / priority_segments["severity_score"].sum() * 100
).round(1)
priority_segments["allocated_budget"] = (
    priority_segments["severity_score"] / priority_segments["severity_score"].sum() * TOTAL_BUDGET
).round(0).astype(int)

print("Top 5 priority segments — severity-proportional budget allocation:")
print(priority_segments.to_string(index=False))

# Simulate 30% severity reduction at the top-ranked segment
top            = priority_segments.iloc[0]            # Motorway Ice
top_score      = top["severity_score"]
points_saved   = round(top_score * 0.30, 1)
cost_per_point = TOTAL_BUDGET / priority_segments["severity_score"].sum()

print(f"\nSimulation: 30% severity reduction at {top['road_type']} {top['weather']}")
print(f"  Current severity score:      {top_score:.1f}")
print(f"  Harm points prevented:       {points_saved}")
print(f"  Proportional budget share:   £{int(top['allocated_budget']):,}")
print(f"  Cost per harm point removed: £{cost_per_point:,.0f}")
print(f"\nTotal severity score currently: {df['severity_weight'].sum()}")
print(f"After intervention:             {df['severity_weight'].sum() - points_saved:.0f} "
      f"({points_saved/df['severity_weight'].sum()*100:.1f}% reduction)")
Top 5 priority segments — severity-proportional budget allocation:
  road_type weather  accidents  severity_score  casualties  budget_share_pct  allocated_budget
   Motorway     Ice          2              10           6              28.6            685714
   Junction     Fog          2               8           4              22.9            548571
     A-Road     Wet          2               6           2              17.1            411429
   Motorway     Wet          2               6           2              17.1            411429
     A-Road     Ice          1               5           3              14.3            342857

Simulation: 30% severity reduction at Motorway Ice
  Current severity score:      10.0
  Harm points prevented:       3.0
  Proportional budget share:   £685,714
  Cost per harm point removed: £68,571

Total severity score currently: 58
After intervention:             55 (5.2% reduction)

What just happened?

Method — severity-proportional budget allocation using normalised score shares

The budget allocation divides each segment's severity score by the total severity score of the top 5 segments, then multiplies by the total budget — the same proportional weighting pattern used in CS14's revenue-at-risk calculation. Each segment's budget share is directly proportional to its harm contribution, making the allocation methodology transparent and defensible to a cabinet audience. The cost-per-harm-point metric converts the abstract severity score into a financial efficiency measure — £68,571 per harm point removed — which can be compared across different intervention types to find the best-value spend.
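The allocation arithmetic in isolation, with hypothetical segment scores and a £1m budget chosen for round numbers:

```python
scores = {"Seg A": 10, "Seg B": 6, "Seg C": 4}   # hypothetical severity scores
budget = 1_000_000
total  = sum(scores.values())                     # 20

# Each segment's share of the budget equals its share of total harm.
# With awkward ratios, rounding residue should be assigned to one segment.
allocation = {seg: round(score / total * budget) for seg, score in scores.items()}
print(allocation)  # {'Seg A': 500000, 'Seg B': 300000, 'Seg C': 200000}

assert sum(allocation.values()) == budget  # shares exhaust the budget exactly here
```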

Business Insight

Targeting £685,714 at Motorway Ice interventions — gritting schedules, variable speed limits, ice-warning signage — and achieving a 30% severity reduction would remove 3.0 harm points from the city's score, a 5.2% overall reduction from a single segment. Spreading the same sum uniformly across all road types would dilute it over segments that generate far less harm. The severity-proportional allocation model is the core argument for targeted rather than uniform budget deployment.

Checkpoint: Compute the casualties-per-accident ratio for each road type using total_casualties / accidents. Which road type causes the most casualties per collision? Compare this to the severity score ranking — do they tell the same story, or do they diverge? A divergence would suggest that some road types cause many casualties per accident but at lower severity levels, which has different intervention implications.

04

Key Findings

01

All four road types have equal accident counts (5 each), but Motorway's severity score of 21 is more than double Residential's 9. Raw counts are misleading; weighted severity is the only sound basis for budget allocation decisions.

02

Ice roads produce 2.87x the mean harm of dry roads, Fog 2.55x. Adverse weather accounts for 65% of accidents but 81% of total severity score and 88% of all casualties — confirming that bad weather disproportionately amplifies harm, not just frequency.

03

Late Evening (20–23) has the highest mean severity among multi-accident windows at 5.0 — all three collisions in that window were Fatal. Morning Rush (7–9) produces the highest total severity score at 23, driven by volume rather than per-accident severity.

04

Motorway × Ice is the single highest-risk combination with severity score 10. The pivot heatmap confirms that the top-priority cells are dominated by motorways and adverse weather — the two dominant risk factors reinforce each other at their intersection.

05

A 30% severity reduction at Motorway Ice using its £685,714 share of the £2.4m budget would remove 3.0 harm points — a 5.2% city-wide reduction from a single targeted intervention, far more than the same funds would achieve if spread uniformly.

05

Visualisations

Accident Count vs Severity Score by Road Type
Equal counts, drastically different harm — why raw counts mislead
Severity score (5 accidents each): Motorway 21 · A-Road 15 · Junction 13 · Residential 9

Weather Severity Uplift vs Dry Baseline
Multiplier on mean harm score — Ice is 2.87x dry roads
Ice 2.87× · Fog 2.55× · Wet 1.66× · Dry (baseline) 1.00×

Severity Score by Time Period
Total score vs mean per accident — two different stories
Morning Rush 23 (avg 3.3) · Late Eve 15 (avg 5.0) · Eve Rush 10 (avg 2.5) · Night 5 (avg 5.0) · Midday 5 (avg 1.0)

Severity Score by Day of Week
Saturday dominates — weekend nights drive high-severity collisions
Mon 8 · Tue 11 · Wed 5 · Thu 4 · Fri 7 · Sat 15 · Sun 8
06

Pivot — Severity Score Heatmap: Road Type × Weather

Investment priority grid. Red = highest harm, Green = lowest, Grey = not observed in data.

Road Type      Dry   Fog    Ice   Wet   Total Score
Motorway         5     —   10 ★     6            21
Junction         2     8      3     —            13
A-Road           1     3      5     6            15
Residential      3     5      —     1             9

— = no accidents recorded for this combination · ★ = highest-risk cell

07

Road Safety Analytics Decision Guide

Task | Method | pandas Call | Watch Out For
Severity weighting | Map categorical to numeric with a dict | df["severity"].map({"Slight":1,"Serious":3,"Fatal":5}) | Weights are a policy choice — document them explicitly
Time banding | pd.cut() on the hour column | pd.cut(df["hour"], bins=[0,7,10,16,20,24], right=False) | right=False makes bins left-closed, so hour 7 lands in Morning Rush and hour 10 starts Midday
Severity uplift ratio | Mean severity / dry baseline via .loc[] | weather_stats.loc[weather_stats["weather"]=="Dry","mean_severity"] | Use mean severity for uplift, not total — it normalises for count differences
Severity pivot heatmap | pivot_table with aggfunc="sum" | pd.pivot_table(df, index="road_type", columns="weather", values="severity_weight", aggfunc="sum") | Leave missing cells as NaN (the default) — "not observed" and "zero harm" are different
Worst cell identification | stack().idxmax() | pivot_sev.stack().idxmax() | stack() drops NaN by default — correct behaviour here
Budget allocation | Proportional share of severity score | segment_score / total_score * total_budget | State clearly this is severity-proportional, not equal-split
08

Analyst's Note


What Would Come Next?

Add a geospatial layer — mapping accident coordinates to identify specific road segments within the Motorway Ice category that cluster spatially, enabling precise infrastructure investment rather than corridor-wide treatment.

Limitations of This Analysis

With 20 accidents and some cells having only 1–2 incidents, severity scores are highly sensitive to individual events. A single additional Fatal collision would dramatically shift the pivot — real road safety analysis requires hundreds of incidents per segment before results are reliable.

Business Decisions This Could Drive

Allocate £600,000 to Motorway Ice gritting, variable speed limits, and fog-warning systems. Deploy Saturday night enforcement cameras targeting the 20–23 window. Commission a Motorway wet-weather surface treatment tender for the A1 and M-ring corridors.

Practice Questions

1. What is the name of the derived column created by mapping Slight=1, Serious=3, Fatal=5 onto the severity column before any groupby analysis is run?



2. Which road type had the highest total severity score despite having the same number of accidents as every other road type?



3. What value should missing road-type × weather combinations take in the severity pivot — so that "never occurred" is not confused with "zero harm"?



Quiz

1. Why is a weighted severity score more useful than raw accident counts for budget allocation decisions?


2. When creating time-of-day bands with pd.cut(), why is right=False important for the bin boundaries?


3. Morning Rush has the highest total severity score but Late Evening has the highest mean severity per accident. What does this distinction imply for intervention strategy?


Up Next · Case Study 18

Airline Delay Prediction

You have a flight operations dataset. Which routes and airlines have the worst delay profiles? Do weather and carrier delays compound? And which departure windows should operations teams flag as highest-risk for cascading disruption?