DS Case Studies
Analysing Student Score Data
Exam results are one of the most scrutinised datasets in any institution. But averages hide everything — the student failing silently while the class mean looks healthy, the subject that looks fine overall but has a long tail of poor performers. The real analysis starts when you look below the surface.
You are a data analyst working with Greenfield Academy, a secondary school with 12 students in a pilot analytics programme. The head of year has asked you to identify which students are at risk of failing before the end of term, which subjects show the widest score variance, and whether study hours actually predict exam performance. Your findings will directly inform which students receive additional tutoring support.
What This Case Study Covers
Student performance analysis sits at the intersection of pastoral care and academic planning. The goal is not to rank students — it is to identify who needs help and what kind of help they need. A student underperforming in all subjects needs a different intervention from one who excels in some areas but struggles in a single subject. This case study teaches you to make that distinction using data.
This case study covers four analytical layers: subject-level distribution analysis to understand which subjects are most consistently scored and which are most polarising, student-level risk flagging to identify students below the pass threshold across multiple subjects, study hours correlation to measure how strongly preparation time predicts performance, and overall score ranking with a composite score metric to give the head of year a single prioritised list.
The Student EDA Toolkit
Subject Distribution Analysis
Compute mean, median, and standard deviation per subject. A high standard deviation in a subject means scores are polarised: some students grasp the material and others do not. That signals a teaching or curriculum issue, not just individual student performance.
Pass Rate by Subject
Count students scoring below the pass threshold (50) per subject and compute the fail rate. A subject with a 40% fail rate needs a curriculum review. A student failing three or more subjects needs immediate pastoral intervention.
Composite Score and Ranking
Compute each student's mean score across all subjects as a single composite metric. This lets the head of year see the full cohort ranked at a glance and immediately spot who sits at the bottom of the distribution.
Study Hours Correlation
Compute the Pearson correlation between weekly study hours and composite score. If the correlation is strong, increasing study time is a credible intervention. If it is weak, the issue lies elsewhere: tutoring quality, attendance, or learning difficulties.
At-Risk Student Flagging
Apply a boolean filter to identify students failing two or more subjects simultaneously. This is the list that goes to the head of year: not a ranked table, but a concrete set of names requiring immediate action.
Dataset Overview
Greenfield Academy's results dataset contains 12 student records covering four subjects, weekly study hours, and attendance percentage. Built directly with pd.DataFrame().
| student_id | maths | english | science | history | study_hrs_week | attendance_pct |
|---|---|---|---|---|---|---|
| S001 | 78 | 82 | 74 | 69 | 12 | 94 |
| S002 | 45 | 51 | 38 | 42 | 5 | 71 |
| S003 | 91 | 88 | 93 | 85 | 18 | 98 |
| S004 | 62 | 47 | 55 | 58 | 8 | 82 |
| S005 | 33 | 41 | 29 | 37 | 3 | 65 |
Showing first 5 of 12 rows · 7 columns
student_id: Unique student reference. Used for flagging and labelling, never aggregated directly.
maths: Maths exam score. Pass threshold is 50. One of four subject scores used to compute the composite.
english: English exam score. Pass threshold is 50. Often the most variable subject in secondary school datasets.
science: Science exam score. Pass threshold is 50. Tends to correlate strongly with maths performance.
history: History exam score. Pass threshold is 50. More reliant on written expression, so it may diverge from the science/maths pattern.
study_hrs_week: Self-reported weekly study hours. The variable we will correlate against composite score to test the study-time hypothesis.
attendance_pct: Percentage of lessons attended. A second potential predictor: low attendance often co-occurs with low performance.
Business Questions
The head of year needs answers to these five questions before the end-of-term review meeting. Each maps directly to a step in the analysis.
1. Which subject has the highest mean score, and which has the most variable scores across the cohort?
2. Which students are failing two or more subjects, and should be flagged for immediate tutoring support?
3. How strongly does weekly study time correlate with overall academic performance?
4. Which subject has the highest fail rate, and does it affect a specific group of students or the whole cohort?
5. Who are the bottom three students by composite score, and what do their study hours and attendance look like?
Step-by-Step Analysis
The scenario:
The results spreadsheet has just been exported from the school's MIS system. The head of year meeting is in two days. You have the data and a clear brief — find the students who need help before the term ends, not after.
The first question is always about the shape of the data. Before looking at individual students, we need to understand which subjects are performing well across the cohort and which are struggling — this sets the context for everything that follows.
import pandas as pd
import numpy as np
# Build the Greenfield Academy student dataset
df = pd.DataFrame({
"student_id": ["S001","S002","S003","S004","S005","S006",
"S007","S008","S009","S010","S011","S012"],
"maths": [78, 45, 91, 62, 33, 70, 55, 88, 41, 66, 29, 83],
"english": [82, 51, 88, 47, 41, 74, 63, 79, 55, 71, 38, 90],
"science": [74, 38, 93, 55, 29, 68, 49, 85, 44, 60, 32, 87],
"history": [69, 42, 85, 58, 37, 65, 52, 81, 48, 63, 35, 78],
"study_hrs_week": [12, 5, 18, 8, 3, 11, 7, 16, 6, 10, 2, 15],
"attendance_pct": [94, 71, 98, 82, 65, 91, 78, 96, 72, 88, 60, 95]
})
# Check shape and missing values
print("Shape:", df.shape)
print("Missing values:", df.isnull().sum().sum())
# Subject-level distribution — mean, median, std, min, max
subjects = ["maths","english","science","history"]
subject_stats = df[subjects].agg(["mean","median","std","min","max"]).round(1)
print("\nSubject distribution summary:")
print(subject_stats.to_string())
# Which subject has the highest mean? Which is most variable?
best_mean = subject_stats.loc["mean"].idxmax()
most_variable = subject_stats.loc["std"].idxmax()
print(f"\nHighest mean score: {best_mean} ({subject_stats.loc['mean', best_mean]})")
print(f"Most variable subject: {most_variable} (std = {subject_stats.loc['std', most_variable]})")
Shape: (12, 7)
Missing values: 0
Subject distribution summary:
maths english science history
mean     61.8     64.9     59.5     59.4
median   64.0     67.0     57.5     60.5
std      21.3     18.3     22.0     17.0
min      29.0     38.0     29.0     35.0
max      91.0     90.0     93.0     85.0
Highest mean score: english (64.9)
Most variable subject: science (std = 22.0)
What just happened?
Method: .agg() with a list of functions
Passing a list to .agg() (here ["mean","median","std","min","max"]) applies all five functions to every selected column at once. The result is a DataFrame with the functions as rows and the original columns as columns, which reads naturally as a summary table. We then use .loc["mean"].idxmax() to extract which column has the highest value in the mean row, a clean pattern for finding the best-performing category.
English has the highest mean score at 64.9: students perform best in this subject on average. Science is the most variable at std = 22.0, a spread of more than a third of its mean. That spread tells the head of year that science is not a uniform problem; some students are excelling while others are failing badly. A whole-class intervention would be ineffective: targeted support for specific students is the right response.
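To see the .agg(list) + idxmax pattern in isolation, here is a minimal sketch on a toy two-column frame (the scores are hypothetical, not the Greenfield data):

```python
import pandas as pd

# Toy frame to isolate the .agg(list-of-functions) + idxmax pattern
scores = pd.DataFrame({"maths": [60, 70, 80], "english": [55, 90, 95]})
stats = scores.agg(["mean", "std"])     # functions become rows
print(stats.loc["mean"].idxmax())       # column with the highest mean
```

The same .loc[row].idxmax() idiom works for any summary row, e.g. .loc["std"].idxmax() to find the most variable column.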
The head of year needs a single ranked list. We compute each student's mean score across all four subjects, sort descending, and flag students below the pass threshold. This answers business questions 2 and 5.
# Compute composite score — mean across all four subject scores per student
# axis=1 tells pandas to average across columns (per row) rather than down rows (per column)
df["composite"] = df[subjects].mean(axis=1).round(1)
# Count subjects each student failed (below 50)
# (df[subjects] < 50) creates a boolean DataFrame — True where score is below 50
# .sum(axis=1) counts the Trues per row — number of failed subjects per student
df["subjects_failed"] = (df[subjects] < 50).sum(axis=1)
# Sort by composite score descending — best performers first
cohort_rank = df[["student_id","composite","subjects_failed",
"study_hrs_week","attendance_pct"]].sort_values(
"composite", ascending=False
).reset_index(drop=True)
cohort_rank.index += 1 # rank starts at 1
print("Full cohort ranking by composite score:")
print(cohort_rank.to_string())
# Bottom 3 students — the head of year's immediate priority list
bottom3 = cohort_rank.tail(3)
print("\nBottom 3 students (highest priority for intervention):")
print(bottom3.to_string())
Full cohort ranking by composite score:
student_id composite subjects_failed study_hrs_week attendance_pct
1 S003 89.2 0 18 98
2 S012 84.5 0 15 95
3 S008 83.2 0 16 96
4 S001 75.8 0 12 94
5        S006       69.2                0              11              91
6        S010       65.0                0              10              88
7        S004       55.5                1               8              82
8        S007       54.8                1               7              78
9        S009       47.0                3               6              72
10       S002       44.0                3               5              71
11       S005       35.0                4               3              65
12       S011       33.5                4               2              60
Bottom 3 students (highest priority for intervention):
student_id composite subjects_failed study_hrs_week attendance_pct
10 S002 44.0 3 5 71
11 S005 35.0 4 3 65
12       S011       33.5                4               2              60
What just happened?
Method: .mean(axis=1) for row-wise aggregation
By default, pandas aggregations like .mean() operate down columns (axis=0). Passing axis=1 changes direction: the aggregation runs across columns, computing one value per row. This gives us each student's mean score across all four subjects in a single line. The same axis=1 logic applies to .sum(), .max(), and other aggregations whenever you need a per-row result.
(df[subjects] < 50) produces a boolean DataFrame — True wherever a score is below 50. Chaining .sum(axis=1) then counts the Trues per row, giving us the number of subjects each student is failing. This two-step pattern — boolean mask then row-wise sum — is the standard approach for any "how many conditions are met per row" question.
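The same boolean mask answers both the per-student and per-subject questions depending on the axis. A minimal sketch on a hypothetical three-student frame:

```python
import pandas as pd

# Toy frame (hypothetical scores) showing the mask-then-sum pattern both ways
toy = pd.DataFrame({"maths": [78, 45, 33], "english": [82, 51, 41]})
mask = toy < 50                  # True where a score is below the pass mark
per_student = mask.sum(axis=1)   # failed subjects per row (per student)
per_subject = mask.sum()         # failing students per column (axis=0 default)
print(per_student.tolist())      # [0, 1, 2]
print(per_subject.tolist())      # [2, 1]
```

Because True counts as 1 under summation, no explicit conversion is needed before .sum().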
The ranking tells an immediate story. The top three students — S003, S012, S008 — all study 15–18 hours per week and attend 95–98% of lessons. The bottom three — S002, S005, S011 — study 2–5 hours and attend only 60–71% of lessons. S005 and S011 are failing all four subjects. These are not borderline cases — they are students in serious academic difficulty who need structured intervention before the end of term.
The head of year's hypothesis is that students who study more perform better. We test this formally with Pearson correlation — and check whether attendance is an even stronger predictor than study time.
# Compute Pearson correlation between composite score, study hours, and attendance
corr = df[["composite","study_hrs_week","attendance_pct"]].corr().round(3)
print("Correlation matrix:")
print(corr.to_string())
# Extract the key correlations
study_corr = corr.loc["composite","study_hrs_week"]
attendance_corr = corr.loc["composite","attendance_pct"]
print(f"\nStudy hours vs composite: r = {study_corr:.3f}")
print(f"Attendance vs composite: r = {attendance_corr:.3f}")
# Which is the stronger predictor?
stronger = "study hours" if abs(study_corr) > abs(attendance_corr) else "attendance"
print(f"Stronger predictor: {stronger}")
# Subject-level correlations with study hours
print("\nStudy hours correlation per subject:")
for subj in subjects:
r = df["study_hrs_week"].corr(df[subj]).round(3)
print(f" {subj:<10} r = {r:.3f}")
Correlation matrix:
composite study_hrs_week attendance_pct
composite 1.000 0.982 0.971
study_hrs_week 0.982 1.000 0.988
attendance_pct 0.971 0.988 1.000
Study hours vs composite: r = 0.982
Attendance vs composite: r = 0.971
Stronger predictor: study hours
Study hours correlation per subject:
maths r = 0.979
english r = 0.974
science r = 0.983
  history    r = 0.971
What just happened?
Method: column-level .corr() extraction
Rather than printing the full matrix every time, we extracted specific values using .loc["row","col"]. This lets us use the correlation values programmatically: comparing them, printing interpretations, or using them in conditional logic. The abs() call is necessary when comparing correlation strengths because a correlation of -0.95 is just as strong as +0.95; we care about magnitude, not direction.
Calling series1.corr(series2) directly on two Series computes the Pearson correlation between just those two columns. In the loop, we compute the study-hours correlation independently for each subject — which is cleaner than extracting from the full matrix when you only need one variable's correlations against several others.
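As a sanity check on the pairwise approach, Series.corr agrees with NumPy's correlation matrix. A minimal sketch with two short hypothetical Series:

```python
import pandas as pd
import numpy as np

# Two short hypothetical Series to show Series.corr in isolation
hours = pd.Series([2, 4, 6, 8])
score = pd.Series([40, 50, 65, 80])
r = hours.corr(score)                    # Pearson by default
r_np = np.corrcoef(hours, score)[0, 1]   # the same statistic via NumPy
print(round(r, 3))
```

Series.corr also accepts method="spearman" or method="kendall" when the relationship is monotonic but not linear.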
The correlation between study hours and composite score is r = 0.982 — one of the strongest linear relationships you will see in real educational data. Study time is an exceptionally strong predictor of performance in this cohort, consistent across all four subjects (all above r = 0.97). This directly answers the head of year's question: increasing study time is a credible intervention. The structured tutoring programme should prioritise getting at-risk students to increase their weekly study hours alongside attending more lessons.
Now we produce the two lists the head of year specifically requested: the subject fail rate table and the at-risk student flag list. This is the deliverable that goes directly into the meeting.
# Subject fail rate — proportion of students scoring below 50 per subject
fail_threshold = 50
print("Subject fail rates (% of students below 50):")
for subj in subjects:
fail_count = (df[subj] < fail_threshold).sum()
fail_rate = fail_count / len(df) * 100
print(f" {subj:<10} {fail_count} students failed ({fail_rate:.0f}%)")
# Highest fail rate subject
fail_rates = {s: (df[s] < fail_threshold).mean() for s in subjects}
worst_subject = max(fail_rates, key=fail_rates.get)
print(f"\nHighest fail rate: {worst_subject} at {fail_rates[worst_subject]:.0%}")
# At-risk student flag: failing 2 or more subjects
at_risk = df[df["subjects_failed"] >= 2][
["student_id","maths","english","science","history",
"composite","subjects_failed","study_hrs_week","attendance_pct"]
].sort_values("composite")
print(f"\nAt-risk students (failing 2+ subjects): {len(at_risk)} identified")
print(at_risk.to_string(index=False))
Subject fail rates (% of students below 50):
  maths      4 students failed (33%)
  english    3 students failed (25%)
  science    5 students failed (42%)
  history    4 students failed (33%)
Highest fail rate: science at 42%
At-risk students (failing 2+ subjects): 4 identified
student_id  maths  english  science  history  composite  subjects_failed  study_hrs_week  attendance_pct
      S011     29       38       32       35       33.5                4               2              60
      S005     33       41       29       37       35.0                4               3              65
      S002     45       51       38       42       44.0                3               5              71
      S009     41       55       44       48       47.0                3               6              72
What just happened?
Method: dictionary comprehension for per-subject metrics
The expression {s: (df[s] < 50).mean() for s in subjects} builds a dictionary where each key is a subject name and each value is the proportion of students failing that subject. This is more concise than running the same calculation four times manually. We then use Python's built-in max(dict, key=dict.get) to find the key with the highest value, a clean pattern for finding the "worst" category across a computed dictionary.
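The max(dict, key=dict.get) idiom in isolation, with toy fail rates rather than the computed ones:

```python
# Minimal sketch of max(dict, key=dict.get): toy fail rates, not real figures
fail_rates = {"maths": 0.33, "english": 0.25, "science": 0.42, "history": 0.33}
worst = max(fail_rates, key=fail_rates.get)
print(worst)  # science
```

The key argument tells max() to compare the dictionary's values while returning the key; min(fail_rates, key=fail_rates.get) would return the best-performing subject instead.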
Science has the highest fail rate at 42%, with maths and history at 33% each. Taken together, the maths and science numbers suggest a STEM competency gap in this cohort that goes beyond individual student effort. Four students are flagged as at-risk: S011, S005, S002, and S009. All four study six hours per week or fewer and attend no more than 72% of lessons. S011 is the most critical case: failing all four subjects, studying only 2 hours per week, with 60% attendance. This student needs an immediate pastoral meeting, not just tutoring.
Checkpoint: Add an extra column to the at-risk table: df["lessons_missed"] = (100 - df["attendance_pct"]) / 100 * 190 (assuming 190 school days per year). Now you can tell the head of year how many lessons each at-risk student has missed in absolute terms, a number that lands far harder in a parent meeting than a percentage.
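A possible sketch of that checkpoint, run here on a small stand-in frame with the at-risk students' attendance figures (assuming, as above, a 190-day school year):

```python
import pandas as pd

# Sketch of the checkpoint: absolute lessons missed, assuming 190 school days
df = pd.DataFrame({
    "student_id": ["S011", "S005", "S002"],
    "attendance_pct": [60, 65, 71],
})
df["lessons_missed"] = ((100 - df["attendance_pct"]) / 100 * 190).round(1)
print(df.to_string(index=False))
```

S011's 60% attendance translates to roughly 76 missed lessons, a far more concrete figure for a parent meeting.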
Key Findings
English has the highest mean score at 64.9 and science is the most variable (std = 22.0). Science scores are polarised: top students score 85 and above while the weakest score in the low 30s. It is not a whole-class problem but a specific cohort problem.
Study hours correlate with composite score at r = 0.982 — one of the strongest relationships in the dataset. The intervention implication is clear: structured study time is the single most impactful lever available to the school for improving at-risk student outcomes.
Science has the highest fail rate at 42%, with maths and history at 33% each. Roughly one in three students is below the pass threshold in each of these subjects. A subject-specific STEM intervention may be warranted in addition to individual student support.
Four students are at risk (S011, S005, S002, and S009), all failing two or more subjects with study hours of 6 or below and attendance at or below 72%. S011 is the most acute case: all four subjects failed, 2 hours of weekly study, 60% attendance.
The top three students — S003, S012, S008 — study 15–18 hours weekly and attend 95–98% of lessons. The gap between top and bottom is entirely explained by study habits and attendance — not raw ability, which has significant implications for the school's pastoral strategy.
Visualisations
[Figure: scatter of weekly study hours against composite score. Study hours almost perfectly predict score rank in this cohort.]
Education EDA Decision Guide
Student performance datasets always present the same core analytical choices. Here is the framework for structuring any school or university results analysis:
| Question | Metric | pandas Method | Watch Out For |
|---|---|---|---|
| Which subject is hardest? | Fail rate and mean score | (df[col] < 50).mean() | Low mean ≠ high variance — check both |
| Who is at risk? | Subjects failed count | (df[cols] < 50).sum(axis=1) | Threshold choice is a policy decision |
| Overall performance rank? | Composite score | df[cols].mean(axis=1) | axis=1 for row-wise, not column-wise |
| Does study time matter? | Pearson correlation | series.corr(series) | Correlation ≠ causation — study hours may proxy for motivation |
| Is the issue subject or student? | Cross-tabulate subject fails | boolean mask + .sum() | One failing student in a small cohort skews the whole subject rate |
What Would Come Next?
A senior analyst would build a subject correlation matrix to find which subjects share struggling students, a regression model predicting composite score from study hours and attendance, and a longitudinal comparison against prior term results to identify whether at-risk students are improving or declining.
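A hedged sketch of the first two follow-ups, run on synthetic stand-in data since we only have one term of real results; the real version would use the Greenfield DataFrame and prior-term scores:

```python
import pandas as pd
import numpy as np

# Synthetic stand-in cohort: study hours drive attendance and composite score
rng = np.random.default_rng(42)
study = rng.uniform(2, 18, 40)
attendance = np.clip(55 + 2.4 * study + rng.normal(0, 3, 40), 0, 100)
composite = 20 + 3.0 * study + 0.1 * attendance + rng.normal(0, 5, 40)
cohort = pd.DataFrame({"study_hrs_week": study,
                       "attendance_pct": attendance,
                       "composite": composite})

# 1) Correlation matrix across predictors and outcome
print(cohort.corr().round(2))

# 2) Simple linear fit of composite on study hours (np.polyfit, degree 1)
slope, intercept = np.polyfit(cohort["study_hrs_week"], cohort["composite"], 1)
print(f"Estimated gain per extra weekly study hour: {slope:.1f} points")
```

With real data the slope gives the head of year a concrete number ("each extra weekly study hour is associated with roughly N more composite points"), though it remains correlational, not causal.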
Limitations of This Analysis
Twelve students with a near-perfect study-hours correlation (r = 0.982) is unrealistic at scale — socioeconomic factors, learning differences, and teaching quality introduce much more noise in real cohorts. Study hours are also self-reported and likely overestimated.
Business Decisions This Could Drive
Schedule pastoral meetings for S011 and S005 before end of term, design a structured 8-week tutoring programme targeting at-risk students, and commission a curriculum review for maths and science given their fail rates of 33% and 42%.
Practice Questions
1. Which subject had the highest standard deviation in the Greenfield Academy dataset — meaning the most polarised scores across the cohort?
2. When computing a mean score across multiple columns per row in pandas, which parameter must you pass to .mean()?
3. What was the Pearson correlation between weekly study hours and composite score in this dataset?
Quiz
1. What does df[["maths","english","science","history"]].mean(axis=1) produce?
2. What is the correct way to count how many subjects each student is failing (below 50) in a single pandas operation?
3. Which student requires the most urgent pastoral intervention and why?
Up Next · Case Study 7
Analysing Bank Loan Approval Data
You are handed a loan application dataset. What income and credit score thresholds predict approval? Which applicant segments are most likely to be rejected — and does employment type play a role?