DS Case Studies
Analysing Student Score Data
Exam results are one of the most scrutinised datasets in any institution. But averages hide everything — the student failing silently while the class mean looks healthy, the subject that looks fine overall but has a long tail of poor performers. The real analysis starts when you look below the surface.
You are a data analyst working with Greenfield Academy, a secondary school with 12 students in a pilot analytics programme. The head of year has asked you to identify which students are at risk of failing before the end of term, which subjects show the widest score variance, and whether study hours actually predict exam performance. Your findings will directly inform which students receive additional tutoring support.
What This Case Study Covers
Student performance analysis sits at the intersection of pastoral care and academic planning. The goal is not to rank students — it is to identify who needs help and what kind of help they need. A student underperforming in all subjects needs a different intervention from one who excels in some areas but struggles in a single subject. This case study teaches you to make that distinction using data.
This case study covers four analytical layers: subject-level distribution analysis to understand which subjects are most consistently scored and which are most polarising, student-level risk flagging to identify students below the pass threshold across multiple subjects, study hours correlation to measure how strongly preparation time predicts performance, and overall score ranking with a composite score metric to give the head of year a single prioritised list.
The Student EDA Toolkit
Subject Distribution Analysis
Compute mean, median, and standard deviation per subject. A high standard deviation in a subject means scores are polarised: some students grasp the material and others do not. That signals a teaching or curriculum issue, not just individual student performance.
Pass Rate by Subject
Count students scoring below the pass threshold (50) per subject and compute the fail rate. A subject with a 40% fail rate needs a curriculum review. A student failing three or more subjects needs immediate pastoral intervention.
Composite Score and Ranking
Compute each student's mean score across all subjects as a single composite metric. This lets the head of year see the full cohort ranked at a glance and immediately spot who sits at the bottom of the distribution.
Study Hours Correlation
Compute the Pearson correlation between weekly study hours and composite score. If the correlation is strong, increasing study time is a credible intervention. If it is weak, the issue lies elsewhere: tutoring quality, attendance, or learning difficulties.
At-Risk Student Flagging
Apply a boolean filter to identify students failing two or more subjects simultaneously. This is the list that goes to the head of year: not a ranked table, but a concrete set of names requiring immediate action.
Dataset Overview
Greenfield Academy's results dataset contains 12 student records covering four subjects, weekly study hours, and attendance percentage. Built directly with pd.DataFrame().
| student_id | maths | english | science | history | study_hrs_week | attendance_pct |
|---|---|---|---|---|---|---|
| S001 | 78 | 82 | 74 | 69 | 12 | 94 |
| S002 | 45 | 51 | 38 | 42 | 5 | 71 |
| S003 | 91 | 88 | 93 | 85 | 18 | 98 |
| S004 | 62 | 47 | 55 | 58 | 8 | 82 |
| S005 | 33 | 41 | 29 | 37 | 3 | 65 |
Showing first 5 of 12 rows · 7 columns
student_id: Unique student reference. Used for flagging and labelling, never aggregated directly.
maths: Maths exam score. Pass threshold is 50. One of four subject scores used to compute the composite.
english: English exam score. Pass threshold is 50. Often the most variable subject in secondary school datasets.
science: Science exam score. Pass threshold is 50. Tends to correlate strongly with maths performance.
history: History exam score. Pass threshold is 50. More reliant on written expression, so it may diverge from the science/maths pattern.
study_hrs_week: Self-reported weekly study hours. The variable we will correlate against composite score to test the study-time hypothesis.
attendance_pct: Percentage of lessons attended. A second potential predictor: low attendance often co-occurs with low performance.
Business Questions
The head of year needs answers to these five questions before the end-of-term review meeting. Each maps directly to a step in the analysis.
1. Which subject has the highest mean score, and which has the most variable scores across the cohort?
2. Which students are failing two or more subjects, and should be flagged for immediate tutoring support?
3. How strongly does weekly study time correlate with overall academic performance?
4. Which subject has the highest fail rate, and does it affect a specific group of students or the whole cohort?
5. Who are the bottom three students by composite score, and what do their study hours and attendance look like?
Step-by-Step Analysis
The scenario:
The results spreadsheet has just been exported from the school's MIS system. The head of year meeting is in two days. You have the data and a clear brief — find the students who need help before the term ends, not after.
The first question is always about the shape of the data. Before looking at individual students, we need to understand which subjects are performing well across the cohort and which are struggling — this sets the context for everything that follows.
import pandas as pd
import numpy as np
# Build the Greenfield Academy student dataset
df = pd.DataFrame({
"student_id": ["S001","S002","S003","S004","S005","S006",
"S007","S008","S009","S010","S011","S012"],
"maths": [78, 45, 91, 62, 33, 70, 55, 88, 41, 66, 29, 83],
"english": [82, 51, 88, 47, 41, 74, 63, 79, 55, 71, 38, 90],
"science": [74, 38, 93, 55, 29, 68, 49, 85, 44, 60, 32, 87],
"history": [69, 42, 85, 58, 37, 65, 52, 81, 48, 63, 35, 78],
"study_hrs_week": [12, 5, 18, 8, 3, 11, 7, 16, 6, 10, 2, 15],
"attendance_pct": [94, 71, 98, 82, 65, 91, 78, 96, 72, 88, 60, 95]
})
# Check shape and missing values
print("Shape:", df.shape)
print("Missing values:", df.isnull().sum().sum())
# Subject-level distribution — mean, median, std, min, max
subjects = ["maths","english","science","history"]
subject_stats = df[subjects].agg(["mean","median","std","min","max"]).round(1)
print("\nSubject distribution summary:")
print(subject_stats.to_string())
# Which subject has the highest mean? Which is most variable?
best_mean = subject_stats.loc["mean"].idxmax()
most_variable = subject_stats.loc["std"].idxmax()
print(f"\nHighest mean score: {best_mean} ({subject_stats.loc['mean', best_mean]})")
print(f"Most variable subject: {most_variable} (std = {subject_stats.loc['std', most_variable]})")
Shape: (12, 7)
Missing values: 0
Subject distribution summary:
maths english science history
mean     61.8     64.9     59.5     59.4
median   64.0     67.0     57.5     60.5
std      21.3     18.3     22.0     17.0
min      29.0     38.0     29.0     35.0
max      91.0     90.0     93.0     85.0
Highest mean score: english (64.9)
Most variable subject: science (std = 22.0)
What just happened?
Method: .agg() with a list of functions
Passing a list to .agg() (here ["mean","median","std","min","max"]) applies all five functions to every selected column at once. The result is a DataFrame with the functions as rows and the original columns as columns, which reads naturally as a summary table. We then use .loc["mean"].idxmax() to extract which column has the highest value in the mean row, a clean pattern for finding the best-performing category.
English has the highest mean score at 64.9: students perform best in this subject on average. Science is the most variable at std = 22.0, a spread of more than a third of its mean. That spread tells the head of year that science is not a uniform problem; some students are excelling while others are failing badly. A whole-class intervention would be ineffective: targeted support for specific students is the right response.
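To see the .agg(list) + idxmax pattern in isolation, here is a minimal sketch on a toy two-column frame (the scores are hypothetical, not the Greenfield data):

```python
import pandas as pd

# Toy frame to isolate the .agg(list-of-functions) + idxmax pattern
scores = pd.DataFrame({"maths": [60, 70, 80], "english": [55, 90, 95]})
stats = scores.agg(["mean", "std"])     # functions become rows
print(stats.loc["mean"].idxmax())       # column with the highest mean
```

The same .loc[row].idxmax() idiom works for any summary row, e.g. .loc["std"].idxmax() to find the most variable column.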
The head of year needs a single ranked list. We compute each student's mean score across all four subjects, sort descending, and flag students below the pass threshold. This answers business questions 2 and 5.
# Compute composite score — mean across all four subject scores per student
# axis=1 tells pandas to average across columns (per row) rather than down rows (per column)
df["composite"] = df[subjects].mean(axis=1).round(1)
# Count subjects each student failed (below 50)
# (df[subjects] < 50) creates a boolean DataFrame — True where score is below 50
# .sum(axis=1) counts the Trues per row — number of failed subjects per student
df["subjects_failed"] = (df[subjects] < 50).sum(axis=1)
# Sort by composite score descending — best performers first
cohort_rank = df[["student_id","composite","subjects_failed",
"study_hrs_week","attendance_pct"]].sort_values(
"composite", ascending=False
).reset_index(drop=True)
cohort_rank.index += 1 # rank starts at 1
print("Full cohort ranking by composite score:")
print(cohort_rank.to_string())
# Bottom 3 students — the head of year's immediate priority list
bottom3 = cohort_rank.tail(3)
print("\nBottom 3 students (highest priority for intervention):")
print(bottom3.to_string())
Full cohort ranking by composite score:
student_id composite subjects_failed study_hrs_week attendance_pct
1 S003 89.2 0 18 98
2 S012 84.5 0 15 95
3 S008 83.2 0 16 96
4 S001 75.8 0 12 94
5        S006       69.2                0              11              91
6        S010       65.0                0              10              88
7        S004       55.5                1               8              82
8        S007       54.8                1               7              78
9        S009       47.0                3               6              72
10       S002       44.0                3               5              71
11       S005       35.0                4               3              65
12       S011       33.5                4               2              60
Bottom 3 students (highest priority for intervention):
student_id composite subjects_failed study_hrs_week attendance_pct
10 S002 44.0 3 5 71
11 S005 35.0 4 3 65
12       S011       33.5                4               2              60
What just happened?
Method: .mean(axis=1) for row-wise aggregation
By default, pandas aggregations like .mean() operate down columns (axis=0). Passing axis=1 changes direction: the aggregation runs across columns, computing one value per row. This gives us each student's mean score across all four subjects in a single line. The same axis=1 logic applies to .sum(), .max(), and other aggregations whenever you need a per-row result.
(df[subjects] < 50) produces a boolean DataFrame — True wherever a score is below 50. Chaining .sum(axis=1) then counts the Trues per row, giving us the number of subjects each student is failing. This two-step pattern — boolean mask then row-wise sum — is the standard approach for any "how many conditions are met per row" question.
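The same boolean mask answers both the per-student and per-subject questions depending on the axis. A minimal sketch on a hypothetical three-student frame:

```python
import pandas as pd

# Toy frame (hypothetical scores) showing the mask-then-sum pattern both ways
toy = pd.DataFrame({"maths": [78, 45, 33], "english": [82, 51, 41]})
mask = toy < 50                  # True where a score is below the pass mark
per_student = mask.sum(axis=1)   # failed subjects per row (per student)
per_subject = mask.sum()         # failing students per column (axis=0 default)
print(per_student.tolist())      # [0, 1, 2]
print(per_subject.tolist())      # [2, 1]
```

Because True counts as 1 under summation, no explicit conversion is needed before .sum().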
The ranking tells an immediate story. The top three students — S003, S012, S008 — all study 15–18 hours per week and attend 95–98% of lessons. The bottom three — S002, S005, S011 — study 2–5 hours and attend only 60–71% of lessons. S005 and S011 are failing all four subjects. These are not borderline cases — they are students in serious academic difficulty who need structured intervention before the end of term.
The head of year's hypothesis is that students who study more perform better. We test this formally with Pearson correlation — and check whether attendance is an even stronger predictor than study time.
# Compute Pearson correlation between composite score, study hours, and attendance
corr = df[["composite","study_hrs_week","attendance_pct"]].corr().round(3)
print("Correlation matrix:")
print(corr.to_string())
# Extract the key correlations
study_corr = corr.loc["composite","study_hrs_week"]
attendance_corr = corr.loc["composite","attendance_pct"]
print(f"\nStudy hours vs composite: r = {study_corr:.3f}")
print(f"Attendance vs composite: r = {attendance_corr:.3f}")
# Which is the stronger predictor?
stronger = "study hours" if abs(study_corr) > abs(attendance_corr) else "attendance"
print(f"Stronger predictor: {stronger}")
# Subject-level correlations with study hours
print("\nStudy hours correlation per subject:")
for subj in subjects:
r = df["study_hrs_week"].corr(df[subj]).round(3)
print(f" {subj:<10} r = {r:.3f}")
Correlation matrix:
composite study_hrs_week attendance_pct
composite 1.000 0.982 0.971
study_hrs_week 0.982 1.000 0.988
attendance_pct 0.971 0.988 1.000
Study hours vs composite: r = 0.982
Attendance vs composite: r = 0.971
Stronger predictor: study hours
Study hours correlation per subject:
maths r = 0.979
english r = 0.974
science r = 0.983
  history    r = 0.971
What just happened?
Method: column-level .corr() extraction
Rather than printing the full matrix every time, we extracted specific values using .loc["row","col"]. This lets us use the correlation values programmatically: comparing them, printing interpretations, or using them in conditional logic. The abs() call is necessary when comparing correlation strengths because a correlation of -0.95 is just as strong as +0.95; we care about magnitude, not direction.
Calling series1.corr(series2) directly on two Series computes the Pearson correlation between just those two columns. In the loop, we compute the study-hours correlation independently for each subject — which is cleaner than extracting from the full matrix when you only need one variable's correlations against several others.
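As a sanity check on the pairwise approach, Series.corr agrees with NumPy's correlation matrix. A minimal sketch with two short hypothetical Series:

```python
import pandas as pd
import numpy as np

# Two short hypothetical Series to show Series.corr in isolation
hours = pd.Series([2, 4, 6, 8])
score = pd.Series([40, 50, 65, 80])
r = hours.corr(score)                    # Pearson by default
r_np = np.corrcoef(hours, score)[0, 1]   # the same statistic via NumPy
print(round(r, 3))
```

Series.corr also accepts method="spearman" or method="kendall" when the relationship is monotonic but not linear.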
The correlation between study hours and composite score is r = 0.982 — one of the strongest linear relationships you will see in real educational data. Study time is an exceptionally strong predictor of performance in this cohort, consistent across all four subjects (all above r = 0.97). This directly answers the head of year's question: increasing study time is a credible intervention. The structured tutoring programme should prioritise getting at-risk students to increase their weekly study hours alongside attending more lessons.
Now we produce the two lists the head of year specifically requested: the subject fail rate table and the at-risk student flag list. This is the deliverable that goes directly into the meeting.
# Subject fail rate — proportion of students scoring below 50 per subject
fail_threshold = 50
print("Subject fail rates (% of students below 50):")
for subj in subjects:
fail_count = (df[subj] < fail_threshold).sum()
fail_rate = fail_count / len(df) * 100
print(f" {subj:<10} {fail_count} students failed ({fail_rate:.0f}%)")
# Highest fail rate subject
fail_rates = {s: (df[s] < fail_threshold).mean() for s in subjects}
worst_subject = max(fail_rates, key=fail_rates.get)
print(f"\nHighest fail rate: {worst_subject} at {fail_rates[worst_subject]:.0%}")
# At-risk student flag: failing 2 or more subjects
at_risk = df[df["subjects_failed"] >= 2][
["student_id","maths","english","science","history",
"composite","subjects_failed","study_hrs_week","attendance_pct"]
].sort_values("composite")
print(f"\nAt-risk students (failing 2+ subjects): {len(at_risk)} identified")
print(at_risk.to_string(index=False))
Subject fail rates (% of students below 50):
  maths      4 students failed (33%)
  english    3 students failed (25%)
  science    5 students failed (42%)
  history    4 students failed (33%)
Highest fail rate: science at 42%
At-risk students (failing 2+ subjects): 4 identified
student_id  maths  english  science  history  composite  subjects_failed  study_hrs_week  attendance_pct
      S011     29       38       32       35       33.5                4               2              60
      S005     33       41       29       37       35.0                4               3              65
      S002     45       51       38       42       44.0                3               5              71
      S009     41       55       44       48       47.0                3               6              72
What just happened?
Method: dictionary comprehension for per-subject metrics
The expression {s: (df[s] < 50).mean() for s in subjects} builds a dictionary where each key is a subject name and each value is the proportion of students failing that subject. This is more concise than running the same calculation four times manually. We then use Python's built-in max(dict, key=dict.get) to find the key with the highest value, a clean pattern for finding the "worst" category across a computed dictionary.
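The max(dict, key=dict.get) idiom in isolation, with toy fail rates rather than the computed ones:

```python
# Minimal sketch of max(dict, key=dict.get): toy fail rates, not real figures
fail_rates = {"maths": 0.33, "english": 0.25, "science": 0.42, "history": 0.33}
worst = max(fail_rates, key=fail_rates.get)
print(worst)  # science
```

The key argument tells max() to compare the dictionary's values while returning the key; min(fail_rates, key=fail_rates.get) would return the best-performing subject instead.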
Science has the highest fail rate at 42%, with maths and history at 33% each. Taken together, the maths and science numbers suggest a STEM competency gap in this cohort that goes beyond individual student effort. Four students are flagged as at-risk: S011, S005, S002, and S009. All four study six hours per week or fewer and attend no more than 72% of lessons. S011 is the most critical case: failing all four subjects, studying only 2 hours per week, with 60% attendance. This student needs an immediate pastoral meeting, not just tutoring.
Checkpoint: Add an extra column to the at-risk table: df["lessons_missed"] = (100 - df["attendance_pct"]) / 100 * 190 (assuming 190 school days per year). Now you can tell the head of year how many lessons each at-risk student has missed in absolute terms, a number that lands far harder in a parent meeting than a percentage.
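A possible sketch of that checkpoint, run here on a small stand-in frame with the at-risk students' attendance figures (assuming, as above, a 190-day school year):

```python
import pandas as pd

# Sketch of the checkpoint: absolute lessons missed, assuming 190 school days
df = pd.DataFrame({
    "student_id": ["S011", "S005", "S002"],
    "attendance_pct": [60, 65, 71],
})
df["lessons_missed"] = ((100 - df["attendance_pct"]) / 100 * 190).round(1)
print(df.to_string(index=False))
```

S011's 60% attendance translates to roughly 76 missed lessons, a far more concrete figure for a parent meeting.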
Key Findings
English has the highest mean score at 64.9 and science is the most variable (std = 22.0). Science scores are polarised: top students score 85 and above while the weakest score in the low 30s. It is not a whole-class problem but a specific cohort problem.
Study hours correlate with composite score at r = 0.982 — one of the strongest relationships in the dataset. The intervention implication is clear: structured study time is the single most impactful lever available to the school for improving at-risk student outcomes.
Science has the highest fail rate at 42%, with maths and history at 33% each. Roughly one in three students is below the pass threshold in each of these subjects. A subject-specific STEM intervention may be warranted in addition to individual student support.
Four students are at risk (S011, S005, S002, and S009), all failing two or more subjects with study hours of 6 or below and attendance at or below 72%. S011 is the most acute case: all four subjects failed, 2 hours of weekly study, 60% attendance.
The top three students — S003, S012, S008 — study 15–18 hours weekly and attend 95–98% of lessons. The gap between top and bottom is entirely explained by study habits and attendance — not raw ability, which has significant implications for the school's pastoral strategy.
Visualisations
[Figure: scatter of weekly study hours against composite score. Study hours almost perfectly predict score rank in this cohort.]
Education EDA Decision Guide
Student performance datasets always present the same core analytical choices. Here is the framework for structuring any school or university results analysis:
| Question | Metric | pandas Method | Watch Out For |
|---|---|---|---|
| Which subject is hardest? | Fail rate and mean score | (df[col] < 50).mean() | Low mean ≠ high variance — check both |
| Who is at risk? | Subjects failed count | (df[cols] < 50).sum(axis=1) | Threshold choice is a policy decision |
| Overall performance rank? | Composite score | df[cols].mean(axis=1) | axis=1 for row-wise, not column-wise |
| Does study time matter? | Pearson correlation | series.corr(series) | Correlation ≠ causation — study hours may proxy for motivation |
| Is the issue subject or student? | Cross-tabulate subject fails | boolean mask + .sum() | One failing student in a small cohort skews the whole subject rate |
What Would Come Next?
A senior analyst would build a subject correlation matrix to find which subjects share struggling students, a regression model predicting composite score from study hours and attendance, and a longitudinal comparison against prior term results to identify whether at-risk students are improving or declining.
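A hedged sketch of the first two follow-ups, run on synthetic stand-in data since we only have one term of real results; the real version would use the Greenfield DataFrame and prior-term scores:

```python
import pandas as pd
import numpy as np

# Synthetic stand-in cohort: study hours drive attendance and composite score
rng = np.random.default_rng(42)
study = rng.uniform(2, 18, 40)
attendance = np.clip(55 + 2.4 * study + rng.normal(0, 3, 40), 0, 100)
composite = 20 + 3.0 * study + 0.1 * attendance + rng.normal(0, 5, 40)
cohort = pd.DataFrame({"study_hrs_week": study,
                       "attendance_pct": attendance,
                       "composite": composite})

# 1) Correlation matrix across predictors and outcome
print(cohort.corr().round(2))

# 2) Simple linear fit of composite on study hours (np.polyfit, degree 1)
slope, intercept = np.polyfit(cohort["study_hrs_week"], cohort["composite"], 1)
print(f"Estimated gain per extra weekly study hour: {slope:.1f} points")
```

With real data the slope gives the head of year a concrete number ("each extra weekly study hour is associated with roughly N more composite points"), though it remains correlational, not causal.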
Limitations of This Analysis
Twelve students with a near-perfect study-hours correlation (r = 0.982) is unrealistic at scale — socioeconomic factors, learning differences, and teaching quality introduce much more noise in real cohorts. Study hours are also self-reported and likely overestimated.
Business Decisions This Could Drive
Schedule pastoral meetings for S011 and S005 before end of term, design a structured 8-week tutoring programme targeting at-risk students, and commission a curriculum review for maths and science given their fail rates of 33% and 42%.
Practice Questions
1. Which subject had the highest standard deviation in the Greenfield Academy dataset — meaning the most polarised scores across the cohort?
2. When computing a mean score across multiple columns per row in pandas, which parameter must you pass to .mean()?
3. What was the Pearson correlation between weekly study hours and composite score in this dataset?
Quiz
1. What does df[["maths","english","science","history"]].mean(axis=1) produce?
2. What is the correct way to count how many subjects each student is failing (below 50) in a single pandas operation?
3. Which student requires the most urgent pastoral intervention and why?
Up Next · Case Study 7
Analysing Bank Loan Approval Data
You are handed a loan application dataset. What income and credit score thresholds predict approval? Which applicant segments are most likely to be rejected — and does employment type play a role?