Model Evaluation Metrics
Building a machine learning model is not enough; we also have to measure how good the model is. Model evaluation metrics help us understand whether a model is performing well or not.
Without proper evaluation, a model may look accurate but fail badly in real-world situations.
Why Model Evaluation Is Important
Different models behave differently on different datasets. Evaluation metrics help us compare models, tune them, and decide which one is suitable for production. Good evaluation:
- Detects overfitting and underfitting
- Compares multiple models objectively
- Helps choose the right algorithm
- Ensures the model is reliable enough for business use
Real-World Example
Suppose a medical AI predicts whether a patient has a disease. If the model is wrong, the impact is serious. Accuracy alone is not enough; we must check precision, recall, and other metrics.
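A minimal sketch of this pitfall, using made-up labels: if only 5 of 100 patients actually have the disease, a model that predicts "no disease" for everyone still scores 95% accuracy while catching zero sick patients.
from sklearn.metrics import accuracy_score, recall_score

# Made-up data: 5 sick patients (class 1) out of 100
y_true = [1] * 5 + [0] * 95
# A useless model that predicts "no disease" for every patient
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- misses every sick patient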
Classification vs Regression Metrics
Evaluation metrics depend on the type of problem:
- Classification: Predicts categories (Yes/No)
- Regression: Predicts numerical values
Classification Metrics
Accuracy
Accuracy measures how many predictions are correct out of all predictions: Accuracy = correct predictions / total predictions.
from sklearn.metrics import accuracy_score

# 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(accuracy_score(y_true, y_pred))  # 0.8 (4 of 5 predictions correct)
An accuracy of 0.8 means the model predicted correctly 80% of the time.
Precision
Precision tells us how many of the predicted positives are actually positive: Precision = TP / (TP + FP).
from sklearn.metrics import precision_score

print(precision_score(y_true, y_pred))  # 1.0 (both predicted positives are correct)
High precision means fewer false positives.
Recall
Recall measures how many of the actual positives were correctly identified: Recall = TP / (TP + FN).
from sklearn.metrics import recall_score

print(recall_score(y_true, y_pred))  # 0.666... (2 of the 3 actual positives found)
High recall means fewer false negatives.
F1 Score
F1 Score balances precision and recall; it is their harmonic mean: F1 = 2 × (Precision × Recall) / (Precision + Recall).
from sklearn.metrics import f1_score

print(f1_score(y_true, y_pred))  # 0.8
F1 Score is useful when classes are imbalanced.
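To see where that 0.8 comes from, here is a quick sketch that recomputes F1 from the precision (1.0) and recall (about 0.667) obtained above:
from sklearn.metrics import precision_score, recall_score

p = precision_score(y_true, y_pred)  # 1.0
r = recall_score(y_true, y_pred)     # 0.666...
print(2 * p * r / (p + r))           # 0.8, matching f1_score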
Confusion Matrix
A confusion matrix shows how predictions are distributed across classes.
from sklearn.metrics import confusion_matrix

# Rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[2 0], [1 2]]
This matrix helps visualize true positives, false positives, true negatives, and false negatives.
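If you need the four counts as separate numbers, one common pattern (shown here on the same data) is to flatten the matrix:
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 2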
Regression Metrics
Mean Absolute Error (MAE)
MAE measures the average absolute difference between predicted and actual values: MAE = mean of |actual - predicted|.
from sklearn.metrics import mean_absolute_error

y_true = [100, 150, 200]
y_pred = [110, 140, 190]

print(mean_absolute_error(y_true, y_pred))  # 10.0 (each prediction is off by 10)
Lower MAE means better predictions.
Mean Squared Error (MSE)
MSE penalizes larger errors more heavily, because each error is squared before averaging.
from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true, y_pred))  # 100.0 (each squared error is 10^2 = 100)
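A small sketch with made-up numbers shows why the squaring matters: a prediction set with one large error can have the same MAE as the set above, but a much larger MSE.
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual   = [100, 150, 200]
bad_pred = [100, 150, 170]  # two perfect predictions, one error of 30

print(mean_absolute_error(actual, bad_pred))  # 10.0 -- same MAE as before
print(mean_squared_error(actual, bad_pred))   # 300.0 -- three times the MSE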
R-Squared Score
R² measures how much of the variance in the actual values is explained by the model: R² = 1 - SS_res / SS_tot.
from sklearn.metrics import r2_score

print(r2_score(y_true, y_pred))  # 0.94
An R² value close to 1 indicates a strong model.
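To show where the 0.94 comes from, here is a sketch that computes R² by hand from the formula, using the same y_true and y_pred:
# R^2 = 1 - SS_res / SS_tot
mean_true = sum(y_true) / len(y_true)                       # 150.0
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # 300
ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # 5000.0
print(1 - ss_res / ss_tot)                                  # 0.94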
Choosing the Right Metric
- Use accuracy only when classes are balanced
- Use precision when false positives are costly
- Use recall when false negatives are dangerous
- Use MAE or MSE for regression problems
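To compare several classification metrics at once, scikit-learn's classification_report prints precision, recall, and F1 for each class in one call; a minimal sketch with the labels from earlier:
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(classification_report(y_true, y_pred))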
Practice Questions
Practice 1: Which metric measures overall correctness?
Practice 2: Which metric focuses on identifying positives?
Practice 3: Which regression metric uses absolute error?
Quick Quiz
Quiz 1: Which metric balances precision and recall?
Quiz 2: Which tool visualizes prediction errors?
Quiz 3: MAE and MSE are used in which type of problem?
Coming up next: Overfitting vs Underfitting — understanding why models fail.