Model Evaluation Metrics

Building a machine learning model is not enough. We must measure how good the model is. Model Evaluation Metrics help us understand whether a model is performing well or not.

Without proper evaluation, a model may look accurate but fail badly in real-world situations.

Why Model Evaluation Is Important

Different models behave differently on different datasets. Evaluation metrics help us compare models, tune them, and decide which one is suitable for production.

  • Detects overfitting and underfitting
  • Compares multiple models objectively
  • Helps choose correct algorithms
  • Ensures business reliability

Real-World Example

Suppose a medical AI predicts whether a patient has a disease. If the model is wrong, the impact is serious. Accuracy alone is not enough; we must check precision, recall, and other metrics.
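As a quick, made-up illustration, suppose only 2 of 20 screened patients actually have the disease and the model simply predicts "healthy" for everyone. The numbers below are invented for this sketch:


from sklearn.metrics import accuracy_score, recall_score

# Hypothetical screening data: 1 = has disease, 0 = healthy (2 sick out of 20)
y_true = [1, 1] + [0] * 18
# A model that always predicts "healthy" misses every sick patient
y_pred = [0] * 20

print(accuracy_score(y_true, y_pred))  # 0.9 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0 -- every disease case is missed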

Classification vs Regression Metrics

Evaluation metrics depend on the type of problem:

  • Classification: Predicts categories (Yes/No)
  • Regression: Predicts numerical values

Classification Metrics

Accuracy

Accuracy measures the proportion of predictions that are correct out of all predictions.


from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(accuracy_score(y_true, y_pred))
  
0.8

An accuracy of 0.8 means the model predicted correctly 80% of the time.
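For a quick hand check, accuracy is simply the number of matching predictions divided by the total number of predictions:


# Same labels as above; 4 of the 5 predictions match
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))  # 0.8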

Precision

Precision tells us what fraction of predicted positives are actually positive.


from sklearn.metrics import precision_score

print(precision_score(y_true, y_pred))
  
1.0

High precision means fewer false positives.
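Numerically, precision is true positives divided by all predicted positives. In this example the model predicted positive twice (positions 0 and 2) and both were correct:


# tp = 2 correct positive predictions, fp = 0 wrong positive predictions
tp, fp = 2, 0
print(tp / (tp + fp))  # 1.0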

Recall

Recall measures what fraction of actual positives were correctly identified.


from sklearn.metrics import recall_score

print(recall_score(y_true, y_pred))
  
0.6666666666666666

A recall of about 0.67 means two of the three actual positives were found. High recall means fewer false negatives.
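Numerically, recall is true positives divided by all actual positives. There are three actual positives here, and the model found two of them:


# tp = 2 positives found, fn = 1 positive missed (position 3)
tp, fn = 2, 1
print(tp / (tp + fn))  # 0.666...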

F1 Score

F1 Score balances precision and recall.


from sklearn.metrics import f1_score

print(f1_score(y_true, y_pred))
  
0.8

F1 Score is useful when classes are imbalanced.
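F1 is the harmonic mean of precision and recall, so it can be checked by hand from the two values above:


precision, recall = 1.0, 2 / 3
# Harmonic mean: 2PR / (P + R)
print(2 * precision * recall / (precision + recall))  # 0.8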

Confusion Matrix

A confusion matrix shows how predictions are distributed across classes.


from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_true, y_pred))
  
[[2 0]
 [1 2]]

Rows correspond to actual classes and columns to predicted classes, so this matrix shows true negatives (2), false positives (0), false negatives (1), and true positives (2).
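For a binary problem, the four cells can also be unpacked directly with ravel(); a small sketch using the same labels:


from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp) for labels [0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 2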

Regression Metrics

Mean Absolute Error (MAE)

MAE measures the average absolute difference between predicted and actual values.


from sklearn.metrics import mean_absolute_error

y_true = [100, 150, 200]
y_pred = [110, 140, 190]

print(mean_absolute_error(y_true, y_pred))
  
10.0

Lower MAE means better predictions.
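Each prediction here is off by exactly 10, so the average absolute error is 10; a quick hand check without scikit-learn:


y_true = [100, 150, 200]
y_pred = [110, 140, 190]
# Mean of |actual - predicted| over all samples
print(sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true))  # 10.0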

Mean Squared Error (MSE)

MSE squares each error before averaging, so it penalizes larger errors more heavily.


from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true, y_pred))
  
100.0
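To see why squaring matters, here is a small made-up comparison: two error patterns with the same MAE but very different MSE, because one large error dominates once it is squared:


# Both patterns have a total absolute error of 12 (MAE = 4.0)
errors_even = [4, 4, 4]    # MSE = 16.0
errors_spiky = [1, 1, 10]  # MSE = 34.0 -- the single large error dominates

for errors in (errors_even, errors_spiky):
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    print(mae, mse)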

R-Squared Score

R² measures how much of the variance in the actual values is explained by the model.


from sklearn.metrics import r2_score

print(r2_score(y_true, y_pred))
  
0.94

An R² value close to 1 indicates a strong model.
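As a check on the value above, R² can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares:


y_true = [100, 150, 200]
y_pred = [110, 140, 190]

mean_true = sum(y_true) / len(y_true)                        # 150.0
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # 300
ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # 5000
print(1 - ss_res / ss_tot)  # 0.94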

Choosing the Right Metric

  • Use accuracy only when classes are balanced
  • Use precision when false positives are costly
  • Use recall when false negatives are dangerous
  • Use MAE, MSE, or R² for regression problems
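
When you need several classification metrics at once, scikit-learn's classification_report prints precision, recall, and F1 per class in a single table; a minimal sketch using the lesson's labels:


from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# One call reports precision, recall, F1, and support for each class
print(classification_report(y_true, y_pred))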

Practice Questions

Practice 1: Which metric measures overall correctness?



Practice 2: Which metric focuses on identifying positives?



Practice 3: Which regression metric uses absolute error?



Quick Quiz

Quiz 1: Which metric balances precision and recall?





Quiz 2: Which tool visualizes prediction errors?





Quiz 3: MAE and MSE are used in which type of problem?





Coming up next: Overfitting vs Underfitting — understanding why models fail.