AI Lesson 43 – Model Evaluation Metrics | Dataplexa

Model Evaluation Metrics

Model Evaluation Metrics help us understand how well a machine learning model is performing. Training a model is only half the job. The real question is whether the model is making correct and reliable predictions.

Different problems need different metrics. A model that works well for predicting house prices may need a completely different evaluation approach than a model detecting fraud.

Why Model Evaluation Is Important

Accuracy alone does not always tell the full story. A model can appear accurate but still fail in real-world scenarios.

  • Helps compare multiple models
  • Detects overfitting and underfitting
  • Ensures reliability before deployment
  • Guides model improvement decisions

Real-World Connection

Imagine a medical diagnosis system that predicts whether a patient has a disease. Even a small number of wrong predictions can be dangerous. In such cases, evaluation metrics beyond accuracy become critical.
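The danger is easy to demonstrate with made-up numbers. In the hypothetical screening scenario below, a model that simply predicts "healthy" for everyone still scores 95% accuracy, while its recall exposes that it misses every sick patient:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical screening data: 95 healthy patients (0) and 5 sick patients (1)
y_true = [0] * 95 + [1] * 5

# A useless model that predicts "healthy" for everyone
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- misses every sick patient
```

This is why imbalanced problems are evaluated with precision and recall, both covered below, rather than accuracy alone.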

Classification Metrics

Classification models predict categories. Common examples include spam detection, fraud detection, and disease diagnosis.

Accuracy

Accuracy measures the fraction of all predictions the model got right.


from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(accuracy_score(y_true, y_pred))
  
0.8

The model correctly predicted 80% of the cases.

Precision

Precision measures the fraction of positive predictions that were actually correct.


from sklearn.metrics import precision_score

print(precision_score(y_true, y_pred))
  
1.0

A precision of 1.0 means every positive prediction was correct.

Recall

Recall measures the fraction of actual positives that the model correctly identified.


from sklearn.metrics import recall_score

print(round(recall_score(y_true, y_pred), 2))
  
0.67

The model detected about 67% of all actual positive cases (2 of the 3 actual positives).
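Precision and recall are both computed from the same counts of true and false positives. One way to see those counts for the example above is sklearn's confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[2 0]
#  [1 2]]
```

From these counts: precision = TP / (TP + FP) = 2 / 2 = 1.0, and recall = TP / (TP + FN) = 2 / 3 ≈ 0.67, matching the scores above.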

F1-Score

The F1-Score balances precision and recall into a single value.


from sklearn.metrics import f1_score

print(round(f1_score(y_true, y_pred), 2))
  
0.8

F1-Score is useful when false positives and false negatives both matter.
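Concretely, F1 is the harmonic mean of precision and recall: F1 = 2 × P × R / (P + R). A quick sketch verifies this against sklearn using the same example data:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

p = precision_score(y_true, y_pred)  # 1.0
r = recall_score(y_true, y_pred)     # 2/3

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)

print(round(f1_manual, 2))                 # 0.8
print(round(f1_score(y_true, y_pred), 2))  # 0.8
```

Because the harmonic mean is dragged down by the smaller of the two values, a model cannot achieve a high F1 by excelling at only precision or only recall.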

Regression Metrics

Regression models predict continuous values such as prices, temperatures, or sales.

Mean Absolute Error (MAE)

MAE measures the average absolute difference between predictions and actual values.


from sklearn.metrics import mean_absolute_error

y_true = [100, 150, 200]
y_pred = [110, 140, 190]

print(mean_absolute_error(y_true, y_pred))
  
10.0

On average, predictions are off by 10 units.

Mean Squared Error (MSE)

MSE penalizes larger errors more heavily by squaring them.


from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true, y_pred))
  
100.0

Larger mistakes have a stronger impact on this metric.
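A small illustration with made-up predictions shows the difference: the two prediction sets below have the same total absolute error (and hence the same MAE), but the one that concentrates its error in a single large mistake gets a much worse MSE:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [100, 150, 200]
pred_small = [110, 140, 190]    # three errors of 10 each
pred_outlier = [100, 150, 230]  # one large error of 30

print(mean_absolute_error(y_true, pred_small))    # 10.0
print(mean_absolute_error(y_true, pred_outlier))  # 10.0 -- identical MAE
print(mean_squared_error(y_true, pred_small))     # 100.0
print(mean_squared_error(y_true, pred_outlier))   # 300.0 -- MSE triples
```

If occasional large errors are especially costly in your application, MSE is usually the better choice; if all errors matter equally, MAE is easier to interpret.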

R-Squared (R²)

R² measures the proportion of variance in the target variable that the model explains.


from sklearn.metrics import r2_score

print(r2_score(y_true, y_pred))
  
0.94

An R² value close to 1 indicates a strong model fit.
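The value follows from the definition R² = 1 − SS_res / SS_tot, where SS_res is the squared error of the model and SS_tot is the squared error of always predicting the mean. A from-scratch sketch for the same data confirms sklearn's result:

```python
from sklearn.metrics import r2_score

y_true = [100, 150, 200]
y_pred = [110, 140, 190]

mean_y = sum(y_true) / len(y_true)  # 150.0

# Residual sum of squares: error left over after the model's predictions
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # 300

# Total sum of squares: error of a baseline that always predicts the mean
ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # 5000.0

print(1 - ss_res / ss_tot)       # 0.94
print(r2_score(y_true, y_pred))  # 0.94
```

An R² of 0 means the model is no better than predicting the mean, and negative values mean it is worse.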

Choosing the Right Metric

  • Use accuracy for balanced classification problems
  • Use precision and recall for imbalanced datasets
  • Use MAE or MSE for regression tasks
  • Use R² to understand explained variance

Practice Questions

Practice 1: Which metric measures overall correct predictions?



Practice 2: Which metric measures detected actual positives?



Practice 3: Which regression metric measures average absolute error?



Quick Quiz

Quiz 1: Which metric balances precision and recall?





Quiz 2: Which metric penalizes large errors more?





Quiz 3: Precision and recall are critical for which datasets?





Coming up next: Overfitting and Underfitting — understanding when models learn too much or too little.