AI Lesson 57 – Model Evaluation Metrics | Dataplexa

Model Evaluation Metrics

Building a machine learning model is only half the job. The real question is: how good is the model? Model Evaluation Metrics help us measure how well a model performs, how reliable its predictions are, and whether it can be trusted in real-world scenarios.

Different problems require different evaluation metrics. A model that performs well in one metric may perform poorly in another. Understanding these metrics is essential to building successful AI systems.

Real-World Connection

Imagine a medical test that detects a disease. If it predicts “healthy” for everyone, accuracy may look high, but the test is useless. Evaluation metrics help us understand whether predictions are truly meaningful, not just mathematically correct.

Why Model Evaluation Is Important

  • Measures model performance objectively
  • Helps compare different models
  • Identifies overfitting and underfitting
  • Guides model improvement

Evaluation Metrics for Classification Models

Classification models predict categories such as spam/not spam or fraud/not fraud. The most common metrics are Accuracy, Precision, Recall, and F1-score.

Accuracy

Accuracy measures how many predictions were correct out of all predictions.

Accuracy is useful when classes are balanced, but it can be misleading when one class dominates the dataset.

Accuracy Example (Python)


from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]  # ground-truth labels (1 = positive class)
y_pred = [1, 0, 1, 0, 0]  # model predictions

accuracy = accuracy_score(y_true, y_pred)
print(accuracy)
  
0.8

Understanding the Output

The model correctly predicted 80% of the outcomes. However, accuracy alone does not tell us what kinds of mistakes were made.
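To see how misleading this can be, here is a minimal sketch (with hypothetical labels) of the medical test from the Real-World Connection: nine of ten patients are healthy, and a useless model predicts "healthy" for everyone.

Imbalanced Accuracy Example (Python)


from sklearn.metrics import accuracy_score, recall_score

# Hypothetical screening data: 1 = sick, 0 = healthy
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A useless model that predicts "healthy" for every patient
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))

Accuracy: 0.9
Recall: 0.0

Despite 90% accuracy, the sick patient is never detected. The metrics below are designed to expose exactly this kind of failure.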

Precision

Precision measures how many predicted positive cases were actually positive: Precision = TP / (TP + FP), where TP is true positives and FP is false positives. It is important when false positives are costly.

For example, in spam detection, marking a legitimate email as spam is undesirable.

Recall

Recall measures how many actual positive cases were correctly identified: Recall = TP / (TP + FN), where FN is false negatives. It is important when missing positive cases is dangerous.

For example, in disease detection, failing to detect a sick patient can be critical.

Precision and Recall Example


from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0]  # same labels as in the accuracy example
y_pred = [1, 0, 1, 0, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print("Precision:", precision)
print("Recall:", recall)
  
Precision: 1.0
Recall: 0.6666666666666666

Understanding Precision and Recall

The model was correct every time it predicted a positive outcome (precision = 1.0), but it found only two of the three actual positive cases (recall ≈ 0.67).

F1 Score

The F1 score balances precision and recall into a single value using their harmonic mean: F1 = 2 × (Precision × Recall) / (Precision + Recall). It is useful when both false positives and false negatives matter.

F1 Score Example


from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

f1 = f1_score(y_true, y_pred)
print(f1)
  
0.8
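The same value can be reproduced by hand from the precision and recall computed earlier, which makes the harmonic-mean formula concrete. A quick sketch:

Manual F1 Check (Python)


# Harmonic mean of the earlier precision (1.0) and recall (2/3)
precision = 1.0
recall = 2 / 3

f1_manual = 2 * (precision * recall) / (precision + recall)
print(round(f1_manual, 4))

0.8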

Confusion Matrix

A confusion matrix shows a detailed breakdown of correct and incorrect predictions: each row corresponds to an actual class and each column to a predicted class. It helps visualize where the model is making mistakes.

Confusion Matrix Example


from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)
  
[[2 0]
 [1 2]]
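For binary problems, the four cells can be unpacked into named counts, which ties the matrix back to the precision and recall formulas. A short sketch reusing the same labels:

Confusion Matrix Breakdown (Python)


from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

# ravel() flattens the 2x2 matrix into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
print("Precision:", tp / (tp + fp))
print("Recall:", tp / (tp + fn))

TN: 2 FP: 0 FN: 1 TP: 2
Precision: 1.0
Recall: 0.6666666666666666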

Evaluation Metrics for Regression Models

Regression models predict continuous values such as prices or temperatures. Common metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared.

MAE, MSE, and R-squared

MAE measures the average magnitude of errors without considering their direction, and it is easy to interpret because it is expressed in the same units as the target. MSE squares each error before averaging, so it penalizes large mistakes more heavily. R-squared measures how much of the variance in the target the model explains, where 1.0 indicates a perfect fit.

Regression Metrics Example


from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [100, 200, 300]
y_pred = [110, 190, 310]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("R2:", r2_score(y_true, y_pred))
  
MAE: 10.0
MSE: 100.0
R2: 0.985
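These numbers follow directly from the definitions above. A minimal sketch computing them by hand for the same data:

Manual Regression Metrics (Python)


# Errors for y_true = [100, 200, 300] and y_pred = [110, 190, 310]
errors = [110 - 100, 190 - 200, 310 - 300]   # [10, -10, 10]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)   # mean squared error

# R2 = 1 - (residual sum of squares / total variance around the mean)
ss_res = sum(e ** 2 for e in errors)                        # 300
mean_y = sum([100, 200, 300]) / 3                           # 200.0
ss_tot = sum((y - mean_y) ** 2 for y in [100, 200, 300])    # 20000.0
r2 = 1 - ss_res / ss_tot

print("MAE:", mae)
print("MSE:", mse)
print("R2:", r2)

MAE: 10.0
MSE: 100.0
R2: 0.985

An R2 of 0.985 means the model explains about 98.5% of the variance in the target values.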

Practice Questions

Practice 1: Which metric measures overall correctness?

Practice 2: Which metric focuses on false positives?

Practice 3: Which regression metric measures average absolute error?

Quick Quiz

Quiz 1: Which metric balances precision and recall?

Quiz 2: Which tool shows detailed prediction results?

Quiz 3: MAE and MSE are used for which models?

Coming up next: Overfitting and Underfitting — understanding why models fail and how to fix them.