AI Course
Model Evaluation Metrics
Building a machine learning model is only half the job. The real question is: how good is the model? Model Evaluation Metrics help us measure how well a model performs, how reliable its predictions are, and whether it can be trusted in real-world scenarios.
Different problems require different evaluation metrics. A model that performs well on one metric may perform poorly on another. Understanding these metrics is essential to building successful AI systems.
Real-World Connection
Imagine a medical test that detects a disease. If it predicts “healthy” for everyone, accuracy may look high, but the test is useless. Evaluation metrics help us understand whether predictions are truly meaningful, not just mathematically correct.
Why Model Evaluation Is Important
- Measures model performance objectively
- Helps compare different models
- Identifies overfitting and underfitting
- Guides model improvement
Evaluation Metrics for Classification Models
Classification models predict categories such as spam/not spam or fraud/not fraud. The most common metrics are Accuracy, Precision, Recall, and F1-score.
Accuracy
Accuracy measures how many predictions were correct out of all predictions.
Accuracy is useful when classes are balanced, but it can be misleading when one class dominates the dataset.
Accuracy Example (Python)
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0]   # model predictions
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)   # 0.8 -> 4 of 5 predictions are correct
Understanding the Output
The model correctly predicted 80% of the outcomes (4 out of 5). However, accuracy alone does not tell us what kinds of mistakes were made.
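Accuracy on Imbalanced Data (Illustration)
To see why accuracy can mislead when one class dominates, here is a minimal sketch echoing the medical-test example above. The 95/5 class split and the variable names are invented purely for illustration: a model that predicts "healthy" for everyone still scores 95% accuracy while detecting no sick patients at all.
from sklearn.metrics import accuracy_score, recall_score
# Hypothetical imbalanced dataset: 95 healthy (0) and 5 sick (1) patients
y_true_imbalanced = [0] * 95 + [1] * 5
# A useless model that predicts "healthy" for every patient
y_pred_imbalanced = [0] * 100
print("Accuracy:", accuracy_score(y_true_imbalanced, y_pred_imbalanced))  # 0.95 -> looks impressive
print("Recall:", recall_score(y_true_imbalanced, y_pred_imbalanced))      # 0.0 -> misses every sick patient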
Precision
Precision measures how many predicted positive cases were actually positive. It is important when false positives are costly.
For example, in spam detection, marking a legitimate email as spam is undesirable.
Recall
Recall measures how many actual positive cases were correctly identified. It is important when missing positive cases is dangerous.
For example, in disease detection, failing to detect a sick patient can be critical.
Precision and Recall Example
from sklearn.metrics import precision_score, recall_score
# y_true and y_pred are the same lists used in the accuracy example
precision = precision_score(y_true, y_pred)   # 1.0 -> no false positives
recall = recall_score(y_true, y_pred)         # ~0.67 -> one positive case was missed
print("Precision:", precision)
print("Recall:", recall)
Understanding Precision and Recall
With the example data, precision is 1.0 (every case predicted as positive really was positive), but recall is only about 0.67 because the model missed one of the three actual positive cases.
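Precision and Recall by Hand
As a sanity check, precision and recall can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below reuses the same five labels as the earlier examples and derives the counts by simple comparison.
# Precision = TP / (TP + FP), Recall = TP / (TP + FN)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly predicted positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # negatives predicted as positive
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # positives predicted as negative
print("Precision:", tp / (tp + fp))  # 2 / 2 = 1.0
print("Recall:", tp / (tp + fn))     # 2 / 3 ≈ 0.67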
F1 Score
The F1 score combines precision and recall into a single value by taking their harmonic mean. It is useful when both false positives and false negatives matter.
F1 Score Example
from sklearn.metrics import f1_score
f1 = f1_score(y_true, y_pred)   # harmonic mean of precision (1.0) and recall (~0.67)
print(f1)                       # 0.8
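F1 Score by Hand
To show how f1_score combines the two metrics, the short sketch below recomputes the same value as the harmonic mean of the precision (1.0) and recall (2/3) found in the previous example.
precision = 1.0   # from the precision and recall example above
recall = 2 / 3    # two of the three actual positive cases were found
f1_manual = 2 * (precision * recall) / (precision + recall)
print(f1_manual)  # 0.8 -> matches f1_score(y_true, y_pred)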
Confusion Matrix
A confusion matrix shows a detailed breakdown of correct and incorrect predictions. It helps visualize where the model is making mistakes.
Confusion Matrix Example
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_true, y_pred)   # rows = actual class, columns = predicted class
print(matrix)
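Reading the Confusion Matrix
For a binary problem, scikit-learn orders the matrix with actual classes as rows and predicted classes as columns, so the four cells are TN, FP, FN, and TP. The sketch below unpacks them with ravel() for the same labels used above.
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("True negatives:", tn)    # 2 -> actual 0, predicted 0
print("False positives:", fp)   # 0 -> actual 0, predicted 1
print("False negatives:", fn)   # 1 -> actual 1, predicted 0
print("True positives:", tp)    # 2 -> actual 1, predicted 1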
Evaluation Metrics for Regression Models
Regression models predict continuous values such as prices or temperatures. Common metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared.
Mean Absolute Error (MAE)
MAE measures the average magnitude of the errors without considering their direction. It is easy to interpret because it is expressed in the same units as the target value.
Mean Squared Error (MSE)
MSE averages the squared errors, so it penalizes large mistakes more heavily than small ones.
R-squared
R-squared measures how much of the variation in the target values the model explains, where 1.0 indicates a perfect fit.
Regression Metrics Example
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
y_true = [100, 200, 300]   # actual values
y_pred = [110, 190, 310]   # predicted values
print("MAE:", mean_absolute_error(y_true, y_pred))   # 10.0
print("MSE:", mean_squared_error(y_true, y_pred))    # 100.0
print("R2:", r2_score(y_true, y_pred))               # 0.985
Practice Questions
Practice 1: Which metric measures overall correctness?
Practice 2: Which metric focuses on false positives?
Practice 3: Which regression metric measures average absolute error?
Quick Quiz
Quiz 1: Which metric balances precision and recall?
Quiz 2: Which tool shows detailed prediction results?
Quiz 3: MAE and MSE are used for which models?
Coming up next: Overfitting and Underfitting — understanding why models fail and how to fix them.