NumPy in Machine Learning
NumPy is the foundation of almost every machine learning library in Python. Before models are trained, evaluated, or deployed, data is represented and processed using NumPy arrays.
In this lesson, you will learn how NumPy fits into machine learning workflows and why it is essential before using libraries like scikit-learn, TensorFlow, or PyTorch.
Why NumPy Is Critical for Machine Learning
Machine learning algorithms work with numerical data. NumPy provides:
- Fast numerical computation
- Efficient memory usage
- Vectorized operations
- Linear algebra support
Almost every ML dataset eventually becomes a NumPy array.
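For example, data that arrives as nested Python lists (such as rows parsed from a CSV file) is typically converted once with `np.asarray` before any modeling; the sample values below are purely illustrative:

```python
import numpy as np

# Rows parsed from some data source (illustrative values)
rows = [[5.0, 7.0], [3.0, 6.0], [8.0, 8.0]]

# Convert once; all later math operates on this array
data = np.asarray(rows, dtype=np.float64)
print(data.shape)  # (3, 2)
print(data.dtype)  # float64
```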
Representing Features and Labels
In machine learning:
- Features (X) represent input variables
- Labels (y) represent output or target values
import numpy as np
# Features: hours studied and hours slept
X = np.array([
    [5, 7],
    [3, 6],
    [8, 8],
    [2, 5]
])
# Labels: exam score
y = np.array([75, 60, 90, 55])
print(X)
print(y)
Here, each row represents one student and each column one feature (hours studied, hours slept).
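A quick sanity check of array shapes is a common first step: in this layout, X has one row per sample and one column per feature, and y has one entry per sample.

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]])
y = np.array([75, 60, 90, 55])

print(X.shape)  # (4, 2): 4 students, 2 features
print(y.shape)  # (4,): one label per student

# Rows of X must align with entries of y
assert X.shape[0] == y.shape[0]
```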
Vectorized Computation in ML
Machine learning relies heavily on vectorized operations instead of loops.
# Increase all feature values by 10%
X_scaled = X * 1.1
print(X_scaled)
This operation is applied to the entire dataset at once, making it fast and efficient.
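To see what vectorization replaces, here is the same 10% scaling written as an explicit loop; both versions produce identical results, but the vectorized form is shorter and much faster on large arrays:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]], dtype=float)

# Loop version: one element at a time
X_loop = X.copy()
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_loop[i, j] *= 1.1

# Vectorized version: whole array at once
X_vec = X * 1.1

print(np.allclose(X_loop, X_vec))  # True
```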
Feature Normalization
Feature scaling is critical in ML to ensure all features contribute equally.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_normalized = (X - mean) / std
print(X_normalized)
This standardization process is widely used before training models.
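After standardization, each column should have approximately zero mean and unit standard deviation, which is easy to verify:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]], dtype=float)
X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)

# Per-column mean ~ 0 and std ~ 1 after standardization
print(np.allclose(X_normalized.mean(axis=0), 0))  # True
print(np.allclose(X_normalized.std(axis=0), 1))   # True
```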
Matrix Multiplication in Models
Many ML models rely on matrix multiplication.
Example: simple linear regression prediction
weights = np.array([4, 3])
bias = 10
prediction = np.dot(X, weights) + bias
print(prediction)
This computation forms the backbone of many ML algorithms.
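For the first student (5 hours studied, 7 hours slept), the prediction unrolls to 4*5 + 3*7 + 10 = 51; np.dot performs that same multiply-and-sum for every row at once.

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]])
weights = np.array([4, 3])
bias = 10

prediction = np.dot(X, weights) + bias
print(prediction)  # [51 40 66 33]

# First row, written out by hand
assert prediction[0] == 4 * 5 + 3 * 7 + 10  # 51
```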
Loss Calculation Using NumPy
Machine learning models learn by minimizing a loss function.
Example: Mean Squared Error (MSE)
predicted = np.array([70, 65, 85, 60])
actual = y
mse = np.mean((predicted - actual) ** 2)
print(mse)
Lower loss indicates better model performance.
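With the values above, every prediction is off by exactly 5 points, so each squared error is 25 and the MSE comes out to exactly 25:

```python
import numpy as np

predicted = np.array([70, 65, 85, 60])
actual = np.array([75, 60, 90, 55])

errors = predicted - actual   # [-5  5 -5  5]
squared = errors ** 2         # [25 25 25 25]
mse = np.mean(squared)
print(mse)  # 25.0
```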
Gradient Concept with NumPy
Gradients tell models how to update weights during training.
errors = predicted - actual
gradient = np.dot(X.T, errors) / len(X)
print(gradient)
Up to a constant factor of 2, this is the gradient of the MSE loss with respect to the weights, and it is the mathematical foundation of gradient descent.
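Putting the pieces together, a minimal gradient-descent loop for linear regression might look like the sketch below. The learning rate and iteration count are illustrative choices, and the features are standardized first so that a single learning rate works for both weights:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]], dtype=float)
y = np.array([75, 60, 90, 55], dtype=float)

# Standardize features so one learning rate suits both columns
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

w = np.zeros(2)   # weights, one per feature
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative)

for _ in range(500):
    predicted = Xn @ w + b
    errors = predicted - y
    # Gradients of MSE with respect to w and b
    grad_w = 2 * Xn.T @ errors / len(y)
    grad_b = 2 * errors.mean()
    w -= lr * grad_w
    b -= lr * grad_b

# Far lower than the 187.5 MSE of always predicting the mean score
mse = np.mean((Xn @ w + b - y) ** 2)
print(round(mse, 2))
```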
NumPy vs Machine Learning Libraries
NumPy does not train models directly, but:
- scikit-learn uses NumPy arrays internally
- TensorFlow tensors are NumPy-compatible
- PyTorch tensors can be converted to and from NumPy arrays
Understanding NumPy makes learning ML libraries much easier.
Real-World ML Workflow with NumPy
- Load data
- Clean and preprocess
- Convert to NumPy arrays
- Normalize features
- Train ML models
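The steps above can be sketched end to end with NumPy alone. The "model" here is the same linear predictor from earlier, with illustrative fixed weights standing in for a trained model:

```python
import numpy as np

# 1. Load data (hard-coded rows standing in for a file)
raw = [[5, 7], [3, 6], [8, 8], [2, 5]]
labels = [75, 60, 90, 55]

# 2-3. Clean/preprocess and convert to NumPy arrays
X = np.asarray(raw, dtype=float)
y = np.asarray(labels, dtype=float)

# 4. Normalize features (standardization)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 5. Apply a model (illustrative fixed weights instead of training)
weights = np.array([10.0, 5.0])
bias = y.mean()
predictions = X @ weights + bias
print(predictions.shape)  # (4,)
```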
Practice Exercise
Task
- Create a feature matrix with 5 rows and 2 columns
- Normalize the features
- Apply a weight vector
- Compute predictions using dot product
What’s Next?
In the final lesson, you will apply everything you learned by building a complete NumPy Project using real numerical workflows.