NumPy Lesson 29 – NumPy in ML | Dataplexa

NumPy in Machine Learning

NumPy is the foundation of almost every machine learning library in Python. Before models are trained, evaluated, or deployed, data is represented and processed using NumPy arrays.

In this lesson, you will learn how NumPy fits into machine learning workflows and why it is essential before using libraries like scikit-learn, TensorFlow, or PyTorch.


Why NumPy Is Critical for Machine Learning

Machine learning algorithms work with numerical data. NumPy provides:

  • Fast numerical computation
  • Efficient memory usage
  • Vectorized operations
  • Linear algebra support

Almost every ML dataset eventually becomes a NumPy array.
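
The payoff of these properties is easy to see in practice. A minimal sketch (not from the lesson) comparing a plain Python loop with the equivalent vectorized operation:

```python
import numpy as np

values = np.arange(10_000, dtype=np.float64)

# Loop version: square each element in Python, one at a time
squared_loop = np.array([v ** 2 for v in values])

# Vectorized version: one expression, executed in optimized C
squared_vec = values ** 2

print(np.array_equal(squared_loop, squared_vec))  # True
```

Both produce identical results, but the vectorized version avoids Python-level iteration entirely, which is why ML code is written this way.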


Representing Features and Labels

In machine learning:

  • Features (X) represent input variables
  • Labels (y) represent output or target values

import numpy as np

# Features: hours studied and hours slept
X = np.array([
    [5, 7],
    [3, 6],
    [8, 8],
    [2, 5]
])

# Labels: exam score
y = np.array([75, 60, 90, 55])

print(X)
print(y)

Here, each row represents one student.
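
A sanity check worth running on any feature matrix: the number of rows in X must match the length of y, since each sample needs exactly one label. Using the arrays above:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]])
y = np.array([75, 60, 90, 55])

# Each row of X is one sample (student); each column is one feature
print(X.shape)  # (4, 2) -> 4 samples, 2 features
print(y.shape)  # (4,)   -> one label per sample
```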


Vectorized Computation in ML

Machine learning relies heavily on vectorized operations instead of loops.

# Increase all feature values by 10%
X_scaled = X * 1.1
print(X_scaled)

This operation is applied to the entire dataset at once, making it fast and efficient.


Feature Normalization

Feature scaling is critical in ML so that features measured on different scales contribute comparably during training, rather than the largest-valued feature dominating.

mean = X.mean(axis=0)
std = X.std(axis=0)

X_normalized = (X - mean) / std
print(X_normalized)

This standardization process is widely used before training models.
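
To confirm the standardization worked, you can check that each column of the result now has (approximately) zero mean and unit standard deviation:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]], dtype=float)
X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)

# After standardization, each column has mean ~0 and std ~1
print(np.allclose(X_normalized.mean(axis=0), 0))  # True
print(np.allclose(X_normalized.std(axis=0), 1))   # True
```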


Matrix Multiplication in Models

Many ML models rely on matrix multiplication.

Example: simple linear regression prediction

weights = np.array([4, 3])
bias = 10

prediction = np.dot(X, weights) + bias
print(prediction)

This computation forms the backbone of many ML algorithms.
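
It is worth verifying one prediction by hand to see exactly what np.dot is doing. For the first row of X ([5, 7]) with weights [4, 3] and bias 10, the prediction is 5·4 + 7·3 + 10 = 51:

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]])
weights = np.array([4, 3])
bias = 10

predictions = np.dot(X, weights) + bias

# First row by hand: 5*4 + 7*3 + 10 = 51
print(predictions)  # [51 40 66 33]
```

Each element of the output is the weighted sum of one row's features plus the bias, which is exactly a linear model's prediction.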


Loss Calculation Using NumPy

Machine learning models learn by minimizing a loss function.

Example: Mean Squared Error (MSE)

predicted = np.array([70, 65, 85, 60])
actual = y

mse = np.mean((predicted - actual) ** 2)
print(mse)

Lower loss indicates better model performance.


Gradient Concept with NumPy

Gradients tell models how to update weights during training.

# Gradient of the MSE with respect to the weights (up to a constant factor of 2)
errors = predicted - actual
gradient = np.dot(X.T, errors) / len(X)
print(gradient)

This is the mathematical foundation of gradient descent.
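
The pieces above can be combined into a full, if minimal, gradient descent loop. This is a sketch rather than a production trainer: the learning rate and iteration count are arbitrary choices, and the features are standardized first so that a single learning rate works for both columns.

```python
import numpy as np

X = np.array([[5, 7], [3, 6], [8, 8], [2, 5]], dtype=float)
y = np.array([75, 60, 90, 55], dtype=float)

# Standardize features, as in the normalization section
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

weights = np.zeros(2)
bias = 0.0
lr = 0.1  # learning rate: an arbitrary choice for this sketch

for _ in range(2000):
    predicted = Xn @ weights + bias
    errors = predicted - y
    weights -= lr * (Xn.T @ errors) / len(Xn)  # gradient step for weights
    bias -= lr * errors.mean()                 # gradient step for bias

mse = np.mean((Xn @ weights + bias - y) ** 2)
print(mse)
```

Running this drives the MSE down from the initial loss toward the least-squares minimum, which is the whole idea of gradient descent.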


NumPy vs Machine Learning Libraries

NumPy does not train models directly, but:

  • scikit-learn uses NumPy arrays internally
  • TensorFlow tensors are NumPy-compatible
  • PyTorch tensors can convert to NumPy

Understanding NumPy makes learning ML libraries much easier.


Real-World ML Workflow with NumPy

  1. Load data
  2. Clean and preprocess
  3. Convert to NumPy arrays
  4. Normalize features
  5. Train ML models
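
The steps above can be sketched end to end. This example uses a small synthetic dataset in place of a real file, and np.linalg.lstsq as a stand-in for the training step:

```python
import numpy as np

# 1. Load data (synthetic rows here; a real script might use np.loadtxt)
raw = [["5", "7", "75"], ["3", "6", "60"],
       ["8", "8", "90"], ["2", "5", "55"]]

# 2. Clean and preprocess: convert the string fields to numbers
cleaned = [[float(value) for value in row] for row in raw]

# 3. Convert to NumPy arrays and split into features and labels
data = np.array(cleaned)
X, y = data[:, :2], data[:, 2]

# 4. Normalize features
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 5. Train a model (least squares here as a stand-in for a real trainer)
Xb = np.column_stack([X, np.ones(len(X))])  # append a bias column
params, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(params)
```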

Practice Exercise

Task

  • Create a feature matrix with 5 rows and 2 columns
  • Normalize the features
  • Apply a weight vector
  • Compute predictions using dot product
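
One possible solution (the specific feature values and weights are arbitrary choices):

```python
import numpy as np

# Feature matrix: 5 rows (samples), 2 columns (features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0],
              [9.0, 10.0]])

# Normalize each column to zero mean and unit standard deviation
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Apply an arbitrary weight vector via dot product
weights = np.array([2.0, 0.5])
predictions = X_norm @ weights

print(predictions)
```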

What’s Next?

In the final lesson, you will apply everything you learned by building a complete NumPy Project using real numerical workflows.