Python Lesson 43 – NumPy | Dataplexa

NumPy

NumPy is the foundation of scientific computing in Python. Libraries such as pandas and scikit-learn are built directly on top of it, and frameworks such as TensorFlow and PyTorch interoperate with its arrays. Its core contribution is the ndarray: a fixed-type, multi-dimensional array that stores data in a contiguous block of memory and executes operations in compiled C code. For element-wise numerical work, this is often 10 to 100 times faster than an equivalent pure Python loop.

This lesson covers array creation, indexing, vectorised operations, broadcasting, linear algebra, and the performance principles that make NumPy indispensable.

Why NumPy Beats Python Lists for Numerical Work

Python lists are flexible but slow for numerical computation — each element is a full Python object with overhead, and loops in Python are interpreted one step at a time. NumPy arrays store raw numbers in a contiguous block of memory and perform operations in C — no Python loop, no object overhead.

# NumPy vs Python list — speed comparison

import numpy as np
import time

n = 10_000_000

# Python list — slow loop
py_list = list(range(n))
start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python list: {time.perf_counter() - start:.3f}s")

# NumPy array — vectorised C operation
arr = np.arange(n)
start = time.perf_counter()
result = arr * 2
print(f"NumPy array: {time.perf_counter() - start:.3f}s")

# Memory comparison
import sys
print(f"\nPython list size: {sys.getsizeof(py_list):,} bytes")
print(f"NumPy array size: {arr.nbytes:,} bytes")
Python list: 0.412s
NumPy array: 0.008s

Python list size: 89,095,160 bytes
NumPy array size: 80,000,000 bytes
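  • NumPy is roughly 50x faster here (0.412s vs 0.008s); the exact ratio depends on hardware, array size, and the operation
  • NumPy arrays are also more memory-efficient: a Python list stores pointers to separate int objects, not raw numbers. sys.getsizeof(py_list) counts only the list's pointer array, so the list's true footprint is larger than the figure shown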
  • NumPy's speed comes from vectorisation: one C operation applied to the entire array, no Python loop
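
The memory figures above understate the list's cost, because sys.getsizeof on a list does not include the int objects it points to. The sketch below, assuming CPython and a smaller n chosen only to keep it quick, adds the per-element sizes for a fuller estimate. The variable names are illustrative, and the total is an approximation (CPython shares some small int objects).

```python
import sys
import numpy as np

n = 100_000  # smaller than the lesson's 10 million, just to keep this quick

py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# getsizeof(list) covers the list header and its pointer array only
pointer_bytes = sys.getsizeof(py_list)

# Each element is a separate int object with its own overhead
element_bytes = sum(sys.getsizeof(x) for x in py_list)

list_total = pointer_bytes + element_bytes
print(f"List pointers only:     {pointer_bytes:,} bytes")
print(f"List incl. int objects: {list_total:,} bytes")
print(f"NumPy int64 array:      {arr.nbytes:,} bytes")
print(f"Ratio: {list_total / arr.nbytes:.1f}x")
```

On a typical 64-bit CPython build the list should come out several times larger than the array once the int objects are counted.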

1. Creating Arrays

NumPy provides many ways to create arrays. Choosing the right creation function saves time and keeps code readable.

# Array creation — the most common patterns

import numpy as np

# From a Python list
a = np.array([1, 2, 3, 4, 5])
print("1D array:", a)
print("dtype:", a.dtype)       # int64 on most platforms (int32 on Windows before NumPy 2.0)
print("shape:", a.shape)       # (5,)

# 2D array (matrix) from nested list
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print("\n2D array:\n", m)
print("shape:", m.shape)       # (2, 3) — 2 rows, 3 columns

# Filled arrays
print("\nzeros:  ", np.zeros(5))
print("ones:   ", np.ones((2, 3)))
print("full:   ", np.full((2, 2), 7))
print("eye:    \n", np.eye(3))      # 3x3 identity matrix

# Range-based
print("arange: ", np.arange(0, 10, 2))        # 0,2,4,6,8
print("linspace:", np.linspace(0, 1, 5))       # 5 evenly spaced from 0 to 1

# Random
rng = np.random.default_rng(seed=42)           # reproducible
print("random: ", rng.integers(0, 10, size=5))
print("normal: ", rng.standard_normal(4).round(3))
1D array: [1 2 3 4 5]
dtype: int64
shape: (5,)

2D array:
[[1 2 3]
[4 5 6]]
shape: (2, 3)

zeros: [0. 0. 0. 0. 0.]
ones: [[1. 1. 1.]
[1. 1. 1.]]
full: [[7 7]
[7 7]]
eye:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
arange: [0 2 4 6 8]
linspace: [0. 0.25 0.5 0.75 1. ]
random: [0 7 6 4 4]
normal: [ 0.304 1.765 -0.977 -0.151]
  • np.array() — from existing data; np.zeros/ones/full — pre-filled; np.arange/linspace — range-based
  • shape is a tuple: (rows, columns) for 2D, (depth, rows, columns) for 3D
  • Use np.random.default_rng(seed=) for reproducible random arrays — the modern NumPy random API
  • Specify dtype explicitly when needed: np.zeros(5, dtype=np.float32)
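
As a small illustration of the dtype point above, the following sketch compares the default float64 with an explicit float32 and shows a conversion with astype. The array names are made up for the example.

```python
import numpy as np

# Default float dtype is float64 (8 bytes per element)
a64 = np.zeros(5)
# Request float32 explicitly to halve the memory per element
a32 = np.zeros(5, dtype=np.float32)

print(a64.dtype, a64.itemsize, a64.nbytes)   # float64 8 40
print(a32.dtype, a32.itemsize, a32.nbytes)   # float32 4 20

# astype converts an existing array and returns a new one
ints = np.array([1.7, 2.2, 3.9]).astype(np.int64)
print(ints)                                  # 1, 2, 3 (truncates toward zero)

# Mixed int/float input is upcast to a common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                           # float64
```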

2. Indexing and Slicing

NumPy indexing extends Python list slicing to multiple dimensions and adds powerful boolean and fancy indexing.

# Indexing and slicing — 1D, 2D, boolean, fancy

import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Basic indexing and slicing — same as Python lists
print(a[0])       # 10
print(a[-1])      # 50
print(a[1:4])     # [20 30 40]
print(a[::2])     # [10 30 50]

# 2D indexing — [row, column]
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[0, 2])    # 3   — row 0, col 2
print(m[1, :])    # [4 5 6]  — entire row 1
print(m[:, 1])    # [2 5 8]  — entire column 1
print(m[0:2, 1:]) # [[2 3], [5 6]] — sub-matrix

# Boolean indexing — select elements matching a condition
a = np.array([15, 3, 42, 7, 28, 11])
mask = a > 10
print("\nMask:  ", mask)
print("Above 10:", a[mask])           # [15 42 28 11]
print("Even:    ", a[a % 2 == 0])     # [42 28]

# Fancy indexing — select by list of indices
print("Fancy:   ", a[[0, 2, 4]])      # [15 42 28]
10
50
[20 30 40]
[10 30 50]
3
[4 5 6]
[2 5 8]
[[2 3]
[5 6]]

Mask: [ True False True False True True]
Above 10: [15 42 28 11]
Even: [42 28]
Fancy: [15 42 28]
  • 2D indexing uses [row, col] — not [row][col] like nested Python lists
  • : selects all elements along an axis — m[:, 1] means all rows, column 1
  • Boolean indexing returns a new array of elements where the mask is True — extremely useful for filtering
  • Basic slices (a[1:4], m[:, 1]) return views, not copies: modifying the slice modifies the original array. Use .copy() when you need independent data. Boolean and fancy indexing always return copies.
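
The view-versus-copy behaviour is easiest to see by writing through a slice. This is a minimal sketch with illustrative variable names; np.shares_memory is used only to confirm which results share memory with the original.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]          # basic slice: a view onto a's memory
view[0] = 99           # writes through to a
print(a)               # a is now 10, 99, 30, 40, 50

safe = a[1:4].copy()   # explicit copy: independent memory
safe[0] = -1
print(a)               # unchanged by the write to safe

picked = a[[0, 2]]     # fancy indexing: a copy
masked = a[a > 35]     # boolean indexing: a copy
picked[0] = 0
print(a[0])            # still 10

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, safe))    # False
print(np.shares_memory(a, picked))  # False
```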

3. Vectorised Operations

In NumPy, arithmetic operators work element-wise on entire arrays without writing a single loop. Under the hood they call ufuncs (universal functions): element-wise routines implemented in C.

# Vectorised operations — element-wise, no loops

import numpy as np

prices = np.array([9.99, 24.99, 4.99, 49.99, 14.99])
qty    = np.array([3, 1, 10, 2, 5])

# Arithmetic: applied to every element in a single call
revenue   = prices * qty
discounted = prices * 0.9   # 10% off every price

print("Revenue:    ", revenue.round(2))
print("Discounted: ", discounted.round(2))
print("Total revenue: $", revenue.sum().round(2))

# Universal functions — applied element-wise
angles = np.linspace(0, np.pi, 5)
print("\nAngles:  ", angles.round(3))
print("sin:     ", np.sin(angles).round(3))
print("sqrt:    ", np.sqrt(np.array([1, 4, 9, 16, 25])))

# Aggregation
data = np.array([14, 7, 23, 5, 31, 18, 9])
print("\nSum:   ", data.sum())
print("Mean:  ", data.mean().round(2))
print("Std:   ", data.std().round(2))
print("Min:   ", data.min(), " at index", data.argmin())
print("Max:   ", data.max(), " at index", data.argmax())
Revenue: [ 29.97 24.99 49.9 99.98 74.95]
Discounted: [ 8.99 22.49 4.49 44.99 13.49]
Total revenue: $ 279.79

Angles: [0. 0.785 1.571 2.356 3.142]
sin: [0. 0.707 1. 0.707 0. ]
sqrt: [1. 2. 3. 4. 5.]

Sum: 107
Mean: 15.29
Std: 8.7
Min: 5 at index 3
Max: 31 at index 4
  • Arithmetic operators (+, -, *, /, **) are all vectorised — applied to every element
  • NumPy ufuncs: np.sin, np.cos, np.sqrt, np.exp, np.log, np.abs — all element-wise
  • Aggregation methods: sum, mean, std, min, max, argmin, argmax, cumsum
  • For 2D arrays, pass axis=0 to aggregate down the rows (one result per column) or axis=1 to aggregate across the columns (one result per row)
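
The axis argument is easier to follow with a small table. In this sketch the scores array is hypothetical data: each row is a student and each column is a test.

```python
import numpy as np

# Hypothetical data: rows are 2 students, columns are 3 tests
scores = np.array([[70, 80, 90],
                   [60, 75, 99]])

col_means = scores.mean(axis=0)   # collapse rows: one value per column
row_means = scores.mean(axis=1)   # collapse columns: one value per row

print(col_means)            # 65.0, 77.5, 94.5
print(row_means)            # 80.0, 78.0
print(scores.sum())         # 474 (no axis: reduce over everything)

running = scores.cumsum(axis=1)   # running total along each row
print(running)
```

A useful way to remember it: the axis you pass is the one that disappears from the result's shape.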

4. Broadcasting

Broadcasting lets NumPy perform operations on arrays with different shapes — it automatically expands smaller arrays to match larger ones without copying data. This is one of NumPy's most powerful and frequently misunderstood features.

# Broadcasting — operating on arrays of different shapes

import numpy as np

# Scalar broadcasts to every element
a = np.array([1, 2, 3, 4, 5])
print(a + 10)        # [11 12 13 14 15]
print(a * 2)         # [ 2  4  6  8 10]

# 1D array broadcasts across rows of a 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row_bias = np.array([10, 20, 30])   # shape (3,)
print("\nMatrix + row_bias:")
print(matrix + row_bias)            # adds [10,20,30] to each row

# Column broadcast — reshape to (3,1)
col_bias = np.array([[100], [200], [300]])   # shape (3,1)
print("\nMatrix + col_bias:")
print(matrix + col_bias)            # adds 100 to row 0, 200 to row 1, 300 to row 2

# Practical: normalise each column to zero mean
data = np.array([[10, 200, 0.5],
                 [20, 400, 1.0],
                 [30, 600, 1.5]])
means = data.mean(axis=0)           # mean of each column
normalised = data - means           # broadcast subtraction
print("\nNormalised (zero mean per column):")
print(normalised)
[11 12 13 14 15]
[ 2 4 6 8 10]

Matrix + row_bias:
[[11 22 33]
[14 25 36]
[17 28 39]]

Matrix + col_bias:
[[101 102 103]
[204 205 206]
[307 308 309]]

Normalised (zero mean per column):
[[-10. -200. -0.5]
[ 0. 0. 0. ]
[ 10. 200. 0.5]]
  • Broadcasting rule: shapes are compared dimension by dimension from the right; each pair must be equal, or one of them must be 1. A missing leading dimension is treated as 1
  • A scalar is broadcast to every element; a shape (n,) array is repeated for every row; a shape (n,1) array is repeated across every column
  • Column-mean normalisation is a one-liner in NumPy thanks to broadcasting — no loop needed
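
The broadcasting rule can be checked on shapes alone before any data is involved. The sketch below uses np.broadcast_shapes (available from NumPy 1.20, which is an assumption about the installed version) and also shows what an incompatible pair looks like.

```python
import numpy as np

# np.broadcast_shapes applies the rule to shapes without allocating arrays
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 3), (3, 1)))   # (3, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# Incompatible: trailing dimensions are 3 and 2, and neither is 1
matrix = np.ones((3, 3))
bad = np.array([1, 2])
failed = False
try:
    matrix + bad
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# A 1D array can be turned into a column with np.newaxis
col = np.array([100, 200, 300])[:, np.newaxis]   # shape (3, 1)
print((matrix + col).shape)                      # (3, 3)
```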

5. Reshaping and Stacking

# Reshaping and stacking arrays

import numpy as np

a = np.arange(12)
print("Original:", a)

# Reshape — total elements must stay the same
m = a.reshape(3, 4)    # 3 rows, 4 columns
print("Reshaped (3,4):\n", m)

# -1 means "infer this dimension"
m2 = a.reshape(4, -1)  # 4 rows, infer columns → (4,3)
print("Reshaped (4,-1):\n", m2)

# Flatten back to 1D
print("Flattened:", m.flatten())

# Transpose
print("Transposed:\n", m.T)   # rows become columns

# Stacking arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print("\nvstack:\n", np.vstack([x, y]))   # vertical — new rows
print("hstack:  ", np.hstack([x, y]))    # horizontal — extend columns
Original: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Reshaped (3,4):
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Reshaped (4,-1):
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
Flattened: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Transposed:
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]

vstack:
[[1 2 3]
[4 5 6]]
hstack: [1 2 3 4 5 6]
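
Two details from the examples above are worth checking by hand: reshape on a contiguous array gives a view while flatten gives a copy, and a reshape with the wrong element count fails. The sketch below also shows np.concatenate, which covers the same ground as vstack and hstack for 2D inputs; the names p and q are illustrative.

```python
import numpy as np

a = np.arange(12)

m = a.reshape(3, 4)       # a view of a here, since a is contiguous
m[0, 0] = 100
print(a[0])               # 100: the write went through to a

flat = m.flatten()        # flatten always returns a copy
flat[1] = -1
print(a[1])               # still 1

# Reshape fails when the element count does not match
reshape_failed = False
try:
    a.reshape(5, 3)       # 15 != 12
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)

# np.concatenate joins along an existing axis chosen by the axis argument
p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])
print(np.concatenate([p, q], axis=0).shape)   # (4, 2), like vstack
print(np.concatenate([p, q], axis=1).shape)   # (2, 4), like hstack
```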

6. Linear Algebra

NumPy's linalg module provides the matrix operations that underpin machine learning — dot products, matrix multiplication, determinants, inverses, and eigenvalues.

# Linear algebra with numpy.linalg

import numpy as np

A = np.array([[2, 1],
              [5, 3]])
B = np.array([[1, 2],
              [3, 4]])

# Matrix multiplication — @ operator or np.matmul
print("A @ B:\n", A @ B)

# Dot product of two vectors
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print("\nDot product:", np.dot(v, w))    # 1*4 + 2*5 + 3*6 = 32

# Determinant and inverse
print("\nDet(A):", round(np.linalg.det(A), 6))   # det is computed in floating point; round off tiny errors
print("Inv(A):\n", np.linalg.inv(A))

# Solve system of equations: Ax = b
b = np.array([3, 10])
x = np.linalg.solve(A, b)
print("\nSolution x:", x)   # verify: A @ x should equal b
print("Verify A @ x:", A @ x)
A @ B:
[[ 5 8]
[14 22]]

Dot product: 32

Det(A): 1.0
Inv(A):
[[ 3. -1.]
[-5. 2.]]

Solution x: [-1. 5.]
Verify A @ x: [ 3. 10.]
  • Use @ for matrix multiplication — it is cleaner than np.matmul() and the standard in modern Python
  • np.linalg.solve(A, b) is numerically more stable than np.linalg.inv(A) @ b — prefer it for solving equations
  • Other useful linalg functions: np.linalg.norm (vector/matrix norm), np.linalg.eig (eigenvalues/vectors), np.linalg.svd (singular value decomposition)
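
For the other linalg functions mentioned above, here is a minimal sketch on small matrices chosen so the results are easy to check by hand. The matrices S and M are made up for illustration.

```python
import numpy as np

# Euclidean norm of a vector: sqrt(3**2 + 4**2)
v = np.array([3.0, 4.0])
print(np.linalg.norm(v))              # 5.0

# Eigen-decomposition of a small diagonal matrix
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(S)
print(eigenvalues)                    # 2 and 3 for this diagonal matrix

# Each column of eigenvectors satisfies S @ vec = value * vec
for value, vec in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(S @ vec, value * vec)

# SVD factorises M into U, singular values s, and Vt
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(reconstructed, M))  # True
```

Comparing with np.allclose rather than == is deliberate: these routines work in floating point, so results are exact only up to rounding error.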

Summary Table

Concept | Key Functions | Purpose
Array creation | array, zeros, ones, arange, linspace | Build arrays from data or patterns
Indexing | a[i], a[r,c], a[mask], a[[i,j]] | Select elements by position, condition, or index list
Vectorised ops | +, -, *, /, np.sin, np.sqrt | Element-wise operations with no Python loop
Aggregation | sum, mean, std, min, max, argmin, argmax | Reduce arrays to summary statistics
Broadcasting | Automatic shape expansion | Operate on arrays of different shapes
Reshaping | reshape, flatten, T, vstack, hstack | Change array dimensions and combine arrays
Linear algebra | @, np.dot, linalg.inv, linalg.solve | Matrix operations for ML and science

Practice Questions

Practice 1. What is the key reason NumPy array operations are faster than Python list loops?

Practice 2. What does m[:, 1] select from a 2D array?

Practice 3. What does a.reshape(4, -1) do when a has 12 elements?

Practice 4. Why should you use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b?

Practice 5. What is the difference between a NumPy slice (view) and a copy?

Quiz

Quiz 1. What does np.linspace(0, 1, 5) return?

Quiz 2. What does boolean indexing a[a > 10] return?

Quiz 3. What operator is used for matrix multiplication in modern NumPy and Python?

Quiz 4. What does data.mean(axis=0) compute for a 2D array?

Quiz 5. What is the broadcasting rule that allows a shape (3,) array to be added to a shape (3, 3) matrix?

Next up — Machine Learning with Python: building and evaluating models using scikit-learn.