Python Course
NumPy
NumPy is the foundation of scientific computing in Python. Many data science, machine learning, and numerical libraries, including pandas and scikit-learn, are built on top of it, and frameworks such as TensorFlow and PyTorch interoperate closely with its arrays. Its core contribution is the ndarray: a fixed-type, multi-dimensional array that stores data in contiguous memory and executes operations in compiled C code. The result is numerical computation that is often 10 to 100 times faster than equivalent pure Python loops, depending on the operation and the array size.
This lesson covers array creation, indexing, vectorised operations, broadcasting, linear algebra, and the performance principles that make NumPy indispensable.
Why NumPy Beats Python Lists for Numerical Work
Python lists are flexible but slow for numerical computation — each element is a full Python object with overhead, and loops in Python are interpreted one step at a time. NumPy arrays store raw numbers in a contiguous block of memory and perform operations in C — no Python loop, no object overhead.
# NumPy vs Python list — speed comparison
import numpy as np
import time
n = 10_000_000
# Python list — slow loop
py_list = list(range(n))
start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python list: {time.perf_counter() - start:.3f}s")
# NumPy array — vectorised C operation
arr = np.arange(n)
start = time.perf_counter()
result = arr * 2
print(f"NumPy array: {time.perf_counter() - start:.3f}s")
# Memory comparison
import sys
print(f"\nPython list size: {sys.getsizeof(py_list):,} bytes")
print(f"NumPy array size: {arr.nbytes:,} bytes")
NumPy array: 0.008s
Python list size: 89,095,160 bytes
NumPy array size: 80,000,000 bytes
- NumPy is roughly 50x faster here — the gap grows with array size and operation complexity
- NumPy arrays are also more memory-efficient: a Python list of ints stores pointers to separate int objects, and sys.getsizeof counts only the pointers, so the real gap is larger than the two figures suggest
- NumPy's speed comes from vectorisation: one C operation applied to the entire array, no Python loop
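The memory point can be checked directly. The following is a minimal sketch, assuming CPython on a 64-bit platform; the variable names are illustrative, and the array dtype is fixed to int64 so the array size does not depend on the platform default.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # explicit dtype: 8 bytes per element

# sys.getsizeof on a list counts the list header and its pointer array only
pointer_bytes = sys.getsizeof(py_list)
# Every element is a separate Python int object with its own header
object_bytes = sum(sys.getsizeof(x) for x in py_list)

print(f"List, pointers only:      {pointer_bytes:,} bytes")
print(f"List, incl. int objects:  {pointer_bytes + object_bytes:,} bytes")
print(f"NumPy array (int64):      {arr.nbytes:,} bytes")
```

Exact list figures vary between Python versions, which is why no expected output is shown for them; the array always occupies `n * 8` bytes for this dtype.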
1. Creating Arrays
NumPy provides many ways to create arrays. Choosing the right creation function saves time and keeps code readable.
# Array creation — the most common patterns
import numpy as np
# From a Python list
a = np.array([1, 2, 3, 4, 5])
print("1D array:", a)
print("dtype:", a.dtype) # int64 on most platforms
print("shape:", a.shape) # (5,)
# 2D array (matrix) from nested list
m = np.array([[1, 2, 3],
[4, 5, 6]])
print("\n2D array:\n", m)
print("shape:", m.shape) # (2, 3) — 2 rows, 3 columns
# Filled arrays
print("\nzeros: ", np.zeros(5))
print("ones: ", np.ones((2, 3)))
print("full: ", np.full((2, 2), 7))
print("eye: \n", np.eye(3)) # 3x3 identity matrix
# Range-based
print("arange: ", np.arange(0, 10, 2)) # 0,2,4,6,8
print("linspace:", np.linspace(0, 1, 5)) # 5 evenly spaced from 0 to 1
# Random
rng = np.random.default_rng(seed=42) # reproducible
print("random: ", rng.integers(0, 10, size=5))
print("normal: ", rng.standard_normal(4).round(3))
1D array: [1 2 3 4 5]
dtype: int64
shape: (5,)
2D array:
[[1 2 3]
[4 5 6]]
shape: (2, 3)
zeros: [0. 0. 0. 0. 0.]
ones: [[1. 1. 1.]
[1. 1. 1.]]
full: [[7 7]
[7 7]]
eye:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
arange: [0 2 4 6 8]
linspace: [0. 0.25 0.5 0.75 1. ]
random: [0 7 6 4 4]
normal: [ 0.304 1.765 -0.977 -0.151]
- np.array() builds from existing data; np.zeros/ones/full create pre-filled arrays; np.arange/linspace create range-based arrays
- shape is a tuple: (rows, columns) for 2D, (depth, rows, columns) for 3D
- Use np.random.default_rng(seed=) for reproducible random arrays; this is the modern NumPy random API
- Specify dtype explicitly when needed: np.zeros(5, dtype=np.float32)
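To make the dtype point concrete, here is a small sketch showing how the chosen dtype changes the bytes used per element, and how `astype` converts an existing array. The variable names are illustrative only.

```python
import numpy as np

default = np.zeros(5)                    # float64 unless told otherwise
compact = np.zeros(5, dtype=np.float32)  # half the bytes per element

print(default.dtype, default.itemsize, default.nbytes)  # float64 8 40
print(compact.dtype, compact.itemsize, compact.nbytes)  # float32 4 20

# astype returns a converted copy; float -> int truncates toward zero
as_int = np.array([1.7, 2.2, 3.9]).astype(np.int64)
print(as_int)  # [1 2 3]
```

Smaller dtypes save memory at the cost of precision, so the choice depends on the data.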
2. Indexing and Slicing
NumPy indexing extends Python list slicing to multiple dimensions and adds powerful boolean and fancy indexing.
# Indexing and slicing — 1D, 2D, boolean, fancy
import numpy as np
a = np.array([10, 20, 30, 40, 50])
# Basic indexing and slicing — same as Python lists
print(a[0]) # 10
print(a[-1]) # 50
print(a[1:4]) # [20 30 40]
print(a[::2]) # [10 30 50]
# 2D indexing — [row, column]
m = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(m[0, 2]) # 3 — row 0, col 2
print(m[1, :]) # [4 5 6] — entire row 1
print(m[:, 1]) # [2 5 8] — entire column 1
print(m[0:2, 1:]) # [[2 3], [5 6]] — sub-matrix
# Boolean indexing — select elements matching a condition
a = np.array([15, 3, 42, 7, 28, 11])
mask = a > 10
print("\nMask: ", mask)
print("Above 10:", a[mask]) # [15 42 28 11]
print("Even: ", a[a % 2 == 0]) # [42 28]
# Fancy indexing — select by list of indices
print("Fancy: ", a[[0, 2, 4]]) # [15 42 28]
10
50
[20 30 40]
[10 30 50]
3
[4 5 6]
[2 5 8]
[[2 3]
[5 6]]
Mask: [ True False True False True True]
Above 10: [15 42 28 11]
Even: [42 28]
Fancy: [15 42 28]
- 2D indexing uses [row, col]; [row][col] also works, as with nested Python lists, but it indexes twice and builds an intermediate array
- : selects all elements along an axis, so m[:, 1] means all rows, column 1
- Boolean indexing returns a new array of the elements where the mask is True, which is extremely useful for filtering
- NumPy slices return views, not copies: modifying a slice modifies the original array. Use .copy() to avoid this.
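The view-versus-copy behaviour is easy to get wrong, so a short sketch may help. It uses `np.shares_memory` to check whether two arrays overlap in memory; the variable names are illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]        # basic slice: shares memory with a
view[0] = 99         # writes through to a[1]
print(a)             # [10 99 30 40 50]

safe = a[1:4].copy() # independent copy
safe[0] = -1         # a is unaffected

picked = a[a > 30]   # boolean indexing produces a copy, not a view
picked[0] = 0        # a is unaffected

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, safe))    # False
print(np.shares_memory(a, picked))  # False
```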
3. Vectorised Operations
In NumPy, arithmetic operators work element-wise on entire arrays without writing a single loop. These are called ufuncs — universal functions implemented in C.
# Vectorised operations — element-wise, no loops
import numpy as np
prices = np.array([9.99, 24.99, 4.99, 49.99, 14.99])
qty = np.array([3, 1, 10, 2, 5])
# Arithmetic — applied to every element simultaneously
revenue = prices * qty
discounted = prices * 0.9 # 10% off every price
print("Revenue: ", revenue.round(2))
print("Discounted: ", discounted.round(2))
print("Total revenue: $", revenue.sum().round(2))
# Universal functions — applied element-wise
angles = np.linspace(0, np.pi, 5)
print("\nAngles: ", angles.round(3))
print("sin: ", np.sin(angles).round(3))
print("sqrt: ", np.sqrt(np.array([1, 4, 9, 16, 25])))
# Aggregation
data = np.array([14, 7, 23, 5, 31, 18, 9])
print("\nSum: ", data.sum())
print("Mean: ", data.mean().round(2))
print("Std: ", data.std().round(2))
print("Min: ", data.min(), " at index", data.argmin())
print("Max: ", data.max(), " at index", data.argmax())
Revenue: [29.97 24.99 49.9 99.98 74.95]
Discounted: [ 8.99 22.49 4.49 44.99 13.49]
Total revenue: $ 279.79
Angles: [0. 0.785 1.571 2.356 3.142]
sin: [0. 0.707 1. 0.707 0. ]
sqrt: [1. 2. 3. 4. 5.]
Sum: 107
Mean: 15.29
Std: 8.7
Min: 5 at index 3
Max: 31 at index 4
- Arithmetic operators (+, -, *, /, **) are all vectorised: applied to every element
- NumPy ufuncs: np.sin, np.cos, np.sqrt, np.exp, np.log, np.abs, all element-wise
- Aggregation methods: sum, mean, std, min, max, argmin, argmax, cumsum
- For 2D arrays, pass axis=0 (collapse the rows, giving one result per column) or axis=1 (collapse the columns, giving one result per row) to aggregate in one direction
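The `axis` argument is the part of aggregation that most often causes confusion, so here is a minimal sketch on a small 2D array. The array and variable names are illustrative.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)

col_totals = scores.sum(axis=0)   # collapses the rows: one value per column
row_totals = scores.sum(axis=1)   # collapses the columns: one value per row
running = scores.cumsum(axis=1)   # running total along each row
overall = scores.sum()            # no axis: reduce over everything

print(col_totals)  # [5 7 9]
print(row_totals)  # [ 6 15]
print(running)     # [[ 1  3  6]
                   #  [ 4  9 15]]
print(overall)     # 21
```

A useful rule of thumb: the axis you pass is the one that disappears from the result's shape.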
4. Broadcasting
Broadcasting lets NumPy perform operations on arrays with different shapes — it automatically expands smaller arrays to match larger ones without copying data. This is one of NumPy's most powerful and frequently misunderstood features.
# Broadcasting — operating on arrays of different shapes
import numpy as np
# Scalar broadcasts to every element
a = np.array([1, 2, 3, 4, 5])
print(a + 10) # [11 12 13 14 15]
print(a * 2) # [ 2 4 6 8 10]
# 1D array broadcasts across rows of a 2D array
matrix = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
row_bias = np.array([10, 20, 30]) # shape (3,)
print("\nMatrix + row_bias:")
print(matrix + row_bias) # adds [10,20,30] to each row
# Column broadcast — reshape to (3,1)
col_bias = np.array([[100], [200], [300]]) # shape (3,1)
print("\nMatrix + col_bias:")
print(matrix + col_bias) # adds 100, 200, 300 to rows 0, 1, 2 (each value is spread across the columns)
# Practical: normalise each column to zero mean
data = np.array([[10, 200, 0.5],
[20, 400, 1.0],
[30, 600, 1.5]])
means = data.mean(axis=0) # mean of each column
normalised = data - means # broadcast subtraction
print("\nNormalised (zero mean per column):")
print(normalised)
[11 12 13 14 15]
[ 2 4 6 8 10]
Matrix + row_bias:
[[11 22 33]
[14 25 36]
[17 28 39]]
Matrix + col_bias:
[[101 102 103]
[204 205 206]
[307 308 309]]
Normalised (zero mean per column):
[[-10. -200. -0.5]
[ 0. 0. 0. ]
[ 10. 200. 0.5]]
- Broadcasting rule: dimensions are compared from the right — they must either be equal or one of them must be 1
- Scalar is broadcast to every element; 1D array is broadcast across rows; (n,1) array broadcasts across columns
- Column-mean normalisation is a one-liner in NumPy thanks to broadcasting — no loop needed
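The right-to-left comparison rule can be tried out without building any data. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; it also shows an incompatible pair and one common way to turn a 1D array into a column.

```python
import numpy as np

# Shapes are compared from the right; each pair must match or contain a 1
print(np.broadcast_shapes((3, 3), (3,)))    # (3, 3)
print(np.broadcast_shapes((3, 3), (3, 1)))  # (3, 3)
print(np.broadcast_shapes((4, 1), (3,)))    # (4, 3)

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails
failed = False
try:
    np.ones((3, 3)) + np.ones(4)
except ValueError as exc:
    failed = True
    print("Incompatible shapes:", exc)

# Turn a 1D array into a column so it broadcasts across columns
values = np.array([100, 200, 300])
column = values[:, np.newaxis]              # shape (3, 1)
print(column.shape)                         # (3, 1)
```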
5. Reshaping and Stacking
# Reshaping and stacking arrays
import numpy as np
a = np.arange(12)
print("Original:", a)
# Reshape — total elements must stay the same
m = a.reshape(3, 4) # 3 rows, 4 columns
print("Reshaped (3,4):\n", m)
# -1 means "infer this dimension"
m2 = a.reshape(4, -1) # 4 rows, infer columns → (4,3)
print("Reshaped (4,-1):\n", m2)
# Flatten back to 1D
print("Flattened:", m.flatten())
# Transpose
print("Transposed:\n", m.T) # rows become columns
# Stacking arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print("\nvstack:\n", np.vstack([x, y])) # vertical — new rows
print("hstack: ", np.hstack([x, y])) # horizontal — extend columns
Original: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Reshaped (3,4):
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Reshaped (4,-1):
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
Flattened: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Transposed:
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]
vstack:
[[1 2 3]
[4 5 6]]
hstack: [1 2 3 4 5 6]
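Two details of reshaping are worth seeing in code: whether the result shares memory with the original, and what happens when the element count does not match. This is a small sketch; `ravel` is not used in the lesson's example and is included here only for contrast with `flatten`.

```python
import numpy as np

a = np.arange(12)
m = a.reshape(3, 4)      # a view for a contiguous array like this one

flat_copy = m.flatten()  # always returns a copy
flat_view = m.ravel()    # returns a view when the memory layout allows it

m[0, 0] = 100            # write through the reshaped view
print(a[0])              # 100: a and m share memory
print(flat_copy[0])      # 0:   the copy was taken before the write
print(flat_view[0])      # 100: ravel gave a view here

# The total number of elements must stay the same
reshape_failed = False
try:
    a.reshape(5, 3)      # 15 slots for 12 elements
except ValueError as exc:
    reshape_failed = True
    print("Cannot reshape:", exc)
```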
6. Linear Algebra
NumPy's linalg module provides the matrix operations that underpin machine learning — dot products, matrix multiplication, determinants, inverses, and eigenvalues.
# Linear algebra with numpy.linalg
import numpy as np
A = np.array([[2, 1],
[5, 3]])
B = np.array([[1, 2],
[3, 4]])
# Matrix multiplication — @ operator or np.matmul
print("A @ B:\n", A @ B)
# Dot product of two vectors
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print("\nDot product:", np.dot(v, w)) # 1*4 + 2*5 + 3*6 = 32
# Determinant and inverse
print("\nDet(A):", np.linalg.det(A)) # approximately 1.0; floating-point rounding may add trailing digits
print("Inv(A):\n", np.linalg.inv(A))
# Solve system of equations: Ax = b
b = np.array([3, 10])
x = np.linalg.solve(A, b)
print("\nSolution x:", x) # verify: A @ x should equal b
print("Verify A @ x:", A @ x)
A @ B:
[[ 5 8]
[14 22]]
Dot product: 32
Det(A): 1.0
Inv(A):
[[ 3. -1.]
[-5. 2.]]
Solution x: [-1. 5.]
Verify A @ x: [ 3. 10.]
- Use @ for matrix multiplication: it is cleaner than np.matmul() and the standard in modern Python
- np.linalg.solve(A, b) is numerically more stable than np.linalg.inv(A) @ b, so prefer it for solving equations
- Other useful linalg functions: np.linalg.norm (vector/matrix norm), np.linalg.eig (eigenvalues/vectors), np.linalg.svd (singular value decomposition)
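The functions named above can be combined into a quick self-check. The sketch below reuses the same 2x2 system, compares `solve` against the inverse-based route, measures the residual with `np.linalg.norm`, and verifies one eigenpair from `np.linalg.eig`. On a small, well-conditioned matrix like this the two routes agree; the stability advantage of `solve` shows up on larger or ill-conditioned systems.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
b = np.array([3.0, 10.0])

x_solve = np.linalg.solve(A, b)   # preferred: solves without forming the inverse
x_inv = np.linalg.inv(A) @ b      # works here, but less stable in general

# Residual norm: how far A @ x is from b (should be close to zero)
residual = np.linalg.norm(A @ x_solve - b)
print("Same answer:", np.allclose(x_solve, x_inv))
print("Residual small:", residual < 1e-9)

# Eigen-decomposition: columns of `eigenvectors` pair with `eigenvalues`
eigenvalues, eigenvectors = np.linalg.eig(A)
v0 = eigenvectors[:, 0]
print("A v = lambda v:", np.allclose(A @ v0, eigenvalues[0] * v0))
```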
Summary Table
| Concept | Key Functions | Purpose |
|---|---|---|
| Array creation | array, zeros, ones, arange, linspace | Build arrays from data or patterns |
| Indexing | a[i], a[r,c], a[mask], a[[i,j]] | Select elements by position, condition, or index list |
| Vectorised ops | +, -, *, /, np.sin, np.sqrt | Element-wise operations, no Python loop |
| Aggregation | sum, mean, std, min, max, argmin, argmax | Reduce arrays to summary statistics |
| Broadcasting | Automatic shape expansion | Operate on arrays of different shapes |
| Reshaping | reshape, flatten, T, vstack, hstack | Change array dimensions and combine arrays |
| Linear algebra | @, np.dot, linalg.inv, linalg.solve | Matrix operations for ML and science |
Practice Questions
Practice 1. What is the key reason NumPy array operations are faster than Python list loops?
Practice 2. What does m[:, 1] select from a 2D array?
Practice 3. What does a.reshape(4, -1) do when a has 12 elements?
Practice 4. Why should you use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b?
Practice 5. What is the difference between a NumPy slice (view) and a copy?
Quiz
Quiz 1. What does np.linspace(0, 1, 5) return?
Quiz 2. What does boolean indexing a[a > 10] return?
Quiz 3. What operator is used for matrix multiplication in modern NumPy and Python?
Quiz 4. What does data.mean(axis=0) compute for a 2D array?
Quiz 5. What is the broadcasting rule that allows a shape (3,) array to be added to a shape (3, 3) matrix?
Next up — Machine Learning with Python: building and evaluating models using scikit-learn.