Python Lesson 43 – NumPy | Dataplexa

NumPy

NumPy is the foundation of scientific computing in Python. Many data science and numerical libraries, including pandas, SciPy, and scikit-learn, are built on top of it, and frameworks such as TensorFlow and PyTorch model their tensor APIs on it and convert to and from NumPy arrays. Its core contribution is the ndarray: a fixed-type, multi-dimensional array that stores data in a contiguous block of memory and executes operations in compiled C code. For element-wise numerical work, the result is often 10 to 100 times faster than an equivalent pure Python loop.

This lesson covers array creation, indexing, vectorised operations, broadcasting, linear algebra, and the performance principles that make NumPy indispensable.

Why NumPy Beats Python Lists for Numerical Work

Python lists are flexible but slow for numerical computation — each element is a full Python object with overhead, and loops in Python are interpreted one step at a time. NumPy arrays store raw numbers in a contiguous block of memory and perform operations in C — no Python loop, no object overhead.

# NumPy vs Python list — speed comparison

import numpy as np
import time

n = 10_000_000

# Python list — slow loop
py_list = list(range(n))
start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python list: {time.perf_counter() - start:.3f}s")

# NumPy array — vectorised C operation
arr = np.arange(n)
start = time.perf_counter()
result = arr * 2
print(f"NumPy array: {time.perf_counter() - start:.3f}s")

# Memory comparison
import sys
print(f"\nPython list size: {sys.getsizeof(py_list):,} bytes")
print(f"NumPy array size: {arr.nbytes:,} bytes")

Python list: 0.412s
NumPy array: 0.008s

Python list size: 89,095,160 bytes
NumPy array size: 80,000,000 bytes

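NumPy is roughly 50x faster in this run (0.412s vs 0.008s); exact timings depend on hardware, array size, and the operation
NumPy arrays are also more memory-efficient: sys.getsizeof(py_list) counts only the list's array of object pointers, and every int object those pointers refer to takes further memory (typically 28 bytes each on 64-bit CPython), whereas the NumPy array stores raw 8-byte integers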
NumPy's speed comes from vectorisation: one C operation applied to the entire array, no Python loop
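
The sys.getsizeof figure above covers only the list container, so it understates what the list really costs. A rough sketch of a fuller comparison follows, assuming 64-bit CPython and a smaller n so it runs quickly. The helper name list_total_bytes is made up for illustration, and the total is approximate because CPython shares some small integer objects.

```python
import sys
import numpy as np

n = 100_000  # smaller than the lesson's 10 million so this runs quickly

py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype fixed explicitly: 8 bytes per element

def list_total_bytes(lst):
    """Approximate footprint: the list's pointer array plus each int object."""
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

pointer_bytes = sys.getsizeof(py_list)    # container only
total_bytes = list_total_bytes(py_list)   # container + int objects

print(f"List container only: {pointer_bytes:,} bytes")
print(f"List + int objects:  {total_bytes:,} bytes")
print(f"NumPy array data:    {arr.nbytes:,} bytes")
```

On a typical build the second figure should come out several times larger than the NumPy figure, which is the comparison the bullet above is really about.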

1. Creating Arrays

NumPy provides many ways to create arrays. Choosing the right creation function saves time and keeps code readable.

# Array creation — the most common patterns

import numpy as np

# From a Python list
a = np.array([1, 2, 3, 4, 5])
print("1D array:", a)
print("dtype:", a.dtype)       # int64 on most platforms (int32 on Windows before NumPy 2.0)
print("shape:", a.shape)       # (5,)

# 2D array (matrix) from nested list
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print("\n2D array:\n", m)
print("shape:", m.shape)       # (2, 3) — 2 rows, 3 columns

# Filled arrays
print("\nzeros:  ", np.zeros(5))
print("ones:   ", np.ones((2, 3)))
print("full:   ", np.full((2, 2), 7))
print("eye:    \n", np.eye(3))      # 3x3 identity matrix

# Range-based
print("arange: ", np.arange(0, 10, 2))        # 0,2,4,6,8
print("linspace:", np.linspace(0, 1, 5))       # 5 evenly spaced from 0 to 1

# Random
rng = np.random.default_rng(seed=42)           # reproducible
print("random: ", rng.integers(0, 10, size=5))
print("normal: ", rng.standard_normal(4).round(3))

1D array: [1 2 3 4 5]
dtype: int64
shape: (5,)

2D array:
[[1 2 3]
[4 5 6]]
shape: (2, 3)

zeros: [0. 0. 0. 0. 0.]
ones: [[1. 1. 1.]
[1. 1. 1.]]
full: [[7 7]
[7 7]]
eye:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
arange: [0 2 4 6 8]
linspace: [0. 0.25 0.5 0.75 1. ]
random: [0 7 6 4 4]
normal: [ 0.304 1.765 -0.977 -0.151]

np.array() — from existing data; np.zeros/ones/full — pre-filled; np.arange/linspace — range-based
shape is a tuple: (rows, columns) for 2D, (depth, rows, columns) for 3D
Use np.random.default_rng(seed=) for reproducible random arrays — the modern NumPy random API
Specify dtype explicitly when needed: np.zeros(5, dtype=np.float32)
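
To make the dtype point concrete, here is a small sketch showing how the element type changes the memory used per element. The variable names are illustrative only; itemsize, nbytes, and astype are standard ndarray attributes and methods.

```python
import numpy as np

# Same values, three different element types
as_default = np.array([1, 2, 3])                    # platform default integer
as_float32 = np.array([1, 2, 3], dtype=np.float32)  # 4 bytes per element
as_int8 = np.array([1, 2, 3], dtype=np.int8)        # 1 byte per element

for name, arr in [("default", as_default), ("float32", as_float32), ("int8", as_int8)]:
    print(f"{name:8s} dtype={arr.dtype}  itemsize={arr.itemsize}  nbytes={arr.nbytes}")

# astype returns a converted copy; the original array keeps its dtype
converted = as_float32.astype(np.float64)
print(converted.dtype, as_float32.dtype)
```

Smaller dtypes save memory but narrow the range of representable values, so the choice is a trade-off rather than a free win.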

2. Indexing and Slicing

NumPy indexing extends Python list slicing to multiple dimensions and adds powerful boolean and fancy indexing.

# Indexing and slicing — 1D, 2D, boolean, fancy

import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Basic indexing and slicing — same as Python lists
print(a[0])       # 10
print(a[-1])      # 50
print(a[1:4])     # [20 30 40]
print(a[::2])     # [10 30 50]

# 2D indexing — [row, column]
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[0, 2])    # 3   — row 0, col 2
print(m[1, :])    # [4 5 6]  — entire row 1
print(m[:, 1])    # [2 5 8]  — entire column 1
print(m[0:2, 1:]) # [[2 3], [5 6]] — sub-matrix

# Boolean indexing — select elements matching a condition
a = np.array([15, 3, 42, 7, 28, 11])
mask = a > 10
print("\nMask:  ", mask)
print("Above 10:", a[mask])           # [15 42 28 11]
print("Even:    ", a[a % 2 == 0])     # [42 28]

# Fancy indexing — select by list of indices
print("Fancy:   ", a[[0, 2, 4]])      # [15 42 28]

10
50
[20 30 40]
[10 30 50]
3
[4 5 6]
[2 5 8]
[[2 3]
[5 6]]

Mask: [ True False True False True True]
Above 10: [15 42 28 11]
Even: [42 28]
Fancy: [15 42 28]

2D indexing uses [row, col]; m[row][col] also works, but it indexes twice and creates an intermediate array
: selects all elements along an axis — m[:, 1] means all rows, column 1
Boolean indexing returns a new array of elements where the mask is True — extremely useful for filtering
Basic slices return views, not copies, so modifying a slice modifies the original array; use .copy() to avoid this. Boolean and fancy indexing always return copies.
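
The view-versus-copy behaviour is easy to check directly. The sketch below uses np.shares_memory to show which indexing styles share memory with the original array; the variable names are illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]           # basic slice: a view onto a's memory
view[0] = 99            # writes through to a
print(a)                # [10 99 30 40 50]

safe = a[1:4].copy()    # explicit copy: independent memory
safe[0] = -1            # a is unaffected

picked = a[[0, 2, 4]]   # fancy indexing: a copy
masked = a[a > 25]      # boolean indexing: a copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, safe))    # False
print(np.shares_memory(a, picked))  # False
print(np.shares_memory(a, masked))  # False
```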

3. Vectorised Operations

In NumPy, arithmetic operators work element-wise on entire arrays without writing a single loop. They are backed by ufuncs: universal functions implemented in C that apply one operation to every element.

# Vectorised operations — element-wise, no loops

import numpy as np

prices = np.array([9.99, 24.99, 4.99, 49.99, 14.99])
qty    = np.array([3, 1, 10, 2, 5])

# Arithmetic — applied to every element simultaneously
revenue   = prices * qty
discounted = prices * 0.9   # 10% off every price

print("Revenue:    ", revenue.round(2))
print("Discounted: ", discounted.round(2))
print("Total revenue: $", revenue.sum().round(2))

# Universal functions — applied element-wise
angles = np.linspace(0, np.pi, 5)
print("\nAngles:  ", angles.round(3))
print("sin:     ", np.sin(angles).round(3))
print("sqrt:    ", np.sqrt(np.array([1, 4, 9, 16, 25])))

# Aggregation
data = np.array([14, 7, 23, 5, 31, 18, 9])
print("\nSum:   ", data.sum())
print("Mean:  ", data.mean().round(2))
print("Std:   ", data.std().round(2))
print("Min:   ", data.min(), " at index", data.argmin())
print("Max:   ", data.max(), " at index", data.argmax())

Revenue: [ 29.97 24.99 49.9 99.98 74.95]
Discounted: [ 8.99 22.49 4.49 44.99 13.49]
Total revenue: $ 279.79

Angles: [0. 0.785 1.571 2.356 3.142]
sin: [0. 0.707 1. 0.707 0. ]
sqrt: [1. 2. 3. 4. 5.]

Sum: 107
Mean: 15.29
Std: 8.7
Min: 5 at index 3
Max: 31 at index 4

Arithmetic operators (+, -, *, /, **) are all vectorised — applied to every element
NumPy ufuncs: np.sin, np.cos, np.sqrt, np.exp, np.log, np.abs — all element-wise
Aggregation methods: sum, mean, std, min, max, argmin, argmax, cumsum
For 2D arrays, pass axis=0 to aggregate down the rows (one result per column) or axis=1 to aggregate across the columns (one result per row)
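
The axis argument is the part that most often trips people up, so a tiny worked case may help. The sales array below is hypothetical data: 2 stores as rows, 3 days as columns.

```python
import numpy as np

# Hypothetical data: 2 stores (rows) x 3 days (columns)
sales = np.array([[1, 2, 3],
                  [4, 5, 6]])

per_day = sales.sum(axis=0)     # collapse rows: one total per column
per_store = sales.sum(axis=1)   # collapse columns: one total per row
overall = sales.sum()           # no axis: reduce over every element

print(per_day)     # [5 7 9]
print(per_store)   # [ 6 15]
print(overall)     # 21

# cumsum keeps the shape and accumulates along the chosen axis
print(sales.cumsum(axis=1))
```

A useful way to remember it: the axis you pass is the one that disappears from the result's shape.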

4. Broadcasting

Broadcasting lets NumPy perform operations on arrays with different shapes — it automatically expands smaller arrays to match larger ones without copying data. This is one of NumPy's most powerful and frequently misunderstood features.

# Broadcasting — operating on arrays of different shapes

import numpy as np

# Scalar broadcasts to every element
a = np.array([1, 2, 3, 4, 5])
print(a + 10)        # [11 12 13 14 15]
print(a * 2)         # [ 2  4  6  8 10]

# 1D array broadcasts across rows of a 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row_bias = np.array([10, 20, 30])   # shape (3,)
print("\nMatrix + row_bias:")
print(matrix + row_bias)            # adds [10,20,30] to each row

# Column broadcast — reshape to (3,1)
col_bias = np.array([[100], [200], [300]])   # shape (3,1)
print("\nMatrix + col_bias:")
print(matrix + col_bias)            # adds 100 to row 0, 200 to row 1, 300 to row 2

# Practical: normalise each column to zero mean
data = np.array([[10, 200, 0.5],
                 [20, 400, 1.0],
                 [30, 600, 1.5]])
means = data.mean(axis=0)           # mean of each column
normalised = data - means           # broadcast subtraction
print("\nNormalised (zero mean per column):")
print(normalised)

[11 12 13 14 15]
[ 2 4 6 8 10]

Matrix + row_bias:
[[11 22 33]
[14 25 36]
[17 28 39]]

Matrix + col_bias:
[[101 102 103]
[204 205 206]
[307 308 309]]

Normalised (zero mean per column):
[[-10. -200. -0.5]
[ 0. 0. 0. ]
[ 10. 200. 0.5]]

Broadcasting rule: shapes are compared dimension by dimension from the right; each pair must be equal or one of them must be 1, and missing leading dimensions are treated as 1
A scalar is broadcast to every element; a shape (n,) array is added to every row; a shape (n,1) array is added to every column
Column-mean normalisation is a one-liner in NumPy thanks to broadcasting — no loop needed
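
The broadcasting rule can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available, and also shows what happens when shapes are incompatible. The names row_offsets and shifted are illustrative.

```python
import numpy as np

# np.broadcast_shapes reports the result shape without allocating any data
print(np.broadcast_shapes((3, 3), (3,)))     # (3, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# Incompatible: the trailing dimensions are 3 and 2, and neither is 1
try:
    np.ones((3, 3)) + np.ones(2)
    incompatible = False
except ValueError as exc:
    incompatible = True
    print("Cannot broadcast:", exc)

# np.newaxis turns a (3,) vector into a (3, 1) column, giving one offset per row
row_offsets = np.array([100, 200, 300])
matrix = np.arange(1, 10).reshape(3, 3)
shifted = matrix + row_offsets[:, np.newaxis]
print(shifted)
```

Indexing with np.newaxis is a common alternative to writing the nested-list form of a column vector by hand.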

5. Reshaping and Stacking

# Reshaping and stacking arrays

import numpy as np

a = np.arange(12)
print("Original:", a)

# Reshape — total elements must stay the same
m = a.reshape(3, 4)    # 3 rows, 4 columns
print("Reshaped (3,4):\n", m)

# -1 means "infer this dimension"
m2 = a.reshape(4, -1)  # 4 rows, infer columns → (4,3)
print("Reshaped (4,-1):\n", m2)

# Flatten back to 1D
print("Flattened:", m.flatten())

# Transpose
print("Transposed:\n", m.T)   # rows become columns

# Stacking arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print("\nvstack:\n", np.vstack([x, y]))   # vertical — new rows
print("hstack:  ", np.hstack([x, y]))    # horizontal: 1D arrays are joined end to end

Original: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Reshaped (3,4):
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Reshaped (4,-1):
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
Flattened: [ 0 1 2 3 4 5 6 7 8 9 10 11]
Transposed:
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]

vstack:
[[1 2 3]
[4 5 6]]
hstack: [1 2 3 4 5 6]
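
A few details of reshaping are worth seeing in action: whether the result shares memory with the original, and what happens when the element count does not fit. This is a minimal sketch; ravel and column_stack are not used in the lesson's own example but are standard NumPy functions.

```python
import numpy as np

a = np.arange(12)

m = a.reshape(3, 4)        # a view here, because a is contiguous
flat_copy = m.flatten()    # flatten always returns a copy
flat_view = m.ravel()      # ravel returns a view when it can

print(np.shares_memory(a, m))          # True
print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True

# 12 elements cannot be split into 5 equal rows, so reshape refuses
try:
    a.reshape(5, -1)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape error:", exc)

# column_stack pairs 1D arrays as columns, unlike hstack which joins them end to end
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.column_stack([x, y]).shape)   # (3, 2)
```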

6. Linear Algebra

NumPy's linalg module provides the matrix operations that underpin machine learning — dot products, matrix multiplication, determinants, inverses, and eigenvalues.

# Linear algebra with numpy.linalg

import numpy as np

A = np.array([[2, 1],
              [5, 3]])
B = np.array([[1, 2],
              [3, 4]])

# Matrix multiplication — @ operator or np.matmul
print("A @ B:\n", A @ B)

# Dot product of two vectors
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print("\nDot product:", np.dot(v, w))    # 1*4 + 2*5 + 3*6 = 32

# Determinant and inverse
print("\nDet(A):", np.linalg.det(A))   # 2*3 - 1*5 = 1; may show a tiny floating-point error
print("Inv(A):\n", np.linalg.inv(A))

# Solve system of equations: Ax = b
b = np.array([3, 10])
x = np.linalg.solve(A, b)
print("\nSolution x:", x)   # verify: A @ x should equal b
print("Verify A @ x:", A @ x)

A @ B:
[[ 5 8]
[14 22]]

Dot product: 32

Det(A): 1.0
Inv(A):
[[ 3. -1.]
[-5. 2.]]

Solution x: [-1.  5.]
Verify A @ x: [ 3. 10.]

Use @ for matrix multiplication — it is cleaner than np.matmul() and the standard in modern Python
np.linalg.solve(A, b) is numerically more stable than np.linalg.inv(A) @ b — prefer it for solving equations
Other useful linalg functions: np.linalg.norm (vector/matrix norm), np.linalg.eig (eigenvalues/vectors), np.linalg.svd (singular value decomposition)
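
The functions mentioned in these notes can be tried on the same matrix A. The sketch below compares the two ways of solving Ax = b and gives one small call each for norm, eig, and svd. The diagonal matrix D is an illustrative choice because its eigenvalues can be read off directly.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
b = np.array([3.0, 10.0])

# Two routes to the same solution; solve avoids forming the inverse explicitly
x_solve = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b
print(np.allclose(x_solve, x_inv))   # True
print(np.allclose(A @ x_solve, b))   # True

# Euclidean norm of a vector: sqrt(3^2 + 4^2)
length = np.linalg.norm(np.array([3.0, 4.0]))
print(length)                        # 5.0

# For a diagonal matrix the eigenvalues are the diagonal entries
D = np.diag([2.0, 3.0])
eigenvalues, eigenvectors = np.linalg.eig(D)
print(np.sort(eigenvalues))          # [2. 3.]

# Singular values come back sorted from largest to smallest
singular_values = np.linalg.svd(A, compute_uv=False)
print(singular_values)
```

Comparing floating-point results with np.allclose rather than == is the safer habit, since the two solution routes can differ in the last few digits.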

Summary Table

Concept	Key Functions	Purpose
Array creation	`array, zeros, ones, arange, linspace`	Build arrays from data or patterns
Indexing	`a[i], a[r,c], a[mask], a[[i,j]]`	Select elements by position, condition, or index list
Vectorised ops	`+, -, *, /, np.sin, np.sqrt`	Element-wise operations — no Python loop
Aggregation	`sum, mean, std, min, max, argmin, argmax`	Reduce arrays to summary statistics
Broadcasting	Automatic shape expansion	Operate on arrays of different shapes
Reshaping	`reshape, flatten, T, vstack, hstack`	Change array dimensions and combine arrays
Linear algebra	`@, np.dot, linalg.inv, linalg.solve`	Matrix operations for ML and science

Practice Questions

Practice 1. What is the key reason NumPy array operations are faster than Python list loops?

Practice 2. What does m[:, 1] select from a 2D array?

Practice 3. What does a.reshape(4, -1) do when a has 12 elements?

Practice 4. Why should you use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b?

Practice 5. What is the difference between a NumPy slice (view) and a copy?

Quiz

Quiz 1. What does np.linspace(0, 1, 5) return?

5 evenly spaced values from 0 to 1 inclusive: [0.0, 0.25, 0.5, 0.75, 1.0]
5 values from 0 to 1 exclusive: [0.0, 0.2, 0.4, 0.6, 0.8]
Integers 0 through 4
A range with step size 5

Quiz 2. What does boolean indexing a[a > 10] return?

A new array containing only the elements of a that are greater than 10
The indices of elements greater than 10
A boolean array of True/False values
The count of elements greater than 10

Quiz 3. What operator is used for matrix multiplication in modern NumPy and Python?

@ (the matmul operator)
* (element-wise multiply)
** (power operator)
x (cross product symbol)

Quiz 4. What does data.mean(axis=0) compute for a 2D array?

The mean of each column — axis=0 collapses rows
The mean of each row — axis=0 collapses columns
The overall mean of all elements
The mean of the first row only

Quiz 5. What is the broadcasting rule that allows a shape (3,) array to be added to a shape (3, 3) matrix?

The 1D array is broadcast across each row — treated as if it were copied into all 3 rows
NumPy raises a shape mismatch error
The 1D array is broadcast down each column
The matrix is flattened to 1D before addition

Next up — Machine Learning with Python: building and evaluating models using scikit-learn.
