Python Course
Multiprocessing
In the last lesson you saw that Python's Global Interpreter Lock prevents threads from running CPU-bound code in true parallel. Multiprocessing is the solution. Instead of multiple threads sharing one process, multiprocessing spawns entirely separate processes — each with its own Python interpreter and memory space, each with its own GIL. True parallelism, full use of every CPU core.
This lesson covers Python's multiprocessing module, the ProcessPoolExecutor, inter-process communication, shared memory, and the practical patterns for squeezing maximum performance out of CPU-intensive work.
Processes vs Threads — The Key Differences
- Threads share memory within one process — lightweight, fast to create, limited by the GIL for CPU work
- Processes have completely separate memory — heavier to start, immune to the GIL, true parallelism on multiple cores
- Use threads for I/O-bound work: downloading files, making API calls, reading databases
- Use processes for CPU-bound work: image processing, number crunching, machine learning preprocessing, compression
Creating Processes
multiprocessing.Process works almost identically to threading.Thread — pass a target function, call start(), then join(). The critical difference is that each process runs in a completely separate memory space.
Real-world use: a video transcoding pipeline spawns one process per video file — each process runs on its own CPU core, transcoding all files simultaneously.
```python
# Creating processes — CPU-bound work in parallel
import multiprocessing
import time

def cpu_task(name, n):
    """Compute-heavy task — sum of squares."""
    result = sum(i * i for i in range(n))
    print(f"[{name}] Result: {result:,}")

if __name__ == "__main__":  # required guard on Windows and macOS
    n = 5_000_000

    # Sequential — one after another
    start = time.perf_counter()
    cpu_task("Task-1", n)
    cpu_task("Task-2", n)
    cpu_task("Task-3", n)
    print(f"Sequential: {time.perf_counter() - start:.2f}s\n")

    # Parallel — all three at once on separate CPU cores
    start = time.perf_counter()
    processes = [
        multiprocessing.Process(target=cpu_task, args=(f"Task-{i}", n))
        for i in range(1, 4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Parallel: {time.perf_counter() - start:.2f}s")
```
```
[Task-1] Result: 41,666,658,333,325,000
[Task-2] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Sequential: 4.82s

[Task-2] Result: 41,666,658,333,325,000
[Task-1] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Parallel: 1.74s
```
- The if __name__ == "__main__": guard is required on Windows and macOS — without it, spawning new processes causes infinite recursion
- Each process gets a full copy of the program's memory at the time of spawning — changes in one process do not affect others
- p.start() launches the process, p.join() waits for it to finish — same pattern as threads
- Spawning processes is slower than starting threads — only worthwhile for tasks that take more than a fraction of a second
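The guard exists because of how new processes are started. A minimal sketch, assuming nothing beyond the standard multiprocessing API (the work function is a stand-in):

```python
# Sketch: inspecting the start method — why the __main__ guard exists
import multiprocessing

def work(x):
    return x + 1  # trivial stand-in task

if __name__ == "__main__":
    # "spawn" (the default on Windows and macOS) launches a fresh interpreter
    # that re-imports this module — code outside the guard would re-run and
    # spawn processes recursively. "fork" (the historical Linux default)
    # copies the running parent instead.
    print(multiprocessing.get_start_method())

    ctx = multiprocessing.get_context("spawn")  # request spawn explicitly
    p = ctx.Process(target=work, args=(1,))
    p.start()
    p.join()
    print(p.exitcode)  # 0 on a clean exit
```

Requesting a specific start method via get_context keeps behavior consistent across operating systems.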
ProcessPoolExecutor — The Modern Approach
concurrent.futures.ProcessPoolExecutor is the high-level, recommended way to use multiprocessing. It manages a pool of worker processes, distributes work automatically, and collects results cleanly.
Real-world use: processing a dataset of 100,000 images — submit all resize operations to the pool and collect results as they finish, using all available CPU cores automatically.
```python
# ProcessPoolExecutor — high-level process pool
from concurrent.futures import ProcessPoolExecutor
import time

def is_prime(n):
    """CPU-bound — check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    """Count primes in a range."""
    return sum(1 for n in range(start, end) if is_prime(n))

if __name__ == "__main__":
    ranges = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(count_primes, s, e) for s, e in ranges]
        results = [f.result() for f in futures]
    total = sum(results)
    print(f"Primes under 1,000,000: {total:,}")
    print(f"Time: {time.perf_counter() - start:.2f}s")
```
```
Primes under 1,000,000: 78,498
Time: 0.81s
```
- ProcessPoolExecutor() with no arguments defaults to the number of CPU cores on the machine
- executor.submit(fn, *args) schedules work and returns a Future — identical API to ThreadPoolExecutor
- executor.map(fn, iterable) is a simpler form that returns results in submission order
- The with block waits for all tasks and shuts down the pool cleanly
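When tasks take different amounts of time, concurrent.futures.as_completed yields each Future the moment it finishes rather than in submission order. A minimal sketch reusing prime-counting functions like those above:

```python
# Sketch: handling results in completion order with as_completed
from concurrent.futures import ProcessPoolExecutor, as_completed

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    return sum(1 for n in range(start, end) if is_prime(n))

if __name__ == "__main__":
    ranges = [(0, 100_000), (100_000, 200_000), (200_000, 300_000)]
    with ProcessPoolExecutor() as executor:
        # map each future back to its input so results can be labeled
        future_to_range = {executor.submit(count_primes, s, e): (s, e)
                           for s, e in ranges}
        for fut in as_completed(future_to_range):  # fastest result first
            s, e = future_to_range[fut]
            print(f"{s:,}-{e:,}: {fut.result():,} primes")
```

The dict-of-futures pattern is the standard way to recover which input produced which result when completion order is unpredictable.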
Pool.map — Simple Parallel Mapping
For the common pattern of applying one function to many inputs, multiprocessing.Pool.map is the most concise tool. It splits the work across processes and collects results in the original order.
```python
# Pool.map — apply a function to many inputs in parallel
from multiprocessing import Pool
import time

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = list(range(1, 13))

    start = time.perf_counter()
    with Pool() as pool:
        results = pool.map(square, numbers)  # distributes across all cores
    print("Squares:", results)
    print(f"Time: {time.perf_counter() - start:.4f}s")

    # pool.starmap for functions that take multiple arguments
    pairs = [(2, 3), (4, 5), (6, 7)]
    with Pool() as pool:
        results = pool.starmap(pow, pairs)  # [8, 1024, 279936]
    print("Powers:", results)
```
```
Squares: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
Time: 0.0821s
Powers: [8, 1024, 279936]
```
- pool.map(fn, iterable) — apply fn to every item; results returned in order
- pool.starmap(fn, iterable_of_tuples) — like map but unpacks each tuple as separate arguments
- pool.imap(fn, iterable) — lazy version, yields results one at a time without loading all into memory
- Always use with Pool() as pool: to ensure the pool is properly terminated
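The lazy variant deserves its own sketch. pool.imap yields results as an iterator, and its chunksize parameter batches items per worker round-trip, which matters for large inputs (cube is a stand-in function):

```python
# Sketch: lazy results with Pool.imap and chunksize
from multiprocessing import Pool

def cube(n):
    return n ** 3

if __name__ == "__main__":
    with Pool() as pool:
        # chunksize=2 ships items to workers in batches of two,
        # reducing inter-process round-trips for long iterables
        for result in pool.imap(cube, range(5), chunksize=2):
            print(result)  # 0, 1, 8, 27, 64 — yielded lazily, in order
```

Unlike pool.map, nothing forces the whole result list into memory at once, so imap suits pipelines that process results as they stream in.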
Inter-Process Communication — Queue and Pipe
Processes cannot share variables directly — each has its own memory. To pass data between processes, use Queue (multi-producer, multi-consumer) or Pipe (two-way connection between exactly two processes).
```python
# Queue — safe communication between processes
from multiprocessing import Process, Queue

def producer(q, items):
    for item in items:
        q.put(item)
        print(f"Produced: {item}")
    q.put(None)  # sentinel — signals consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed: {item * 2}")

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q, [10, 20, 30, 40]))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
```
Produced: 10
Produced: 20
Produced: 30
Produced: 40
Consumed: 20
Consumed: 40
Consumed: 60
Consumed: 80
```
- Queue.put(item) adds to the queue; Queue.get() removes and returns the next item — both are process-safe
- Use a sentinel value (like None) to signal a consumer that no more items are coming
- multiprocessing.Pipe() returns a pair of connection objects for two-way communication between exactly two processes
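Pipe, mentioned above but not shown, can be sketched like this — each end is a connection object with send() and recv():

```python
# Sketch: two-way communication over a Pipe
from multiprocessing import Pipe, Process

def worker(conn):
    msg = conn.recv()        # block until the other end sends
    conn.send(msg.upper())   # reply over the same connection
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())  # HELLO
    p.join()
```

A Pipe is faster than a Queue for exactly two endpoints, but unlike Queue it is not safe for multiple processes writing to the same end concurrently.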
Shared Memory — Value and Array
For simple cases where processes need to share a single number or a fixed-size array, multiprocessing.Value and multiprocessing.Array provide shared memory with process-safe locking.
```python
# Shared memory — Value and Array
from multiprocessing import Process, Value, Array
import ctypes

def increment(counter, n):
    for _ in range(n):
        with counter.get_lock():  # process-safe lock
            counter.value += 1

if __name__ == "__main__":
    counter = Value(ctypes.c_int, 0)  # shared integer, starting at 0
    processes = [Process(target=increment, args=(counter, 50_000))
                 for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Final counter:", counter.value)  # always 200,000
```
- Value(typecode, initial) — shared scalar; use C type codes like ctypes.c_int, ctypes.c_double
- Array(typecode, size) — shared fixed-size array
- Always use value.get_lock() as a context manager when modifying shared values from multiple processes
- For anything more complex, use a Queue or a Manager instead
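For structures richer than a scalar or fixed-size array, a Manager provides proxy objects — dicts, lists, locks — that forward operations to a server process. A minimal sketch:

```python
# Sketch: a Manager-backed dict shared across processes
from multiprocessing import Manager, Process

def record(shared, key, value):
    shared[key] = value  # the proxy forwards this update to the manager

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()  # also available: manager.list(), manager.Lock()
        procs = [Process(target=record, args=(shared, f"p{i}", i * 10))
                 for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(shared))  # entries from all three workers
```

Manager proxies are slower than Value/Array because every access is a round-trip to the manager process, but they handle arbitrary picklable data.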
Choosing Between Threads and Processes
| Factor | Threads | Processes |
|---|---|---|
| Best for | I/O-bound (network, disk, DB) | CPU-bound (computation, image processing) |
| Memory | Shared — fast, but needs locking | Separate — safe, but needs IPC |
| GIL impact | Limited by GIL for CPU work | Each process has its own GIL — true parallelism |
| Startup cost | Fast — milliseconds | Slower — tens of milliseconds |
| Communication | Direct shared variables (with locks) | Queue, Pipe, Value, Array |
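The table can be condensed into a small dispatch helper — run_parallel is a hypothetical name for illustration, not a standard API:

```python
# Hypothetical helper: pick the pool class from the workload type
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_parallel(fn, items, cpu_bound):
    """Map fn over items with processes (CPU-bound) or threads (I/O-bound)."""
    pool_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    # With processes, fn must be picklable — i.e. a module-level function.
    with pool_cls() as executor:
        return list(executor.map(fn, items))

if __name__ == "__main__":
    print(run_parallel(abs, [-1, 2, -3], cpu_bound=False))  # [1, 2, 3]
```

Because both executors share the same Executor interface, switching between threads and processes is a one-line change.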
Practice Questions
Practice 1. Why does multiprocessing bypass the GIL where multithreading cannot?
Practice 2. Why is the if __name__ == "__main__": guard required when using multiprocessing?
Practice 3. What is the difference between pool.map() and pool.starmap()?
Practice 4. What tool do you use to safely pass data between two processes?
Practice 5. What lock method should you use when modifying a shared Value from multiple processes?
Quiz
Quiz 1. What is the key reason to use multiprocessing over multithreading for CPU-bound tasks?
Quiz 2. What does ProcessPoolExecutor() with no arguments default to?
Quiz 3. Why can processes not share variables directly the way threads can?
Quiz 4. What is the role of a sentinel value like None in a multiprocessing Queue?
Quiz 5. When would you choose threads over processes even for a performance-sensitive task?
Next up — Working with APIs: making HTTP requests, handling responses, and consuming REST APIs in Python.