Python Lesson 39 – Multiprocessing | Dataplexa

Multiprocessing

In the last lesson you saw that Python's Global Interpreter Lock prevents threads from running CPU-bound code in true parallel. Multiprocessing is the solution. Instead of multiple threads sharing one process, multiprocessing spawns entirely separate processes — each with its own Python interpreter and memory space, each with its own GIL. True parallelism, full use of every CPU core.

This lesson covers Python's multiprocessing module, the ProcessPoolExecutor, inter-process communication, shared memory, and the practical patterns for squeezing maximum performance out of CPU-intensive work.

Processes vs Threads — The Key Differences

  • Threads share memory within one process — lightweight, fast to create, limited by the GIL for CPU work
  • Processes have completely separate memory — heavier to start, immune to the GIL, true parallelism on multiple cores
  • Use threads for I/O-bound work: downloading files, making API calls, reading databases
  • Use processes for CPU-bound work: image processing, number crunching, machine learning preprocessing, compression

Creating Processes

multiprocessing.Process works almost identically to threading.Thread — pass a target function, call start(), then join(). The critical difference is that each process runs in a completely separate memory space.

Real-world use: a video transcoding pipeline spawns one process per video file — each process runs on its own CPU core, transcoding all files simultaneously.

# Creating processes — CPU-bound work in parallel

import multiprocessing
import time

def cpu_task(name, n):
    """Compute-heavy task — sum of squares."""
    result = sum(i * i for i in range(n))
    print(f"[{name}] Result: {result:,}")

if __name__ == "__main__":   # required guard on Windows and macOS
    n = 5_000_000

    # Sequential — one after another
    start = time.perf_counter()
    cpu_task("Task-1", n)
    cpu_task("Task-2", n)
    cpu_task("Task-3", n)
    print(f"Sequential: {time.perf_counter() - start:.2f}s\n")

    # Parallel — all three at once on separate CPU cores
    start = time.perf_counter()
    processes = [
        multiprocessing.Process(target=cpu_task, args=(f"Task-{i}", n))
        for i in range(1, 4)
    ]
    for p in processes: p.start()
    for p in processes: p.join()
    print(f"Parallel: {time.perf_counter() - start:.2f}s")
[Task-1] Result: 41,666,658,333,325,000
[Task-2] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Sequential: 4.82s

[Task-2] Result: 41,666,658,333,325,000
[Task-1] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Parallel: 1.74s
  • The if __name__ == "__main__": guard is required on Windows and macOS, where new processes are started with the spawn method — the child re-imports the main module, and without the guard it would try to spawn children of its own (Python detects this and raises a RuntimeError)
  • Under the fork start method (the traditional default on Linux) each child inherits a copy of the parent's memory at spawn time; under spawn the child starts fresh and re-imports the main module — either way, changes in one process do not affect the others
  • p.start() launches the process, p.join() waits for it to finish — same pattern as threads
  • Spawning processes is slower than threads — only worthwhile for tasks that take more than a fraction of a second
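The spawn-versus-fork distinction above can be inspected at runtime. A minimal sketch, using only the standard multiprocessing module (the printed method varies by platform and Python version):

```python
# Sketch: inspecting and selecting the process start method
import multiprocessing

if __name__ == "__main__":
    # 'spawn' on Windows and macOS; 'fork' or 'forkserver' on Linux,
    # depending on the Python version
    print(multiprocessing.get_start_method())

    # Request a specific method for one context without changing the default
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=print, args=("hello from a spawned child",))
    p.start()
    p.join()
```

get_context is useful in libraries: it scopes the start method to one context instead of mutating the process-wide default via set_start_method.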

ProcessPoolExecutor — The Modern Approach

concurrent.futures.ProcessPoolExecutor is the high-level, recommended way to use multiprocessing. It manages a pool of worker processes, distributes work automatically, and collects results cleanly.

Real-world use: processing a dataset of 100,000 images — submit all resize operations to the pool and collect results as they finish, using all available CPU cores automatically.

# ProcessPoolExecutor — high-level process pool

from concurrent.futures import ProcessPoolExecutor
import time

def is_prime(n):
    """CPU-bound — check if a number is prime."""
    if n < 2: return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    """Count primes in a range."""
    return sum(1 for n in range(start, end) if is_prime(n))

if __name__ == "__main__":
    ranges = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]

    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(count_primes, s, e) for s, e in ranges]
        results = [f.result() for f in futures]

    total = sum(results)
    print(f"Primes under 1,000,000: {total:,}")
    print(f"Time: {time.perf_counter() - start:.2f}s")
Primes under 1,000,000: 78,498
Time: 0.81s
  • ProcessPoolExecutor() with no argument defaults to the number of CPU cores on the machine
  • executor.submit(fn, *args) schedules work and returns a Future — identical API to ThreadPoolExecutor
  • executor.map(fn, iterable) is a simpler form that returns results in submission order
  • The with block waits for all tasks and shuts down the pool cleanly
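When completion order matters more than submission order, as_completed yields each Future as soon as it finishes. A short sketch, with cube as a made-up helper (not part of the lesson's code):

```python
# Sketch: executor.map vs as_completed (cube is an illustrative helper)
from concurrent.futures import ProcessPoolExecutor, as_completed

def cube(n):
    return n ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # map: results come back in submission order
        print(list(executor.map(cube, [1, 2, 3, 4])))   # [1, 8, 27, 64]

        # as_completed: results come back as each task finishes
        futures = {executor.submit(cube, n): n for n in (5, 6, 7)}
        for fut in as_completed(futures):
            print(f"cube({futures[fut]}) = {fut.result()}")
```

The futures dict maps each Future back to its input, a common pattern for labeling out-of-order results.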

Pool.map — Simple Parallel Mapping

For the common pattern of applying one function to many inputs, multiprocessing.Pool.map is the most concise tool. It splits the work across processes and collects results in the original order.

# Pool.map — apply a function to many inputs in parallel

from multiprocessing import Pool
import time

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = list(range(1, 13))

    start = time.perf_counter()
    with Pool() as pool:
        results = pool.map(square, numbers)   # distributes across all cores
    print("Squares:", results)
    print(f"Time: {time.perf_counter() - start:.4f}s")

    # pool.starmap for functions that take multiple arguments
    pairs = [(2, 3), (4, 5), (6, 7)]
    with Pool() as pool:
        results = pool.starmap(pow, pairs)   # [8, 1024, 279936]
    print("Powers:", results)
Squares: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
Time: 0.0821s
Powers: [8, 1024, 279936]
  • pool.map(fn, iterable) — apply fn to every item; results returned in order
  • pool.starmap(fn, iterable_of_tuples) — like map but unpacks each tuple as separate arguments
  • pool.imap(fn, iterable) — lazy version, yields results one at a time without loading all into memory
  • Always use with Pool() as pool: to ensure the pool is properly terminated
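The laziness of pool.imap is easiest to see with a loop that consumes one result at a time. A minimal sketch (double is an illustrative worker, not from the lesson):

```python
# Sketch: pool.imap yields results lazily, in input order
from multiprocessing import Pool

def double(n):
    return n * 2

if __name__ == "__main__":
    with Pool(2) as pool:
        # Unlike pool.map, imap does not build the full result list first —
        # each result is yielded as soon as it is ready, in order
        for result in pool.imap(double, range(5)):
            print(result)   # 0, 2, 4, 6, 8 — one per line
```

For very long iterables, imap's chunksize argument reduces per-item IPC overhead by sending work to the pool in batches.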

Inter-Process Communication — Queue and Pipe

Processes cannot share variables directly — each has its own memory. To pass data between processes, use Queue (multi-producer, multi-consumer) or Pipe (two-way connection between exactly two processes).

# Queue — safe communication between processes

from multiprocessing import Process, Queue

def producer(q, items):
    for item in items:
        q.put(item)
        print(f"Produced: {item}")
    q.put(None)   # sentinel — signals consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed: {item * 2}")

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q, [10, 20, 30, 40]))
    p2 = Process(target=consumer, args=(q,))

    p1.start()
    p2.start()
    p1.join()
    p2.join()
Produced: 10
Produced: 20
Produced: 30
Produced: 40
Consumed: 20
Consumed: 40
Consumed: 60
Consumed: 80
  • Queue.put(item) adds to the queue; Queue.get() removes and returns the next item — both are process-safe
  • Use a sentinel value (like None) to signal a consumer that no more items are coming
  • multiprocessing.Pipe() returns a pair of connection objects for two-way communication between exactly two processes
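Since Pipe is only mentioned above, here is a minimal sketch of the two-endpoint pattern (worker is a hypothetical function name):

```python
# Sketch: Pipe() returns two connected endpoints for exactly two processes
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()          # blocks until the parent sends something
    conn.send(msg.upper())     # send a reply back through the same pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())   # HELLO
    p.join()
```

By default the pipe is duplex: both endpoints can send and receive, which is what lets the child reply on the same connection.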

Shared Memory — Value and Array

For simple cases where processes need to share a single number or a fixed-size array, multiprocessing.Value and multiprocessing.Array provide shared memory with process-safe locking.

# Shared memory — Value and Array

from multiprocessing import Process, Value, Array
import ctypes

def increment(counter, n):
    for _ in range(n):
        with counter.get_lock():   # process-safe lock
            counter.value += 1

if __name__ == "__main__":
    counter = Value(ctypes.c_int, 0)   # shared integer, starting at 0
    processes = [Process(target=increment, args=(counter, 50_000)) for _ in range(4)]

    for p in processes: p.start()
    for p in processes: p.join()

    print("Final counter:", counter.value)   # always 200,000
Final counter: 200000
  • Value(typecode, initial) — shared scalar; use C type codes like ctypes.c_int, ctypes.c_double
  • Array(typecode, size) — shared fixed-size array
  • Always use value.get_lock() as a context manager when modifying shared values from multiple processes
  • For anything more complex, use a Queue or Manager instead
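For the richer shared state the last bullet mentions, a Manager serves ordinary Python objects (dicts, lists) from a background process. A minimal sketch, with record as an illustrative function name:

```python
# Sketch: Manager shares higher-level objects (here a dict) between processes
from multiprocessing import Process, Manager

def record(shared, key, value):
    shared[key] = value   # proxied write, forwarded to the manager process

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        processes = [
            Process(target=record, args=(shared, f"worker-{i}", i * 10))
            for i in range(3)
        ]
        for p in processes: p.start()
        for p in processes: p.join()
        print(sorted(shared.items()))   # [('worker-0', 0), ('worker-1', 10), ('worker-2', 20)]
```

Manager objects are slower than Value/Array because every access is a round trip to the manager process, but they handle arbitrary keys and nesting that raw shared memory cannot.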

Choosing Between Threads and Processes

Factor         | Threads                                | Processes
Best for       | I/O-bound (network, disk, DB)          | CPU-bound (computation, image processing)
Memory         | Shared — fast, but needs locking       | Separate — safe, but needs IPC
GIL impact     | Limited by GIL for CPU work            | Each process has its own GIL — true parallelism
Startup cost   | Fast — milliseconds                    | Slower — tens of milliseconds
Communication  | Direct shared variables (with locks)   | Queue, Pipe, Value, Array

Practice Questions

Practice 1. Why does multiprocessing bypass the GIL where multithreading cannot?



Practice 2. Why is the if __name__ == "__main__": guard required when using multiprocessing?



Practice 3. What is the difference between pool.map() and pool.starmap()?



Practice 4. What tool do you use to safely pass data between two processes?



Practice 5. What lock method should you use when modifying a shared Value from multiple processes?



Quiz

Quiz 1. What is the key reason to use multiprocessing over multithreading for CPU-bound tasks?






Quiz 2. What does ProcessPoolExecutor() with no arguments default to?






Quiz 3. Why can processes not share variables directly the way threads can?






Quiz 4. What is the role of a sentinel value like None in a multiprocessing Queue?






Quiz 5. When would you choose threads over processes even for a performance-sensitive task?






Next up — Working with APIs: making HTTP requests, handling responses, and consuming REST APIs in Python.