Python Lesson 39 – Multiprocessing | Dataplexa

Multiprocessing

In the last lesson you saw that Python's Global Interpreter Lock prevents threads from running CPU-bound code in true parallel. Multiprocessing is the solution. Instead of multiple threads sharing one process, multiprocessing spawns entirely separate processes — each with its own Python interpreter and memory space, each with its own GIL. True parallelism, full use of every CPU core.

This lesson covers Python's multiprocessing module, the ProcessPoolExecutor, inter-process communication, shared memory, and the practical patterns for squeezing maximum performance out of CPU-intensive work.

Processes vs Threads — The Key Differences

  • Threads share memory within one process — lightweight, fast to create, limited by the GIL for CPU work
  • Processes have completely separate memory — heavier to start, immune to the GIL, true parallelism on multiple cores
  • Use threads for I/O-bound work: downloading files, making API calls, reading databases
  • Use processes for CPU-bound work: image processing, number crunching, machine learning preprocessing, compression

Creating Processes

multiprocessing.Process works almost identically to threading.Thread — pass a target function, call start(), then join(). The critical difference is that each process runs in a completely separate memory space.

Real-world use: a video transcoding pipeline spawns one process per video file — each process runs on its own CPU core, transcoding all files simultaneously.

# Creating processes — CPU-bound work in parallel

import multiprocessing
import time

def cpu_task(name, n):
    """Compute-heavy task — sum of squares."""
    result = sum(i * i for i in range(n))
    print(f"[{name}] Result: {result:,}")

if __name__ == "__main__":   # required guard on Windows and macOS
    n = 5_000_000

    # Sequential — one after another
    start = time.perf_counter()
    cpu_task("Task-1", n)
    cpu_task("Task-2", n)
    cpu_task("Task-3", n)
    print(f"Sequential: {time.perf_counter() - start:.2f}s\n")

    # Parallel — all three at once on separate CPU cores
    start = time.perf_counter()
    processes = [
        multiprocessing.Process(target=cpu_task, args=(f"Task-{i}", n))
        for i in range(1, 4)
    ]
    for p in processes: p.start()
    for p in processes: p.join()
    print(f"Parallel: {time.perf_counter() - start:.2f}s")
[Task-1] Result: 41,666,658,333,325,000
[Task-2] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Sequential: 4.82s

[Task-2] Result: 41,666,658,333,325,000
[Task-1] Result: 41,666,658,333,325,000
[Task-3] Result: 41,666,658,333,325,000
Parallel: 1.74s
  • The if __name__ == "__main__": guard is required on Windows and macOS, where new processes are started with the spawn method — the child re-imports the main module, and without the guard it would try to spawn children of its own (Python detects this and raises a RuntimeError)
  • Under the fork start method (the traditional default on Linux) each child inherits a copy of the parent's memory at spawn time; under spawn the child starts fresh and re-imports the main module — either way, changes in one process do not affect the others
  • p.start() launches the process, p.join() waits for it to finish — same pattern as threads
  • Spawning processes is slower than threads — only worthwhile for tasks that take more than a fraction of a second
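The spawn-versus-fork distinction above can be inspected at runtime. A minimal sketch, using only the standard multiprocessing module (the printed method varies by platform and Python version):

```python
# Sketch: inspecting and selecting the process start method
import multiprocessing

if __name__ == "__main__":
    # 'spawn' on Windows and macOS; 'fork' or 'forkserver' on Linux,
    # depending on the Python version
    print(multiprocessing.get_start_method())

    # Request a specific method for one context without changing the default
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=print, args=("hello from a spawned child",))
    p.start()
    p.join()
```

get_context is useful in libraries: it scopes the start method to one context instead of mutating the process-wide default via set_start_method.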

ProcessPoolExecutor — The Modern Approach

concurrent.futures.ProcessPoolExecutor is the high-level, recommended way to use multiprocessing. It manages a pool of worker processes, distributes work automatically, and collects results cleanly.

Real-world use: processing a dataset of 100,000 images — submit all resize operations to the pool and collect results as they finish, using all available CPU cores automatically.

# ProcessPoolExecutor — high-level process pool

from concurrent.futures import ProcessPoolExecutor
import time

def is_prime(n):
    """CPU-bound — check if a number is prime."""
    if n < 2: return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    """Count primes in a range."""
    return sum(1 for n in range(start, end) if is_prime(n))

if __name__ == "__main__":
    ranges = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]

    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(count_primes, s, e) for s, e in ranges]
        results = [f.result() for f in futures]

    total = sum(results)
    print(f"Primes under 1,000,000: {total:,}")
    print(f"Time: {time.perf_counter() - start:.2f}s")
Primes under 1,000,000: 78,498
Time: 0.81s
  • ProcessPoolExecutor() with no argument defaults to the number of CPU cores on the machine
  • executor.submit(fn, *args) schedules work and returns a Future — identical API to ThreadPoolExecutor
  • executor.map(fn, iterable) is a simpler form that returns results in submission order
  • The with block waits for all tasks and shuts down the pool cleanly
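When completion order matters more than submission order, as_completed yields each Future as soon as it finishes. A short sketch, with cube as a made-up helper (not part of the lesson's code):

```python
# Sketch: executor.map vs as_completed (cube is an illustrative helper)
from concurrent.futures import ProcessPoolExecutor, as_completed

def cube(n):
    return n ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # map: results come back in submission order
        print(list(executor.map(cube, [1, 2, 3, 4])))   # [1, 8, 27, 64]

        # as_completed: results come back as each task finishes
        futures = {executor.submit(cube, n): n for n in (5, 6, 7)}
        for fut in as_completed(futures):
            print(f"cube({futures[fut]}) = {fut.result()}")
```

The futures dict maps each Future back to its input, a common pattern for labeling out-of-order results.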

Pool.map — Simple Parallel Mapping

For the common pattern of applying one function to many inputs, multiprocessing.Pool.map is the most concise tool. It splits the work across processes and collects results in the original order.

# Pool.map — apply a function to many inputs in parallel

from multiprocessing import Pool
import time

def square(n):
    return n * n

if __name__ == "__main__":
    numbers = list(range(1, 13))

    start = time.perf_counter()
    with Pool() as pool:
        results = pool.map(square, numbers)   # distributes across all cores
    print("Squares:", results)
    print(f"Time: {time.perf_counter() - start:.4f}s")

    # pool.starmap for functions that take multiple arguments
    pairs = [(2, 3), (4, 5), (6, 7)]
    with Pool() as pool:
        results = pool.starmap(pow, pairs)   # [8, 1024, 279936]
    print("Powers:", results)
Squares: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
Time: 0.0821s
Powers: [8, 1024, 279936]
  • pool.map(fn, iterable) — apply fn to every item; results returned in order
  • pool.starmap(fn, iterable_of_tuples) — like map but unpacks each tuple as separate arguments
  • pool.imap(fn, iterable) — lazy version, yields results one at a time without loading all into memory
  • Always use with Pool() as pool: to ensure the pool is properly terminated
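The laziness of pool.imap is easiest to see with a loop that consumes one result at a time. A minimal sketch (double is an illustrative worker, not from the lesson):

```python
# Sketch: pool.imap yields results lazily, in input order
from multiprocessing import Pool

def double(n):
    return n * 2

if __name__ == "__main__":
    with Pool(2) as pool:
        # Unlike pool.map, imap does not build the full result list first —
        # each result is yielded as soon as it is ready, in order
        for result in pool.imap(double, range(5)):
            print(result)   # 0, 2, 4, 6, 8 — one per line
```

For very long iterables, imap's chunksize argument reduces per-item IPC overhead by sending work to the pool in batches.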

Inter-Process Communication — Queue and Pipe

Processes cannot share variables directly — each has its own memory. To pass data between processes, use Queue (multi-producer, multi-consumer) or Pipe (two-way connection between exactly two processes).

# Queue — safe communication between processes

from multiprocessing import Process, Queue

def producer(q, items):
    for item in items:
        q.put(item)
        print(f"Produced: {item}")
    q.put(None)   # sentinel — signals consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Consumed: {item * 2}")

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q, [10, 20, 30, 40]))
    p2 = Process(target=consumer, args=(q,))

    p1.start()
    p2.start()
    p1.join()
    p2.join()
Produced: 10
Produced: 20
Produced: 30
Produced: 40
Consumed: 20
Consumed: 40
Consumed: 60
Consumed: 80
  • Queue.put(item) adds to the queue; Queue.get() removes and returns the next item — both are process-safe
  • Use a sentinel value (like None) to signal a consumer that no more items are coming
  • multiprocessing.Pipe() returns a pair of connection objects for two-way communication between exactly two processes
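Since Pipe is only mentioned above, here is a minimal sketch of the two-endpoint pattern (worker is a hypothetical function name):

```python
# Sketch: Pipe() returns two connected endpoints for exactly two processes
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()          # blocks until the parent sends something
    conn.send(msg.upper())     # send a reply back through the same pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())   # HELLO
    p.join()
```

By default the pipe is duplex: both endpoints can send and receive, which is what lets the child reply on the same connection.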

Shared Memory — Value and Array

For simple cases where processes need to share a single number or a fixed-size array, multiprocessing.Value and multiprocessing.Array provide shared memory with process-safe locking.

# Shared memory — Value and Array

from multiprocessing import Process, Value, Array
import ctypes

def increment(counter, n):
    for _ in range(n):
        with counter.get_lock():   # process-safe lock
            counter.value += 1

if __name__ == "__main__":
    counter = Value(ctypes.c_int, 0)   # shared integer, starting at 0
    processes = [Process(target=increment, args=(counter, 50_000)) for _ in range(4)]

    for p in processes: p.start()
    for p in processes: p.join()

    print("Final counter:", counter.value)   # always 200,000
Final counter: 200000
  • Value(typecode, initial) — shared scalar; use C type codes like ctypes.c_int, ctypes.c_double
  • Array(typecode, size) — shared fixed-size array
  • Always use value.get_lock() as a context manager when modifying shared values from multiple processes
  • For anything more complex, use a Queue or Manager instead
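For the richer shared state the last bullet mentions, a Manager serves ordinary Python objects (dicts, lists) from a background process. A minimal sketch, with record as an illustrative function name:

```python
# Sketch: Manager shares higher-level objects (here a dict) between processes
from multiprocessing import Process, Manager

def record(shared, key, value):
    shared[key] = value   # proxied write, forwarded to the manager process

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        processes = [
            Process(target=record, args=(shared, f"worker-{i}", i * 10))
            for i in range(3)
        ]
        for p in processes: p.start()
        for p in processes: p.join()
        print(sorted(shared.items()))   # [('worker-0', 0), ('worker-1', 10), ('worker-2', 20)]
```

Manager objects are slower than Value/Array because every access is a round trip to the manager process, but they handle arbitrary keys and nesting that raw shared memory cannot.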

Choosing Between Threads and Processes

Factor         | Threads                                | Processes
Best for       | I/O-bound (network, disk, DB)          | CPU-bound (computation, image processing)
Memory         | Shared — fast, but needs locking       | Separate — safe, but needs IPC
GIL impact     | Limited by GIL for CPU work            | Each process has its own GIL — true parallelism
Startup cost   | Fast — milliseconds                    | Slower — tens of milliseconds
Communication  | Direct shared variables (with locks)   | Queue, Pipe, Value, Array

Practice Questions

Practice 1. Why does multiprocessing bypass the GIL where multithreading cannot?



Practice 2. Why is the if __name__ == "__main__": guard required when using multiprocessing?



Practice 3. What is the difference between pool.map() and pool.starmap()?



Practice 4. What tool do you use to safely pass data between two processes?



Practice 5. What lock method should you use when modifying a shared Value from multiple processes?



Quiz

Quiz 1. What is the key reason to use multiprocessing over multithreading for CPU-bound tasks?






Quiz 2. What does ProcessPoolExecutor() with no arguments default to?






Quiz 3. Why can processes not share variables directly the way threads can?






Quiz 4. What is the role of a sentinel value like None in a multiprocessing Queue?






Quiz 5. When would you choose threads over processes even for a performance-sensitive task?






Next up — Working with APIs: making HTTP requests, handling responses, and consuming REST APIs in Python.