GenAI Lesson 9 – Compute Infra | Dataplexa

Compute Infrastructure for Generative AI

Generative AI does not fail because of bad models. It fails when compute is misunderstood.

Most GenAI problems in production are not algorithmic — they are infrastructure problems.

To build real GenAI systems, you must understand how computation, memory, and data movement work together.

Why Compute Matters in GenAI

Generative models perform billions of mathematical operations to produce a single response.

Without the right hardware and system design, even the best model becomes unusable.

This is why compute is a first-class design decision, not an implementation detail.

Core Compute Components

A GenAI system depends on three primary resources:

  • Processing units (CPUs, GPUs, TPUs)
  • Memory (RAM, VRAM)
  • Storage and data pipelines

Each plays a distinct role.

CPUs vs GPUs: Why CPUs Are Not Enough

CPUs are designed for sequential logic and branching.

GenAI workloads require massive parallel computation.

That is where GPUs come in.

Thinking Before Coding

Ask yourself:

Why multiply one number at a time when you can multiply thousands simultaneously?

This is the mental shift GPUs enable.

Sequential vs Parallel Computation


# Sequential computation (CPU-like)

result = []
for i in range(5):
    result.append(i * 2)

print(result)
  

This loop processes one element at a time.

[0, 2, 4, 6, 8]

Now consider parallel thinking.


# Vectorized computation (GPU-like concept)

import numpy as np

data = np.array([0, 1, 2, 3, 4])
result = data * 2

print(result)
  

Here, the operation is expressed over the whole array at once. That is exactly the style of computation a GPU runs in parallel across thousands of cores.

[0 2 4 6 8]

GenAI models rely heavily on this parallelism.

Why GPUs Are Essential for Training

During training, models perform:

  • Large matrix multiplications
  • Backpropagation across billions of parameters
  • Gradient updates at scale

GPUs are optimized for these operations.

Training large models on CPUs would take years.
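The gap is easy to feel even on a CPU. The sketch below is illustrative only: it compares a pure-Python triple-loop multiply of two small 200×200 matrices with NumPy's vectorized `@` operator, which dispatches to optimized parallel kernels.

```python
# Pure-Python matrix multiply vs. NumPy's vectorized version.
import time
import numpy as np

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# Sequential: one multiply-add at a time, CPU-style
start = time.perf_counter()
c_loop = [[sum(a[i, k] * b[k, j] for k in range(n)) for j in range(n)]
          for i in range(n)]
loop_time = time.perf_counter() - start

# Vectorized: the whole product in one call to an optimized kernel
start = time.perf_counter()
c_vec = a @ b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.2f}s  vectorized: {vec_time:.4f}s")
```

On typical hardware the vectorized version is orders of magnitude faster, and a GPU widens the gap further by running the same kind of kernel across thousands of cores.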

Memory: The Hidden Bottleneck

Compute power alone is not enough.

If the model does not fit into memory, it cannot run efficiently.

Types of Memory in GenAI

  • System RAM (CPU memory)
  • VRAM (GPU memory)
  • Disk storage (datasets, checkpoints)

Large language models can require tens of gigabytes of VRAM.
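A rough back-of-envelope calculation makes this concrete. The sketch below counts only the memory for the weights themselves (activations and attention caches add more on top); the function name and the 7-billion-parameter figure are purely illustrative.

```python
# Back-of-envelope VRAM needed just to hold model weights.
# Activations and attention caches require additional memory.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Gigabytes needed to store the weights alone."""
    return num_params * bytes_per_param / 1e9

# A hypothetical 7-billion-parameter model:
print(weight_memory_gb(7e9, 4))   # 28.0 GB in float32
print(weight_memory_gb(7e9, 2))   # 14.0 GB in float16
```

This is why reduced-precision formats such as float16 matter: halving the bytes per parameter halves the VRAM needed for the weights.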

Why VRAM Matters

Generating each token requires keeping the following in VRAM:

  • Model weights
  • Intermediate activations
  • Attention caches

If VRAM runs out, the system must spill to slower memory or fail outright, and performance collapses.

Inference Compute Is Different

Training and inference have different compute goals.

Inference prioritizes:

  • Low latency
  • High throughput
  • Cost efficiency

This is why inference optimization techniques exist.

Batching: Using Compute Efficiently

Instead of processing one request at a time, systems batch multiple requests together.

Batching Concept Example


# Simulating batched inputs

inputs = ["Hello", "Explain AI", "Write code"]

# All three requests travel through the model in one forward pass
print(len(inputs))

By batching, the GPU works at full capacity.

3

Batching is one of the biggest cost optimizations in production.
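As a compute pattern, batching turns many small matrix-vector products into one large matrix-matrix product that keeps the hardware saturated. The sketch below illustrates the idea for a single hypothetical layer; the shapes and sizes are assumptions, not real model dimensions.

```python
# Batching sketch: 8 requests through one layer, individually vs. stacked.
import numpy as np

hidden = 512
weights = np.random.rand(hidden, hidden)          # one layer's weights
requests = [np.random.rand(hidden) for _ in range(8)]

# One request at a time: 8 separate matrix-vector products
one_by_one = [weights @ r for r in requests]

# Batched: stack the requests and do a single product
batch = np.stack(requests)        # shape (8, hidden)
batched = batch @ weights.T       # shape (8, hidden), same math

print(np.allclose(one_by_one, batched))  # identical results
```

The batched form does the same arithmetic, but as one large kernel launch instead of eight small ones, which is what lets the GPU run at full capacity.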

Distributed Compute

Single machines are often not enough.

Large models are trained and served across multiple devices, using several parallelism strategies:

  • Data parallelism
  • Model parallelism
  • Pipeline parallelism

These techniques split computation intelligently.
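Data parallelism, the simplest of the three, can be sketched in a few lines. This is a minimal illustration for a toy linear model, assuming mean-squared-error loss and two simulated "devices"; real systems use frameworks that perform the gradient averaging (an all-reduce) across actual GPUs.

```python
# Data parallelism sketch: split the batch, compute gradients locally,
# then average them (the role of the all-reduce step).
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 4))     # batch of 8 examples, 4 features
y = rng.random(8)
w = np.zeros(4)            # toy linear model parameters

def grad(xb, yb, w):
    # Gradient of mean squared error for predictions xb @ w
    err = xb @ w - yb
    return 2 * xb.T @ err / len(yb)

full = grad(x, y, w)                            # one-device gradient
local = [grad(x[:4], y[:4], w),                 # "device 0"
         grad(x[4:], y[4:], w)]                 # "device 1"
averaged = (local[0] + local[1]) / 2            # simulated all-reduce

print(np.allclose(full, averaged))  # True: same gradient, split work
```

Because the averaged gradient matches the full-batch gradient, each device only needs to hold and process half the data, which is exactly how training scales beyond one machine.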

Cloud vs On-Premise Compute

Most GenAI systems run in the cloud due to:

  • Elastic scaling
  • Access to powerful GPUs
  • Managed infrastructure

However, cost control becomes critical at scale.

Why Engineers Must Understand Compute

When GenAI systems fail, engineers ask:

  • Is this a memory issue?
  • Is the GPU saturated?
  • Is batching configured correctly?

Understanding compute allows faster debugging and better system design.

Practice

Which hardware is best suited for parallel matrix operations?



Which memory stores model weights during inference?



What technique improves throughput by processing multiple requests together?



Quick Quiz

Why are GPUs preferred for GenAI workloads?





Which phase prioritizes latency and cost?





Which resource often becomes the first bottleneck?





Recap: Compute infrastructure determines whether GenAI systems are scalable, affordable, and reliable.

Next up: We move from hardware to applications — where GenAI creates real business value.