Generative AI Course
GPT Architecture: Inside Modern Large Language Models
GPT models are built to generate language by predicting what comes next.
Unlike encoder-based models such as BERT, which read the whole input bidirectionally before producing a representation, GPT reads strictly left to right.
Understanding is not a separate step; it emerges as a by-product of learning to generate.
The Core Design Philosophy
GPT follows a simple but powerful rule:
Given everything so far, predict the next token.
This single objective drives all capabilities: reasoning, coding, summarization, and dialogue.
How GPT Processes Input
A GPT model receives:
- A sequence of tokens
- Position information
- A causal attention mask
There is no separation between input and output.
Everything is treated as one growing sequence.
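The three inputs above can be sketched in a few lines of PyTorch. This is a minimal illustration with made-up sizes (`vocab_size`, `max_len`, `dim` are illustrative, not from any real model); it shows token and position information being combined into the single growing sequence the blocks operate on.

```python
import torch
import torch.nn as nn

vocab_size, max_len, dim = 1000, 128, 64  # illustrative sizes, not a real model

tok_emb = nn.Embedding(vocab_size, dim)   # token embeddings
pos_emb = nn.Embedding(max_len, dim)      # learned position embeddings

tokens = torch.tensor([[5, 42, 7, 99]])                # (batch=1, seq=4)
positions = torch.arange(tokens.size(1)).unsqueeze(0)  # (1, 4)

# Input to the first block: token and position information summed elementwise
x = tok_emb(tokens) + pos_emb(positions)
print(x.shape)  # torch.Size([1, 4, 64])
```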
Causal Self-Attention Explained
Causal attention ensures the model never sees the future.
Each token attends only to:
- Itself
- All previous tokens
This is what makes GPT autoregressive.
Why This Matters
Because the model cannot cheat by looking ahead at future tokens, it must learn genuine language structure to predict well.
This makes generation coherent and controllable.
Thinking Before Writing Code
Before implementing GPT-style models, engineers decide:
- Maximum context window
- Embedding dimension
- Number of layers
- Number of attention heads
These choices directly affect: cost, latency, and model capability.
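These design choices can be captured in a small config object. The sketch below is hypothetical (`GPTConfig` and `rough_param_count` are illustrative names, not from any library); the default values roughly mirror GPT-2 small, and the back-of-envelope parameter count shows how the choices drive model size and therefore cost.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:  # hypothetical config container for illustration
    max_context: int = 1024   # maximum context window
    dim: int = 768            # embedding dimension
    n_layers: int = 12        # number of stacked blocks
    n_heads: int = 12         # attention heads per block
    vocab_size: int = 50257   # GPT-2 vocabulary size

def rough_param_count(cfg: GPTConfig) -> int:
    # Per block: attention projections (4 * dim^2) plus the
    # feed-forward layers (2 * 4 * dim^2); biases ignored
    per_block = 12 * cfg.dim ** 2
    embeddings = (cfg.vocab_size + cfg.max_context) * cfg.dim
    return cfg.n_layers * per_block + embeddings

print(f"{rough_param_count(GPTConfig()):,}")  # ~124M, close to GPT-2 small
```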
Minimal GPT Block Structure
This code shows the core building block used repeatedly in GPT.
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        # batch_first=True so inputs are (batch, seq, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x, mask):
        # Causal self-attention, then a residual add and normalization
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)
        # Position-wise feed-forward, with its own residual add
        # (post-norm shown for simplicity; production GPT models
        # typically apply LayerNorm before each sublayer instead)
        ff_out = self.ff(x)
        x = self.ln2(x + ff_out)
        return x
What Happens Inside This Block
Step-by-step behavior:
- Self-attention mixes past context
- Causal mask blocks future tokens
- Residual connections preserve information
- Feed-forward layers expand representation
This block is stacked dozens of times in real GPT models.
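Stacking can be sketched with an `nn.ModuleList`. The block definition is repeated here in compact form so the example is self-contained; the sizes (`dim`, `heads`, 4 layers) are illustrative only.

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):  # same block structure as above, condensed
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                nn.Linear(dim * 4, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, mask):
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)
        return self.ln2(x + self.ff(x))

dim, heads, seq_len = 64, 4, 8  # illustrative sizes
blocks = nn.ModuleList([GPTBlock(dim, heads) for _ in range(4)])  # a 4-layer stack
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

x = torch.randn(1, seq_len, dim)  # stand-in for embedded tokens
for block in blocks:              # the same block, applied repeatedly
    x = block(x, mask)
print(x.shape)  # torch.Size([1, 8, 64])
```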
Token Prediction Head
After passing through all layers, GPT maps hidden states to vocabulary logits.
logits = hidden_states @ vocab_embedding.T
Each logit is an unnormalized score for one vocabulary token; a softmax over the logits gives the probability of each token appearing next.
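The matrix product above can be demonstrated concretely. This sketch uses random tensors with illustrative sizes; reusing the embedding matrix as the output projection, as shown, is the weight-tying trick used by GPT-2.

```python
import torch

vocab_size, dim, seq_len = 1000, 64, 4          # illustrative sizes
vocab_embedding = torch.randn(vocab_size, dim)  # token embedding matrix
hidden_states = torch.randn(1, seq_len, dim)    # output of the final block

# Weight tying: the embedding matrix doubles as the output projection
logits = hidden_states @ vocab_embedding.T      # (1, seq_len, vocab_size)

# Only the last position's logits are needed to predict the next token
next_token_logits = logits[:, -1, :]
print(next_token_logits.shape)  # torch.Size([1, 1000])
```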
How Generation Happens
During inference:
- One token is predicted
- That token is appended
- The process repeats
This loop continues until stopping conditions are met.
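The loop above can be written in a few lines. The sketch below substitutes a stand-in `fake_model` (random logits over a tiny vocabulary) for a real GPT, and uses greedy decoding (always pick the top token) to keep the loop itself in focus; `eos_id` and `max_new` are assumed stopping conditions.

```python
import torch

def fake_model(tokens):
    # Stand-in for a full GPT: random logits over a 50-token vocabulary
    return torch.randn(tokens.size(0), tokens.size(1), 50)

tokens = torch.tensor([[1, 2, 3]])  # the prompt
eos_id, max_new = 0, 10             # assumed stopping conditions

for _ in range(max_new):                                    # the autoregressive loop
    logits = fake_model(tokens)                             # forward pass
    next_token = logits[:, -1, :].argmax(-1, keepdim=True)  # greedy pick
    tokens = torch.cat([tokens, next_token], dim=1)         # append and repeat
    if next_token.item() == eos_id:                         # stop on end-of-sequence
        break
print(tokens.shape)
```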
Sampling Strategies Matter
GPT does not always pick the highest-probability token.
Sampling strategies control creativity.
next_token = sample(logits, temperature=0.8, top_p=0.9)
Temperature and top-p affect diversity and coherence.
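The `sample` call shown above is not a built-in; a minimal implementation of temperature plus top-p (nucleus) sampling might look like the sketch below, written for a single 1-D logits vector to keep the indexing simple.

```python
import torch

def sample(logits, temperature=0.8, top_p=0.9):
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution
    probs = torch.softmax(logits / temperature, dim=-1)
    # Top-p: keep the smallest set of tokens whose cumulative mass exceeds top_p
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(-1)
    cutoff = cumulative - sorted_probs >= top_p  # tokens entirely past the nucleus
    sorted_probs[cutoff] = 0.0
    sorted_probs /= sorted_probs.sum()           # renormalize the kept tokens
    choice = torch.multinomial(sorted_probs, 1)  # sample within the nucleus
    return sorted_idx[choice]

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
token = sample(logits)  # a tensor holding one sampled token id
```

Lower `temperature` and lower `top_p` both concentrate sampling on the most likely tokens, trading diversity for coherence.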
Why GPT Is Good at Coding
Code is also a sequence.
GPT learns:
- Syntax patterns
- Logical structure
- Long-range dependencies
This enables autocomplete, refactoring, and debugging.
Real-World Systems Built on GPT
- Chat assistants
- Code copilots
- Document generators
- AI agents
Most modern GenAI products rely on this architecture.
Limitations to Be Aware Of
- Hallucinations
- Context length limits
- High compute cost
These limitations shape system design choices.
How Learners Should Practice
To internalize GPT architecture:
- Experiment with small models locally
- Visualize attention patterns
- Modify sampling parameters
Practice focuses on understanding behavior, not memorization.
Practice
What does GPT predict at each step?
What type of attention mask does GPT use?
GPT generation is described as what process?
Quick Quiz
GPT belongs to which architecture?
GPT primarily focuses on?
What controls creativity in GPT outputs?
Recap: GPT is a decoder-only architecture that uses causal attention to generate language token by token.
Next up: Tokenization — how text becomes numbers inside LLMs.