Generative AI Course
Variational Autoencoders (VAE)
In the previous lesson, you learned how autoencoders compress data and reconstruct it through a bottleneck.
That works well for learning representations, but it still leaves one major limitation:
Autoencoders do not truly generate new data.
They reconstruct what they have already seen.
To move from compression to generation, we need a probabilistic approach. This is where Variational Autoencoders come in.
The Core Problem VAEs Solve
Imagine you want to:
- Generate new images
- Create new samples similar to training data
- Explore a smooth latent space
A standard autoencoder cannot do this reliably.
Its latent space is unordered and discontinuous, which makes sampling unpredictable.
VAEs solve this by forcing the latent space to follow a structured probability distribution.
How Engineers Think About VAEs
Engineers do not start by saying:
“Let’s add probability because it sounds advanced.”
They ask:
How can we sample new points and still get meaningful outputs?
VAEs answer this by learning a distribution instead of a single point.
Key Difference: Autoencoder vs VAE
The most important mental shift is this:
- Autoencoder → deterministic encoding
- VAE → probabilistic encoding
Instead of mapping input → one latent vector, VAEs map input → a distribution defined by a mean and variance.
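To make the contrast concrete, here is a toy sketch of the two kinds of encoding. The layer names and sizes are illustrative, not from a real model:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 784)

# Autoencoder: one input -> one fixed latent vector
ae_encoder = nn.Linear(784, 32)
z_ae = ae_encoder(x)  # always the same z for the same x

# VAE: one input -> a distribution (mu, sigma); each pass samples a different z
fc_mu, fc_logvar = nn.Linear(784, 32), nn.Linear(784, 32)
mu, logvar = fc_mu(x), fc_logvar(x)
z1 = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
z2 = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
# z1 and z2 differ, but both are draws from the same learned distribution
```

The deterministic encoder can only reproduce; the probabilistic one gives you infinitely many plausible latents per input.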
VAE Architecture Overview
A VAE still has an encoder and decoder, but the encoder now outputs:
- Mean (μ)
- Log variance (log σ²)
From these, the model samples a latent vector.
This sampling step is what enables generation.
Why Sampling Is Tricky
Sampling is a stochastic operation, so it is not differentiable with respect to μ and σ, which breaks backpropagation through the encoder.
VAEs solve this using the reparameterization trick.
This is a critical concept for GenAI, so pay close attention to the logic.
Implementing a Simple VAE
Before coding, define the goal clearly:
We want to encode inputs into a smooth latent distribution and generate new samples from it.
import torch
import torch.nn as nn
We will extend what you already know from autoencoders.
Defining the Encoder
Notice that the encoder now produces two outputs.
class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc_mu = nn.Linear(256, 32)
        self.fc_logvar = nn.Linear(256, 32)
        self.fc2 = nn.Linear(32, 256)
        self.fc3 = nn.Linear(256, 784)
Here:
- fc_mu learns the mean (μ)
- fc_logvar learns the log variance (log σ²)
Reparameterization Trick
Instead of sampling directly, we sample noise and transform it.
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
What is happening internally:
- Noise is sampled independently
- Distribution structure is preserved
- Gradients can still flow
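You can verify the gradient flow directly. In this minimal check, μ and log σ² are standalone tensors standing in for the encoder outputs:

```python
import torch

mu = torch.zeros(4, requires_grad=True)
logvar = torch.zeros(4, requires_grad=True)

std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)  # noise sampled independently of the parameters
z = mu + eps * std           # deterministic transform: gradients reach mu and logvar

z.sum().backward()
print(mu.grad)  # tensor([1., 1., 1., 1.])
```

The randomness lives entirely in eps, so backpropagation treats the sample as a smooth function of μ and σ.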
Forward Pass Logic
The forward pass connects all parts together.
    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        h2 = torch.relu(self.fc2(z))
        return self.fc3(h2), mu, logvar
At this stage, the model can both:
- Reconstruct inputs
- Generate new samples
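The pieces above can be assembled into one runnable model. The decode helper below is not in the lesson snippets; it is added here so that generating from the prior is explicit:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc_mu = nn.Linear(256, 32)
        self.fc_logvar = nn.Linear(256, 32)
        self.fc2 = nn.Linear(32, 256)
        self.fc3 = nn.Linear(256, 784)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return self.fc3(torch.relu(self.fc2(z)))

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

model = VAE()

# Reconstruction path: encode, sample, decode
x = torch.rand(8, 784)
recon, mu, logvar = model(x)
print(recon.shape, mu.shape)  # torch.Size([8, 784]) torch.Size([8, 32])

# Generation path: sample straight from the prior N(0, I) and decode
z = torch.randn(8, 32)
samples = model.decode(z)
print(samples.shape)  # torch.Size([8, 784])
```

The generation path never touches the encoder, which is exactly what a plain autoencoder cannot offer reliably.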
Training Objective
VAEs use a combined loss function:
- Reconstruction loss
- KL divergence
KL divergence forces the latent distribution to stay close to a standard normal distribution, N(0, I).
This is what enables smooth interpolation and sampling.
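A minimal sketch of the combined objective, assuming MSE reconstruction and the closed-form KL against N(0, I) (the standard VAE choices; the lesson code does not include a loss function):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: how closely the decoder output matches the input
    recon = F.mse_loss(recon_x, x, reduction="sum")
    # KL term: closed form for KL(N(mu, sigma^2) || N(0, I))
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Sanity check: with mu = 0 and logvar = 0 the latent already matches N(0, I),
# so the KL term is exactly zero
mu = torch.zeros(8, 32)
logvar = torch.zeros(8, 32)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
print(kl)  # tensor(0.)
```

In practice the two terms are often re-weighted; a heavier KL term pushes toward a smoother latent space at the cost of reconstruction quality.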
Why VAEs Matter in GenAI
VAEs introduce three critical GenAI ideas:
- Latent space structure
- Probabilistic generation
- Controlled sampling
These ideas directly appear in:
- Diffusion models
- Image generation pipelines
- Modern generative architectures
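Controlled sampling is easiest to see through latent interpolation. The sketch below uses an untrained stand-in decoder just to show the mechanics; with a trained VAE, every intermediate z decodes to a plausible sample:

```python
import torch
import torch.nn as nn

# Stand-in decoder with the lesson's dimensions (32 -> 256 -> 784), untrained
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))

# Walk a straight line between two latent points
z_a, z_b = torch.randn(32), torch.randn(32)
frames = [decoder((1 - t) * z_a + t * z_b) for t in torch.linspace(0, 1, 8)]
print(len(frames), frames[0].shape)  # 8 torch.Size([784])
```

Because the KL term keeps the latent space dense around the origin, such straight-line walks stay in regions the decoder has learned, which is what makes the interpolation smooth.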
Practice
What does a VAE encoder output?
Which trick enables backpropagation?
Which term enforces latent regularization?
Quick Quiz
VAEs differ from autoencoders by being:
VAEs organize which space?
Main advantage of VAEs is:
Recap: Variational Autoencoders introduce probabilistic latent spaces that enable true data generation.
Next up: GAN Basics — competing networks and adversarial learning.