Introduction to Recurrent Neural Networks (RNNs)
So far in deep learning, most of the models we have studied were designed for fixed-size inputs. Images, tabular data, and feature vectors all fit neatly into that category.
However, many real-world problems are not static. They involve sequences where order matters.
Recurrent Neural Networks (RNNs) were created to solve exactly this class of problems.
Why Standard Neural Networks Fail for Sequences
Consider tasks like:
- Understanding a sentence
- Predicting the next word in a paragraph
- Analyzing stock prices over time
- Recognizing speech from audio signals
In all these cases, the meaning of the current input depends on what came before it.
A standard neural network treats every input independently. It has no memory of past inputs.
This is the fundamental limitation RNNs were designed to overcome.
The Core Idea Behind RNNs
An RNN processes data step by step.
At each time step, it takes:
- The current input
- Information from previous steps
This information from the past is called the hidden state.
The hidden state acts like memory, allowing the network to retain context across time.
How an RNN Thinks (Intuition)
Imagine reading this sentence:
"The movie was surprisingly"
Before seeing the next word, your brain already expects something like "good" or "bad".
That expectation comes from remembering the words you just read.
An RNN behaves similarly by carrying forward information from previous time steps.
Basic RNN Structure
An RNN cell has three main components:
- The input at time t
- The hidden state from time t−1
- The output at time t
At every step, the same weights are reused. This weight sharing is what allows RNNs to generalize across sequence lengths.
Mathematical View (Simplified)
The hidden state update can be written as:
h_t = tanh(W_h * h_(t-1) + W_x * x_t + b)
Here:
- x_t is the current input
- h_(t-1) is the previous hidden state
- h_t is the updated hidden state
- W_h and W_x are the weight matrices applied to the previous hidden state and the current input, and b is a bias term
The tanh activation keeps the hidden state values bounded, which helps stabilize training.
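To make the update concrete, here is a minimal NumPy sketch of this recurrence. The sizes (3 input features, 4 hidden units, 5 time steps) and the random weights are arbitrary choices for illustration.

import numpy as np

# Arbitrary sizes for illustration: 3 input features, 4 hidden units
input_dim, hidden_dim = 3, 4

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)                                    # bias

# A toy sequence of 5 time steps; the same weights are reused at every step
sequence = rng.normal(size=(5, input_dim))

h = np.zeros(hidden_dim)  # initial hidden state
for x_t in sequence:
    h = np.tanh(W_h @ h + W_x @ x_t + b)  # h_t = tanh(W_h * h_(t-1) + W_x * x_t + b)

print(h)  # final hidden state, summarizing the whole sequence

Because the same W_h, W_x, and b are applied at every step, this loop works unchanged for a sequence of any length. That is exactly the weight sharing described above.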
Where RNNs Are Used in Practice
RNNs are foundational models for:
- Language modeling and text generation
- Speech recognition systems
- Time-series forecasting
- Sequence classification problems
Although newer architectures exist, understanding RNNs is essential for understanding modern sequence models.
First RNN Example (Keras)
Below is a minimal RNN model definition using Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential()
# 64 hidden units; input_shape=(None, 10) means variable-length sequences
# of 10-dimensional feature vectors at each time step
model.add(SimpleRNN(64, input_shape=(None, 10)))
model.add(Dense(1))  # map the final hidden state to a single output value
model.compile(optimizer='adam', loss='mse')
This model processes sequences of 10-dimensional vectors (the None in the input shape allows sequences of any length) and learns temporal relationships between time steps.
Understanding the Code
The SimpleRNN layer processes input sequences one step at a time.
The hidden state is updated internally and passed forward automatically.
The Dense layer converts the final hidden state into a prediction.
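As a quick usage sketch, the model above can be trained on synthetic data. The array shapes, batch size, and epoch count below are arbitrary choices for illustration.

import numpy as np

# Synthetic data: 32 sequences, each 20 time steps of 10 features,
# with one target value per sequence
X = np.random.randn(32, 20, 10).astype('float32')
y = np.random.randn(32, 1).astype('float32')

model.fit(X, y, epochs=5, batch_size=8)

# Predict on a batch with a different sequence length (None in input_shape allows this)
X_new = np.random.randn(4, 35, 10).astype('float32')
predictions = model.predict(X_new)  # shape: (4, 1)

Note that within a single batch all sequences must share the same length; sequences of different lengths can be handled across batches, or within a batch via padding and masking.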
Limitations of Basic RNNs
While RNNs introduced memory into neural networks, they are not perfect.
As sequences grow longer, basic RNNs struggle to remember information from far in the past.
This leads to training difficulties known as vanishing and exploding gradients.
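As a rough illustration of the vanishing case: the backpropagated gradient contains one multiplicative factor per time step, and when those factors are typically smaller than 1 (as with tanh derivatives and small recurrent weights), the product shrinks exponentially. The factor value 0.8 below is an arbitrary choice for illustration.

# Toy illustration: a gradient signal scaled by a factor < 1 at every step
factor = 0.8
gradient = 1.0
for step in range(1, 51):
    gradient *= factor
    if step in (10, 30, 50):
        print(f"after {step} steps: {gradient:.2e}")
# after 10 steps: ~1.07e-01, after 30 steps: ~1.24e-03, after 50 steps: ~1.43e-05

If the factor were instead larger than 1, the same product would grow exponentially, which corresponds to the exploding-gradient case.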
These issues motivated the development of more advanced architectures, which we will explore next.
Exercises
Exercise 1:
Why are RNNs better suited for sequential data than standard neural networks?
Exercise 2:
What role does the hidden state play in an RNN?
Quick Check
Q: Can an RNN handle variable-length sequences?
RNNs introduced the concept of memory into deep learning, but they are only the starting point.
In the next lesson, we will examine why basic RNNs struggle with long sequences and how gradient problems emerge during training.