AlexNet Architecture
In the previous lesson, we studied LeNet, one of the first successful convolutional neural networks.
Now we move to the architecture that changed the course of deep learning: AlexNet.
AlexNet proved that deep neural networks, given enough data and compute, can outperform traditional computer-vision methods by a huge margin.
What Is AlexNet?
AlexNet is a deep convolutional neural network introduced in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
It won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a large margin and reignited the deep learning revolution.
Why AlexNet Was Revolutionary
Before AlexNet, deep networks were considered too slow and unreliable.
AlexNet showed that:
- Deep networks can be trained successfully
- GPUs can accelerate training
- Large datasets improve performance
This changed the mindset of the entire AI community.
Key Differences from LeNet
AlexNet is much deeper than LeNet.
It uses:
- More convolution layers
- ReLU activation instead of tanh
- Max pooling instead of average pooling
- Dropout for regularization
These ideas are still used today.
High-Level Architecture
AlexNet follows this flow:
Input → Convolution → ReLU → Pooling → Convolution → ReLU → Pooling → Fully Connected → Output
Each layer extracts more abstract features from the image.
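You can trace how the spatial resolution shrinks along this flow with the standard convolution output formula. The sketch below assumes a 227×227 input, the size commonly used when implementing AlexNet, together with the first layer's 11×11 filters and stride 4:

```python
def conv_out(size, kernel, stride, padding=0):
    # Standard formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = conv_out(227, 11, 4)  # first convolution: 11x11 filters, stride 4
print(size)                  # 55
size = conv_out(size, 3, 2)  # 3x3 max pooling, stride 2
print(size)                  # 27
```

Applying the same formula layer by layer shows why large strides early on make the rest of the network affordable.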
Why ReLU Was a Game Changer
Earlier networks used sigmoid or tanh.
These activations often caused vanishing gradients.
ReLU mitigates this: its gradient stays at 1 for all positive inputs, allowing faster and more stable learning.
This single change dramatically improved training speed.
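A small NumPy illustration makes the difference concrete. The tanh gradient shrinks toward zero for large inputs, while the ReLU gradient stays at 1 for any positive input:

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# tanh gradient: 1 - tanh(x)^2 — nearly vanishes once |x| is large
tanh_grad = 1 - np.tanh(x) ** 2

# ReLU gradient: 1 for x > 0, 0 otherwise — never shrinks for positive inputs
relu_grad = (x > 0).astype(float)

print(tanh_grad)
print(relu_grad)
```

At x = 10 the tanh gradient is already below 10⁻⁸, so almost no learning signal flows backward through such a unit.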
AlexNet Layer Overview
AlexNet consists of:
- 5 convolution layers
- 3 fully connected layers
- Over 60 million parameters
At the time, this was considered extremely large.
AlexNet in Code (Simplified)
Below is a simplified AlexNet-style model using modern Keras syntax.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Large 11x11 filters with stride 4 quickly reduce spatial resolution.
    # A 227x227 input makes the arithmetic work out to 55x55 feature maps
    # (the paper reports 224x224, but 227 is what implementations use).
    Conv2D(96, (11, 11), strides=4, activation="relu", input_shape=(227, 227, 3)),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    Conv2D(256, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    # Three stacked 3x3 convolutions with no pooling in between
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(256, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    Flatten(),
    # Two large fully connected layers, each followed by dropout
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    # 1000 output classes, one per ImageNet category
    Dense(1000, activation="softmax"),
])
This structure follows the original AlexNet design, adapted to modern Keras; it omits details of the 2012 version such as local response normalization and the split of computation across two GPUs.
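You can verify the "over 60 million parameters" claim by hand. The sketch below counts weights and biases for each layer of the simplified model above, assuming a 227×227 input, which leaves a 6×6×256 feature map after the last pooling layer:

```python
def conv_params(k, c_in, c_out):
    # Each filter has k*k*c_in weights plus one bias; there are c_out filters
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # Weight matrix n_in x n_out plus one bias per output unit
    return (n_in + 1) * n_out

total = (
    conv_params(11, 3, 96)
    + conv_params(5, 96, 256)
    + conv_params(3, 256, 384)
    + conv_params(3, 384, 384)
    + conv_params(3, 384, 256)
    + dense_params(6 * 6 * 256, 4096)  # flattened 6x6x256 feature map
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)
print(f"{total:,}")  # 62,378,344
```

Note that roughly 90% of the parameters sit in the fully connected layers, which is exactly why later architectures worked to shrink or remove them.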
Why Dropout Was Important
AlexNet was one of the first major networks to use dropout to reduce overfitting.
During training, a random subset of neurons is temporarily disabled (set to zero).
This forces the network to learn robust features instead of memorizing data.
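The mechanism is simple enough to sketch in a few lines of NumPy. This illustrates "inverted" dropout, the variant modern frameworks use: roughly half the activations are zeroed, and the survivors are scaled up so the expected output stays the same:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)

rate = 0.5
# Keep each unit with probability 1 - rate, then scale survivors by 1/(1 - rate)
mask = rng.random(10) >= rate
dropped = activations * mask / (1 - rate)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

At inference time no units are dropped and no scaling is applied, so the network sees activations with the same expected magnitude it was trained on.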
Real-World Understanding
Think of AlexNet as a student who learns faster by focusing only on important patterns instead of memorizing everything.
This ability to generalize made AlexNet extremely powerful.
Mini Practice
Think carefully:
Why do you think AlexNet uses very large first-layer filters like 11×11?
Exercises
Exercise 1:
What competition made AlexNet famous?
Exercise 2:
Why did AlexNet use ReLU instead of tanh?
Quick Quiz
Q1. How many convolution layers does AlexNet have?
Q2. What problem does dropout solve?
In the next lesson, we will study VGG networks and learn why deeper but simpler architectures can outperform complex designs.