AlexNet Architecture
In the previous lesson, we studied LeNet, one of the first successful convolutional neural networks.
Now we move to the architecture that changed the course of deep learning: AlexNet.
AlexNet proved that deep neural networks, given enough data and compute, can outperform traditional computer-vision methods by a huge margin.
What Is AlexNet?
AlexNet is a deep convolutional neural network introduced in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
It won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a large margin and reignited the deep learning revolution.
Why AlexNet Was Revolutionary
Before AlexNet, deep networks were considered too slow and unreliable.
AlexNet showed that:
- Deep networks can be trained successfully
- GPUs can accelerate training
- Large datasets improve performance
This changed the mindset of the entire AI community.
Key Differences from LeNet
AlexNet is much deeper than LeNet.
It uses:
- More convolution layers
- ReLU activation instead of tanh
- Max pooling instead of average pooling
- Dropout for regularization
These ideas are still used today.
High-Level Architecture
AlexNet follows this flow:
Input → Convolution → ReLU → Pooling → Convolution → ReLU → Pooling → Fully Connected → Output
Each layer extracts more abstract features from the image.
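You can trace how the spatial resolution shrinks along this flow with the standard convolution output formula. The sketch below assumes a 227×227 input, the size commonly used when implementing AlexNet, together with the first layer's 11×11 filters and stride 4:

```python
def conv_out(size, kernel, stride, padding=0):
    # Standard formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = conv_out(227, 11, 4)  # first convolution: 11x11 filters, stride 4
print(size)                  # 55
size = conv_out(size, 3, 2)  # 3x3 max pooling, stride 2
print(size)                  # 27
```

Applying the same formula layer by layer shows why large strides early on make the rest of the network affordable.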
Why ReLU Was a Game Changer
Earlier networks used sigmoid or tanh.
These activations often caused vanishing gradients.
ReLU mitigates this: its gradient stays at 1 for all positive inputs, allowing faster and more stable learning.
This single change dramatically improved training speed.
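A small NumPy illustration makes the difference concrete. The tanh gradient shrinks toward zero for large inputs, while the ReLU gradient stays at 1 for any positive input:

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# tanh gradient: 1 - tanh(x)^2 — nearly vanishes once |x| is large
tanh_grad = 1 - np.tanh(x) ** 2

# ReLU gradient: 1 for x > 0, 0 otherwise — never shrinks for positive inputs
relu_grad = (x > 0).astype(float)

print(tanh_grad)
print(relu_grad)
```

At x = 10 the tanh gradient is already below 10⁻⁸, so almost no learning signal flows backward through such a unit.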
AlexNet Layer Overview
AlexNet consists of:
- 5 convolution layers
- 3 fully connected layers
- Over 60 million parameters
At the time, this was considered extremely large.
AlexNet in Code (Simplified)
Below is a simplified AlexNet-style model using modern Keras syntax.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Large 11x11 filters with stride 4 quickly reduce spatial resolution.
    # A 227x227 input makes the arithmetic work out to 55x55 feature maps
    # (the paper reports 224x224, but 227 is what implementations use).
    Conv2D(96, (11, 11), strides=4, activation="relu", input_shape=(227, 227, 3)),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    Conv2D(256, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    # Three stacked 3x3 convolutions with no pooling in between
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(256, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),
    Flatten(),
    # Two large fully connected layers, each followed by dropout
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    # 1000 output classes, one per ImageNet category
    Dense(1000, activation="softmax"),
])
This structure follows the original AlexNet design, adapted to modern Keras; it omits details of the 2012 version such as local response normalization and the split of computation across two GPUs.
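You can verify the "over 60 million parameters" claim by hand. The sketch below counts weights and biases for each layer of the simplified model above, assuming a 227×227 input, which leaves a 6×6×256 feature map after the last pooling layer:

```python
def conv_params(k, c_in, c_out):
    # Each filter has k*k*c_in weights plus one bias; there are c_out filters
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # Weight matrix n_in x n_out plus one bias per output unit
    return (n_in + 1) * n_out

total = (
    conv_params(11, 3, 96)
    + conv_params(5, 96, 256)
    + conv_params(3, 256, 384)
    + conv_params(3, 384, 384)
    + conv_params(3, 384, 256)
    + dense_params(6 * 6 * 256, 4096)  # flattened 6x6x256 feature map
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)
print(f"{total:,}")  # 62,378,344
```

Note that roughly 90% of the parameters sit in the fully connected layers, which is exactly why later architectures worked to shrink or remove them.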
Why Dropout Was Important
AlexNet was one of the first major networks to use dropout to reduce overfitting.
During training, a random subset of neurons is temporarily disabled (set to zero).
This forces the network to learn robust features instead of memorizing data.
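The mechanism is simple enough to sketch in a few lines of NumPy. This illustrates "inverted" dropout, the variant modern frameworks use: roughly half the activations are zeroed, and the survivors are scaled up so the expected output stays the same:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)

rate = 0.5
# Keep each unit with probability 1 - rate, then scale survivors by 1/(1 - rate)
mask = rng.random(10) >= rate
dropped = activations * mask / (1 - rate)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

At inference time no units are dropped and no scaling is applied, so the network sees activations with the same expected magnitude it was trained on.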
Real-World Understanding
Think of AlexNet as a student who learns faster by focusing only on important patterns instead of memorizing everything.
This ability to generalize made AlexNet extremely powerful.
Mini Practice
Think carefully:
Why do you think AlexNet uses very large first-layer filters like 11×11?
Exercises
Exercise 1:
What competition made AlexNet famous?
Exercise 2:
Why did AlexNet use ReLU instead of tanh?
Quick Quiz
Q1. How many convolution layers does AlexNet have?
Q2. What problem does dropout solve?
In the next lesson, we will study VGG networks and learn why deeper but simpler architectures can outperform complex designs.