Image Representation
In the previous lesson, you learned that computers see images as grids of pixel values. Now we go one level deeper and answer an important question:
How exactly are images represented, stored, and interpreted inside a computer system?
This lesson builds the bridge between raw pixels and real Computer Vision operations. Understanding image representation is critical before working with OpenCV, filters, CNNs, or deep learning models.
Why Image Representation Matters
The same image can be represented in different ways, and each representation is useful for a specific purpose.
- Some representations are better for processing speed
- Some preserve color information better
- Some are optimized for storage or compression
If the representation is wrong or misunderstood, algorithms can fail outright or produce poor results.
Grayscale Image Representation
A grayscale image is the simplest form of image representation.
- Each pixel is represented by one number
- Typical range: 0 to 255
- Stored as a 2D matrix
Conceptual Grayscale Representation
Image (5 × 5):

```
[ [   0,  40,  90, 150, 255 ],
  [  20,  60, 110, 170, 240 ],
  [  30,  80, 130, 190, 210 ],
  [  50, 100, 160, 200, 180 ],
  [  70, 120, 170, 220, 160 ] ]
```
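To make this concrete, here is a minimal NumPy sketch of the same matrix (NumPy is the array library underlying OpenCV's Python bindings):

```python
import numpy as np

# The same 5 x 5 grayscale image as a 2D array.
# uint8 is the standard dtype for 8-bit images (values 0-255).
gray = np.array([
    [  0,  40,  90, 150, 255],
    [ 20,  60, 110, 170, 240],
    [ 30,  80, 130, 190, 210],
    [ 50, 100, 160, 200, 180],
    [ 70, 120, 170, 220, 160],
], dtype=np.uint8)

print(gray.shape)   # (5, 5) -> one value per pixel, stored as a 2D matrix
print(gray[0, 4])   # 255   -> the brightest pixel in the first row
```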
This format is widely used in:
- Edge detection
- Thresholding
- Shape analysis
- Medical imaging
Color Image Representation (RGB Model)
Color images are represented using multiple channels. The most common format is RGB.
- Each pixel has 3 values: Red, Green, Blue
- Each value ranges from 0 to 255
- Stored as a 3D matrix
RGB Pixel Representation
Pixel at (x, y): [ R, G, B ]

Example: [ 120, 200, 80 ]
So an image of size H × W becomes:
- H × W × 3 numeric values
This representation preserves full color information but requires more memory.
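As a small sketch, here is a hypothetical 2 × 2 RGB image built by hand, showing the H × W × 3 layout:

```python
import numpy as np

# A tiny 2 x 2 RGB image: shape is (H, W, 3), one [R, G, B] triple per pixel.
rgb = np.array([
    [[120, 200,  80], [255,   0,   0]],
    [[  0, 255,   0], [  0,   0, 255]],
], dtype=np.uint8)

print(rgb.shape)   # (2, 2, 3) -> H x W x 3
print(rgb[0, 0])   # [120 200  80] -> the R, G, B values of one pixel
print(rgb.size)    # 12 numeric values in total (2 * 2 * 3)
```

One practical caveat: OpenCV's `cv2.imread` returns channels in BGR order rather than RGB, so the channel axis often needs converting (e.g., with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`) when mixing libraries.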
Channel-wise Representation
Instead of viewing an RGB image as one unit, a computer often separates it into three independent channels.
- Red channel → intensity of red color
- Green channel → intensity of green color
- Blue channel → intensity of blue color
Each channel itself is a grayscale image.
This separation allows algorithms to:
- Process specific color components
- Detect patterns hidden in a single channel
- Improve accuracy in vision tasks
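A minimal sketch of channel splitting with plain NumPy indexing (the 4 × 4 image here is a made-up placeholder):

```python
import numpy as np

# Placeholder 4 x 4 RGB image; the red channel is filled with 200.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[:, :, 0] = 200

# Indexing the last axis pulls out one channel at a time.
red   = rgb[:, :, 0]
green = rgb[:, :, 1]
blue  = rgb[:, :, 2]

print(red.shape)    # (4, 4) -> each channel is a 2D, grayscale-like matrix
print(red.max())    # 200
print(green.max())  # 0
```

With OpenCV, `cv2.split(img)` performs the same separation (returning channels in BGR order for images loaded with `cv2.imread`).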
Image Representation vs Image Format
Many beginners confuse image representation with image file formats. They are not the same.
| Concept | Meaning |
|---|---|
| Image Representation | How pixel data is arranged in memory (arrays, channels) |
| Image Format | How the image is stored on disk (JPEG, PNG, BMP) |
Once an image is decoded into memory, every format becomes the same kind of numeric array.
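A short sketch illustrates this, assuming two hypothetical files `photo.jpg` and `photo.png` exist on disk:

```python
import cv2

# Hypothetical files -- any format OpenCV can decode becomes a NumPy array.
jpg = cv2.imread("photo.jpg")
png = cv2.imread("photo.png")

# After loading, the on-disk format no longer matters to the algorithm:
print(type(jpg))   # <class 'numpy.ndarray'>
print(type(png))   # <class 'numpy.ndarray'>
print(jpg.dtype)   # uint8 -> the same in-memory representation either way
```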
Common Image Data Types
Pixel values are stored using specific data types:
- uint8 → most common (0–255)
- float32 → normalized images (0.0–1.0)
- int16 / int32 → occasionally used for intermediate results that can be negative (e.g., gradient filters)
Deep learning models often require normalized float values, while classic CV uses uint8.
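The data type matters in practice because uint8 arithmetic silently wraps around at 255, a classic source of image-processing bugs. A small demonstration:

```python
import numpy as np

a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)

# uint8 overflows silently: 200 + 100 = 300, which wraps to 300 - 256 = 44.
print(a + b)                       # [44]

# The same sum in float32 behaves as expected.
print(a.astype(np.float32) + b)    # [300.]
```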
Why Normalization Is Often Required
Pixel values are often converted from:
- 0–255 (8-bit integers) → 0.0–1.0 (floats)
This process is called normalization.
Normalization helps:
- Improve numerical stability
- Speed up learning in neural networks
- Prevent dominance of large values
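A minimal sketch of the standard conversion, assuming an 8-bit input:

```python
import numpy as np

img_u8 = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float32 first, then scale into [0.0, 1.0].
img_f32 = img_u8.astype(np.float32) / 255.0

print(img_f32)         # [[0.         0.5019608  1.        ]]
print(img_f32.dtype)   # float32
```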
Real-World Analogy
Think of image representation like this:
- The image file is like a ZIP file
- Image representation is the unzipped data
- Algorithms work only on unzipped numeric data
No algorithm works directly on “pictures” — only on numbers.
Practice Questions
Q1. How is a grayscale image represented in memory?
Q2. Why are RGB images considered 3D arrays?
Q3. What happens to image formats after loading into memory?
Quick Quiz
Q1. Which representation is best for edge detection?
Q2. What does normalization usually convert pixel values into?
Key Takeaways
- Images are stored as numeric arrays in memory
- Grayscale images use one value per pixel
- Color images use multiple channels (RGB)
- Image representation is different from file format
- All CV algorithms operate on numeric pixel representations
In the next lesson, we will explore image transformations — how resizing, rotating, and scaling images affect pixel representation.