Pixels and Images
Before we talk about edges, objects, faces, or deep learning models, we must clearly understand what an image actually is for a computer.
Humans see images as meaningful scenes — people, roads, text, objects. A computer does not see any of that. It only sees numbers arranged in a grid.
This lesson builds the most important foundation in Computer Vision: pixels and how images are represented internally. If this is clear, everything later (OpenCV, CNNs, YOLO, segmentation) becomes much easier.
What Is a Pixel?
A pixel (picture element) is the smallest unit of an image. Each pixel stores information about intensity or color at a specific location.
Think of an image as a chessboard:
- Each square = one pixel
- Each pixel has a value
- All pixels together form the image
If you zoom into any digital image enough, you will eventually see small square blocks. Those blocks are pixels.
Image as a Grid of Numbers
From a computer's point of view, an image is just an array of numbers. A grayscale image is a single matrix (2D array).
For example, a grayscale image can be written as:
Example: 5 × 5 Grayscale Image
[
  [  0,  30,  80, 120, 255 ],
  [ 10,  50,  90, 140, 230 ],
  [ 20,  60, 100, 150, 200 ],
  [ 30,  80, 130, 180, 160 ],
  [ 40, 100, 160, 200, 120 ]
]
Each number represents the brightness of a pixel. The position of the number tells the computer where that pixel is located.
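As a minimal sketch, the same grid can be held in a NumPy array (NumPy is an assumption here; the lesson has not picked a library yet). The variable names are illustrative. Indexing is written as row first, then column, both starting at 0.

```python
import numpy as np

# The 5 x 5 grayscale example, stored as 8-bit unsigned integers.
image = np.array([
    [ 0,  30,  80, 120, 255],
    [10,  50,  90, 140, 230],
    [20,  60, 100, 150, 200],
    [30,  80, 130, 180, 160],
    [40, 100, 160, 200, 120],
], dtype=np.uint8)

print(image.shape)         # (rows, columns)

# Position selects the pixel: image[row, column].
top_right = image[0, 4]    # first row, last column
center = image[2, 2]       # third row, third column
print(top_right, center)
```

Here `top_right` is the brightest pixel in the grid and `center` is a mid-gray one, which matches reading the matrix by eye.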
Pixel Intensity Values (Grayscale Images)
In an 8-bit grayscale image (the most common kind):
- Pixel values range from 0 to 255
- 0 → pure black
- 255 → pure white
- Values in between → shades of gray
| Pixel Value | Meaning |
|---|---|
| 0 | Black |
| 50 | Dark gray |
| 128 | Medium gray |
| 200 | Light gray |
| 255 | White |
So when a computer processes a grayscale image, it is really processing these numbers.
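The 0 to 255 range comes from storing each pixel in one byte. A small sketch, assuming NumPy and its `uint8` type, checks that range and expresses the table's sample values as a fraction of pure white:

```python
import numpy as np

# Limits of an 8-bit unsigned integer, the usual grayscale pixel type.
info = np.iinfo(np.uint8)
lowest, highest = int(info.min), int(info.max)
print(lowest, highest)

# A one-row "image" holding the five sample values from the table.
samples = np.array([[0, 50, 128, 200, 255]], dtype=np.uint8)

# Dividing by 255 rescales brightness to the range 0.0 (black) to 1.0 (white).
fractions = samples / 255.0
print(fractions.round(2))
```

Rescaling to 0.0 to 1.0 like this is a common preprocessing step, though whether a given pipeline needs it depends on the model or library in use.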
Color Images: More Than One Number per Pixel
Color images are slightly more complex. Instead of one number per pixel, we usually have three numbers.
The most common format is RGB:
- R → Red channel
- G → Green channel
- B → Blue channel
Each channel again has values from 0 to 255.
Example: One RGB Pixel
- ( R = 255, G = 0, B = 0 ) → Red
- ( R = 0, G = 255, B = 0 ) → Green
- ( R = 0, G = 0, B = 255 ) → Blue
- ( R = 255, G = 255, B = 255 ) → White
So a color image is actually a 3D array: height × width × 3.
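To make the height × width × 3 layout concrete, here is a small sketch of a 2 × 2 color image built from the four example pixels. It assumes NumPy and R, G, B channel order; the variable names are illustrative.

```python
import numpy as np

# A 2 x 2 color image: height x width x 3, channels ordered R, G, B.
rgb = np.array([
    [[255,   0,   0], [  0, 255,   0]],   # red,  green
    [[  0,   0, 255], [255, 255, 255]],   # blue, white
], dtype=np.uint8)

print(rgb.shape)              # (height, width, channels)

# One pixel is a length-3 vector; one channel is a 2D grid.
bottom_right = rgb[1, 1]      # the white pixel
red_channel = rgb[:, :, 0]    # the red value of every pixel
print(bottom_right, red_channel)
```

Channel order is a convention, not a law. OpenCV's `cv2.imread`, for instance, returns channels in B, G, R order, so it is worth checking before interpreting the three numbers.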
Image Dimensions Explained
When you hear something like:
- Image size = 640 × 480
It means:
- 640 pixels wide
- 480 pixels tall
For a color image:
- 640 × 480 × 3 values
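That is 307,200 pixels and 921,600 individual numeric values for a single image. Note that sizes are usually quoted as width × height, while the array itself is stored as height × width × 3, so this image has shape 480 × 640 × 3.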
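The arithmetic can be checked with a quick sketch, assuming NumPy and one byte per value. A 640 × 480 image is allocated with height first, because rows come before columns in the array shape.

```python
import numpy as np

width, height = 640, 480

# Array shape is (height, width, channels), not (width, height, channels).
color = np.zeros((height, width, 3), dtype=np.uint8)

pixels = height * width    # number of pixel positions
values = color.size        # total numbers stored (pixels x 3 channels)
memory = color.nbytes      # bytes used at one byte per value

print(color.shape, pixels, values, memory)
```

With `uint8` storage, the number of values and the number of bytes are the same, which is why an uncompressed image of this size takes a little under 1 MB in memory.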
Why Pixel Understanding Is Critical in CV
Every Computer Vision operation works by manipulating pixel values:
- Blurring → averaging pixel values
- Edge detection → comparing neighboring pixels
- Thresholding → checking pixel intensity limits
- CNNs → learning patterns from pixel neighborhoods
If pixels change, the image changes. This is why preprocessing matters so much.
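The first three operations in the list can be sketched directly on the 5 × 5 grayscale example using plain NumPy. These are deliberately simplified versions for illustration, not what a library such as OpenCV does internally, and the cutoff of 128 is an arbitrary assumed value.

```python
import numpy as np

image = np.array([
    [ 0,  30,  80, 120, 255],
    [10,  50,  90, 140, 230],
    [20,  60, 100, 150, 200],
    [30,  80, 130, 180, 160],
    [40, 100, 160, 200, 120],
], dtype=np.uint8)

# Thresholding: pixels brighter than the cutoff become white, the rest black.
cutoff = 128  # assumed value, chosen only for this example
binary = np.where(image > cutoff, 255, 0).astype(np.uint8)

# Blurring: replace the center pixel by the mean of its 3 x 3 neighborhood.
patch = image[1:4, 1:4].astype(np.float64)
blurred_center = patch.mean()

# Edge detection: difference between horizontally adjacent pixels.
# Convert to a signed type first so the subtraction cannot wrap around.
signed = image.astype(np.int16)
horizontal_diff = signed[:, 1:] - signed[:, :-1]

print(binary)
print(blurred_center)
print(horizontal_diff)
```

Large absolute values in `horizontal_diff` mark places where brightness changes sharply between neighbors, which is the basic idea behind edge detection. The cast to `int16` matters: subtracting `uint8` values directly would wrap around instead of going negative.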
Real-Life Analogy
Think of an image like a large Excel sheet:
- Each cell = one pixel
- Each cell contains a number
- Formulas (algorithms) operate on those numbers
Computer Vision is essentially applied mathematics on pixel grids.
Practice Questions
Q1. What is a pixel?
Q2. How many values does one RGB pixel have?
Q3. Why do computers see images as matrices?
Quick Quiz
Q1. In grayscale images, pixel values usually range between?
Q2. A color image is represented as which data structure?
Key Takeaways
- Images are numerical grids, not visual scenes
- Pixels store intensity or color values
- Grayscale images use one value per pixel
- Color images usually use RGB (three values per pixel)
- All CV algorithms work by manipulating pixel values
In the next lesson, we will see how images are stored and represented in different formats, and why representation matters in Computer Vision pipelines.