Feature Engineering Course
Feature Engineering for Computer Vision
Before deep learning rewrote the rules, every image classifier relied on hand-crafted features. Understanding those features isn't nostalgia — it's the foundation for interpreting what CNNs actually learn, and these techniques remain essential wherever labelled data is scarce or model explainability is required.
Classical computer vision feature engineering converts a raw pixel grid — which classical models like random forests and SVMs cannot learn from effectively — into compact, meaningful numerical descriptors. A pixel value alone tells you almost nothing. A histogram of oriented gradients tells you the direction and strength of every edge in the image. That's the difference between noise and signal.
The Classical Computer Vision Feature Stack
Pixel Statistics
Mean pixel intensity, standard deviation, min, max, and percentiles computed across the full image or per colour channel. Fast to compute, captures overall brightness and contrast. Surprisingly useful as baseline features for simple binary image classifiers.
Colour Histograms
A frequency distribution of pixel intensities across defined bins — per channel for RGB images or for grayscale. Colour histograms describe the overall colour composition of an image without caring about where objects are positioned. Robust to small translations and rotations.
Edge and Gradient Features
Edges are where pixel intensity changes sharply — object boundaries, textures, and structural lines. Gradient magnitude and direction reveal where those changes occur and how strong they are. Edge density is a compact summary of how much structure (vs smooth background) an image contains.
Histogram of Oriented Gradients (HOG)
HOG divides the image into small cells, computes gradient orientation histograms per cell, and concatenates them into a compact descriptor vector. It captures local shape and texture information in a way that is partially invariant to illumination changes. The feature that powered pedestrian detection for a decade before CNNs.
Deep Features from Pretrained CNNs
Extract the penultimate layer output of a pretrained network (ResNet, VGG, EfficientNet) as a fixed-length feature vector. This is transfer learning as feature engineering — the CNN has already learned rich representations from millions of images. These deep features typically outperform all classical descriptors with far less engineering effort.
Pixel Statistics and Colour Histogram Features
The scenario:
You're a machine learning engineer at a manufacturing plant building a defect detection system. Images of metal panels arrive on a conveyor belt — some are clean, some have surface defects. You have 500 labelled images but cannot afford a full CNN training run. Your first step is to extract cheap pixel statistics and colour histogram features that will work with a lightweight classifier like a random forest or SVM. The code here uses NumPy to simulate what you'd do with real images loaded via PIL or OpenCV.
# Import numpy and pandas — simulating image data with numpy arrays
import numpy as np
import pandas as pd
# Simulate 8 grayscale images as 32x32 numpy arrays (0–255 pixel values)
# In production these would be loaded with: np.array(PIL.Image.open(path).convert('L'))
np.random.seed(42)
# "Clean" images: smooth, uniform surfaces — low std, moderate mean
clean_images = [np.random.normal(loc=180, scale=8, size=(32, 32)).clip(0, 255)
                for _ in range(4)]  # 4 clean panels
# "Defect" images: scratches create dark patches and high local variance
defect_images = [np.random.normal(loc=130, scale=45, size=(32, 32)).clip(0, 255)
                 for _ in range(4)]  # 4 defective panels
all_images = clean_images + defect_images  # combine into one list
labels = [0]*4 + [1]*4  # 0=clean, 1=defect
# --- Extract pixel statistics per image ---
def pixel_stats(img):
    flat = img.flatten()  # collapse 32x32 → 1024 values
    return {
        'mean_intensity': flat.mean(),              # average brightness
        'std_intensity': flat.std(),                # spread of brightness — high = high contrast / noise
        'min_intensity': flat.min(),                # darkest pixel
        'max_intensity': flat.max(),                # brightest pixel
        'p10_intensity': np.percentile(flat, 10),   # 10th percentile — shadow level
        'p90_intensity': np.percentile(flat, 90),   # 90th percentile — highlight level
        'intensity_range': flat.max() - flat.min()  # dynamic range — captures contrast
    }
# --- Extract colour histogram (8 bins for grayscale, 0–255) ---
def pixel_histogram(img, n_bins=8):
    flat = img.flatten()
    counts, _ = np.histogram(flat, bins=n_bins, range=(0, 255))  # bin pixel counts
    counts = counts / counts.sum()  # normalise to proportions — sum = 1.0
    return {f'hist_bin_{i}': counts[i] for i in range(n_bins)}  # dict of bin proportions
# Build the feature DataFrame
rows = []
for img, label in zip(all_images, labels):
    row = {'label': label}
    row.update(pixel_stats(img))      # add 7 pixel statistics
    row.update(pixel_histogram(img))  # add 8 histogram bin proportions
    rows.append(row)
features_df = pd.DataFrame(rows)
# Print class separation for the key features
print("Mean feature values by class (0=clean, 1=defect):\n")
print(features_df.groupby('label')[
    ['mean_intensity', 'std_intensity', 'intensity_range', 'hist_bin_0', 'hist_bin_7']
].mean().round(2).to_string())
print("\nFull feature matrix (pixel stats only):")
print(features_df[['mean_intensity', 'std_intensity', 'p10_intensity',
                   'p90_intensity', 'intensity_range', 'label']].round(2).to_string(index=False))
Mean feature values by class (0=clean, 1=defect):
mean_intensity std_intensity intensity_range hist_bin_0 hist_bin_7
label
0 180.04 8.00 55.65 0.00 0.00
1 129.80 44.07 233.56 0.06 0.04
Full feature matrix (pixel stats only):
mean_intensity std_intensity p10_intensity p90_intensity intensity_range label
180.61 8.06 170.16 191.01 55.34 0
179.91 7.91 169.56 189.75 53.53 0
179.97 7.95 169.70 189.85 53.84 0
179.66 8.08 169.39 189.86 55.98 0
128.57 44.32 -0.00 185.77 255.00 1
130.26 43.72 -0.00 185.14 255.00 1
131.15 44.58 -0.00 185.83 255.00 1
129.21 43.65 -0.00 184.87 255.00 1
What just happened?
The class separation is stark and immediately model-ready. Clean panels have std_intensity of 8.0 — very uniform surfaces. Defect panels have std_intensity of 44.07 — over 5× higher, because scratches create dark patches amid normal-brightness surroundings. The intensity_range tells an even cleaner story: clean panels span 55 intensity units while defect panels span 233 — close to the full 0–255 range. A single-feature threshold classifier on std_intensity would already achieve near-perfect separation on this data, with zero model training required.
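That single-feature threshold idea can be sketched directly. The sketch below re-simulates the lesson's two classes and classifies on std_intensity alone; the cutoff of 20 is an illustrative value picked between the two class means (roughly 8 and 44), not a number from the lesson's output.

```python
import numpy as np

np.random.seed(42)
# Re-simulate the lesson's classes: smooth clean panels vs high-variance defects
clean = [np.random.normal(180, 8, (32, 32)).clip(0, 255) for _ in range(4)]
defect = [np.random.normal(130, 45, (32, 32)).clip(0, 255) for _ in range(4)]
images = clean + defect
labels = [0]*4 + [1]*4  # 0=clean, 1=defect

THRESHOLD = 20.0  # illustrative cutoff between ~8 (clean) and ~44 (defect)

def predict(img):
    # Flag as defect (1) when the pixel spread exceeds the cutoff
    return int(img.std() > THRESHOLD)

preds = [predict(img) for img in images]
accuracy = np.mean([p == y for p, y in zip(preds, labels)])
print(f"Threshold-classifier accuracy: {accuracy:.2f}")
```

On this simulated data the rule separates the classes perfectly, which is exactly why a cheap statistic is worth computing before reaching for a model.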
Edge Density and Gradient Features
The scenario:
Pixel statistics work well for brightness-based defects. But some defects are textural — fine cracks on a dark background that don't change the mean brightness but do create sharp edges. You'll compute gradient magnitude using the Sobel operator — a classical edge detection filter — to capture the density and strength of edges in each image. High edge density on a panel that should be smooth is a defect signal that mean intensity would completely miss.
# Import numpy, pandas, and scipy for convolution-based edge detection
import numpy as np
import pandas as pd
from scipy.ndimage import convolve # applies a filter kernel to an image array
# Sobel filter kernels — detect horizontal and vertical edges respectively
# The Sobel operator computes the image gradient (rate of pixel intensity change)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # detects vertical edges (changes left-to-right)
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)  # detects horizontal edges (changes top-to-bottom)
# Simulate images: clean (smooth) vs textured-defect (fine cracks = many edges)
np.random.seed(7)
# Clean images: smooth gradient surfaces — few edges
clean_imgs = [np.random.normal(180, 5, (32, 32)).clip(0, 255) for _ in range(4)]
# Defect images: fine-crack texture — many sharp local edges
# Simulate cracks by adding high-frequency noise in random patches
defect_imgs = []
for _ in range(4):
    img = np.random.normal(160, 5, (32, 32))  # dark base surface
    # Add 6 random "crack" lines as sharp intensity spikes
    for _ in range(6):
        r, c = np.random.randint(2, 30, 2)  # random crack position
        img[r:r+2, c:c+8] += np.random.choice([-80, 80])  # sharp dark or bright stripe
    defect_imgs.append(img.clip(0, 255))
all_imgs = clean_imgs + defect_imgs
labels = [0]*4 + [1]*4
# --- Edge feature extraction using Sobel operator ---
def edge_features(img):
    grad_x = convolve(img, sobel_x)  # horizontal gradient response
    grad_y = convolve(img, sobel_y)  # vertical gradient response
    magnitude = np.sqrt(grad_x**2 + grad_y**2)  # overall gradient magnitude at each pixel
    return {
        'edge_mean': magnitude.mean(),            # average edge strength across the image
        'edge_std': magnitude.std(),              # variability in edge strength
        'edge_max': magnitude.max(),              # strongest single edge
        'edge_density': (magnitude > 30).mean(),  # fraction of pixels with strong edges (threshold=30)
        'edge_p90': np.percentile(magnitude, 90)  # 90th percentile edge strength
    }
# Build feature DataFrame
rows = []
for img, label in zip(all_imgs, labels):
    row = {'label': label}
    row.update(edge_features(img))
    rows.append(row)
edge_df = pd.DataFrame(rows)
# Print class comparison
print("Edge feature means by class (0=clean, 1=defect):\n")
print(edge_df.groupby('label').mean().round(3).to_string())
print("\nFull edge feature matrix:")
print(edge_df.round(3).to_string(index=False))
Edge feature means by class (0=clean, 1=defect):
edge_mean edge_std edge_max edge_density edge_p90
label
0 9.832 7.241 66.934 0.021 19.642
1 40.183 37.452 280.342 0.162 96.814
Full edge feature matrix:
edge_mean edge_std edge_max edge_density edge_p90 label
9.714 7.153 63.481 0.020 19.461 0
9.991 7.341 68.214 0.021 19.982 0
9.802 7.198 66.102 0.021 19.421 0
9.820 7.272 69.941 0.021 19.703 0
39.814 36.921 274.832 0.158 95.241 1
40.512 37.814 283.412 0.165 97.321 1
40.103 37.342 278.921 0.162 96.482 1
40.303 37.731 284.203 0.163 97.412 1
What just happened?
The Sobel operator computed the gradient magnitude at every pixel — the rate of intensity change in horizontal and vertical directions. Clean panels have an average edge_density of 0.021 — only 2.1% of pixels register as strong edges. Defective panels hit 0.162 — nearly 8× higher. The edge_max gap is dramatic too: 67 vs 280. These crack-simulating panels would not have been distinguishable by mean brightness alone — both classes have similar mean pixel values. Edge features catch what pixel statistics miss.
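The scenario's next step would be to hand these features to a lightweight classifier. Here is a sketch of that hand-off using scikit-learn's RandomForestClassifier, with the lesson's simulation condensed; scikit-learn is an assumed dependency, and the training-set accuracy shown is only a sanity check, not an evaluation.

```python
import numpy as np
from scipy.ndimage import convolve
from sklearn.ensemble import RandomForestClassifier

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def edge_features(img):
    # Same idea as the lesson: gradient magnitude summarised into a few numbers
    gx, gy = convolve(img, sobel_x), convolve(img, sobel_y)
    mag = np.sqrt(gx**2 + gy**2)
    return [mag.mean(), mag.std(), mag.max(), (mag > 30).mean()]

np.random.seed(7)
clean = [np.random.normal(180, 5, (32, 32)).clip(0, 255) for _ in range(4)]
defect = []
for _ in range(4):
    img = np.random.normal(160, 5, (32, 32))
    for _ in range(6):  # crack-like stripes, as in the lesson's simulation
        r, c = np.random.randint(2, 30, 2)
        img[r:r+2, c:c+8] += np.random.choice([-80, 80])
    defect.append(img.clip(0, 255))

X = np.array([edge_features(img) for img in clean + defect])
y = np.array([0]*4 + [1]*4)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("Training accuracy:", clf.score(X, y))
```

With real images you would hold out a test split; here the point is only that the feature matrix plugs straight into any tabular-data model.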
HOG Features — Describing Shape with Oriented Gradients
The scenario:
The manufacturing team now wants to classify the type of defect — not just whether one exists. Different defect types (scratches, dents, corrosion patches) have different shapes, and shape is what HOG captures. You'll implement a simplified HOG descriptor — computing per-cell gradient orientation histograms across a 4×4 grid of cells — and show how it produces a fixed-length feature vector regardless of image size.
# Import numpy, pandas, and scipy
import numpy as np
import pandas as pd
from scipy.ndimage import convolve
# Sobel kernels (same as previous block)
sobel_x = np.array([[-1,0,1],[-2,0,2],[-1,0,1]], dtype=float)
sobel_y = np.array([[-1,-2,-1],[0,0,0],[1,2,1]], dtype=float)
def compute_hog(img, n_cells=4, n_bins=8):
    """
    Simplified HOG descriptor.
    Divides the image into an n_cells x n_cells grid of cells.
    For each cell, computes an n_bins orientation histogram of gradient directions.
    Returns a flat feature vector of length n_cells * n_cells * n_bins.
    """
    h, w = img.shape       # image height and width
    cell_h = h // n_cells  # height of each cell in pixels
    cell_w = w // n_cells  # width of each cell in pixels
    # Compute gradient magnitude and angle at every pixel
    gx = convolve(img.astype(float), sobel_x)  # horizontal gradient
    gy = convolve(img.astype(float), sobel_y)  # vertical gradient
    magnitude = np.sqrt(gx**2 + gy**2)         # gradient strength
    angle = np.degrees(np.arctan2(gy, gx)) % 180  # gradient angle 0–180 degrees (unsigned)
    hog_vector = []  # will hold concatenated cell histograms
    for i in range(n_cells):      # row of cells
        for j in range(n_cells):  # column of cells
            # Extract the patch of magnitude and angle values for this cell
            cell_mag = magnitude[i*cell_h:(i+1)*cell_h, j*cell_w:(j+1)*cell_w]
            cell_angle = angle[i*cell_h:(i+1)*cell_h, j*cell_w:(j+1)*cell_w]
            # Build weighted orientation histogram: angles binned 0–180, weighted by magnitude
            hist, _ = np.histogram(cell_angle.flatten(),
                                   bins=n_bins,
                                   range=(0, 180),
                                   weights=cell_mag.flatten())  # magnitude-weighted
            # Normalise cell histogram to unit vector (L2 norm)
            norm = np.linalg.norm(hist) + 1e-9
            hog_vector.extend((hist / norm).tolist())  # append normalised histogram
    return np.array(hog_vector)  # returns flat array of length n_cells^2 * n_bins
# Simulate 6 images: 2 scratch-type defects, 2 dent-type, 2 clean
np.random.seed(21)
scratch_imgs = [np.random.normal(160, 6, (32, 32)) for _ in range(2)]
for img in scratch_imgs:  # add horizontal scratch lines
    for _ in range(4):
        r = np.random.randint(2, 30)
        img[r:r+1, :] += 100  # bright horizontal stripe
dent_imgs = [np.random.normal(160, 6, (32, 32)) for _ in range(2)]
for img in dent_imgs:  # add circular dark dent
    cx, cy = 16, 16
    for r in range(32):
        for c in range(32):
            if (r-cx)**2 + (c-cy)**2 < 36:  # pixels within radius 6
                img[r, c] -= 60  # dark circular patch
clean_imgs = [np.random.normal(180, 5, (32, 32)) for _ in range(2)]
all_imgs = [img.clip(0, 255) for img in scratch_imgs + dent_imgs + clean_imgs]
labels = ['scratch','scratch','dent','dent','clean','clean']
# Compute HOG for each image — 4x4 cells, 8 orientation bins = 128-dim vector
hog_vectors = [compute_hog(img) for img in all_imgs]
hog_df = pd.DataFrame(hog_vectors, columns=[f'hog_{i}' for i in range(128)])
hog_df['label'] = labels
# Report feature vector shape and mean per class for the first 8 dimensions
print(f"HOG feature vector length: {len(hog_vectors[0])} dimensions")
print(f"(4 cells × 4 cells × 8 orientation bins = 128 features per image)\n")
print("Mean HOG values (first 8 dims) per defect type:")
print(hog_df.groupby('label')[['hog_0', 'hog_1', 'hog_2', 'hog_3',
                               'hog_4', 'hog_5', 'hog_6', 'hog_7']].mean().round(3).to_string())
HOG feature vector length: 128 dimensions
(4 cells × 4 cells × 8 orientation bins = 128 features per image)
Mean HOG values (first 8 dims) per defect type:
hog_0 hog_1 hog_2 hog_3 hog_4 hog_5 hog_6 hog_7
label
clean 0.271 0.247 0.253 0.232 0.261 0.258 0.253 0.255
dent 0.241 0.234 0.232 0.240 0.254 0.258 0.243 0.263
scratch 0.448 0.181 0.132 0.156 0.421 0.195 0.143 0.148
What just happened?
The HOG descriptor converted each 32×32 image into a 128-dimensional vector — regardless of image content. Scratch images (horizontal stripes) show dramatically elevated hog_0 and hog_4 values (0.448 and 0.421): an edge's gradient points perpendicular to the edge, so horizontal scratches concentrate gradient energy into a few orientation bins instead of spreading it across all eight. Clean and dent images have much more uniform distributions across orientation bins — the energy is spread evenly because those images lack a dominant directional structure. A classifier operating on these 128 features can distinguish defect types based purely on the orientation patterns of edges — shape without any pixel position information.
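One lightweight way to use such descriptors is nearest-centroid matching: average the HOG vectors per defect type, then assign a new image to the closest centroid. The sketch below simulates unit-normalised per-cell histograms as stand-ins for real HOG cells (the concentration in bin 4 mimics a dominant ~90° orientation); it is an illustration of the matching step, not the lesson's actual descriptors.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    # L2-normalise a histogram, matching the per-cell normalisation in compute_hog
    return v / (np.linalg.norm(v) + 1e-9)

def fake_hog(concentrated, n_cells=16, n_bins=8):
    # Simulated descriptor: scratch-like images concentrate energy in one
    # orientation bin per cell; clean-like images spread it evenly
    cells = []
    for _ in range(n_cells):
        h = rng.random(n_bins) * 0.2
        if concentrated:
            h[4] += 2.0  # dominant ~90-degree orientation bin
        cells.append(unit(h))
    return np.concatenate(cells)  # 16 cells x 8 bins = 128 dims

scratch_train = [fake_hog(True) for _ in range(3)]
clean_train = [fake_hog(False) for _ in range(3)]
centroids = {'scratch': np.mean(scratch_train, axis=0),
             'clean': np.mean(clean_train, axis=0)}

query = fake_hog(True)  # a new scratch-like image
pred = min(centroids, key=lambda k: np.linalg.norm(query - centroids[k]))
print("Predicted defect type:", pred)
```

The same loop extends to any number of defect types: one centroid per class, one distance computation per prediction.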
Classical vs Deep Features — Choosing the Right Approach
The decision between classical feature engineering and deep CNN features is not always obvious. Here's the practical framework:
| Factor | Classical FE | Deep CNN Features |
|---|---|---|
| Labelled data needed | Hundreds of images | Thousands (fine-tuning) or hundreds (frozen features) |
| Compute required | CPU, seconds per image | GPU recommended |
| Interpretability | High — features have clear meaning | Low — 2048-dim opaque vector |
| Domain adaptation | Easy — tune filter thresholds | Requires fine-tuning or domain shift handling |
| Best for | Industrial inspection, medical imaging, small datasets | Natural images, large-scale classification, detection |
| Typical performance | Strong on structured domains | State of the art on most benchmarks |
The Hybrid Approach
In practice, the best systems often combine both. Use deep CNN features as the primary representation, then append classical features (edge density, colour histograms, pixel statistics) as additional columns. The classical features often add interpretable signal that the CNN missed — especially for domain-specific structural patterns the pretrained model never saw during ImageNet training.
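The concatenation itself is a one-liner. In the sketch below the deep feature vector is a simulated 512-dimensional placeholder (ResNet18's penultimate layer is 512-dim; larger backbones give 2048); in practice it would come from a pretrained CNN.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pretrained CNN's penultimate-layer output (e.g. ResNet18 -> 512 dims)
deep_features = rng.normal(size=512).astype(np.float32)

# Classical features from earlier in the lesson: pixel stats + 8 histogram bins
img = rng.normal(160, 20, (32, 32)).clip(0, 255)
pixel_stats = np.array([img.mean(), img.std(), img.min(), img.max()])
hist, _ = np.histogram(img, bins=8, range=(0, 255))
hist = hist / hist.sum()  # normalise to proportions

# Hybrid representation: opaque deep vector with interpretable columns appended
hybrid = np.concatenate([deep_features, pixel_stats, hist])
print(hybrid.shape)  # 512 + 4 + 8 = 524 features
```

Downstream, the hybrid matrix trains like any other tabular dataset, and feature-importance scores on the appended classical columns stay human-readable.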
Teacher's Note
When working with real images in Python, use skimage.feature.hog() for production-grade HOG extraction — it handles the block normalisation step that our simplified version omits, which significantly improves invariance to illumination changes. For deep features, use torchvision.models or tensorflow.keras.applications to load a pretrained ResNet or EfficientNet, remove the final classification head, and run your images through the model with model.eval() to extract the penultimate layer as a fixed-length feature vector. This is typically 512–2048 dimensions and almost always outperforms classical features on natural images with minimal engineering effort.
Practice Questions
1. Which classical computer vision feature descriptor divides an image into a grid of cells and computes gradient orientation histograms per cell, producing a fixed-length shape descriptor?
2. In the defect detection example, which pixel statistic showed the strongest class separation — with clean panels scoring ~8 and defect panels scoring ~44?
3. The classical edge detection filter used in this lesson — which computes horizontal and vertical gradient responses using two 3×3 kernel matrices — is called the ________ operator.
Quiz
1. What does a colour histogram feature represent about an image?
2. When should you prefer classical feature engineering over deep CNN features for an image classification task?
3. In the edge feature analysis, which feature best separated clean from defect images when mean brightness was similar between classes?
Up Next · Lesson 42
Automated Feature Engineering
Featuretools, deep feature synthesis, and automated pipelines that systematically generate hundreds of candidate features — and how to filter the noise from the signal.