AI Lesson 41 – Feature Engineering | Dataplexa

Feature Engineering

Feature Engineering is the process of transforming raw data into meaningful inputs that help machine learning models perform better. A model is only as good as the features it learns from.

Even the most advanced algorithm can fail if the features are weak, noisy, or irrelevant. Strong feature engineering often improves performance more than changing the model itself.

Why Feature Engineering Matters

Raw data rarely comes in a form that machine learning algorithms can understand directly. Feature engineering bridges the gap between raw data and intelligent predictions.

  • Improves model accuracy
  • Reduces noise and redundancy
  • Helps models learn patterns faster
  • Increases interpretability

Real-World Connection

Consider predicting house prices. Raw address strings are not useful inputs on their own. Converting that information into numerical features such as distance to the city center, number of rooms, or floor area makes prediction possible.
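
For instance, if listings have already been geocoded into coordinates, a distance feature can be derived directly. The following is a minimal sketch, using made-up coordinates and an assumed city-center location:

import numpy as np
import pandas as pd

# Hypothetical geocoded listings; coordinate values are made up
homes = pd.DataFrame({
    'latitude':  [40.75, 40.65, 40.80],
    'longitude': [-73.99, -73.95, -73.96],
    'rooms':     [3, 2, 4],
})

# Assumed city-center coordinate for illustration
center_lat, center_lon = 40.758, -73.985

# Rough planar distance in degrees -- fine as a relative feature,
# though a real pipeline would use haversine distance in kilometers
homes['dist_to_center'] = np.sqrt(
    (homes['latitude'] - center_lat) ** 2
    + (homes['longitude'] - center_lon) ** 2
)
print(homes)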

Common Feature Engineering Techniques

  • Handling missing values
  • Encoding categorical variables
  • Feature scaling
  • Creating new features

Handling Missing Values

Most models cannot handle missing values directly, so they must be dealt with before training. One common approach is replacing missing values with the column mean or median.


import pandas as pd
import numpy as np

data = pd.DataFrame({
    'age': [25, 30, np.nan, 40],
    'salary': [50000, 60000, 55000, np.nan]
})

# Replace each missing value with its column's mean
data_filled = data.fillna(data.mean())
print(data_filled)
  
         age   salary
0  25.000000  50000.0
1  30.000000  60000.0
2  31.666667  55000.0
3  40.000000  55000.0

Here, missing values are replaced with column averages, making the dataset usable for training.
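
When a column contains outliers, the median is usually a safer fill value than the mean, since a single extreme value can pull the mean far from typical rows. A minimal variation on the example above, reusing the same data frame:

# Median imputation is more robust to outliers than the mean
data_filled = data.fillna(data.median())
print(data_filled)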

Encoding Categorical Data

Machine learning models work with numbers, not text. Categorical features must be encoded.


from sklearn.preprocessing import LabelEncoder

cities = ['New York', 'London', 'Paris', 'London']
encoder = LabelEncoder()

# Classes are sorted alphabetically: London=0, New York=1, Paris=2
encoded = encoder.fit_transform(cities)
print(encoded)
  
[1 0 2 0]

Each city is converted into a numeric label that models can process.
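
One caveat: LabelEncoder is designed for target labels, and the numeric codes it produces imply an ordering (London < New York < Paris) that does not really exist. For nominal input features, one-hot encoding is often preferred. A minimal sketch using pandas:

import pandas as pd

cities = pd.Series(['New York', 'London', 'Paris', 'London'])

# One indicator column per city; each row gets a 1 in its city's column
one_hot = pd.get_dummies(cities, prefix='city', dtype=int)
print(one_hot)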

Feature Scaling

Features with large values can dominate others. Scaling brings all features to a similar range.


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Standardize: subtract the mean, divide by the standard deviation
scaled_data = scaler.fit_transform([[1], [10], [100]])

print(scaled_data)
  
[[-0.80538727]
 [-0.60404045]
 [ 1.40942772]]

After scaling, features contribute more equally to model training.
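
StandardScaler is not the only option. MinMaxScaler, for example, maps each feature to the [0, 1] range, which some models and visualizations prefer. A minimal sketch:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# The minimum maps to 0 and the maximum to 1
scaled = scaler.fit_transform([[1], [10], [100]])
print(scaled)  # approximately [[0.], [0.09090909], [1.]]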

Creating New Features

Sometimes combining existing features creates more useful information.


data = pd.DataFrame({
    'length': [10, 20, 30],
    'width': [5, 10, 15]
})

# Derived feature: area combines two existing columns
data['area'] = data['length'] * data['width']
print(data)
  
   length  width  area
0      10      5    50
1      20     10   200
2      30     15   450

The new feature “area” captures more meaningful information than length or width alone.
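
Ratios are another common kind of derived feature. As an illustrative sketch with hypothetical prices, dividing price by area gives a price-per-unit-area feature that is comparable across houses of different sizes:

import pandas as pd

houses = pd.DataFrame({
    'price': [100000, 250000, 400000],  # hypothetical values
    'area':  [50, 100, 200],
})

# A ratio feature: price per unit area
houses['price_per_area'] = houses['price'] / houses['area']
print(houses)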

When Feature Engineering Is Critical

  • Small datasets
  • Business-driven predictions
  • Interpretable models
  • Competitive machine learning tasks

Practice Questions

Practice 1: What process converts raw data into useful inputs?

Practice 2: What technique brings features to similar ranges?

Practice 3: Converting text categories into numbers is called?

Quick Quiz

Quiz 1: Machine learning models primarily work with?

Quiz 2: StandardScaler performs which operation?

Quiz 3: Combining existing columns to create better inputs is called?

Coming up next: Feature Selection — choosing the most important features.