Feature Engineering
Feature Engineering is the process of transforming raw data into meaningful inputs that machine learning models can understand and learn from effectively. Even the most advanced algorithms fail if the input features are poor. In many real-world projects, feature engineering has a bigger impact on model performance than choosing the algorithm itself.
This lesson explains why feature engineering matters, how it is done, and how it improves model accuracy and reliability.
Real-World Connection
Imagine cooking a meal. Raw ingredients alone are not enough — they must be cleaned, chopped, seasoned, and prepared correctly. Feature engineering plays the same role in machine learning by preparing raw data into a form that models can use effectively.
Why Feature Engineering Is Important
- Improves model accuracy
- Reduces noise in data
- Helps models learn patterns faster
- Makes models more interpretable
Common Feature Engineering Techniques
- Handling missing values
- Encoding categorical variables
- Scaling and normalization
- Creating new features
- Feature transformation
Handling Missing Values
Missing data can distort what a model learns, and many algorithms cannot handle it at all. Common strategies include removing rows with missing values or replacing (imputing) them with statistical measures such as the mean, median, or mode.
Example: Handling Missing Values (Python)
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "Age": [25, 30, np.nan, 40],
    "Salary": [50000, 60000, 55000, np.nan]
})

# Fill each missing cell with the mean of its column
data_filled = data.fillna(data.mean())
print(data_filled)
Understanding the Output
The missing values are replaced with the average of each column. This allows the model to use all rows instead of discarding data.
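For real projects, a common alternative is scikit-learn's SimpleImputer, which learns the fill values on the training data and can apply the same values to new data later. The sketch below uses the median, which is more robust to outliers than the mean.
Example: Median Imputation with SimpleImputer

from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "Age": [25, 30, np.nan, 40],
    "Salary": [50000, 60000, 55000, np.nan]
})

# Learn each column's median, then fill missing cells with it
imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(data)
data_filled = pd.DataFrame(filled, columns=data.columns)
print(data_filled)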
Encoding Categorical Variables
Machine learning models work with numbers, not text. Categorical values such as cities or job titles must be converted into numeric form.
Example: One-Hot Encoding
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

data = pd.DataFrame({
    "City": ["New York", "London", "Paris"]
})

# sparse_output=False returns a dense NumPy array
# (this parameter was named `sparse` in scikit-learn versions before 1.2)
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(data)
print(encoded)
Understanding the Output
Each distinct city becomes its own binary column. Every row contains a 1 in the column for its city and 0 everywhere else, so no artificial ordering is imposed on the categories.
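If you are already working in pandas, pd.get_dummies produces the same kind of binary columns in a single call; a minimal sketch:

import pandas as pd

data = pd.DataFrame({"City": ["New York", "London", "Paris"]})

# One binary column per category, labeled with the category name
print(pd.get_dummies(data))

OneHotEncoder is usually preferred inside pipelines because it remembers the categories seen during training and applies the same mapping to new data.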
Feature Scaling
When features have very different ranges, many models (especially distance-based ones such as k-nearest neighbors) give undue weight to the features with larger values. Scaling puts all features on a comparable scale so each can contribute fairly.
Example: Standard Scaling
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[10], [20], [30]])

# Standardize: subtract the mean, then divide by the standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
Understanding the Output
The scaled values have a mean of 0 and a standard deviation of 1; here they come out to approximately -1.22, 0, and 1.22.
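Standardization is one option; the techniques list above also mentions normalization. A minimal sketch of min-max normalization, which rescales each feature to the [0, 1] range:
Example: Min-Max Normalization

from sklearn.preprocessing import MinMaxScaler
import numpy as np

X = np.array([[10], [20], [30]])

# Rescale so each feature's minimum maps to 0 and its maximum to 1
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
print(X_normalized)

Min-max normalization preserves the shape of the original distribution but is sensitive to outliers, since a single extreme value stretches the range for everything else.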
Creating New Features
Sometimes the best features do not exist directly in the dataset. They are created by combining or transforming existing data.
For example, from a date column, you can extract year, month, or day to improve predictions.
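The sketch below illustrates this with a hypothetical "JoinDate" column, using pandas' .dt accessor to derive year, month, and day features.
Example: Extracting Date Features

import pandas as pd

data = pd.DataFrame({
    "JoinDate": pd.to_datetime(["2021-03-15", "2022-07-01", "2023-11-20"])
})

# Derive numeric features from the raw date column
data["Year"] = data["JoinDate"].dt.year
data["Month"] = data["JoinDate"].dt.month
data["Day"] = data["JoinDate"].dt.day
print(data)

Transforming existing features also counts as feature creation: for example, applying np.log1p to a heavily skewed column such as Salary often makes its patterns easier for a model to learn.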
Practice Questions
Practice 1: What is the process of transforming raw data into useful inputs called?
Practice 2: Machine learning models work best with which type of data?
Practice 3: What technique ensures all features contribute equally?
Quick Quiz
Quiz 1: Feature engineering mainly improves what?
Quiz 2: Which technique converts categories into binary columns?
Quiz 3: Feature engineering can involve creating what?
Coming up next: Feature Selection — choosing the most important features for better models.