AI Course
Feature Selection Techniques
Feature Selection is the process of choosing the most relevant input features from a dataset and removing unnecessary or redundant ones. While feature engineering creates better features, feature selection focuses on keeping only what truly matters for learning.
Irrelevant or redundant features add noise, slow training, and increase the risk of overfitting. Feature selection helps models become simpler, faster, and often more accurate.
Real-World Connection
Imagine preparing for a job interview. Studying everything in the world is not useful. You focus only on topics relevant to the job. Feature selection works the same way — it keeps only the information that helps the model make better decisions.
Why Feature Selection Is Important
- Improves model performance
- Reduces overfitting
- Decreases training time
- Makes models easier to interpret
Main Types of Feature Selection
- Filter Methods
- Wrapper Methods
- Embedded Methods
Filter Methods
Filter methods select features based on statistical measures without using a machine learning model. These methods are fast and scalable.
Common techniques include correlation, variance threshold, and statistical tests.
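As one illustration of a correlation-based filter, the sketch below (using the Iris dataset for convenience) keeps only features whose absolute Pearson correlation with the target exceeds a cutoff. The 0.5 cutoff is an arbitrary choice for illustration, not a standard value.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Pearson correlation of each feature with the target
correlations = np.array(
    [np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])]
)

# Keep features whose absolute correlation exceeds the cutoff
# (0.5 is an illustrative threshold, not a recommended default)
mask = np.abs(correlations) > 0.5
X_filtered = X[:, mask]
print(X_filtered.shape)
```

On Iris, the two petal measurements and sepal length correlate strongly with the class label, so three of the four features survive this filter.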
Example: Variance Threshold
from sklearn.feature_selection import VarianceThreshold
import numpy as np
X = np.array([
[0, 2, 0],
[0, 3, 4],
[0, 4, 1]
])
selector = VarianceThreshold(threshold=1.0)
X_selected = selector.fit_transform(X)
print(X_selected)
Understanding the Output
The first column had zero variance, and the second column's variance (about 0.67) is also below the threshold of 1.0, so both were removed. Only the third column, whose variance exceeds the threshold, was kept.
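Statistical tests are another common filter: each feature is scored against the target, and only the top-scoring ones are kept. A minimal sketch using SelectKBest with the ANOVA F-test (f_classif) on the Iris dataset:

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Score each feature with the ANOVA F-test and keep the top 2
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)

print(selector.scores_)  # one F-score per original feature
print(X_best.shape)      # (150, 2)
```

Like the variance threshold, this runs without ever training a predictive model, which is what makes filter methods fast.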
Wrapper Methods
Wrapper methods evaluate different feature combinations by training a model multiple times. These methods are more accurate but computationally expensive.
Example: Recursive Feature Elimination (RFE)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=2)
X_rfe = rfe.fit_transform(X, y)
print(X_rfe.shape)
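RFE also records exactly which features survived. A short follow-up sketch (same setup as above) that inspects the selection mask and per-feature rankings:

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

# support_ is a boolean mask over the original features;
# ranking_ assigns 1 to selected features, higher values to eliminated ones
print(rfe.support_)
print(rfe.ranking_)
```

Inspecting `support_` and `ranking_` is often more useful than the transformed array itself, since it tells you which original columns to keep in a production pipeline.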
Embedded Methods
Embedded methods perform feature selection during model training. Many models automatically assign importance to features.
Example: Feature Importance Using Random Forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier()
model.fit(X, y)
print(model.feature_importances_)
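The importances printed above can drive selection directly through SelectFromModel, which keeps only features whose importance exceeds a threshold (the mean importance by default). A minimal sketch, with a fixed random_state added for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Fit a forest, then keep only features above the mean importance
model = RandomForestClassifier(random_state=42)
selector = SelectFromModel(model, threshold="mean")
X_embedded = selector.fit_transform(X, y)
print(X_embedded.shape)
```

On Iris, the two petal features dominate the importance scores, so this typically keeps two of the four columns.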
Choosing the Right Method
- Use filter methods when you need fast, model-free screening of many features
- Use wrapper methods when the dataset is small and accuracy matters most
- Use embedded methods for a good balance of speed and accuracy
Practice Questions
Practice 1: What is the process of choosing important features called?
Practice 2: Which method uses statistical measures?
Practice 3: Feature selection helps reduce what?
Quick Quiz
Quiz 1: Which method trains models repeatedly?
Quiz 2: Which method selects features during training?
Quiz 3: Feature selection keeps which features?
Coming up next: Model Evaluation Metrics — measuring how good your model really is.