
Feature Selection Techniques

Feature Selection is the process of choosing the most relevant input features from a dataset and removing unnecessary or redundant ones. While feature engineering creates better features, feature selection focuses on keeping only what truly matters for learning.

Using too many features can add noise, slow down training, and increase the risk of overfitting. Feature selection helps models become simpler, faster, and often more accurate.

Real-World Connection

Imagine preparing for a job interview. Studying everything in the world is not useful. You focus only on topics relevant to the job. Feature selection works the same way — it keeps only the information that helps the model make better decisions.

Why Feature Selection Is Important

  • Improves model performance
  • Reduces overfitting
  • Decreases training time
  • Makes models easier to interpret

Main Types of Feature Selection

  • Filter Methods
  • Wrapper Methods
  • Embedded Methods

Filter Methods

Filter methods select features based on statistical measures without using a machine learning model. These methods are fast and scalable.

Common techniques include correlation, variance threshold, and statistical tests.

Example: Variance Threshold


from sklearn.feature_selection import VarianceThreshold
import numpy as np

# The first column is constant (zero variance) and carries no information
X = np.array([
    [0, 2, 0],
    [0, 3, 4],
    [0, 4, 1]
])

# threshold=0.0 (the default) removes only zero-variance features
selector = VarianceThreshold(threshold=0.0)
X_selected = selector.fit_transform(X)

print(X_selected)

[[2 0]
 [3 4]
 [4 1]]

Understanding the Output

The first column was constant across all samples, so it had zero variance and was removed. The two remaining columns, which actually vary, were kept.
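
As a sketch of the statistical-test approach mentioned above, scikit-learn's SelectKBest scores each feature independently against the target and keeps the top k. The choice of f_classif (an ANOVA F-test) and k=2 here is illustrative:

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Score each feature against the target with an ANOVA F-test,
# then keep the two highest-scoring features
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)

print(X_best.shape)

Because k=2 and Iris has 150 samples, the printed shape is (150, 2).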

Wrapper Methods

Wrapper methods evaluate different feature combinations by training a model multiple times. Because they measure actual model performance, they usually find better subsets than filter methods, but they are far more computationally expensive.

Example: Recursive Feature Elimination (RFE)


from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# RFE repeatedly fits the model and drops the weakest feature
# until only the requested number of features remains
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=2)
X_rfe = rfe.fit_transform(X, y)

print(X_rfe.shape)

(150, 2)
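
To see which of the four original columns survived, the fitted selector exposes a boolean support mask and an elimination ranking; this short follow-up prints both (the exact mask depends on the model settings):

# True marks a kept feature; in ranking_, selected features get 1
# and larger numbers mean the feature was eliminated earlier
print(rfe.support_)
print(rfe.ranking_)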

Embedded Methods

Embedded methods perform feature selection during model training. Many models, such as tree ensembles and L1-regularized linear models, assign importance to features as a natural by-product of fitting.

Example: Feature Importance Using Random Forest


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier()
model.fit(X, y)

# One importance score per feature; the scores sum to 1
print(model.feature_importances_)

[0.12 0.03 0.45 0.40]

The exact numbers vary from run to run, but the pattern is stable: the third and fourth features (petal length and width) carry most of the importance.
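
Tree ensembles are not the only embedded option. L1 (lasso-style) regularization also selects features during training by shrinking the coefficients of unhelpful features to exactly zero. Here is a minimal sketch using SelectFromModel with an L1-penalized logistic regression; the C value is an illustrative choice:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# The L1 penalty zeroes out weak coefficients; SelectFromModel then
# keeps only the features whose coefficients survive
sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
selector = SelectFromModel(sparse_model)
X_embedded = selector.fit_transform(X, y)

print(X_embedded.shape)

How many features survive depends on the strength of the penalty: smaller C values mean stronger regularization and zero out more coefficients.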

Choosing the Right Method

  • Use filter methods for quick, model-agnostic screening of large feature sets
  • Use wrapper methods when the dataset is small enough to afford repeated training runs
  • Use embedded methods for a good balance of accuracy and speed

Practice Questions

Practice 1: What is the process of choosing important features called?

Practice 2: Which method uses statistical measures?

Practice 3: Feature selection helps reduce what?

Quick Quiz

Quiz 1: Which method trains models repeatedly?

Quiz 2: Which method selects features during training?

Quiz 3: Feature selection keeps which features?

Coming up next: Model Evaluation Metrics — measuring how good your model really is.