
Feature Selection Techniques

Feature Selection is the process of choosing the most relevant input features from a dataset and removing unnecessary or redundant ones. While feature engineering creates better features, feature selection focuses on keeping only what truly matters for learning.

Using too many features can add noise, slow down training, and increase the risk of overfitting. Feature selection helps models become simpler, faster, and often more accurate.

Real-World Connection

Imagine preparing for a job interview. Studying everything in the world is not useful. You focus only on topics relevant to the job. Feature selection works the same way — it keeps only the information that helps the model make better decisions.

Why Feature Selection Is Important

  • Improves model performance
  • Reduces overfitting
  • Decreases training time
  • Makes models easier to interpret

Main Types of Feature Selection

  • Filter Methods
  • Wrapper Methods
  • Embedded Methods

Filter Methods

Filter methods select features based on statistical measures without using a machine learning model. These methods are fast and scalable.

Common techniques include correlation, variance threshold, and statistical tests.

Example: Variance Threshold


from sklearn.feature_selection import VarianceThreshold
import numpy as np

# The first column is constant (zero variance) and carries no information
X = np.array([
    [0, 2, 0],
    [0, 3, 4],
    [0, 4, 1]
])

# threshold=0.0 (the default) removes only zero-variance features
selector = VarianceThreshold(threshold=0.0)
X_selected = selector.fit_transform(X)

print(X_selected)

[[2 0]
 [3 4]
 [4 1]]

Understanding the Output

The first column was constant across all samples, so it had zero variance and was removed. The two remaining columns, which actually vary, were kept.
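
As a sketch of the statistical-test approach mentioned above, scikit-learn's SelectKBest scores each feature independently against the target and keeps the top k. The choice of f_classif (an ANOVA F-test) and k=2 here is illustrative:

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Score each feature against the target with an ANOVA F-test,
# then keep the two highest-scoring features
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)

print(X_best.shape)

Because k=2 and Iris has 150 samples, the printed shape is (150, 2).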

Wrapper Methods

Wrapper methods evaluate different feature combinations by training a model multiple times. Because they measure actual model performance, they usually find better subsets than filter methods, but they are far more computationally expensive.

Example: Recursive Feature Elimination (RFE)


from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# RFE repeatedly fits the model and drops the weakest feature
# until only the requested number of features remains
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=2)
X_rfe = rfe.fit_transform(X, y)

print(X_rfe.shape)

(150, 2)
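
To see which of the four original columns survived, the fitted selector exposes a boolean support mask and an elimination ranking; this short follow-up prints both (the exact mask depends on the model settings):

# True marks a kept feature; in ranking_, selected features get 1
# and larger numbers mean the feature was eliminated earlier
print(rfe.support_)
print(rfe.ranking_)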

Embedded Methods

Embedded methods perform feature selection during model training. Many models, such as tree ensembles and L1-regularized linear models, assign importance to features as a natural by-product of fitting.

Example: Feature Importance Using Random Forest


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier()
model.fit(X, y)

# One importance score per feature; the scores sum to 1
print(model.feature_importances_)

[0.12 0.03 0.45 0.40]

The exact numbers vary from run to run, but the pattern is stable: the third and fourth features (petal length and width) carry most of the importance.
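
Tree ensembles are not the only embedded option. L1 (lasso-style) regularization also selects features during training by shrinking the coefficients of unhelpful features to exactly zero. Here is a minimal sketch using SelectFromModel with an L1-penalized logistic regression; the C value is an illustrative choice:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# The L1 penalty zeroes out weak coefficients; SelectFromModel then
# keeps only the features whose coefficients survive
sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
selector = SelectFromModel(sparse_model)
X_embedded = selector.fit_transform(X, y)

print(X_embedded.shape)

How many features survive depends on the strength of the penalty: smaller C values mean stronger regularization and zero out more coefficients.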

Choosing the Right Method

  • Use filter methods for quick, model-agnostic screening of large feature sets
  • Use wrapper methods when the dataset is small enough to afford repeated training runs
  • Use embedded methods for a good balance of accuracy and speed

Practice Questions

Practice 1: What is the process of choosing important features called?

Practice 2: Which method uses statistical measures?

Practice 3: Feature selection helps reduce what?

Quick Quiz

Quiz 1: Which method trains models repeatedly?

Quiz 2: Which method selects features during training?

Quiz 3: Feature selection keeps which features?

Coming up next: Model Evaluation Metrics — measuring how good your model really is.