AI Lesson 42 – Feature Selection | Dataplexa

Feature Selection

Feature Selection is the process of choosing the most important input features for a machine learning model and removing unnecessary or irrelevant ones. While feature engineering creates better features, feature selection decides which features actually matter.

Using too many features can add noise, slow training, and reduce accuracy. A smaller, well-chosen feature set often performs better than a large, noisy one.

Why Feature Selection Is Important

Not all features contribute equally to predictions. Some features add noise, some repeat the same information, and some have no relationship with the target at all.

  • Improves model accuracy
  • Reduces overfitting
  • Speeds up training
  • Makes models easier to understand

Real-World Connection

Imagine predicting employee salary. Useful features may include experience and skills, while irrelevant features like employee ID or email address add no value. Feature selection removes such useless inputs.
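
As a minimal sketch of that idea, the useless columns can simply be dropped with pandas (the column names below are hypothetical):

import pandas as pd

# Hypothetical employee data: some columns predict salary, others do not
employees = pd.DataFrame({
    'employee_id': [101, 102, 103],
    'email': ['a@x.com', 'b@x.com', 'c@x.com'],
    'experience': [2, 5, 8],
    'salary': [40, 55, 75]
})

# Drop the target plus identifiers that carry no predictive signal
features = employees.drop(columns=['employee_id', 'email', 'salary'])
print(features.columns.tolist())

['experience']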

Types of Feature Selection Methods

  • Filter methods
  • Wrapper methods
  • Embedded methods

Filter Method: Correlation

Filter methods evaluate features using statistical measures. Correlation shows how strongly a feature is related to the target variable.


import pandas as pd

# Toy dataset: every column rises in perfect step with salary
data = pd.DataFrame({
    'experience': [1, 2, 3, 4, 5],
    'education': [10, 12, 14, 16, 18],
    'salary': [30, 40, 50, 60, 70]
})

# Pairwise correlation between all columns
correlation = data.corr()
print(correlation)
  
            experience  education  salary
experience         1.0        1.0     1.0
education          1.0        1.0     1.0
salary             1.0        1.0     1.0

Higher correlation values indicate stronger relationships. In this toy dataset every column increases in perfect lockstep, so every value is 1.0; in real data the values spread out, and features with very low correlation to the target can often be removed.
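
To turn the matrix into an actual filter, one rough sketch is to keep only the features whose absolute correlation with the target clears a threshold. The 0.5 cutoff below is an arbitrary choice for illustration, not a standard value:

# Correlation of each feature with the target column 'salary'
target_corr = data.corr()['salary'].drop('salary')

# Keep features whose absolute correlation passes the threshold
selected = target_corr[target_corr.abs() > 0.5].index.tolist()
print(selected)

['experience', 'education']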

Wrapper Method: Recursive Feature Elimination (RFE)

Wrapper methods use a machine learning model to evaluate feature importance. Recursive Feature Elimination repeatedly removes the weakest features.


from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

# Load the iris dataset: 4 features, 3 classes
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Repeatedly drop the weakest feature until only 2 remain
rfe = RFE(model, n_features_to_select=2)
rfe.fit(X, y)

# Boolean mask: True marks a selected feature
print(rfe.support_)
  
[ True False True False]

The output shows which features were selected. Only the most useful ones remain.
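
The fitted selector can also shrink the dataset directly; a short follow-up using the rfe object from above:

# transform() keeps only the columns that RFE selected
X_selected = rfe.transform(X)
print(X_selected.shape)

(150, 2)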

Embedded Method: Feature Importance

Embedded methods perform feature selection during model training. Tree-based models automatically rank features by importance.


from sklearn.ensemble import RandomForestClassifier

# Reuse X and y from the iris example above
model = RandomForestClassifier()
model.fit(X, y)

# One score per feature; all scores sum to 1
print(model.feature_importances_)
  
[0.12 0.04 0.68 0.16]

Higher values indicate more important features, and low-importance features can usually be removed safely. Note that the exact scores change from run to run unless random_state is fixed, because each forest is built from random samples.
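
To drop weak features automatically instead of reading the scores by eye, scikit-learn's SelectFromModel can wrap the fitted forest. A minimal sketch; by default it keeps features scoring above the mean importance:

from sklearn.feature_selection import SelectFromModel

# Keep features whose importance exceeds the mean (default threshold)
selector = SelectFromModel(model, prefit=True)
X_reduced = selector.transform(X)
print(X_reduced.shape)

For the iris data this typically keeps the two petal measurements, though the result can vary between runs.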

Feature Selection vs Feature Engineering

  • Feature engineering creates new features
  • Feature selection removes unnecessary features
  • Both work together to improve model performance, as the sketch below shows
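
A toy illustration of the contrast, using hypothetical columns:

import pandas as pd

df = pd.DataFrame({'experience': [1, 2, 3], 'employee_id': [7, 8, 9]})

# Feature engineering: build a new feature from an existing one
df['experience_squared'] = df['experience'] ** 2

# Feature selection: drop a feature that carries no signal
df = df.drop(columns=['employee_id'])

print(df.columns.tolist())

['experience', 'experience_squared']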

Practice Questions

Practice 1: What process removes irrelevant features?



Practice 2: Which metric measures relationship strength?



Practice 3: Which method removes weakest features recursively?



Quick Quiz

Quiz 1: Removing irrelevant features helps reduce?





Quiz 2: RFE belongs to which feature selection type?





Quiz 3: Tree models select features based on?





Coming up next: Model Evaluation Metrics — measuring how good your model really is.