Feature Selection
Feature Selection is the process of choosing the most important input features for a machine learning model and removing unnecessary or irrelevant ones. While feature engineering creates better features, feature selection decides which features actually matter.
Using too many features can confuse a model, slow training, and reduce accuracy. A smaller, well-chosen feature set often performs better than a large noisy one.
Why Feature Selection Is Important
Not all features contribute equally to predictions. Some add noise, some duplicate information that other features already carry, and some have no relationship with the target at all. Removing them:
- Improves model accuracy
- Reduces overfitting
- Speeds up training
- Makes models easier to understand
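As a concrete case of the "no relationship" problem, a feature that takes the same value in every row carries no information, and scikit-learn's VarianceThreshold can drop such features automatically. A minimal sketch (the column names here are made up for illustration):

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# 'team_size' never varies, so no model can learn anything from it
df = pd.DataFrame({
    'experience': [1, 2, 3, 4, 5],
    'team_size': [4, 4, 4, 4, 4]
})

# the default threshold of 0.0 removes only constant features
selector = VarianceThreshold()
reduced = selector.fit_transform(df)
print(selector.get_support())  # [ True False] -> 'team_size' is dropped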
Real-World Connection
Imagine predicting employee salary. Useful features may include experience and skills, while irrelevant features like employee ID or email address add no value. Feature selection removes such useless inputs.
Types of Feature Selection Methods
- Filter methods
- Wrapper methods
- Embedded methods
Filter Method: Correlation
Filter methods evaluate features using statistical measures. Correlation shows how strongly a feature is related to the target variable.
import pandas as pd

# Small illustrative dataset: years of experience, years of
# education, and salary (in thousands)
data = pd.DataFrame({
    'experience': [1, 2, 3, 4, 5],
    'education': [10, 12, 14, 16, 18],
    'salary': [30, 40, 50, 60, 70]
})

# corr() computes the pairwise correlation between all columns
correlation = data.corr()
print(correlation)
Correlation values close to +1 or -1 indicate strong relationships, while values near 0 indicate weak ones. Features whose correlation with the target is very low can often be removed.
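One simple way to act on this, sketched below with an arbitrary cutoff of 0.2, is to keep only the features whose absolute correlation with the target clears a threshold:

# absolute correlation of each feature with the target column
target_corr = data.corr()['salary'].drop('salary').abs()

# keep features above an illustrative cutoff of 0.2
selected = target_corr[target_corr > 0.2].index.tolist()
print(selected)  # ['experience', 'education'] for this toy data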
Wrapper Method: Recursive Feature Elimination (RFE)
Wrapper methods use a machine learning model to evaluate feature importance. Recursive Feature Elimination repeatedly removes the weakest features.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

X, y = load_iris(return_X_y=True)

# RFE repeatedly fits the model and drops the weakest feature
# until only n_features_to_select remain
model = LogisticRegression(max_iter=200)
rfe = RFE(model, n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask: True = feature kept
The output is a boolean mask over the original features: True marks a selected feature, False a discarded one. Only the most useful ones remain.
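To map that mask back to readable feature names and actually shrink the data, the fitted selector's transform method can be used, as in this short sketch:

from sklearn.datasets import load_iris

# pair each feature name with its True/False entry in the mask
names = load_iris().feature_names
print([n for n, keep in zip(names, rfe.support_) if keep])

# transform() keeps only the selected columns
print(rfe.transform(X).shape)  # (150, 2)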
Embedded Method: Feature Importance
Embedded methods perform feature selection during model training. Tree-based models automatically rank features by importance.
from sklearn.ensemble import RandomForestClassifier

# X, y are the iris data loaded in the RFE example above
model = RandomForestClassifier(random_state=0)
model.fit(X, y)

# importances sum to 1; higher means more useful to the trees
print(model.feature_importances_)
Higher values indicate more important features. Features with very low importance can usually be dropped with little effect on accuracy.
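scikit-learn packages this idea as SelectFromModel, which trains the estimator and keeps only features whose importance reaches a threshold (by default, the mean importance). A minimal sketch:

from sklearn.feature_selection import SelectFromModel

# fits the forest internally and keeps features whose importance
# is at least the mean importance (the default threshold)
selector = SelectFromModel(RandomForestClassifier(random_state=0))
X_selected = selector.fit_transform(X, y)
print(selector.get_support())
print(X_selected.shape)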
Feature Selection vs Feature Engineering
- Feature engineering creates new features
- Feature selection removes unnecessary features
- Both work together to improve model performance, as the sketch below shows
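A tiny end-to-end sketch of the two together (the column names and values are invented for illustration): engineer a new ratio feature, then rank everything by correlation with the target to decide what to keep:

import pandas as pd

df = pd.DataFrame({
    'hours': [10, 20, 30, 40],
    'tasks_done': [5, 12, 18, 22],
    'salary': [30, 45, 55, 65]
})

# feature engineering: create a productivity ratio
df['tasks_per_hour'] = df['tasks_done'] / df['hours']

# feature selection: rank features by absolute correlation with the target
print(df.corr()['salary'].drop('salary').abs().sort_values(ascending=False))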
Practice Questions
Practice 1: What process removes irrelevant features?
Practice 2: Which metric measures relationship strength?
Practice 3: Which method removes weakest features recursively?
Quick Quiz
Quiz 1: Removing irrelevant features helps reduce?
Quiz 2: RFE belongs to which feature selection type?
Quiz 3: Tree models select features based on?
Coming up next: Model Evaluation Metrics — measuring how good your model really is.