Feature Engineering Course
Interaction Features
Some effects only exist when two conditions are true at the same time. A discount only drives volume when paired with high visibility. A drug only works at the right dose for the right age group. Interaction features encode these joint effects directly — so your model doesn't have to discover them from scratch.
An interaction feature is a new column created by multiplying two (or more) existing features together. It captures cases where the effect of one variable depends on the value of another. Lesson 16 introduced interactions as part of polynomial expansion — this lesson focuses on crafting them deliberately, with domain knowledge, rather than generating them blindly in bulk.
Blind Generation vs Targeted Construction
Lesson 16 showed you how to generate every possible interaction automatically with PolynomialFeatures(interaction_only=True). That approach works when you have a small number of features. It breaks down fast when you have 20, 50, or 100 columns — the number of pairs explodes, most are noise, and regularisation can only do so much.
Blind generation — all pairs
Use PolynomialFeatures to auto-create every cross-product.
When it works: fewer than 10 features, tree-based models, automated feature selection downstream.
Risk: dimensionality explosion, most interactions are meaningless noise.
Targeted construction — specific pairs
Multiply exactly the two columns whose joint effect you have a reason to believe in.
When it works: any feature count, any model, when domain knowledge guides the choice.
Benefit: low dimensionality, interpretable, grounded in the business problem.
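To make the dimensionality problem concrete, here is a small sketch (assuming scikit-learn is available) that counts the columns PolynomialFeatures generates from 20 raw features:

```python
# Sketch: how many columns blind pairwise generation produces.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # 5 rows, 20 raw features

# interaction_only=True keeps cross-products but drops squared terms
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_poly = poly.fit_transform(X)

# 20 raw columns + 20*19/2 = 190 pairwise products = 210 columns total
print(X_poly.shape[1])  # 210
```

At 50 features the same call yields 1,225 pairwise products; at 100, it yields 4,950 — most of them noise.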
Manual Interaction Features
The scenario: You're a data scientist at a pharmaceutical company modelling clinical trial outcomes. Your dataset has patient age, drug dosage, and a severity score for their condition. The medical team believes two specific interactions matter: the effect of dosage is amplified in older patients, and a high-severity score combined with high dosage produces different outcomes than either alone. You're going to construct exactly these two interaction terms — nothing else.
# Import pandas
import pandas as pd

# Clinical trial patient data
trial_df = pd.DataFrame({
    'patient_id': ['PT01', 'PT02', 'PT03', 'PT04', 'PT05',
                   'PT06', 'PT07', 'PT08', 'PT09', 'PT10'],
    'age': [34, 67, 45, 72, 29, 58, 81, 41, 63, 55],
    'dosage_mg': [10, 20, 10, 30, 10, 20, 30, 20, 30, 10],
    'severity_score': [3, 7, 4, 8, 2, 6, 9, 5, 7, 3],
    'outcome_score': [72, 58, 68, 45, 80, 61, 38, 65, 50, 74]
})

# Interaction 1: age × dosage — captures that older patients react differently to dose
trial_df['age_x_dosage'] = trial_df['age'] * trial_df['dosage_mg']

# Interaction 2: severity × dosage — joint effect of disease severity and treatment dose
trial_df['severity_x_dosage'] = trial_df['severity_score'] * trial_df['dosage_mg']

# Print the raw inputs alongside the two constructed interactions
print(trial_df[['patient_id', 'age', 'dosage_mg', 'severity_score',
                'age_x_dosage', 'severity_x_dosage', 'outcome_score']].to_string(index=False))
patient_id age dosage_mg severity_score age_x_dosage severity_x_dosage outcome_score
PT01 34 10 3 340 30 72
PT02 67 20 7 1340 140 58
PT03 45 10 4 450 40 68
PT04 72 30 8 2160 240 45
PT05 29 10 2 290 20 80
PT06 58 20 6 1160 120 61
PT07 81 30 9 2430 270 38
PT08 41 20 5 820 100 65
PT09 63 30 7 1890 210 50
PT10 55 10 3 550 30 74
What just happened?
Two targeted interaction columns were created with simple multiplication. PT07 — the oldest patient at 81 with the highest dose (30 mg) and severity (9) — has the largest interaction values and the worst outcome score (38). PT05 — youngest at 29, lowest dose, lowest severity — has the smallest interactions and the best outcome (80). The interactions expose the compounding effect that neither raw column alone could show.
Interaction Between Continuous and Binary Features
The scenario: You're a growth analyst at a streaming platform building a churn model. You have days_since_last_login as a continuous feature and is_mobile_user as a binary flag. Your hypothesis: long absence from the platform matters more for mobile users than desktop users, because mobile users typically have shorter, more habitual session patterns. The interaction days_since_login × is_mobile captures this conditional effect.
# Import pandas
import pandas as pd

# Streaming platform user data
stream_df = pd.DataFrame({
    'user_id': ['U01', 'U02', 'U03', 'U04', 'U05',
                'U06', 'U07', 'U08', 'U09', 'U10'],
    'days_since_last_login': [2, 15, 1, 30, 7, 45, 3, 22, 5, 60],
    'is_mobile_user': [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    'monthly_spend_gbp': [12, 9, 15, 9, 12, 9, 15, 9, 12, 9],
    'churned': [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
})

# Continuous × binary interaction: absence effect only fires for mobile users
# For desktop users (is_mobile=0), this column is always 0
stream_df['absence_x_mobile'] = (stream_df['days_since_last_login'] *
                                 stream_df['is_mobile_user'])

# Continuous × continuous: spend and recency combined — high spend + long absence = risk
stream_df['spend_x_absence'] = (stream_df['monthly_spend_gbp'] *
                                stream_df['days_since_last_login'])

# Print all features together
print(stream_df[['user_id', 'days_since_last_login', 'is_mobile_user',
                 'absence_x_mobile', 'spend_x_absence', 'churned']].to_string(index=False))
user_id days_since_last_login is_mobile_user absence_x_mobile spend_x_absence churned
U01 2 1 2 24 0
U02 15 0 0 135 0
U03 1 1 1 15 0
U04 30 1 30 270 1
U05 7 0 0 84 0
U06 45 1 45 405 1
U07 3 0 0 45 0
U08 22 1 22 198 1
U09 5 1 5 60 0
U10 60 0 0 540 0
What just happened?
For desktop users (U02, U05, U07, U10), absence_x_mobile is always 0 — the interaction is "switched off" by the binary flag. For mobile users, it equals days_since_last_login directly. Notice that the three churned users (U04, U06, U08) all have high absence_x_mobile values — the interaction feature is already separating churned from non-churned far better than either raw column alone.
Validating Whether an Interaction Feature Adds Signal
The scenario: You've constructed several interaction features for a loan default model. Before passing them all to your model, you want a quick signal check — does each constructed interaction feature correlate more strongly with the target than its individual component columns? A correlation table is the simplest, fastest way to validate this before committing to a feature.
# Import pandas and numpy
import pandas as pd
import numpy as np

# Loan application dataset for default prediction
loan_df = pd.DataFrame({
    'loan_id': ['L01', 'L02', 'L03', 'L04', 'L05',
                'L06', 'L07', 'L08', 'L09', 'L10'],
    'debt_ratio': [0.25, 0.55, 0.30, 0.72, 0.18,
                   0.61, 0.40, 0.68, 0.22, 0.58],
    'credit_score': [720, 580, 695, 530, 750,
                     560, 640, 545, 710, 590],
    'loan_term_yrs': [10, 25, 15, 30, 10,
                      30, 20, 25, 15, 20],
    'defaulted': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

# Construct two candidate interaction features
# High debt ratio combined with low credit score is the classic default profile
loan_df['debt_x_credit'] = loan_df['debt_ratio'] * loan_df['credit_score']

# Long loan term combined with high debt ratio also signals risk
loan_df['term_x_debt'] = loan_df['loan_term_yrs'] * loan_df['debt_ratio']

# Compute Pearson correlation of each feature with the target
# abs() gives absolute correlation — direction doesn't matter for signal strength
feature_cols = ['debt_ratio', 'credit_score', 'loan_term_yrs',
                'debt_x_credit', 'term_x_debt']
correlations = loan_df[feature_cols + ['defaulted']].corr()['defaulted'].drop('defaulted')

# Sort by absolute correlation descending to see which features have most signal
corr_table = correlations.abs().sort_values(ascending=False).reset_index()
corr_table.columns = ['feature', 'abs_corr_with_target']
corr_table['abs_corr_with_target'] = corr_table['abs_corr_with_target'].round(4)

print("Feature correlation with default target (absolute):")
print(corr_table.to_string(index=False))
Feature correlation with default target (absolute):
feature abs_corr_with_target
debt_ratio 0.9574
debt_x_credit 0.8763
term_x_debt 0.8165
credit_score 0.7977
loan_term_yrs 0.5528
What just happened?
We used .corr() to compute Pearson correlation between every feature and the target, then sorted by absolute value. debt_ratio leads at 0.96 — no surprise. Both interaction features rank above credit_score and well above loan_term_yrs alone, confirming they carry genuine signal worth keeping. This is the quick validation step you should run on every constructed feature before it goes into a model.
When Interaction Features Help Most
Linear models benefit most
Linear and logistic regression cannot detect interactions on their own — they model each feature independently. Manually added interaction terms are the only way to give a linear model access to joint effects. Tree-based models can discover interactions automatically through splits, so the gain from manual interactions is smaller.
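A quick synthetic illustration of that difference (all numbers invented for the demo): the target below depends only on the product of two features, so a plain linear regression is blind to it until the interaction column is added.

```python
# Sketch: linear regression cannot represent a pure joint effect on its own.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 3 * x1 * x2 + rng.normal(scale=0.1, size=200)  # target is purely an interaction

# Raw features only: no way to express x1 * x2, so the fit is near useless
X_raw = np.column_stack([x1, x2])
r2_raw = LinearRegression().fit(X_raw, y).score(X_raw, y)

# With the interaction column added, the same model fits almost perfectly
X_int = np.column_stack([x1, x2, x1 * x2])
r2_int = LinearRegression().fit(X_int, y).score(X_int, y)

print(f"R² raw only: {r2_raw:.3f} | with interaction: {r2_int:.3f}")
```

A tree-based model would recover much of this joint effect through splits, which is why the manual interaction matters most for linear and logistic regression.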
Scale inputs before multiplying for comparability
When two features have very different scales — like age (20–80) and salary (30,000–200,000) — their product ranges from 600,000 to 16,000,000. That dwarfs every other feature in the model. Standardise or normalise inputs before constructing interactions to keep the resulting column in a sensible range.
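A minimal sketch of that fix, using made-up age and salary columns: z-score both inputs first, then multiply the standardised versions.

```python
# Sketch: standardise inputs before multiplying.
# 'age' and 'salary' are illustrative columns, not from the lesson datasets.
import pandas as pd

demo_df = pd.DataFrame({
    'age': [25, 40, 60, 33, 51],
    'salary': [32000, 85000, 150000, 47000, 120000]
})

# Raw product: values in the millions that dwarf every other feature
demo_df['age_x_salary_raw'] = demo_df['age'] * demo_df['salary']

# z-score each input first, then multiply the standardised columns
age_z = (demo_df['age'] - demo_df['age'].mean()) / demo_df['age'].std()
salary_z = (demo_df['salary'] - demo_df['salary'].mean()) / demo_df['salary'].std()
demo_df['age_x_salary_std'] = age_z * salary_z

print(demo_df[['age_x_salary_raw', 'age_x_salary_std']])
```

The raw product spans roughly 800,000 to 9,000,000 here, while the standardised version stays within a few units of zero — comparable to any other scaled feature.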
Name interactions meaningfully
A column named feature_3_x_feature_17 tells you nothing when debugging a model six months later. Name interactions after what they represent: age_x_dosage, absence_x_mobile, debt_x_credit. Interpretability starts with the column name.
Teacher's Note
The correlation check you saw in Code Block 3 is a good first filter, but correlation only measures linear relationships with the target. An interaction feature can still be valuable even if its Pearson correlation looks modest — particularly in tree-based models that capture non-linear patterns. The better validation is to train with and without the interaction and compare cross-validated performance. If including the feature consistently improves the metric across folds, it earns its place in the dataset. If it doesn't move the needle, drop it — complexity without benefit is just noise.
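That with/without comparison can be sketched like this on synthetic loan-style data (the columns, seed, and labelling rule are invented for the demo, not the lesson's dataset):

```python
# Sketch: cross-validate the same model with and without the interaction column.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 300
debt = rng.uniform(0.1, 0.8, n)
credit = rng.uniform(500, 800, n)

# Synthetic default risk driven by the joint effect: high debt AND low credit
risk = debt * (800 - credit) / 100
y = (risk + rng.normal(scale=0.3, size=n) > np.median(risk)).astype(int)

X_base = np.column_stack([debt, credit])
X_with = np.column_stack([debt, credit, debt * credit])

model = make_pipeline(StandardScaler(), LogisticRegression())
base_score = cross_val_score(model, X_base, y, cv=5).mean()
with_score = cross_val_score(model, X_with, y, cv=5).mean()

print(f"CV accuracy without interaction: {base_score:.3f} | with: {with_score:.3f}")
```

If the with-interaction score consistently beats the baseline across folds (and ideally across seeds), the feature earns its place; if not, drop it.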
Practice Questions
1. What arithmetic operation creates an interaction feature between two columns?
2. When a continuous feature is multiplied by a binary flag that equals 0, what is the interaction term's value?
3. Interaction features are most critical for which type of model, since it cannot discover joint effects on its own?
Quiz
1. What does an interaction feature capture that individual raw features cannot?
2. Two features have very different scales — age (20–80) and annual salary (30,000–200,000). What should you do before creating an interaction term?
3. What is the most reliable way to validate that an interaction feature genuinely improves a model?
Up Next · Lesson 18
Target Encoding
Replace category labels with the mean of the target variable — a powerful technique for high-cardinality columns that one-hot encoding can't handle.