Feature Engineering Lesson 17 – Interaction Features | Dataplexa
Intermediate Level · Lesson 17

Interaction Features

Some effects only exist when two conditions are true at the same time. A discount only drives volume when paired with high visibility. A drug only works at the right dose for the right age group. Interaction features encode these joint effects directly — so your model doesn't have to discover them from scratch.

An interaction feature is a new column created by multiplying two (or more) existing features together. It captures cases where the effect of one variable depends on the value of another. Lesson 16 introduced interactions as part of polynomial expansion — this lesson focuses on crafting them deliberately, with domain knowledge, rather than generating them blindly in bulk.

Blind Generation vs Targeted Construction

Lesson 16 showed you how to generate every possible interaction automatically with PolynomialFeatures(interaction_only=True). That approach works when you have a small number of features. It breaks down fast when you have 20, 50, or 100 columns — the number of pairs explodes, most are noise, and regularisation can only do so much.

Blind generation — all pairs

Use PolynomialFeatures to auto-create every cross-product.

When it works: fewer than 10 features, tree-based models, automated feature selection downstream.

Risk: dimensionality explosion, most interactions are meaningless noise.

Targeted construction — specific pairs

Multiply exactly the two columns whose joint effect you have a reason to believe in.

When it works: any feature count, any model, when domain knowledge guides the choice.

Benefit: low dimensionality, interpretable, grounded in the business problem.
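The difference in column counts is easy to see in a few lines. This sketch uses random data purely to illustrate the scale of the explosion — the array shape and the feature indices chosen for the targeted product are illustrative, not from this lesson's datasets.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 50 original features — modest by real-world standards
X = np.random.rand(100, 50)

# Blind generation: interaction_only=True keeps cross-products but drops
# squared terms; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_blind = poly.fit_transform(X)

# 50 originals + C(50, 2) = 1,225 pairwise products = 1,275 columns
print(X_blind.shape)  # (100, 1275)

# Targeted construction: one product column chosen with domain knowledge
X_targeted = np.column_stack([X, X[:, 0] * X[:, 1]])
print(X_targeted.shape)  # (100, 51)
```

At 50 features, blind generation multiplies your column count by more than 25; targeted construction adds exactly one column per hypothesis.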

Manual Interaction Features

The scenario: You're a data scientist at a pharmaceutical company modelling clinical trial outcomes. Your dataset has patient age, drug dosage, and a severity score for their condition. The medical team believes two specific interactions matter: the effect of dosage is amplified in older patients, and a high-severity score combined with high dosage produces different outcomes than either alone. You're going to construct exactly these two interaction terms — nothing else.

# Import pandas
import pandas as pd

# Clinical trial patient data
trial_df = pd.DataFrame({
    'patient_id':    ['PT01','PT02','PT03','PT04','PT05',
                      'PT06','PT07','PT08','PT09','PT10'],
    'age':           [34, 67, 45, 72, 29, 58, 81, 41, 63, 55],
    'dosage_mg':     [10, 20, 10, 30, 10, 20, 30, 20, 30, 10],
    'severity_score':[3,  7,  4,  8,  2,  6,  9,  5,  7,  3],
    'outcome_score': [72, 58, 68, 45, 80, 61, 38, 65, 50, 74]
})

# Interaction 1: age × dosage — captures that older patients react differently to dose
trial_df['age_x_dosage'] = trial_df['age'] * trial_df['dosage_mg']

# Interaction 2: severity × dosage — joint effect of disease severity and treatment dose
trial_df['severity_x_dosage'] = trial_df['severity_score'] * trial_df['dosage_mg']

# Print the raw inputs alongside the two constructed interactions
print(trial_df[['patient_id', 'age', 'dosage_mg', 'severity_score',
                'age_x_dosage', 'severity_x_dosage', 'outcome_score']].to_string(index=False))
 patient_id  age  dosage_mg  severity_score  age_x_dosage  severity_x_dosage  outcome_score
       PT01   34         10               3           340                 30             72
       PT02   67         20               7          1340                140             58
       PT03   45         10               4           450                 40             68
       PT04   72         30               8          2160                240             45
       PT05   29         10               2           290                 20             80
       PT06   58         20               6          1160                120             61
       PT07   81         30               9          2430                270             38
       PT08   41         20               5           820                100             65
       PT09   63         30               7          1890                210             50
       PT10   55         10               3           550                 30             74

What just happened?

Two targeted interaction columns were created with simple multiplication. PT07 — the oldest patient at 81 with the highest dose (30 mg) and severity (9) — has the largest interaction values and the worst outcome score (38). PT05 — youngest at 29, lowest dose, lowest severity — has the smallest interactions and the best outcome (80). The interactions expose the compounding effect that neither raw column alone could show.

Interaction Between Continuous and Binary Features

The scenario: You're a growth analyst at a streaming platform building a churn model. You have days_since_last_login as a continuous feature and is_mobile_user as a binary flag. Your hypothesis: long absence from the platform matters more for mobile users than desktop users, because mobile users typically have shorter, more habitual session patterns. The interaction days_since_login × is_mobile captures this conditional effect.

# Import pandas
import pandas as pd

# Streaming platform user data
stream_df = pd.DataFrame({
    'user_id':              ['U01','U02','U03','U04','U05',
                             'U06','U07','U08','U09','U10'],
    'days_since_last_login':[2, 15, 1, 30, 7, 45, 3, 22, 5, 60],
    'is_mobile_user':       [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    'monthly_spend_gbp':    [12, 9, 15, 9, 12, 9, 15, 9, 12, 9],
    'churned':              [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
})

# Continuous × binary interaction: absence effect only fires for mobile users
# For desktop users (is_mobile=0), this column is always 0
stream_df['absence_x_mobile'] = (stream_df['days_since_last_login'] *
                                    stream_df['is_mobile_user'])

# Continuous × continuous: spend and recency combined — high spend + long absence = risk
stream_df['spend_x_absence'] = (stream_df['monthly_spend_gbp'] *
                                   stream_df['days_since_last_login'])

# Print all features together
print(stream_df[['user_id', 'days_since_last_login', 'is_mobile_user',
                 'absence_x_mobile', 'spend_x_absence', 'churned']].to_string(index=False))
 user_id  days_since_last_login  is_mobile_user  absence_x_mobile  spend_x_absence  churned
     U01                      2               1                 2               24        0
     U02                     15               0                 0              135        0
     U03                      1               1                 1               15        0
     U04                     30               1                30              270        1
     U05                      7               0                 0               84        0
     U06                     45               1                45              405        1
     U07                      3               0                 0               45        0
     U08                     22               1                22              198        1
     U09                      5               1                 5               60        0
     U10                     60               0                 0              540        0

What just happened?

For desktop users (U02, U05, U07, U10), absence_x_mobile is always 0 — the interaction is "switched off" by the binary flag. For mobile users, it equals days_since_last_login directly. Notice that the three churned users (U04, U06, U08) all have high absence_x_mobile values — the interaction feature is already separating churned from non-churned far better than either raw column alone.

Validating Whether an Interaction Feature Adds Signal

The scenario: You've constructed several interaction features for a loan default model. Before passing them all to your model, you want a quick signal check — does each constructed interaction feature correlate strongly with the target, and how does it rank against its individual component columns? A correlation table is the simplest, fastest way to run this check before committing to a feature.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Loan application dataset for default prediction
loan_df = pd.DataFrame({
    'loan_id':        ['L01','L02','L03','L04','L05',
                      'L06','L07','L08','L09','L10'],
    'debt_ratio':     [0.25,0.55,0.30,0.72,0.18,
                      0.61,0.40,0.68,0.22,0.58],
    'credit_score':   [720,580,695,530,750,
                      560,640,545,710,590],
    'loan_term_yrs':  [10,25,15,30,10,
                      30,20,25,15,20],
    'defaulted':      [0,1,0,1,0,1,0,1,0,1]
})

# Construct two candidate interaction features
# High debt ratio combined with low credit score is the classic default profile
loan_df['debt_x_credit'] = loan_df['debt_ratio'] * loan_df['credit_score']

# Long loan term combined with high debt ratio also signals risk
loan_df['term_x_debt'] = loan_df['loan_term_yrs'] * loan_df['debt_ratio']

# Compute Pearson correlation of each feature with the target
# abs() gives absolute correlation — direction doesn't matter for signal strength
feature_cols = ['debt_ratio', 'credit_score', 'loan_term_yrs',
                'debt_x_credit', 'term_x_debt']

correlations = loan_df[feature_cols + ['defaulted']].corr()['defaulted'].drop('defaulted')

# Sort by absolute correlation descending to see which features have most signal
corr_table = correlations.abs().sort_values(ascending=False).reset_index()
corr_table.columns = ['feature', 'abs_corr_with_target']
corr_table['abs_corr_with_target'] = corr_table['abs_corr_with_target'].round(4)

print("Feature correlation with default target (absolute):")
print(corr_table.to_string(index=False))
Feature correlation with default target (absolute):
       feature  abs_corr_with_target
    debt_ratio                0.9574
 debt_x_credit                0.8763
   term_x_debt                0.8165
  credit_score                0.7977
 loan_term_yrs                0.5528

What just happened?

We used .corr() to compute Pearson correlation between every feature and the target, then sorted by absolute value. debt_ratio leads at 0.96 — no surprise. Both interaction features rank above credit_score and well above loan_term_yrs alone, so they carry genuine signal worth keeping. Note, though, that neither beats debt_ratio by itself: a strong correlation alone doesn't prove an interaction adds information beyond its strongest component. This is the quick validation step you should run on every constructed feature before it goes into a model.

When Interaction Features Help Most

Linear models benefit most

Linear and logistic regression cannot detect interactions on their own — they model each feature independently. Manually added interaction terms are the only way to give a linear model access to joint effects. Tree-based models can discover interactions automatically through splits, so the gain from manual interactions is smaller.
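You can demonstrate this limitation directly. In the sketch below the target is, by construction, exactly the product of two synthetic features — an extreme case chosen to make the point vivid, not a realistic dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data where the target depends ONLY on the product of two features
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] * X[:, 1]

# Without the interaction, a linear model has nothing to fit
plain = LinearRegression().fit(X, y)
print(round(plain.score(X, y), 3))  # R^2 close to 0

# Adding the product column makes the joint effect directly learnable
X_int = np.column_stack([X, X[:, 0] * X[:, 1]])
with_int = LinearRegression().fit(X_int, y)
print(round(with_int.score(X_int, y), 3))  # 1.0
```

The same model goes from explaining essentially none of the variance to explaining all of it — the only change is one hand-crafted column.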

Scale inputs before multiplying for comparability

When two features have very different scales — like age (20–80) and salary (30,000–200,000) — their product ranges from 600,000 to 16,000,000. That dwarfs every other feature in the model. Standardise or normalise inputs before constructing interactions to keep the resulting column in a sensible range.
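A minimal sketch of that fix, using hypothetical age and salary values (not from this lesson's datasets): standardise each input to a z-score first, then multiply.

```python
import pandas as pd

# Illustrative values with wildly different scales
df = pd.DataFrame({
    'age':    [25, 40, 60, 75],
    'salary': [32_000, 55_000, 120_000, 180_000],
})

# The raw product runs into the millions and dwarfs other features
df['age_x_salary_raw'] = df['age'] * df['salary']

# Standardise each input (z-score), then multiply
for col in ['age', 'salary']:
    df[f'{col}_z'] = (df[col] - df[col].mean()) / df[col].std()
df['age_x_salary'] = df['age_z'] * df['salary_z']

# The scaled interaction stays in single digits
print(df[['age_x_salary_raw', 'age_x_salary']])
```

Fitting the scaler on the training split only (e.g. with scikit-learn's StandardScaler inside a pipeline) avoids leaking test-set statistics into the interaction.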

Name interactions meaningfully

A column named feature_3_x_feature_17 tells you nothing when debugging a model six months later. Name interactions after what they represent: age_x_dosage, absence_x_mobile, debt_x_credit. Interpretability starts with the column name.

Teacher's Note

The correlation check you saw in Code Block 3 is a good first filter, but correlation only measures linear relationships with the target. An interaction feature can still be valuable even if its Pearson correlation looks modest — particularly in tree-based models that capture non-linear patterns. The better validation is to train with and without the interaction and compare cross-validated performance. If including the feature consistently improves the metric across folds, it earns its place in the dataset. If it doesn't move the needle, drop it — complexity without benefit is just noise.
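That with-and-without comparison takes only a few lines. The data below is synthetic and deliberately extreme — the class depends only on the joint sign of two features, so neither raw column carries any signal alone and the gap is exaggerated compared with real datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (illustrative): the class is determined by whether
# x1 and x2 share a sign — a pure joint effect
rng = np.random.default_rng(42)
x1 = rng.normal(size=400)
x2 = rng.normal(size=400)
y = (x1 * x2 > 0).astype(int)

X_base = np.column_stack([x1, x2])
X_plus = np.column_stack([x1, x2, x1 * x2])  # add the interaction

model = LogisticRegression(max_iter=1000)
base = cross_val_score(model, X_base, y, cv=5).mean()
plus = cross_val_score(model, X_plus, y, cv=5).mean()

print(f"accuracy without interaction: {base:.3f}")  # near chance (~0.5)
print(f"accuracy with interaction:    {plus:.3f}")  # near 1.0
```

On real data the gap will be far smaller — the point is the procedure: same model, same folds, the interaction column as the only difference.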

Practice Questions

1. What arithmetic operation creates an interaction feature between two columns?



2. When a continuous feature is multiplied by a binary flag that equals 0, what is the interaction term's value?



3. Interaction features are most critical for which type of model, since it cannot discover joint effects on its own?



Quiz

1. What does an interaction feature capture that individual raw features cannot?


2. Two features have very different scales — age (20–80) and annual salary (30,000–200,000). What should you do before creating an interaction term?


3. What is the most reliable way to validate that an interaction feature genuinely improves a model?


Up Next · Lesson 18

Target Encoding

Replace category labels with the mean of the target variable — a powerful technique for high-cardinality columns that one-hot encoding can't handle.