Feature Engineering Course
Polynomial Features
Linear models assume straight-line relationships. But the real world is curved. Polynomial features let a linear model bend — by adding squared, cubed, and cross-product terms that capture the non-linearity hiding in your data.
Polynomial feature generation expands a set of input features into higher-degree terms. Given a feature x, it creates x², x³, and so on. Given two features x and z, it also creates the interaction term x·z. A linear model trained on these expanded features can fit curves and combined effects without changing its underlying algorithm at all.
Why Squared Terms Matter
Imagine predicting energy consumption from outdoor temperature. At mild temperatures, consumption is low. But both very cold days and very hot days drive energy use up — heating in winter, air conditioning in summer. That's a U-shaped (quadratic) relationship. A plain linear model sees temperature and fits a straight line — it will miss the uptick at both extremes. Add temperature² as a feature and suddenly the model can capture the curve.
Degree-2 (quadratic) terms — x²
Captures U-shapes and diminishing returns. Common in economics (cost curves), medicine (dose-response), and energy modelling. Adding x² alongside x lets the model fit a parabola.
Interaction terms — x · z
Captures the combined effect of two features. The effect of advertising spend might depend on the market size — neither alone tells the full story. An interaction term encodes that dependency explicitly.
Higher-degree terms — x³, x⁴
Captures more complex curves. Use with caution — degree 3+ dramatically increases the number of features and the risk of overfitting, especially on small datasets. Regularisation becomes essential.
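Before automating anything, it helps to see what these term types look like concretely. A minimal sketch with plain NumPy, using small made-up values for a temperature-like feature x and a household-size-like feature z:

```python
# Building degree-2 polynomial terms by hand with NumPy, to show exactly
# what automated expansion will generate later in this lesson.
import numpy as np

x = np.array([2.0, 4.0, 10.0, 15.0])  # e.g. temperature
z = np.array([2.0, 3.0, 4.0, 2.0])    # e.g. household size

X_expanded = np.column_stack([
    x,       # original feature
    z,       # original feature
    x ** 2,  # squared term: lets a linear model fit a parabola
    z ** 2,  # squared term
    x * z,   # interaction term: combined effect of x and z
])
print(X_expanded.shape)  # (4, 5)
print(X_expanded[0])     # first row: [2. 2. 4. 4. 4.]
```

Two input columns became five, which is exactly what sklearn's PolynomialFeatures will do automatically in the next section.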
Generating Polynomial Features with scikit-learn
The scenario: You're a data scientist at a utilities company modelling monthly electricity consumption. Your dataset has two features: avg_temp_c — the average outdoor temperature for the month — and household_size — number of residents. You suspect a quadratic relationship with temperature (high at both extremes) and an interaction between temperature and household size. You're going to use sklearn's PolynomialFeatures to generate all degree-2 terms automatically.
# Import pandas and PolynomialFeatures
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
# Monthly energy data — two input features
energy_df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May',
              'Jun', 'Jul', 'Aug', 'Sep', 'Oct'],
    'avg_temp_c': [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size': [2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed': [520, 490, 310, 260, 280, 410, 510, 480, 290, 320]
})
# Select the two input features — exclude month and target
feature_cols = ['avg_temp_c', 'household_size']
X = energy_df[feature_cols]
# degree=2 generates: original terms, squared terms, and interaction term
# include_bias=False removes the constant 1 column (handled by the model intercept)
poly = PolynomialFeatures(degree=2, include_bias=False)
# fit_transform learns the feature names and generates all polynomial terms
X_poly = poly.fit_transform(X)
# get_feature_names_out shows exactly which column maps to which term
poly_names = poly.get_feature_names_out(feature_cols)
poly_df = pd.DataFrame(X_poly, columns=poly_names)
# Print the feature names to understand what was generated
print("Generated polynomial features:")
for name in poly_names:
    print(f"  {name}")
print()
print(poly_df.to_string(index=False))
Generated polynomial features:
avg_temp_c
household_size
avg_temp_c^2
household_size^2
avg_temp_c household_size
avg_temp_c household_size avg_temp_c^2 household_size^2 avg_temp_c household_size
2.0 2.0 4.0 4.0 4.0
4.0 3.0 16.0 9.0 12.0
10.0 4.0 100.0 16.0 40.0
15.0 2.0 225.0 4.0 30.0
20.0 3.0 400.0 9.0 60.0
26.0 4.0 676.0 16.0 104.0
30.0 2.0 900.0 4.0 60.0
28.0 3.0 784.0 9.0 84.0
21.0 4.0 441.0 16.0 84.0
13.0 2.0 169.0 4.0 26.0

What just happened?
PolynomialFeatures took 2 input columns and produced 5 output columns: the two originals, their squares, and their interaction term. For degree 2 with n features, the count (excluding the bias column) is (n + 1)(n + 2)/2 − 1, which gives 5 for 2 features. Notice how avg_temp_c^2 grows sharply at the temperature extremes (4 in January, 900 in July) — exactly the shape needed to model the U-curve in energy consumption.
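That count formula can be checked in code before committing to an expansion. A small sketch using `math.comb` from the standard library, cross-checked against PolynomialFeatures itself on dummy data:

```python
# Counting polynomial feature columns without generating them.
# comb(n + d, d) is the total including the bias column; subtract 1
# when include_bias=False.
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def poly_feature_count(n_features: int, degree: int, include_bias: bool = False) -> int:
    """Number of columns PolynomialFeatures produces for n_features inputs."""
    total = comb(n_features + degree, degree)  # includes the bias column
    return total if include_bias else total - 1

# Cross-check against sklearn: one all-zero row is enough to count columns
for n, d in [(2, 2), (5, 2), (10, 3)]:
    X = np.zeros((1, n))
    generated = PolynomialFeatures(degree=d, include_bias=False).fit_transform(X).shape[1]
    assert generated == poly_feature_count(n, d)
    print(f"{n} features, degree {d}: {generated} columns")
```

Running this prints 5, 20, and 285 columns respectively — useful for sanity-checking dimensionality before you train anything.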
Interaction-Only Terms
The scenario: You're building a sales forecasting model at a consumer goods company. You have three features: ad_spend, price_discount, and market_size. Your marketing director believes that advertising only works when paired with a discount — neither alone drives the same uplift as both together. You want interaction terms but not squared terms, since there's no reason to think ad spend squared means anything on its own.
# Import pandas and PolynomialFeatures
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
# Sales campaign data across regional markets
sales_df = pd.DataFrame({
    'region': ['North', 'South', 'East', 'West', 'North',
               'South', 'East', 'West', 'North', 'South'],
    'ad_spend': [50, 80, 120, 40, 90, 110, 60, 70, 100, 55],
    'price_discount': [5, 10, 15, 0, 20, 10, 5, 15, 20, 0],
    'market_size': [200, 350, 500, 150, 280, 420, 190, 310, 460, 175]
})
# interaction_only=True generates ONLY cross-product terms, no squared terms
# degree=2 with interaction_only keeps x, z, w and adds x*z, x*w, z*w
poly_interact = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
# Fit and transform the three numerical features
feature_cols = ['ad_spend', 'price_discount', 'market_size']
X_interact = poly_interact.fit_transform(sales_df[feature_cols])
# Get the generated column names
interact_names = poly_interact.get_feature_names_out(feature_cols)
interact_df = pd.DataFrame(X_interact, columns=interact_names, dtype=int)
# Print the feature names and the full transformed table
print("Interaction-only features generated:")
for name in interact_names:
    print(f"  {name}")
print()
print(interact_df.to_string(index=False))
Interaction-only features generated:
ad_spend
price_discount
market_size
ad_spend price_discount
ad_spend market_size
price_discount market_size
ad_spend price_discount market_size ad_spend price_discount ad_spend market_size price_discount market_size
50 5 200 250 10000 1000
80 10 350 800 28000 3500
120 15 500 1800 60000 7500
40 0 150 0 6000 0
90 20 280 1800 25200 5600
110 10 420 1100 46200 4200
60 5 190 300 11400 950
70 15 310 1050 21700 4650
100 20 460 2000 46000 9200
55 0 175 0 9625 0

What just happened?
interaction_only=True produced cross-products only — no ad_spend² or price_discount². Rows 4 and 10 had zero discount — so every interaction involving price_discount is also zero, correctly representing the absence of a combined effect. Row 3 (120 ad spend, 15% discount, 500k market) produced the largest ad_spend market_size interaction at 60,000.
Feature Explosion — Degree vs Column Count
This is the most important practical consideration with polynomial features. The number of output columns grows very fast:
| Input features | Degree 2 output | Degree 3 output | Risk |
|---|---|---|---|
| 2 | 5 | 9 | Safe |
| 5 | 20 | 55 | Moderate |
| 10 | 65 | 285 | High — regularise |
| 20 | 230 | 1770 | Very high — use interaction_only |
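The table's numbers can be reproduced directly, and the saving from interaction_only=True made concrete. A quick sketch counting output columns for the worst row (20 features):

```python
# Reproducing the feature-explosion table for 20 input features, with and
# without interaction_only. One all-zero row is enough to count columns.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 20))
for degree in (2, 3):
    full = PolynomialFeatures(degree=degree, include_bias=False)
    inter = PolynomialFeatures(degree=degree, interaction_only=True, include_bias=False)
    n_full = full.fit_transform(X).shape[1]
    n_inter = inter.fit_transform(X).shape[1]
    print(f"degree {degree}: full={n_full}, interaction_only={n_inter}")
```

This prints full=230 vs interaction_only=210 at degree 2, and full=1770 vs interaction_only=1350 at degree 3. The saving grows with degree because interaction_only drops every pure power term (x², x³, x²z, and so on), keeping only products of distinct features.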
Polynomial Features in a Pipeline with Scaling
The scenario: You're finalising the energy model. Before generating polynomial features, you want to scale the inputs — otherwise the squared term of a large-valued feature will be enormous and dominate the model. You'll chain StandardScaler → PolynomialFeatures in the correct order, then confirm the output is ready for a linear model.
# Import all tools needed
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
# Re-use the energy dataset from earlier in this lesson
energy_df = pd.DataFrame({
    'avg_temp_c': [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size': [2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed': [520, 490, 310, 260, 280, 410, 510, 480, 290, 320]
})
# Separate features and target, then split
X = energy_df[['avg_temp_c', 'household_size']]
y = energy_df['kwh_consumed']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build a Pipeline: scale first, then generate polynomial terms
# Order matters: scale raw features before squaring so the terms stay well-behaved
poly_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2, include_bias=False))
])
# Fit on training data, transform both sets
X_train_poly = poly_pipeline.fit_transform(X_train)
X_test_poly = poly_pipeline.transform(X_test)
# Retrieve the final column names from the poly step
feature_names = poly_pipeline.named_steps['poly'].get_feature_names_out(X.columns)
# Print shape and first few rows of the transformed training set
print(f"Input shape: {X_train.shape}")
print(f"Output shape: {X_train_poly.shape}")
print(f"Features: {feature_names.tolist()}")
print()
result = pd.DataFrame(X_train_poly.round(3), columns=feature_names)
print(result.to_string(index=False))
Input shape: (8, 2)
Output shape: (8, 5)
Features: ['avg_temp_c', 'household_size', 'avg_temp_c^2', 'household_size^2', 'avg_temp_c household_size']
avg_temp_c household_size avg_temp_c^2 household_size^2 avg_temp_c household_size
-1.471 -0.577 2.163 0.333 0.849
-1.234 1.155 1.522 1.334 -1.425
-0.285 -0.577 0.081 0.333 0.164
0.426 1.155 0.181 1.334 0.492
1.136 -0.577 1.290 0.333 -0.655
1.609 1.155 2.590 1.334 1.859
0.663 -0.577 0.440 0.333 -0.383
-0.521 -0.577 0.272 0.333 0.301

What just happened?
The Pipeline scaled the raw features first, then applied PolynomialFeatures to the standardised values. Because the inputs were already centred near zero, the squared terms stay in a sensible range (0–2.6) rather than blowing up to hundreds or thousands. The full Pipeline can now be dropped into any sklearn modelling workflow; during cross-validation it will fit the scaler and polynomial step on each training fold only, preventing leakage from the validation fold.
Always scale before generating polynomial terms
If avg_temp_c ranges from 2 to 30, then avg_temp_c² ranges from 4 to 900. After scaling, both the original and its square stay in a compact, comparable range. The order in the Pipeline — scaler before poly — enforces this correctly every time.
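That range argument is easy to verify directly. A tiny sketch squaring the same temperatures before and after standardisation:

```python
# Why scaling order matters: squaring raw vs standardised temperatures.
import numpy as np
from sklearn.preprocessing import StandardScaler

temps = np.array([[2.0], [10.0], [20.0], [30.0]])

# Squaring raw temperatures spans 4 to 900, so the squared column
# would dwarf every other feature in the model
print("raw squared:   ", (temps ** 2).ravel())

# Squaring standardised temperatures stays in a compact range near 0-2
scaled = StandardScaler().fit_transform(temps)
print("scaled squared:", np.round(scaled ** 2, 3).ravel())
```

The raw squared column tops out at 900 while the standardised version stays under 2, which is exactly the behaviour the Pipeline ordering guarantees.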
Use regularisation with polynomial features
Polynomial expansion increases the number of features, which increases the risk of overfitting — especially with small datasets. Always pair polynomial features with Ridge or Lasso regression rather than plain linear regression. The regularisation penalty keeps coefficient magnitudes in check.
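Following that advice, the scaler-then-poly pipeline from this lesson extends naturally with a Ridge step. A sketch reusing the energy dataset (alpha=1.0 is an arbitrary starting value here, not a tuned choice):

```python
# Full modelling pipeline: scale -> expand -> regularised regression.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

energy_df = pd.DataFrame({
    'avg_temp_c':     [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size': [2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed':   [520, 490, 310, 260, 280, 410, 510, 480, 290, 320],
})
X = energy_df[['avg_temp_c', 'household_size']]
y = energy_df['kwh_consumed']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge', Ridge(alpha=1.0)),  # penalises large polynomial coefficients
])
model.fit(X_train, y_train)
print(f"Train R^2: {model.score(X_train, y_train):.3f}")
print(f"Test  R^2: {model.score(X_test, y_test):.3f}")
```

With only ten rows the test score will be noisy, but the structure is the point: the Ridge penalty shrinks the five polynomial coefficients toward zero, so the quadratic fit cannot chase noise as aggressively as unregularised linear regression would.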
Teacher's Note
Polynomial features are powerful but easy to misuse. The most common mistake is applying degree=3 or higher to ten or more features and then wondering why the model overfits catastrophically. Before you expand, ask: do I have a theoretical reason to believe a quadratic or interaction effect exists here? Domain intuition should guide polynomial feature creation, not blind grid-searching over degrees. If you're not sure, start with interaction_only=True at degree 2 — it adds meaningful cross-product terms with far less dimensionality explosion than full polynomial expansion.
Practice Questions
1. Which scikit-learn class is used to generate squared, cubed, and interaction terms from numerical features?
2. Which PolynomialFeatures parameter generates only cross-product terms without any squared terms?
3. In a Pipeline, StandardScaler should come ________ PolynomialFeatures to keep squared terms in a sensible range.
Quiz
1. Why would you add a temperature² feature to an energy consumption model?
2. How many output features does PolynomialFeatures(degree=2, include_bias=False) produce from 10 input features?
3. What should you use alongside polynomial features to prevent overfitting?
Up Next · Lesson 17
Interaction Features
Go deeper on cross-product terms — learn how to craft targeted interactions from domain knowledge rather than generating them blindly.