Feature Engineering Lesson 16 – Polynomial Features | Dataplexa
Intermediate Level · Lesson 16

Polynomial Features

Linear models assume straight-line relationships. But the real world is curved. Polynomial features let a linear model bend — by adding squared, cubed, and cross-product terms that capture the non-linearity hiding in your data.

Polynomial feature generation expands a set of input features into higher-degree terms. Given a feature x, it creates x², x³, and so on. Given two features x and z, it also creates the interaction term x·z. A linear model trained on these expanded features can fit curves and combined effects without changing its underlying algorithm at all.
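To make the expansion concrete, the degree-2 case for two features can be written out by hand. This minimal sketch (plain Python, no libraries, using the same term order sklearn produces) lists every term for a single sample:

```python
# Degree-2 polynomial expansion of two features, written out by hand
def expand_degree2(x, z):
    """Return [x, z, x^2, x*z, z^2] for a single sample."""
    return [x, z, x**2, x * z, z**2]

# For x = 3 and z = 2 the expansion is [3, 2, 9, 6, 4]
print(expand_degree2(3, 2))
```

Two inputs become five terms; the scikit-learn example later in this lesson generates exactly these columns automatically.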

Why Squared Terms Matter

Imagine predicting energy consumption from outdoor temperature. At mild temperatures, consumption is low. But both very cold days and very hot days drive energy use up — heating in winter, air conditioning in summer. That's a U-shaped (quadratic) relationship. A plain linear model sees temperature and fits a straight line — it will miss the uptick at both extremes. Add temperature² as a feature and suddenly the model can capture the curve.
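The difference is easy to demonstrate on synthetic data. The sketch below builds a clean parabola (an invented toy dataset for illustration, not real consumption figures) and compares a straight-line fit against a quadratic fit using numpy's polyfit:

```python
import numpy as np

# Synthetic U-shaped data: consumption is high at both temperature extremes
temps = np.array([2, 4, 10, 15, 20, 26, 30], dtype=float)
kwh = (temps - 16) ** 2 + 50  # a clean parabola, minimum near 16 °C

# Fit a straight line (degree 1) and a parabola (degree 2)
line = np.polyfit(temps, kwh, deg=1)
parab = np.polyfit(temps, kwh, deg=2)

# Residual sum of squares for each fit
rss_line = np.sum((np.polyval(line, temps) - kwh) ** 2)
rss_parab = np.sum((np.polyval(parab, temps) - kwh) ** 2)

print(f"straight line RSS: {rss_line:.1f}")
print(f"parabola RSS:      {rss_parab:.1f}")
```

The straight line leaves huge residuals at both ends of the temperature range, while the quadratic fit captures the U-shape almost perfectly.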

1. Degree-2 (quadratic) terms — x²

Captures U-shapes and diminishing returns. Common in economics (cost curves), medicine (dose-response), and energy modelling. Adding x² alongside x lets the model fit a parabola.

2. Interaction terms — x · z

Captures the combined effect of two features. The effect of advertising spend might depend on the market size — neither alone tells the full story. An interaction term encodes that dependency explicitly.

3. Higher-degree terms — x³, x⁴

Captures more complex curves. Use with caution — degree 3+ dramatically increases the number of features and the risk of overfitting, especially on small datasets. Regularisation becomes essential.
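A quick way to see why interaction terms matter: build a toy target that is exactly the product of two features (a contrived assumption for illustration). A linear model on the raw features alone cannot represent the product, but adding the x·z column makes the fit essentially exact:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data where the target is exactly the product of the two features
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=50)
z = rng.uniform(1, 10, size=50)
y = x * z  # pure interaction effect: neither feature alone explains y

# Model 1: raw features only — a plane cannot represent x*z
X_plain = np.column_stack([x, z])
r2_plain = LinearRegression().fit(X_plain, y).score(X_plain, y)

# Model 2: raw features plus the interaction column — fits almost exactly
X_inter = np.column_stack([x, z, x * z])
r2_inter = LinearRegression().fit(X_inter, y).score(X_inter, y)

print(f"R² without interaction: {r2_plain:.3f}")
print(f"R² with interaction:    {r2_inter:.3f}")
```

The plain model does reasonably well because x·z is roughly linear near the centre of the data, but only the interaction column recovers the true combined effect.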

Generating Polynomial Features with scikit-learn

The scenario: You're a data scientist at a utilities company modelling monthly electricity consumption. Your dataset has two features: avg_temp_c — the average outdoor temperature for the month — and household_size — number of residents. You suspect a quadratic relationship with temperature (high at both extremes) and an interaction between temperature and household size. You're going to use sklearn's PolynomialFeatures to generate all degree-2 terms automatically.

# Import pandas and PolynomialFeatures
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Monthly energy data — two input features
energy_df = pd.DataFrame({
    'month':          ['Jan','Feb','Mar','Apr','May',
                       'Jun','Jul','Aug','Sep','Oct'],
    'avg_temp_c':     [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size': [2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed':   [520,490,310,260,280,410,510,480,290,320]
})

# Select the two input features — exclude month and target
feature_cols = ['avg_temp_c', 'household_size']
X = energy_df[feature_cols]

# degree=2 generates: original terms, squared terms, and interaction term
# include_bias=False removes the constant 1 column (handled by the model intercept)
poly = PolynomialFeatures(degree=2, include_bias=False)

# fit_transform learns the feature names and generates all polynomial terms
X_poly = poly.fit_transform(X)

# get_feature_names_out shows exactly which column maps to which term
poly_names = poly.get_feature_names_out(feature_cols)
poly_df = pd.DataFrame(X_poly, columns=poly_names)

# Print the feature names to understand what was generated
print("Generated polynomial features:")
for name in poly_names:
    print(f"  {name}")

print()
print(poly_df.to_string(index=False))
Generated polynomial features:
  avg_temp_c
  household_size
  avg_temp_c^2
  household_size^2
  avg_temp_c household_size

 avg_temp_c  household_size  avg_temp_c^2  household_size^2  avg_temp_c household_size
        2.0             2.0           4.0               4.0                         4.0
        4.0             3.0          16.0               9.0                        12.0
       10.0             4.0         100.0              16.0                        40.0
       15.0             2.0         225.0               4.0                        30.0
       20.0             3.0         400.0               9.0                        60.0
       26.0             4.0         676.0              16.0                       104.0
       30.0             2.0         900.0               4.0                        60.0
       28.0             3.0         784.0               9.0                        84.0
       21.0             4.0         441.0              16.0                        84.0
       13.0             2.0         169.0               4.0                        26.0

What just happened?

PolynomialFeatures took 2 input columns and produced 5 output columns: the two originals, their squares, and their interaction term. With include_bias=False, the number of terms for degree 2 with n features is (n + 2)! / (2! × n!) − 1 — with 2 features that's 6 − 1 = 5. Notice how avg_temp_c^2 grows sharply at the temperature extremes (4 in January, 900 in July) — exactly the shape needed to model the U-curve in energy consumption.

Interaction-Only Terms

The scenario: You're building a sales forecasting model at a consumer goods company. You have three features: ad_spend, price_discount, and market_size. Your marketing director believes that advertising only works when paired with a discount — neither alone drives the same uplift as both together. You want interaction terms but not squared terms, since there's no reason to think ad spend squared means anything on its own.

# Import pandas and PolynomialFeatures
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sales campaign data across regional markets
sales_df = pd.DataFrame({
    'region':       ['North','South','East','West','North',
                     'South','East','West','North','South'],
    'ad_spend':     [50,80,120,40,90,110,60,70,100,55],
    'price_discount':[5,10,15,0,20,10,5,15,20,0],
    'market_size':  [200,350,500,150,280,420,190,310,460,175]
})

# interaction_only=True generates ONLY cross-product terms, no squared terms
# degree=2 with interaction_only gives: x*z, x*w, z*w for three features
poly_interact = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

# Fit and transform the three numerical features
feature_cols = ['ad_spend', 'price_discount', 'market_size']
X_interact = poly_interact.fit_transform(sales_df[feature_cols])

# Get the generated column names
interact_names = poly_interact.get_feature_names_out(feature_cols)
interact_df = pd.DataFrame(X_interact, columns=interact_names, dtype=int)

# Print feature names and the first five rows
print("Interaction-only features generated:")
for name in interact_names:
    print(f"  {name}")
print()
print(interact_df.to_string(index=False))
Interaction-only features generated:
  ad_spend
  price_discount
  market_size
  ad_spend price_discount
  ad_spend market_size
  price_discount market_size

 ad_spend  price_discount  market_size  ad_spend price_discount  ad_spend market_size  price_discount market_size
       50               5          200                       250                 10000                       1000
       80              10          350                       800                 28000                       3500
      120              15          500                      1800                 60000                       7500
       40               0          150                         0                  6000                          0
       90              20          280                      1800                 25200                       5600
      110              10          420                      1100                 46200                       4200
       60               5          190                       300                 11400                        950
       70              15          310                      1050                 21700                       4650
      100              20          460                      2000                 46000                       9200
       55               0          175                         0                  9625                          0

What just happened?

interaction_only=True produced cross-products only — no ad_spend² or price_discount². Rows 4 and 10 had zero discount — so every interaction involving price_discount is also zero, correctly representing the absence of a combined effect. Row 3 (120 ad spend, 15% discount, 500k market) produced the largest ad_spend market_size interaction at 60,000.

Feature Explosion — Degree vs Column Count

This is the most important practical consideration with polynomial features. The number of output columns grows very fast:

 Input features   Degree-2 output   Degree-3 output   Risk
              2                 5                 9   Safe
              5                20                55   Moderate
             10                65               285   High — regularise
             20               230              1770   Very high — use interaction_only

(Column counts assume include_bias=False, matching the examples in this lesson.)
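The counts in the table can be verified with a short computation. With include_bias=False, PolynomialFeatures produces every monomial of total degree 1 through d, which comes to (n + d)! / (d! × n!) − 1 columns. A sketch using Python's math.comb:

```python
from math import comb

def poly_feature_count(n_features, degree, include_bias=False):
    """Number of columns PolynomialFeatures produces: all monomials of
    total degree <= degree, minus the constant column unless include_bias."""
    total = comb(n_features + degree, degree)  # includes the bias column
    return total if include_bias else total - 1

# Reproduce the table: degree-2 and degree-3 counts per feature count
for n in (2, 5, 10, 20):
    print(f"{n:>3} features: degree 2 -> {poly_feature_count(n, 2):>4}, "
          f"degree 3 -> {poly_feature_count(n, 3):>4}")
```

The growth is combinatorial, which is why degree 3 on 20 features already produces 1770 columns.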

Polynomial Features in a Pipeline with Scaling

The scenario: You're finalising the energy model. Before generating polynomial features, you want to scale the inputs — otherwise the squared term of a large-valued feature will be enormous and dominate the model. You'll chain StandardScaler → PolynomialFeatures in the correct order, then confirm the output is ready for a linear model.

# Import all tools needed
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

# Re-use the energy dataset from earlier in this lesson
energy_df = pd.DataFrame({
    'avg_temp_c':    [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size':[2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed':  [520,490,310,260,280,410,510,480,290,320]
})

# Separate features and target, then split
X = energy_df[['avg_temp_c', 'household_size']]
y = energy_df['kwh_consumed']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a Pipeline: scale first, then generate polynomial terms
# Order matters: scale raw features before squaring so the terms stay well-behaved
poly_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('poly',   PolynomialFeatures(degree=2, include_bias=False))
])

# Fit on training data, transform both sets
X_train_poly = poly_pipeline.fit_transform(X_train)
X_test_poly  = poly_pipeline.transform(X_test)

# Retrieve the final column names from the poly step
feature_names = poly_pipeline.named_steps['poly'].get_feature_names_out(X.columns)

# Print shape and first few rows of the transformed training set
print(f"Input shape:  {X_train.shape}")
print(f"Output shape: {X_train_poly.shape}")
print(f"Features:     {feature_names.tolist()}")
print()
result = pd.DataFrame(X_train_poly.round(3), columns=feature_names)
print(result.to_string(index=False))
Input shape:  (8, 2)
Output shape: (8, 5)
Features:     ['avg_temp_c', 'household_size', 'avg_temp_c^2', 'household_size^2', 'avg_temp_c household_size']

 avg_temp_c  household_size  avg_temp_c^2  household_size^2  avg_temp_c household_size
     -1.471          -0.577         2.163             0.333                       0.849
     -1.234           1.155         1.522             1.334                      -1.425
     -0.285          -0.577         0.081             0.333                       0.164
      0.426           1.155         0.181             1.334                       0.492
      1.136          -0.577         1.290             0.333                      -0.655
      1.609           1.155         2.590             1.334                       1.859
      0.663          -0.577         0.440             0.333                      -0.383
     -0.521          -0.577         0.272             0.333                       0.301

What just happened?

The Pipeline scaled the raw features first, then applied PolynomialFeatures to the standardised values. Because the inputs were already centred near zero, the squared terms stay in a sensible range (0–2.6) rather than blowing up to hundreds or thousands. The full Pipeline can now be passed directly to any sklearn model or to cross_val_score — its preprocessing steps will be fitted on the training folds only, preventing data leakage during cross-validation.

Always scale before generating polynomial terms

If avg_temp_c ranges from 2 to 30, then avg_temp_c² ranges from 4 to 900. After scaling, both the original and its square stay in a compact, comparable range. The order in the Pipeline — scaler before poly — enforces this correctly every time.

Use regularisation with polynomial features

Polynomial expansion increases the number of features, which increases the risk of overfitting — especially with small datasets. Always pair polynomial features with Ridge or Lasso regression rather than plain linear regression. The regularisation penalty keeps coefficient magnitudes in check.
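As one way to follow this advice, the lesson's energy pipeline can be extended with a Ridge step. This is a sketch, not a tuned model — alpha=1.0 is an arbitrary starting value that you would tune with cross-validation in practice:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# The energy dataset from earlier in this lesson
energy_df = pd.DataFrame({
    'avg_temp_c':    [2, 4, 10, 15, 20, 26, 30, 28, 21, 13],
    'household_size':[2, 3, 4, 2, 3, 4, 2, 3, 4, 2],
    'kwh_consumed':  [520, 490, 310, 260, 280, 410, 510, 480, 290, 320]
})
X = energy_df[['avg_temp_c', 'household_size']]
y = energy_df['kwh_consumed']

# Scale, expand, then regularise — all leakage-safe inside one Pipeline
model = Pipeline([
    ('scaler', StandardScaler()),
    ('poly',   PolynomialFeatures(degree=2, include_bias=False)),
    ('ridge',  Ridge(alpha=1.0))
])
model.fit(X, y)
print(f"Training R²: {model.score(X, y):.3f}")
```

The Ridge penalty shrinks the coefficients on the expanded terms, keeping the quadratic fit from chasing noise in this tiny ten-row dataset.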

Teacher's Note

Polynomial features are powerful but easy to misuse. The most common mistake is applying degree=3 or higher to ten or more features and then wondering why the model overfits catastrophically. Before you expand, ask: do I have a theoretical reason to believe a quadratic or interaction effect exists here? Domain intuition should guide polynomial feature creation, not blind grid-searching over degrees. If you're not sure, start with interaction_only=True at degree 2 — it adds meaningful cross-product terms with far less dimensionality explosion than full polynomial expansion.

Practice Questions

1. Which scikit-learn class is used to generate squared, cubed, and interaction terms from numerical features?



2. Which PolynomialFeatures parameter generates only cross-product terms without any squared terms?



3. In a Pipeline, StandardScaler should come ________ PolynomialFeatures to keep squared terms in a sensible range.



Quiz

1. Why would you add a temperature² feature to an energy consumption model?


2. How many output features does PolynomialFeatures(degree=2, include_bias=False) produce from 10 input features?


3. What should you use alongside polynomial features to prevent overfitting?


Up Next · Lesson 17

Interaction Features

Go deeper on cross-product terms — learn how to craft targeted interactions from domain knowledge rather than generating them blindly.