Feature Engineering Lesson 40 – FE for Time Series | Dataplexa
Advanced Level · Lesson 40

Feature Engineering for Time Series

A timestamp is not a feature. It's a container — packed with calendar cycles, seasonal rhythms, trend direction, and autocorrelation. Feature engineering for time series is the process of unpacking all of that structure into columns a model can actually learn from.

Time series data has a property no other data type has: temporal ordering matters. You cannot shuffle rows. You cannot use future data to predict the past. Every feature you create must respect the arrow of time — it can only use information that was genuinely available at the moment the prediction would have been made.

The Time Series Feature Engineering Stack

1

Calendar Features

Year, month, day of week, hour, quarter, week of year, is_weekend, is_holiday. These encode cyclical patterns — demand spikes every Monday, revenue dips in August, energy consumption peaks at 8am. Calendar features are free and almost always predictive.

2

Lag and Rolling Features

Covered in depth in Lessons 33 and 34 — but worth restating here in the time series context. Lag features give the model explicit access to specific past moments. Rolling features give it smoothed summaries of recent history. Together they encode both point-in-time memory and trajectory.

3

Fourier / Sine-Cosine Encoding for Cyclical Features

Month 12 and month 1 are adjacent in the calendar but numerically far apart (12 vs 1). A model treating month as a raw integer will never learn that December and January are similar. Sine-cosine encoding wraps cyclical features onto a circle — now December and January are close in feature space.

4

Trend and Seasonality Decomposition Features

Decomposing a time series into trend, seasonal, and residual components — then using those components as features — gives the model an explicit, pre-labelled view of long-run direction vs cyclic oscillation vs random noise. The residual component is especially useful for anomaly detection.

5

Time Since / Time Until Features

Days since last purchase. Hours until next scheduled maintenance. Weeks since account creation. These elapsed-time features encode proximity to important events in a way that raw calendar values cannot — and they often have a direct, intuitive relationship with the target.
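Step 4 of the stack can be sketched with plain pandas: a centred moving average for the trend, per-weekday means of the detrended series for the seasonal component, and whatever is left over as the residual. The series below is synthetic and purely illustrative (libraries such as statsmodels also ship a ready-made seasonal_decompose for exactly this):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: linear trend + weekly cycle + noise
rng = np.random.default_rng(42)
idx = pd.date_range('2024-01-01', periods=56, freq='D')
t = np.arange(56)
s = pd.Series(300 + 0.5 * t + 15 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 3, 56),
              index=idx)

# Trend: a centred 7-day moving average smooths out the weekly cycle
trend = s.rolling(window=7, center=True).mean()

# Seasonal: the average detrended value for each day of the week
detrended = s - trend
seasonal = detrended.groupby(detrended.index.dayofweek).transform('mean')

# Residual: whatever trend and seasonality don't explain
residual = s - trend - seasonal

features = pd.DataFrame({'trend': trend, 'seasonal': seasonal, 'residual': residual})
print(features.dropna().head())
```

The residual column is the anomaly-detection signal mentioned above: a large residual marks a day the trend and weekly cycle cannot explain.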

Calendar Features and Cyclical Sine-Cosine Encoding

The scenario:

You're a data scientist at an energy utility building a daily electricity demand forecasting model. The raw data has a timestamp and a demand reading. Your job is to extract every useful calendar signal and then apply sine-cosine encoding to the cyclical features — month and day of week — so the model correctly understands that Sunday (day 6) and Monday (day 0) are adjacent, not 6 integers apart.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Create a daily energy demand DataFrame spanning 14 days across a month boundary
energy_df = pd.DataFrame({
    'date':   pd.date_range(start='2024-01-28', periods=14, freq='D'),  # 14 days including month boundary
    'demand': [310, 295, 280, 275, 320, 330, 315,                       # Jan 28 – Feb 3
               305, 290, 278, 272, 325, 335, 318]                       # Feb 4 – Feb 10 (MWh)
})

# --- Step 1: Extract raw calendar features from the datetime column ---
energy_df['year']        = energy_df['date'].dt.year          # calendar year
energy_df['month']       = energy_df['date'].dt.month         # 1=Jan, 12=Dec
energy_df['day']         = energy_df['date'].dt.day           # day of month (1–31)
energy_df['dayofweek']   = energy_df['date'].dt.dayofweek     # 0=Monday, 6=Sunday
energy_df['quarter']     = energy_df['date'].dt.quarter       # Q1=1, Q4=4
energy_df['weekofyear']  = energy_df['date'].dt.isocalendar().week.astype(int)  # ISO week number
energy_df['is_weekend']  = (energy_df['dayofweek'] >= 5).astype(int)            # 1 if Sat or Sun

# --- Step 2: Sine-cosine encoding for cyclical features ---
# Why: month 12 and month 1 are adjacent in time but 11 integers apart numerically
# Sine-cosine encoding places them on a circle so they are spatially close

# Month encoding: period = 12 months
energy_df['month_sin'] = np.sin(2 * np.pi * energy_df['month'] / 12)   # sine component
energy_df['month_cos'] = np.cos(2 * np.pi * energy_df['month'] / 12)   # cosine component

# Day-of-week encoding: period = 7 days
energy_df['dow_sin']   = np.sin(2 * np.pi * energy_df['dayofweek'] / 7)  # sine of day-of-week
energy_df['dow_cos']   = np.cos(2 * np.pi * energy_df['dayofweek'] / 7)  # cosine of day-of-week

# Round for display
energy_df = energy_df.round(3)

# Print calendar features and sine-cosine encoding
print("Calendar + cyclical features:")
print(energy_df[['date','month','dayofweek','is_weekend',
                  'month_sin','month_cos','dow_sin','dow_cos','demand']].to_string(index=False))
Calendar + cyclical features:
       date  month  dayofweek  is_weekend  month_sin  month_cos  dow_sin  dow_cos  demand
 2024-01-28      1          6           1     0.500      0.866    -0.782    0.623     310
 2024-01-29      1          0           0     0.500      0.866     0.000    1.000     295
 2024-01-30      1          1           0     0.500      0.866     0.782    0.623     280
 2024-01-31      1          2           0     0.500      0.866     0.975   -0.223     275
 2024-02-01      2          3           0     0.866      0.500     0.434   -0.901     320
 2024-02-02      2          4           0     0.866      0.500    -0.434   -0.901     330
 2024-02-03      2          5           1     0.866      0.500    -0.975   -0.223     315
 2024-02-04      2          6           1      0.866      0.500   -0.782    0.623     305
 2024-02-05      2          0           0      0.866      0.500    0.000    1.000     290
 2024-02-06      2          1           0      0.866      0.500    0.782    0.623     278
 2024-02-07      2          2           0      0.866      0.500    0.975   -0.223     272
 2024-02-08      2          3           0      0.866      0.500    0.434   -0.901     325
 2024-02-09      2          4           0      0.866      0.500   -0.434   -0.901     335
 2024-02-10      2          5           1      0.866      0.500   -0.975   -0.223     318

What just happened?

The sine-cosine encoding wraps both month and day-of-week onto circles. Jan 28 (Sunday, dayofweek=6) and Jan 29 (Monday, dayofweek=0) sit at (dow_sin, dow_cos) points of (-0.782, 0.623) and (0.000, 1.000), a Euclidean distance of roughly 0.868, exactly the same as any other pair of adjacent days. If we had used raw integers, 6 and 0 are 6 units apart, which would trick a linear model into thinking Sunday and Monday are the most different days of the week. The month boundary from January to February is equally smooth: the step from (0.500, 0.866) to (0.866, 0.500) is the same size as every other adjacent-month step, including the December-to-January wraparound, where the raw integer jumps from 12 back to 1.
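The "adjacent days" claim can be verified numerically: on the period-7 circle, every pair of neighbouring days is the same Euclidean distance apart. A small check (the helper dow_point is just for illustration, using the same formula as the code above):

```python
import numpy as np

def dow_point(d):
    # Map day-of-week (0=Monday ... 6=Sunday) onto the unit circle, period 7
    angle = 2 * np.pi * d / 7
    return np.array([np.sin(angle), np.cos(angle)])

# Sunday (6) vs Monday (0): adjacent on the calendar, adjacent on the circle
print(round(float(np.linalg.norm(dow_point(6) - dow_point(0))), 3))   # 0.868
# Monday (0) vs Tuesday (1): every adjacent pair is the same distance apart
print(round(float(np.linalg.norm(dow_point(0) - dow_point(1))), 3))   # 0.868
```

With raw integers, those same two pairs would be 6 units and 1 unit apart respectively.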

Lag Features, Rolling Features, and Trend Indicators

The scenario:

The calendar features alone don't tell the model anything about recent demand history. You now enrich the same energy dataset with lag features (yesterday, same day last week), rolling statistics (3-day mean and std), and a simple trend indicator — whether demand is rising or falling relative to the 7-day rolling average. Together these give the model both memory and momentum.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Recreate the energy DataFrame with 14 rows
energy_df = pd.DataFrame({
    'date':   pd.date_range(start='2024-01-28', periods=14, freq='D'),
    'demand': [310, 295, 280, 275, 320, 330, 315,
               305, 290, 278, 272, 325, 335, 318]
}).sort_values('date').reset_index(drop=True)  # ensure chronological order

# --- Lag features ---
energy_df['lag_1'] = energy_df['demand'].shift(1)   # yesterday's demand
energy_df['lag_7'] = energy_df['demand'].shift(7)   # same day last week

# --- Rolling features ---
energy_df['roll_mean_3d'] = energy_df['demand'].rolling(window=3, min_periods=1).mean()  # 3-day avg
energy_df['roll_std_3d']  = energy_df['demand'].rolling(window=3, min_periods=1).std()   # 3-day volatility
energy_df['roll_mean_7d'] = energy_df['demand'].rolling(window=7, min_periods=1).mean()  # 7-day avg

# --- Day-over-day change ---
energy_df['delta_lag_1']  = energy_df['demand'] - energy_df['lag_1']  # raw change vs yesterday

# --- Trend indicator: is demand above its 7-day rolling average? ---
# +1 = above trend (rising), -1 = below trend (falling), 0 = exactly on trend (rare)
energy_df['trend_flag']   = np.sign(energy_df['demand'] - energy_df['roll_mean_7d']).astype(int)

# --- Demand deviation from rolling mean (z-score style) ---
energy_df['demand_zscore'] = (
    (energy_df['demand'] - energy_df['roll_mean_7d']) /
    (energy_df['roll_std_3d'] + 1e-9)
)

# Round for clean display
energy_df = energy_df.round(2)

# Print the enriched DataFrame
print(energy_df[['date','demand','lag_1','lag_7','roll_mean_3d',
                  'roll_mean_7d','delta_lag_1','trend_flag','demand_zscore']].to_string(index=False))
       date  demand  lag_1   lag_7  roll_mean_3d  roll_mean_7d  delta_lag_1  trend_flag  demand_zscore
 2024-01-28   310.0    NaN     NaN        310.00        310.00          NaN           0            NaN
 2024-01-29   295.0  310.0     NaN        302.50        302.50        -15.0          -1          -0.71
 2024-01-30   280.0  295.0     NaN        295.00        295.00        -15.0          -1          -1.00
 2024-01-31   275.0  280.0     NaN        283.33        290.00         -5.0          -1          -1.44
 2024-02-01   320.0  275.0     NaN        291.67        296.00         45.0           1           0.97
 2024-02-02   330.0  320.0     NaN        308.33        301.67         10.0           1           0.97
 2024-02-03   315.0  330.0     NaN        321.67        303.57        -15.0           1           1.50
 2024-02-04   305.0  315.0   310.0        316.67        302.86        -10.0           1           0.17
 2024-02-05   290.0  305.0   295.0        303.33        302.14        -15.0          -1          -0.97
 2024-02-06   278.0  290.0   280.0        291.00        301.86        -12.0          -1          -1.76
 2024-02-07   272.0  278.0   275.0        280.00        301.43         -6.0          -1          -3.21
 2024-02-08   325.0  272.0   320.0        291.67        302.14         53.0           1           0.79
 2024-02-09   335.0  325.0   330.0        310.67        302.86         10.0           1           0.95
 2024-02-10   318.0  335.0   315.0        326.00        303.29        -17.0           1           1.72

What just happened?

Feb 1 is the most interesting row: delta_lag_1 is +45 (a sudden spike from 275) and the trend_flag flips to +1. Feb 8 shows a similar +53 jump. The demand_zscore tells a subtler story: because its denominator is the 3-day rolling std, which itself jumps when demand spikes, the largest absolute z-score is actually Feb 7's -3.21, where demand fell well below the 7-day trend during a stretch of low volatility. The lag_7 column is only populated from Feb 4 onward, which is exactly correct, since we need 7 prior days of data. A forecasting model seeing all these features simultaneously has a rich, multi-resolution view of the demand signal that a model trained on raw demand values alone would never achieve.
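One caveat about the features above: the rolling windows include the current day's demand. That is fine when the target is a future value, but it leaks the answer if you ever predict the current day itself. Shifting before rolling keeps the window strictly in the past. A minimal sketch on a hypothetical 7-day series:

```python
import pandas as pd

demand = pd.Series([310, 295, 280, 275, 320, 330, 315])

# shift(1) first: each row's 3-day window then covers only strictly earlier days
past_only_mean = demand.shift(1).rolling(window=3, min_periods=1).mean()

print(past_only_mean.round(2).tolist())
```

The first row is NaN because no past exists yet; every later row averages only days before it.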

Time Since / Time Until Features and Elapsed Time

The scenario:

You're now working on a customer re-engagement model at a subscription platform. The dataset has one row per customer with their last login date and last purchase date. The target is whether they will churn in the next 30 days. Days since last login and days since last purchase are almost certainly among the strongest predictors — customers who haven't logged in for 45 days are far more likely to churn than those who were active yesterday.

# Import pandas and numpy
import pandas as pd
import numpy as np

# Set a fixed reference date — the "today" from the model's perspective
# In production this would be datetime.today() or the pipeline run date
reference_date = pd.Timestamp('2024-03-01')  # snapshot date: March 1, 2024

# Create a customer activity DataFrame — 10 rows
customers_df = pd.DataFrame({
    'customer_id':    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'last_login':     pd.to_datetime([
        '2024-02-29','2024-01-15','2024-02-25','2023-12-01','2024-02-28',
        '2024-01-30','2023-11-15','2024-02-20','2024-02-01','2023-10-10'
    ]),   # date of most recent login
    'last_purchase':  pd.to_datetime([
        '2024-02-20','2024-01-10','2024-02-10','2023-11-15','2024-02-25',
        '2024-01-05','2023-10-20','2024-01-28','2024-01-20','2023-09-01'
    ]),   # date of most recent purchase
    'signup_date':    pd.to_datetime([
        '2022-06-01','2021-03-15','2023-01-10','2020-09-20','2022-11-05',
        '2021-07-22','2020-04-18','2023-05-30','2022-08-14','2019-12-01'
    ]),   # date customer first signed up
    'will_churn':     [0, 1, 0, 1, 0, 1, 1, 0, 0, 1]  # target: 1=will churn in 30 days
})

# --- Time since last login (days) ---
customers_df['days_since_login']    = (reference_date - customers_df['last_login']).dt.days

# --- Time since last purchase (days) ---
customers_df['days_since_purchase'] = (reference_date - customers_df['last_purchase']).dt.days

# --- Customer tenure: days since signup ---
customers_df['tenure_days']         = (reference_date - customers_df['signup_date']).dt.days

# --- Gap between last login and last purchase (days) ---
# Large gap = customer logs in but doesn't buy — a potential disengagement signal
customers_df['login_purchase_gap']  = (
    customers_df['last_login'] - customers_df['last_purchase']
).dt.days

# --- Recency score: inverse of days_since_login — higher = more recent ---
# Adding 1 to avoid division by zero for a same-day login
customers_df['recency_score']       = 1 / (customers_df['days_since_login'] + 1)

# Check class separation for each time-based feature
print("Class separation — time features vs churn:\n")
sep = customers_df.groupby('will_churn')[
    ['days_since_login','days_since_purchase','tenure_days','login_purchase_gap']
].mean().round(1)
print(sep.to_string())

# Print full DataFrame
print("\nFull customer time-feature DataFrame:")
print(customers_df[['customer_id','days_since_login','days_since_purchase',
                     'tenure_days','login_purchase_gap','recency_score','will_churn']].round(4).to_string(index=False))
Class separation — time features vs churn:

            days_since_login  days_since_purchase  tenure_days  login_purchase_gap
will_churn
0                        9.4                 21.8        475.6                12.4
1                       83.6                105.8       1251.6                22.2

Full customer time-feature DataFrame:
 customer_id  days_since_login  days_since_purchase  tenure_days  login_purchase_gap  recency_score  will_churn
           1                 1                   10          639                   9         0.5000           0
           2                46                   51         1082                   5         0.0213           1
           3                 5                   20          416                  15         0.1667           0
           4                91                  107         1258                  16         0.0109           1
           5                 2                    5          482                   3         0.3333           0
           6                31                   56          953                  25         0.0312           1
           7               107                  133         1413                  26         0.0093           1
           8                10                   33          276                  23         0.0909           0
           9                29                   41          565                  12         0.0333           0
          10               143                  182         1552                  39         0.0069           1

What just happened?

The class separation table reveals the story cleanly. Non-churners averaged only 9.4 days since their last login; churners averaged 83.6 days, roughly 9× longer. The same pattern holds for days since purchase: 21.8 vs 105.8. Tenure also separates: churners have been customers for 1,251.6 days on average vs 475.6 for non-churners; long-tenured customers are actually more likely to churn in this dataset, possibly from subscription fatigue. Even the login-purchase gap separates (12.4 vs 22.2 days): churners who do log in are going longer without buying. The recency_score inverts the login gap so that higher values always mean more engaged, a form that linear models can use directly without needing to learn the inverse relationship themselves.
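"Time until" features, mentioned in the overview, mirror the computation above with the subtraction reversed: future date minus snapshot date. A sketch with an invented renewal schedule (customer_id and next_renewal values are illustrative, not from the dataset above):

```python
import pandas as pd

reference_date = pd.Timestamp('2024-03-01')   # same snapshot-date idea as above

# Hypothetical upcoming-renewal dates for three customers
renewals = pd.DataFrame({
    'customer_id':  [1, 2, 3],
    'next_renewal': pd.to_datetime(['2024-03-15', '2024-04-01', '2024-03-03'])
})

# Time *until* an event: future date minus reference date
renewals['days_until_renewal'] = (renewals['next_renewal'] - reference_date).dt.days
print(renewals['days_until_renewal'].tolist())   # [14, 31, 2]
```

A customer two days from renewal and a customer 31 days out deserve very different treatment from a re-engagement model, which is exactly what this feature lets it learn.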

The Time Series Train/Test Split Rule

Time series data has a hard constraint that no other data type shares: you can never randomly shuffle and split. Random splitting would let training rows from the future leak into the validation of past rows — the model would be evaluated on data it effectively already saw during training.

Wrong — Random Split

Rows from March end up in the training set. Rows from January end up in validation. The model trains on future data and is evaluated on the past. Metrics look great. Production results collapse.

train_test_split(df, shuffle=True) ← NEVER for time series

Correct — Temporal Split

All training rows come before the cutoff date. All validation rows come after. The model is evaluated exactly as it would perform in production — predicting future values from past features.

train = df[df.date < cutoff]
test = df[df.date >= cutoff]
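The cutoff rule above can be made runnable on a small toy frame (the dates and cutoff here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=10, freq='D'),
    'y':    range(10)
})

cutoff = pd.Timestamp('2024-01-08')
train = df[df['date'] < cutoff]    # Jan 1-7
test  = df[df['date'] >= cutoff]   # Jan 8-10

# Every training row precedes every test row
print(train['date'].max() < test['date'].min())   # True
```

That final check is worth keeping as an assertion in a real pipeline: if it ever fails, future data has leaked into training.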

Cyclical Encoding — Why Raw Integers Fail

This is one of the most important concepts in time series feature engineering. The visual below shows why raw integer encoding misleads models on cyclical features:

Month Encoding — Raw Integer vs Sine-Cosine

Month      Raw Integer  month_sin  month_cos  Distance to Jan
January              1      0.500      0.866  0 integers / 0.000 euclidean
June                 6      0.000     -1.000  5 integers / 1.932 euclidean
November            11     -0.500      0.866  10 integers / 1.000 euclidean
December            12      0.000      1.000  11 integers / 0.518 euclidean

With raw integers, December (12) is 11 units from January (1), so the model thinks they are maximally different. With sine-cosine encoding, December and January sit at a Euclidean distance of only 0.518, the same distance that separates any other pair of adjacent months, correctly representing that they are neighbours on the calendar circle.
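As a quick check, the encoded-space distances to January can be computed directly, using the same period-12 formula as the earlier code (the helper month_point is just for illustration):

```python
import numpy as np

def month_point(m):
    # Map a month number (1-12) onto the unit circle, period 12
    angle = 2 * np.pi * m / 12
    return np.array([np.sin(angle), np.cos(angle)])

jan = month_point(1)
for name, m in [('June', 6), ('November', 11), ('December', 12)]:
    # Prints: June 1.932, November 1.0, December 0.518
    print(name, round(float(np.linalg.norm(month_point(m) - jan)), 3))
```

December, one month from January, is the closest of the three; June, half a year away, is the farthest point on the circle.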

Teacher's Note

Tree-based models — XGBoost, LightGBM, Random Forest — don't actually need sine-cosine encoding for cyclical features. They split on thresholds, so they can learn that "month > 11 OR month < 2" captures winter equally well. Sine-cosine encoding matters most for linear models, distance-based models (KNN, SVM with RBF kernel), and neural networks, where the numerical value is interpreted directly. When in doubt, include both the raw integer and the sine-cosine pair — let the model use whichever representation it finds more useful, and feature importance will tell you which one actually got used.

Practice Questions

1. The encoding technique that wraps a cyclical feature like month or day-of-week onto a circle — so that December and January are numerically close — is called ________ ________ encoding.



2. For time series data, train/test splitting must preserve chronological order. This is called a ________ split, and it ensures all training rows precede all validation rows.



3. In the customer churn dataset, which time-based feature showed the largest class separation gap, with non-churners averaging 9.4 days and churners averaging 83.6 days?



Quiz

1. Why does using raw integers for the month feature mislead linear models in time series tasks?


2. What is the consequence of using a random train/test split on time series data?


3. For which model type is sine-cosine encoding of cyclical features least necessary?


Up Next · Lesson 41

Feature Engineering for Computer Vision

Pixel statistics, colour histograms, edge features, and HOG descriptors — the classical feature engineering techniques that powered image models before deep learning, and still matter today.