12  Temporal Features

Feature engineering for data that requires predicting the future from a fixed moment, using everything known in the present and past, creates complicated wrangling problems. As we move through our historical data to build training data that respects the division between past, present, and future, we face a unique challenge: our features and target value are often identical measures, differentiated only by our definition of the present, or NOW.

Time series data also introduces complexities from highly correlated observations. We need to account for the temporal dependencies and autocorrelation present in the data, and our feature engineering often requires creating lag features, rolling statistics, and other time-based features to capture the underlying patterns and trends.

Lag features are simply the values of a variable at previous time steps. For example, if we are predicting sales, we might include the sales from the previous day, week, or month as features. Another approach is to calculate rolling statistics, such as the rolling mean or standard deviation, to capture the trends and variability over time.
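A minimal sketch of both ideas in polars, using a tiny made-up daily series (the date and sales column names are only illustrative):

from datetime import date
import polars as pl

# a toy daily sales series to demonstrate lags and rolling statistics
df = pl.DataFrame({
    "date": pl.date_range(date(2017, 1, 1), date(2017, 1, 10), interval="1d", eager=True),
    "sales": [5.0, 7, 6, 9, 12, 10, 8, 11, 13, 9],
}).sort("date")

df = df.with_columns(
    pl.col("sales").shift(1).alias("sales_lag_1"),    # yesterday's sales
    pl.col("sales").shift(7).alias("sales_lag_7"),    # sales one week ago
    pl.col("sales").rolling_mean(window_size=7).alias("sales_roll_mean_7"),
    pl.col("sales").rolling_std(window_size=7).alias("sales_roll_std_7"),
)

Note that the rolling statistics here include the current row; if that same row's sales is your target, shift them so the window ends before the prediction point.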

We also need to be mindful of data leakage, where information from the future seeps into features for a prediction that is supposed to be made before that information exists. Leakage leads to overly optimistic performance estimates and poor generalization to new data. To avoid it, our feature engineering and model evaluation processes must respect the temporal order of the data.
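One simple guard is to split training and evaluation data by time rather than at random, so everything the model learns from sits on or before a cutoff date. A hedged sketch, with a made-up frame and an arbitrary cutoff:

from datetime import date
import polars as pl

# stand-in for any time-indexed training table
df = pl.DataFrame({
    "date": pl.date_range(date(2017, 1, 1), date(2017, 1, 31), interval="1d", eager=True),
    "sales": list(range(31)),
}).sort("date")

# rows on or before the cutoff are the "past" we may learn from;
# rows after it stand in for the future we are asked to predict
cutoff = date(2017, 1, 24)
train = df.filter(pl.col("date") <= cutoff)
valid = df.filter(pl.col("date") > cutoff)

Any lag or rolling feature computed for the training rows should likewise use only dates at or before each row.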

12.1 Exploring temporal munging

Alakh Sethi has a fun post on feature engineering that highlights some great thoughts about temporal data.

Peeking (into the future) is the “original sin” of feature engineering. It refers to using information about the future (or information which would not yet be known by us) to engineer a piece of data. This can be obvious, like using next_12_months_returns. However, it’s most often quite subtle, like using the mean or standard deviation across the full time period to normalize data points (which implicitly leaks future information into our features). The test is whether you would be able to get the exact same value if you were calculating the data point at that point in time rather than today.

Be honest about what you would have known at the time, not just what had happened at the time. For instance, short borrowing data is reported by exchanges with a considerable time lag. You would want to stamp the feature with the date on which you would have known it.
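To make the normalization example concrete, here is a small sketch (toy data, illustrative column name x): the first column peeks by using the full-period mean, while the second subtracts an expanding mean that uses only the rows available up to that point.

from datetime import date
import polars as pl

df = pl.DataFrame({
    "date": pl.date_range(date(2017, 1, 1), date(2017, 1, 10), interval="1d", eager=True),
    "x": [1.0, 2, 4, 3, 8, 6, 7, 5, 9, 10],
}).sort("date")

df = df.with_columns(
    # peeking: the full-period mean is not knowable part-way through the series
    (pl.col("x") - pl.col("x").mean()).alias("x_centered_full"),
    # point-in-time: the expanding mean uses only rows up to and including this one
    (pl.col("x") - pl.col("x").cum_sum() / pl.col("x").cum_count()).alias("x_centered_expanding"),
)

The test from the post applies directly: on any given date you could have computed x_centered_expanding, but not x_centered_full.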

12.2 Example Data

We will use some data about grocery stores in South America. The data includes daily transactions, store metadata, and promotional information. This dataset will help us understand how different factors influence sales and transactions at various stores. It is based on a Kaggle competition.1

Let’s take a look at the datasets we have:

  • transactions_daily.parquet: Contains daily sales data for each store and product family.
  • transactions_store.parquet: Contains the total number of transactions per day for each store.
  • stores.parquet: Contains metadata about each store, including location and type.

We will use these datasets to demonstrate feature engineering techniques for building predictive models.

12.2.1 transactions_daily.parquet

The primary data, comprising time series of the features store_nbr, family, and onpromotion, as well as the target sales.

  • store_nbr: identifies the store at which the products are sold.
  • family: identifies the type of product sold.
  • sales: gives the total sales for a product family at a particular store on a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
  • onpromotion: gives the total number of items in a product family that were being promoted at a store on a given date.

12.2.2 transactions_store.parquet

The total transactions by day for each store.

  • store_nbr: The store number.
  • transactions: The total number of transactions for that day.

12.2.3 stores.parquet

Store metadata, including city, state, type, and cluster.

  • cluster: a grouping of similar stores.

12.2.4 Data downloads

12.2.5 Data Munging Challenge

Let’s explore how to build temporal features by walking through some feature engineering with the grocery store data.

Pourya’s article in Towards Data Science provides some simple examples of time series feature engineering that we will use with our data.

We can start with our transactions_daily.parquet data.

import polars as pl
from lets_plot import *
LetsPlot.setup_html()

# read the daily sales data
daily = pl.read_parquet("transactions_daily.parquet")

Starting with the date column, let’s move our daily transactions to weekly summaries for each store and product family with the following new columns (a sketch of one approach follows the list).

  • week_of_year
  • year
  • min_date in the week of year by year.
  • max_date in the week of year by year.
  • total_sales for that week.
  • prediction_day or the day we define as the NOW that divides historical knowledge from future ‘unknowns’.
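Here is a sketch of one way to build the weekly summary, assuming the daily data has a date column named date (adjust to the actual column names):

weekly = (
    daily
    .with_columns(
        pl.col("date").dt.week().alias("week_of_year"),   # ISO week number
        pl.col("date").dt.year().alias("year"),
    )
    .group_by(["store_nbr", "family", "year", "week_of_year"])
    .agg(
        pl.col("date").min().alias("min_date"),
        pl.col("date").max().alias("max_date"),
        pl.col("sales").sum().alias("total_sales"),
    )
    .with_columns(
        # the NOW: the day just before the week begins, so only earlier data is "known"
        pl.col("min_date").dt.offset_by("-1d").alias("prediction_day"),
    )
    .sort(["store_nbr", "family", "year", "week_of_year"])
)

Watch what happens at the year boundaries as you inspect this result: ISO week numbers and the calendar year disagree there, which is exactly what the questions below are probing.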

After we create our weekly data, we can answer the following questions.

  • What day do weeks of the year start on?
  • What is wrong with the date 2013-12-31?
  • What is wrong with the date 2017-01-01?
  • Can you fix this issue with weeks and years?

Ok. Now that we have created our proper weekly data,

  • Can you create the following table for 2017 SEAFOOD?
| store_nbr | family  | week_of_year | year | prediction_day | min_date   | max_date   | num_days_in_week | 1_week_behind_sales | 1_week_ahead_sales | 2_week_ahead_sales |
|-----------|---------|--------------|------|----------------|------------|------------|------------------|---------------------|--------------------|--------------------|
| 54        | SEAFOOD | 45           | 2016 | 2016-11-06     | 2016-11-07 | 2016-11-13 | 7                | 9                   | 3                  | 19                 |
| 54        | SEAFOOD | 46           | 2016 | 2016-11-13     | 2016-11-14 | 2016-11-20 | 7                | 3                   | 19                 | 5                  |
| 54        | SEAFOOD | 47           | 2016 | 2016-11-20     | 2016-11-21 | 2016-11-27 | 7                | 19                  | 5                  | 8                  |
| 54        | SEAFOOD | 48           | 2016 | 2016-11-27     | 2016-11-28 | 2016-12-04 | 7                | 5                   | 8                  | 10                 |
| 54        | SEAFOOD | 49           | 2016 | 2016-12-04     | 2016-12-05 | 2016-12-11 | 7                | 8                   | 10                 | 10                 |
| 54        | SEAFOOD | 50           | 2016 | 2016-12-11     | 2016-12-12 | 2016-12-18 | 7                | 10                  | 10                 | 6                  |
| 54        | SEAFOOD | 51           | 2016 | 2016-12-18     | 2016-12-19 | 2016-12-24 | 6                | 10                  | 6                  | 5                  |
| 54        | SEAFOOD | 52           | 2016 | 2016-12-25     | 2016-12-26 | 2016-12-31 | 6                | 6                   | 5                  | 4                  |
| 54        | SEAFOOD | 53           | 2016 | 2015-12-31     | 2016-01-01 | 2016-01-03 | 3                | 5                   | 4                  | 12                 |

Notice how time moves along both the rows and the columns. As we move down the rows, we move into the future. Across the columns, one column (in our example, two candidate columns) lies in the future and would serve as the foundation for, or simply be, the target (y) variable. The remaining time-based columns are summaries of intervals or points in the history relative to the point of estimation.

  • Which column would be our y value or target if we were predicting what we will sell in the next seven days?
  • How do the three date columns (prediction_day, min_date, max_date) relate to our target variable?
  • Can you build a new y that is the lagged difference? The column would be the difference in sales from the week we just completed to the sales seven days in the future (one possible sketch follows this list).
  • Can you build a new feature that is the difference in percent of sales for the year, comparing the most recently completed week with its sister week in the previous year?
  • How correlated is this new feature with our target?
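One possible sketch, not the only answer, of the behind/ahead columns from the table and the lagged-difference y, using shift within each store and family group on the weekly frame built above:

lagged = (
    weekly
    .sort(["store_nbr", "family", "prediction_day"])
    .with_columns(
        # the week completed just before prediction_day (known at prediction time)
        pl.col("total_sales").shift(1).over(["store_nbr", "family"]).alias("1_week_behind_sales"),
        # the week that starts right after prediction_day (a candidate target, never a feature)
        pl.col("total_sales").alias("1_week_ahead_sales"),
        # the week after that (a second candidate target)
        pl.col("total_sales").shift(-1).over(["store_nbr", "family"]).alias("2_week_ahead_sales"),
    )
    .with_columns(
        # lagged-difference target: the upcoming week's sales minus the week just completed
        (pl.col("1_week_ahead_sales") - pl.col("1_week_behind_sales")).alias("y_diff"),
    )
)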

12.3 Exploring temporal modeling


  1. Alexis Cook, DanB, inversion, and Ryan Holbrook. Store Sales - Time Series Forecasting. https://kaggle.com/competitions/store-sales-time-series-forecasting, 2021. Kaggle.