11  Feature Engineering

When working with business data, we often have a database that stores transactions. We will have tables that store credit card transactions, tables that store paycheck transactions, key card tables that store build access transactions, and account tables that store logins or purchase transactions. These transaction tables are usually not at the right level of detail (often called ‘unit of analysis’) for any machine learning or statistical model inference.

To build our predictive models, these transaction tables need to be munged into the correct space, time, and variable aggregation. Here are the questions we must work through based on our modeling needs.

How would we answer the above questions if we wanted to predict the next US county where a temple for the Church of Jesus Christ of Latter-day Saints would be built? First, what tables do we have?

  1. patterns: monthly traffic patterns for each place.
  2. places: Key information about each place.
  3. censustract_pkmap: The mapping of places keys to census tract based on placekeys in places.
  4. tract_table: The table that maps tract keys to state and county.
  5. idaho_temples.csv: A table of temples with a key that maps to tract.

11.1 Spatial Unit of interest derivation

The tract_table has the county data within it. We can leverage this table to get our unit of analysis table.

11.2 Label creation

The idaho_temples.csv has our path to a label. We want to create a 0/1 label for has not/has a temple for each county.

11.3 Feature Creation

We want a clean feature that focuses on chapels for the Church of Jesus Christ of Latter-day Saints. How can we get a list of placekeys that are LDS chapels? Some regular expressions on the name will be helpful. What about day of the week attendance?

What can we do to check the quality of the data.

11.4 Temporal Considerations

This data runs over COVID-19 as we have 2019 through 2022.

11.5 Mapping back to my unit of analysis