Feature Engineering

Cortex automatically connects every step of the Machine Learning process into end-to-end Machine Learning Pipelines that anyone in your business can run. In this guide, we’ll discuss the third step of a Cortex pipeline: feature engineering.

What is feature engineering?

Feature engineering is the process of transforming raw data into features that your pipeline will use to learn. A feature is simply a way to quantify something about your objects (e.g. users). For example:

How many total clicks has each user recorded over the last 7 days?
What percent of each user’s sessions over the last 30 days have included a transaction?
How many different devices has each user used to log in over the last 14 days?
Etc.
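To make these three questions concrete, here is a minimal Python sketch that computes them in one pass over a list of raw events. It assumes a hypothetical `Event` record with `user_id`, `event_type`, `timestamp`, `session_id`, and `device_id` fields and the illustrative event types "click", "login", and "transaction"; Cortex's actual event schema and feature code are not described in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical raw event record; a real Cortex event schema may differ.
@dataclass
class Event:
    user_id: str
    event_type: str  # illustrative names: "click", "login", "transaction"
    timestamp: datetime
    session_id: str
    device_id: str


def in_window(e: Event, now: datetime, days: int) -> bool:
    """True if the event happened in the `days` days ending at `now`."""
    return now - timedelta(days=days) < e.timestamp <= now


def example_features(events: list[Event], now: datetime) -> dict[str, dict[str, float]]:
    clicks = defaultdict(int)        # user -> click count, last 7 days
    sessions = defaultdict(set)      # user -> all sessions, last 30 days
    txn_sessions = defaultdict(set)  # user -> sessions containing a transaction
    devices = defaultdict(set)       # user -> devices used to log in, last 14 days
    for e in events:
        if e.event_type == "click" and in_window(e, now, 7):
            clicks[e.user_id] += 1
        if in_window(e, now, 30):
            sessions[e.user_id].add(e.session_id)
            if e.event_type == "transaction":
                txn_sessions[e.user_id].add(e.session_id)
        if e.event_type == "login" and in_window(e, now, 14):
            devices[e.user_id].add(e.device_id)

    # One row of features per user seen in the stream.
    features = {}
    for user in {e.user_id for e in events}:
        n_sessions = len(sessions[user])
        features[user] = {
            "clicks_7d": clicks[user],
            # Users with no sessions in the window get 0.0 (a design choice).
            "pct_sessions_with_txn_30d": (
                100.0 * len(txn_sessions[user]) / n_sessions if n_sessions else 0.0
            ),
            "distinct_login_devices_14d": len(devices[user]),
        }
    return features
```

Each feature is one number per user, which is the shape a learning algorithm expects as input.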

There are limitless features that could be built from a stream of raw event data. The task of Cortex’s feature engineering step is to identify and build the ones that will be most predictive of the goal that your pipeline is optimizing for (e.g. probability of purchasing in the future).

Learn more about feature engineering in this blog post.

Why does it matter?

Features are the tools that your pipeline uses to learn and make predictions. Your pipeline will learn to emphasize important features and to ignore irrelevant ones. But even with the most sophisticated algorithms, a pipeline is often only as good as the predictive power of its features. If the inputs aren’t relevant indicators of what you’re trying to predict, your pipeline won’t find any patterns that are useful for making predictions.

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

– Professor Pedro Domingos

Which feature engineering techniques does Cortex use?

Cortex uses a variety of feature engineering techniques to ensure that your pipeline is training on the most predictive inputs possible.

Most of your pipeline’s features are engineered by combining various functions, filters, windows, and transformations. Some examples of each are listed below.

Functions: Sum, Average, Count, Mode, Count unique, Elapsed time, Temporal incidence, Sequence, …
Filters: Column filters, Value filters, Inequality filters, Chained AND/ORs, …
Windows: 1-day, 3-day, 7-day, 14-day, 28-day, 56-day, 84-day, …
Transformations: Log, Delta, Percent, Exponentiation, …
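The idea of composing a feature from a function, filters, a window, and a transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cortex's implementation: the `Event` record, the function and transformation names, and `build_feature` are all hypothetical. Filters are plain predicates that are ANDed together, so value and inequality filters both fit; Delta is omitted because it needs two windows.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime
    category: Optional[str] = None
    price: float = 0.0


Predicate = Callable[[Event], bool]

# Functions: collapse the selected events into one number.
FUNCTIONS: dict[str, Callable[[list[Event]], float]] = {
    "count": lambda evs: float(len(evs)),
    "sum_price": lambda evs: sum(e.price for e in evs),
    "count_unique_category": lambda evs: float(len({e.category for e in evs if e.category})),
}

# Transformations: reshape the aggregated value.
TRANSFORMS: dict[str, Callable[[float], float]] = {
    "none": lambda x: x,
    "log": math.log1p,  # log(1 + x), so a zero count stays defined
}


def build_feature(
    events: list[Event],
    now: datetime,
    function: str,
    filters: list[Predicate],
    window_days: int,
    transform: str = "none",
) -> float:
    start = now - timedelta(days=window_days)
    # Keep events inside the window that pass every filter (AND semantics).
    selected = [
        e for e in events
        if start < e.timestamp <= now and all(f(e) for f in filters)
    ]
    return TRANSFORMS[transform](FUNCTIONS[function](selected))
```

An OR between filters can be expressed as a single predicate, for example `lambda e: e.category == "shoes" or e.category == "boots"`. Generating many features then amounts to looping over combinations of these four choices.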

Below are a few examples of how features might be built in this way. Note that the features generated depend on the type of data ingested into your Cortex account.

Feature	Function	Filters	Window	Transformation
Total shoe purchases over the last 56 days	Count	event_type = purchase AND category = shoes	Days = 56	N/A
Unique categories clicked over the last 28 days	Count unique (category)	event_type = click	Days = 28	N/A
Total amount spent over the last 7 days compared to the previous 7 days	Sum (price)	event_type = purchase	Days = 7	Delta (vs. previous 7 days)
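The three rows of the table above can be sketched as small Python functions. The `Event` record and function names are hypothetical, and Delta is assumed here to mean "current window minus previous window"; Cortex's exact definition is not given in this guide. Windows are half-open, (start, end], so the two 7-day windows used by the Delta feature never share an event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime
    category: Optional[str] = None
    price: float = 0.0


def window(events: list[Event], now: datetime, days: int, offset_days: int = 0) -> list[Event]:
    """Events in the `days`-day window ending `offset_days` days before `now`."""
    end = now - timedelta(days=offset_days)
    start = end - timedelta(days=days)
    return [e for e in events if start < e.timestamp <= end]


def shoe_purchases_56d(events: list[Event], now: datetime) -> int:
    # Count, filtered on event_type = purchase AND category = shoes.
    return sum(1 for e in window(events, now, 56)
               if e.event_type == "purchase" and e.category == "shoes")


def unique_categories_clicked_28d(events: list[Event], now: datetime) -> int:
    # Count unique over the category column of click events.
    return len({e.category for e in window(events, now, 28)
                if e.event_type == "click" and e.category is not None})


def spend_delta_7d(events: list[Event], now: datetime) -> float:
    # Sum(price) of purchases in the last 7 days minus the 7 days before that.
    def spend(evs: list[Event]) -> float:
        return sum(e.price for e in evs if e.event_type == "purchase")

    current = spend(window(events, now, 7))
    previous = spend(window(events, now, 7, offset_days=7))
    return current - previous
```

A positive delta means the user spent more this week than last week; a Percent transformation would divide the same difference by `previous` instead.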

This allows Cortex to build a wide variety of features that speak to behavioral patterns such as frequency, recency, breadth, sequences, and more. See below for examples of each of these.

Frequency

Total number of purchase events in category “shoes” over the last 7 days
Total price across all the user’s purchase events over the last 14 days
Number of days in the last 28 on which the user registered at least 1 click event
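The frequency features listed above might look like this in Python. This is a minimal sketch assuming a hypothetical `Event` record; "active days" is taken to mean distinct calendar dates with at least one click inside the window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime
    category: Optional[str] = None
    price: float = 0.0


def last_n_days(events: list[Event], now: datetime, days: int) -> list[Event]:
    start = now - timedelta(days=days)
    return [e for e in events if start < e.timestamp <= now]


def shoe_purchases_7d(events: list[Event], now: datetime) -> int:
    return sum(1 for e in last_n_days(events, now, 7)
               if e.event_type == "purchase" and e.category == "shoes")


def purchase_total_14d(events: list[Event], now: datetime) -> float:
    return sum((e.price for e in last_n_days(events, now, 14)
                if e.event_type == "purchase"), 0.0)


def active_click_days_28d(events: list[Event], now: datetime) -> int:
    # Distinct calendar dates with at least one click event in the window.
    return len({e.timestamp.date() for e in last_n_days(events, now, 28)
                if e.event_type == "click"})
```

Counting active days rather than raw clicks keeps a single burst of clicking from looking like habitual engagement.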

Recency

Number of days since the user last recorded a pageview event
Number of days since the user last purchased an item priced over $50
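Both recency features reduce to one helper: find the most recent matching event and measure the time since it. This is a sketch assuming a hypothetical `Event` record; returning `None` when the event never occurred is a design choice, since "never" and "long ago" are different signals that a model may want to distinguish.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime
    price: float = 0.0


def days_since_last(
    events: list[Event], now: datetime, predicate: Callable[[Event], bool]
) -> Optional[float]:
    """Fractional days since the latest event matching `predicate`, or None."""
    times = [e.timestamp for e in events if predicate(e) and e.timestamp <= now]
    if not times:
        return None  # the event never happened; leave the feature missing
    return (now - max(times)).total_seconds() / 86400


def days_since_last_pageview(events: list[Event], now: datetime) -> Optional[float]:
    return days_since_last(events, now, lambda e: e.event_type == "pageview")


def days_since_last_big_purchase(events: list[Event], now: datetime) -> Optional[float]:
    # Inequality filter: only purchases priced over $50 count.
    return days_since_last(events, now, lambda e: e.event_type == "purchase" and e.price > 50)
```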

Breadth

Number of unique event types the user has completed over the last 7 days
Number of unique categories the user has added to cart over the last 56 days
Average price of items purchased by the user over the past 84 days
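The breadth features listed above can be sketched as follows. The `Event` record and the "add_to_cart" event type are hypothetical names chosen for illustration; the average-price feature returns `None` when there are no purchases in the window rather than inventing a value.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime
    category: Optional[str] = None
    price: float = 0.0


def last_n_days(events: list[Event], now: datetime, days: int) -> list[Event]:
    start = now - timedelta(days=days)
    return [e for e in events if start < e.timestamp <= now]


def unique_event_types_7d(events: list[Event], now: datetime) -> int:
    return len({e.event_type for e in last_n_days(events, now, 7)})


def unique_cart_categories_56d(events: list[Event], now: datetime) -> int:
    # "add_to_cart" is an assumed event type name.
    return len({e.category for e in last_n_days(events, now, 56)
                if e.event_type == "add_to_cart" and e.category})


def avg_purchase_price_84d(events: list[Event], now: datetime) -> Optional[float]:
    prices = [e.price for e in last_n_days(events, now, 84) if e.event_type == "purchase"]
    return statistics.mean(prices) if prices else None
```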

Sequences

Whether the user has completed the event sequence “click on email” → “click to site” → “purchase on site” within the last 14 days
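A sequence feature like this one can be checked by sorting the user's events in the window by time and scanning for the steps in order, allowing other events in between. The sketch below assumes a hypothetical `Event` record and the illustrative event names "email_click", "site_click", and "purchase". Matching each step greedily at its earliest occurrence is sufficient: if any in-order match exists, the greedy scan finds one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical event record used only for illustration.
@dataclass
class Event:
    event_type: str
    timestamp: datetime


def completed_sequence(
    events: list[Event], now: datetime, steps: list[str], days: int = 14
) -> bool:
    """True if `steps` occur in order (not necessarily adjacent) within the window."""
    if not steps:
        return True
    start = now - timedelta(days=days)
    in_window = sorted(
        (e for e in events if start < e.timestamp <= now), key=lambda e: e.timestamp
    )
    i = 0  # index of the next step we are waiting for
    for e in in_window:
        if e.event_type == steps[i]:
            i += 1
            if i == len(steps):
                return True
    return False
```

Events that fall before the window start are ignored, so a journey that began 20 days ago does not count toward a 14-day sequence feature.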

In addition to Cortex’s automated feature engineering techniques, you may also define custom features based on your business intuition, and easily add these features to any of your Cortex ML pipelines.
