Feature Engineering
Cortex automatically connects every step of the Machine Learning process into end-to-end Machine Learning Pipelines that anyone in your business can run. In this guide, we’ll discuss the third step of a Cortex pipeline: feature engineering.
What is feature engineering?
Feature engineering is the process of transforming raw data into features that your pipeline will use to learn. A feature is a way to quantify something about your objects (e.g. users). For example:
- How many total clicks has each user recorded over the last 7 days?
- What percent of each user’s sessions over the last 30 days have included a transaction?
- How many different devices has each user used to log in over the last 14 days?
- Etc.
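As a rough illustration, the three questions above could be answered from a raw event log along the following lines. This is a minimal sketch, not Cortex's implementation: the event fields (`user`, `type`, `session`, `device`, `ts`), the sample data, and every function name are hypothetical.

```python
from datetime import datetime, timedelta

NOW = datetime(2024, 1, 31)  # hypothetical "current time" for the example


def ev(kind, session, device, days_ago, user="u1"):
    """Build one hypothetical raw event."""
    return {"user": user, "type": kind, "session": session,
            "device": device, "ts": NOW - timedelta(days=days_ago)}


EVENTS = [
    ev("login", "s1", "phone", 2), ev("click", "s1", "phone", 2),
    ev("transaction", "s1", "phone", 2),
    ev("login", "s2", "laptop", 10), ev("click", "s2", "laptop", 10),
    ev("login", "s3", "tablet", 20), ev("click", "s3", "tablet", 20),
    ev("click", "s4", "phone", 40),
]


def recent(user, days):
    """Events for one user inside the trailing window of `days` days."""
    start = NOW - timedelta(days=days)
    return [e for e in EVENTS if e["user"] == user and start < e["ts"] <= NOW]


def total_clicks(user, days=7):
    """How many clicks the user recorded in the window."""
    return sum(1 for e in recent(user, days) if e["type"] == "click")


def pct_sessions_with_transaction(user, days=30):
    """Fraction (0 to 1) of the user's sessions that include a transaction."""
    events = recent(user, days)
    sessions = {e["session"] for e in events}
    buying = {e["session"] for e in events if e["type"] == "transaction"}
    return len(buying) / len(sessions) if sessions else 0.0


def login_devices(user, days=14):
    """How many distinct devices the user logged in from in the window."""
    return len({e["device"] for e in recent(user, days) if e["type"] == "login"})
```

Each function turns the same stream of events into a single number per user, which is the form a learning algorithm can consume.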
A practically unlimited number of features could be built from a stream of raw event data. The task of Cortex’s feature engineering step is to identify and build the ones that will be most predictive of the goal that your pipeline is optimizing for (e.g. probability of purchasing in the future).
Learn more about feature engineering in this blog post.
Why does it matter?
Features are the tools that your pipeline uses to learn and make predictions. Your pipeline will learn to emphasize important features and to ignore irrelevant ones. But even with the most sophisticated algorithms, a pipeline can only be as good as the predictive power of its features. If the inputs aren’t relevant indicators of what you’re looking to predict, your pipeline won’t find any patterns that are useful for making predictions.
“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”
– Professor Pedro Domingos
Which feature engineering techniques does Cortex use?
Cortex uses a variety of feature engineering techniques to ensure that your pipeline is training on the most predictive inputs possible.
Most of your pipeline’s features are engineered by combining various functions, filters, windows, and transformations. Some examples of each are listed below.
Functions | Filters | Windows | Transformations |
Sum, Average, Count, Mode, Count unique, Elapsed time, Temporal incidence, Sequence, … | Column filters, Value filters, Inequality filters, Chained AND/ORs, … | 1-day, 3-day, 7-day, 14-day, 28-day, 56-day, 84-day, … | Log, Delta, Percent, Exponentiation, … |
Below are a few examples of how features might be built in this way. Note that which features get generated depends on the type of data being ingested into your Cortex account.
Feature | Function | Filters | Window | Transformation |
Total shoe purchases over the last 56 days | Count | event_type = purchase AND category = shoes | Days = 56 | N/A |
Unique categories clicked over the last 28 days | Count unique | event_type = click | Days = 28 | N/A |
Total amount spent over the last 7 days compared to the previous 7 days | Sum (price) | event_type = purchase | Days = 7 | Delta |
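One way to picture the rows above is as small specifications made of four parts. The sketch below is illustrative only and does not reflect Cortex's internals: `FeatureSpec`, `compute`, the event fields, and the sample data are all hypothetical, and the Delta transformation is assumed to mean "current window minus the equally long window before it".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FeatureSpec:
    function: str                          # "count", "count_unique", or "sum"
    filters: dict                          # equality filters, chained with AND
    window_days: int
    column: Optional[str] = None           # column used by count_unique / sum
    transformation: Optional[str] = None   # None or "delta" (assumed meaning)


def _aggregate(spec, events, start, end):
    """Apply the filters and the function to events with start < ts <= end."""
    rows = [e for e in events
            if start < e["ts"] <= end
            and all(e.get(k) == v for k, v in spec.filters.items())]
    if spec.function == "count":
        return len(rows)
    if spec.function == "count_unique":
        return len({r[spec.column] for r in rows})
    if spec.function == "sum":
        return sum(r[spec.column] for r in rows)
    raise ValueError(f"unsupported function: {spec.function}")


def compute(spec, events, now):
    """Evaluate one feature for one user's events as of `now`."""
    window = timedelta(days=spec.window_days)
    current = _aggregate(spec, events, now - window, now)
    if spec.transformation == "delta":
        previous = _aggregate(spec, events, now - 2 * window, now - window)
        return current - previous
    return current


NOW = datetime(2024, 1, 31)
EVENTS = [  # hypothetical events for a single user
    {"event_type": "purchase", "category": "shoes", "price": 80.0, "ts": NOW - timedelta(days=3)},
    {"event_type": "purchase", "category": "shoes", "price": 50.0, "ts": NOW - timedelta(days=10)},
    {"event_type": "purchase", "category": "hats", "price": 20.0, "ts": NOW - timedelta(days=40)},
    {"event_type": "click", "category": "shoes", "ts": NOW - timedelta(days=1)},
    {"event_type": "click", "category": "hats", "ts": NOW - timedelta(days=5)},
    {"event_type": "click", "category": "bags", "ts": NOW - timedelta(days=30)},
]

SHOE_PURCHASES_56D = FeatureSpec("count", {"event_type": "purchase", "category": "shoes"}, 56)
CATEGORIES_CLICKED_28D = FeatureSpec("count_unique", {"event_type": "click"}, 28, column="category")
SPEND_DELTA_7D = FeatureSpec("sum", {"event_type": "purchase"}, 7, column="price", transformation="delta")
```

Calling `compute(SHOE_PURCHASES_56D, EVENTS, NOW)` evaluates the first spec against the sample events. Swapping any one of the four parts yields a different feature.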
This allows Cortex to build a wide variety of features that speak to behavioral patterns such as frequency, recency, breadth, sequences, and more. See below for examples of each of these.
Frequency
- Total number of purchase events on category “shoes” over the last 7 days
- Total price across all the user’s purchase events over the last 14 days
- Number of days over the last 28 where the user registered at least 1 click event
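The last frequency example counts active days rather than events. A minimal sketch of that idea, assuming a plain list of click timestamps for one user (the function name `active_days` is hypothetical):

```python
from datetime import datetime, timedelta


def active_days(timestamps, now, window_days=28):
    """Number of distinct calendar days in the window with at least one event."""
    start = now - timedelta(days=window_days)
    return len({ts.date() for ts in timestamps if start < ts <= now})


NOW = datetime(2024, 1, 31, 12, 0)
CLICKS = [  # hypothetical click timestamps for one user
    datetime(2024, 1, 30, 9, 0),
    datetime(2024, 1, 30, 18, 0),   # same day as the click above
    datetime(2024, 1, 25, 8, 0),
    datetime(2024, 1, 1, 10, 0),    # older than 28 days, so ignored
]
```

Two clicks on the same day count once, which is what separates this feature from a plain event count.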
Recency
- Number of days since the user last recorded a pageview event
- Number of days since the user last purchased an item for > $50
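Recency features like these reduce to "find the most recent matching event and measure the gap". The sketch below assumes hypothetical event fields (`type`, `price`, `ts`) and a hypothetical helper `days_since_last`. Returning `None` when no event matches is a choice made for this example, not documented Cortex behavior.

```python
from datetime import datetime


def days_since_last(events, now, predicate):
    """Whole days since the latest event matching `predicate`, or None if none."""
    matching = [e["ts"] for e in events if e["ts"] <= now and predicate(e)]
    if not matching:
        return None
    return (now - max(matching)).days


NOW = datetime(2024, 1, 31)
EVENTS = [  # hypothetical events for one user
    {"type": "pageview", "ts": datetime(2024, 1, 29)},
    {"type": "pageview", "ts": datetime(2024, 1, 20)},
    {"type": "purchase", "price": 30.0, "ts": datetime(2024, 1, 28)},
    {"type": "purchase", "price": 75.0, "ts": datetime(2024, 1, 11)},
]

last_pageview = days_since_last(EVENTS, NOW, lambda e: e["type"] == "pageview")
last_big_purchase = days_since_last(
    EVENTS, NOW, lambda e: e["type"] == "purchase" and e["price"] > 50)
```

The second call combines a value filter with an inequality filter, mirroring the "> $50" example above.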
Breadth
- Number of unique event types the user has completed over the last 7 days
- Number of unique categories the user has added to cart over the last 56 days
- Average price of items purchased by the user over the past 84 days
Sequences
- Whether the user has completed the event sequence “click on email” → “click to site” → “purchase on site” within the last 14 days
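A sequence feature can be sketched as an ordered match over the user's recent events. The version below assumes the steps must occur in order but may have other events in between. That interpretation, the `name` and `ts` fields, and the function `completed_sequence` are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def completed_sequence(events, sequence, now, window_days=14):
    """True if the steps in `sequence` occur in order within the window."""
    if not sequence:
        return True
    start = now - timedelta(days=window_days)
    recent = sorted((e for e in events if start < e["ts"] <= now),
                    key=lambda e: e["ts"])
    step = 0  # index of the next step we are waiting for
    for event in recent:
        if event["name"] == sequence[step]:
            step += 1
            if step == len(sequence):
                return True
    return False


NOW = datetime(2024, 1, 31)
FUNNEL = ["click on email", "click to site", "purchase on site"]

IN_ORDER = [  # hypothetical events for a user who completed the funnel
    {"name": "click on email", "ts": datetime(2024, 1, 20)},
    {"name": "click to site", "ts": datetime(2024, 1, 21)},
    {"name": "purchase on site", "ts": datetime(2024, 1, 22)},
]
OUT_OF_ORDER = [  # same events, but the purchase came first
    {"name": "purchase on site", "ts": datetime(2024, 1, 20)},
    {"name": "click on email", "ts": datetime(2024, 1, 21)},
    {"name": "click to site", "ts": datetime(2024, 1, 22)},
]
```

Here `completed_sequence(IN_ORDER, FUNNEL, NOW)` is true, while the out-of-order events fail because the purchase precedes the email click.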
In addition to Cortex’s automated feature engineering techniques, you may also define custom features based on your business intuition, and easily add these features to any of your Cortex ML pipelines.
Related Links
- What is a Machine Learning Pipeline?
- Data Preprocessing
- Data Cleaning
- Model Selection
- Prediction Generation
- Adding Custom Features to a Pipeline
Still have questions? Reach out to support@mparticle.com for more info!