This post is a continuation of our last post about understanding the accuracy of your ML Pipelines. If you haven’t read that yet, definitely start there! In this post, we will talk about how to A/B Test our ML Models.
Accurate predictions don’t, on their own, guarantee better results for our business. We need to make sure the way we apply these predictions actually leads to better outcomes, and the best way to do this is through A/B Testing.
Can Machine Learning Improve My Business Results?
The first step is to become comfortable with your ML Model and know that it’s producing accurate predictions (as described in the first post). The next step is to start applying those predictions to your business. And much like any change being made, it’s often beneficial to A/B Test the application of your predictions. This is to ensure that it’s adding more value to your business than the previous status quo.
Setting up an A/B Test for ML
So how should we go about A/B Testing these predictions? Over the years, Vidora has worked on this question with many customers. We have now come up with a few best practices to ensure a proper test.
- When setting up an A/B Test, it’s always important to start with a random split of your customers. So first get all customers available to be included in the test, and randomly split them into 2 groups.
- Once we have our two randomly split groups, we determine what experience to give each group.
- Run the test. When running an A/B Test, it’s always important to run it long enough, and collect enough data, to get statistically significant results. This usually means testing at a 95% confidence level: if there were truly no difference between the groups, a result as large as the one you observed would show up by chance less than 1 time in 20.
- Measure the results for all of Group A vs Group B. For example, if we are looking to increase the number of purchases from an email, we would measure total purchases for each group. If instead we wanted to measure retention based on a churn prevention campaign, we would measure retention rate for each group. This will show us if using our predictions actually helps move the needle on our business results.
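As a rough illustration of the steps above, here is a minimal Python sketch of a random split followed by a significance check on purchase rates. The function names, the fixed seed, and the purchase counts are all made up for this example, and the two-proportion z-test is just one common way to check significance, not necessarily the method your A/B Testing software uses.

```python
import math
import random


def random_split(customer_ids, seed=42):
    """Shuffle the eligible customers and cut the list in half."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conversions_a / n_a
    rate_b = conversions_b / n_b
    # Pooled rate under the assumption that both groups behave the same.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical customer IDs 1..10000, split into two groups of 5000.
group_a, group_b = random_split(range(1, 10001))

# Hypothetical purchase counts, for illustration only.
z, p_value = two_proportion_z_test(400, len(group_a), 470, len(group_b))

# At a 95% confidence level we look for a p-value below 0.05.
significant = p_value < 0.05
```

The z-test fits metrics that are a yes/no outcome per customer, such as purchased or retained. A metric like revenue needs a different test.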
Examples of ML A/B Tests
In order to better understand the process for A/B Testing Machine Learning predictions, we’ll discuss two specific tests, and their setup, below.
A/B Testing ML Recommendations/Personalization
The first A/B Test we will cover is testing whether or not adding ML Recommendations can improve email marketing. For this test, every customer will receive an email, and we will test if a Recommended Products section results in more revenue than a generic Products section.
Following the steps outlined above, here’s how we can go about this test:
- Split the entire customer base randomly into 2 groups. This can be done through A/B Testing software if you have it, or through other means, such as looking at the User IDs and creating 1 group of Odd IDs and 1 group of Even IDs. Note that an Odd/Even split is only effectively random if IDs are assigned in a way that is unrelated to customer behavior. Regardless of how we do the split, we want to make sure it’s done randomly without bias in choosing users for a specific group.
- For this test, we have two distinct experiences, so Group 1 will get the generic Products section (which is the current status quo) and Group 2 will get the new Recommended Products section (which is the new functionality we’re testing).
- Send the emails to the two groups, and wait until we have enough data for analysis.
- Measure the revenue generated by this email for Group 1 vs Group 2, ensuring the results are statistically significant.
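The last step above can be sketched in Python as well. Revenue per customer is usually very skewed (most recipients spend nothing), so this example uses a permutation test, which makes few assumptions about how the data is distributed. That choice, the function name, and the revenue figures are assumptions made for illustration. Note that revenue is averaged over every recipient in each group, including those who bought nothing.

```python
import random


def permutation_test(revenue_a, revenue_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in mean revenue per recipient."""
    rng = random.Random(seed)
    n_a = len(revenue_a)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(revenue_b) - mean(revenue_a)
    pooled = list(revenue_a) + list(revenue_b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign customers to groups at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: one revenue value per recipient, 0.0 if they bought nothing.
group_1_revenue = [30.0] * 20 + [0.0] * 180  # generic Products section
group_2_revenue = [30.0] * 45 + [0.0] * 155  # Recommended Products section

observed_lift, p_value = permutation_test(group_1_revenue, group_2_revenue)
```

Here `observed_lift` is the extra revenue per recipient in Group 2, and a `p_value` below 0.05 would correspond to the 95% threshold discussed earlier.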
A/B Testing ML Targeting/Segmentation
Following up on the last A/B Test, let’s assume our test was run successfully and we found that including personalized recommendations did in fact lead to more purchases. Another test we may want to run is to determine what segment of our audience should get that email in the first place. For this test, instead of measuring total purchases, we may want to test if a more granular audience results in a higher click-through rate on the email.
Here’s the setup for this type of test:
- Split the entire customer base randomly into 2 groups. Even if everyone won’t get the email, we still want to randomly split our audience first.
- For this test, we’ll still be sending the personalized recommendation to both groups, but what will change between the groups is the targeting of that email. For Group 1, we’ll use the same targeting that we were using in the past, but for Group 2, we’ll target only those people our ML Model indicated had a high likelihood of purchasing.
- Send the emails to the two groups, and wait until we have enough data for analysis.
- Measure the click-through rate generated by this email for Group 1 vs Group 2. Given that we are testing segmentation, we may also want to break down our results by segment to measure them more granularly. This is an acceptable way to measure results, but it does require you to be sure you’ve captured enough data for each segment. The fewer customers you are analyzing, the less data you will have for analysis. So always make sure your results are statistically significant for the customers you’re analyzing.
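To get a feel for how much data is "enough" before slicing results by segment, a standard sample size approximation for comparing two rates can help. The sketch below assumes a 95% confidence level and 80% power, and the baseline click-through rate, the expected rate, and the segment names and sizes are all hypothetical values picked for the example.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate customers needed per group to detect the given rate change."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical: current click-through rate is 2%, and we hope targeting lifts it to 3%.
needed = required_sample_size(0.02, 0.03)

# Hypothetical number of emailed customers per group in each segment.
segment_sizes = {"high_likelihood": 5200, "medium_likelihood": 1500}

# Segments too small to reliably detect the expected lift on their own.
underpowered = [name for name, size in segment_sizes.items() if size < needed]
```

Any segment listed in `underpowered` is better read as directional than as a statistically significant result, unless the test runs longer or the segment is merged with others.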
Summary
Running ML Pipelines and making predictions doesn’t automatically improve the performance of our business. To ensure we’re using these predictions in the best way, A/B Testing is the way to go. We work with many of our customers to help set up tests to prove the benefits of Machine Learning. Get in contact with our team to find out how ML can benefit your business!