Restaurant A/B Testing

Launching a New Product Into Your Menu? Try A/B Testing…

Imagine you run a QSR (quick-service restaurant) chain and want to introduce a new product to your menu. You would want to test whether introducing this new item boosts your sales or not. This is a common problem, and people usually handle it instinctively by going for a trial period: they try the product in one of their best stores, continue the trial for a week or, in some cases, a month, and check whether people like it. If it boosts sales, they add it to the menu; otherwise they drop it. What is the problem with this approach?

Generally, there are four possible outcomes for this approach:
The product is…
1. Hit in Trial Period and Hit in Long Run
2. Failed in Trial Period and Failed in Long Run
3. Hit in Trial Period and Failed in Long Run
4. Failed in Trial Period and Hit in Long Run

The first two cases are good, and we won't face any problems with them, but the last two either bring a loss to the company or reject a possible unicorn. Neither outcome is good for business, so you have to do something about it.

There are basically two possible explanations for any outcome:
1. The outcome of the experiment is actually correct.
2. The outcome of the experiment is due to Random Chance.

The last two cases are typically the result of random chance, and this is where A/B testing is your key to guarding against chance outcomes.

Often, the human mind underestimates the scope of natural random behavior. One of the manifestations of this is the failure to anticipate extreme events, or so-called "black swans". Another manifestation is the tendency to misinterpret random events as having patterns of some significance.
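To get a feel for how easily random chance can fool a single-store trial, here is a minimal simulation sketch. It assumes weekly sales are normally distributed around a fixed baseline; the baseline of 1000, the noise level of 100, and the function name false_hit_rate are all made up for illustration and are not taken from any real store.

```python
import random


def false_hit_rate(n_trials=10_000, baseline=1000.0, noise_sd=100.0, seed=42):
    """Share of one-week, one-store trials in which a product with NO real
    effect still looks like a hit (trial-week sales beat the usual baseline)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Weekly sales are the usual baseline plus random noise; the new
        # product adds nothing in this simulation (assumption).
        trial_week_sales = rng.gauss(baseline, noise_sd)
        if trial_week_sales > baseline:
            hits += 1
    return hits / n_trials


rate = false_hit_rate()
print(f"A no-effect product looked like a hit in {rate:.0%} of simulated trials")
```

Because the noise is symmetric, roughly half of the simulated trials beat the baseline even though the product has no effect at all. A single short trial cannot tell these lucky weeks apart from a real improvement.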

In such cases, we can take advantage of A/B testing. An A/B test is an experiment with two groups to establish which of two treatments, products, or procedures is superior. Often one of the two treatments is the standard existing treatment or no treatment, and the other is a new treatment. If a standard (or existing) treatment is used, it is called the control. A typical hypothesis is that the treatment is better than the control.

Terminology alert:
Treatment: Something to which a subject is exposed.
Control group: A group of subjects exposed to standard treatment.
Treatment group: A group of subjects exposed to a specific treatment.
Randomization: The process of randomly assigning subjects to treatments.
Unit: Each individual in a test is called a Unit. In this scenario, the unit is a store.

Why is A/B testing being used in this case?
The main reason for going with A/B testing is that we are data-poor in this scenario, in the sense that we don't have past data for the target variable (the sales impact of the new product). So the intuitive approach is to run an experiment with one or two stores, and that's exactly what we are doing here.

In order to conduct a good experiment, we need to split our units into two comparable groups.
1. Treatment groups: A treatment group is the collection of units that will be getting the treatment. A treatment is a change that you are making; it is the new thing that you are testing. In our scenario, it is the sale of the new item we are adding to the menu.
2. Control groups: For each treatment group you need a control group that is used as a baseline comparison for the treatment group. In the control group, no treatment is applied. In our experiment, the control groups are the stores that do not sell the product.

In order to compare a store that is going to sell the new product to one that is not, we need to make sure that the two are as similar as possible. For example, we want stores that have similar sales levels, geography, and customer demographics. This is super important because we want to measure the impact of the product independently of other factors.

How do you know which variables to control?
The main purpose of the control variables is to find a good match between the treatment and control units so that the comparison being made is a good one. We can use the steps below to find a good match:
1. Come up with a list of potential variables.
2. Determine whether you have data for those variables.
3. Make sure the connection between the target variable and each control variable is logical.
4. Test the correlation between the control and target variables.
5. Test the correlation between the control variables themselves.
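Steps 4 and 5 above can be sketched with a plain Pearson correlation. The per-store numbers and the variable names below (weekly_sales, products_sold, foot_traffic) are invented for illustration, and reading step 5 as a check for redundant control variables is my own interpretation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-store data, one value per store (illustration only).
weekly_sales = [12000, 15000, 9000, 20000, 17000]  # target variable
products_sold = [3100, 3900, 2500, 5200, 4300]     # candidate control variable
foot_traffic = [1500, 1800, 1200, 2400, 2000]      # another candidate

# Step 4: a useful control variable should correlate with the target.
r_target = pearson(products_sold, weekly_sales)

# Step 5: two control variables that correlate strongly with each other
# carry overlapping information, so keeping both may add little.
r_between = pearson(products_sold, foot_traffic)

print(f"control vs target: {r_target:.3f}")
print(f"control vs control: {r_between:.3f}")
```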

How long to run the test?
An experiment should run for at least one full cycle of whatever you are trying to observe. In our scenario, if you know that a majority of your customers visit your store once a month, then a month would represent a full cycle. Having a full cycle of data reduces the chance of bias in the responses.

Which experimental design to choose?
1. If there is very little opportunity to control variables, and the volume and velocity of the data are high enough that you are not concerned about bias, you can go for a randomized design. It is a technique where treatment and control units are selected randomly. An example is selecting a web design for your website out of two or more UI prototypes.
2. If the volume of observations is low, bias is a concern, and the cost per observation is high, you would want to choose the matched-pair design. It is a technique where treatments are matched with controls on a unit-by-unit basis using a weighting of identified control variables.

In our scenario, a matched-pair design would involve taking each treatment store and finding one or more control stores to match it to. The analysis is done on a pair-by-pair basis and then aggregated into a single result. This unit-by-unit approach helps eliminate sources of bias and increases confidence in the results even with a lower number of total observations.

How to select Treatment and Control Stores?
In practice, it is very common for the treatment stores to be already chosen for the analyst. But if you have to make the decision, it is best to first identify the outliers and remove them. In this scenario, the stores with the highest and lowest sales, and stores in cities that have only a few stores, would be considered outliers. The next step is to decide the number of treatment units, and the last step is to select them randomly from the remaining stores.
Even more important than selecting the treatment stores is selecting the control stores. These stores are the baseline for the analysis of the experiment. As mentioned before, the process for pairing treatment and control stores is known as matched pairing: each treatment store is matched to a control store that has very similar measures for each of the control variables.
In our scenario, we chose three control variables: sales volume, number of products sold, and the state where the store is located. In the matched pairing process, we would want to select a control store that has close to the same sales volume, close to the same number of products sold, and is in the same state.
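The treatment-store selection described above (remove outliers, decide the number of treatment units, then sample at random) could be sketched as follows. The store records, the function name pick_treatment_stores, and the min_stores_per_city threshold are hypothetical, and dropping exactly one store at each end of the sales ranking is a simplifying assumption.

```python
import random
from collections import Counter

# Hypothetical store records, invented for illustration only.
stores = [
    {"id": "S01", "city": "Austin", "sales": 9000},
    {"id": "S02", "city": "Austin", "sales": 14000},
    {"id": "S03", "city": "Austin", "sales": 15500},
    {"id": "S04", "city": "Dallas", "sales": 13000},
    {"id": "S05", "city": "Dallas", "sales": 16000},
    {"id": "S06", "city": "Dallas", "sales": 30000},  # highest sales
    {"id": "S07", "city": "Waco", "sales": 12500},    # only store in its city
    {"id": "S08", "city": "Dallas", "sales": 4000},   # lowest sales
    {"id": "S09", "city": "Austin", "sales": 14800},
]


def pick_treatment_stores(stores, n_treatment, min_stores_per_city=2, seed=0):
    """Remove outliers, then randomly sample n_treatment stores."""
    # Outliers, part 1: drop the single highest and lowest sales store.
    ranked = sorted(stores, key=lambda s: s["sales"])
    trimmed = ranked[1:-1]
    # Outliers, part 2: drop stores in cities with too few stores
    # (counted on the full store list, which is an assumption).
    city_counts = Counter(s["city"] for s in stores)
    eligible = [s for s in trimmed if city_counts[s["city"]] >= min_stores_per_city]
    # Fix the number of treatment units, then select them at random.
    rng = random.Random(seed)
    return rng.sample(eligible, n_treatment)


treatment = pick_treatment_stores(stores, n_treatment=3)
treatment_ids = [s["id"] for s in treatment]
print(treatment_ids)
```

The fixed seed only makes the sketch reproducible; in a real experiment you would record whichever seed or random draw you used so the selection can be audited later.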

There are many different algorithms for finding matched pairs, with different levels of computational complexity. Generally speaking, these algorithms take all of the numeric control variables and calculate a single value that determines how far the treatment unit is from each control unit. This is usually referred to as a distance score. One way to find this distance is a nearest-neighbor search with a KD-tree, for example the KDTree class in the scikit-learn library:

          from sklearn.neighbors import KDTree
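Building on that import, here is a minimal sketch of how a distance-based match might look. The store IDs and numbers are invented, and standardizing the variables before building the tree is my own assumption so that sales volume (large numbers) does not dominate the distance. The state variable is categorical, so it is left out here; one option would be to build a separate tree for each state.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical stores: columns are (sales volume, number of products sold).
control_ids = ["C1", "C2", "C3", "C4"]
control_X = np.array([
    [12000.0, 3100.0],
    [15000.0, 3900.0],
    [9000.0, 2500.0],
    [20000.0, 5200.0],
])
treatment_ids = ["T1", "T2"]
treatment_X = np.array([
    [14800.0, 3850.0],
    [9300.0, 2450.0],
])

# Standardize with the control stores' mean and spread (assumption).
mean = control_X.mean(axis=0)
std = control_X.std(axis=0)
tree = KDTree((control_X - mean) / std)

# For each treatment store, find the single nearest control store.
# dist holds the distance scores, idx the row positions in control_X.
dist, idx = tree.query((treatment_X - mean) / std, k=1)

matches = {t: control_ids[row[0]] for t, row in zip(treatment_ids, idx)}
print(matches)
```

Note that a plain nearest-neighbor query can assign the same control store to more than one treatment store; whether that is acceptable depends on your experiment.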

When it is possible to do so, an experiment can be designed with more than one control unit for each treatment unit. Having more than one control unit for a treatment unit increases the confidence in the baseline and makes for a better comparison. In general, there is no hard rule for how many control units there should be for each treatment unit. As a data scientist, you would want to compare the matches obtained with different numbers of control units per treatment unit and look for the point where the distance increases significantly.
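One rough way to look for that point is to average the distance score at each match rank and stop before the first big jump. The distances below are invented, and the max_ratio threshold of 2.0 is an arbitrary assumption rather than a rule.

```python
# Hypothetical distance scores: for each treatment store, the distance to its
# 1st, 2nd, 3rd and 4th closest control store (sorted ascending).
distances = {
    "T1": [0.10, 0.14, 0.18, 0.95],
    "T2": [0.08, 0.12, 0.20, 0.80],
    "T3": [0.12, 0.15, 0.16, 1.10],
}


def mean_distance_by_rank(distances):
    """Average distance of the k-th closest control across treatment stores."""
    rows = list(distances.values())
    return [sum(col) / len(col) for col in zip(*rows)]


def controls_before_jump(avg_by_rank, max_ratio=2.0):
    """Number of controls to keep: stop before the first rank whose average
    distance is more than max_ratio times that of the previous rank."""
    for k in range(1, len(avg_by_rank)):
        if avg_by_rank[k] > max_ratio * avg_by_rank[k - 1]:
            return k
    return len(avg_by_rank)


avg = mean_distance_by_rank(distances)
n_controls = controls_before_jump(avg)
print([round(a, 3) for a in avg], n_controls)
```

In this made-up data the fourth-closest control is much farther away than the first three, so the sketch keeps three controls per treatment store.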

Now you have the treatment and control groups, the control variables, and the duration of the experiment. This is when the experiment is conducted and the results are analyzed. Once we get the sales output from the stores after the experiment, we go through the steps below:

1. Clean up the data and match up the stores.
2. Calculate the average sales for the entire comparable period (avg_sales_comp).
3. Calculate the growth for each week for each store against the average sales in the comparable period:
   growth = (current_week_sales - avg_sales_comp) / avg_sales_comp
4. Average the growth by testing period and comparable period.
5. Create the store list for significance testing.
6. Match the store list with their respective average growth by period.
7. Take the difference in growth between the test and comparable periods for each store.
8. Test the significance of the growth difference using a t-test (in a spreadsheet, two-tailed and assuming unequal variances):
   T.TEST(range_of_test_group,range_of_control_group,2,3)
9. Calculate the lift impact of the growth difference between each treatment store and its respective control store:
   Lift = (Growth_Diff_Treatment_Store - Growth_Diff_Control_Store) / (1 + Growth_Diff_Control_Store)
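The growth, t-test, and lift calculations above can be sketched in plain Python. The growth-difference values are invented for illustration. The welch_t helper is my own stand-in for the spreadsheet T.TEST call with type 3 (two samples, unequal variances) and returns only the t statistic and degrees of freedom; to get the p-value you could use scipy.stats.ttest_ind with equal_var=False.

```python
import math
import statistics


def growth(current_week_sales, avg_sales_comp):
    """Weekly growth relative to average sales in the comparable period."""
    return (current_week_sales - avg_sales_comp) / avg_sales_comp


def lift(growth_diff_treatment, growth_diff_control):
    """Lift of a treatment store over its matched control store."""
    return (growth_diff_treatment - growth_diff_control) / (1 + growth_diff_control)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom (unequal variances)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical growth differences (test period minus comparable period),
# one value per store; each treatment store is paired with a control store.
treatment_growth_diff = [0.08, 0.11, 0.06, 0.09, 0.10]
control_growth_diff = [0.01, 0.03, -0.01, 0.02, 0.00]

t_stat, df = welch_t(treatment_growth_diff, control_growth_diff)
lifts = [lift(t, c) for t, c in zip(treatment_growth_diff, control_growth_diff)]
avg_lift = statistics.mean(lifts)

print(f"t = {t_stat:.2f}, df = {df:.1f}, average lift = {avg_lift:.1%}")
```

A large t statistic relative to its degrees of freedom suggests the growth difference is unlikely to be noise, and the average lift then gives the size of the effect.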

By calculating the lift and testing it for significance, A/B testing helps you understand whether the new product that you are introducing into the market is really a hit or not, while reducing the risk that the result is due to random chance.

Happy reading…
