In the course of an A/B experiment, correctly **calculating the sample size** is one of the key ingredients of success. Yet the amount of traffic necessary for statistical significance sometimes puts app publishers off: the required sample size can be so large that a **test lasts longer** than you’d like.

However, this **obstacle is not that dramatic** if you run your A/B tests with the help of SplitMetrics. The platform can apply an alternative approach called the **Bayesian Multi-Armed Bandit** (MAB), which solves the above-mentioned drawback without even bothering you.

A Bayesian Multi-Armed Bandit test allows **choosing an optimal variation** out of two or more. Unlike a classic A/B test, which is based on statistical hypothesis testing, a Bayesian MAB test proceeds from **Bayesian statistics**. In this post, we’ll learn more about the principles behind it.

**Why “Multi-Armed Bandit”?**

Our problem of choosing an optimal variation **can be compared** to the so-called “multi-armed bandit problem”.

Imagine you are in a casino. There are many different slot machines, the so-called **‘one-armed bandits’**, as they are known for robbing you. Suppose some slot machines pay out more frequently than others.

If you knew beforehand which machine had the highest payoff, you would play that one exclusively to **maximize your expected return**. The difficulty is that you do not know beforehand which machine is the best. The **resources are limited**: while you pull one lever (arm), you are not pulling another.

The goal is to walk out of the casino with as much money as possible. The problem is, how do you learn which slot machine is the best and get the most money in the shortest amount of time?

The straightforward analogy is to imagine different **app store product page variations** as a row of slot machines, each with its own probability of producing a conversion.

**Difference between A/B/n and Bayesian Multi-Armed Bandit Tests**

In a **traditional A/B/n test**, we determine a sample size and run the test until each variation gets the determined number of visitors. When the test is over, we use its results to **choose a winning variation** and upload it to the store. So in the course of our tests, we first learn (explore) and only then earn (exploit).

Let’s say MSQRD decided to check whether a **changed order of screenshots** (variation B) favors a better conversion rate. If the conversion rate of variation A (CR(A)) is **20%** and the expected conversion rate of variation B (CR(B)) is **26%**, then the sample size for an A/B test at a significance level of 5% with 80% statistical power is **608 visitors per variation**.

*You can check out all calculations behind it in our post on determining **sample size for A/B testing**.*
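As a rough cross-check, the standard normal-approximation formula for comparing two proportions gets close to this figure. The sketch below is an assumption about how 608 was obtained, not the exact calculation from that post; note it only matches with a one-sided test.

```python
from math import ceil
from statistics import NormalDist

def sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-variation sample size from the normal approximation,
    assuming a one-sided z-test (an assumption made to match the figure above)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # ~1.645 for a one-sided 5% level
    z_beta = NormalDist().inv_cdf(power)       # ~0.842 for 80% power
    p_bar = (p1 + p2) / 2                      # pooled conversion rate
    return ceil((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2)

print(sample_size(0.20, 0.26))  # 609, within rounding of the 608 quoted above
```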

If the conversion rate of variation B is really 26%, we’ll get **279 conversions** during the test with 608 visitors per variation:

*1216 × 0.5 × (CR(A) + CR(B)) = 608 × (0.2 + 0.26) = 279.68*

If all **1216 visitors** (*608 × 2*) were directed to the best variation B, there would be **316 conversions** (*1216 × 0.26 = 316.16*).

The difference between the **actual conversions** in the course of your experiment and the conversions you would have gotten had you sent every visitor to the optimal variation is called the **regret**.

In our A/B test example, the regret is about **36 conversions** (316.16 − 279.68 ≈ 36).
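The arithmetic above can be checked in a few lines:

```python
cr_a, cr_b = 0.20, 0.26
n = 608  # visitors per variation

ab_conversions = 2 * n * 0.5 * (cr_a + cr_b)  # 50/50 split: 279.68
best_conversions = 2 * n * cr_b               # everyone sees B: 316.16
regret = best_conversions - ab_conversions
print(round(regret))  # 36
```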

The main idea behind a Multi-Armed Bandit test is to balance between:

- using the current best guess about the probability of each variation being optimal
- collecting more data to improve upon these guesses.

This tradeoff between **using currently available information** and **collecting more data** is often referred to as the explore/exploit problem, or earning while learning.

There are many algorithms for Multi-armed Bandit tests, and they all aim at balancing exploration with exploitation.

In this post, we consider the **Bayesian Multi-Armed Bandit** (also known as Thompson Sampling or Randomized Probability Matching). In this algorithm, a variation that appears to **perform better** (give more conversions) gets more visitors, and a variation with worse performance gets fewer.

Imagine we perform a Bayesian Multi-Armed Bandit test to **choose between variations** A and B for MSQRD. Suppose our Bayesian MAB test lasts as long as the A/B test considered above, but variation B gets **80%** and A **20%** of visitors. We will get about **301 conversions** as a result of the test with **1216 visitors**:

*1216 × (0.2 × CR(A) + 0.8 × CR(B)) = 1216 × (0.2 × 0.2 + 0.8 × 0.26) = 301.568*

In our Bayesian MAB test example, the regret is about **15 conversions**, which is **2.5 times less** than in the A/B test.
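A quick comparison of the two regrets, under the 80/20 split assumed above:

```python
cr_a, cr_b = 0.20, 0.26
total = 1216  # visitors overall

best = total * cr_b                      # all traffic to B: 316.16
ab = total * 0.5 * (cr_a + cr_b)         # 50/50 A/B split: 279.68
mab = total * (0.2 * cr_a + 0.8 * cr_b)  # 20/80 MAB split: 301.568

print(round(best - ab), round(best - mab))  # regrets: 36 and 15
print((best - ab) / (best - mab))           # ratio: 2.5
```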

Thus, if we treat tests aimed at choosing an optimal variation as multi-armed bandits, the cost of testing can be reduced dramatically.

In the post “Multi-armed bandit experiments”, the authors describe an experiment showing that Bayesian MAB requires a **smaller sample** than an A/B test, i.e. a shorter test duration.

They compared A/B and Bayesian Multi-Armed Bandit tests choosing between variations A and B with **expected conversion rates of 4% and 5%** respectively. The sample size, calculated for the A/B test with a 95% confidence level, was **11,165 visitors per variation**.

At 100 visitors per day, the test would take **223 days**. They repeated the Bayesian MAB test **500 times** and got the following results: the correct variation was found in **96.4% of tests** (482 of 500), which is about the same error rate as in an A/B test.

There were a few tests where Bayesian Multi-Armed Bandit actually took longer than an A/B test, but only in **about 1% of the cases** (5 out of 500). The average time saved relative to A/B tests was **171.8 days**, and the average number of **conversions saved was 98.7**.

**Bayesian Multi-Armed Bandit Algorithm**

The **math behind the multi-armed bandit** problem is so hard that, in practice, various approximate heuristic solutions are used. Bayesian MAB is such a heuristic, and a rather popular one (e.g. it’s the choice of Google Analytics).

The **detailed rationale and description** of Bayesian Multi-Armed Bandit technique is given in the article “A modern Bayesian look at the multi-armed bandit”. However, let’s review the **main aspects of the algorithm** behind Bayesian MAB.

**Bayes’ theorem states** that for two events Y and Z, the conditional probability of Y given Z is the conditional probability of Z given Y scaled by the ratio of the probability of Y to the probability of Z:

*P(Y|Z) = P(Z|Y) · P(Y) / P(Z)*

How is Bayes’ theorem applicable to **choosing an optimal variation** in a test?

Let *y = (y₁, …, yₙ)* denote the **sequence of conversions** observed in a test for some variation after *n* **visitors**: *yᵢ* takes the value **1** if visitor *i* converted and **0** otherwise.

Let *θ* denote the **unknown probability of conversion** (the conversion rate) for a variation. Then the distribution of the conversion rate for a variation given the test results for *n* **visitors (the posterior)** *P(θ|y)* can be calculated using Bayes’ theorem:

*P(θ|y) = P(y|θ) · P(θ) / P(y)*   (1)

- **P(y|θ)** – the likelihood of the test results y given θ;
- **P(θ)** – our beliefs about θ before testing (the prior);
- **P(y)** – a constant (it does not contain θ), which can be ignored.

Let *µ(θ)* denote the conversion rate for a variation observed in the test. Suppose we test *k* variations. Then the probability for variation *j* to be optimal *(j changes from 1 to k)* is

*ωⱼ = P(µⱼ = max{µ₁, …, µₖ})*

and can be calculated from formula (1) above. Bayesian MAB allocates visitors to variation *j* in proportion to *ωⱼ*.

Computing *ωⱼ* from formula (1) analytically is rather difficult, but by simulation it’s easy. For each variation, *θ* can be simulated as an independent random variable from the beta distribution *Beta(α, β)*, where

*α = α(0) + y₁ + … + yₙ*

*β = β(0) + n − (y₁ + … + yₙ)*
Here **α(0), β(0)** are the prior values of **α and β** (set equal to 1, as a rule).
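With the uniform prior α(0) = β(0) = 1, the posterior update is just counting successes and failures, as this small sketch shows:

```python
def posterior_params(conversions, visitors, alpha0=1, beta0=1):
    """Beta(alpha, beta) posterior for a variation's conversion rate
    after `visitors` trials with `conversions` successes."""
    return alpha0 + conversions, beta0 + visitors - conversions

# 12 conversions out of 20 visitors -> Beta(13, 9)
print(posterior_params(12, 20))
```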

For *k* variations, one should simulate a large table containing samples of *θ* drawn from the relevant beta distributions, where the columns represent the *k* **variations**. A Monte Carlo estimate of the probability for a variation to be optimal is the empirical fraction of rows in which that variation has the largest simulated value.

Let’s consider an example. Suppose that in a test of **variations A, B, C** with 20, 30 and 40 visitors we received 12, 20 and 30 **conversions respectively**. Then to simulate *θ* for variations A, B and C we use the distributions Beta(13, 9), Beta(21, 11) and Beta(31, 11) respectively. Below is a fragment of the table of simulated **samples for each variation**.

Since the **maximum value** in the first row of the table above (0.74) is in the column corresponding to variation C, variation C is assigned to the first row. In the same way, each row is put into **correspondence with a variation**.

To **calculate the probability** that variation A is optimal, divide the number of rows assigned to variation A by the **total number of rows**. The probabilities of being optimal for variations B and C are calculated similarly.
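Here is a minimal simulation of this procedure for the A/B/C example above, using only Python’s standard library (the probabilities vary slightly from run to run):

```python
import random

random.seed(42)
rows = 100_000
# Posterior beta parameters from the example: A 12/20, B 20/30, C 30/40
params = {"A": (13, 9), "B": (21, 11), "C": (31, 11)}

wins = {v: 0 for v in params}
for _ in range(rows):
    draws = {v: random.betavariate(a, b) for v, (a, b) in params.items()}
    wins[max(draws, key=draws.get)] += 1  # assign the row to the largest draw

probs = {v: w / rows for v, w in wins.items()}
print(probs)  # roughly {'A': 0.09, 'B': 0.20, 'C': 0.71}
```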

Thus, we start the first day of the **test with the number of visitors per variation** in proportion to our prior beliefs about each variation’s probability of being optimal.

Then periodically, say once per day, we **recalculate the probability** of each variation being optimal. That gives us the proportions of visitors for the next day. We repeat this process until a **set of stopping rules** is satisfied.
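Putting the pieces together, one day of this loop might look like the sketch below. The helper function and the traffic numbers are illustrative assumptions, not SplitMetrics’ actual implementation.

```python
import random

def optimal_probabilities(stats, draws=50_000):
    """Monte Carlo estimate of each variation's probability of being optimal.
    `stats` maps a variation name to (conversions, visitors)."""
    wins = dict.fromkeys(stats, 0)
    for _ in range(draws):
        # Sample theta from each variation's Beta posterior (uniform priors)
        sample = {v: random.betavariate(1 + c, 1 + n - c)
                  for v, (c, n) in stats.items()}
        wins[max(sample, key=sample.get)] += 1
    return {v: w / draws for v, w in wins.items()}

random.seed(0)
stats = {"A": (12, 20), "B": (20, 30), "C": (30, 40)}
probs = optimal_probabilities(stats)

# Tomorrow's visitors are split in proportion to today's probabilities
daily_visitors = 1000
allocation = {v: round(p * daily_visitors) for v, p in probs.items()}
print(allocation)
```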

**Google Analytics**, for example, by default runs a Bayesian Multi-Armed Bandit test for at least two weeks. It then tracks each variation’s probability of being **optimal and the potential value** remaining in the test. The test is stopped once there is a variation with a probability of being **optimal of 95%** (a champion) and the potential value remaining in the test is **less than 1%** of the champion’s conversion rate.

**Potential value remaining** in the test is calculated as the 95th percentile of the distribution of (θmax − θ*) / θ*, where θmax is the largest value of θ in a row, and θ* is the value of θ for the variation that is most likely to be optimal.

Suppose the **calculations of the probabilities** of being optimal for variations A, B and C, based on the example presented in table 1, gave the values 0.09, 0.2 and 0.71, respectively. The **probability of being optimal** for C of 0.71 means that the value of switching away from variation C is zero in **71%** of the cases. Let’s calculate the remaining potential value. The values (θmax − θ*) / θ* are calculated in the fourth column of the table below.

The **histogram of the values** from the fourth column of table 2 is shown in the left part of the figure below. The **95th percentile** of this distribution is the “potential value remaining” in the test, which in this case works out to about 0.16. We interpret this number as: “We’re still unsure about the conversion rate for variation C, but whatever it is, variation A or B might beat it by **as much as 16%**.”
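The same kind of Monte Carlo table yields the potential value remaining directly. This sketch reuses the beta parameters from the earlier A/B/C example; the exact percentile varies slightly between runs.

```python
import random

random.seed(7)
rows = 100_000
# Posterior parameters from the example: A 12/20, B 20/30, C 30/40
params = {"A": (13, 9), "B": (21, 11), "C": (31, 11)}

values = []
for _ in range(rows):
    draws = {v: random.betavariate(a, b) for v, (a, b) in params.items()}
    theta_star = draws["C"]  # C is the variation most likely to be optimal
    values.append((max(draws.values()) - theta_star) / theta_star)

values.sort()
value_remaining = values[int(0.95 * rows)]  # 95th percentile
print(round(value_remaining, 2))  # close to the 0.16 quoted in the text
```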

Suppose **we continued the test** for some time and the total number of visitors per variation increased fivefold, reaching 100, 150 and 200 for A, B and C. Meanwhile, the **total numbers of conversions increased** to 60, 100 and 150, respectively.

After calculating the **new α and β parameters** for the three beta distributions, constructing the samples and computing the **probabilities of being optimal**, we find that the probability for variation C is 0.95. This means the value of switching to variation A or B is **zero in 95% of the cases**, and therefore the 95th percentile of the **value-remaining distribution** is zero (right side of the figure above).
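Rerunning the earlier simulation with the updated counts shows how the extra data collapses the uncertainty (a sketch; exact values vary by run):

```python
import random

random.seed(3)
rows = 100_000
# Updated posteriors with uniform priors: A 60/100, B 100/150, C 150/200
params = {"A": (61, 41), "B": (101, 51), "C": (151, 51)}

wins = {v: 0 for v in params}
for _ in range(rows):
    draws = {v: random.betavariate(a, b) for v, (a, b) in params.items()}
    wins[max(draws, key=draws.get)] += 1

print({v: w / rows for v, w in wins.items()})  # C wins roughly 95% of rows
```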

**When to Use Multi-Armed Bandit Instead of A/B/n Tests?**

It’s worth checking out a great post discussing **this issue**: “When to Run Bandit Tests Instead of A/B/n Tests”. Generally speaking, specialists recommend using the Multi-Armed Bandit when you **care about optimization** rather than understanding. Specifically, MAB algorithms tend to work well for **short tests** (when you have little time) and, paradoxically, for long (ongoing) tests.

**Short Tests**

For instance, headlines are the **perfect use case for Multi-Armed Bandit** tests: news has a short lifetime, and MAB tests can **quickly** determine which headline is better.

The same goes for **short-term campaigns**. If a campaign is one week long (for example, when you prepare your app store page for the holidays), you don’t want to spend the week on pure exploration with visitors evenly distributed between variations, because by the time you learn anything, it’s **too late to exploit the best variation**.

**Long Tests**

Multi-Armed Bandit algorithms can be **easily automated**, which is why they are effective for constantly evolving tests. For example, if we regularly need to determine the best order in which to display the top **5 news articles** on our site, it’s convenient to automate this task with a MAB test.

Another long-term use of Multi-Armed Bandit algorithms is **targeting**. Some types of users may be more common than others. Contextual MAB can take advantage of this, by applying the **learned targeting rules sooner** for more common users, while continuing to learn (test) on the rules for the less common user types.

When it comes to **mobile A/B testing**, the Multi-Armed Bandit may become a real saver of marketing budget and time. If you launch your experiments with the **SplitMetrics A/B testing platform**, you can activate MAB, leave all concerns about formidable sample sizes behind, and enjoy **statistically significant results** without extra expenses.