**A/A testing** is the practice of using a testing tool to test two identical variations against each other. Whether A/A testing is worth conducting and, if so, for **what purposes** are questions that invite conflicting opinions.

In this post, we explore why some users of testing tools like SplitMetrics run A/A tests and dwell on the things they **need to keep in mind** while performing this sort of test.

The **main reasons** for A/A testing are:

- **checking the accuracy** of A/B or Multi-Armed Bandit (MAB) testing tools;
- **determining a baseline conversion** rate for a control variation before beginning an A/B test.

**Using A/A Test for Checking Accuracy of A/B Testing Tool**

As a rule, users run an A/A test to **check the accuracy** of an A/B testing tool. This normally happens when they consider using a **new A/B testing tool** and want proof that the tool is operating correctly.

*When running such a test, a user should keep in mind that an A/A test is a non-typical scenario for an A/B testing tool and should have a good prior understanding of what to expect from an A/B testing platform in such scenarios.*

In an A/B test, we **determine the sample size** and run the experiment until each variation gets the **predetermined number of visitors**. When using the A/B testing tool to test identical variations, one should do the same. As we described in the post Determining Sample Size in A/B Testing, to calculate the sample size for a trustworthy A/B test we need **5 parameters**:

1) the conversion rate value of a control variation (variation A);

2) the minimum difference between variations A and B conversion rates which is to be identified;

3) chosen significance level;

4) chosen statistical power;

5) type of the test: one- or two-tailed test.

For an A/A test, **parameters 1, 3 and 4** are set in the same way as for A/B tests. The **second parameter**, the minimum difference between the conversion rates of variations A and B that is to be identified, can be set as a small percentage of the variation A conversion rate.

*We recommend the value of the fifth parameter to be set as a **two-tailed test** as both positive and negative conversion rate differences should be considered.*

**What is A/A Testing: Example**

Let’s consider an example. Suppose the conversion rate of **variation A** is 20% *(CR(A) = 0.2)*. Assume that the tested variations are considered identical if the resulting difference is **less than 5%** of this value, i.e. less than 0.01 *(0.2 × 0.05 = 0.01)*.

One can calculate the sample size needed to **perform an A/A test** with a significance level of 5%, a statistical power of 80% and a **two-tailed test** with the help of the free-to-use software Gpower.

Thus, one will have to run the A/A test until each variation gets **25 583 different visitors** (51 166 unique visitors in total). As seen, the required sample size is large, which makes the test **extremely resource-consuming**.

If the difference identified in this A/A test is **less than 0.01**, it confirms the identity of the variations and the accuracy of the A/B testing tool.
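
The per-variation sample size for this example can be reproduced in a few lines of Python. This is a minimal sketch that assumes the standard normal-approximation formula for comparing two independent proportions. The function name `sample_size_per_variation` is ours, and the calculator you use may apply a slightly different method internally.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(cr_a, min_diff, alpha=0.05, power=0.80, two_tailed=True):
    """Visitors needed per variation to detect `min_diff` between two conversion rates."""
    cr_b = cr_a + min_diff
    pooled = (cr_a + cr_b) / 2
    # A two-tailed test splits the significance level between both tails.
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_power * sqrt(cr_a * (1 - cr_a) + cr_b * (1 - cr_b))
    ) ** 2
    return ceil(numerator / min_diff ** 2)


n = sample_size_per_variation(cr_a=0.20, min_diff=0.01)
print(n, 2 * n)  # 25583 51166
```

Because the minimum difference enters the formula squared in the denominator, halving it roughly quadruples the required number of visitors.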

**Interpreting Results of A/A Test Disproving Identity of Variations**

How should **results be interpreted** if a correctly conducted A/A test does not confirm the identity of the variations?

*When analyzing A/A test results, it is important to keep in mind that finding a difference in conversion rates between identical variations is always a possibility.*

This is not necessarily evidence of poor accuracy of the A/B testing tool, as there is always an **element of randomness** when it comes to testing (we explained the reasons for randomness in the post on A/B test results analysis).

The **significance level** of an A/B test is the probability of concluding that the conversion rates of variations A and B differ when in fact they are equal (type I error). E.g. a significance level of 5% represents a **1 in 20 chance** that a test reports a difference between variations whose conversion rates are in fact equal.

If we **repeat the same A/A test** many times using an accurately operating A/B testing tool, the proportion of results that **confirm the identity** of the variations should be approximately as high as the confidence level (at a significance level of 5%, the **confidence level equals 95%**).

In addition to the above-mentioned **statistical randomness**, there are other reasons why a correctly conducted A/A test may **not confirm** the identity of variations.
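
The effect of repeating an A/A test can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the procedure of any particular tool: it draws both groups from the same conversion rate, applies a pooled two-proportion z-test, and counts how often a "significant" difference appears. The function names, the 1 000 visitors per group and the 500 repetitions are hypothetical values chosen to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist


def aa_test_p_value(conv_a, conv_b, visitors):
    """Two-tailed p-value of a pooled two-proportion z-test with equal group sizes."""
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = sqrt(pooled * (1 - pooled) * 2 / visitors)
    if se == 0:
        return 1.0  # both groups converted 0% or 100%: no observable difference
    z = (conv_a - conv_b) / visitors / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(cr=0.2, visitors=1000, repeats=500, alpha=0.05, seed=1):
    """Share of simulated A/A tests that report a significant difference."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(repeats):
        # Both groups see the same page, so both use the same true rate `cr`.
        conv_a = sum(rng.random() < cr for _ in range(visitors))
        conv_b = sum(rng.random() < cr for _ in range(visitors))
        if aa_test_p_value(conv_a, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / repeats


print(false_positive_rate())
```

With a significance level of 5%, the printed share should land near 0.05, so roughly 95% of the simulated A/A tests confirm the identity of the variations.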

For example, the reason can lie in the **heterogeneity of the target audience**. Suppose an A/A test is conducted on an audience of all women, while conversion rates differ for women of **different age groups**.

In this case, if the **proportions of different age groups** among visitors differ between the two identical variations, and the resulting conversion rates are calculated for **all visitors**, then a correctly conducted A/A test using an accurately operating A/B testing tool can identify a **significant difference** between two identical variations.
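
A short calculation shows how the audience mix alone can move the overall conversion rate. All numbers below are hypothetical and deliberately exaggerated: two age groups that convert at 30% and 10%, and two identical variations that happen to receive different shares of each group.

```python
def blended_conversion_rate(segment_shares, segment_rates):
    """Overall conversion rate as the share-weighted average of segment rates."""
    return sum(share * rate for share, rate in zip(segment_shares, segment_rates))


# Hypothetical conversion rates for a younger and an older age group.
rates = [0.30, 0.10]

cr_a = blended_conversion_rate([0.6, 0.4], rates)  # variation A: 60% younger visitors
cr_b = blended_conversion_rate([0.4, 0.6], rates)  # identical variation: 40% younger

print(round(cr_a, 2), round(cr_b, 2))  # 0.22 0.18
```

Each age group behaves identically on both variations, yet the overall rates differ because the groups are weighted differently.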

**A/A tests for Checking Accuracy of Bayesian MAB Testing Tools**

Using an A/A test for **checking the accuracy** of a Bayesian Multi-Armed Bandit (MAB) testing tool is problematic, and the outcome differs from one testing tool to another. A Bayesian MAB testing tool does not require a **pre-determined sample size**. Instead, it calculates each variation’s probability of being optimal.

If in a **Bayesian Multi-Armed Bandit** test with two variations it turns out that they perform about the same, either variation can be chosen.

A Bayesian MAB test does not run until a single **optimal variation is found** (in an A/A test both variations are equally optimal). It runs until it is sure that switching to another variation will not help us very much.

*Thus, at some **iteration in an A/A test**, a winner will be declared, but this does not mean that the Bayesian MAB testing tool is unreliable.*
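
To give a feel for the "probability to be optimal" discussed in this section, here is a minimal sketch for two variations. It assumes independent Beta(1, 1) priors and a Monte Carlo estimate, which is one common textbook approach and not necessarily what any specific testing tool does. The function name and the visitor counts are hypothetical, and the sketch does not model a tool's stopping rule.

```python
import random


def probability_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=7):
    """Monte Carlo estimate of P(CR_B > CR_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of a conversion rate: Beta(1 + conversions, 1 + non-conversions).
        sample_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws


# Hypothetical A/A data: both variations convert at about 20%.
print(probability_b_beats_a(conv_a=398, n_a=2000, conv_b=404, n_b=2000))
```

For identical variations this probability tends to hover around one half and drift with random noise, which is why the outcome of an A/A test in such a tool says little about its accuracy.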

**Determining Baseline Conversion for Control Variation before A/B Testing**

As mentioned above, one should know the **baseline conversion rate** of a control variation to calculate the sample size for an A/B test. To determine it, some users **conduct A/A tests**. Let’s consider an example.

Suppose one is running an A/A test where the control gives **1 003 conversions** out of 10 000 visitors and the identical variation gives **1 007 conversions** out of 10 000 visitors.

The conversion rate for the **control is 10.03%**, and that for the identical variation is 10.07%. One then uses 10.03% to 10.07% as **the conversion rate range** for the control variation and conducts an A/B test in the following way.

If the A/B test shows an **uplift within this range**, the result is considered not significant.

The approach described in this example is **not correct**. In the post on A/B test results analysis, we explained the statistics behind A/B testing and showed the correct way to **calculate confidence intervals** and draw conclusions about the significance of A/B test results.

The **best way** to determine a baseline conversion rate for a control variation is to use a **monitoring campaign**, which is an experiment that does not have any variations.
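
To see why the observed range is a poor yardstick, one can compute a confidence interval for the difference between the two conversion rates in the example. The sketch below assumes a Wald interval (normal approximation), which is a standard textbook method and may differ in detail from the one described in the referenced post. The function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def difference_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for CR_B - CR_A (normal approximation)."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    se = sqrt(cr_a * (1 - cr_a) / n_a + cr_b * (1 - cr_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = cr_b - cr_a
    return diff - z * se, diff + z * se


low, high = difference_confidence_interval(1003, 10000, 1007, 10000)
print(f"{low:.4f} .. {high:.4f}")
```

With 10 000 visitors per variation, the interval spans roughly 0.8 percentage points on either side of the observed difference and includes zero. The random noise is therefore far larger than the 0.04 percentage point gap between 10.03% and 10.07%.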

**Afterthought on A/A Test**

In some cases, it makes sense to **run an A/A test** if users are uncertain about a new A/B testing tool and want **additional proof** that it is operating accurately.

However, it is **hardly worth running A/A tests** to check the accuracy of an A/B testing tool more often than that. We have shown that a correct A/A test is very **resource-consuming**: because the minimum difference between the variations’ conversion rates is very small, the calculated sample size is large.

Besides, the **likelihood of inaccurate operation** of a testing tool that is neither a new product nor a new version is very small. Therefore, it is advisable for new users to concentrate on verifying **the correct setup of the A/B testing tool**.

For a user planning to conduct an A/A test for **checking the accuracy** of an A/B testing tool, we recommend:

- to pre-determine a **sample size**;
- to repeat the A/A test **several times** (with an accurately operating A/B tool, the proportion of results that confirm the identity of the variations should be approximately **as high as the confidence level**);
- to take into account **testing application peculiarities**, e.g. heterogeneity of the target audience.

As for a **Bayesian MAB** testing tool, an A/A test cannot be used to check the tool’s accuracy.

*To determine a **baseline conversion rate** before the beginning of an A/B test, it’s more effective to use a monitoring campaign than an A/A test.*

As for the **correct settings** for valid A/B tests with SplitMetrics, pay close attention to:

- **Testing only one hypothesis at a time**. Each alteration to your assets matters, and testing multiple changes within one experiment makes identifying the winning element much harder.
- **The quality of your hypothesis**. It’s useless to run a test when the only difference between variations is the shade of a character’s tie.
- **Experiments consistency**. It’s better to start with optimizing such elements as screenshots and video previews.
- **Choosing the right experiment type**. If you rely mainly on organic traffic, opt for category and search A/B tests. If you rely on paid traffic, product page experiments should become your number one priority. Mind that different experiment types call for a different approach to **banner design**.
- **Determining the optimal sample size** for your experiment.
- **Traffic source.** Use only verified traffic channels; this way you’ll get trustworthy results faster.
- **Smart targeting.** It’s not only about the right demographic characteristics; you should also pay attention to details such as the corresponding device settings.
- **The duration of an A/B test.** Run experiments for at least 7-10 days to track user behaviour on all weekdays.
- **Waiting for the SplitMetrics platform to determine the winner.** Normally it takes 150 installs per variation, but each case is highly individual.

With SplitMetrics, you can also launch a **test with one variation**. This feature provides incredible **pre-launch opportunities**, from validating product ideas to nailing the right targeting and identifying the most effective traffic source.

Sure, **A/A tests** can be helpful at times, but they require a pre-determined sample size, and their value depends on the **specificity of the testing tool** and on which parameters need to be checked. Yet, it’s way more prudent to **invest in a top-notch A/B testing tool** like the one SplitMetrics provides.