Anyone familiar with A/B testing knows that not every test produces a clear winner. Several factors affect test results, and if any of them falls short, the results are considered unreliable. The confidence level is the value that indicates which test results can be trusted and which tests need more work. It is a standard term in statistical hypothesis testing and is used by most companies that offer A/B testing, such as Google and Amazon.
On our experiment page, we describe confidence as a statistical measure of how reliable an estimate is. For example, 95% confidence is read, roughly speaking, as meaning that the results of the experiment would hold true 95 times out of 100. In other words, the confidence level shows whether your test results are statistically significant.
We won’t bore you with the details, but here is what goes into the calculation.
In short, we take the number of visitors driven to each variation and the conversion rate of each variation. We then use this data to calculate the probability of error for each variation and a measure of the average deviation.
When A/B testing, it’s vital to understand that the conversion rate is an estimated value. You never know whether the next person who visits your experiment will decrease or increase the conversion rate. The conversion rate you see on the dashboard is based on the data we’ve received so far. As more visitors are driven to the page, the estimated conversion rate approaches the true conversion rate. That’s why you need to drive a sufficient number of users to your experiment.
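To illustrate why more visitors give a more trustworthy estimate, here is a minimal sketch using the textbook standard error of a proportion. The function name `standard_error` and the 10% conversion rate are assumptions made for the example, not values taken from our dashboard.

```python
import math


def standard_error(conversion_rate: float, visitors: int) -> float:
    """Standard error of an observed conversion rate (binomial proportion)."""
    return math.sqrt(conversion_rate * (1 - conversion_rate) / visitors)


# Hypothetical example: the same observed 10% conversion rate
# becomes a tighter estimate as the number of visitors grows.
for visitors in (100, 1_000, 10_000):
    se = standard_error(0.10, visitors)
    print(f"{visitors:>6} visitors: 10% conversion, standard error {se:.2%}")
```

The uncertainty shrinks with the square root of the visitor count, so ten times more visitors narrows it by a factor of about three.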
Conversion rates are used to calculate the probability of error. These error values help us make sure that a variation is really improving conversions and that the difference is not just a matter of chance. Let’s assume you have 50 visitors driven to your experiment. If 49 of them install the control variation, you can be fairly certain that your chance of error is low. However, if 24 users install the control variation and 26 install the testing variation, the difference is much more likely to be a matter of chance. In this case, you can’t conclude your experiment yet.
We also calculate the measure of average deviation to find the chance that one variation performed better than the other. We then look this measure up in a standard normal distribution table to find the corresponding confidence level. That’s how the confidence level is obtained.
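The steps described above can be sketched roughly as follows. This is not our production code: it assumes a one-sided two-proportion z-test with unpooled standard errors, and the function name `confidence_level` and the sample numbers are hypothetical. The standard normal table lookup is replaced by the equivalent formula based on `math.erf`.

```python
import math


def confidence_level(visitors_a: int, conversions_a: int,
                     visitors_b: int, conversions_b: int) -> float:
    """Estimated chance that variation B truly converts better than A.

    Assumption: one-sided z-test on the difference of two proportions.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Standard error of each observed conversion rate.
    se_a = math.sqrt(rate_a * (1 - rate_a) / visitors_a)
    se_b = math.sqrt(rate_b * (1 - rate_b) / visitors_b)

    # Standard error of the difference between the two rates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        raise ValueError("not enough variation in the data to estimate confidence")

    # How many standard errors separate the two variations.
    z_score = (rate_b - rate_a) / se_diff

    # Standard normal CDF, i.e. the value a z-table would give.
    return 0.5 * (1 + math.erf(z_score / math.sqrt(2)))


# Hypothetical data: 1,000 visitors per variation.
print(f"{confidence_level(1000, 100, 1000, 130):.1%}")
```

With identical results in both variations the function returns exactly 0.5, meaning neither variation is favored. The larger the gap between the conversion rates relative to their standard errors, the closer the value gets to 1.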
3 don’ts when viewing the confidence level:
- Don’t change experiment settings in the middle of the test. Doing so can invalidate the results and waste your investment in A/B testing.
- Don’t stop the test too early. A longer run lets more factors influence the results. Ideally, you should run your test for 10 days.
- Don’t panic if the confidence level varies throughout the A/B experiment. This is normal.