7 Oct 2025

How to Design & Run Valid A/B tests: the SplitMetrics Framework

Gabriel Kuriata

For mobile apps, a valid A/B test is a data-driven experiment that ensures the measured difference between the control (A) and the variation (B) is genuinely caused by the change you made, and not by chance, bias, or external factors. Validity in A/B testing for mobile is achieved by adhering to strict statistical and methodological criteria and by designing tests properly.
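To make the statistical side concrete, here is a minimal sketch of the kind of significance check that underpins a valid test – a two-proportion z-test on conversion rates. The traffic numbers are hypothetical, and testing platforms handle this math for you; the sketch only illustrates the principle.

```python
# A minimal sketch of the statistical check behind a "valid" result:
# a two-sided, two-proportion z-test on conversion rates.
# All traffic numbers below are hypothetical.
from math import sqrt

from scipy.stats import norm


def conversion_z_test(installs_a, visitors_a, installs_b, visitors_b):
    """Return both conversion rates and a two-sided p-value for their difference."""
    cr_a = installs_a / visitors_a
    cr_b = installs_b / visitors_b
    # Pooled proportion under the null hypothesis of "no real difference"
    pooled = (installs_a + installs_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (cr_b - cr_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return cr_a, cr_b, p_value


cr_a, cr_b, p = conversion_z_test(installs_a=300, visitors_a=10_000,
                                  installs_b=360, visitors_b=10_000)
print(f"CR A: {cr_a:.1%}, CR B: {cr_b:.1%}, p-value: {p:.3f}")
# A p-value below your significance threshold (0.05 is a common choice)
# suggests the difference is unlikely to be chance alone.
```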

Statistical and methodological criteria are easily maintained with the help of proper A/B testing tools, such as SplitMetrics Optimize, our A/B testing platform for app product pages on mobile app stores. Luckily, choosing the correct creative elements to test is fairly straightforward, provided we follow a proper framework.

This article elaborates on the official A/B testing & validation framework developed by SplitMetrics, enabling users to utilize our platform to its fullest potential and design tests that have the highest chance of providing meaningful, actionable results – if not in the form of new winning variations, then in lessons learned that will point your design in the right direction.

Why should you A/B test your app?

Scale determines how hard a failure hits. Today, with millions of apps on the market and thousands of new ones joining the crowd every day, any form of success requires significant investment. Not only in development but also in marketing. With marketing budgets for casual games frequently being in the seven-figure-a-month category, a failure hits their publishers like a freight train.

Good apps with great features aren’t immune to failure. Great apps can underperform because of their product pages. Customer expectations change. Entire genres come, go, and return from beyond to the app stores. Jumping on the gravy train is much more difficult than it sounds. Are match-3 games all the rage now, or are they being replaced by match-4 games? Sometimes it’s difficult to tell.


With a great scale of operations comes great responsibility. For big projects, testing is ingrained in the development process and ubiquitous among successful mobile app developers. Many factors contribute to an app’s potential for profit – or loss – and testing provides considerable control over many of them. It can directly influence installation, drop-off, and engagement rates. It also has an implicit impact on lifetime value, revenue, and in-app purchases. For smaller projects, testing can also be a life-saver, as strained budgets leave less room for error. The most important point is: A/B testing can be outstandingly beneficial to all developers and publishers, but it’s a case-by-case analysis. A good framework and experience help to maximize the return on this investment. This is why we’re always ready to assist in many projects.

Natalie Tishkevich, Client Success Manager at SplitMetrics

The official SplitMetrics testing and validation framework

So, if the investment in testing your app looks sensible to you, let’s get down to the subject matter.

Our official testing and validation framework consists of eight steps. It doesn’t matter what stage of development your app is in; the framework remains the same:

  1. Research: Analyze your competitors, market, and audience to gain references and benchmarks for ideation and subsequent steps.
  2. Ideate: Gather ideas for features and all aspects of your app’s product page that will be subject to testing and evaluation during this process.
  3. Form a hypothesis: Build hypotheses around your ideas.
  4. Select variations: Prepare variants to be compared and tested.
  5. Design an experiment: Create and execute a series of tests using your test variations.
  6. Analyze results: Evaluate the outcome of your experiments and draw conclusions.
  7. Share your findings: Share relevant data with stakeholders and decision makers.
  8. Go back to stage 3: Continue your efforts and optimize further!

We rigorously follow this framework for apps at all stages of development, simply shifting our focus to the elements that matter during each stage. SplitMetrics Optimize and our Agency concentrate on the testing & validation of mobile app product pages on app stores. Given that, our framework serves different purposes depending on the stage of development an app is in:

  1. Pre-release: for apps in their conceptual phase, we focus on key features, messaging, and artistic direction through the process of pretotyping. Utilizing simulated product pages on app stores and ad campaigns, publishers can gather early feedback for their app.
  2. Pre-launch: fine-tuning your product page to ensure a successful launch, with a focus on perfecting your app store presence.
  3. Post-launch optimization: ensuring the project has “long legs” with a focus on fine-tuning and continuously evolving your product pages.

To answer a frequently asked question in advance: yes, it’s possible to conduct meaningful experiments and tests even when your app is nothing more than some ideas on paper (or rather, concept screens). We have the tools in SplitMetrics Optimize to test that, and they’re used more frequently than you might think!

In this article, we give examples of successful validations for apps in all the mentioned stages of development, so let’s get right into it.

Step 1: Research

Your app will have to beat thousands of other apps to… win a couple of seconds of attention from a user. In that time, you’ll have to convince them that your app is worth installing. The research step lays a solid foundation for answering the question of “How the hell do we do that?”, provided it consists of these elements:

  • Market research: you look at the category & subcategory most relevant to your app, see how tough the competition is on a macro level, and identify trends across their populations of apps.
  • Competitor analysis: you take a more granular look at your competitors and try to understand why they’re successful and what their focus, messaging, and functionality are.
  • Audience analysis: you try to understand as much as you can about your potential users. This means collecting demographic and behavioral data.

But how do you actually start your research? The first step in this process is to check whether someone has already done it for you.

Otherwise, this stage can be time-consuming (but at the same time incredibly fun). Do you want to (or have to) do this phase all by yourself? Here is a checklist for each part of the research phase:

Market research

This might be the most difficult part for an individual publisher, as a key component here is industry benchmarks that reflect the current state of the market. Average values of tap-through rates, conversion rates, and other metrics are difficult to acquire without a large portfolio of clients (that an agency or a company like ours might have) and a history of tests & experiments, so most probably you’ll just HAVE to rely on external sources to evaluate the feasibility of creating an app for a particular category and audience.

Another key component is keywords. These directly reflect the functionality of apps and the users’ intent. It isn’t easy to evaluate their competitiveness and popularity without proper tools, although in theory, the Apple App Store and Google Play Store offer what’s necessary to accomplish this task.

Another area to examine is the sheer number of similar apps on the app store. So… to answer the question of “How many match-3 games are out there exactly?”… someone has to count them. It truly matters for the next question: “How do we add another one that is successful?”. You won’t know that without…

Competitor analysis

You should conduct competitor analysis on two levels: the macro level and individual apps. Take the population of apps most similar to yours and try to distinguish trends & patterns in design and messaging. Then take a sample of the most successful apps and study them individually.

This part will help you formulate your value proposition during the Ideation phase. Either offer something new (a unique feature), perfect what’s already out there (better performance)… or find yet another way.

This article on ASO competitive research may be helpful in understanding this step better.

Audience analysis

This part may require some investment in probing ad campaigns to verify responsiveness across ad groups organized by key demographic metrics. You may complete this step later, as discovering your audience can be a result of using our testing framework. This is precisely what our client Etermax did with their quiz app.

You can also make assumptions based on market research and competitor analysis, as it’s often possible to deduce from their creatives whom the apps most similar to yours are targeting.

Research summary

This is what you should have after completing this step:

  1. A list of benchmarks for key KPIs in your app’s category – TTR, CR, CPT, and CPA (tap-through rate, conversion rate, cost per tap, cost per acquisition) – to help you evaluate the level of competition. A quick sketch of how these metrics are computed follows this list.
  2. A list of keywords with data on their popularity and competitiveness.
  3. A list of observed design & messaging trends among apps most similar to yours.
  4. A list of key takeaways from individual analysis of key benchmark apps (your best competitors).
  5. A quick summary of the most important parameters of your target audience, including demographic data and preferences… or educated guesses about them.
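On item 1, here is a hypothetical sketch of how those four KPIs fall out of raw campaign numbers (all figures are made up):

```python
# A hypothetical sketch of how the four benchmark KPIs are derived
# from raw campaign numbers. All figures below are made up.
impressions = 50_000   # times the app appeared in search results
taps = 2_500           # taps through to the product page
installs = 400         # installs from those page visits
spend = 1_200.00       # ad spend in USD

ttr = taps / impressions   # tap-through rate
cr = installs / taps       # conversion rate (page visit -> install)
cpt = spend / taps         # cost per tap
cpa = spend / installs     # cost per acquisition (install)

print(f"TTR: {ttr:.1%}  CR: {cr:.1%}  CPT: ${cpt:.2f}  CPA: ${cpa:.2f}")
# TTR: 5.0%  CR: 16.0%  CPT: $0.48  CPA: $3.00
```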

With these at hand, you’re ready for the next step.

Step 2: Ideate

In this phase, you collect ideas for all the things that you’ll want to test. These can be key features (highlighted through app store creatives), artistic direction, screenshot types, or icons. What’s on the table depends on the development stage you’re in.


We like to start with big, different concepts first – to really understand what the core motivation of the users is. What is the reason for them to download the app? Which USP would speak to the majority of the audience? Once we know the core motivation and we find a winner – we keep testing and optimizing in a fine-tuning process.

Hagar Seri, Head of ASO & Mobile CRO at Moburst.

Ideation: a practical example

To illustrate this step with a real example, consider the case of Etermax. Following the success of Trivia Crack, Etermax was developing a new Trivia game, targeting a more senior audience. What value can the app bring to its intended audience?


We had a new Trivia game in development, aimed at a more senior audience. We wanted to know before the release of the game what type of users we were targeting. With the product team, we set up 2 user archetypes.

Gisela Carrera, ASO Principal Artist at Etermax.

The two archetypes were:

  • People aged 40-50: with motivations of challenge, competition, ostentation, and overcoming obstacles.
  • People over 50 years old: with motivations of self-improvement, sharpening of the mind through a daily challenge, and being better every day.

Both ideated archetypes are valid and have potential. However, since they are mutually exclusive and each shapes both the messaging on the app’s App Store product page and the in-game design, there was only room for one.

Step 3: Form a hypothesis

In this stage, you use your ideas to form hypotheses that can be verified and tested. What is a hypothesis? Well, essentially, it’s an educated guess expressed in terms of variables that can be measured and verified. How you formulate your hypothesis determines the design and execution of your validation later.

Hypothesis forming: practical examples

Let’s get back to our Etermax case. We have two big ideas:

Testing ideas before the app is even released. Will users prefer the more competitive angle of a tested quiz game, or will they respond better to messaging of self-improvement and sharpening their intellect?

The hypothesis: the senior audience might prefer a more casual game that promises self-improvement and sharpening of their skills and intellect.

It’s not uncommon to see A/B tests run this early in the development cycle. Read the full Etermax case study to see how pre-launch testing can work. We also recommend this article, which explains why it’s actually very beneficial to start testing even before the release.

Another example is the case of Hobnob, whose rebranding led to a decrease in conversions and tap-through rate. Hobnob is an app that helps people create professional-looking event invitations and distribute them via text message. Which element of the app’s product page may affect those metrics?

Hobnob's app icon, post-rebranding, before alteration
Hobnob’s underperforming app icon.

The hypothesis: the app icon is the only graphic asset shown both in the search results and on the product page. Changing it will improve both tap-through and conversion rates.


We recommend a top-down approach in this stage. Your first hypothesis should discuss the variables (in our case, product page creatives) with the biggest visibility and possible impact on users. The app’s icon, the first screenshot – visuals responsible for the very first impression. You’ll have the opportunity to go deeper in future experiments. The best strategy is to test your changes one by one, to achieve clear and transparent analytics.

Natalie Tishkevich, Client Success Manager at SplitMetrics.

Step 4: Select variations for testing

Transform your hypotheses into variations for testing. How? Look at the screen below:

Hypotheses transformed into one of the most important elements on an app’s product page: screenshots. Variation A showcases a more competitive quiz app, while variation B clearly hints at more casual gameplay that leads to self-improvement. Please note that the key message is included on the very first screen in each case. Which one will be the winner?

Another fine example of a proper selection of screenshots for testing is the case of Rockbite and Mining Idle Tycoon:

A simple hypothesis (we need more varied and distinct screenshots) led to a choice of two variations destined to replace the original design (top row). Both variations utilize color to differentiate captions, but differ from each other aesthetically enough to make a good pairing for testing.

Step 5: Design & execute your experiments

In this step, we’re diving into all the nitty-gritty details. This is the step when our designs and image sets travel from the design team into our SplitMetrics Optimize platform.


Designing & executing an experiment involves these steps:

  • Configuration of ad campaigns that will generate traffic to the product page (real or simulated if the app hasn’t launched yet);
  • Designation of a time period for the experiment to take place, or a desired sample size, before the experiment can be concluded (a sample-size sketch follows below).
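On the sample-size point, here is a minimal power-analysis sketch – how many visitors each variation needs before a given relative lift in conversion rate becomes detectable. Every parameter value here is an assumption; tools like SplitMetrics Optimize determine this for you, but the sketch shows what drives the numbers.

```python
# A minimal power-analysis sketch: how many visitors each variation
# needs before a given relative lift in conversion rate becomes
# detectable. All parameter values here are assumptions.
from math import ceil, sqrt

from scipy.stats import norm


def sample_size_per_variation(baseline_cr, relative_lift,
                              alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-sided test at the given power."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # significance level (two-sided)
    z_beta = norm.ppf(power)           # statistical power
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)


# E.g. a 3% baseline CR and a hoped-for 15% relative lift:
print(sample_size_per_variation(baseline_cr=0.03, relative_lift=0.15))
```

With these example inputs, the estimate lands in the tens of thousands of visitors per variation – which is why small expected effects call for longer tests or more traffic.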

Remember, compare what’s comparable. Traffic from Facebook campaigns differs in quality from traffic acquired through Apple Ads. Be prepared for inconclusive results, so that you can begin another test immediately.

All in all, our platform greatly simplifies this step.

Step 6: Analyze your results

So how did our tests fare? Let’s have a look at a result from SplitMetrics Optimize:

Etermax pre-launch testing case: variation B with the self-improvement design and communication angle was the clear victor. These results were reached after two weeks of testing. Read the full case study here.
Rockbite testing and validation case study: variation C was the winner. Distinct caption background colors and a bright, contrasting font did the job, as did aligning all text to the bottom of the screenshots. Such details matter for users who sometimes spare only a brief moment to look at the results of our hard labor. Read the full case study here.
Bright, vibrant, and energetic variations. Among them, variation E proved to be the most successful. A small change that brought fantastic results in terms of conversion. Read the full case study here.

All the cases shown above featured well-planned tests that produced decisive results and winning variations. This isn’t always how it goes. Basically, each executed experiment can have one of three possible outcomes (a sketch for telling them apart follows the list):

  1. A clear winner emerges: either scale the experiment or implement changes right away. Congratulations!
  2. There’s no clear winner: iterate. Drop the worst-performing variations and rework the others. Don’t give up!
  3. A variation is worse than the original design: don’t waste time, just kill the test and go back to the previous steps.
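As a rough illustration of how these three buckets can be told apart, here is a hypothetical helper (not how SplitMetrics Optimize computes its results) based on a 95% confidence interval for the difference in conversion rates:

```python
# A hypothetical helper that sorts a finished test into the three
# outcomes above, using a 95% confidence interval on the difference
# in conversion rates. All traffic numbers are made up.
from math import sqrt


def classify_outcome(installs_a, visitors_a, installs_b, visitors_b, z=1.96):
    cr_a = installs_a / visitors_a
    cr_b = installs_b / visitors_b
    # Standard error of the difference between two proportions
    se = sqrt(cr_a * (1 - cr_a) / visitors_a + cr_b * (1 - cr_b) / visitors_b)
    lo = (cr_b - cr_a) - z * se
    hi = (cr_b - cr_a) + z * se
    if lo > 0:
        return "1. Clear winner: scale or implement variation B"
    if hi < 0:
        return "3. Variation is worse: kill the test"
    return "2. No clear winner: iterate"


print(classify_outcome(installs_a=300, visitors_a=10_000,
                       installs_b=360, visitors_b=10_000))
```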

Step 7: Share your findings

Sharing what you know with others can have a significant impact not only on your project but also on others. Your research and preparation can be a valuable lesson for other teams. Transparency ensures that stakeholders have proper expectations regarding future projects. An internal knowledge base can significantly reduce the workload required for future validation processes.

Step 8: Go back to step 3 (hypothesis forming)

As soon as you have decisive results, implement and observe changes in metrics. Don’t stop testing, don’t stop growing.

Icon evolution for My High School Summer Party by Lab Cave. Sometimes reaching that sweet spot is a gradual process. Read the full case study here.

Seasonality also plays a role in maintaining the continuity of A/B testing of mobile apps. You can significantly boost your results by tailoring your app store presence to the summer holidays, Christmas, or any other local festive period. The gains can be significant enough to warrant preparing well in advance.

Trends change, user expectations evolve, and so should your app. As long as the benefits of maintaining an app outweigh the cost of validating & improving it, you should continue to seek out new opportunities for growth.

Gabriel Kuriata
Content Manager
Gabriel is a professional writer with more than a decade of experience in bringing advanced B2B tech solutions closer to the people, with content in all forms, shapes, and sizes.