First, let me introduce you to some vocabulary in the language of A/B testing and optimization:
• Element — a discrete unit on the page: a block of text, a form, a button, an image, etc.
• Page — a web page or landing page that is considered the control page for your test.
• Variation — a version of a page that has some changes made to page elements. Also referred to as a variant.
• Test — a hypothesis that one version of an element will change the conversion rate in a significant and, hopefully, beneficial way.
• Conversion — when a visitor takes a desired action on the page.
What is A/B Testing?
The most common ways of testing to improve conversions online are A/B testing (also known as a controlled experiment or split testing) and multivariate testing (MVT), which we’ll talk about in more detail below.
They are widely used in industry to make decisions. When you run an A/B test, you compare one page against one or more variations that each differ from the control page in one major element. After a set amount of time, or a set number of visits, you compare the results to see how the change affected conversions.
In the simplest form of A/B testing there are two variants: control A, which keeps the existing features, and treatment B, which has the new features. In practice, we might have one control group and more than one treatment group, so the test is really A/B/C/D, but it is still called A/B testing. If you see A/B/n or split test, that is just a more accurate name for one control with many variations.
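To make the comparison step concrete, here is a minimal sketch in Python of one common way to check whether a variation's conversion rate differs from the control's: a pooled two-proportion z-test. The function name and the visitor and conversion counts are made up for illustration, and this is only one of several ways such results can be analyzed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare treatment B against control A.

    Returns (absolute lift, z statistic, two-sided p-value) using a
    pooled two-proportion z-test.
    """
    p_a = conv_a / n_a  # control conversion rate
    p_b = conv_b / n_b  # treatment conversion rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: 10,000 visitors per variant.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000,
                                         conv_b=250, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p-value={p_value:.4f}")
```

With more than one treatment (A/B/n), the same function could be called once per variation against the control, keeping in mind that running several comparisons raises the chance of a false positive.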
How Much Traffic do You Need for a Valid Test?
A common question is how much traffic you need to run a test. Consider how A/B test results are used for major site-wide decisions: companies like Twitter and Facebook use split tests to temper drastic changes, such as a homepage redesign, by rolling out the new version to only a segment of their visitors and measuring how that group reacts.
How long to run an A/B test?
To answer this question, we need to determine the sample size, which requires three parameters:
· Type II error rate β, or power (because power = 1 − β)
· Significance level α
· Minimum detectable effect
The rule of thumb is that the sample size n in each group is approximately 16 (based on α = 0.05 and power = 0.8, that is, β = 0.2) multiplied by the sample variance σ² and divided by δ², where δ is the difference between treatment and control:
$$ n \approx \frac{16\,\sigma^2}{\delta^2} $$
How does each parameter impact the sample size?
For example, we need more samples if the sample variance is larger, and we need fewer samples if δ is larger. Sample variance can be obtained from existing data, but how do we estimate δ? We don’t know it before we run the experiment, and this is where we use the last parameter: the minimum detectable effect. It is the smallest difference that would matter in practice. For instance, we may consider a 0.1% increase in revenue as the minimum detectable effect. In reality, this value is discussed and decided by multiple stakeholders.
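As a rough illustration, the sketch below computes the per-group sample size two ways: with the rule-of-thumb constant 16, and with the constant derived from α and power, 2(z₁₋α/₂ + z_power)², which is about 15.7 for α = 0.05 and power = 0.8. The function names, the 10% baseline conversion rate, and the one-percentage-point minimum detectable effect are assumptions chosen for the example, and the variance is taken to be that of a yes/no conversion metric.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(variance, mde, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided test of a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.8
    return ceil(2 * (z_alpha + z_power) ** 2 * variance / mde ** 2)


def rule_of_thumb_sample_size(variance, mde):
    """Rule of thumb: n is roughly 16 * variance / delta^2 per group."""
    return ceil(16 * variance / mde ** 2)


# Hypothetical inputs: 10% baseline conversion rate, and we care about
# an absolute change of one percentage point.
baseline = 0.10
variance = baseline * (1 - baseline)  # variance of a 0/1 conversion metric
mde = 0.01

exact = sample_size_per_group(variance, mde)
approx = rule_of_thumb_sample_size(variance, mde)
print(f"derived constant: {exact} users per group")
print(f"rule of thumb:    {approx} users per group")
```

Doubling the variance roughly doubles the required sample, while doubling the minimum detectable effect cuts it to about a quarter, which is why agreeing on that effect with stakeholders matters so much.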
Once we know the sample size, we can obtain the number of days to run the experiment by dividing the sample size per group by the number of users who enter each group per day. If the result is less than a week, we should still run the experiment for at least seven days to capture the weekly pattern. It is typically recommended to run it for two weeks. When it comes to collecting data for a test, too much is almost always better than not enough.
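The duration calculation can be sketched in a few lines of Python. The function name, the seven-day floor as a default parameter, and the traffic figures are assumptions for illustration only.

```python
from math import ceil


def experiment_duration_days(sample_size_per_group, daily_users_per_group, min_days=7):
    """Days needed to reach the sample size, never shorter than min_days."""
    days = ceil(sample_size_per_group / daily_users_per_group)
    return max(days, min_days)


# Hypothetical traffic: 14,400 users needed per group.
slow_site = experiment_duration_days(14_400, daily_users_per_group=1_500)
busy_site = experiment_duration_days(14_400, daily_users_per_group=5_000)
print(f"1,500 users/day per group: run for {slow_site} days")
print(f"5,000 users/day per group: run for {busy_site} days")
```

The busy site would reach its sample size in about three days, but the floor keeps the test running for a full week so that weekday and weekend behavior are both covered.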
What makes a test champion?
A good technique for tracking your test performance is to keep a record of your hypotheses and your results, so you know where you went right and where you went wrong. That will make your next test better, and serve as a record for stakeholders as well as yourself.
Advantages of A/B Testing
• Tests are fast to run
• Advanced analytics can be installed and evaluated for each variation (e.g. click tracking, heatmaps, etc.)
• Can achieve more dramatic conversion rate lifts
• Requires less traffic
Disadvantages of A/B Testing
• More dramatic failures
• Less specific understanding of element effects