Introduction to Experimental Design

Learn about experimental design, including why and how we use it in our Experiments.

Written by Portal Support
Updated over a week ago

Persado is the advanced language personalization platform that delivers outsized impact through AI content generation and decisioning at scale. This article explains how Persado uses experimental design to obtain the maximum information from testing marketing language with the minimum amount of resources. If your agreement doesn’t include Experiments and you’ll only be employing Predictive Content, this article won’t apply to you, since Predictive Content doesn’t use experimental design.

After reading, you will understand:

  • What experimental design is

  • Distinctions between experimental design and other forms of testing, like A/B testing

  • The key components of experimental design and how Persado puts them into practice

  • How experimental design can bolster your business.

What is Experimental Design?

Experimental design is a field of statistics that deals with how to efficiently plan an experiment to obtain as much useful information as possible by performing a small number of trials. It allows us to identify cause-and-effect relationships between inputs and outcomes in a rigorous, replicable way, so we have statistically significant knowledge and can be assured our results were not due to chance.

Experimental design is not unique to Persado - it’s an established statistical methodology, discussed mostly in academic circles before its adoption by the pharmaceutical world in the 1970s. Rather than simply collecting and observing data from a sample of patients, medical researchers began designing experiments in order to measure the precise impact of changing different, independent variables.

Persado employs experimental design in a unique way by applying it to language. The Experiments we run identify the precise words and phrases that will drive the most engagement with your audience. In the context of your marketing campaign, experimental design allows us to:

  • Tag different elements of your message

  • Test out different versions of those elements that are representative of a much broader population of possibilities

  • And identify which ones will be most successful in market.

What’s more, experimental design allows us to assess which combinations of elements have the highest impact, even if we haven’t deployed them yet - something other forms of testing, like A/B testing, can’t deliver.

What is an Experiment?

Generally, experiments are scientific procedures carried out to support or refute a hypothesis, but at Persado they have a specific function. Experiments are our practical application of experimental design. These methods of content generation include Full Experiments, One-step Experiments, and Exploration Only Experiments, all of which allow us to gain granular insights into which parts of our messaging are working with your audiences so we can learn continuously over time and deepen our machine's understanding of your brand. Typically, Experiments contain 8 or 16 Variants plus your control.

Limitations of Other Testing Methods

Imagine testing all the possible versions of a promotional email subject line. It's hard to even imagine how many there are. Hundreds? Thousands? Infinite? You might begin by breaking the subject line down into 4 elements:

  • Offer

  • Product Description

  • Emotion

  • Formatting.

You then might come up with 4 different options for each of those parts. That would give you 4 x 4 x 4 x 4 = 256 possible combinations. Testing this many combinations with limited human resources and budget simply isn’t feasible using traditional marketing techniques. This is why Persado uses experimental design.
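The arithmetic above is easy to verify: one choice per element, 4 options each, gives 4 x 4 x 4 x 4 = 256 subject lines. A minimal sketch (the element names and options below are made up for illustration):

```python
from itertools import product

# Hypothetical options for each of the 4 subject-line elements
# (names and values are made up for illustration).
elements = {
    "offer": ["20% off", "free shipping", "$10 credit", "buy one get one"],
    "product": ["new arrivals", "bestsellers", "your favorites", "the full range"],
    "emotion": ["exclusivity", "urgency", "curiosity", "gratification"],
    "formatting": ["plain", "all caps", "emoji", "brackets"],
}

# Every possible subject line is one choice from each element.
combinations = list(product(*elements.values()))
print(len(combinations))  # 4 x 4 x 4 x 4 = 256
```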

Did You Know?

If you were to somehow manually test all 256 versions, you would be performing a multivariate test, which requires all possible combinations to be deployed. But this simply isn’t a possible (or desirable) use of your staff, resources, and budget. That's the reason why multivariate tests are limited to very small scales, like 2 factors with 2 values each (a 2x2 design, with a total of 4 combinations). Experimental design is a form of multivariate testing that lets us estimate the impact of all combinations while deploying only a carefully chosen subset.

Why Not Use A/B Testing?

One of the most common methods marketers use to test different messages is the A/B test. As the name implies, A/B tests allow brands to randomly divide an audience into 2 groups, deliver a different version of a message to each group, and assess which performs better. A/B testing can be thought of as testing the champion version against the challenger version.

A/B tests are limited in today’s climate of massive scale and real-time interactivity. They cannot, for example, efficiently test multiple versions of multiple page elements against one another, or show why a particular message performs better than another. Nor can they show why a failed message produced no response, or a negative response.

On top of that, A/B tests can very easily be biased, particularly when there are multiple things that are different between the 2 versions tested. Imagine version A has a green button that reads "click to know more" while version B has a red button that reads "next step," and version A has a 20% higher click rate than version B. How do you know if that 20% is driven by the button color or by the text? The short answer is, you don't.

How Experimental Design Works

Tests that use experimental design methodology are extremely efficient. With experimental design, we can test a number of variations and then use predictive analysis to identify the best observed and best predicted combination of elements for the brand. Going back to our previous example, experimental design allows us to test all 4 parts of the subject line, and 4 options of each of those 4 parts, while still deploying a very small number of combinations.

You may be wondering: how? Experimental designs are robust, but they need to be constructed in a specific way. The principles of experimental design discussed in this section help us determine which of the thousands of possible combinations to deploy that are representative of the broader population, and how to test them to ensure that our results are significant and not random.

This is only possible through the following key elements of experimental design:

  • Balance and orthogonality

  • Randomization and homogeneity

  • The use of a control

  • Collection of binary response data

  • Reaching statistical significance.

Balance and Orthogonality

At Persado, the various marketing messages we deploy for testing - which we refer to as Variants - need to meet two important criteria: balance and orthogonality. These are standards that most experimental designs, and all Persado experimental designs, must meet.

  • Balance means that each element must show up in the same number of Variants as each of its alternative elements. If we have 4 possible offer descriptions, for example, each offer description needs to appear in a quarter of the Variants.

    • Why is this important? We need to be able to evenly compare the impact of each element, so they must appear the same number of times.

  • Orthogonality means that each part of the message can be analyzed independently from the others, and they are in no way correlated to one another. We want to be able to find which offer description is the best, for example, and testing other parts (like the emotion used) cannot cause bias to that.

    • Why is this important? We need to be sure that the results we find are specific to each element of the message, as opposed to how one element + another element work together. Otherwise, we wouldn’t be able to isolate their impact.

These 16 Variants demonstrate the key principles of balance and orthogonality.
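Both criteria can be checked mechanically. The sketch below builds a small design over two 4-value elements and verifies balance (each value appears equally often) and orthogonality (each pairing of values appears equally often); the element names and values are made up for illustration, not Persado's actual taxonomy:

```python
from itertools import product
from collections import Counter

# A 16-Variant design over two 4-value elements (illustrative labels).
offers = ["A", "B", "C", "D"]
emotions = ["w", "x", "y", "z"]
design = list(product(offers, emotions))  # 16 Variants

# Balance: each value appears in the same number of Variants (4 of 16).
offer_counts = Counter(v[0] for v in design)
emotion_counts = Counter(v[1] for v in design)
assert all(c == 4 for c in offer_counts.values())
assert all(c == 4 for c in emotion_counts.values())

# Orthogonality: every (offer, emotion) pairing appears equally often,
# so each element's effect can be estimated independently of the other.
pair_counts = Counter(design)
assert all(c == 1 for c in pair_counts.values())
print("balanced and orthogonal:", len(design), "Variants")
```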

Hot Tip

Balance is the reason why you can’t change only one Variant in your review cycle with Persado. If you edit an element of your Variant - say, if you wanted to change “$250 Just for you” to “$250 On us” - you would have to make the change so it shows up in the same number of Variants as each of its alternative elements. This is to maintain the integrity of the experimental design.

The Use of a Control

It’s commonplace to test a control (the standard marketing message) with every Experiment. The major advantage of using a control is that it helps detect and eliminate inadvertent bias. An additional benefit is that the use of a control yields a more precise estimate of the difference in performance between the original copy and the Persado Variants.

Randomization and Homogeneity

For the Experiment to be valid, audience assignment for each Variant needs to be randomized. This is to ensure that we are testing the individual elements of the Variants as opposed to a combination of the Variants + the audience. The systematic, random allocation of message Variants to uniform groups of customers ensures that extraneous factors, such as frequency of buying, age group, region, etc., don’t introduce bias into the Experiment.

Collection of Binary Response Data

Fundamental to our algorithm is the use of binary response data - data with only 2 possible outcomes, like yes or no, 0 or 1, click or no click. While non-binary response data, like net promoter score (NPS) or revenue, can technically inform Persado’s algorithm, these metrics generally require very large sample sizes and user-level data. As a result, brands most often use response actions like clicks, conversions, or application submissions.
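As a rough illustration, each recipient's outcome is recorded as a 1 (clicked) or a 0 (didn't click), and the response rate is simply the average of those binary values (the send and click counts below are made up):

```python
# Binary response data: each recipient either clicked (1) or didn't (0).
# The send and click counts are made-up illustration numbers.
sends = 2000
clicks = 84
responses = [1] * clicks + [0] * (sends - clicks)

click_rate = sum(responses) / len(responses)
print(f"click rate: {click_rate:.1%}")  # 84 / 2000 = 4.2%
```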

Reaching Statistical Significance

When you run an Experiment with Persado, you want to find the best ways to drive results for your brand. But not all results are created equal. There’s a chance that one may be a random happening, rather than a product of an intervention. To ensure we avoid this, we rely on statistical significance.

Statistical significance indicates that the result of an Experiment is unlikely to have occurred due to chance or luck. It is quantified through a metric called the p-value: the probability of seeing a result at least as extreme as the one observed if the change being tested actually had no effect. In other words, if what we observed was very unlikely to happen due to random fluctuation, we can conclude that our intervention was the reason for the observed result.

For example, p=0.05 means that if the tested change had no real effect and we repeated the same Experiment 100 times, we would expect a result this extreme in only about 5 of them. In those 5 tests, we would report that our test caused the result, when in fact the result could have been caused by random noise. This is called a "type 1 error," or a "false positive." By reaching statistical significance, we minimize our chances of incurring this error.

Persado’s cutoff for declaring a result significant is usually very strict - a p-value of less than 5%. In marketing applications, practitioners often accept significance levels as high as 15%.
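To make the p-value concrete, here is a textbook two-proportion z-test comparing a Variant's click rate against a control's. This is a standard normal approximation, not Persado's actual test, and all numbers are made up:

```python
from math import sqrt, erf

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for the difference between two click rates,
    using the pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    # Standard normal CDF via erf; doubled for the two-sided tail.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Made-up example: Variant clicked 120/2000 times vs. the control's 90/2000.
p = two_proportion_p_value(120, 2000, 90, 2000)
print(f"p = {p:.3f}")  # below 0.05, so significant at a strict 5% cutoff
```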

Statistical significance is influenced by a few parameters.

Sample Size

The greater the sample size you use, the more significant your results will be - and the more confident you will be that you improved your ad copy. At Persado, we use advanced mathematical methods to calculate the minimum sample size you need to reach statistically significant results every time ad copy changes. Our methodology is based on formulas used in the experimental design community that we then revise as needed per channel. The minimum sample size varies per channel, but it usually ranges from 1,500 to 8,000 responses.

In addition, there needs to be a large enough difference in performance between the best and worst performing Variants in order to generate statistically reliable conclusions. The greater the difference a change makes in response rate, the easier it is to achieve statistical significance. Persado creates extreme ad copy variations that generate big differences in response rate and make it easier to identify which elements significantly influence the response rate.
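The two ideas above - sample size and the size of the difference to detect - are linked by a standard power calculation. The sketch below uses the textbook two-proportion formula at 95% confidence and 80% power; it is a generic approximation, not Persado's proprietary methodology, and the click rates are made up:

```python
from math import ceil

def min_sample_size(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size to detect a lift from rate p1 to rate p2
    (textbook two-proportion formula: 95% confidence, 80% power)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a lift from a 4% to a 5% click rate:
print(min_sample_size(0.04, 0.05))  # 6735 responses per group
```

Note how a bigger difference in response rate shrinks the required sample: detecting a 4% to 6% lift needs roughly a quarter of the responses, which is why large variations reach significance faster.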

Number of Variants Tested

The fewer Variants you test, the faster you will reach a conclusion on which ones are significantly different. Our statistical algorithms take into account the available population you have for testing and your previous results, and recommend the optimal number of Variants that need to be tested so that you will reach significant results and gain the most insights.

How Persado Applies Experimental Design

Persado has developed an algorithm to automate the selection of the very specific set of Variants that meet the above principles of experimental design:

  1. With your brand’s input, Persado defines the elements of your message to be tested.

  2. We generate Variants to test those values, taking into account prior performance and new language we want to test.

  3. We generate a full list of possible combinations, and narrow down to the exact set that meets the criteria of balance and orthogonality using our algorithm.

For a balanced and orthogonal (i.e., unbiased) experimental design, the number of Variants must be a multiple of the number of values in every element (for balance) and of the product of the value counts for every pair of elements (for orthogonality). So, if your message has two 4-value elements (like 4 different variations on your emotion and your offer description), the smallest possible design is 4 x 4 = 16 Variants, and that design can also accommodate 2-value elements (since 16 is also a multiple of 2 x 2 and 2 x 4).

Persado has found over years of working with customers that 16 Variants obtain useful results and can still be efficiently produced and approved. However, this is not restrictive; we can use a different experimental design and test more (e.g., 32) or fewer (e.g., 8) depending on your objectives.

Determining the Best Message

After those 16 Variants are deployed in market, Persado gathers the results and feeds them into a statistical model that can predict the response rate of any of the 16,384 possible combinations. By doing that, we can tell exactly which combination has the maximum response rate, even if it was not part of the initial set of 16 Variants.
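The prediction step can be sketched with a toy additive (main-effects) model: deploy a fraction of the possible combinations, estimate each element value's average lift over the grand mean, and score every combination - including ones never deployed. This is an illustrative simplification with made-up data, not Persado's actual statistical model:

```python
from itertools import product

# Three 2-value elements give 8 possible combinations; a half-fraction
# design deploys only 4 of them (labels and rates are made up).
levels = {"offer": ["A", "B"], "emotion": ["x", "y"], "format": ["p", "q"]}
observed = {
    ("A", "x", "p"): 0.040,
    ("A", "y", "q"): 0.054,
    ("B", "x", "q"): 0.050,
    ("B", "y", "p"): 0.056,
}

grand_mean = sum(observed.values()) / len(observed)

def effect(idx, level):
    # Average lift over the grand mean of Variants containing this value.
    rates = [r for v, r in observed.items() if v[idx] == level]
    return sum(rates) / len(rates) - grand_mean

# Predict every combination, deployed or not.
predictions = {
    combo: grand_mean + sum(effect(i, lvl) for i, lvl in enumerate(combo))
    for combo in product(*levels.values())
}
best = max(predictions, key=predictions.get)
print(best, round(predictions[best], 3))
```

In this toy example the best predicted combination, ("B", "y", "q"), was never part of the deployed set - the same principle that lets a 16-Variant Experiment score all 16,384 combinations.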

This means Persado is able to deliver both a best observed outcome and best predicted outcome.

The best observed outcome is the one that actually performs the best in an Exploration phase, where we release our selected Variants to a subset of your customer population to learn in real time what works and what doesn’t.

The best predicted outcome is the one that Persado’s statistical model predicts will perform the best based on experimental design. Based on our findings in the Exploration phase, the messages that perform best and meet statistical significance are then deployed in real-world tests to measure actual performance, in what we call a Broadcast phase.

How Experimental Design Helps Businesses Grow Revenue

It can take a person multiple contacts with a brand before they decide to become a customer, so every opportunity to communicate and engage with people counts. If a brand can increase the number of people who respond to their marketing messages, they increase conversions, so the ability to know what works to engage quickly and effectively is paramount.

Knowing which messages resonate and which do not gets to the heart of competing in customer experience.

During Covid-19, for example, online sales became increasingly important. One major luxury goods retailer wanted to boost awareness and adoption of their mobile app to support digital shopping. They used Persado to run an Experiment on the language in the text above the image as well as the CTA. The winning Variant we identified drove 31% more app installs.

The most critical parts of this creative were the CTAs. The control’s original CTA button said “Install Now.” The subtle change from “Now” to “Immediately” helped drive significantly more engagement. Furthermore, the CTA in the text was also important. “Download it immediately” was more effective than “Download immediately the Stylable app” in the control. Emotion also played a key role. “We invite you to discover,” which conveys exclusivity, was more effective than other emotions like curiosity, gratification, and urgency.

This granular level of insight into which parts of the message contributed most to engagement is only possible through experimental design.
