The State of Split Testing: Survey Analysis
Analysed by Parry Malm, President, Howling Mad
Welcome to the State of Split Testing

Email subject line split testing is nothing new. It's a programmatic experiment that has been widely used by marketers for some time now. And yet, as you'll read in the following analysis, it's generally done quite badly. However, it's not your fault! Well, not completely, that is. There are many reasons why people aren't split testing enough. In the following pages, you'll learn how the email marketing industry views subject line split tests: what they are doing today, and most importantly, what they aren't doing.

A note on methodology: The State of Split Testing Survey was conducted in October 2014 and had 304 respondents. All responses were anonymous. 55% of the respondents were brand-side, and 45% were agency-side, from across the world (approximately 60% US, 25% UK, 15% other). We're confident that it's a broadly representative sample of the email marketing industry.
Contents

ONE: HOW TO INTERPRET THE RESULTS
TWO: WHAT ELEMENTS OF AN EMAIL ARE THE MOST IMPORTANT?
THREE: HOW MARKETERS SPLIT TEST
FOUR: WHAT DO PEOPLE TEST?
FIVE: HOW DO YOU MEASURE SUCCESS?
SIX: HOW CONFIDENT ARE MARKETERS IN THEIR SKILLS?
SEVEN: BUT DON'T TAKE IT FROM US
How to interpret the results

At Howling Mad, we hate it when people use statistics badly. So, we've done our best to analyse the results of the survey in the most robust way possible.

Most people will look at the sample mean, or average, of the values in the survey responses. However, averages from a survey sample always have some variance in them and shouldn't be trusted by themselves. The sample mean may not be the true mean of the population: it's an estimate, not the exact number.

Therefore, we encourage you to look at the 95% confidence interval, represented by the high and low bars on each bar. This is the band of values which represents the likely range of the population mean.

OR, SIMPLY PUT: the typical marketer's responses sit within the confidence interval, but not necessarily at the average point. The average can be a misleading statistic, so don't focus too much on it. We've included it because it's what people expect. We've included the confidence intervals because they're what people learn from.

The charts all look something like this example, "How much do you love statistics?": Lots! 44% (sample mean, with a 95% confidence interval of 39%–49%); Meh. 30%; Not a big fan. 20%; FML! 6%.
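If you want to sanity-check a confidence interval like that yourself, here's a minimal sketch in Python. It assumes a simple normal-approximation interval on a sample proportion; we haven't published our exact calculation, so treat the method (and the rounding) as illustrative rather than a reproduction of our charts.

    import math

    def proportion_ci(p_hat, n, z=1.96):
        # 95% confidence interval for a survey proportion (normal approximation)
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - margin, p_hat + margin

    # 44% of 304 respondents answered "Lots!" -- the interval comes out at
    # roughly 38%-50%, which is why the chart shows a band around the mean.
    low, high = proportion_ci(0.44, 304)
    print(f"{low:.1%} to {high:.1%}")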
What elements of an email are the most important?

Different people, often with different agendas, will say various things are more or less important when it comes to email marketing. For example, data agencies will tell you that CRM matters most. Or, an ESP will tell you their technology is the most important. UX people will tell you that you need to go responsive or die. And designers will tell you human creativity trumps all.

So, we asked people what, in their experience, affects the response rate of an email campaign, with one being low and four being high:

In your experience/opinion, how much do the following elements affect the response rate of an email campaign?
- Quality of data
- Segment selection
- Subject line
- Rendering on multiple devices
- Deliverability infrastructure
- Human creativity
- Email aesthetics / design
- Time of day
- ESP technology
- List size
(The chart plots the average rating for each element, from less to more important.)
Quality of data, segment selection and the subject line are all viewed as the most important elements of an email campaign. Quality of data makes intuitive sense, of course: bad data equals spam boxes. Segment selection: yep, sending the right email to the right people is generally a pretty good idea. And the subject line: it's the only part of your email campaign everyone is guaranteed to see!

Note that the confidence intervals of all three winners overlap. Therefore you can interpret the result as follows: the three elements are all viewed as being roughly equally important, as it's not clear which one is the most important. From these statistics we can, however, robustly say that quality of data, segment selection and your subject line are the holy trinity of successful email marketing.

And the bad news: sorry, human creativity and deliverability infrastructure, and especially sorry to ESPs! Your input is not viewed as being highly important to the outcome of a campaign.

What about results for brands vs. agencies? A few telling things come to light. Perhaps not surprisingly, agencies appear to view services that they generally offer, such as data quality services, deliverability advice/infrastructure, and segmentation abilities, as more important than factors they don't control. One noticeable difference is subject lines: brands view them as being more important than agencies do.

(Chart: Brand vs. Agency ratings of the same elements, from less to more important: subject line, quality of data, segment selection, human creativity, email aesthetics / design, time of day, rendering on multiple devices, deliverability infrastructure, list size, ESP technology.)
How marketers split test

Based upon the preceding results, we can all agree that subject lines are hugely important to both brands and agencies. This is not news. So, surely, if the subject line is of huge importance to the response of an email campaign, marketers should be testing all the time, right?

Of all the campaigns you've sent out in the last month, in how many of them did you split test your subject lines?
- None: 22% (95% confidence interval 18%–27%)
- A few: 49% (44%–54%)
- Most: 21% (17%–25%)
- All: 7% (5%–10%)

About a quarter of marketers never split test their subject lines. And about half of people only test subject lines on just a few of their emails.

This result is counter-intuitive! While people agree that subject lines are of huge importance to a campaign's response, most people aren't trying to test and learn how to do them better.

Hopefully, when marketers do split test their subject lines, they at least make the most of the opportunity for maximum learning by testing out numerous subject lines at once. Right?
When you split test, how many subject lines do you normally test?
- None: 15% (95% confidence interval 11%–19%)
- A/B: 69% (64%–74%)
- 5+: 2% (1%–4%)

Huh. Not so much. The vast majority of marketers only test A/B splits. Which is good, don't get us wrong, and it's better than nothing, but it still limits what you can learn from any given experiment.

Why are marketers restricting their learnings to A/B? There are a couple of reasons. The first reason is clear from this next question:

What is the maximum number of subject lines you can split test in your ESP?
- None: 9% (6%–11%)
- A/B: 32% (27%–37%)
- Unlimited: 17% (13%–21%)
- Not sure: 27% (22%–31%)

Wow. This is clearly a problem. Even in this day and age of programmatic marketing, many major ESPs (about a third of the sample) only offer A/B split testing, which is less than ideal for marketers who want to supercharge their results.
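Why does an A/B-only tool limit your learning? With more than two variants you can ask a broader question: do any of these subject lines perform differently at all? Below is a minimal, hypothetical sketch (Python, invented numbers, not drawn from the survey) of one standard way to check that across four variants at once; your own analysis might well use a different test.

    from scipy.stats import chi2_contingency

    # Four subject lines, each sent to a random slice of the list (numbers are invented)
    opens    = [1120, 1190, 1045, 1310]
    unopened = [8880, 8810, 8955, 8690]

    chi2, p_value, dof, expected = chi2_contingency([opens, unopened])
    print(f"chi-square p-value: {p_value:.4f}")  # a small p-value suggests the variants genuinely differ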
Who on your team thinks of the subject lines to test?
- You & team: 78% (95% confidence interval 73%–82%)
- You alone: 14% (10%–18%)
- Another: 6% (3%–8%)
- Your boss: 2% (1%–4%)

How much time do you and your team spend thinking of subject lines to test?
- A few minutes: 41% (36%–46%)
- About an hour: 42% (37%–47%)
- About half a day: 15% (11%–19%)
- More than a day: 2% (1%–4%)

Fortunately, most people include their team in the brainstorming process, which is a good sign. The human brain, in short amounts of time at least, can only come up with so many ways to say the same thing, and when the same people write the subject lines over and over, it's incredibly difficult to find new angles.

The statistics show that people view subject lines as being hugely important to an email campaign. And yet barely anyone spends more than an hour thinking about them. This is a very odd result.

Consider this: how much time do you spend making an email look great in your ESP's HTML editor? And how long do you spend picking the data? And making sure it's responsive across all devices? And so on, and so on. It's probably more than an hour, right?

This is perplexing. Subject lines are viewed as important, that's clear, and yet people don't spend much time on them. WTF?
What do people test?

We at Howling Mad have looked at hundreds of thousands of subject lines (we're not exaggerating: we love subject lines! We're also great at parties). And we've seen an enormous amount of split tests, some of which work and some of which don't. Split tests generally don't work when people test out the wrong things.

When you test out subject lines, what do you test? This is what marketers around the world are testing:

When testing out subject lines, which element(s) do you test the most commonly?
- Different call to action phrases: 77%
- Length of subject line: 58%
- Different adjectives: 43%
- Including the price / discount: 36%
- Price differentials (i.e. $50 vs 50%): 30%
- Different product features: 25%
- Punctuation: 10%

Other things that people test out, although in very small amounts, are things like personalisation, front- vs back-loading features, and including your brand name in the subject line.

What's surprising is that only about a quarter of marketers test out different product features, despite it being your products that people buy from your emails. This is likely an area of opportunity for the astute marketer.
How do you measure success?

Doing subject line tests just for the sake of doing them is nonsensical. The purpose is, of course, to improve your results. So how do marketers measure success?

What metric(s) do you use to measure the success of a subject line?
- Open rate: 79% (95% confidence interval 75%–83%)
- Click to open rate: 40% (34%–45%)
- Click rate: 35% (30%–40%)
- Conversion rate: 35% (30%–40%)
- Unsubscribe rate: 24% (20%–29%)

Each business will have its own reasons for using different metrics. For example, if you're marketing a premium-priced product that only gets a couple of conversions per email, using conversion rate as a success metric is statistically unreliable. (We'll save you a long rant about statistical insignificance here; drop us a line if you are curious why this is the case.)

However, one thing that is common is marketers' focus on short-term results: getting a little lift in open rates or click rates, and considering that success. This is a problem. The real power of split testing comes in when you follow an experimental plan and apply longitudinal learnings. That is to say, learn about your audience over time, over a series of planned split tests, so your response uplift isn't just fleeting but delivers long-run value.

To put it in the words of the survey respondents:

"[We] don't know what [we're] testing until the last possible moment."

"Short-term focus on testing (no methodology)."
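To see why a handful of conversions makes conversion rate a shaky success metric, here's a small illustrative calculation (Python, invented numbers; a standard Wilson interval, not the method we used anywhere in this report). With only three conversions out of 5,000 recipients, the plausible range stretches from roughly a third of the observed rate to roughly three times it, so a "winning" variant could easily just be noise.

    from statsmodels.stats.proportion import proportion_confint

    # Hypothetical premium product: 5,000 recipients, 3 conversions (a 0.06% rate)
    low, high = proportion_confint(count=3, nobs=5000, alpha=0.05, method="wilson")
    print(f"plausible conversion rate: {low:.3%} to {high:.3%}")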
So the important question isn't "How do you measure success?" The important question is, "How do you analyse your previous successes to predict better subject lines in the future?"

Do you analyse your past history of split tests to help design future splits? If so, how do you conduct the analysis?
- Don't really do much: 39% (95% confidence interval 34%–44%)
- Gut feel: 34% (29%–39%)
- Determine causal variables: 32% (27%–37%)
- Print out all subject lines and look for trends: 32% (27%–37%)
- Sentiment / natural language processing analysis: 16% (12%–20%)
- Build a model to predict subject line performance: 4% (2%–6%)

Now, this is worrying. Most people don't do very much, rely on intuition, or look at all the subject lines in a list and try to eyeball trends.

As one respondent noted: "[We] make poor assumptions of our audiences based on the 'success' of one campaign."

And another: "[We] haven't planned a robust series of tests and don't know what we're looking for to demonstrate success/improvement."

Only a paltry 4% of people have tried to build a model to predict subject line performance. This is, of course, a challenge: it takes technology, statistical know-how, and enough data. Thus, predictive model building remains a pipe dream for the majority of marketers.
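The survey didn't ask what that 4% actually built, so the following is purely a hypothetical sketch of what a first attempt could look like: a bag-of-words logistic regression (Python and scikit-learn, with invented example data) that scores a candidate subject line on how likely it is to beat your historical average open rate. A real model would need far more history and far richer features than this.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Past subject lines, labelled 1 if that send beat your average open rate (invented data)
    subject_lines = [
        "50% off everything this weekend",
        "New arrivals you might like",
        "Last chance: your basket expires tonight",
        "Our spring catalogue is here",
    ]
    beat_average = [1, 0, 1, 0]

    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(subject_lines, beat_average)

    # Score a candidate line before you send it
    candidate = ["Last chance: 50% off this weekend"]
    print(model.predict_proba(candidate)[0][1])  # estimated probability of beating the average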
How confident are marketers in their skills?

How confident are people in their subject line expertise, not to mention that of those around them? Being confident in a skill makes you want to use it more often. And if you're not overly confident yet, knowing where to get expert advice is equally important.

The question is derivative of Net Promoter Scores, a way of measuring confidence in business services. A Net Promoter Score is based on a scale of -100% to +100%. The lower the score, the less confident people are in recommending something; the higher, the more confident they are.

Net Promoter Scores: How likely are you to recommend the following to others?
- Your ESP's (or agency's) advice on subject lines: -64%
- Your company's subject line strategy: -56%
- Your team's subject line expertise: -53%
- Your company's subject line testing methodology: -52%
- Your own subject line expertise: -47%
- Your ESP's split testing functionality: -46%
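As a refresher on the scale: a Net Promoter Score is normally calculated as the percentage of promoters (ratings of 9 or 10 out of 10) minus the percentage of detractors (0 to 6). The snippet below shows that arithmetic on made-up ratings; our survey adapted the question format, so this is just to illustrate why the scores can go so deeply negative.

    def net_promoter_score(ratings):
        # Promoters rate 9-10, detractors rate 0-6, on a 0-10 scale
        promoters = sum(1 for r in ratings if r >= 9)
        detractors = sum(1 for r in ratings if r <= 6)
        return 100 * (promoters - detractors) / len(ratings)

    # Ten hypothetical respondents, mostly lukewarm: the score lands at -60
    print(net_promoter_score([10, 8, 7, 6, 6, 5, 5, 4, 3, 2]))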
OK, here's a big problem. Email marketers aren't extremely confident in their own subject line expertise (-47%). Usually, in this situation, people would seek out external help. However, they have even less confidence (-64%) in their ESP's and/or agency's subject line advice!

As one of the respondents noted: "I would [like to] allow more time for build when there is a split test to be done as [ESP name] is such a retarded programme and makes split testing a pain."

So where are marketers supposed to get help, if not from ESPs and agencies?!?

Email marketers are caught in a vicious cycle of wanting to do better, but not having enough time, lacking the knowledge to improve things, and having no one to call on to help them do better. This is the State of Split Testing in the email industry.
But don't take it from us

The quantitative results paint an interesting picture. But what's equally important is to determine what people are thinking. What their challenges are. What keeps them awake at night.

Two qualitative questions were asked. First: "Just out of curiosity, what are a few things that most marketers do wrong when it comes to split testing?" And second: "Maybe this is a stupid question, but: if you could wave a magic wand, what split testing practices/methodology/whatever would you change in your organization? (Note: you don't have a magic wand, sorry, it's a hypothetical question.)"

Here are the most common words and phrases that the respondents used when answering these questions: test, data, gut, methodology, planning, plan, analysis, a/b, learning, understanding, results, significance, variables, correlation, structured, measure, campaign, content, spend, better, timelines, last minute, and so on.
Not enough time. Or understanding. Or methodology. Or data.

You've read lots of analysis of cold, hard statistics in this document. But the words of email marketers are more powerful than any statistics could ever be. So we'll leave you with their words, anonymous of course.

"Actual analysis and less of a reliance on gut feeling."

"Freedom from the Commercial team's orders, not having to create sends and email out to people who aren't going to convert."

"We use only A/B split testing, they send the winner subject line out too soon."

"I would make my team stop relying on using the same phrases and be more creative."

"I would change the ad hoc split testing we do and create a strategic plan for what we are trying to achieve or measure. Also, I would wave a wand for everyone involved with testing to have experience with actually sending one so we can all understand what we are testing and learning."

"Testing two arbitrary subject lines each time based on gut feel rather than testing a specific causal variant and testing it over and over until you have proven that it is, in fact, a successful variant."

And we've saved our favourite for last:

"Not planning split testing - so it's Joe's favourite subject line v Fred's - no learning can be gained."
Howling Mad helps brands and media owners like you make more money online. We combine email, social and on-site tactics with a proprietary, market-leading experimental design platform to drive online results. Visit our website to learn how we can help supercharge your digital brand.

Analysed by Parry Malm, President, Howling Mad