WHITE PAPER: WEB PERFORMANCE TESTING

WHY LOAD TEST AT ALL?

The reason we load test is to ensure that people using your web site can successfully access its pages and complete whatever transaction they need, regardless of the number of people on the site (the load). There are two important points to make about this view of load testing.

The first is that what matters most is the customer experience, not what the data center sees. Customers are geographically dispersed outside your data centers, and the computers and other devices they use to reach your web site introduce additional layers of complexity: the application must be rendered in the customer's browser, and information and requests must travel back.

The second point is that customers' expectations of a web site do not change during peak usage times, which might occur because of seasonal or event-driven traffic or during infrastructure changes. Customer expectations are very high: 47 percent of consumers expect a web page to load in two seconds or less. Furthermore, 58 percent of mobile phone users expect web sites to load almost as quickly on their mobile phone as on their PC, or faster.

HOW IS WEB APPLICATION DEVELOPMENT AND DELIVERY CHANGING?

There has been an important shift in how web applications are delivered; that shift is to the browser. Previously, the browser did little more than render the graphics it was instructed to display. Today, the browser actually assembles the web application on the fly from a number of components delivered to it from an assortment of locations.

So how profound a shift is this? When we looked at 3,000 companies and 68,000 applications and workflows, we discovered that it takes an average of 8.97 hosts to complete a successful transaction. That means roughly nine different hosts have to deliver content, and the browser has to assemble it, before a user can make a plane reservation, buy a television or download an iPhone app. This shift in how the application is assembled is critical to how you test: the application is now what the user sees, not what the developer creates.

Let's first touch on the business impact of not meeting customer expectations. Poor web performance directly affects customer behavior and therefore hits your bottom line immediately. What people do when web pages load slowly is startling: they don't wait, they abandon. We studied 500 million transactions to see just how customer behavior changed as a result of poor performance on a web site. Each increase in response time leads to a correspondingly higher abandonment rate; just two seconds can increase the abandonment rate by 17 percentage points.

Two more factors add even more complexity to this situation: 1) no single browser dominates the market, so any number of potential browser configurations are in use, each with different performance characteristics; and 2) an entire chain of events occurs between the application behind your firewall and the browser (what the customer sees). This is referred to as the web application delivery chain (WADC). While it is important to understand the dependencies along the entire chain, we are not suggesting you test the content delivery network or an ad server; you do, however, need to know how your choice of hosts and services impacts the user.
An average of nine hosts contribute to a single web transaction. This illustration of the WADC shows, in broad terms, the route your application takes from your data center to the end user. Problems can occur anywhere along the chain, and if you can't find them, you can't fix them. Fortunately, these problems can be classified into four areas by their location along the WADC:

1. Data center
2. Internet
3. Third party
4. Browser/device
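To make the "nine hosts" observation concrete, here is a minimal sketch (not part of the original study) that counts the distinct hosts a single page pulls resources from. The URL is a placeholder, and scanning static HTML undercounts hosts that scripts add later, so treat the result as a lower bound.

```python
# Minimal sketch: count the distinct hosts one page references, echoing
# the "average of nine hosts per transaction" finding. The URL is a
# placeholder; static HTML inspection misses hosts added by scripts.
import re
from urllib.parse import urlparse

import requests

PAGE_URL = "https://example.com/"  # placeholder

html = requests.get(PAGE_URL, timeout=10).text
# Pull src/href attributes that point at absolute URLs
refs = re.findall(r'(?:src|href)=["\'](https?://[^"\']+)', html)
hosts = {urlparse(u).hostname for u in refs}
hosts.add(urlparse(PAGE_URL).hostname)

print(f"{len(hosts)} distinct hosts contribute to this page:")
for h in sorted(hosts):
    print(" ", h)
```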
WHAT TYPES OF LOAD TESTING ARE AVAILABLE AND WHAT PROBLEMS DO THEY FIND?

There are three types of load testing currently available. We refer to them as load testing 1.0, load testing 1.5 and load testing 2.0.

Load testing 1.0 is primarily a legacy method of testing applications inside the firewall. It is designed to tell you about your internal hardware and application performance (how much load your infrastructure can sustain) but it doesn't tell you anything about the end-user experience.

Load testing 1.5 uses much the same set of tools as 1.0, but runs them from the cloud. This begins to provide a better view. It is also a great way to reduce costs, and you can start testing beyond the firewall. But it still doesn't tell you what the end user will experience.

Load testing 2.0 solutions look at the entire web application delivery chain from the outside in; that is, from the browser back to the data center. The approach examines how both your infrastructure and the end-user experience scale with increased load before you launch new applications, make infrastructure changes or experience a spike in traffic.

CAN YOU PROVIDE SOME REAL-LIFE EXAMPLES OF PROBLEMS FOUND WHEN LOAD TESTING, AND WHAT APPROACH IS REQUIRED TO FIND THEM?

EXAMPLE 1: BOTTLENECKS AND TRAFFIC SPIKES

Our first example is a web site attached to a very popular network television show. After the show broadcast, traffic to the web site peaked rapidly as fans logged on. The goal was to accommodate as many as 1,500 logins per minute. As it turned out, the first load test turned up a bottleneck at 160 logins per minute, clearly an issue. Since this was a problem found in the application, a load testing 1.0 solution could have found it. However, once the application was fixed, additional testing showed that a bottleneck was reached at 1,300 logins per minute, still below the goal of 1,500. The problem turned out to be a limit in bandwidth. Since that problem was outside the firewall, a 1.5 or 2.0 solution was needed to discover the root cause. A minimal sketch of this kind of throughput test follows.
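The sketch below is a hedged stand-in for the kind of login-throughput test described in Example 1. The endpoint, payload, target rate and worker count are all hypothetical; a real engagement would use a commercial load testing platform with geographically distributed agents rather than a single script.

```python
# Hedged sketch of a login-throughput test like the one in Example 1.
# The endpoint, credentials and target rate are hypothetical.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

LOGIN_URL = "https://example.com/login"   # hypothetical endpoint
TARGET_PER_MINUTE = 1500                  # goal from Example 1
WORKERS = 50

def attempt_login(i: int) -> bool:
    """Fire one login attempt and report whether it succeeded."""
    try:
        r = requests.post(
            LOGIN_URL,
            data={"user": f"testuser{i}", "password": "secret"},
            timeout=10,
        )
        return r.status_code == 200
    except requests.RequestException:
        return False

def run_one_minute() -> None:
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = list(pool.map(attempt_login, range(TARGET_PER_MINUTE)))
    elapsed = time.monotonic() - start
    ok = sum(results)
    # If the system bottlenecks (e.g. at 160 logins/minute), either `ok`
    # drops or `elapsed` stretches far past 60 seconds.
    print(f"{ok}/{TARGET_PER_MINUTE} logins succeeded in {elapsed:.0f}s "
          f"(~{ok / (elapsed / 60):.0f} successful logins/minute)")

if __name__ == "__main__":
    run_one_minute()
```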
EXAMPLE 2: CLOUD VS. REAL-WORLD TESTS

Another example involved an online gaming site that allows customers to wager on sports events. Its traffic is very spiky, and it is imperative that response times remain excellent regardless of the rush to the site, in order to serve all of its markets.

When testing the site from the cloud, the first 18 minutes showed acceptable performance, with a flat response time. As the 2,500-user mark was reached, the response time started to climb and the number of errors went up, but performance still looked acceptable from the cloud. However, from the last mile (the part of the WADC that extends from the data center all the way to real-world desktops connected to residential ISPs), availability was terrible and the failure rate was close to 75 percent. From the end user's perspective, only one in four could complete a transaction.

So what is the reason for this discrepancy between performance measured from the cloud and performance measured from the last mile? When you test from the cloud, using only a handful of data centers, you test with a minimal set of external IP ranges, and the load balancer handles those addresses. With real-world testing, using diverse IP addresses, the load balancer was unable to cope. Cloud-only testing shows misleading availability and must be complemented by real-world, last mile testing.

[Figure: From the cloud, the first 18 minutes show acceptable performance; from the last mile, availability never gets above 50 percent, even with minimal users.]
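A toy illustration of why aggregating results by vantage point matters, in the spirit of Example 2. The measurement records below are invented to mirror the example's numbers; a real test would collect them from cloud agents and from last mile agents on residential ISPs.

```python
# Toy illustration of cloud vs. last mile availability (Example 2).
# The records are invented; real ones would come from test agents.
from collections import defaultdict

# (vantage, succeeded) pairs, as a real run might log them
measurements = (
    [("cloud:us-east", True)] * 97
    + [("cloud:us-east", False)] * 3
    + [("lastmile:residential", True)] * 25
    + [("lastmile:residential", False)] * 75
)

def availability_by_vantage(records):
    totals, ok = defaultdict(int), defaultdict(int)
    for vantage, succeeded in records:
        totals[vantage] += 1
        ok[vantage] += succeeded
    return {v: ok[v] / totals[v] for v in totals}

for vantage, avail in availability_by_vantage(measurements).items():
    print(f"{vantage}: {avail:.0%} available")
# cloud:us-east: 97% available
# lastmile:residential: 25% available  <- what customers actually see
```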
EXAMPLE 3: A PROBLEM IN GEOGRAPHY

Geography can have a huge impact on response times. This example highlights an attempt to precisely target a specific geographic location, a micro-geography if you will. In this case, a regional online newspaper was providing information during elections, with many people counting on it for local information unavailable via the national news. It had two key U.S. regions, New York and Pennsylvania, and a specific set of criteria for the services it wanted to provide: when maximum load was reached and sustained, the response time had to remain under four seconds and the success rate (availability) of any transaction had to be 99 percent.

When testing from the cloud, performance looked good; in fact, it looked excellent. Page response time was under four seconds and availability was good, with only a single page error by standard testing measures. Using a load testing 1.0 or 1.5 method, you would say this works great. However, when we looked at the key geographies, we saw something very different: the response times for New York and Pennsylvania exceeded the four-second maximum. With the newspaper's primary constituencies in those states, the last mile tests showed that the application failed to meet its goals. Only a load testing 2.0 methodology would expose this specific issue, by looking at delivery from the end user's perspective at those precise locations.
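A hedged sketch of the per-region SLA check implied by Example 3. The region names are from the example, but the sample timings are illustrative, not the newspaper's real data.

```python
# Sketch of a per-region SLA check (Example 3). Sample data is invented.
SLA_SECONDS = 4.0
SLA_AVAILABILITY = 0.99

# (region, response_seconds, succeeded) per measured transaction
samples = [
    ("cloud-average", 3.1, True),
    ("new-york", 5.2, True),
    ("new-york", 4.8, True),
    ("pennsylvania", 6.0, True),
    ("pennsylvania", 4.4, False),
]

def sla_report(records):
    regions = {}
    for region, secs, ok in records:
        regions.setdefault(region, []).append((secs, ok))
    for region, rows in regions.items():
        worst = max(s for s, _ in rows)
        avail = sum(ok for _, ok in rows) / len(rows)
        passed = worst <= SLA_SECONDS and avail >= SLA_AVAILABILITY
        print(f"{region}: worst {worst:.1f}s, availability {avail:.0%} "
              f"-> {'PASS' if passed else 'FAIL'}")

sla_report(samples)
# The aggregate cloud view passes while New York and Pennsylvania fail,
# which is exactly the discrepancy the example describes.
```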
EXAMPLE 4: ONLINE-ONLY RETAILER HAS FAILURE WITH THIRD-PARTY COMPONENT

This is an example of how third-party components can completely throw off an application. The components could be content delivery networks, web analytics, ratings and reviews, tracking pixels, or anything else not developed by you or coming from outside your firewall. In this case an online retailer used several third parties and needed to validate transactions. You can see quite quickly that page 3 was an issue. This is really important: there is so much content on the average web site that you must be able to pinpoint a problem component as quickly as possible. Here it was a search component responsible for 90 percent of the response time. The third party simply did not have enough capacity and caused a failure of the entire application. In this case, testing behind the firewall would not have shown the problem; load testing 2.0 will find it. Load testing 1.5 could possibly find the issue but, unless the load testing tool can track individual components, it would be very difficult to troubleshoot. (A sketch of pinpointing a slow component from per-resource timings follows Example 5.)

EXAMPLE 5: GEOGRAPHY ON A MACRO SCALE

This is a bit more complex than the previous geographic example. A big travel chain was implementing a new reservation system, with multiple data centers in multiple countries and a new load-balancing system. When the system was tested across a variety of geographies, it was discovered that three countries had no access to the system at all: availability was zero. The only way to see the problem was by testing from many locations and from the end user's perspective, since this was a customer-facing application. Because the problem was in the load balancers, it was best detected using a last mile test, since the IP addresses are true to production and not aggregated.

This didn't take long to fix, because it was easy to identify and the test plan included all the geographic locations that mattered. The key was to use distributed testing. Testing from the cloud might have worked if the cloud locations had been situated in each location of concern, but it is not clear the problem would have been as obvious.
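The sketch referenced in Example 4: a hedged illustration of pinpointing a slow third-party component from per-resource timings, such as a waterfall chart or the browser's Resource Timing API would supply. The URLs and timings below are invented to mirror the example's 90 percent figure.

```python
# Sketch of pinpointing a slow third-party component (Example 4)
# from per-resource timings. The entries below are invented.
resources = [
    {"url": "https://example.com/page3",               "ms": 210},
    {"url": "https://cdn.example.com/app.js",          "ms": 180},
    {"url": "https://search.thirdparty.com/query",     "ms": 5400},
    {"url": "https://analytics.thirdparty.com/px.gif", "ms": 90},
]

total = sum(r["ms"] for r in resources)
worst = max(resources, key=lambda r: r["ms"])
share = worst["ms"] / total

print(f"Total page time: {total} ms")
print(f"Slowest component: {worst['url']} "
      f"({worst['ms']} ms, {share:.0%} of total)")
# The third-party search call accounts for roughly 90 percent of the
# response time, mirroring the failure described in Example 4.
```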
EXAMPLE 6: E-RETAILER BROWSER FAILS COMPLETELY

This final example shows how the browser itself can profoundly impact your web site. This is the case of an Internet-only retail operation in the fashion industry whose business model counts on 90 percent of its revenue being generated through daily sales announced at a specific and regular time. Clearly the traffic spikes heavily, and it is critical that the load is handled and that customers can complete transactions.

Our testing revealed that Firefox showed 100 percent availability for the web site along with acceptable response times, though these varied widely by location. Looking at the results for Internet Explorer (IE), you can see significant outages; in fact, most transactions did not complete. Performance was miserable, and the resulting loss of sales could be devastating.

[Figure: waterfall charts showing how components load, using the IE agent (left) and the Firefox agent (right).]

The cause of the problem can be seen in the waterfall charts: the order in which components come together in the browser differs by browser type. A third-party script was causing issues in IE. So this is a browser issue caused by a third party, which would be undetectable without load testing 2.0.
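A hedged sketch of running the same transaction in two browsers, in the spirit of Example 6. It uses Selenium WebDriver; the URL is hypothetical, the transaction is reduced to a single page load, and the IE driver is legacy (a comparable modern run would pit, say, Firefox against Chrome).

```python
# Sketch: time one transaction in two browsers (Example 6).
# URL is hypothetical; the IE driver is legacy.
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException

SALE_URL = "https://example-fashion-retailer.com/daily-sale"  # hypothetical

def timed_transaction(make_driver, name):
    driver = make_driver()
    try:
        start = time.monotonic()
        driver.get(SALE_URL)  # load the daily-sale page
        # a real script would click through add-to-cart and checkout here
        elapsed = time.monotonic() - start
        print(f"{name}: page loaded in {elapsed:.1f}s")
    except WebDriverException as exc:
        print(f"{name}: transaction FAILED ({exc.__class__.__name__})")
    finally:
        driver.quit()

timed_transaction(webdriver.Firefox, "Firefox")
timed_transaction(webdriver.Ie, "Internet Explorer")  # legacy driver
```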
SUMMARY

The chart below summarizes the types of load testing and the kinds of issues each can identify. (Only Gomez spans real-world desktops on the last mile.)

Capability                           | 1.0: HTTP, behind the firewall (traditional client/server test) | 1.5: HTTP, data centers (data center testing) | 2.0: Browser, data centers | 2.0: Browser, real-world desktops (last mile)
Accuracy of end-user response time   | Incomplete                        | Incomplete   | Indicative    | Most accurate
Accuracy of application availability | Invalid                           | Indicative   | Indicative    | Most accurate
Ability to drive large load volume   | Yes (requires substantial hardware) | Best       | Better        | Good
Understand CDN impact                | No                                | Misleading   | Misleading    | Most accurate
Understand 3rd party (ads, feeds, etc.) | No                             | Minimal      | Some          | Most accurate
Realistic object download            | No                                | Static only  | Yes           | Yes
Visibility behind the firewall       | Best                              | Good         | Good          | Good

The most important takeaway is that the only criterion that matters in the end is what your customers' experience will be. The only way to understand that experience is to load test using a load testing 2.0 method that looks at the experience from your customers' point of view.

ABOUT GOMEZ

The Gomez platform is the industry's leading solution for optimizing the performance, availability and quality of web, non-web, mobile, streaming and cloud applications. The Gomez approach to application performance management starts by measuring your end users' experiences and all the components that contribute to them, to proactively detect performance issues, quantify their business impact and accelerate resolution. The Gomez solution works for any type of application, including enterprise applications accessed by employees, e-commerce web sites visited by customers, and applications running on mobile devices. Only the Gomez First Mile to Last Mile solution eliminates blind spots across the entire application delivery chain: from the browser on a user's computer or mobile device, across the Internet or a corporate WAN, across third-party and cloud providers, to the complex infrastructure inside data centers. Business managers, IT operations personnel and application development/QA engineers all benefit from the insight the Gomez solution provides. More than 4,000 customers worldwide, ranging from small companies to large enterprises and managed service providers, use Gomez to increase revenue, build brand loyalty and decrease costs. To learn more about Gomez, visit: www.compuware.com/gomez

Compuware Corporation, the technology performance company, provides software, experts and best practices to ensure technology works well and delivers value. Compuware solutions make the world's most important technologies perform at their best for leading organizations worldwide, including 46 of the top 50 Fortune 500 companies and 12 of the top 20 most visited U.S. web sites. Learn more at: compuware.com.

Compuware Corporation World Headquarters, One Campus Martius, Detroit, MI 48226-5099. © 2011 Compuware Corporation. Compuware products and services listed within are trademarks or registered trademarks of Compuware Corporation. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.