Test Run Analysis Interpretation (AI) Made Easy with OpenLoad
OpenDemand Systems, Inc.

Abstract / Executive Summary

As Web applications and services become more complex, it becomes increasingly difficult to pinpoint performance bottlenecks. An application's behavior is defined by many software and hardware components, each with its own settings, which may dramatically affect the application's performance. Performance testing and tuning is a continuous issue throughout the lifetime of a Web application, as any change in the software, hardware or data might result in a new need for optimization. Due to the increasing number of components and possible complex configurations, it is becoming even more challenging for developers, quality assurance (QA) testers and operations personnel to identify what changes are required to improve the performance of Web applications/services and IT infrastructure.

Introducing OpenLoad, the first easy-to-use, completely browser-based, enterprise load and stress testing solution designed specifically for optimizing the performance of Web sites, applications and services, including the popular J2EE and .NET platforms. OpenLoad helps quickly identify customer experience issues as well as back-end Web infrastructure hot spots and bottlenecks by greatly simplifying the process of setting up and configuring robust data-driven tests and analyzing the results. This paper offers a detailed description of the OpenLoad Analysis features that simplify the process of interpreting test run results.

The Problem

In a typical client-server environment, there is a known desktop configuration and a fixed user population. Since these variables are controlled, the release process is generally uneventful. Although some bugs are still expected, performance and scalability are more or less predictable. Even when there are issues, problems can be isolated to one or two tiers.

However, the release process of a Web application is usually a much more complex endeavor. A Web application is a complex distributed environment with many components interacting across multiple tiers, including Web, application, database and even mainframe systems. Firewalls, load balancers and other intermediate hosts may also come into play. Then there are the users' client configurations to consider, which often vary in terms of browser type and version, platform and connection speed. Not to mention that the number of users accessing a Web application can peak at any time due to the potentially unlimited, or open, demand of the Web. All of these factors can greatly affect a Web application's overall behavior. Therefore, any change in code, content, hardware or network configuration requires proper testing in order to verify that the system is still at least as responsive as before. If a Web application is released without careful testing, it may not meet service level expectations or scale to support the tens, hundreds or thousands of simultaneous visitors that make up the application's user base.

Load and stress testing enables development, QA and operations teams to identify performance issues before users do, when they are less expensive to resolve, and allows IT managers to deploy even the most complex Web applications with the utmost confidence. Although the benefits of testing throughout the application life cycle are readily apparent to most organizations, adopting iterative testing models has been largely impractical due to the complexity and cost associated with implementing legacy test tools.
Legacy test tools generally require users to learn proprietary scripting languages, perform complicated data analysis and invest a substantial amount in licensing, hardware and resources to support the testing effort. All of these factors drive up total cost of ownership (TCO) and have a direct impact on an organization's ability to initially acquire and effectively implement an automated testing solution throughout the application life cycle. The overhead associated with implementing these legacy test tools is an artifact of their client-server testing origins, and it poses a significant obstacle for organizations with limited resources, time and budget. In particular, the art and science of test run analysis interpretation (AI) is especially challenging for organizations that are new to automated testing or performance engineering.

To make testing an efficient process, IT requires tools that are easy to learn and use, yet still provide the robust feature set required to adequately test today's enterprise-level Web applications. OpenLoad is designed from the ground up to meet the unique requirements of Web application developers and testers. It is a next-generation test tool with a browser-based testing model that substantially minimizes the time and skill set required to build and maintain test scripts as well as pinpoint performance bottlenecks within Web applications and IT infrastructure from both inside and outside the firewall.

OpenLoad's reporting and analysis tools take the guesswork and complexity out of identifying performance bottlenecks by automatically flagging problem areas within your application and providing recommendations for areas that require further investigation. Within minutes, OpenLoad will help you to answer such questions as:

- How many users can my site adequately support?
- What pages are slow or error prone under load?
- How do I improve the performance, availability and reliability of my web app under load?
- What do I optimize? The Web tier, application tier or the database tier?
- What are the software components (e.g. SQL queries, EJB methods) I need to optimize?
- Should I upgrade my infrastructure? What should I buy? Processors, memory, storage or bandwidth?
OpenLoad Analysis Overview

OpenLoad has four modules that facilitate fast and productive testing: the Recorder, Controller, Scheduler and Analysis modules.

Figure 1 - OpenLoad Modules

This paper will focus on the Analysis module, which is the fourth and final step in the OpenLoad testing process. The OpenLoad Analysis module automatically correlates user, application and server-side metrics into a single simplified reporting view that is categorized by application performance, availability, reliability, scalability and capacity. This approach enables you to more effectively pinpoint resource contention issues within the various tiers of your application, including Web, application and database servers.

The Analysis module is comprised of four (4) primary reports:

- Virtual User Report - provides detailed information on the activities of each virtual user, including the ability to view the actual responses from your application in real-time
- Summary Report - provides a high-level view of how well the system is meeting business and end-user expectations
- Detailed Report - provides specific insight into pages that are slow or encountering errors
- Graph Report - provides detailed user and server metrics through five sub-reports covering Web performance, availability, reliability, scalability and capacity

Viewing the Analysis requires Sun Microsystems Java Virtual Machine 1.3 or higher in order to properly load and run the applet. Once loaded, the Analysis applet is cached for fast viewing, and you can use it for analyzing test results in real-time or performing historical analysis for before-and-after regression comparisons.
Analyzing Virtual Users

Analyzing virtual user behavior is the first step in isolating potential performance bottlenecks within your Web applications and services. The OpenLoad Analysis provides a Virtual User Report with detailed information on the activities of each virtual user, including the ability to view the actual responses from your application in real-time.

Figure 2 - Virtual User Report

The default Virtual User Reporting options include:

- VU Watch - select the check box for one or more users to view HTML or XML responses in real-time. A new browser will be launched for each selected user. For performance reasons, the Analysis Engine only begins collecting a user's response data for viewing once the user is selected; otherwise there is no response data available to view. As a result, you can only view responses for pages that had not yet been requested at the time the user was selected, and only while the test has a status of running.
- VU ID - displays the unique identifier for each virtual user to help you better track individual activity. Hover your mouse over the VU ID to see the current status of the test run (e.g. running or completed).
- Sessions - displays the number of completed user sessions. A session is defined as the completion of a user scenario iteration. Hover your mouse over the session count to see the name of the user scenario the virtual user is running. Use this value to determine the amount of progress the user is making in completing a test run.
- Page Name - displays the name of the page the virtual user is currently viewing (i.e. requesting), or a status of "thinking" if the user is idle (usually due to a ramp-up delay). Hover your mouse over the page name to see the URL of the requested page. Use this option to determine the amount of progress the user is making in completing a session (i.e. user scenario).
- Max Time - displays the maximum page view time for the virtual user. Hover your mouse over the max time value to see recommendations for areas that may require further investigation. Use this option to determine the slowest users for the test run.
- Last Error - displays the last error the virtual user encountered. Hover your mouse over the last error to see the complete text of the error message. Use this option to determine which users encountered problems during the test run.

Note: There are also two other options for this report (Test Run and User Scenario) that are only displayed when comparing results for two or more test runs. Hovering your mouse over the Test Run value will always display the start time and current status of the test.

Figure 3 - Virtual User Watch
You can sort the records within the Virtual User Report by selecting the column header for the option you wish to sort by. By default, records are sorted in descending order. For example, selecting the Max Time column sorts the virtual users from slowest to fastest maximum page view time, showing the users that experienced the poorest performance first. To sort records in ascending order, simply hold the SHIFT key down while selecting the column header for the option you wish to sort by. For example, selecting the Max Time column while holding the SHIFT key sorts the virtual users from fastest to slowest maximum page view time, showing the users that experienced the best performance first.

As a best practice in isolating bottlenecks, it is generally a good idea to sort virtual user records by Max Time and Last Error (see Figure 4) so that you know immediately which users are having problems (a brief sketch of this sort order appears at the end of this section). OpenLoad starts virtual users in the order of their VU ID, so users with a higher ID start later in the test (how much later depends on your ramping options), when there are already other users in progress, making the errors they experience more likely to be symptoms of server load conditions.

Figure 4 - Virtual User Error Report

It is also generally good practice to watch one or two users during the test to ensure the system is coming back with the expected data, although the Verify Success and Verify Error features located in the Recorder are the most effective means of ensuring the correct response content for all virtual users. For example, in Figure 4 we see that VU ID 2 encountered a content error even though we did not elect to watch that particular user. Notice that error conditions are automatically highlighted in red to quickly draw our attention to potential problem areas within our application.

When using the Virtual User Watch feature (see Figure 3), you have the option of allowing the Analysis module to attempt to render the HTML or XML content downloaded by the virtual user, or to display the source of the response instead, by going to View->VU Watch in the Analysis menu tool bar and selecting or deselecting the Render Content option. This option is available because it is not always possible for the Analysis module to properly render every response in the browser. The Analysis module attempts to redisplay the content as it is received from the server, and therefore image requests with relative paths, or paths that are dynamically generated via JavaScript or protected by site security, may display as broken since they are not being requested directly from the server, but rather through OpenLoad.
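The columns described above map naturally onto a simple per-user record. The following sketch is purely illustrative: the class, field names and sample values are invented for this paper and are not part of OpenLoad or its API. It models a handful of virtual-user rows and applies the default descending Max Time sort so that the slowest, most suspect users surface first.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical row mirroring the Virtual User Report columns; not an OpenLoad API.
class VirtualUserRow {
    final int vuId;            // VU ID - unique identifier for the virtual user
    final int sessions;        // Sessions - completed user scenario iterations
    final String pageName;     // Page Name - page currently being requested, or "thinking"
    final long maxTimeMillis;  // Max Time - slowest page view time observed for this user
    final String lastError;    // Last Error - last error encountered, or null if none

    VirtualUserRow(int vuId, int sessions, String pageName, long maxTimeMillis, String lastError) {
        this.vuId = vuId;
        this.sessions = sessions;
        this.pageName = pageName;
        this.maxTimeMillis = maxTimeMillis;
        this.lastError = lastError;
    }
}

public class VirtualUserReportSketch {
    public static void main(String[] args) {
        List<VirtualUserRow> rows = new ArrayList<VirtualUserRow>();
        rows.add(new VirtualUserRow(1, 12, "home", 2300, null));
        rows.add(new VirtualUserRow(2, 10, "login", 9100, "Content error: expected string not found"));
        rows.add(new VirtualUserRow(3, 11, "buy", 4700, null));

        // Default sort: descending by Max Time, so the slowest users appear first.
        Collections.sort(rows, new Comparator<VirtualUserRow>() {
            public int compare(VirtualUserRow a, VirtualUserRow b) {
                return Long.compare(b.maxTimeMillis, a.maxTimeMillis);
            }
        });

        for (VirtualUserRow r : rows) {
            System.out.println("VU " + r.vuId + "  sessions=" + r.sessions
                    + "  page=" + r.pageName + "  maxTime=" + r.maxTimeMillis + "ms"
                    + (r.lastError != null ? "  lastError=" + r.lastError : ""));
        }
    }
}

In this invented sample, VU 2 rises to the top of the listing for both its slow login page and its content error, which is exactly the kind of row the report draws your eye to by highlighting it in red.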
Performance Benchmarking

To investigate performance changes across different revisions of your web application, you can use a technique known as performance benchmarking. The OpenLoad Analysis module allows you to benchmark performance by comparing results from different applications or multiple versions of the same application. Using a benchmark helps you see how changes made in subsequent versions of the code have affected the performance of your application. Although you can visually compare two or more result sets with any of the Analysis module reports, the Summary Report is specifically designed for benchmark comparisons. The Summary Report allows you to quickly identify whether your application's performance, availability and reliability are meeting the criteria you defined in the Recorder and Controller.

Figure 5 - Summary Report

The Summary Report options include:

- Test Run Start Date - displays the day and time the test run started.
- Test Run Status - displays the current status of the test run (e.g. pending, running, completed or stopped).
- % Performance - displays the percentage of page view times that met your Max Page Timeout criteria. Hover your mouse over this value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine how well your site is performing.
- % Availability - displays the percentage of times users connected to your application without error. Hover your mouse over this value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine how available your site is under load.
- % Reliability - displays the percentage of times users received responses without error. Hover your mouse over this value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine how reliable your site is under load.
- % Page Time <= 4s - displays the percentage of page views that required four (4) seconds or less to download. Hover your mouse over this value if it is highlighted in red or green to see recommendations for areas that may require further investigation. Use this option to determine how well your site's page performance stacks up against an industry benchmark for content sites.
- 4s < % Page Time <= 8s - displays the percentage of page views that required more than four (4) seconds, but no more than eight (8) seconds, to download. Hover your mouse over this value if it is highlighted in red or green to see recommendations for areas that may require further investigation. Use this option to determine how well your site's page performance stacks up against an industry benchmark for e-business sites.
- 8s < % Page Time <= 12s - displays the percentage of page views that required more than eight (8) seconds, but no more than twelve (12) seconds, to download. Hover your mouse over this value if it is highlighted in red or green to see recommendations for areas that may require further investigation. Use this option to determine how well your site's page performance stacks up against an industry benchmark for corporate sites.
- % Page Time > 12s - displays the percentage of page views that required more than twelve (12) seconds to download. Hover your mouse over this value if it is highlighted in red or green to see recommendations for areas that may require further investigation. Use this option to determine how well your site's page performance stacks up against an industry benchmark for any site.
- Total Virtual Users - displays the number of concurrent users. This value may or may not equal the number of running users. For example, if you are running a 500-user test, then depending on think time, ramping and page timeout settings, it is possible that not all 500 users will be making a request or receiving a response at the same moment in time (i.e. concurrently).
- Total Sessions - displays the total number of completed sessions. A session is defined as the completion of a user scenario iteration.
- Total Page Views - displays the total number of completed page views.
- Total Requests - displays the total number of completed requests.
- Total Responses - displays the total number of completed responses. Since it is possible to have more than one response returned for a request (e.g. an HTTP redirect), the number of responses should always be at least equal to the number of requests. Hovering your mouse over this value warns you if this is not the case. In addition, the request and response values will be highlighted in green as a warning.
- Hits per Second - displays the number of completed HTTP requests per second.
- Returns per Second - displays the number of completed HTTP responses per second. For your system to be scalable, the number of hits per second should be equal to, or close to, the number of returns per second. Hovering your mouse over this value warns you if this is not the case. In addition, the request and response values will be highlighted in green as a warning.
- % 400 Level Errors - displays the percentage of 400-level HTTP errors, including Not Found, Bad Request, Unauthorized, Forbidden, Method Not Allowed, Proxy Authentication Required, Request Timeout, etc. Any increase in this value will impact the % Reliability metric as well.
- % 500 Level Errors - displays the percentage of 500-level HTTP errors, including Internal Server Error, Not Implemented, Bad Gateway, Service Unavailable, Gateway Timeout and HTTP Version Not Supported. Any increase in this value will impact the % Reliability metric as well.
- % Application Specific Errors - displays the percentage of "Verify Success" mismatches or "Verify Error" matches. Any increase in this value will impact the % Reliability metric as well.
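The exact formulas behind these summary metrics are internal to the Analysis Engine, but the following sketch gives a rough sense of how percentages and rates of this kind can be derived from raw counts. All class names, variable names and sample numbers are invented for illustration; this is not OpenLoad code.

// A rough, hypothetical illustration of deriving summary-style percentages
// from raw counts; not OpenLoad's internal formulas or data model.
public class SummaryMetricsSketch {
    public static void main(String[] args) {
        // Invented sample data for a 10-minute (600 second) test run.
        long[] pageTimesMillis = {1200, 3400, 4100, 7600, 9800, 12500, 2100, 3900, 8400, 650};
        int connectionAttempts = 5000, failedConnections = 25;   // feeds % Availability
        int totalRequests = 4975, totalResponses = 4990;         // redirects can add responses
        int errorResponses = 40;                                 // 4xx/5xx plus content check failures
        int testDurationSeconds = 600;

        // Availability: percentage of connection attempts that succeeded.
        double availability = 100.0 * (connectionAttempts - failedConnections) / connectionAttempts;
        // Reliability: percentage of responses received without error.
        double reliability = 100.0 * (totalResponses - errorResponses) / totalResponses;
        // Throughput: completed requests (hits) and responses (returns) per second.
        double hitsPerSecond = (double) totalRequests / testDurationSeconds;
        double returnsPerSecond = (double) totalResponses / testDurationSeconds;

        // Page Time Distribution: bucket page view times into the 4s/8s/12s bands.
        int band4 = 0, band8 = 0, band12 = 0, bandOver12 = 0;
        for (long t : pageTimesMillis) {
            if (t <= 4000) band4++;
            else if (t <= 8000) band8++;
            else if (t <= 12000) band12++;
            else bandOver12++;
        }
        int views = pageTimesMillis.length;

        System.out.printf("%% Availability: %.2f%n", availability);
        System.out.printf("%% Reliability:  %.2f%n", reliability);
        System.out.printf("Hits/s: %.2f  Returns/s: %.2f%n", hitsPerSecond, returnsPerSecond);
        System.out.printf("Page time <=4s: %.1f%%  4-8s: %.1f%%  8-12s: %.1f%%  >12s: %.1f%%%n",
                100.0 * band4 / views, 100.0 * band8 / views,
                100.0 * band12 / views, 100.0 * bandOver12 / views);
    }
}

Comparing values like these for two test runs side by side is, in spirit, what the Summary Report comparison described next automates for you.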
Figure 6 - Summary Report Comparison

You can easily compare different result sets by selecting the appropriate test runs from the Analysis menu tree (see Figure 6). When conducting performance benchmarks, there are three key areas to focus on: Business Requirements, Page Time Distribution and Test Run Statistics. The Business Requirements category allows you to quickly determine whether new code or infrastructure changes make your application's performance, availability or reliability better, worse or the same. The Page Time Distribution category provides details on how changes to your application impact the end user's perception of site performance. For example, as you make performance enhancements to your application, you may start to notice the percentage of page times in the upper bands (8-12s) gradually shift to the lower bands (0-4s and 4-8s), which gives you a more detailed understanding of how much performance is actually improving from the end user's perspective. Lastly, Test Run Statistics enable you to accurately benchmark the responsiveness of your application. For example, if you made a change that resulted in an overall decrease in page times, you would expect the number of completed sessions, page views, requests and responses to be higher for a timed test. So for an e-commerce site, an increase in performance could mean you are able to process X% more orders within the same time period.

Figure 7 - Detailed Report

Isolating the Bottleneck

A critical step in the optimization process is understanding which pages are prone to errors or poor performance. The OpenLoad Analysis module provides a Detailed Report to help you isolate these types of bottlenecks. By looking at this report, you can quickly identify which pages have problems and what those problems are. The example in Figure 7 shows that the login and buy pages for the Trader and Broker scenarios have page times that exceeded our thresholds, and that the login page for the Trader scenario in particular encountered an error, which is displayed when the mouse hovers over it. The optimization process should be focused on those pages first.

The default Detailed Reporting options include:

- User Scenario - displays the name of the user scenario.
- Page Name - displays the name of the page associated with the user scenario. Hover your mouse over the page name to see the URL of the request associated with this page.
- Avg Time - displays the average page view time. Hover your mouse over the average time value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine which are the slowest pages for the test run.
- Max Time - displays the maximum page view time. Hover your mouse over the max time value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine which are the slowest pages for the test run.
- % Timeouts - displays the percentage of page timeouts. Hover your mouse over the timeout value if it is highlighted in red to see recommendations for areas that may require further investigation. Use this option to determine which are the poorest performing pages for the test run.
- % Errors - displays the percentage of page errors. Hover your mouse over this value to see the complete message for one or more errors encountered on the page. Use this option to determine which pages are prone to error.

Note: There is also another option for this report (Test Run) that is only displayed when comparing results for two or more test runs. Hovering your mouse over the Test Run value will always display the start time and current status of the test.
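To make the mechanics behind these columns concrete, the sketch below aggregates a few individual page view samples into per-page statistics (average time, maximum time, percentage of timeouts and percentage of errors). The data model and sample values are hypothetical and chosen only to echo the Trader/Broker example above; this is not OpenLoad's internal implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of aggregating per-page statistics similar to the
// Detailed Report (Avg Time, Max Time, % Timeouts, % Errors); not OpenLoad code.
public class DetailedReportSketch {

    static class PageView {
        final String scenario, page;
        final long timeMillis;
        final boolean timedOut, error;
        PageView(String scenario, String page, long timeMillis, boolean timedOut, boolean error) {
            this.scenario = scenario; this.page = page;
            this.timeMillis = timeMillis; this.timedOut = timedOut; this.error = error;
        }
    }

    static class PageStats {
        long total, max; int count, timeouts, errors;
        void add(PageView v) {
            total += v.timeMillis; max = Math.max(max, v.timeMillis); count++;
            if (v.timedOut) timeouts++;
            if (v.error) errors++;
        }
    }

    public static void main(String[] args) {
        List<PageView> views = new ArrayList<PageView>();
        views.add(new PageView("Trader", "login", 9200, true, true));
        views.add(new PageView("Trader", "login", 4100, false, false));
        views.add(new PageView("Trader", "buy",   8800, true, false));
        views.add(new PageView("Broker", "buy",   7600, false, false));

        // Group the raw samples by scenario/page and accumulate the statistics.
        Map<String, PageStats> byPage = new HashMap<String, PageStats>();
        for (PageView v : views) {
            String key = v.scenario + " / " + v.page;
            PageStats s = byPage.get(key);
            if (s == null) { s = new PageStats(); byPage.put(key, s); }
            s.add(v);
        }

        for (Map.Entry<String, PageStats> e : byPage.entrySet()) {
            PageStats s = e.getValue();
            System.out.printf("%-16s avg=%.0fms max=%dms timeouts=%.1f%% errors=%.1f%%%n",
                    e.getKey(), (double) s.total / s.count, s.max,
                    100.0 * s.timeouts / s.count, 100.0 * s.errors / s.count);
        }
    }
}

Pages whose rows show high maximum times or non-zero timeout and error percentages are the ones the report highlights in red, and the ones to optimize first.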
Drill-down Analysis

The next step is to continue the drill-down and start to differentiate between the various types of issues that may impact site performance, availability, reliability, scalability or capacity. In this phase you would like to learn what trends within your application can cause response times to exceed expectations or response content check points to fail. For example, long response times could be due to a slow or saturated CPU, or to exhausted memory. Correlating end-user performance metrics with server-side metrics may enable you to investigate those assumptions (a brief sketch of this approach appears later in this section).

Figure 8 - Performance Report

The Graph Report enables you to identify and focus on a specific area within your application (performance, availability, reliability, scalability or capacity) that does not meet expectations. Within each of these sub-report types there are two views (or perspectives): virtual user and server. The former allows you to explore various metrics related to the user's experience, while the latter provides the means to graph metrics related to the server's performance. Once you have identified a specific report you want to focus on, you can drill down to take a closer look at user or server behavior over time or user load (see Figure 8).
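As a rough illustration of the correlation idea mentioned above, the sketch below computes a Pearson correlation coefficient between a user-side metric (average page time per sampling interval) and a server-side metric (CPU utilization over the same intervals). The sample values are invented and the code is not part of OpenLoad; a coefficient close to +1 would support the hypothesis that response times degrade as the CPU saturates, whereas a weak correlation would suggest looking elsewhere, such as memory or the database tier.

// Generic sketch: correlate a user-side metric (page time) with a server-side
// metric (CPU utilization) sampled over the same intervals. Sample data is
// invented; this illustrates the idea and is not part of OpenLoad.
public class MetricCorrelationSketch {

    // Pearson correlation coefficient of two equal-length series.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i]; sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i]; sumY2 += y[i] * y[i];
        }
        double num = n * sumXY - sumX * sumY;
        double den = Math.sqrt(n * sumX2 - sumX * sumX) * Math.sqrt(n * sumY2 - sumY * sumY);
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        // Average page time (seconds) and server CPU utilization (%) per sampling interval.
        double[] pageTimeSeconds = {1.2, 1.9, 3.4, 5.8, 8.1, 9.7};
        double[] cpuUtilization  = {22,  35,  58,  79,  92,  96};

        double r = pearson(pageTimeSeconds, cpuUtilization);
        System.out.printf("Correlation between page time and CPU utilization: %.2f%n", r);
        // A value close to +1 suggests response times degrade as the CPU saturates,
        // pointing toward a processor bottleneck rather than, say, exhausted memory.
    }
}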
The Graph Report is comprised of five (5) primary sub-reports:

- Performance Report - provides user and server metrics related to the performance (i.e. timing) of your application, such as session time, page time, round-trip time, CPU time, etc. (see Figure 8).
- Availability Report - provides user and server metrics related to the availability (i.e. connectivity) of your application, such as dropped, refused and timed-out connection problems (see Figure 11).
- Reliability Report - provides user and server metrics related to the reliability (i.e. integrity) of your application, such as malformed, dropped, not found, error and timed-out responses (see Figure 12).
- Scalability Report - provides user and server metrics related to the scalability of your application, such as the number of HTTP request and response operations completed per second (see Figure 13).
- Capacity Report - provides user and server metrics related to the capacity of your application, such as the amount of bandwidth consumed over time or user load (see Figure 14).

Figure 9 - Export Reports

To learn the meaning of a particular metric, simply hover your mouse over its label in the tree menu. You can also get more specifics on the actual data values that are plotted by moving your mouse over a data point (see Figure 11). In addition, you have the ability to zoom in and out by selecting the desired area within the graph, and you can hold down your right mouse button to display the graph menu options. All graphs and their respective raw data values can be exported by going to Report->Export in the Analysis menu tool bar and selecting either the PDF or Spreadsheet option.
Although any report type may be exported, including the Virtual User, Summary and Detailed Reports, the ability to export raw data values for a graph is particularly useful for feeding information into third-party reporting packages or providing PDF views of performance data to other stakeholders.

As you begin analyzing the various performance trends of your application, having the ability to focus in on a particular user scenario, page or request can be very useful. To facilitate this, the Analysis module provides filtering options, which you can enable simply by deselecting the items you wish to filter out of your data set (see Figure 10).

Figure 10 - Filtering Report Options
Figure 11 - Availability Report

Figure 12 - Reliability Report

Figure 13 - Scalability Report

Figure 14 - Capacity Report
Conclusion

OpenLoad is an enterprise web performance testing solution designed specifically for improving the performance of J2EE and .NET applications. The OpenLoad Analysis module helps programmers gain insight into back-end web infrastructure hot spots and bottlenecks as they develop and performance test their distributed applications.

OpenDemand and OpenLoad are trademarks or registered trademarks of OpenDemand Systems, Inc. *Other names and brands may be claimed as the property of others.

Copyright 2000-2004, OpenDemand Corporation. All Rights Reserved.