Developing a Load Testing Strategy

Michele Ruel, St.George Bank
CMGA 2005

Overview
What is load testing?
  Scalability Test
  Sustainability/Soak Test
  Comparison Test
  Worst Case
  Spike Test/Break Point
  Back-end Slowdown Test
  Performance Regression/Benchmark Test
Why do we need Load Testing?
How do we start Load Testing?
  Where does it fit in the organisation?
  Who runs the tests?
  Who decides if the test is successful?
  Who gathers information?
  When do you test?
  Software
  Hardware
Anatomy of a Load Test
  Planning
  Reporting
  Testing Archive
And how's it going at St.George Bank?
  Where we are in designing
  And how did your first big test go?
  But we do already have some lessons we've learned
Conclusion

Overview

The purpose of this paper is to describe how St.George Bank is creating a load test strategy to fit into its corporate framework. It will define load testing, suggest when to do load testing, highlight the considerations around load testing, and describe our experience during our first big load test.

What is load testing?

Load testing is a method of applying a (usually synthetic) transaction load to an application system to test various performance and stability aspects of code and architecture. Here is a partial list of types of tests:

Scalability Test
These ramp-up tests determine the extent to which the application scales before reaching a choking point. This test will show us the bottleneck or limiting factor to performance. This could be CPU, memory or other system resources of the server, a front-end or back-end throughput limitation, or application limitations (e.g. queue size, connection maximum, etc.).

Sustainability/Soak Test
This is a long-running test under low, medium and high load that determines whether the application exhibits problems over an extended period of time (12-48 hours). It will tell us if there are any problems like memory leaks or other resource usage problems.

Comparison Test
This test establishes the impact or overhead of the new version of the application over the existing version. It is done only if there is an existing version of the application. However, we have to make sure that we are comparing apples to apples, as a newer version could have a different implementation from the old one and the behaviour of the application could differ.

Worst Case
This tests the behaviour of the application under load in a worst case scenario, with all of the parameters at their maximum values. All other tests should be done with a vanilla scenario. This test could bring to light problems with the application that may not have been caught otherwise. This is basically boundary condition testing.

Spike Test/Break Point
This test shows the impact of a spiking load on the application's behaviour and performance. It determines whether a sudden increase in load causes any functional or performance problems.

Back-end Slowdown Test
This test determines the impact of a back-end slowdown on application performance. The back-end server is artificially limited. These can be short ramp-up tests or long-running tests. This test could bring out code defects that would not show up under normal conditions.

Performance Regression/Benchmark Test
Critical scenarios of the application should be run as a performance regression mix to see the impact of a release on overall performance and capacity. This test will tell us the overhead of the new release over the old one. It could also be run when doing optimisation or tuning.
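Several of the test types above differ mainly in the shape of the load applied over time. The following is a minimal sketch of three such shapes (ramp-up, soak and spike), expressed as the number of virtual users active at a given time. The function names, parameters and numbers are illustrative assumptions, not part of any particular load testing product.

```python
def ramp_up_users(t, duration, peak_users):
    """Scalability test: users grow linearly from 0 to peak_users over
    `duration` seconds, then hold at the peak."""
    fraction = min(max(t / duration, 0.0), 1.0)
    return round(peak_users * fraction)


def soak_users(t, duration, steady_users):
    """Soak test: a constant load held for the whole (long) run."""
    return steady_users if 0 <= t <= duration else 0


def spike_users(t, base_users, spike_users_count, spike_start, spike_end):
    """Spike test: a base load with a sudden jump between spike_start
    and spike_end (in seconds)."""
    return spike_users_count if spike_start <= t < spike_end else base_users


if __name__ == "__main__":
    # Print the three profiles side by side for a 10-minute run.
    for t in range(0, 601, 120):
        print(t,
              ramp_up_users(t, 600, 200),
              soak_users(t, 600, 50),
              spike_users(t, 50, 400, 240, 300))
```

A scripted test would sample one of these functions at intervals and start or stop virtual users to match the returned value.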

Why do we need Load Testing?

Load testing done well will greatly improve the end product of your applications. If it is incorporated into the development framework, users will see a marked improvement in the systems that they work on every day. Those errors and system disasters that are always feared as an application moves to production will no longer show up, because production loads and hardware will have been proven.

Often organisations come to load testing after a very painful implementation of their own, or after witnessing some other organisation's very painful implementation. Incorporating load testing is a lot of work to undertake to be sure of your code and environment, but consider the time it takes to constantly fix your customers' application and explain to them why it is so slow or unstable. The next section has a list of times when this type of automated testing would be useful.

How do we start Load Testing?

Where does it fit in the organisation?

Here we are talking about management of load testing activities. This could be the most important decision that you make. It will greatly influence the success of the testing. Think about the structure of your organisation and the politics that are involved.

Most types of testing take place within the development area of the IT hierarchy, but most studies of this placement point to problems with conflicting goals. The project's success is measured by the "on time, under budget" rule. This goal usually limits the amount of testing allowed and skews the results of any testing done towards the positive. When the application then goes to production, all the load- and stress-related problems are solved there, with an attendant bad experience for the users.

The alternative is to place this function in an area that is separated from development management, for instance in production operations, the testing organisation, or capacity planning. If this is done, testing can be more objective and, if major performance or stability problems are encountered, a delay to production implementation can be debated, rather than the implementation being allowed to proceed simply to meet budget or dates.

Who runs the tests?

Development
Development could be a good choice because they have the best access to the application architecture and can easily create the scripts that run the load tests. This way, load testing procedures can be developed as the application is developed and used for unit and system testing.

The testing organisation
This would seem to be the perfect area for load testing management, unless they report to development. They will understand the importance of different types of tests and will have a methodology to structure and track their testing.

Who decides if the test is successful?

This is probably a committee decision. There should be a group composed of interested parties: project management, production support, capacity planning, and the testing organisation. This will be discussed later under exit criteria.

Who gathers information?

Once again, you must look to your organisation chart for the answer to this question. The data needed are business usage metrics, which must then be translated to system load. Places in the typical IT organisation where people are used to dealing with these questions are capacity planning and performance analysis. Your production support organisation may also have the business contacts and the experience to translate these numbers to system load.

When do you test?

During development
We have developed a Risk Matrix that helps us to estimate the time needed for load testing and the types of load tests to run. If we know a particular bit of the system is prone to cause performance problems, or we are doing something like running an unpredictable query on our mainframe, we will plan on using these synthetic transactions to drive development tests.

Before implementation
This is the traditional time to do load testing. Our plan is to do several iterations and tests in a preliminary phase to shake out the scripts and determine code readiness for the final test. The next step would be to do a final load test at the end of customer acceptance testing. This test ends in a go/no-go decision on implementation. The passing (exit) criteria are set up at the beginning of load test planning, and a group of concerned people from different areas will make this decision, so that development's bias for implementing and fixing in production will not be a factor.

Before peak periods
After the development testing requirements are finished, you can then move on to planning load tests to run before each important system's peak times, for instance before the Christmas peak, to be sure that it will work smoothly during these times.

At specific intervals
For systems that are fairly stable, with not much development going on, it might be useful to just schedule a yearly test to make sure that they're not about to break under increased load or different usage. This is also useful in that the planning of the test will ensure at least yearly contact with the business to see if there are any changes in usage metrics planned.

Software

There are a number of products that are available to assist you in running a load test; it is outside the charter of this paper to discuss specific software. Suffice to say that these products are now sophisticated enough that they can not only control the injection of load into the system, but also collect system metrics during the run and collate the response times of each of the transactions. The software goes through a learning phase in which screen movements and choices are captured. This is then translated into a script that should be modified to vary input and transaction choices.

If the software you choose doesn't have built-in metric measurement, then you must consider how you are going to monitor the test systems for load; perhaps your production organisation has tools that you can use in test. When you're choosing software, you must consider what types of systems you will be testing: web-based, mainframe-based, or other platforms. You will probably have to buy a certain number of licences for virtual users, so make sure you'll have enough to apply the peak load to the systems. Software varies widely in cost, so that will also be a big factor in your decision.
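One way to estimate how many virtual user licences are enough for the peak load is the interactive form of Little's Law: the number of concurrent users N equals the throughput X multiplied by the sum of response time R and think time Z, that is N = X * (R + Z). The sketch below applies it; the function names and the example figures are illustrative assumptions, not measurements from any real system.

```python
import math


def virtual_users_needed(peak_tps, response_time_s, think_time_s):
    """Interactive form of Little's Law: N = X * (R + Z).

    peak_tps        -- target throughput at peak, transactions per second
    response_time_s -- expected response time per transaction, seconds
    think_time_s    -- pause each scripted user takes between transactions
    """
    return math.ceil(peak_tps * (response_time_s + think_time_s))


def licence_shortfall(needed, owned):
    """How many more virtual user licences would have to be bought."""
    return max(0, needed - owned)


if __name__ == "__main__":
    # Hypothetical peak: 50 transactions/second, 2 s responses, 28 s think time.
    needed = virtual_users_needed(50, 2, 28)
    print("virtual users needed:", needed)
    print("shortfall with 1000 licences:", licence_shortfall(needed, 1000))
```

Shortening the scripted think time reduces the number of virtual users needed for the same throughput, at the cost of a less realistic number of concurrent sessions.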

Hardware

Hardware, ah hardware: this is the pivot on which all load testing turns. If your test system hardware is inadequate or unstable, you will not be able to trust your test numbers. There are two main test hardware configurations to consider:

- A system exactly like the production system, duplicating network and system hardware and software, using production-size databases
- A test system that can easily be extrapolated to the production size, for instance a complete system at half or a quarter the size of production

Keep in mind that the further away from the production configuration the system gets, the harder it will be to draw conclusions. For this you might consider capacity planning modelling software to compare the test to the production performance.

Be creative about where you get hardware from. Perhaps you have older hardware that you can borrow to set up the system. If you're implementing a new system on new hardware, ask to bring in the production hardware earlier and test on it. Just keep in mind that this hardware will have to be readied for production, so build that into your schedule.

Anatomy of a Load Test

Planning

Here are the minimal topics that you must think about before you run a load test. Each one is discussed in the following sections.

- Business metrics and translation to application load
- Software
- Hardware
- Test strategy:
  o What to test
  o What types of tests to run
- Entry criteria (what is needed before you start the formal testing)
- Exit criteria (how to know the load testing is over)
- Reporting
- Test archive

The hard part: business metrics

Usage Modelling is the estimation of an application's production load, in terms of user volumes and transaction rates. These numbers are required to ensure load test scenarios apply a realistic work mix, running at realistic transaction rates. In addition, Exit Criteria are specified at loads determined by Usage Modelling; this ensures that the response times we measure as a result can be compared against those specified in our Exit Criteria.
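The translation from user volumes to transaction rates described above can be sketched as simple arithmetic. The example below is a minimal illustration under assumed inputs: the number of users, the transactions each user performs per peak hour, and a work mix expressed as fractions that add up to 1. The transaction names and figures are hypothetical.

```python
def transaction_rates(users, tx_per_user_per_hour, work_mix):
    """Translate business usage into per-transaction rates (per second).

    users                -- number of active users at the peak
    tx_per_user_per_hour -- transactions each user performs in the peak hour
    work_mix             -- dict of transaction name -> share of total load
    """
    if abs(sum(work_mix.values()) - 1.0) > 1e-9:
        raise ValueError("work mix shares must add up to 1")
    total_per_second = users * tx_per_user_per_hour / 3600
    return {name: total_per_second * share for name, share in work_mix.items()}


if __name__ == "__main__":
    # Hypothetical figures: 1800 users, 10 transactions each in the peak hour.
    mix = {"logon": 0.1, "enquiry": 0.6, "update": 0.3}
    for name, rate in transaction_rates(1800, 10, mix).items():
        print(f"{name}: {rate:.2f} transactions/second")
```

The resulting per-transaction rates are the targets a load test scenario would be configured to reach at peak.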

Capacity Planning is the logical organisation to take ownership of this phase and responsibility for its delivery to both load testing and other appropriate customers. They are the people who will probably have the data from the production systems, and they will also probably have the contacts in the business organisation.

Who do you get the metrics from?

Well, the business, of course? No? The customers should have these numbers. They should have been part of the justification of the project, shouldn't they? The answer here is to get the best figures you can from the best people available. Start with the project sponsors. Find out how they justified the project. If this is a replacement application, then someone probably has these statistics somewhere. Maybe capacity planning has them, maybe production support has them, and maybe there are some key customers who know them. In the end, though, you may have to just make an educated guess. Remember to document the process and the outcomes. Give each metric a confidence value, so you can refine how you estimate in the future, and keep notes on where you got them.

How do you verify them?

Huh? Verify them? I had enough trouble getting them together. This has to be part of the fact-finding process; it's your chance to be a detective. Apply reasonability tests to the final figures. If it's an internal application, and the number of users is more than the number of employees, maybe there's something wrong! Really, dig a little further; ask another person or different questions; think about the relationship between the transaction numbers and evaluate whether they seem reasonable. And, most important of all, have your customers, development, and capacity planning sign off on them.

What are the minimal metrics needed?

- Seasonal peak date(s) and usage. This is usually a percentage over average use, and maybe there are several peaks, so make sure you have all of them. There may be peaks with different usage criteria.
- Daily peak time(s) and usage
- Number of users
- Batch job schedule and usage
- System interfaces. This is important from the user standpoint and also from the application designer standpoint. It will give you early warning of the extent of your test environment and the scope of your test.
- Transaction numbers for all the important transactions
- Response time expectation. It may be necessary to break this down by transaction type, like logon, and short, medium, and long transactions. (Since customers only experience response times as comparative, it might be useful to have the technical people come up with standard response times by type of application, and then those numbers can be the basis for that discussion.) If you have a current system, this is easier.
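The advice to record a source and a confidence value for each metric, and to apply reasonability tests, can be made routine with a small record and a few checks. The sketch below is one possible shape for this; the field names, the three-level confidence scale, the metric names and the specific checks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    source: str      # who or what supplied the figure
    confidence: str  # "high", "medium" or "low" (hypothetical scale)


def reasonability_problems(metrics, employee_count):
    """Return a list of warnings for figures that look implausible."""
    by_name = {m.name: m for m in metrics}
    problems = []

    users = by_name.get("internal_users")
    if users is not None and users.value > employee_count:
        problems.append("internal_users exceeds the number of employees")

    peak = by_name.get("peak_tx_per_hour")
    average = by_name.get("average_tx_per_hour")
    if peak is not None and average is not None and peak.value < average.value:
        problems.append("peak_tx_per_hour is lower than average_tx_per_hour")

    for m in metrics:
        if m.confidence == "low":
            problems.append(f"{m.name}: low confidence, verify with another source")
    return problems


if __name__ == "__main__":
    collected = [
        Metric("internal_users", 9000, "project sponsor", "medium"),
        Metric("average_tx_per_hour", 12000, "capacity planning", "high"),
        Metric("peak_tx_per_hour", 30000, "educated guess", "low"),
    ]
    for problem in reasonability_problems(collected, employee_count=7500):
        print(problem)
```

Keeping the source and confidence alongside each figure also gives the sign-off discussion something concrete to review.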

Software

Check to see that you have the correct types of user platforms for this project and enough virtual users for your peaks. If you don't have enough users, plan on purchasing more or devise a strategy to use the ones you have in some creative way.

Hardware

Plan the hardware you need; start early to be sure that it's available when you need it, and be adamant about the time you need to use it exclusively. Make sure that you have the technical people available to set it up and test it before you plan to do the formal test.

Test Planning

What to test
Each project is different. The place to start would be assessing the risks in the project. Make sure you test the really risky pieces of the application, which would be new architecture, new programs, or any new or complicated system interfaces. After that, look at the high-volume or resource-intensive transactions, and then everything else. Make sure to prioritise your tests. You probably won't have time for all of them.

What types of tests to do
Once again, this should be a product of your risk assessment. If you're concerned about memory leaks in a Windows web server environment, for instance, you would do soak tests. If this is a customer-facing web application, you'll want to do extensive break and peak load testing.

Entry Criteria

Entry Criteria are the conditions needed to start the formal testing. There is a standard set that you will probably use:

- Stable code
- Stable hardware
- Data availability and quality
- Completion of metric collection

This is an area where you have to be firm. You will probably be pushed to start without stable code and hardware, and you may ease into what other people think is formal load testing by helping with some functional testing. Make a point of reminding people that you are not in formal load testing if these criteria haven't been met. It would be a good idea to establish this with each test team early in the project; get signoff on these criteria from all interested parties: business, production, development, and capacity planning.

Exit Criteria

Formal load testing passes or fails depending on whether or not Exit (or Success) Criteria are satisfied. It is therefore crucial to seek agreement on them before commencing load testing.

Typical Exit Criteria specifications include:

- Acceptable response times: the slowest acceptable response times (expressed as 90th percentile figures) for a range of transactions at peak load
- Acceptable server resource limits: typically CPU limits at peak load

Exit Criteria specification requires:

- Completion of the Metrics exercise: a clear picture of Peak Load is required
- Stakeholder involvement, with participation from Business, Project, Production Support and load testing
- Refinement with successive project experience

Schedule

Load testing must be planned into the project schedule, so the project planners need a tool to gauge how long load testing will take. We have developed a risk matrix tool for this purpose. It takes into account various elements of risk that should be considered in planning a project schedule. It is still in its formative stages, and we expect there will be much more refinement as we do more load tests.
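Exit criteria of the kind listed above (90th percentile response times per transaction and a CPU limit at peak load) lend themselves to a mechanical check once a test run has produced its measurements. The following is a minimal sketch, assuming response times are available as lists of seconds per transaction type and that the nearest-rank definition of a percentile is acceptable; the transaction names, limits and sample data are hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_exit_criteria(response_times, p90_limits, peak_cpu_pct, cpu_limit_pct):
    """Return a list of failed criteria; an empty list means the run passed."""
    failures = []
    for tx, limit in p90_limits.items():
        p90 = percentile_nearest_rank(response_times[tx], 90)
        if p90 > limit:
            failures.append(f"{tx}: 90th percentile {p90:.1f}s exceeds {limit:.1f}s")
    if peak_cpu_pct > cpu_limit_pct:
        failures.append(f"CPU: {peak_cpu_pct}% exceeds limit of {cpu_limit_pct}%")
    return failures


# Hypothetical measurements (seconds) from a run at peak load.
SAMPLE_TIMES = {
    "logon": [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 1.7, 2.0, 2.4, 5.0],
    "enquiry": [0.5, 0.6, 0.7, 0.9, 1.0, 1.1, 1.2, 1.8, 2.5, 3.0],
}
SAMPLE_LIMITS = {"logon": 3.0, "enquiry": 2.0}

if __name__ == "__main__":
    for failure in check_exit_criteria(SAMPLE_TIMES, SAMPLE_LIMITS, 72, 80):
        print("FAILED", failure)
```

A check like this only reports against limits that were agreed in advance; the decision on whether the project has passed still rests with the exit committee.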

Reporting

Reporting should be done daily to assess progress. Reports should go to all members of the load test committee. At the end of the load test schedule, a full report should be sent to all of the concerned project participants detailing the tests done and their outcomes. If the exit committee decides that the project has failed testing, then detailed reasons for the failure must be given, as well as the tests which have to be repeated to pass load testing.

Testing Archive

Consideration should be given as to what data should be saved during the tests and beyond. If this information is saved, it can be used for regression testing and release load testing.

Scripts
While it will probably be necessary to regenerate the scripts for each new round of load testing, notes should be kept on adjustments and procedures used to accomplish various script items, such as how unique new records were generated and how problems were solved.

Reports
Reports and data from major tests should be saved so they can be compared with the next round of tests.

And how's it going at St.George Bank?

Where we are in designing

Our processes keep changing, so we're aiming at a moving target. We're also trying to keep up with organisation changes, so we remain flexible. We have created a risk matrix tool for the estimators and are working with our risk planners to incorporate all of our concerns into one tool.

And how did your first big test go?

We are still immersed in a large CRM implementation. Our load testing core workgroup consisted of:

- One experienced load test planner
- One or two experienced load test scripter(s)
- One capacity planner
- One or two vendor software experts
- Production support people for our platforms, the network, and web operations

One problem we had with personnel was that, because this project was so long, contract people kept changing and personal plans took people away from the team at crucial times.

We have been testing for three months, first on the production platform, and then moving to a staging platform. This also turned out to be quite a problem because we moved from the production platform to start the pilot. The staging platform had not been used before and took forever to configure because all of our experts were off fixing pilot problems.

But we do already have some lessons we've learned

- Business metrics: do them early, do them often and VERIFY them
- Try to ensure that load testing is separate from development
- Be tough about your entry and exit criteria
- Make sure technical experts are available to fix the hardware and application
- Check with your technical experts to see what they use as monitoring tools

Conclusion

Load testing can be of great value to an organisation in terms of guaranteed implementation success, stability and performance of an application. As might be expected for this type of outcome, the guarantee comes only from rigorous adherence to certain requirements: entry and exit criteria, and well-designed and executed tests.