Precise i 3
A Performance Management Methodology

Copyright 2003 Precise Software Solutions Ltd. All rights reserved.
Information in this document is subject to change without notice and does not represent a commitment on the part of Precise Software Solutions Ltd. The software described in this document is furnished under a license agreement or non-disclosure agreement. It is against the law to copy the software except as specifically allowed in the license or non-disclosure agreement. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Precise Software Solutions Ltd.

Copyright 2003 by Precise Software Solutions Ltd. ("Precise"). All rights reserved.

Precise/Interpoint is a registered trademark, and the Precise logo, Precise/Indepth, Precise i 3 and Performance Warehouse are trademarks of Precise Software Solutions Ltd. PC/TCP is a registered trademark of FTP Software, Inc. Microsoft, Windows, Windows 95, Windows 98, Windows NT, Windows 2000, and SQL Server 2000 are either registered trademarks or trademarks of Microsoft Corporation. Adobe, Acrobat and Acrobat Reader are trademarks of Adobe Systems Incorporated. SAP R/3 is a registered trademark of SAP AG. All other trademarks used herein are the property of their respective owners.

Document No. IAAL620-M00
Table of Contents

Chapter 1: Introduction
  Overview
Part 1: Performance management
Chapter 2: Performance problems and their sources
  Performance red zones
  Design and deployment problems
  System change problems
  Environment change problems
  Usage problems
  Summary
Chapter 3: Managing performance
  Overview
  A management analogy
  Reactive management
  Proactive management
  Preventive management
  A hybrid approach to performance management
Chapter 4: The Precise methodology
  Overview
  The performance management methodology
  Precise i 3
  The stages
  Stage 1 Detect
  Stage 2 Find
  Stage 3 Focus
  Stage 4 Improve
  Stage 5 Verify
Part 2: Precise i 3 performance management methodology
Chapter 5: Detecting performance problems
  Overview
  Performance reviews with Precise i 3
  Top-n reports
  Trend reports
  Exception reports
  Proactive alerting with Precise i 3
  Status at a glance
  Status drill-down
  Preliminary investigation
  Alert customization
  Reactive management with Precise i 3
Chapter 6: Finding performance problems
  Overview
  Black Box Analysis
  Analysis perspectives
  Finding with Precise i 3
  Analyzing performance and load
  Examining application activities
  Examining infrastructure behavior
Chapter 7: Focusing on causes
  Overview
  Focusing with Precise i 3
  Pinpointing the root cause
  Focus within a database tier
  Precise/Indepth for Oracle
  Focus within an application tier
  Precise/Indepth for J2EE
  Focus within a middleware tier
  Insight/Tuxedo Savvy Module
  Looking elsewhere
Chapter 8: Improving performance
  Overview
  Improving with Precise i 3
  Precise/Indepth and Precise/Savvy
  Improving with Precise/Insight
Chapter 9: Verifying the solution
  Overview
  Verifying with Precise i 3
  Verifying root cause resolution
  Verify within a database tier
  Indepth/Oracle
  Verify within an application tier
  Indepth/J2EE
  Verify within a middleware tier
  Insight/Tuxedo Savvy module
  Verifying problem elimination
  Verify a solution to a preventive-type problem
  Verify a solution to a proactive-type problem
  Verify a solution to a reactive-type problem
Part 3: Methodology in action
Chapter 10: Case study
  Background
  Business environment
  Technical environment
  Detect
  Find
  Focus
  Improve
  Verify
Chapter 11: Case study
  Background
  Business environment
  Technical environment
  Detect
  Find
  Focus
  Improve
  Verify
Chapter 1: Introduction
Overview

Information systems are the main business process platform in practically every modern organization. These computer-driven systems are no longer nice-to-have management toys but the vehicles through which most organizational tasks are carried out. Information systems automate tasks from marketing and sales to manufacturing, bookkeeping, and customer support, and they are the primary medium of communication within and between organizations. In many cases, the World Wide Web and e-commerce have created businesses that are founded entirely on globally accessible information systems. Organizations have acknowledged the increasing significance of information systems through organizational changes: appointing Chief Information Officers (CIOs) and allocating significant budgets to the development and implementation of information systems.

The concept of information system performance is not hard to understand: the performance of an information system may be defined as the time it takes to complete various tasks. Distinguishing between systems that perform well and systems that perform poorly is more complicated. A system has to complete its tasks in times that meet the organization's goals; systems that slow business processes are clearly not performing well. Given the resources invested in information systems, many organizations also expect faster business processes and higher profits as a return on this investment. In many cases, for a system to perform well, it must both use its IT infrastructure optimally and meet its performance goals.

Performance problems are now seen as major business problems, and organizations can put a price tag on performance inefficiencies. By slowing down business processes, poorly performing information systems can become an obstacle to achieving business goals. Slowed manufacturing, hold-ups in shipping, and delayed financial processing are all examples of how performance can affect overall business productivity and profitability.
Because many information systems, from Customer Relationship Management and help desks to online banking, affect the customer experience, poor performance often translates into poor customer service. Conversely, well-performing information systems improve service quality and give organizations a competitive edge. Many Internet-based organizations, for example, have found that the performance of their sites is a primary factor in customer loyalty.

Poor performance also reduces return on investment. Performance problems distract IT staff from their ongoing tasks, and consultants' fees and hardware upgrades add to the cost of poor performance. To address performance problems, we need ways to measure and assess performance, ways to improve it, and ways to prevent recurrences. Precise i 3 is a complete performance management system that provides a structured way to achieve these goals.
Part 1: Performance management
Chapter 2: Performance problems and their sources
Performance red zones

System performance can be measured directly, as the actual response times of various components. Where response times are hard to measure directly, other metrics that capture the efficiency of system components can serve as indicators of system performance.

In our discussions of performance problems, we use the term performance red zone to describe situations in which the system fails to meet its performance goals. Some organizations have formally documented their performance red zones, perhaps as part of a service level agreement. In other organizations, the red zone may not be formalized. The principle, however, is the same: as soon as performance enters the red zone, the organization incurs performance-related costs.
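To make the red-zone idea concrete, the following Python sketch treats a red zone as a single response-time goal and flags the measurements that breach it. The `RedZone` class, the 2-second goal, and the sample data are hypothetical illustrations; a real red zone may combine several metrics or be defined in a service level agreement.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp, response time in seconds)


@dataclass
class RedZone:
    """Hypothetical single-metric goal: response time must not exceed the limit."""
    max_response_time: float

    def is_breached(self, response_time: float) -> bool:
        # Performance enters the red zone once it fails to meet the goal.
        return response_time > self.max_response_time


def red_zone_samples(samples: List[Sample], zone: RedZone) -> List[Sample]:
    """Return the samples whose response time falls inside the red zone."""
    return [(t, rt) for t, rt in samples if zone.is_breached(rt)]


if __name__ == "__main__":
    goal = RedZone(max_response_time=2.0)  # assumed 2-second goal
    measured = [(0, 0.8), (1, 1.4), (2, 2.6), (3, 3.1), (4, 1.9)]
    print(red_zone_samples(measured, goal))  # [(2, 2.6), (3, 3.1)]
```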
Design and deployment problems

Many performance problems arise from issues that are not addressed during system development and deployment and that find their way into production. Because design and implementation processes are hardly ever perfect, it is not uncommon for a system to start its life in production inside the performance red zone. Many factors can lead to a shaky deployment, including design flaws, insufficient or inappropriate testing, and failure to use test results effectively to make improvements. Performance during the early days is crucial not only because of the direct business costs, but also because a bad deployment experience usually results in a loss of confidence and credibility that can take a long time to overcome.

From a performance management point of view, system deployment presents a dual challenge. First, the development process must incorporate procedures to identify and resolve as many potential performance problems as possible, as early as possible, before they reach production. Second, appropriate procedures are required to quickly identify and rectify the problems that remain after deployment.
System change problems

A second category of performance problems comprises those that result from changes introduced into a system already in production. Changes to information systems are made for several reasons and take different forms: some reflect altered business processes, while others enhance functionality or fix problems. These changes can be made to the hardware (for example, a server upgrade), the software (an operating system or database upgrade), or the application (new modules or fixes). Any of these changes can cause performance problems: a system that has been running stably outside the performance red zone can be driven into it by a change.

Although individual system changes are sometimes perceived as minor, their adverse effects on the overall system may be large. The assumption that a change is insignificant often leads to testing and quality assurance procedures that are inadequate and focus on small parts of the system. In reality, even isolated changes can have a much wider impact than expected.

Controlling system change is a double challenge for performance managers. During the design and implementation of a change, performance needs to be taken into account, and testing procedures should be used to identify as many performance issues as possible. Once the change has been implemented in production, a process must be in place to quickly identify and solve any remaining problems.
Environment change problems

The third source of performance problems is the system's environment. Environment changes include increases in system load and altered usage patterns. Unlike system changes, which are introduced in a controlled manner, environment changes are usually gradual, can occur at hard-to-predict times, and are almost impossible to control or test.

In some cases, environment changes have little or no effect on system performance. In others, gradual changes (such as an increase in system load) cause a degradation in performance that can eventually push it into the red zone. In extreme cases, even a small change in an environmental variable may be all it takes to cause a sudden and dramatic deterioration in system performance; the level at which this happens is often referred to as a tipping point.

In terms of performance management, we again face a double challenge. First, environment changes need to be monitored constantly and their effects on the system analyzed. Second, if an environment change adversely affects system performance, a process for identifying the root cause and resolving the problem has to be applied.
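One way to see why a small load increase can tip a system into the red zone is a simple queueing model. The Python sketch below assumes the textbook M/M/1 single-server queue, whose mean response time is R = S / (1 - U), where S is the service time per request and U is utilization (arrival rate times S). The model and the 100 ms service time are illustrative assumptions, not part of the Precise methodology; real systems rarely follow M/M/1 exactly, but the sharp rise near full utilization is typical.

```python
def mm1_response_time(service_time: float, arrival_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U), where U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        # Demand meets or exceeds capacity: the queue grows without bound.
        return float("inf")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.1  # assumed 100 ms of work per request
    for rate in (5.0, 8.0, 9.0, 9.5, 9.9):  # requests per second
        u = rate * service_time
        r = mm1_response_time(service_time, rate)
        print(f"{rate:4.1f} req/s  U={u:.2f}  R={r:.2f} s")
```

With a 100 ms service time, raising the load from 8 to 9 requests per second (a 12.5% increase) doubles the mean response time from 0.5 s to 1 s, and at 9.9 requests per second it reaches 10 s.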
Usage problems

In some cases, performance problems result from the organization's failure to align its IT resources with its business priorities. When this happens, business processes suffer because system resources are consumed by activities that are not mission-critical.

Businesses often use a shared IT infrastructure to achieve different business objectives. One network is used for browsing the Internet, downloading work-related material, and sending and receiving e-mail. The same web servers are used to display corporate news and investor-related information and to support e-business processes. These activities have different priorities, and it is important to make sure that those priorities are reflected in the way the common infrastructure is used. A resource such as a network or a database can be impeccably managed and administered, and yet the organization can still achieve suboptimal business results. Typically, this is because high-priority activities are compromised by uncontrolled or inappropriate use of system resources.

Usage-induced problems introduce a problem-solving area not encountered with design, system, or environment changes. Not only must there be a way to identify performance problems and associate them with the organization's usage patterns, there must also be a way to associate IT activities with business activities and to establish proper communication channels and a common language between IT and business personnel.
Summary

Performance can be measured with a set of metrics, and the goal is to keep those metrics out of a predetermined red zone. Performance problems, where they occur, fall into one of four problem areas (design and deployment, system change, environment change, and usage), each presenting its own challenges for effective resolution. Performance management comprises three core activities: reactive, proactive, and preventive. Precise i 3 provides the techniques and the tools necessary to carry out these activities using a structured, methodical, and holistic approach.
Chapter 3: Managing performance
Overview

We know that poor performance is costly to an organization and painful to IT staff. Performance in the unacceptable red zone hinders the organization's ability to achieve its goals, and in extreme cases, operations can come to a complete halt. Because performance problems can be urgent, IT staff need to address them without delay, which makes it harder to focus on long-term tasks and often affects their quality of life.

Performance management comprises activities aimed at minimizing, and ideally eliminating, periods in which system performance is in the red zone. These activities may respond to problems after they occur, or help predict and avoid them.
A management analogy

To better understand the distinction between the three components of performance management, consider the activities involved in maintaining your car.

Although strict maintenance and prompt reaction to problems should keep your car running smoothly, sudden failures can still occur. When they do, it is important to make sure the fault is properly fixed while minimizing the costs and time involved. A variety of steps can be taken, from using a licensed mechanic to insisting on original parts. In the case of a burnt-out light bulb or a flat tire, an immediate on-the-spot solution may be possible. All these actions are reactive in nature: a fault is acted upon after it occurs.

To avoid the risk and inconvenience of major faults, as well as the cost of major repairs, most cars can alert you to undesirable conditions through warning lamps and gauges. These indicators eliminate the need to repeatedly run extensive checks on your entire car by alerting you to specific problems as they develop. By having those problems fixed promptly, you can avoid risk and cut costs. The warning lamp mechanism supports a proactive approach to car maintenance.

Most people consider periodic servicing essential for avoiding unnecessary repair costs. The periodic service acts as a health check, verifying that no failures have gone unnoticed in any of the car's systems, and includes maintenance of the most crucial components. These activities are preventive in nature: we hope to significantly reduce the likelihood of problems occurring.
Reactive management

The complexity of information systems and the dynamic business environment make performance problems a part of everyday life. Reactive performance management comprises the mechanisms and processes aimed at solving such problems effectively when they occur. It is a structured, planned process that focuses on two main objectives:

- Maximize the alertness of IT staff, to minimize the time-to-detection of performance problems.
- Improve the skill and competence of IT staff, and equip them with the right tools, to minimize the time-to-resolution once a problem has been detected.

The graph in Figure 1 illustrates these points. The reactive process starts when performance enters the red zone at t0. The organization should minimize three intervals: the time the problem remains unnoticed (t0 to t1), the time it takes to locate the root cause of the problem (t1 to t2), and the time it takes to solve it (t2 to t3). By minimizing these three intervals, the organization minimizes the total time the system spends in the red zone.

Figure 1: Reactive performance management
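The intervals in Figure 1 are simple timestamp arithmetic. The following Python sketch records the four points t0 to t3 for a hypothetical incident and shows that the three intervals add up to the total red-zone time, t3 - t0. The `ReactiveIncident` class and the example times are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class ReactiveIncident:
    """Timestamps of one hypothetical incident, using the t0..t3 labels of Figure 1."""
    entered_red_zone: datetime  # t0: performance enters the red zone
    detected: datetime          # t1: the problem is noticed
    root_cause_found: datetime  # t2: the root cause is located
    resolved: datetime          # t3: the problem is solved

    def intervals(self) -> Dict[str, timedelta]:
        """The three intervals that reactive management tries to minimize."""
        return {
            "time_to_detection": self.detected - self.entered_red_zone,
            "time_to_root_cause": self.root_cause_found - self.detected,
            "time_to_fix": self.resolved - self.root_cause_found,
        }

    def red_zone_time(self) -> timedelta:
        """Total time in the red zone, t3 - t0 (equal to the sum of the three intervals)."""
        return self.resolved - self.entered_red_zone


if __name__ == "__main__":
    incident = ReactiveIncident(
        entered_red_zone=datetime(2003, 5, 12, 9, 0),
        detected=datetime(2003, 5, 12, 9, 40),
        root_cause_found=datetime(2003, 5, 12, 11, 10),
        resolved=datetime(2003, 5, 12, 12, 0),
    )
    for name, span in incident.intervals().items():
        print(name, span)
    print("red zone time", incident.red_zone_time())  # red zone time 3:00:00
```

Shortening any one interval shortens the total, which is why reactive management targets detection, diagnosis, and repair separately.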
Proactive management

Because reactive performance management activities are triggered by a performance problem, the system inherently spends some time in the performance red zone until the problem is identified and resolved. In many cases, however, close monitoring of the system can reveal that a problem is likely to occur. Being able to predict performance problems and promptly take precautions enables IT staff to start resolving the issue before the red zone is entered, which can reduce or even eliminate the system's red-zone time. Once the alerting threshold has been exceeded, as shown in Figure 2, the process to identify and solve the problem begins. Because the system has not yet reached a problematic stage at detection time, IT staff can make a more educated, less hasty choice of solution.

Figure 2: Proactive performance management
Establishing a proactive performance management mechanism involves three stages:

1. Determine the parameters that can be used to measure the system's performance or indicate potential performance problems.
2. Establish a threshold level for each parameter, set below the edge of the red zone. This stage is something of a balancing act: if thresholds are set too high, alerts arrive too late; if they are set too low, they can produce a flood of false alarms.
3. Decide the mechanics of alerting when a threshold has been exceeded, including who is alerted and through which channel.

It is equally important to have the processes and tools in place to deal with performance issues once an alert for a potential problem has been issued.
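The three stages above map naturally onto a small data structure. The following Python sketch is a hypothetical illustration, not the Inform/Alerts configuration model: each `AlertRule` names the watched metric (stage 1), holds a threshold that must sit below the red-zone edge (stage 2), and records who is notified and through which channel (stage 3). It assumes a metric for which higher values are worse, such as response time; the channel is any callable, here one that collects messages in a list.

```python
from dataclasses import dataclass
from typing import Callable, List

Notifier = Callable[[str, str], None]  # (recipient, message) -> None


@dataclass
class AlertRule:
    """Hypothetical proactive alert rule for a metric where higher values are worse."""
    metric: str             # stage 1: the parameter being watched
    red_zone_edge: float    # value at which performance enters the red zone
    threshold: float        # stage 2: alert level, set below the red-zone edge
    recipients: List[str]   # stage 3: who is alerted
    notify: Notifier        # stage 3: the channel used to alert them

    def __post_init__(self) -> None:
        if self.threshold >= self.red_zone_edge:
            raise ValueError("threshold must be set below the edge of the red zone")

    def evaluate(self, value: float) -> bool:
        """Notify every recipient if the value exceeds the threshold; return True if alerted."""
        if value <= self.threshold:
            return False
        message = f"{self.metric}={value} exceeded alert threshold {self.threshold}"
        for recipient in self.recipients:
            self.notify(recipient, message)
        return True


if __name__ == "__main__":
    outbox: List[tuple] = []
    rule = AlertRule(
        metric="avg_response_time_s",
        red_zone_edge=2.0,
        threshold=1.5,
        recipients=["dba-on-call"],
        notify=lambda who, msg: outbox.append((who, msg)),
    )
    rule.evaluate(1.2)  # below threshold: no alert
    rule.evaluate(1.7)  # above threshold, still outside the red zone: alert
    print(outbox)
```

Rejecting a threshold at or above the red-zone edge enforces the rule that an alert must fire before performance enters the red zone.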
Preventive management

The preventive side of performance management is the equivalent of having your car serviced periodically or seeing your dentist twice a year. Prevention should be part of your routine activities to maintain ongoing system health. The preventive approach focuses on minimizing the probability of performance problems occurring in the first place. Of the three categories of performance management, it is the most controlled.

The cost of problem resolution increases with its urgency: under pressure, the range of possible solutions is narrower, as is the ability to make educated, cost-effective choices. By taking a preventive approach and dedicating a proportion of IT resources to performance management on a regular basis, it is possible to significantly reduce the probability of performance approaching the system's red zone. This in turn minimizes the costs created by performance problems. As with car maintenance, preventive activities naturally focus on the system components that are most likely to suffer from problems or most crucial to the smooth operation of the system.

Preventive management means that system performance is monitored over time, as shown in Figure 3, and system health checks are carried out at regular intervals to assess performance and prioritize issues. Based on these priorities, IT management can dedicate a proportion of staff time to the most crucial matters. Periodic health checks can also identify trends in system performance, forecast future performance-related issues, and kick off longer-term initiatives to deal with them.
Figure 3: Preventive performance management
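As a sketch of how a periodic health check might turn a trend into a forecast, the following Python example fits a least-squares straight line to a series of periodic measurements and estimates how many periods remain before the trend crosses the red-zone edge. The linear-trend assumption, the function names, and the weekly response-time data are hypothetical; real performance trends are often non-linear, so such a forecast is only a prompt for closer review.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_red_zone(history: Sequence[float], red_zone_edge: float) -> Optional[float]:
    """Estimate how many periods after the latest sample a linear trend crosses the red zone.

    Returns None if the trend is flat or improving, and 0.0 if the fitted line
    is already past the edge.
    """
    if len(history) < 2:
        raise ValueError("need at least two measurements to fit a trend")
    xs = list(range(len(history)))  # period index: 0, 1, 2, ...
    slope, intercept = fit_line(xs, history)
    if slope <= 0:
        return None
    crossing = (red_zone_edge - intercept) / slope
    return max(0.0, crossing - xs[-1])


if __name__ == "__main__":
    weekly_avg_response_s = [1.0, 1.1, 1.2, 1.3, 1.4]  # hypothetical review data
    print(periods_until_red_zone(weekly_avg_response_s, red_zone_edge=2.0))
```

For weekly averages of 1.0, 1.1, 1.2, 1.3, and 1.4 seconds against a 2-second red zone, the fitted trend rises by 0.1 s per week and crosses the red zone about six weeks after the latest measurement.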
A hybrid approach to performance management

As with good car maintenance, performance management does not consist of a single type of activity. The three components of good management (reactive, proactive, and preventive) complement one another and should be combined to ensure that service levels are met effectively and efficiently. The three types of management activity feed one another, as shown in Figure 4.

Figure 4: Hybrid approach to performance management

The starting point of the process is the ability to solve a performance problem effectively. Based on the experience gained in reacting to problems, a set of criteria can be developed to support a proactive alerting mechanism.
The ability to predict problems can then be used to identify the main danger zones or sensitive components and to establish activities that monitor them closely. Such preventive measures will in turn cause the number of potential problems and alerts to decline. And the more effective the alerting mechanism becomes, the less often you will need to fall back on more costly reactive tools.

Once a comprehensive, hybrid performance management process has been established, you will see a decline over time in 'fire-fighting' situations. Most performance problems will either present themselves early enough for you to take appropriate precautions, or they will not occur in the first place as a result of preventive activities.
Chapter 4: The Precise methodology
Overview

The computer age has made organizations completely reliant on their information systems. These systems no longer merely support a business process; in many cases, they are the business process. System unavailability brings business to a halt, and poor IT performance translates into poor business performance and reduced profitability. To ensure that the investment in information systems yields the expected business return, organizations need to make sure their systems perform at their best. Performance management is defined as the task of minimizing the interference of performance problems with business operations.

Modern IT infrastructures, and the interdependencies between the components that make up business applications, have grown significantly more complex. In such an environment, intuition alone is unlikely to be sufficient to deal effectively with performance problems. To resolve performance problems efficiently, an organization requires expertise and experience from many departments. More importantly, it requires a structured, planned approach to all aspects of managing performance.

A performance management methodology is a structured way to achieve performance management goals effectively. A comprehensive methodology consists of two equally important components. The first is the conceptual process, which includes the tasks to perform and the measurements to make under different performance-related circumstances. The second is the toolkit needed to support this process.
The performance management methodology

Some organizations have already acknowledged the importance of performance as a discipline and have implemented a performance management methodology to address their most crucial issues. It is vital for all organizations to understand the importance of a holistic approach to coping with performance problems, and to recognize the inherent ineffectiveness of ad hoc fixes and problem-solving. All organizations can benefit from the discipline that Precise i 3 brings to the performance management process, and can take advantage of the toolkit it provides to support that process. The Precise performance management methodology enables you to use a proven process, together with proven supporting tools, to implement effective performance management within your organization.
Precise i 3

Precise i 3 is a comprehensive suite of performance management products. By using Precise i 3 to implement performance management, you ensure that your activities are consistent and well supported by your toolkit. The seamless integration between the various Precise i 3 products means that they can be used consistently throughout the whole process.

Precise i 3 consists of three components that together support the goals of the performance management methodology: Insight, Indepth, and Inform.

Precise/Insight is a high-level, system-wide analysis tool. It helps you analyze the activities of end users, the service level they experience, and the impact their activities have on the various components of your system.

Precise/Indepth is a product family that helps you perform a thorough investigation of the performance and inefficiencies of the various components in your system.

Precise/Inform consists of products that help you promptly identify performance problems, potential performance hazards, and components that will benefit from tuning.
The stages

The performance management methodology comprises five stages:

Detect: Identify the symptoms that could indicate a performance problem.
Find: Identify the source of the problem.
Focus: Discover the root cause of the problem.
Improve: Take the steps required to improve performance.
Verify: Make sure the steps taken have achieved the desired goal.

Together, these stages form a process that provides a systematic approach to finding and resolving all kinds of performance issues, both predictable and unforeseen.

Stage 1 Detect

The Detect stage of the methodology consists of the triggers or events that start a performance improvement process. To trigger the process, you need an indication that performance can or should be improved, and Detect is designed to alert you effectively to such situations. When you complete the Detect stage, you will be able to answer the following questions:

- What is the indication that there is a performance problem?
- Where in the system do I see the symptoms of the problem?

Detect triggers fall into three familiar categories: reactive, proactive, and preventive.
A reactive tuning process is triggered after a performance problem occurs. Although reacting to problems is not the preferred way to manage performance, in most environments it is almost impossible to avoid some problems, so it is important to have processes in place to deal with them. There are two major challenges in reacting to performance problems. First, it is important to validate that a real problem exists: since reports can be vague and subjective, you should verify that the evidence points to a genuine problem. Second, since performance reports typically relate to past performance, you need enough historical information to understand the situation in which the problem occurred, which means having the tools to analyze the circumstances surrounding the issue at the time it started. The Precise i 3 Performance Warehouse can supply the historical data necessary to carry out these reactive management tasks.

The trigger for a proactive performance management activity is either a performance problem identified early in its development or circumstances indicating that a performance problem might occur. To implement the proactive component of the methodology, you need to be alerted to such situations by a mechanism that monitors a set of performance-indicating metrics. Once such a metric exceeds a predefined threshold or exhibits abnormal behavior, an alert is issued to draw your attention to the problem symptoms.

Working with a proactive alerting mechanism involves fine-tuning. Although timely notification is important, an unrefined mechanism can issue a torrent of false or misleading alarms, making the real problems hard to isolate. Setting alert thresholds correctly, and regularly verifying that the alerting mechanism remains well tuned, is therefore critical to the effectiveness of proactive performance management activities.
Within the Precise methodology, Precise/Inform for Alerts (hereafter Inform/Alerts) provides the alerting mechanism that supports proactive performance management.
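One common way to reduce false alarms, offered here as a general illustration rather than a description of how Inform/Alerts works, is to raise an alert only when a metric stays above its threshold for several consecutive samples. The Python sketch below assumes a higher-is-worse metric; the `DebouncedAlert` class and the choice of three consecutive samples are hypothetical tuning parameters.

```python
from collections import deque


class DebouncedAlert:
    """Raise an alert only after `required` consecutive samples exceed the threshold."""

    def __init__(self, threshold: float, required: int = 3) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.threshold = threshold
        self.required = required
        self._recent = deque(maxlen=required)  # breach flags for the latest samples
        self.active = False  # True while an alert is outstanding

    def observe(self, value: float) -> bool:
        """Feed one sample; return True only when a new alert is raised."""
        over = value > self.threshold
        self._recent.append(over)
        if not over:
            # Condition cleared: the next sustained breach may alert again.
            self.active = False
            return False
        sustained = len(self._recent) == self.required and all(self._recent)
        if sustained and not self.active:
            self.active = True
            return True
        return False


if __name__ == "__main__":
    detector = DebouncedAlert(threshold=1.5, required=3)
    for sample in [1.6, 1.2, 1.7, 1.8, 1.9, 2.0, 1.0]:
        if detector.observe(sample):
            print(f"alert raised at sample {sample}")  # alert raised at sample 1.9
```

With a threshold of 1.5, the isolated spike at 1.6 raises nothing, and a single alert is raised at 1.9, the third consecutive breach. Requiring more consecutive samples suppresses more noise at the cost of slower detection, the same balancing act involved in setting thresholds.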
Preventive performance management activities are aimed at weeding the system, eliminating potential risks to performance, and tuning the mechanisms for better problem handling. The trigger for preventive tuning is therefore not an actual problem but rather a decision by the IT manager that the time has come for this sort of activity. Consequently, a great deal of discipline and experience is required to initiate a preventive tuning process. Preventive performance management is usually performed periodically and is aimed at the parts of the system that will have the most significant impact on long-term system performance.

An important part of preventive management is the periodic performance review. Each performance review results in a prioritized task list focusing on system components or activities that have exhibited negative performance behavior. Subject to IT staff availability and the priority of each item in the list, you then need to decide which tasks warrant immediate action. Precise/Inform for Foresight (hereafter Inform/Foresight) provides the mechanisms to support the preventive performance management activities of the methodology through automated performance reviews.

Stage 2 Find

The Find stage is designed to associate the symptoms of performance problems with their sources. It is not uncommon for a problem in one component of an application to show its symptoms in a different component or tier. The more complex a system is, and the more internal and external interfaces it contains, the harder it becomes to trace a symptom to its true cause.