The top 10 misconceptions about performance and availability monitoring
Table of contents

Introduction
The top 10 misconceptions about monitoring
  10. Monitoring basic infrastructure is enough
  9. Monitoring processes or services for an application suffices
  8. All systems in the IT Enterprise will (and should) be monitored
  7. Monitoring all of the available metrics for a system or application is the best approach
  6. Second or sub-second sampling rates are always necessary
  5. A company's infrastructure monitoring strategy can operate in its own, detached silo
  4. The CMDB is the single physical repository of all knowledge
  3. Monitoring software needs to reside in-house
  2. You can model your environment manually
  1. You can get by with a single type of monitoring, either real or synthetic
HP Business Availability Center
For more information
Introduction

Today's enterprises depend on the availability and performance of their mission-critical business services. If these services suffer from degradations in performance or fail completely, companies are subject to lost revenues and decreased customer satisfaction. To avoid these outcomes, IT departments must adopt effective monitoring strategies that do not themselves make problems worse.

Since the earliest days of business computing, with monolithic mainframe installations and very small client/server-based systems, IT departments have understood the importance of monitoring availability and performance and have invested in various forms of monitoring solutions. However, as technology has rapidly evolved, some of yesterday's best practices are no longer valid. In some cases, following outdated strategies can lead to ineffective monitoring, high overhead, and increased costs. In addition, today's complex, distributed, and service-oriented applications, on which many mission-critical business processes rely, are forcing companies to re-evaluate their long-held beliefs regarding monitoring and to add new best practices to their existing strategies.

This white paper examines ten common misunderstandings about monitoring and offers suggestions for implementing complete, business-focused monitoring strategies that can adapt to today's more complex computing environments.

The top 10 misconceptions about monitoring

10. Monitoring basic infrastructure is enough.

Monitoring system metrics (such as CPU, memory, and disk) is important, but these metrics do not provide adequate information to truly understand whether actual users or applications are experiencing performance problems. Trying to add up individual system performance metrics to understand actual application or end user performance does not work either.
Due to advances in hardware reliability, performance, and architecture, most performance problems today are caused by application components rather than by individual pieces of hardware. As a result, system monitoring alone, while still critical, will not provide an accurate or complete picture of true application performance. Monitoring that focuses on the end user is an essential piece of today's monitoring strategy.

Figure 1. Monitoring system metrics (such as CPU, memory and disk) does not provide adequate information to truly understand the customer experience.

9. Monitoring processes or services for an application suffices.

Today's applications (whether packaged, J2EE, .NET, or customized SOA applications) are complex and span multiple systems and various technologies. Simply monitoring a few key services or processes does not provide a complete picture of application health and certainly does not provide the level of detail needed to troubleshoot thorny performance problems. To thoroughly understand application health, detailed component monitoring, diagnostics, and dynamic configuration modeling are required to reveal the complex interactions between the various services.

8. All systems in the IT enterprise will be monitored.

While it is tempting to believe that monitoring everything that uses electricity in the IT enterprise is feasible, this is simply not the case. Hosted and outsourced services are becoming increasingly prevalent. Users traverse networks that are outside of your direct control. And your in-house teams may not share information. In addition, IT enterprises typically include several systems that do not support business-critical functions or applications.

The good news is that 100 percent coverage is not necessary or even desirable. The trick is in knowing which systems relate to critical business functions and which ones do not. Discovery and dependency relationship mapping technology can help IT focus monitoring on the systems, applications, and application components that matter. Once you understand what is critical to service delivery, you can construct measurable and enforceable service-level agreements (SLAs) between teams and partners to facilitate end-to-end service availability and performance.

7. Monitoring all of the available metrics for a system or application is the best approach.

Performance problems tend to follow Pareto's Rule: 80 percent of problems are generally caused by 20 percent of the system's or application's components. The challenge is in knowing which metrics are the key indicators; otherwise, either too much data is collected or the wrong metrics are monitored.
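As a rough illustration of the Pareto observation above, the following Python sketch ranks components by how many problems they were involved in and returns the smallest group that accounts for a chosen share of all problems. The function name `top_contributors`, the component names, and the problem counts are hypothetical and are not drawn from any HP product; the 80 percent target simply mirrors the rule of thumb.

```python
from typing import Dict, List


def top_contributors(problem_counts: Dict[str, int], share: float = 0.8) -> List[str]:
    """Return the fewest components that together account for `share` of all problems."""
    total = sum(problem_counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    # Walk components from most to least problematic until the target share is reached.
    for name, count in sorted(problem_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        selected.append(name)
        covered += count
    return selected


# Hypothetical problem counts per component (illustrative data only).
incidents = {
    "order-db": 55,
    "auth-service": 30,
    "web-frontend": 6,
    "cache": 4,
    "mail-relay": 3,
    "batch-jobs": 2,
}
key_components = top_contributors(incidents)
```

With this sample data, two of the six components cover more than 80 percent of the recorded problems, so their metrics would be the first candidates for close monitoring.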
Instead of monitoring every possible metric, IT administrators should look for monitoring solutions with built-in expertise regarding the most important metrics to watch.

6. Second or sub-second sampling rates are always necessary.

The most important alerts when monitoring infrastructure performance and availability are the ones for sustained performance problems. Monitoring at second or sub-second intervals is not necessary to identify sustained performance issues, and it usually results in massive amounts of data that are never used, or in alert storms that involve too many people in a situation that may not be an emergency. In virtualized environments, customers routinely see sustained utilization at levels that would trip traditional alert thresholds, yet these levels often simply reflect efficient use of hardware. Second or sub-second monitoring will uncover events that are temporary or transitional and not necessarily good indicators of performance problems that truly impact the end user experience.

While it is true that some performance and availability events occur in less than a second and therefore require sub-second sampling, these are few and far between. Therefore, care needs to be taken in deployment to ensure that the right level of information is captured. This is one of the many reasons that all organizations need to have a systems-monitoring solution that includes both agent and agentless monitoring.
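The distinction between sustained problems and transient spikes can be sketched in a few lines of Python. The class below raises an alert only when every sample in a sliding window exceeds a threshold. The class name `SustainedAlert`, the 60-second sampling interval, the 90 percent threshold, and the sample values are all assumptions chosen for illustration, not settings from any HP product.

```python
from collections import deque


class SustainedAlert:
    """Alert only when a metric stays above a threshold for a full window of samples."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        # deque(maxlen=...) discards the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one sample; return True if the whole window breaches the threshold."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


# Assumed setup: CPU utilization sampled every 60 seconds, alert after 5 breaching samples.
spiky = [50, 97, 52, 95, 51, 99, 48]   # brief spikes that never persist
sustained = [92, 94, 96, 93, 95, 97]   # above the threshold for the whole window

cpu_alert = SustainedAlert(threshold=90.0, window=5)
spiky_alerts = [cpu_alert.observe(v) for v in spiky]

cpu_alert = SustainedAlert(threshold=90.0, window=5)
sustained_alerts = [cpu_alert.observe(v) for v in sustained]
```

In this sketch the spiky series never triggers an alert, while the sustained series triggers one as soon as five consecutive samples exceed the threshold. A coarser sampling interval with a windowed rule of this kind is one way to avoid alert storms caused by momentary peaks.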
Figure 2. Effective performance management solutions should be able to span the entire performance and availability lifecycle, from development to production.

[Figure content: CIO view of performance management of multiple applications throughout the lifecycle. Development team (R&D and QA): validate performance, test the right usage scenarios, find all problems before going live, predictably manage product changes. Production team (operations and lines of business): provide availability, meet or beat SLAs, detect changes in real time, reduce management complexity. Additional labels: siloed performance practice; time and resource crunch.]

5. A company's infrastructure monitoring strategy can operate in its own detached silo.

Today's enterprise monitoring strategies are becoming increasingly tied to other strategies within the organization. Pre-production testing and development require feedback from the real-time monitoring teams when designing new applications or tweaking existing ones. Change management strategies and solutions should be factored in when determining the cause of performance issues. Business people must be involved in helping to set thresholds and SLAs. The bottom line is that IT operations must be a good corporate citizen and integrate with business solutions, strategies, and processes.

Modern-day enterprises need the ability to ensure that their applications and systems meet established performance and availability requirements in both pre-production and production. IT organizations must therefore have the ability to monitor, diagnose, and resolve critical problems across the entire application lifecycle.

Effective performance optimization and management solutions should be able to span the entire performance and availability lifecycle, from development to production. Specifically, these solutions should enable IT to:

• Performance-test applications prior to rolling them out to production, mitigating the risks of application downtime.
• Use capacity planning to create the best architecture in the production environment by optimizing across cost, performance, and utilization requirements.
• Monitor, measure, and manage enterprise applications and the underlying infrastructure in production.
• Proactively diagnose and resolve application incidents and problems in both test and production environments.
• Apply real-world knowledge gained from user monitoring to improve the accuracy of performance testing.

4. The configuration management database (CMDB) is the single physical repository of all knowledge.

Several organizations have tried to create a CMDB strategy focused on building one monolithic CMDB instance to be used throughout the organization. Many have found, after years of effort, that it is impossible to build one CMDB to serve all requirements. The bottom line is that no single CMDB can do everything you need, which is why CMDB federation is critical. Through federation, HP Universal CMDB can share information bi-directionally with other data repositories. HP believes very strongly in this concept and has been a driving force in ITIL version 3 and the CMDB Forum to create standards for the exchange of information and best practices around the concept of a distributed configuration management system.

3. Monitoring software needs to reside in-house.

Application management outsourcing has gone mainstream and has been a core competency of HP for the past seven years through its Software-as-a-Service (SaaS) offerings. Customers have opted for SaaS delivery for a variety of reasons, including lower total cost of ownership (TCO), faster implementation time, and the ability to deliver insight without the need for end users or even administrators to become application monitoring experts. In addition, an outsourced monitoring strategy offers the ability to validate the performance of applications from outside the company's firewalls and from multiple locations around the world. For companies that require independent validation of service levels, SaaS provides the required third-party validation of external service providers.

2. You can model your environment manually.

Many organizations initially believe that they can create and maintain manual maps that show how their infrastructure works together to deliver critical business services. In fact, many initial attempts are quite successful with one or two services. However, as time goes on and more services are added, the number of changes and intricate dependencies makes reliable models difficult, and eventually impossible, to maintain. This is an area where automation is key to lowering costs and increasing accuracy for IT groups. By leveraging technology that automates discovery and dependency mapping, companies can reduce costs while reducing mean time to resolution and increasing mean time between failures.

1. You only need real or synthetic user monitoring.

When it comes to monitoring the customer experience, much debate has raged about the accuracy and overhead of different end user monitoring approaches. The fact is that several end user monitoring technologies are required, depending on the situation. There are three categories of end user monitoring technologies: synthetic monitors, real user monitors, and business transaction monitors.

Synthetic monitors execute scripts from agents distributed throughout the Internet or within your environment to simulate how a particular application is performing and raise alerts if the performance or availability falls outside acceptable thresholds. Since synthetic monitoring occurs around the clock from multiple locations, it acts as an early warning system and enables you to be proactive in managing service levels within a set of defined conditions.

Real user monitors watch the network to capture the conversations between clients and servers to determine availability, performance, and effectiveness from the perspective of each individual user. This approach can give you visibility into every user experience for support and troubleshooting, and it also aggregates information on availability, performance, content, and behavior patterns. It acts as a source of scripts for synthetic monitoring, and its data can be used for performance testing of application upgrades, ensuring customers do not suffer performance degradation when the upgrade goes live.

Business transaction monitors track each and every business transaction to create a high-level view of the health of business transactions and the impact of failure on business processes. This gives business operations visibility into business transactions and shows which business transaction instances are affected by IT problems, thus allowing IT to leverage business metrics when prioritizing IT issues.

These technologies not only capture the customer experience across complex transactions but also provide the information needed to accelerate problem isolation and resolution. By using all three customer experience monitoring approaches, organizations can perform business transaction management. With effective business transaction management, IT can deliver critical information to the right people at the right time, contributing significantly to the overall success of the business. Furthermore, the data from all of these monitors can be fed into service level management systems, allowing IT to provide the metrics necessary to align IT with the business.
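To make the synthetic monitoring idea concrete, here is a minimal Python sketch of a single synthetic check: it runs a scripted transaction, times it, and reports whether availability and response time fall within an acceptable threshold. The names `run_synthetic_check` and `CheckResult`, the two-second threshold, and the stand-in transactions are hypothetical. A real monitor would script an actual user transaction and run it on a schedule from several locations.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    available: bool          # the scripted transaction completed without error
    elapsed_seconds: float   # measured response time
    within_threshold: bool   # available and fast enough


def run_synthetic_check(transaction: Callable[[], None], max_seconds: float) -> CheckResult:
    """Execute one scripted transaction and compare its response time to a threshold."""
    start = time.perf_counter()
    try:
        transaction()
        available = True
    except Exception:
        # Any failure in the script counts as the service being unavailable.
        available = False
    elapsed = time.perf_counter() - start
    return CheckResult(available, elapsed, available and elapsed <= max_seconds)


def healthy_transaction() -> None:
    """Stand-in for a scripted user transaction (for example, log in and search)."""
    time.sleep(0.01)


def failing_transaction() -> None:
    """Stand-in for a transaction that hits an outage."""
    raise ConnectionError("simulated outage")


ok = run_synthetic_check(healthy_transaction, max_seconds=2.0)
down = run_synthetic_check(failing_transaction, max_seconds=2.0)
```

Passing the transaction in as a callable keeps the timing and threshold logic separate from the script itself, so the same check can wrap any scripted scenario. Results of this kind could complement, but not replace, data captured from real users.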
Figure 3. HP Software-as-a-Service is an outsourced monitoring service that enables enterprises to leverage the pre-deployed infrastructure, operations and expertise of HP.

[Figure content: HP Software-as-a-Service combines product (configure, integrate and support HP Software products), people (ensure HP knowledge transfer to customers), and process (apply HP best practices to create world-class processes) to support customer success: accelerate time to value, mitigate risk, unburden IT resources.]

HP Business Availability Center

HP Business Availability Center is the only end-to-end solution that enables proactive business service management. It provides complete visibility into and control over the end user status and business availability of services running in complex, distributed application environments. HP Business Availability Center enables enterprises to map, measure, and manage application, system, and infrastructure performance and availability according to end user requirements, service levels, and business goals.

HP Business Availability Center helps enterprises quantify the business impact of application downtime and resolve performance problems when they arise. It also offers the ability to obtain real-time visibility into the complex and changing relationships between applications and the underlying infrastructure.

HP Business Availability Center includes integrated applications and a business dashboard for performance and application monitoring, system availability management, service level management, configuration management, discovery and dependency mapping, diagnostics, and problem resolution. As a result, IT departments can reduce mean time to identification (MTTI), improve service level performance, reduce application downtime, and lower total cost of ownership.

HP Business Availability Center is a core component of the HP business service management (BSM) strategy.
BSM links business services (such as a bank funds transfer) to their underlying applications, infrastructure, and network components to analyze the business impact of IT problems and reduce the potential costs of IT service downtime. BSM provides an integrated view of IT performance that encapsulates both technical and business perspectives in order to better align IT with business objectives.

HP Software-as-a-Service

HP Software-as-a-Service for Business Availability Center is a remote, outsourced monitoring service that enables enterprises to enhance the performance and availability of internal and external-facing applications by leveraging the pre-deployed infrastructure, operations, and expertise of HP. HP Software-as-a-Service takes ownership of the configuration and integration of HP software and combines that with the people who can mentor your teams on Business Availability Center best practices. Software-as-a-Service works to ensure that you accelerate your time to value, while mitigating deployment risk and unburdening your IT resources to focus on other IT initiatives. You can start with HP Software-as-a-Service and transition to an in-house implementation if you choose to at a later time.

For more information

To learn more about how HP Business Availability Center can improve the health of your mission-critical applications, contact your HP representative or visit us online at www.hp.com/software.
To learn more, visit www.hp.com/software

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

4AA1-8656ENW, March 2008

Technology for better business outcomes