BUSINESS SERVICE & END USER EXPERIENCE MONITORING
AUGUST 2, 2010
ANTONIO ROLLE, VP OF PROFESSIONAL SERVICES
WWW.GENERATIONETECH.COM
WWW.RESOLVE-SYSTEMS.COM
BUSINESS SERVICE & END USER EXPERIENCE MONITORING UTILIZING THE GENERATIONE FRAMEWORK

INTRODUCTION

The mission-critical nature of IT has increased the importance of monitoring vital IT resources. The complexity of IT has made it imperative that monitoring efforts be put into the context of business impact and the impact on the end user experience. We define a Business Service as an IT function that allows an end user to perform a specific business task. Typically this IT function is a business-critical or revenue-generating task.

To effectively monitor Business Services and the experience of the consumers of these Business Services, generatione has developed two solution frameworks. The goals of the generatione Enterprise Service Management & Business Process Automation Framework (generatione Framework) are: 1) determine the business or end user experience impact of any given fault or performance degradation and 2) streamline the execution of processes to minimize or prevent negative business impact.

These frameworks were developed to address the key challenges identified in the 2006 IBM Global CEO Study. These challenges include:

• Business and operational audiences lack the visibility needed to directly support and deliver against business objectives.
• Operational and business audiences lack the ability to isolate the root cause of a problem in real time, are unable to quickly identify the impacted components or services, and cannot immediately take action to correct the problem.
• The operational processes that directly support delivery of revenue-generating business services and processes are not automated or integrated.
Core objectives for the generatione Framework are to:

• Enable the management of current and expected Business Services:
    o Flexible enough to service the diversity of the products and services offered
    o Scalable to handle the current volume and projected growth
• Provide the ability to introduce and quickly support new Business Services:
    o Based on a framework providing the centralized, pre-integrated, service-driven functions necessary for new Business Services
    o Ability to easily configure business and IT rules
• Allow for efficient operations:
    o Automation to eliminate or enhance manual and inefficient processes
    o Workflow to manage by exception rather than managing the entire process

To meet these defined challenges and objectives, the generatione Framework utilizes the following components:
• Consolidated Operations Management
• Process Automation
• Business Impact Analysis
• Orchestration
• Visualization

An overview of each of these components of the generatione Framework is provided within this document.

COLLECTION

EVENT MANAGEMENT

To perform Business Service monitoring and to assess end user experience, a mechanism must be in place to collect events from all related IT components. Within the generatione Framework, the Event Management process is used to collect event data from all sources and to support the integration of events from multiple domains. The Tivoli suite provides a robust set of collection layer probes, gateways and monitors to support the management of events from any network-attached IT component. Event Management is used to generate availability metrics such as:

• Hardware Outages: failure of an IT component impacting the working function of a Business Service, measured through monitoring of real-time events; or
• Service Outages: failure of a communications path specific to a Business Service impacting the working function of that Business Service, measured through monitoring of real-time events.
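As an illustration of how an availability metric might be derived from real-time outage events, the following minimal sketch computes the availability of a Business Service over a reporting window. The `Outage` record, its field names and the `availability` function are hypothetical and are not part of any Tivoli or Resolve product; the sketch assumes each outage has already been paired from a "down" event and its matching "clear" event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Outage:
    """Hypothetical outage record built from a down event and its clear event."""
    service: str
    kind: str        # "hardware" or "service", mirroring the two outage types
    start: datetime
    end: datetime


def availability(outages, service, window_start, window_end):
    """Return the fraction of the window during which `service` was available.

    Overlapping outages are merged so that downtime is not counted twice.
    """
    # Clip each relevant outage to the reporting window and sort by start time.
    spans = sorted(
        (max(o.start, window_start), min(o.end, window_end))
        for o in outages
        if o.service == service and o.end > window_start and o.start < window_end
    )
    down = timedelta()
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged span and open a new one.
            if cur_end is not None:
                down += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged span.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        down += cur_end - cur_start
    return 1.0 - down / (window_end - window_start)
```

Merging overlapping intervals matters here because a single hardware failure can raise both a Hardware Outage and a Service Outage for the same Business Service; counting both in full would overstate downtime.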
PERFORMANCE MONITORING

Successful service management requires the ongoing monitoring of performance to judge the overall health of a Business Service and the related end user experience. By collecting and storing historical performance data, one is able to perform trend analysis to determine a baseline of normal behavior and generate events when normal behavioral thresholds are exceeded. Normal can be based on predefined, default, out-of-the-box thresholds; a customer-defined, customized setting; or a dynamic, measured baseline. Specific tools within the generatione Framework's reference architecture for performance monitoring include: IBM Tivoli Monitoring (ITM), Tivoli Performance Manager (Proviso), Tivoli Performance Analyzer (TPA) and Tivoli Service Quality Manager (Tivoli SQM).

TRANSACTION MONITORING

Event and performance management provide insight into the performance of specific IT components. A key part of Business Service and end user experience monitoring is transaction monitoring. With transaction monitoring, every step of a transaction is monitored as it passes through the complex array of networks, servers, mainframes and applications. generatione utilizes IBM Tivoli Composite Application Manager (ITCAM) and Resolve to execute synthetic transactions, set thresholds on transaction response time and report on the overall health of end-to-end transactions within your environment. Events related to poor transaction performance or failures are relayed through the Event Management process.

ANALYZE AND AUTOMATE

The event stream generated at the Collection Layer is put into context within this layer. Several automation and analysis processes may be used to understand which Business Services are impacted by an event.

EVENT ACCEPTANCE

Converting the relatively raw data provided by the Collection Layer into a set of relevant, actionable events may be accomplished by utilizing the generatione Resolve-based Event Acceptance toolset.
This toolset contains an event catalog to identify those events relevant to the monitoring and recovery of Business Services. The event catalog provides a mechanism to determine which events are actionable, which are used only for reporting, and which should be discarded. By undergoing an Event Acceptance process, customers are able to reduce the noise that is typically associated with monitoring IT systems. Once the noise is gone, one can focus on those events that impact Business Services and the end user experience.

PROCESS AUTOMATION

Utilizing the Resolve process automation toolset, one may reduce the amount of time required to diagnose and resolve issues related to Business Services. Resolve process automation supports the definition, creation and execution of workflows that support IT operational processes. These Runbook processes can cross all management disciplines and interact with all types of infrastructure
elements, such as applications, databases and hardware, to perform diagnostic tasks, resolve issues or provision systems. These tasks may be executed on a schedule, in response to an event, or on initiation by an operator. Reacting quickly to events and removing human error can drastically reduce the impact of outages and in many cases resolve issues before there is significant end user impact.

CORRELATION

Event correlation simplifies and speeds the monitoring of events by consolidating events and error logs into a short, easy-to-understand package. By programmatically analyzing events across multiple systems or components, one can identify patterns that might signify hardware or software problems, or failure of components. Typically, correlation rules are deployed within the generatione Framework using Network Manager for topology-based correlation and Impact and Resolve for complex business rules. By leveraging data contained in asset management systems, CMDB or topology databases, the correlation engine can determine which Business Services are impacted by events.

ENRICHMENT

Adding data from disparate data sources assists with business impact analysis, escalations and troubleshooting. Resolve and Impact may be used to perform the Enrichment process.

ESCALATION

Notifying the correct person(s) about significant Business Service impacting events can reduce mean time to repair and help to increase customer satisfaction. Escalation rules may be used to alert support staff, external vendors, and customers to degraded services or outages. Determining who to notify, when to notify and how to notify may be accomplished using IBM Tivoli Impact or Resolve.

DOMAIN MANAGEMENT

generatione recognizes that many customers may have an investment in domain-specific management tools.
These domain-specific management tools may be integrated into the generatione Framework and used to support the event collection processes, correlation, enrichment, escalation and be leveraged as part of

INFORM

REAL TIME

Service availability and resiliency of Business Services may be improved with real-time views of the events collected. Tivoli Netcool OMNIbus is used within the generatione Framework to consolidate data in operational silos into real-time Web-based dashboard views. These customizable displays of events, service views and operational indicators give operators a single source for viewing all events within the IT infrastructure.

BUSINESS SERVICE DASHBOARD

The Business Service Dashboard provides a display of Business Services and the IT components that define each Business Service. The Dashboard shows the availability and performance metrics and indicates whether these metrics have a negative impact on the overall health of the Business Service. Tivoli Business Service Manager (TBSM) is generatione's service dashboard tool of choice.
The service model within the Business Service Dashboard may be generated dynamically based on data within a CMDB, data obtained by auto-discovery tools such as IBM Tivoli Network Manager (ITNM) or Tivoli Application Dependency Discovery Manager (TADDM), or auto-population rules within the dashboard.

HISTORICAL

Historical reporting provides critical information for understanding trends related to Business Services. Historical reporting of events and performance metrics is included within the generatione Framework to provide additional context and trend analysis. Historical performance metrics are captured within the Tivoli Performance Manager product. Historical event data can be stored in virtually any relational database and accessed using existing reporting tools or through the Tivoli Common Reporting application.

END USER EXPERIENCE MONITORING

With the full generatione Framework in place, one can quickly understand how the End User Experience is impacted by each event that is generated within the Collection Layer. The impact of an event on the Business Service model gives an overall impression of the end user experience. IBM Tivoli Composite Application Manager (ITCAM) for Transactions helps organizations quickly and easily detect and isolate transaction response and availability issues, enabling faster problem resolution. ITCAM also enables proactive management of transactions, identifying bottlenecks and other potential problems before they impact the end user experience. ITCAM can monitor end user response time for both Web and Windows applications using both robotic and real-user analysis capabilities. Resolve can monitor end user response time for those applications not supported by the ITCAM solution.
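To make the idea of response-time monitoring against historical data concrete, the sketch below derives a threshold from past response times and flags new samples that exceed it. This is an illustration only, assuming a simple "mean plus k standard deviations" baseline; it is not the algorithm used by ITCAM, Tivoli Performance Manager or Resolve, and the function names and the multiplier `k` are hypothetical.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Return a response-time threshold: mean of `history` plus k standard deviations.

    `history` is a sequence of past response times in seconds.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    return mean(history) + k * pstdev(history)


def breaches(samples, threshold):
    """Return (index, value) pairs for samples that exceed the threshold."""
    return [(i, s) for i, s in enumerate(samples) if s > threshold]
```

In practice each breach would be forwarded as an event through the Event Management process, so that its impact on the Business Service model can be assessed alongside other events.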
CASE STUDY

GOAL

An internet wireless provider for commercial flights wanted to implement holistic monitoring in their environment. This monitoring would target the provider's mission-critical applications that allow people to order wireless internet connectivity on a plane. In addition to the typical monitoring of outages in their diverse environment, which is composed of very new technology, they also wanted to understand the end user experience of passengers. Data from this end user experience monitoring is to be used to improve their service.

SOLUTION

The infrastructure monitoring solution is composed of Netcool OMNIbus, Proviso, ITNM, ITM, and ITCAM. ITM was set up to monitor the operating systems, databases, application servers and Web Services servers. ITCAM was configured to report on the internal transactions between the databases, application servers and Web servers, and on connectivity to the servers on the planes. ITNM was implemented to perform root cause analysis on any network events. Proviso was implemented to report on the performance of the network. ITM, ITCAM, and Proviso report events and threshold crossings to Netcool OMNIbus, and ITNM performs root cause analysis on the network events to suppress any symptomatic events stemming from a root failure in the network.

The end user application transaction monitoring was implemented with Resolve by implementing a set of Run Books that mimic a passenger's transaction. The steps included in the Run Books are login, via a web browser, to gain access to the internet, purchase an internet access
package, and review or act on special offers posted by the provider's business partners. Each Resolve Run Book executes a specific task to mimic an end user, and these are performed on each flight at a configured interval. Any service issues or performance degradation are immediately reported to Netcool OMNIbus for action by the NOC, and notifications are sent to the Customer Help Desk. When deemed appropriate, to assist with customer satisfaction, the provider contacts customers via email or SMS, giving them a free wireless flight to compensate for any service issues.

BENEFITS

This internet wireless provider for commercial flights has great visibility into their network, servers and applications and no longer spends countless hours on the phone with their airline customers troubleshooting issues. With end user transaction monitoring, the provider can be very customer oriented, ensure that the customer experience is as positive as possible and, when there are issues, be aware of them and compensate customers accordingly. This proactive customer service has helped to drive the value of their brand significantly.
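The Run Book approach described in the case study can be sketched as a sequence of timed steps whose failures or slow responses become events. The code below is a minimal illustration under stated assumptions: it does not use the Resolve API, and `StepResult`, `run_book`, `events_for`, the step names and the flight identifier are all hypothetical. Each step is assumed to be a callable that raises an exception on failure.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    """Outcome of one synthetic transaction step."""
    name: str
    ok: bool
    seconds: float
    error: Optional[str] = None


def run_book(steps, max_seconds, clock=time.perf_counter):
    """Run (name, action) steps in order, timing each one.

    A step that raises is recorded as failed and stops the run, because later
    steps (for example, purchase) depend on earlier ones (for example, login).
    A step that succeeds but takes longer than `max_seconds` is recorded as slow.
    """
    results = []
    for name, action in steps:
        started = clock()
        error = None
        failed = False
        try:
            action()
        except Exception as exc:
            failed = True
            error = f"failed: {exc}"
        elapsed = clock() - started
        if not failed and elapsed > max_seconds:
            error = f"slow: {elapsed:.2f}s exceeds {max_seconds:.2f}s"
        results.append(StepResult(name, error is None, elapsed, error))
        if failed:
            break
    return results


def events_for(flight, results):
    """Turn unsuccessful steps into event messages for an event manager."""
    return [f"{flight}: {r.name}: {r.error}" for r in results if not r.ok]
```

A scheduler would call `run_book` for each flight at the configured interval and forward the output of `events_for` to the event management system, which is where the NOC and Help Desk notifications described above would originate.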