A White Paper. The Best Practices Guide to Developing and Monitoring SLAs




Best Practices for Meeting End-User Demand: Put SLAs and Service Level Monitoring to Work for You

Information technology departments are increasingly finding that service level agreements (SLAs) and service level monitoring do much more than document that written agreements are being met. When defined and applied properly, they can help stretched IT staff meet end-user demand by providing a more coherent picture of the performance of distributed business applications.

Five Fundamentals of SLA Monitoring: Define, Collect, Interpret, Resolve, Present.

With an approach that covers the fundamentals well, your organization can spend less time and money on the mechanics (monitoring tools, data collection, data analysis, and performance reporting) and concentrate on making sure that IT services align with the needs of the business. Next-generation agentless monitoring software is making service level monitoring practical and affordable for organizations of all sizes, not only those that can afford to license and maintain high-end systems management frameworks.

These basics apply to a wide range of situations: whether your organization follows well-known best practices for service management such as the Information Technology Infrastructure Library (ITIL) framework, whether you stick to your own in-house principles, or even when your organization needs to demonstrate good IT performance but doesn't use formal SLAs.

Define the business service.

Service level management (SLM) starts with an agreed-upon definition of the service being delivered and the expectations for its performance. To clarify the terms used in this discussion, SLM is the process of managing IT services to meet the expectations of users and the business on a sustained basis. The written, agreed-upon conditions are known as an SLA, or service level agreement. In some cases the SLA represents an understanding between an IT department and internal end users. In others, the SLA is a legally binding contract that specifies penalties for failure to perform. This second type of SLA is common between a company and an external service provider.

The essence of a well-crafted SLA has been described by journalist Tim Wilson using a familiar example. Writing in Network Computing, he cited the once-famous guarantee of Domino's Pizza. The company specified a service (pizza delivery), a metric for measuring performance (30 minutes from the time of the call), and a penalty for failure to meet the SLA (free pizza) [1]. Both service provider and customer shared a clear understanding of what was expected and when.

Traditional performance monitoring doesn't support the SLA view of the world well because it is intended to keep tabs on the health of individual components, such as application software, a server, or a router. The pizza delivery analogy can be used to explain why: traditional performance monitoring might tell you the driver arrived 10 minutes late. Yet you might never be able to correlate that to a delay in getting the order to the kitchen, or to the fact that the oven was too full to bake the pizza immediately.

Business services depend on chains of components in the same way. For example, the performance of enterprise resource planning (ERP) order management might depend upon a Web page, Windows CPU usage, and a UNIX database running on one or more sets of servers. A three-tier e-commerce application may rely on Web servers, middleware, and a back-end database. Or you might have load balancing that distributes application processing across a group of servers to improve response time, provide redundancy, or both.

Effective service level management benefits from having a method of grouping the resources that deliver the business service governed by an SLA. The ability to define and monitor what constitutes acceptable performance is similarly useful. A best-of-breed product allows you to set up the necessary business service profiles in just minutes, ideally with a point-and-click Web browser interface.

Example: View of IT Resources Delivering a Business Service
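To make the idea of grouping resources into a business service profile concrete, here is a minimal Python sketch. It is not taken from any monitoring product; the class names, tier labels, and host names (web01, db01, and so on) are all hypothetical, chosen only to mirror the three-tier and ERP examples in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """One monitored component, e.g. a Web server or a database host."""
    name: str
    tier: str  # hypothetical labels: "web", "middleware", "database"


@dataclass
class BusinessService:
    """A named group of resources that together deliver one service."""
    name: str
    resources: list[Resource] = field(default_factory=list)

    def by_tier(self) -> dict[str, list[str]]:
        """Group resource names by tier for a service-level view."""
        tiers: dict[str, list[str]] = {}
        for res in self.resources:
            tiers.setdefault(res.tier, []).append(res.name)
        return tiers


def services_affected(services: list[BusinessService], resource_name: str) -> list[str]:
    """Return the names of every service that depends on the given resource."""
    return [
        svc.name
        for svc in services
        if any(res.name == resource_name for res in svc.resources)
    ]


# Illustrative profiles only; a real profile would come from your own inventory.
shop = BusinessService("e-commerce", [
    Resource("web01", "web"),
    Resource("web02", "web"),
    Resource("mq01", "middleware"),
    Resource("db01", "database"),
])
erp = BusinessService("erp-order-management", [
    Resource("erpweb01", "web"),
    Resource("db01", "database"),
])
```

With profiles like these, a problem on a shared component such as the hypothetical db01 can be traced to every business service that depends on it, rather than being reported as an isolated server event.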

You will notice this differs from stress-testing with synthetic transactions, which some organizations use to simulate the user experience. The focus of this paper is on establishing a view of the business service in order to monitor the behavior of an IT production environment, and on getting to the root cause of any issues.

Collect and correlate to reveal service performance.

IT resources are very often monitored with collections of separate and independent tools that do not provide a cohesive representation of status. As soon as you take a business-level view, using automation to collect and correlate data across systems brings advantages. Scanning a Windows event log or looking at perfmon counters can be a reasonable way to monitor one server, for instance. But collecting, aggregating, and correlating the data becomes far more challenging and time-consuming when you are responsible for multiple systems or a three-tier application infrastructure that must function within certain parameters.

Without automation, the monitoring data that reflects the entire service delivery chain is not usable in real time, either. It's certainly possible to aggregate and correlate the collected data manually. By the time you finish, however, the opportunity to take preventive steps or confine a crisis will almost certainly be gone. Lacking real-time data to reflect service performance, you are likely to miss the chance to improve SLA compliance through proactive measures.

Here it's worth noting that, compared with agent-based monitoring used for this purpose, an agentless approach can save time and money if designed with SLA monitoring in mind. All agentless software by definition eliminates the need to install software agents on monitored systems, because data is collected using standard underlying technologies such as Windows Management Instrumentation (WMI) and Secure Shell (SSH). The most capable agentless products not only use mechanisms such as these to collect and aggregate data from links in the chain of service delivery, but also correlate the monitoring data to the business services being provided.

Interpret the business impact.

Traditional performance monitoring provides only raw numbers that must be analyzed in order to determine the business impact. Look for service level monitoring capable of automatically offering a judgment on the data based on parameters you specify: is performance good, degraded, or unacceptable? (You still want easy access to basic metrics and statistics as needed.) Also make sure you can designate scheduled or ad hoc maintenance times, during which SLA requirements are set aside.

Say that four servers run the same application and one goes out of service for some reason; this might still qualify as good under the desired service level. The status might change to degraded if a second server went out of service, and so on. From the business and user perspectives, the service level as a whole matters more than the condition of any individual server.

An immediate understanding of business impact is important both for management reporting purposes (non-technical managers care about business, not about memory or CPU load) and to help direct IT staff to what really needs attention. In the preceding scenario, a server that is completely down might be less important than a slowdown in an underlying database that has an impact on all servers.

Where resources are grouped (or clustered) together, achieving satisfactory service levels is sometimes possible even when an individual component has failed. Here, the SLA is met when at least three of four servers are functioning, degraded when two are functioning, and failed if fewer than two are functioning.

Resolve quickly; prevent when possible.

When a problem condition (an event, in monitoring terms) does occur, you need the ability to find out where, when, and for how long the condition has existed. You must be able to identify where the problem lies within the service delivery chain, and assess its impact on the business service. The desired result is reduced time-to-resolution or, when possible, containment of adverse events and trends before they have a noticeable effect on your users and applications.

The capability to drill down to investigate an event is essential for the rapid problem identification and resolution that benefits SLA performance. Well-designed agentless monitoring tools ought to provide options for drilling down that instantly supply the type of information you need. Perhaps a senior IT administrator prefers to see raw numbers. Other colleagues may be able to address root causes faster by examining event summaries that are complemented by details about individual SLA failures, including recommended corrective actions.

Example: Event Monitoring Showing SLA State Transitions
This event summary display shows transitions in service level agreement compliance.
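The four-server rule described above (met with at least three servers functioning, degraded with two, failed with fewer than two) and the idea of SLA state transitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state names, the function names, the sample format, and the timestamps are hypothetical and do not reflect how any particular monitoring tool represents events.

```python
from datetime import datetime, timedelta
from enum import Enum


class SlaState(Enum):
    GOOD = "good"
    DEGRADED = "degraded"
    FAILED = "failed"
    MAINTENANCE = "maintenance"


def evaluate(servers_up: int, in_maintenance: bool = False) -> SlaState:
    """Apply the four-server rule from the text to one observation."""
    if in_maintenance:
        return SlaState.MAINTENANCE  # SLA requirements are set aside
    if servers_up >= 3:
        return SlaState.GOOD
    if servers_up == 2:
        return SlaState.DEGRADED
    return SlaState.FAILED


def transitions(samples):
    """Yield (timestamp, old_state, new_state) whenever the state changes.

    `samples` is an iterable of (timestamp, servers_up, in_maintenance).
    """
    previous = None
    for ts, up, maint in samples:
        state = evaluate(up, maint)
        if previous is not None and state != previous:
            yield ts, previous, state
        previous = state


start = datetime(2024, 1, 1, 9, 0)  # arbitrary illustrative timestamps
samples = [
    (start, 4, False),
    (start + timedelta(minutes=5), 3, False),   # one server down: still good
    (start + timedelta(minutes=10), 2, False),  # second server down
    (start + timedelta(minutes=15), 1, False),
    (start + timedelta(minutes=20), 4, False),
]
events = list(transitions(samples))
```

Recording only the transitions, rather than every sample, is one simple way to answer the "when, and for how long" questions: the time between two consecutive transitions is the duration of the state in between.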

SLA-oriented monitoring will also account for developing trends. Perhaps you observe what appears to be an isolated spike in CPU usage during off-peak hours. Along with a real-time view, you need the means to investigate historical data to reveal whether the upswing is a recurring condition so you can plan accordingly.

Present: don't just tell.

Regular reporting on service level compliance is becoming an obligation in many organizations. Even if you don't have formal SLAs to uphold, it is likely that you need to demonstrate IT's value to the business by showing how well service is being delivered. Extracting the necessary data and preparing this documentation can consume many hours of IT staff time. Agentless monitoring with rich reporting capabilities can lighten the load. Its cost and convenience leave little reason not to take advantage of batch reports that can be scheduled to run automatically and e-mailed to recipients on a regular basis.

When it comes to reporting, showing gets the point across more effectively than telling. Crisp, informative graphics let executives and users see service level performance at a glance. The best tools can generate this style of report automatically and display it through a Web browser interface or distribute it automatically via e-mail.

Example: Executive Summary SLA Report
Charts and graphs document SLA compliance at a glance, and can be distributed via e-mail or a Web portal.

Taking a cue from best practices for SLM, use strong reporting as your ally to forecast trends months in advance. Trend analysis supports capacity management and IT financial management. For example, you can budget for capital expense and negotiate better prices from vendors when you can anticipate when CPU or memory capacity will reach its limit.
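As a rough illustration of how a trend line can anticipate when capacity will run out, the following Python sketch fits a straight line to monthly utilization figures by ordinary least squares and extrapolates to a limit. The function names, the 90 percent limit, and the sample figures are assumptions made up for the example, and a straight line is only one possible model; real usage may grow seasonally or non-linearly.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_limit(months, usage, limit):
    """Estimate months remaining after the last sample until usage hits limit.

    Returns None when the fitted trend is flat or falling.
    """
    slope, intercept = linear_fit(months, usage)
    if slope <= 0:
        return None
    month_at_limit = (limit - intercept) / slope
    return max(0.0, month_at_limit - months[-1])


# Hypothetical monthly average CPU utilization, in percent.
months = [0, 1, 2, 3, 4, 5]
cpu_percent = [50, 54, 58, 62, 66, 70]
remaining = months_until_limit(months, cpu_percent, limit=90)
```

For this made-up series the fitted growth is 4 percentage points per month, so the 90 percent limit is projected roughly five months after the last sample. That is the kind of lead time the text has in mind for budgeting and vendor negotiations.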

Example: Trend Report
Trend lines based on monitoring data inform IT planning and budgeting.

Conclusion

Making do with a patchwork of separate tools and writing your own tools and scripts are no longer the only choices for IT departments that seek to streamline service level monitoring without an enterprise framework. Next-generation agentless monitoring can help get the job done efficiently and affordably. By choosing a commercial monitoring product from a company with an established reputation, you also avoid pitfalls associated with unsupported shareware that is not robust enough for the task.

Because agentless products have been limited in capabilities until recently, evaluate them thoroughly to find functionality that approaches that of higher-end, agent-based products. Informed selection can lead to an SLA monitoring solution that is cost-effective, useful right out of the box, backed by technical support, and maintained with ongoing updates. When the fundamentals are taken care of, your organization will have more resources to make sure that IT remains a strategic asset to the business.