Defining and Monitoring Service Level Agreements for dynamic e-business



Similar documents
SLA Business Management Based on Key Performance Indicators

Monitoring the QoS of Web Services using SLAs. P.O. Box 1385, GR 71110, Heraklion, Crete, Greece {zegchris,

Enhancing A Software Testing Tool to Validate the Web Services

Business Commitments for Dynamic E-business Solution Management: Concept and Specification

Towards Management of SLA-Aware Business Processes Based on Key Performance Indicators

WebSphere Business Monitor

Service Portfolio Management PinkVERIFY

Figure 1: Illustration of service management conceptual framework

Service Level Monitoring with Nagios. National Technical University of Athens Network Operations Center

Business Application Services Testing

Web Application Hosting Cloud Architecture

QoS Integration in Web Services

Contents. Overview 1 SENTINET

Terms of Use - The Official ITIL Accreditor Sample Examination Papers

RUGGEDCOM NMS. Monitor Availability Quick detection of network failures at the port and

<Insert Picture Here> Increasing the Effectiveness and Efficiency of SOA through Governance

HP Systinet. Software Version: Windows and Linux Operating Systems. Concepts Guide

Framework for Measuring Performance Parameters SLA in SOA

Capacity Plan. Template. Version X.x October 11, 2012

Front Metrics Technologies Pvt. Ltd. Capacity Management Policy, Process & Procedures Document

MODEL DRIVEN DEVELOPMENT OF BUSINESS PROCESS MONITORING AND CONTROL SYSTEMS

SOA Success is Not a Matter of Luck

Connecting Custom Services to the YAWL Engine. Beta 7 Release

HP Service Manager. Software Version: 9.34 For the supported Windows and UNIX operating systems. Processes and Best Practices Guide

Service-Oriented Architecture and its Implications for Software Life Cycle Activities

CMA5000 SPECIFICATIONS Gigabit Ethernet Module

WebSphere Business Monitor

An architectural blueprint for autonomic computing.

CA Service Catalog r12

Service Level Agreement in Cloud Computing

LBPerf: An Open Toolkit to Empirically Evaluate the Quality of Service of Middleware Load Balancing Services

Ensuring End-to-End QoS for IP Applications. Chuck Darst HP OpenView. Solution Planning

Service Design, Management and Composition: Service Level Agreements Objectives

White Paper. Software Development Best Practices: Enterprise Code Portal

Service-Oriented Architectures

WebSphere Application Server V6.1 Extended Deployment: Overview and Architecture

Key Performance Indicators for Cloud Computing SLAs

Performance Testing. Slow data transfer rate may be inherent in hardware but can also result from software-related problems, such as:

SOA GOVERNANCE MODEL

Oracle SOA Suite Then and Now:

SOA Planning Guide The Value Enablement Group, LLC. All rights reserved.

Developing SOA solutions using IBM SOA Foundation

ITG Software Engineering

ITPS AG. Aplication overview. DIGITAL RESEARCH & DEVELOPMENT SQL Informational Management System. SQL Informational Management System 1

How To Write A Cloud Service Level Agreement

The Way to SOA Concept, Architectural Components and Organization

Service Level Agreement Comparison in Cloud Computing

<Insert Picture Here> Application Testing Suite Overview

Java Management Extensions (JMX) and IBM FileNet System Monitor

IBM Rational Asset Manager

Managed File Transfer in Enterprise Java Applications

Sentinet for BizTalk Server SENTINET 3.1

<Insert Picture Here> Optimized WebLogic Monitoring with Oracle Enterprise Manager

Address IT costs and streamline operations with IBM service desk and asset management.

Summit Platform. IT and Business Challenges. SUMMUS IT Management Solutions. IT Service Management (ITSM) Datasheet. Key Benefits

Sentinet for BizTalk Server SENTINET

Fingerprinting the datacenter: automated classification of performance crises

How To Model A System

OpenAdmin Tool for Informix (OAT) October 2012

Course Description. Course Audience. Course Outline. Course Page - Page 1 of 5

Federation of Cloud Computing Infrastructure

The Jamcracker Enterprise CSB AppStore Unifying Cloud Services Delivery and Management for Enterprise IT

IT Services Management. ITIL:Service Design. Martin Sarnovský. Department of Cybernetics and AI, FEI TU Košice

5 Performance Management for Web Services. Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology.

Management Packs for Database

NetComplete Ethernet Performance Management. Ethernet Performance and SLA Monitoring Application

Establishing a business performance management ecosystem.

CA Nimsoft Monitor. Probe Guide for CA ServiceDesk Gateway. casdgtw v2.4 series

CONDIS. IT Service Management and CMDB

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

Eye of the Storm Enterprise Technical Overview

IBM. Tivoli. Netcool Performance Manager. Cisco Class-Based QoS Technology Pack. User Guide. Document Revision R2E1

ITIL. Lifecycle. ITIL Intermediate: Continual Service Improvement. Service Strategy. Service Design. Service Transition

An Oracle White Paper July Oracle Database 12c: Meeting your Performance Objectives with Quality of Service Management

How To Manage Cloud Service Provisioning And Maintenance

CA Nimsoft Monitor. Probe Guide for URL Endpoint Response Monitoring. url_response v4.1 series

Cisco Application Networking for IBM WebSphere

Domain Name Service Service Level Agreement (SLA) Vanderbilt Information Technology Services

Improvised Software Testing Tool

Transcription:

Defining and Monitoring Service Level Agreements for dynamic e-business Alexander Keller, alexk@us.ibm.com Heiko Ludwig, hludwig@us.ibm.com LISA 02 11/07/2002 Philadelphia, PA, USA 2002 IBM Corporation

Why should SysAdmins care about SLAs? How much does it cost you to guarantee a Response Time less than 1 sec.? How much do you bill a Customer for a Throughput of 1000 TAs/sec? How much Revenue is lost per Hour of Downtime of Server X? Express System Resources in Financial Terms What are realistic Thresholds for Response Time/Throughput/Bandwidth? Can you accommodate additional Workload and accept another Customer? How much Workload do SLA Measurements put on Server X? How does this impact your SLAs with other Customers? SysAdmins will become involved in SLA Negotiation (today: Lawyers) If your Systems become overloaded, which Customer will be starved out? Classify Customers according to Revenue SLA Violation may not be due to technical Failure, but Result of Business Decision What s more expensive? A Disk-Crash on a Server or an overloaded Ethernet Segment? Depends on how much the Customer pays whose Data is hosted there! Fix Outages according to Customer Classification (today: Severity of Outage) 2

Real-world SLAs and their Requirements Today: Confined to Availability Availability% := (n - #hours_svc_down)* 100 / n Users being able to establish a TCP Connection to the Server Customer s ability to access the Software Application on the Server if the Server is responding to HTTP Requests issued by monitoring SW BUT: There is no agreed-upon Definition of Availability! What is needed? Define new SLAs on demand (e.g., Grid, Virtual Enterprises, Web Services) Accommodate ANY QoS Parameter Definition and Service Level Go beyond Availability : Response Time, Throughput, Bandwidth Connect to existing Application and Resource Instrumentation Support Customer/Provider Relationships of arbitrary Depth Delegate SLA Monitoring Tasks to Third Parties Address Confidentiality Requirements of the Parties ( Need to know ) Automated Setup of Monitoring Environment based on SLA Definition 3

Terminology: SLA Parameters, Metrics, Functions Business Metrics SLA Parameters Composite Metrics Resource Metrics Mapping Mapping Function Measurement Directive Customer-defined The analyzed SLAs share a common Structure: Involved Parties, SLA Parameters Function Metrics used as Input to compute SLA Parameters The Functions that define how Metrics are aggregated Provider-defined How Metrics are retrieved from Managed Resources (Measurement Directive) 4

Web Service Level Agreement (WSLA) Framework Service Customer Client Application Service Description Service Provider Service SLA Management SLA Primary Parties Supporting Parties Parameters/Metrics Measurement Directives Functions Obligations Response time Throughput Availability SLA Management SLA annotates an existing Service Specification: References Service Description (e.g., Web Services: WSDL) Other Service Descriptions possible, e.g., for Business Processes, Messaging, IT Resources XML Schema based Language for SLAs, Runtime Architecture comprising several SLA Monitoring Services 5

The WSLA Services: Atomic Building Blocks Establishment & Deployment Services Supports negotiation and authoring of SLAs Deploys the relevant (!) Parts of the SLA to the different Parties E.g., multiple Measurement Services may not see each other Measurement Service Probes and measures Resource Metrics according to SLA Specification and aggregates them into SLA Parameters Condition Evaluation Service Compares SLA Parameters obtained from Measurement Service against specified Service Levels Notifies the involved Parties that a Violation has occurred during a valid Time Period Management Service & Business Entity (not yet supported) Carries out corrective Actions, provided they do not violate Business Policies Access to - proprietary - Tuning Controls and Configuration Parameters of managed Resources often not available, Must be checked against Business Policies embodied by Business Entity 6

SLA Lifecycle in the WSLA Architecture Service Customer Establishment 1. Negotiate / Sign SLA references Web Service WSDL Servlet Engine 5. Terminate Service Provider Admin Console AppServer Monitoring/Management Interfaces SLA Compliance Monitor Measurement Deployment 2. Deploy Condition Evaluation 3. Report 4. Act Management Business Entity 7

Delegating SLA Monitoring Tasks to Third Parties XInc ACMEProvider Management Violation Notifications Violation Notifications Management ZAuditing Condition Evaluation Measurement Aggregate Response Time, Throughput Measurement Client Application YMeasurement Service Operation Availability Probe Response Time, Operation Counter Offered Service Measurement Service Providers guarantee Accuracy and Objectivity (e.g., Keynote Systems) 8

Typical Structure and Elements of an SLA Parties: Signatory Parties Supporting Parties Service Description: Service Operations Bindings SLA Parameters Metrics Measurement Directives Functions Schedule Obligations: Validity Period Predicate Actions Involved Parties: IDs and Interfaces of Signatory Parties IDs and Interfaces of Supporting Parties Service Characteristics & Parameters: Operations offered by Service Transport encoding for Messages Agreed-upon SLA Parameters (Output) Metrics used as Input How/where to access Input Metrics Measurement Algorithm Measurement Duration, Sampling Rate Guarantees & Constraints: When is SLA Parameter guaranteed? How to detect Violation (Formula) Corrective Actions to be carried out 9

SLA Structure Example: Service Throughput Parties: Signatory Parties Supporting Parties Service Description: Service Operations Bindings SLA Parameters Metrics Measurement Directives Functions Schedule Obligations: Validity Period Predicate Actions Involved Parties: customer.com, provider.com msp.com, keynote.com, Service Characteristics & Parameters: StockQuoteService:GetQuote() SOAPGetQuote average throughput of service #Requests(svc) www.msp.com/getmetric?requests(svc) AVG(#Requests(svc)) over 24 hours, every 60 minutes Guarantees & Constraints: weekdays, 9am-5pm > 1.000 TA/second open TT, pay penalty/premium 10

Example: Defining SLOs with Constraints Why define Constraints for Service Level Objectives? If your hosted Site becomes too popular and creates excessive Load, your Throughput SLO may be impossible to fulfill Service Provider needs to protect himself against this Situation SLOs are defined for regular Workloads BUT: What is a regular Workload? Needs to be defined within the SLA! Example: If the System Load is over 80% for more than 30% of the Time, the Obligation of a Service Provider to guarantee 1000 TAs/sec is waived 2 SLA Parameters: AvgThroughput: Average Throughput for TAs; must be > 1000 TA/sec OverloadPct: The % amount of time System Utilization is => 80% Both Parameters are measured every 5 Minutes on an hourly Basis In an SLO, OverloadPct is used as Precondition for AvgThroughput 11

Example (cont.) How many TimeSeries Elements > Threshold? New Value put in Time Series How obtained? IBM T.J. Watson Research Center SLAParameter OverloadPct has defined by Metric PercentOverUtilized Function PercentageGreaterThanThreshold defined by Metric UtilizationTimeSeries Function TimeSeriesConstructor defined by Metric ProbedUtilization Measurement Directive Probe: acme.com/systemutil ServiceObject WSDL:getQuote Assign to SLA Parameter has SLAParameter AvgThroughput defined by Metric AvgThroughput Function Average defined by Metric ThroughputTimeSeries Function TimeSeriesConstructor defined by Metric Throughput Function Divide defined by defined by Metric Transactions Measurement Directive Read: TXcount Metric TimeSpent Measurement Directive Read: Timecount 12

Defining SLA Parameters and Metrics: Assignment of Metric to SLA Parameter Who Communicates with whom? And how? <SLAParameter name="overloadpct" type="float" unit="percentage"> <Metric>OverLoadPct</Metric> <Communication> <Source>YMeasurement</Source> <Pull>ZAuditing</Pull> <Push>ZAuditing</Push> </Communication> </SLAParameter> Define the Metric: How many Values (in %) of a Utilization Time Series are over a Threshold of 80%? <Metric name="overloadpct" type="float" unit="percentage"> <Source>YMeasurement</Source> <Function xsi:type="pctgtthreshold" resulttype="float"> <Schedule>BusinessDay</Schedule> <Metric>UtilizationTimeSeries</Metric> <Value> <LongScalar>0.8</LongScalar> </Value> </Function> </Metric> Create the Time Series: - probe every 5 Minutes - keep the last 12 Values 13 <Metric name="utilizationtimeseries" type="ts" unit=""> <Source>YMeasurement</Source> <Function xsi:type="tsconstructor" resulttype="float"> <Schedule>Every5Minutes</Schedule> <Metric>ProbedUtilization</Metric> <Window>12</Window> </Function> LISA 02 </Metric> Philadelphia, PA, USA 07/11/2002 2002 IBM Corporation

SLOs in the WSLA Language: ACMEProvider guarantees the SLO The SLO is valid for 1 Day Time Format: RFC 3060 Precondition: OverloadPercentage < 30% Guarantee: Average Throughput > 1000 Send NewValue Event to registered Parties whenever Guarantee is broken 14 <ServiceLevelObjective name= SLO_for_AvgThroughput"> <Obliged>ACMEProvider</Obliged> <Validity> <Start>2001-11-30T14:00:00.000-05:00</Start> <End>2001-12-31T14:00:00.000-05:00</End> </Validity> <Expression> <Implies> <Expression> <Predicate xsi:type="less"> <SLAParameter>OverloadPct</SLAParameter> <Value>0.3</Value> </Predicate> </Expression> <Expression> <Predicate xsi:type= Greater"> <SLAParameter>AvgThroughput</SLAParameter> <Value>1000</Value> </Predicate> </Expression> </Implies> </Expression> <EvaluationEvent>NewValue</EvaluationEvent>... </ServiceLevelObjective>

Conclusions and Outlook WSLA supports: Flexible Specification of inter- and intra-organizational SLA Parameters Highly customizable Service and IT Resource-Level SLOs Nested Customer/Provider Relationships Definition of third ( supporting ) Parties in SLA Management Formal, XML-Schema based Description Language Applicable to various Kinds of Services (Web Services, Storage, eutilities etc.) SLA Compliance Monitor Implementation Part of IBM Web Services Toolkit Current Work: Comprehensive SLA Framework, comprising: Business Metrics and Pricing, Business Processes, Workflow and Service Composition, SLA Editing and Reuse of common SLA Artifacts, Integration with existing Management Frameworks (WBEM / CIM) 15

Let us know what you think! Download WSTK 3.2 with SLA Compliance Monitor from: http://www.alphaworks.ibm.com/tech/webservicestoolkit LISA 02 11/07/2002 Philadelphia, PA, USA 2002 IBM Corporation