Welcome to today's webinar: How to Transform RMF & SMF into Availability Intelligence





Session Abstract: How to Transform RMF & SMF into Availability Intelligence
It is time for a new, more intelligent approach to interpreting RMF & SMF data, one that provides a dramatically different result that you can easily verify on your own data. RMF & SMF produce the world's richest source of machine-generated data about enterprise infrastructure performance and configuration. But even the best-run shops are not able to use this data to avoid incidents that cause unavailability. To outsmart unavailability, you have to automatically crawl through all the workload data every day at a very granular level. This data needs to be enriched and constantly evaluated against detailed expert knowledge about the infrastructure. Statistical analysis (the primary method in other new analytics solutions) is not enough. By using expert knowledge in this kind of process, you can see, for the first time, the risk to your infrastructure's ability to handle your peak workloads, and how that risk is changing over time. This new visibility gives you warning before your online monitors can even detect any disruption to service levels.

Availability on z/OS Systems
What does the "z" stand for? Zero downtime.
What is your availability? z/OS vs. end-user experience.

z/OS Infrastructure Areas
Many are necessary for availability: processor, WLM goals, etc.; channels; Coupling Facility; XCF; FICON; disk storage; replication / DR; tape / virtual tape storage.

Incidents Leading to Application Unavailability
[Chart: incidents split into a predictable and an unpredictable portion]
Response for unpredictable incidents: find the problem earlier.
Response for predictable incidents: avoid the incident with proactive action.
Accelerate the problem fix.

Increasing the Predictable Portion
[Chart: predictable vs. unpredictable incidents]
What would be the impact on:
1. Your IT staff?
2. Your employees?
3. Your customers?

Seeing Threats to Continuous Availability
Question: Which has better intelligence to avoid outages: a 20-thousand-dollar automobile or a 20-million-dollar mainframe?

IT Infrastructure Availability Monitoring Today
[Chart: SLA performance vs. time, showing the response time curve]
Your existing monitors look at symptoms here, only after users experience problems.
Response time is easy to get, but it is an effect, not a cause.
IntelliMagic 2014

Monitoring with Availability Intelligence
[Chart: SLA performance vs. time, showing response time and sub-component saturation curves]
Availability Intelligence identifies risk here, before response time suffers.
Response time: easy to get, but it is an effect, not a cause.
Sub-component saturation: requires evaluating every data point with expert domain knowledge about every component.

Changing the Outcome: Avoiding Disruptions
[Chart: SLA performance vs. time, showing response time and sub-component saturation curves]
Most infrastructure fires can be prevented by intervening here.

Maintaining IT Availability Today: Two States
[Diagram: focus level (little, free, full) against brain state (engaged, panic, disengaged)]

With Availability Intelligence: A New 3rd State
[Diagram: focus level (little, free, full) against brain state (engaged, panic, disengaged)]

What is Availability Intelligence?
What: Foreknowledge about hidden threats to availability.
Why: To better protect continuous availability at the primary site by
1. Avoiding incidents (make more of them predictable)
2. Accelerating the resolution (reduce MTTR)
How: Use built-in expert domain knowledge in automatic analysis of the performance and configuration data.

Expert Knowledge & How to Use It
For Availability Intelligence, it is not enough to have easier, nicer graphs or statistical analysis (as is common with IT Operations Analytics).
Instead, it requires:
Detailed knowledge about the specific hardware components in use.
Best practices to configure and manage infrastructure components.
New, meaningful metrics calculated out of the raw data.
Good or bad? How to assess and rate the risk in the infrastructure.
How to visualize the risk and problems in the infrastructure.

Example: Foreknowledge of Hidden Threats Inside the Storage Arrays
[Diagram: lead measures within an array and between arrays, leading to the lag measure]
Lead measures (within array, between arrays): application workloads (imbalance?); config or failure (changes?); adapter utilization; FICON errors; disk device loads; FW bypass, etc. (front-end, back-end, cache).
Lag measure: storage array response times.

7 Key Areas to Apply Expert Knowledge to SMF/RMF
[Diagram: machine-generated data plus domain knowledge and expertise pass through seven automated steps to produce Availability Intelligence]
Steps: 1. Collect 2. Normalize 3. Enrich 4. Assess 5. Rate 6. Recommend 7. Visualize
Infrastructure knowledge and expertise about HW/SW is applied in each step.
Benefits: 1. Avoid incidents 2. Accelerate fixes
Sample actions: rebalance work, fix lost redundancy, isolate change, correct error, hardware upgrade.

Automating the Application of Expert Knowledge
Assessing risk every interval, for every device, in every data center.
Automated application of expert knowledge to the data, using all 7 areas, is the only way to continually execute the ITIL v3 definition of Capacity Management:
"The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner. Capacity Management considers all Resources required to deliver the IT Service..."

IntelliMagic
Industry leadership in Availability Intelligence solutions: provides new visibility of threats to continuous availability, using built-in expert knowledge to interpret the data.
More than 20 years of solutions for deep infrastructure analysis.
Privately held, financially independent.
Customer centric, responsive.
Solutions used daily in some of the world's largest data centers.

IntelliMagic Vision for z/OS: 3 Modules
1. z/OS Systems: processors, WLM, Coupling Facility, XCF, jobs/datasets.
2. z/OS Disk: supports every disk vendor and configuration; FICON, replication, jobs, datasets, storage groups, GDPS.
3. z/OS Tape/Virtual Tape: IBM TS7700, Oracle StorageTek VSM. Next year: EMC DLm.

Availability Intelligence: a Good Fit for SaaS
Frequently updated hardware knowledge.
Very quick time to results (~24 hours).
Okay for security: no PII in infrastructure measurement data.
Easy dissemination of intelligence reports.
Easy access to expert consultants.

Data Center Rollups of Key Risk Indicators
[Screenshot: Disk Storage Systems dashboard of performance metrics, with callouts "Highest Rating for this Dashboard" and "Key Risk Indicators"]
Consolidate individual ratings on infrastructure resources into data center views to see risk across the enterprise at a glance.
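As a rough illustration of the rollup idea, the sketch below consolidates per-resource ratings into one value per data center by taking the highest rating, in the spirit of the "Highest Rating for this Dashboard" callout. The record layout, the names `ResourceRating` and `rollup_by_data_center`, and the sample values are assumptions made for illustration; this is not IntelliMagic's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ResourceRating:
    data_center: str   # hypothetical grouping key
    resource: str      # e.g. a storage array metric
    rating: float      # assumed convention: higher means more risk


def rollup_by_data_center(ratings: Iterable[ResourceRating]) -> Dict[str, float]:
    """Return the highest (worst) rating seen in each data center."""
    worst: Dict[str, float] = {}
    for r in ratings:
        worst[r.data_center] = max(worst.get(r.data_center, 0.0), r.rating)
    return worst


# Made-up sample data, loosely modeled on the ratings shown in the slides.
sample = [
    ResourceRating("DC1", "array-1 response time", 0.00),
    ResourceRating("DC1", "array-1 drive read response", 0.49),
    ResourceRating("DC2", "array-7 response time", 0.00),
]
print(rollup_by_data_center(sample))
```

Taking the maximum rather than an average keeps a single at-risk resource from being hidden by many healthy ones, which matches the goal of seeing risk at a glance.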

Visualizing Risk to Continuous Availability
No border: no rating.
Green border: good.
Yellow border: early warning.
Red border: performance exceptions.
What does the data mean for your infrastructure availability? Automatic rating of key metrics according to built-in expert knowledge provides intelligence about threats, which you can use to protect availability.
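A minimal sketch of how a numeric rating might be mapped to the border colors named on this slide. The slides do not state the rating scale, so the two cutoffs below are purely hypothetical, as is the function name `border_color`.

```python
from typing import Optional

# Hypothetical cutoffs; the slides do not give the real rating scale.
WARNING_ABOVE = 0.1
EXCEPTION_ABOVE = 0.3


def border_color(rating: Optional[float]) -> str:
    """Map a rating to one of the border colors listed on the slide."""
    if rating is None:
        return "none"      # no border: the metric is not rated
    if rating > EXCEPTION_ABOVE:
        return "red"       # performance exception
    if rating > WARNING_ABOVE:
        return "yellow"    # early warning
    return "green"         # good


for value in (None, 0.0, 0.2, 0.5):
    print(value, "->", border_color(value))
```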

Rating the Risk Using Expert Domain Knowledge
Based on straight thresholds where appropriate (like hardware limits).
Based on dynamic thresholds where the limits also depend on workload characteristics.
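The difference between the two kinds of threshold can be sketched as follows. The straight threshold compares a value against a fixed limit. The dynamic threshold here is derived from one workload characteristic, the share of reads that miss cache. That choice of characteristic, the service times `hit_ms` and `miss_ms`, and all function names are assumptions for illustration, not figures or logic from the product.

```python
def exceeds_static(value: float, hardware_limit: float) -> bool:
    """Straight threshold: compare against a fixed limit such as a hardware maximum."""
    return value > hardware_limit


def dynamic_response_threshold_ms(read_miss_fraction: float,
                                  hit_ms: float = 0.3,
                                  miss_ms: float = 6.0) -> float:
    """Dynamic threshold: the acceptable response time grows with the cache-miss share.

    hit_ms and miss_ms are made-up service times, not vendor figures.
    """
    if not 0.0 <= read_miss_fraction <= 1.0:
        raise ValueError("read_miss_fraction must be between 0 and 1")
    return (1.0 - read_miss_fraction) * hit_ms + read_miss_fraction * miss_ms


def exceeds_dynamic(response_ms: float, read_miss_fraction: float) -> bool:
    """True when the measured response time is above the workload-dependent limit."""
    return response_ms > dynamic_response_threshold_ms(read_miss_fraction)


# The same 2.0 ms measurement is judged differently for two workloads.
print(exceeds_dynamic(2.0, read_miss_fraction=0.1))  # cache-friendly workload
print(exceeds_dynamic(2.0, read_miss_fraction=0.5))  # cache-unfriendly workload
```

With these assumed numbers, 2.0 ms is above the limit for the cache-friendly workload and below it for the cache-unfriendly one, which is the point of making the limit depend on the workload.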

DASD Infrastructure Example: Avoiding Disruption to Production Service Levels

Disk Storage System Dashboard [rating: 0.49]
Rating based on DSS data using DSS thresholds.
Response time on the first storage array is rated green: there is no discernible problem for end users yet. But a threat to availability exists in an underlying metric (back-end disk drive read response time).

Response Time (ms) [rating: 0.00]
Rating based on DSS data using DSS thresholds.
Response time is a lag measure, but seeing it plotted against the dynamic thresholds (grey backgrounds) is useful for getting an idea of what can be expected for that type of workload on that particular array configuration.

Breakdown of Response Time Components (ms)
Breaking response time down into its components allows identification of the largest contributors.
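A small sketch of this kind of breakdown, using the four components that appear on a later chart in this deck (IOSQ, Pending, Connect, Disconnect). The function name `largest_contributor` and the interval values are made up for illustration.

```python
from typing import Dict, Tuple

COMPONENTS = ("IOSQ", "Pending", "Connect", "Disconnect")


def largest_contributor(components_ms: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (name, milliseconds, share of total) for the biggest component."""
    total = sum(components_ms[name] for name in COMPONENTS)
    if total <= 0:
        raise ValueError("total response time must be positive")
    name = max(COMPONENTS, key=lambda n: components_ms[n])
    return name, components_ms[name], components_ms[name] / total


# Made-up values for one measurement interval, in milliseconds.
interval = {"IOSQ": 0.1, "Pending": 0.2, "Connect": 0.7, "Disconnect": 1.0}
name, ms, share = largest_contributor(interval)
print(f"{name}: {ms:.2f} ms ({share:.0%} of response time)")
```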

Disconnect (ms) [rating: 0.00]
Rating based on DSS data using DSS thresholds.
Overall, disconnect time is not yet out of range for this array.

Disconnect Time Components (ms)
Built-in knowledge enables a further breakdown of disconnect time into its components.

Drive Read Response (ms) [rating: 0.49]
Rating based on DSS data using DSS thresholds.
What was identified on the exception report is a deeper issue: back-end drives are starting to become saturated. With minimal workload growth, this will soon show up in response time and impact production users.

Cost-Effective Remediation Example: Holistic Evaluation (CPU vs. I/O)

Using and Delay Components per Service Class (%) (top 20)
[Chart: using and delay components for all service classes, by service class]
Faster job execution is required. Question: for the selected service class(es), is it cheaper to obtain the needed performance win with upgraded CPU or upgraded storage?

Approx. 65% of Time is Using/Waiting on DASD
[Chart: Average Response Time Components for Entire Subsystem; y-axis in ms (0 to 4), x-axis time of day (0:30 to 2:30); series: IOSQ, Pending, Connect, Disconnect]
Is the time spent waiting on DASD already the best in class, or is there room for improvement?

Comparing Options for Run Time Improvement
Results of modeling (seconds):

Option               CPU Using   CPU Delay   DASD Using & Delay   Total Seconds   Run Time Savings
Before                    1196        1523                 3915            6634   n/a
1. CPU Upgrade             416         265                 3915            4596   15%
2. Storage Upgrade        1196        1523                 1027            3746   44%

Option 1 upgrades the CPU to the best available; option 2 upgrades storage to the next generation.

Conclusion
Availability Intelligence uses expert knowledge in interpretation of the data.
It offers new protection of continuous availability at the primary site to:
1. Avoid service disruptions
2. Accelerate fixes
Fast and easy to prove at your site with a low-commitment contract for IntelliMagic Vision as a Service.
"Any sufficiently advanced technology is indistinguishable from magic." Arthur C. Clarke, 1962

Join us in San Antonio for the 2015 CMG Conference! Save the dates: November 2nd to 5th at The St. Anthony in downtown San Antonio, 3 blocks from both the Alamo and the Riverwalk.