An Enterprise Dynamic Thresholding System

Similar documents
VMware Virtualization and Cloud Management Overview VMware Inc. All rights reserved

Virtualization Technologies. Embrace the new world of healthcare

BMC ProactiveNet Performance Management: Delivering on the Promise of Predictive Control Across the Total IT Environment SOLUTION WHITE PAPER

vrealize Operations Manager User Guide

VMware's Cloud Management Platform Simplifies and Automates Operations of Heterogeneous Environments and Hybrid Clouds

Session 1: Managing Software Licenses in Virtual Environments. Paul Baguley, Principal, Advisory Services KPMG

Splunk for VMware Virtualization. Marco Bizzantino Vmug - 05/10/2011

vsphere with Operations Management (vsom) and vcenter Operations (vcops)

1. Simulation of load balancing in a cloud computing environment using OMNET

Goliath Performance Monitor Prerequisites v11.6

SolarWinds Virtualization Manager

CA Virtual Assurance for Infrastructure Managers

7/15/2011. Monitoring and Managing VDI. Monitoring a VDI Deployment. Veeam Monitor. Veeam Monitor

Avoiding Performance Bottlenecks in Hyper-V

Citrix XenDesktop & XenApp

VMware vcenter Operations Manager for Horizon Supplement

simplify monitoring Environment Prerequisites for Installation Simplify Monitoring 11.4 (v11.4) Document Date: January

Optimization of QoS for Cloud-Based Services through Elasticity and Network Awareness

Comprehensive Monitoring of VMware vsphere ESX & ESXi Environments

Proactive VDI Performance Monitoring

Course Title: Virtualization Security, 1st Edition

The New Virtualization Management. Five Best Practices

ANALYTICS IN BIG DATA ERA

Stop the Finger-Pointing: Managing Tier 1 Applications with VMware vcenter Operations Management Suite

SolarWinds Virtualization Manager

End to End Security do Endpoint ao Datacenter

Boost your VDI Confidence with Monitoring and Load Testing

Asigra Cloud Backup V13.0 Provides Comprehensive Virtual Machine Data Protection Including Replication

Virtual Infrastructure Security

GRAVITYZONE HERE. Deployment Guide VLE Environment

How To Protect A Virtual Desktop From Attack

CA Virtual Assurance/ Systems Performance for IM r12 DACHSUG 2011

HP Business Service Management 9.2 and

How to Select a Virtualization Management Tool

Black-box Performance Models for Virtualized Web. Danilo Ardagna, Mara Tanelli, Marco Lovera, Li Zhang

How to Migrate Citrix XenApp to VMware Horizon 6 TECHNICAL WHITE PAPER

VMware vcenter Operations Manager Administration Guide

Research Challenges in Virtualization. Steven Hand Senior Architect, Citrix R&D Reader in Computer Systems, U. Cambridge

VMsources Group Inc

Integrated Data Protection for VMware infrastructure

Citrix EdgeSight User s Guide. Citrix EdgeSight for Endpoints 5.4 Citrix EdgeSight for XenApp 5.4

VDI FIT and VDI UX: Composite Metrics Track Good, Fair, Poor Desktop Performance

TRIPWIRE REMOTE OPERATIONS: STOP OPERATING, START ANALYZING

VDI Without Compromise with SimpliVity OmniStack and Citrix XenDesktop

Aternity Virtual Desktop Monitoring. Complete Visibility Ensures Successful VDI Outcomes

VMware vcenter Operations Manager Enterprise Administration Guide

CA NSM System Monitoring Option for OpenVMS r3.2

Work Smarter, Not Harder: Leveraging IT Analytics to Simplify Operations and Improve the Customer Experience

agility made possible

Vulnerability Management

Server Load Prediction

Citrix XenApp & XenDesktop Troubleshooting Engagement Report

Our Cloud Backup Solution Provides Comprehensive Virtual Machine Data Protection Including Replication

A Technical Overview of VMT s Architecture: Virtual Infrastructure Management. Key Architecture Components

EBS. Remote Infrastructure Managed Services. EBS Ltd. 12, Mihail Tenev Str Sofia Bulgaria office@ebs.bg

CA Virtual Assurance for Infrastructure Managers

ESXi Cluster Services - SLA

PHD Virtual Backup for Hyper-V

APS Connect Denver, CO

HP Service Health Analyzer: Decoding the DNA of IT performance problems

Aternity Desktop and Application Virtualization Monitoring. Complete Visibility Ensures Successful Outcomes

In-Guest Monitoring With Microsoft System Center

Unlimited Server 24/7/365 Support

Executive Summary: Cost Savings with ShutdownPlus Rolling Restart

Profit from Big Data flow. Hospital Revenue Leakage: Minimizing missing charges in hospital systems

Cisco Unified Computing Remote Management Services

Power Management in Cloud Computing using Green Algorithm. -Kushal Mehta COP 6087 University of Central Florida

Strategic Online Advertising: Modeling Internet User Behavior with

Water Network Monitoring: A New Approach to Managing and Sustaining Water Distribution Infrastructure

Application Lifecycle Management White Paper. Source Code Management Best Practice: Applying Economic Logic to Migration ALM

Benchmarking Hadoop & HBase on Violin

High Availability for Citrix XenServer

SIMPLIFYING AND AUTOMATING MANAGEMENT ACROSS VIRTUALIZED/CLOUD-BASED INFRASTRUCTURES

Genetic Algorithms for Energy Efficient Virtualized Data Centers

Proactive and Predictive Virtualization Management Optimizes Datacenter Availability and Utilization

OpsLogix Capacity Reports Management Pack White Paper

Monitoring applications in multitier environment. Uroš Majcen A New View on Application Management.

Virtual Client Solution: Desktop Virtualization

CA NSM System Monitoring. Option for OpenVMS r3.2. Benefits. The CA Advantage. Overview

User-Centric Proactive IT Management

Double guard: Detecting Interruptions in N- Tier Web Applications

BPPM 9.5 Architecture & Scalability Best Practices 2/20/2014 version 1.4

Virtual Desktop Infrastructure Optimization with SysTrack Monitoring Tools and Login VSI Testing Tools

To add Citrix XenApp Client Setup for home PC/Office using the 32bit Windows client.

Transcription:

An Enterprise Dynamic Thresholding System Mazda Marvasti, Arnak Poghosyan, Ashot Harutyunyan, and Naira Grigoryan Presented by: Bob Patten, VMware 2013 2011 VMware Inc. All rights reserved

Agenda Modern IT management challenges The data agnostic method for anomaly detection An enterprise dynamic thresholding system Data categorization approach Category-specific dynamic threshold determination Experimental insights Real-World Customer Use Case 2

Modern IT management challenges Management based on operator domain expertise is no longer efficient Huge distributed cloud systems, virtualized environments Complicated interrelation between the constituent components Business infrastructures are highly dynamic behaviors do not fit classical Gaussian normal distributions Static thresholding of processes and performance indicators become inadequate yielding thousands of un-actionable alerts Manually developed and maintained correlation rules yield no significant benefit to problem identification 3

The data agnostic method for anomaly detection Automatically learns the normal behavior of any time-series metric Makes no assumptions as to the metrics behavior or distribution Calculates an upper and lower bound hourly dynamic threshold Determines normal or abnormal behavior (anomalies) of individual metrics When metrics behave abnormally, additional algorithms and deterministic methods can be applied to determine system abnormality VMware s vcenter Operations Manager (vc Ops) is an industry-leading Big Data solution for IT Management which utilizes the described enterprise dynamic thresholding algorithms 4

An enterprise dynamic thresholding system The monitoring and alerting based on data analysis behind vc Ops 5

Example metric dynamic threshold Weekend/Weekday repeating pattern of normal behavior Resulting Dynamic Thresholds 6

Data categorization approach 7

Data categorization approach: examples Trendy Multinomial 8

Data categorization approach Sparse Regular/Periodic 9

Category-specific DT determination: sparse data Performing data density recognition based on probability calculation that reveals distribution of gaps Random? Uniform? Pattern? Differentiating the following clusters of data: Data Identification: Dense/Sparse (relative to monitoring interval) Data with technical gap (localized gap due to malfunction of monitoring device) Corrupted Data 10

Category-specific DT determination: stable data Statistical stability recognition of data If data is stable or its stable portion can be selected then the data is defined as Stable Data Otherwise data is defined as Corrupted 11

Category-specific DT determination: variability R = iqr( x k N 1 k=1 ) 100%, iqr x N iqr( x k k=1 N ) k k=1 0 12

Similarities (%) Category-specific DT determination: periodicity Periodic data: seeking similar patterns in the historical behavior of time series The notion of the Cyclochart is similar to the frequency spectrum in the Fourier analysis or signal processing 100 90 80 70 60 50 40 30 20 0 5 10 1 5 20 2 5 Periods (Days) 30 35 13

Category-specific DT determination: optimization Statistically trade-off the number of false positive and false negative alerts Two different approaches for determination of DT s via maximization of the objective function Data-range-based analysis g P, S = e ap S S max Data-variability-based analysis 14

Experimental insights A specific customer metric data set Selected 3215 monitored metrics Those metrics represented the essential flows for one of the customer s critical business services Data length is one month Ran metrics through Dynamic Thresholding analytics process Resulting count of periodic/non-periodic/corrupted data Periodic Non- Periodic Corrupted Overall 1511 1595 109 3215 15

Experimental insights Distribution across the categories Data Category Count (Percentage) of Metrics in the Category Multinomial 724 (22.5%) Trendy 165 (5.1%) Semi-Constant 532 (16.5%) Transient 102 (3.2%) Sparse 88 (2.7%) Low-Variability 826 (25.7%) High-Variability 669 (20.8%) Corrupted 109 (3.4%) 16

A Production Use Case 4 Hour Proactive Notification Production Scenario Citrix Xen Desktop Remote Desktop Environment on Virtual Infrastructure Multiple XenApp Server VMs serve the end-users Remote Desktops Monday morning, March 24 th, significant abnormal behavior XenApp VM owner (Citrix Admin) called at 8:00 AM, returned call at 10:00 AM Initial evaluation by Citrix admin is All OK, end users are not complaining Subsequent investigation yielded a call-back and thank you to Operations A config change in the Citrix env over the weekend was causing orphaned sessions Citrix Admin fixed the error and cleaned up the sessions If Operations had not proactively notified Citrix Admin, end users would have been seriously impacted 17

A Production Use Case XenApp Server Abnormal Behavior 18

A Production Use Case XenApp Server Abnormal Behavior 19

Conclusions Our categorization techniques allow achieving a more accurate Dynamic Threshold for the individual metric By using optimization techniques we achieve optimal balance between false positive and false negative alerts This would not be possible with classical parametric approaches including Fourier transform, and other common purpose enterprise algorithms Moreover, this approach enables other algorithms to be applied to determine system issues with more accuracy. 20