White Paper. NEC Invariant Analyzer with Oracle Enterprise Manager


Table of Contents
Introduction
Proactive Performance Analysis
Overview of Oracle Enterprise Manager
  Oracle Enterprise Manager Cloud Control Framework
Overview of NEC Invariant Analyzer
Benefits of Using NEC Invariant Analyzer
Proof of Concept (PoC): NEC Invariant Analyzer with Oracle Enterprise Manager
  Set Up
  Configurations
  Findings
    Detection of Unrecognized Anomaly
    Isolating Performance Bottleneck Analysis
Conclusion

NEC Corporation 2012

Introduction

This white paper documents a Proof of Concept (PoC) that used NEC Invariant Analyzer (IA), a behavior learning analytics-based technology, with Oracle Enterprise Manager, an application and systems management solution, to monitor the performance of a large IT deployment. In the PoC, Invariant Analyzer detected performance anomalies and assisted in root cause analysis. This document also describes how to set up NEC Invariant Analyzer with Oracle Enterprise Manager.

The target system was a collaboration application deployed enterprise wide, supporting tens of thousands of users worldwide with access to email, calendar and file sharing. NEC Invariant Analyzer used performance data that Oracle Enterprise Manager collected from the target system, which included 16 database servers and 98 storage devices. In total, 30,000 metrics were collected. During this PoC, NEC Invariant Analyzer identified anomalies and suspected performance bottlenecks. These results augment conventional threshold monitoring and point systems and application administrators to potential hot spots, so that they can make adjustments and fine-tune the thresholds in their enterprise monitoring tools.

Today's IT applications support mission-critical operations or customer-facing, revenue-generating business processes for many enterprises. This is an evolution from previous implementations, where applications were perhaps less complex and housed in a single or a few physical servers located in the same data center. Today, with the adoption of SOA, on-premise or cloud-based applications representing critical business processes are composite and can span many systems, over several data centers, with some possibly provided by external parties such as service providers.
In these architectures, conventional performance management methods based on setting thresholds for key IT metrics such as CPU utilization, I/O statistics, or network traffic offer a limited view into the overall health of the application and into how well the application is supporting the underlying business processes. It is often a challenge for IT professionals to proactively address performance issues if their tools fail to provide accurate guidance on early detection of pending performance issues and the possible root cause of failures.

Proactive Performance Analysis

In increasingly complex IT environments, using advanced system management solutions such as Oracle Enterprise Manager is key to ensuring stable operation and minimizing application performance degradation and system downtime. However, even in a well-managed and well-maintained environment, gradual degradation of service levels, as opposed to sudden disruption of services, is difficult to detect by conventional threshold monitoring. Often, a system that is actively being monitored yields no helpful error messages to warn that an underlying system is being taxed and is about to exceed the threshold set by the administrators. Log files and performance metrics often fail to show that the systems are about to fault, because some types of anomalies can occur without warning and can go undetected. Systems administrators are caught off guard by these undetected anomalies, which, if unaddressed, can cause serious system disruptions or affect the user experience. Oracle Enterprise Manager can help system administrators identify problems that arise from the systems being managed and surface the root cause of those problems through the Incident Manager console. In tight economic times where IT budgets are stretched, the ratio of systems being managed to the number of systems administrators has been increasing steadily.
Systems administrators need tools to automate the detection of problems across a broad set of IT assets, from infrastructure such as servers, networking, and storage to platforms such as applications, middleware and databases. Manually monitoring standalone metrics data and failing to leverage automation of performance monitoring and analysis can lead to inefficient use of IT resources. With the adoption of cloud computing, IT systems formerly deployed in isolated silos are now networked together to form shared services such as Infrastructure as a Service (IaaS) or Platform as a Service (PaaS). As the adoption of IaaS and PaaS becomes more prevalent, management suites like Oracle Enterprise Manager can provide a holistic view of the entire application and infrastructure stack. Visibility into the dependencies, and overall management of the entire stack, become imperative. Separate application management and infrastructure management solutions are a challenge for IT professionals, both for implementation and for ongoing management. Having one complete solution such as Oracle Enterprise Manager provides application and database administrators with a set of common workflows such as provisioning assets, tracking configuration changes, applying systems and software updates, implementing cloud management such as metering and chargeback, and dynamic policy-based resource allocation across the shared services. As a result, although the task of managing multiple domains can be daunting, system administrators are better equipped to manage the overall system and application stack effectively.

Overview of Oracle Enterprise Manager

Oracle Enterprise Manager enhances the value of business applications by providing a centralized systems and application management solution to reduce IT silos and provide a business approach to IT Management. Three important ways in which Oracle Enterprise Manager accomplishes its mission of providing business context for IT are:
- A complete cloud lifecycle management solution which allows customers to quickly set up, manage and support enterprise clouds and traditional Oracle IT environments, from application to disk
- Maximum return on IT management investment through intelligent management of the Oracle stack and engineered systems, with real-time integration of Oracle's knowledgebase with each customer environment
- Unsurpassed service levels for traditional and cloud applications through business-driven application management

In addition, connectivity between the Oracle Management Server (OMS) and My Oracle Support allows customers to get more timely information about health checks and updates and to work more effectively with My Oracle Support.

Oracle Enterprise Manager Cloud Control Framework

The architecture of Oracle Enterprise Manager Cloud Control consists of three tiers:
1. Repository Database: Oracle Enterprise Manager has a data store, which serves to gather all the data for the targets being monitored.
2. Oracle Management Server (OMS): The OMS is the central part of Oracle Enterprise Manager. All information flows through the OMS. Every data point gathered by the agents and all requests made by the administrator go through this critical tier.
3. Agents: The agent is responsible for data collection and remote task execution.

Overview of NEC Invariant Analyzer

Recognizing the need for more effective, more proactive IT performance management solutions, NEC developed a behavior learning analytics-based technology to facilitate the monitoring and analysis of performance issues.
NEC Invariant Analyzer is the system performance analysis solution delivering such advanced capabilities. Invariant Analyzer looks for consistent, time-invariant correlations between performance metrics of an IT infrastructure during normal operations. For example, there might be a correlation between the input traffic to a web server and the CPU load of an application processing network data from that web server. When input traffic increases, the CPU usage by the application also increases; conversely, when input traffic decreases, the CPU usage by the application also decreases. Such a correlation is always valid (time invariant) under normal system operations, regardless of changes in the workload. Invariant Analyzer detects and uses such consistent (invariant) correlations between the performance metrics.

Figure 1: Oracle Enterprise Manager Cloud Control Architecture
Figure 2: Invariant relationship between two metrics during normal behavior
Figure 3: Broken relationship between two metrics during abnormal behavior
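The traffic and CPU example above can be made concrete with a small sketch. The paper does not state which model form Invariant Analyzer uses, so a plain least-squares line is assumed here purely for illustration; the function names, the `slack` factor, and all sample values are hypothetical.

```python
from statistics import mean

def fit_invariant(xs, ys):
    """Fit y ~ a*x + b by least squares and record the largest training residual."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x is constant; no relationship can be learned")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    max_resid = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    return a, b, max_resid

def is_broken(model, x, y, slack=3.0):
    """True when a new sample misses the forecast by more than
    `slack` times the largest residual seen during training."""
    a, b, max_resid = model
    return abs(y - (a * x + b)) > slack * max_resid

# Hypothetical training data from normal operation:
# requests per second into the web server vs. CPU % of the application.
traffic = [100, 200, 300, 400, 500]
cpu = [12.0, 21.5, 32.0, 41.0, 52.5]
model = fit_invariant(traffic, cpu)

print(is_broken(model, 350, 36.5))  # sample consistent with the learned relation
print(is_broken(model, 150, 55.0))  # CPU far above the forecast for this traffic
```

In the second sample the CPU value of 55% would pass a static 80% threshold, yet it is far from what the learned relation forecasts for that level of traffic, which is the kind of deviation the paper calls a broken invariant.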

NEC's Invariant Analyzer uses performance data collected across multiple IT domains during a period of normal operation and looks for invariant correlations to build a reference model. To build the model, Invariant Analyzer checks for and excludes irrelevant or redundant performance metrics, such as constant values or dropped data. During analysis, performance metrics are monitored against the reference model, and any deviations from the model forecast are identified. As long as the performance metrics data and the reference model forecast are consistent, the system is operating normally. When the performance metrics data deviates significantly from the reference model forecast, the invariant is considered broken, and a large number of broken invariants could indicate a system behavior anomaly. A growing system behavior anomaly can signal a potentially serious impending system failure. This approach makes it possible to detect performance issues even when no error messages are being generated.

In addition, Invariant Analyzer is agnostic to the methods used to collect performance data and to the origin of the data. As long as the performance data is numerical and collected periodically, Invariant Analyzer can detect broken invariant relationships relative to the reference model, thereby detecting system behavior anomalies. Invariant Analyzer is built upon the premise that there are no constraints on the type of hardware or software components. NEC believes any type of performance data from different layers and across IT domains can be used to build the reference model and to monitor performance. The basic method for importing performance data into Invariant Analyzer is to convert the data into CSV files and import them. Invariant Analyzer can also import performance data from generic monitoring tools, such as Windows Performance Monitor or Linux sar, without conversion into CSV files.
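The reference-model workflow described above (exclude unusable metrics, learn pairwise relations from normal operation, then count the relations a new sample breaks) can be sketched as follows. This is a minimal illustration under stated assumptions, not NEC's algorithm: a linear fit stands in for the unspecified model, unordered metric pairs are used, and the `max_rel_error` acceptance rule, metric names, and data are all hypothetical.

```python
from itertools import combinations
from statistics import mean

def usable(series):
    """Keep a metric only if it has no dropped samples and is not constant."""
    return None not in series and len(set(series)) > 1

def fit(xs, ys):
    """Least-squares line y ~ a*x + b and the largest training residual."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    resid = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    return a, b, resid

def build_model(training, max_rel_error=0.05):
    """Return {(m1, m2): (a, b, tol)} for metric pairs that fit well in training."""
    names = sorted(n for n, s in training.items() if usable(s))
    model = {}
    for m1, m2 in combinations(names, 2):
        a, b, resid = fit(training[m1], training[m2])
        # Tolerance: a fraction of the target metric's range during training.
        tol = max_rel_error * (max(training[m2]) - min(training[m2]))
        if resid <= tol:
            model[(m1, m2)] = (a, b, tol)
    return model

def broken_invariants(model, sample):
    """List the pairs whose forecast misses the observed value by more than tol."""
    return [
        (m1, m2)
        for (m1, m2), (a, b, tol) in model.items()
        if abs(sample[m2] - (a * sample[m1] + b)) > tol
    ]

# Hypothetical metrics sampled during normal operation.
training = {
    "read_throughput": [10, 20, 30, 40, 50],
    "read_latency": [1.0, 2.0, 3.1, 4.0, 5.0],
    "cpu": [5, 10, 15, 20, 25],
    "fan_speed": [3, 3, 3, 3, 3],   # constant value: excluded
    "queue": [1, None, 2, 3, 4],    # dropped sample: excluded
}
model = build_model(training)

normal = {"cpu": 15, "read_latency": 3.0, "read_throughput": 30}
abnormal = {"cpu": 15, "read_latency": 4.5, "read_throughput": 30}
broken = broken_invariants(model, abnormal)

print(broken_invariants(model, normal))
print(f"{len(broken)}/{len(model)} invariants broken")
```

The ratio of broken invariants to all invariants serves here as a simple anomaly score; how Invariant Analyzer actually weights or ranks broken invariants is not described in the paper.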
Invariant Analyzer has the capability to perform real-time analysis by importing performance data from NEC's system monitoring tools or third-party systems management tools. With the planned integration with Oracle Enterprise Manager's Connector Framework, Invariant Analyzer can perform real-time performance monitoring and analysis through the real-time connectors in development for Oracle Enterprise Manager.

Figure 4: NEC Invariant Analyzer detects and displays broken invariants

The analysis result shows detected system behavior anomalies and the associated broken invariants. Invariant Analyzer thus informs the system administrator of the occurrence time, severity, and distribution of broken invariants. In addition, Invariant Analyzer can match broken invariant patterns against previously identified patterns, enabling faster troubleshooting of similar failures based on the actions taken to address them in the past.

Benefits of Using NEC Invariant Analyzer

Invariant Analyzer results are independent of an individual's experience, knowledge, or intuition. Even for large-scale systems, Invariant Analyzer automatically detects system behavior anomalies based on a computed comparison with a reference model. This new and innovative approach differs from conventional threshold monitoring.

Proof of Concept (PoC): NEC Invariant Analyzer with Oracle Enterprise Manager

This section describes the Proof of Concept (PoC) project conducted in an Oracle environment using NEC Invariant Analyzer with Oracle Enterprise Manager to monitor the performance of a large IT deployment, detect performance issues and assist in root cause analysis.

Set Up

Comprising 16 database servers and 98 storage devices, the target system was a collaboration application deployed enterprise wide, supporting tens of thousands of users worldwide with access to email, calendar and file sharing. The application is multi-tier and composed of a web tier, a database tier and a storage tier.
Oracle Enterprise Manager is Oracle's application and system management solution for managing IT environments, including Oracle database, middleware, applications, infrastructure, and engineered systems. Oracle Enterprise Manager collects performance metrics from the target application system at predefined intervals throughout the day.

Figure 5: Target System Overview

Performance metrics including throughput, I/O, and latency were extracted from the Oracle Enterprise Manager console that monitors the production environment. Scripts were scheduled for data extraction every three days from the Oracle Enterprise Manager Repository over a period of one month. From this performance data, Invariant Analyzer established a baseline model defining normal operation for that environment.

Configurations

NEC Invariant Analyzer is able to accept various metrics as input data, including the output of the UNIX sar command, the event log of Windows OS, and SNMP data from network devices. Invariant Analyzer is also able to import data files in CSV format. Invariant Analyzer does not currently have the capability to import metrics data directly from Oracle Enterprise Manager, so CSV files are used.

The metrics data for the target system is held in an Oracle Enterprise Manager Repository database. Invariant Analyzer imports the data in two steps. The first step is to extract raw data from the repository using SQL. The second step is to convert the raw data into CSV file format and import the files into Invariant Analyzer.

Figure 6: Data Flow from Oracle Enterprise Manager to NEC Invariant Analyzer

There are approximately 30,000 metrics from database servers (host, database instance, listener, and ASM instance) and storage devices. Invariant Analyzer built an invariant model from a total of 30,000 x 30,000 = 900,000,000 combinations and analyzed abnormal behavior.

Findings

NEC Invariant Analyzer identified suspected anomalies and performance bottlenecks, which are typically difficult to detect through conventional threshold monitoring alone.

Detection of Unrecognized Anomaly

NEC Invariant Analyzer detected anomalies that did not violate existing Oracle Enterprise Manager thresholds. NEC Invariant Analyzer is agnostic to the monitoring thresholds generally used by monitoring solutions such as Oracle Enterprise Manager. Therefore, Invariant Analyzer can recognize broken invariants that do not correspond to any threshold violation. In this first case, IA detected anomalies that would not normally register as alarms in Oracle Enterprise Manager.

Figure 7: Anomaly graph in Invariant Analyzer shows anomalies during abnormal period
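The second step of the two-step import described under Configurations, converting extracted rows into CSV, might look like the sketch below. The paper shows neither the extraction SQL nor the CSV layout that Invariant Analyzer expects, so the row shape (timestamp, target, metric, value), the one-column-per-metric layout, the timestamps, and the metric names are all assumptions made for illustration.

```python
import csv
import io

def rows_to_csv(rows):
    """Pivot (timestamp, target, metric, value) rows into one CSV table with
    one row per timestamp and one column per target/metric combination."""
    columns = sorted({f"{target}:{metric}" for _, target, metric, _ in rows})
    by_time = {}
    for ts, target, metric, value in rows:
        by_time.setdefault(ts, {})[f"{target}:{metric}"] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp"] + columns)
    for ts in sorted(by_time):
        # Missing samples stay empty so dropped data remains visible downstream.
        writer.writerow([ts] + [by_time[ts].get(c, "") for c in columns])
    return out.getvalue()

# Hypothetical rows, shaped as a repository query might return them.
rows = [
    ("2012-01-01 00:00", "DB_Host_1", "cpu_util", 41.0),
    ("2012-01-01 00:00", "Storage_1", "read_latency", 3.2),
    ("2012-01-01 00:15", "DB_Host_1", "cpu_util", 44.5),
]
print(rows_to_csv(rows))
```

Leaving a missing sample as an empty cell, rather than filling in a value, keeps dropped data detectable by whatever consumes the file.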

Approximately 30,000 metrics were collected and used for analysis. For the reference model, 288 time samples were used (at 15-minute intervals over three days, or 4 per hour x 24 hours x 3 days = 288). After analysis, Invariant Analyzer extracted 3.6 million invariants out of the maximum of 900 million (30,000 x 30,000) possible invariants. In building the reference model, Invariant Analyzer checks for and eliminates irrelevant or redundant metrics, e.g. constant values or dropped data, to optimize analysis. With the large number of correlations from the automatic compilation of performance metrics, Invariant Analyzer can analyze more combinations and spot invariants more effectively than typical conventional methods, whereby administrators selectively monitor a few dozen metrics and then analyze them manually.

Figure 8: Detail of Anomaly Correlation - Incident 1. The anomaly ranking lists storage servers, with broken invariants concentrated on Storage_1. The relationship detail plots actual and estimated values together with the residual (actual value minus estimated value): cell disk read latency surged while read throughput was lower than expected, so the invariant between throughput and latency is broken.

Invariant Analyzer detected a variance over time between read throughput and read latency for a specific storage server in the storage cluster. The typical expected behavior is a linear relationship between the two metrics; however, the analysis result showed the read latency being much higher than its forecasted value (from the reference model). Because the read latency was not extreme, the thresholds that had been set were not violated, and the anomaly escaped detection by conventional monitoring alone. Indeed, this failure was missed by administrators because there was no threshold transgression and no alarm was generated.

As described above, Invariant Analyzer detected the unusual behavior of a particular storage server where there was a variance between read throughput and read latency. The combination of high read latency and low read throughput may be indicative of performance degradation of the storage server. As this phenomenon can be observed when the disk is serving random access instead of sequential access, a system administrator could investigate this server further for unusual workload, possibly periodic and related to end-of-month activities.

Isolating Performance Bottleneck Analysis

In this second case, Oracle Enterprise Manager generated alarms on 16 servers whose CPU load was over 80%, the threshold set for all 16 servers, during some periods of time. The load of the cluster database was automatically balanced; however, Invariant Analyzer identified one specific server with a Run Queue Length nearly double that of the other DB hosts during the analyzed period. One reason this server experienced such a high Run Queue Length may be the backup server running on the last two nodes. Approximately 3,000 DB host and DB instance metrics were collected and used for analysis. To build the model, 192 time points were used (at 15-minute intervals over two sequential days, or 4 per hour x 24 hours x 2 days = 192), resulting in 5,500 invariants being extracted by Invariant Analyzer.

Figure 9: Detail of Anomaly Correlation - Incident 2. Broken invariants involving Run Queue Length appear in the anomaly ranking. The relationship detail for the Run Queue Length of DB_Host_1 and DB_Host_5 plots actual and estimated values and shows a Run Queue Length surge with a gap between actual and forecast. The US login storm starts between 14:00 and 16:00 on the graph.

Invariant Analyzer detected variations during the daily login period from the US and EMEA regions. The number of simultaneous logins created a surge of CPU activity, and the number of requests to the DB hosts affected the performance of the target system.
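The contrast in this second case, where every host trips the CPU threshold but only some stand out on Run Queue Length, can be illustrated with a simple peer comparison. This is a hand-rolled cross-check, not how Invariant Analyzer works internally; the host names are borrowed from the figure labels, and the sample values, the four-host fleet, and the `ratio` cutoff are hypothetical.

```python
from statistics import mean, median

CPU_THRESHOLD = 80.0  # percent; the alarm threshold quoted in the PoC

def threshold_alerts(cpu_by_host):
    """Hosts whose peak CPU load crosses the static threshold."""
    return sorted(h for h, v in cpu_by_host.items() if max(v) > CPU_THRESHOLD)

def run_queue_outliers(rq_by_host, ratio=1.8):
    """Hosts whose mean run queue is at least `ratio` times the fleet median."""
    means = {h: mean(v) for h, v in rq_by_host.items()}
    typical = median(means.values())
    return sorted(h for h, m in means.items() if m >= ratio * typical)

# Hypothetical samples taken during a login storm.
cpu = {
    "DB_Host_1": [70, 85, 90],
    "DB_Host_2": [72, 88, 86],
    "DB_Host_3": [68, 83, 91],
    "DB_Host_5": [75, 89, 92],
}
run_queue = {
    "DB_Host_1": [4, 6, 5],
    "DB_Host_2": [5, 5, 5],
    "DB_Host_3": [4, 5, 6],
    "DB_Host_5": [9, 11, 10],
}

print(threshold_alerts(cpu))          # every host alerts, so no host is singled out
print(run_queue_outliers(run_queue))  # only the host with the long run queue
```

Because the threshold alarm fires for every host, it carries no information about where the bottleneck is; a metric compared against its peers or against a learned relation does.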
Monitoring the DB instances alone may fail to detect the unbalanced workload across DB hosts. Invariant Analyzer detected unusual behavior of the DB system during the EMEA login period and the US login period over several days. These correspond to the login storm periods from the EMEA and US regions. As the CPU load of all 16 servers was over the 80% threshold during login storms, alerts were generated for all servers. During this period, the Run Queue Length of two DB server hosts showed abnormal behavior that was not seen at other DB hosts, even though automatic load balancing was activated during login storms. Further Invariant Analyzer analysis showed correlations between the CPU utilization of these hosts and the throughput of DB instances, and these correlations were not broken during the period. With this data, a system administrator can investigate these DB hosts and what is causing the long run queue. With a longer Run Queue Length, the response time for users allocated to these hosts is likely to be slower than for users allocated to other hosts. As performance data of the application was not included in the PoC, it was impossible to do further cross-domain analysis. The available analysis results pointed to a possible bottleneck, giving a system administrator a first clue to investigate.

Conclusion

Today's IT applications have become critical to the enterprise's revenue cycle management, and any errors could result in hindered payments or delayed transactions. With applications tied to an enterprise's revenue generation workflows, the performance of traditional and cloud applications becomes paramount. The purpose of this PoC is to show how to use NEC Invariant Analyzer with Oracle Enterprise Manager to assist in root cause analysis for a mission-critical collaboration application. Within the target system environment, analysis was done offline by exporting metrics from Oracle Enterprise Manager and importing the data into NEC Invariant Analyzer. In two separate cases during the PoC, NEC Invariant Analyzer detected anomalies or broken invariants in the systems being monitored that may warrant further investigation.

Oracle Enterprise Manager already offers a robust portfolio for application management. Oracle Enterprise Manager tools provide automation, orchestration, monitoring and management to IT professionals so that they can focus on business-level activities, with insights into complex business processes and visibility into business metrics from the application down to the individual transaction level (known as Business Transaction Management). When integrated with Oracle Enterprise Manager, NEC Invariant Analyzer can be a complementary solution and extend Oracle Enterprise Manager's capabilities. In the future, an IA bi-directional data exchange connector can extend these capabilities further. For any questions, please send inquiries to: ia@necam.com.
Corporate Headquarters (Japan): NEC Corporation, www.nec.com
Oceania (Australia): NEC Australia Pty Ltd, www.nec.com.au
North America (USA & Canada): NEC Corporation of America, www.necam.com
Asia: NEC Corporation, www.nec.com
Europe (EMEA): NEC Philips Unified Solutions, www.nec-philips.com

About NEC Corporation of America

Headquartered in Irving, Texas, NEC Corporation of America is a leading provider of innovative IT, network and communications products and solutions for service carriers, Fortune 1000 and SMB businesses across multiple vertical industries, including Healthcare, Government, Education and Hospitality. NEC Corporation of America delivers one of the industry's broadest portfolios of technology solutions and professional services, including unified communications, wireless, voice and data, managed services, server and storage infrastructure, optical network systems, microwave radio communications and biometric security. NEC Corporation of America is a wholly-owned subsidiary of NEC Corporation, a global technology leader with operations in 30 countries and more than $42 billion in revenues. For more information, please visit www.necam.com.

WP11017 v.03.12.12 © 2012 NEC Corporation. All rights reserved. NEC, NEC logo, and UNIVERGE are trademarks or registered trademarks of NEC Corporation that may be registered in Japan and other jurisdictions. Oracle and Java are registered trademarks of Oracle and/or its affiliates. All trademarks identified with ® or ™ are registered trademarks or trademarks respectively. Models may vary for each country. Please refer to your local NEC representatives for further details.