Monitoring Remedy with BMC Solutions




Overview

How does BMC Software monitor Remedy with its own solutions? The challenge is manifold with a solution like Remedy, and it applies not only to Remedy but also to competing solutions and to other web-based enterprise solutions.

Analysis

The source of performance (or the lack of it) can be attributed to a wide variety of factors: some within the software itself, some within the immediate infrastructure, and some within the wider environment, e.g. the internet.

BMC Solutions

As of June 2014, the following stack of solutions can be used for full-stack monitoring of Remedy. Note that these solutions help pinpoint where the issues are, allowing them to be addressed; they may not in themselves resolve the issue. Slow performance is usually caused by a combination of issues, i.e. a set of bottlenecks.

BMC has multiple modules that can monitor Remedy (7.6.04 is the oldest version we monitor). Complete Remedy stack monitoring should include the following:

- From the end-user perspective
  o BMC APM EUEM (web interface only, for HTTP/HTTPS traffic); additional watchpoints may be required
  o Borland Silk Performer Synthetic Transaction Monitoring for BMC Software (newer replacement for TM-ART)
- Mid-tier/application tier
  o BMC APM Application Diagnostics
  o BPPM for Internet Servers (monitors the web server, such as Apache, Microsoft IIS, etc.)
- Remedy applications (e.g. Incident Management, Change Management)
  o BMC PATROL Knowledge Module for Remedy AR Server
- Back-end database (Oracle, Sybase, MS SQL Server, etc.)
  o BPPM for Databases (monitors all databases that Remedy supports)
- Any network equipment, such as F5 load balancers
  o Entuity or any other network monitoring tool
- Operating system on which Remedy is running (e.g. Windows, Linux, VMs)
  o BPPM for Servers/Virtual Servers (monitors the health of the system, such as CPU, memory, disk, Remedy processes/services, logs)
- Hardware platform, storage
  o BMC Performance Manager for Hardware by Sentry Software

Some Causes of Performance Issues

This is not an exhaustive list, but it illustrates the complexity and some of the points that can cause performance issues. The causes are not mutually exclusive; quite probably the reverse, with a combination of them making the solution slow.

- Browser version / type / cache: some browsers, especially older versions, are significantly slower than others. Internet Explorer is slower than Chrome. Caching settings may alter the performance.
- LAN / WAN / Internet connectivity is another recurrent source of performance issues
- Server setup, clustering, number of users per JVM in the mid-tier configuration
- Hardware specification and balancing (memory, CPU, storage)
- Database hardware / configuration: indices, field settings, I/O, values and filters
- Frequency of polling / cron tasks
- Query efficiency
- Default queries pulling too many records at one time
- Reporting requirements and efficiency of the queries underlying reports

All of the above could have an impact, and each can be examined in far greater depth in order to resolve performance issues.

The Reality

Before looking at an example of how Remedy is monitored, it is important to understand that there is no single solution, and one that is good today may not be so good tomorrow. Things change, for example (not an exhaustive list):

- Other traffic on the network
- Additional volume of records
- Alterations to configuration
  o With or without change management
  o New / updated reports are written
  o New / updated queries are deployed
  o New functionality is deployed
- Archiving may be performed periodically
- Use of new / different browsers / caching settings

Furthermore, even with just two deployments of Remedy used in similar organizations, there are enough potential differences in environment and configuration that a monitoring setting that works in environment 1 will not necessarily be as useful in environment 2. Again, differences could be:

- Geographic coverage
- Other network traffic
- Hardware differences (e.g. a different manufacturer's database, or a different database version, is deployed)
- Different settings (SLAs may be heavily deployed in one environment and not in another)
- Volumes of data transacted as well as stored (archiving may not be taking place)
- Different reports and KPIs configured with more or less efficient queries
- Different levels of automation deployed
- Complexity of the deployment (e.g. approval process)

Levels of notifications

As a consequence, the details below must be viewed as a guideline and no more when determining what to monitor, how to monitor it, and which solution is used to monitor the system. The details apply to deploying BMC Software monitoring solutions, but could probably be adapted to other monitoring solutions. This document covers each type of monitoring at a high level for the Production environment.

AR SERVER MONITORING:

The following OS KM parameters are set to alert when the set thresholds are breached.

Windows OS Monitoring

Parameter                           Major Event   Critical Event   Occurrences   Incident Ticket   Polling Cycle
Logical Disks [Free space %]
  D:                                < 15%         < 10%            Immediate     Yes               2 mins
  C:                                < 15%         < 10%            Immediate     Yes               2 mins
Memory
  Memory Used in %                  > 85%         > 95%            11            Yes               2 mins
CPU
  Total Processor Utilization in %  > 85%         > 95%            9             Yes               2 mins

AR Services Status (Up/down): no major event is defined; a critical event is raised when any of the following services is down.

- BMC Remedy Action Request System Server onbmc-s
- BMC Remedy Flashboards Server - onbmc-s
- McAfee Framework Service
- McAfee McShield
- McAfee Task Manager
- Remote Procedure Call (RPC)
- BMC Remedy Email Engine - onbmc-s 1

Email monitoring using the Email Script, for servers on which the Email Engine is running. Each check raises a critical event (occurrences: Immediate; incident ticket: Yes; polling cycle: 2 mins).

- Number of emails that have been incorrectly flagged as delivered > 0
- Time since oldest pending email > 900 seconds
- Emails that are pending delivery > 0

Other checks (occurrences: Immediate; incident ticket: Yes; polling cycle: 2 mins):

- AR KM monitoring: service down
- LDAP port monitoring: port down
- TCP established connections > 3000

Further checks:

- Tomcat SSO: down
- Assignment Engine monitoring: on demand
- Approval Engine monitoring: on demand
- Reconciliation jobs monitoring: on demand

However, many other parameters are also monitored and used for analysis.
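The major/critical thresholds from the Windows OS monitoring table above can be illustrated with a short sketch. This is not how the PATROL OS KM is implemented; the names `Threshold`, `THRESHOLDS`, and `classify`, and the dictionary keys, are hypothetical and chosen only for this example. The numeric limits are the ones listed in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Major and critical limits for one monitored parameter."""
    major: float
    critical: float
    breach_when_below: bool  # True for free-space metrics, False for usage metrics


# Limits taken from the table above; the keys are hypothetical names.
THRESHOLDS = {
    "disk_free_pct": Threshold(major=15.0, critical=10.0, breach_when_below=True),
    "memory_used_pct": Threshold(major=85.0, critical=95.0, breach_when_below=False),
    "cpu_total_pct": Threshold(major=85.0, critical=95.0, breach_when_below=False),
}


def classify(parameter: str, value: float) -> Optional[str]:
    """Return "CRITICAL", "MAJOR", or None when the value is within limits."""
    t = THRESHOLDS[parameter]
    if t.breach_when_below:
        if value < t.critical:
            return "CRITICAL"
        if value < t.major:
            return "MAJOR"
    else:
        if value > t.critical:
            return "CRITICAL"
        if value > t.major:
            return "MAJOR"
    return None
```

The critical limit is checked first so that a value breaching both limits is reported once, at the higher severity. The sketch deliberately ignores the Occurrences column, which the source lists without further explanation.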


MID TIER MONITORING:

Individual mid-tier servers are monitored in the same way as the AR servers, with an additional set of processes and services specific to the mid-tier server.

Windows OS Monitoring

Parameter                           Major Event   Critical Event   Occurrences   Incident Ticket   Polling Cycle
Logical Disks [Free space %]
  D:                                < 15%         < 10%            Immediate     Yes               2 mins
  C:                                < 15%         < 10%            Immediate     Yes               2 mins
Memory
  Memory Used in %                  > 85%         > 95%            11            Yes               2 mins
CPU
  Total Processor Utilization in %  > 85%         > 95%            9             Yes               2 mins

Mid-tier Services Status (Up/down): no major event is defined; a critical event is raised when any of the following services is down.

- Apache Tomcat Tomcat6
- McAfee Framework Service
- McAfee McShield
- McAfee Task Manager
- Remote Procedure Call (RPC)

Other checks:

- TCP established connections > 3000 (occurrences: Immediate; incident ticket: Yes; polling cycle: 10 mins)

Many other parameters are also monitored using the OS KM.

DASHBOARD AND ANALYTICS SERVER MONITORING:

Along with the standard monitoring of the OS, the following services and processes are monitored for the Dashboard servers.

Processes Status (Up/down): major event NA; a critical event is raised when the process is down.

- CIA

Dashboard Services Status (Up/down): major event NA; a critical event is raised when any of the following services is down.

- Apache Tomcat
- BOE120MySQL
- BMC Atrium DIL Repository
- McAfee Framework Service
- McAfee McShield
- McAfee Task Manager
- Remote Procedure Call (RPC)
- BMC Atrium DIL Server
- Server Intelligence Agent (onbmc_ada)

Report Execution (on demand): a report is configured and executed at regular intervals to identify whether there is an issue with the BO and DB.

F5 LOAD BALANCER MONITORING USING CUSTOMIZED PATROL KM:

With the help of the F5 load balancer KM, we monitor the status of the active pool members in F5. If any pool member goes down, an alert is sent to BPPM.

DATABASE MONITORING:

The SQL database is monitored for multiple parameters; the ones below are used for alerting, to keep an eye on the heart of the AR systems.

- Suspect Database
  o Alarm condition: any database
  o Alarm: Yes; occurrences: Immediate; incident ticket: Yes; polling cycle: 4 hours
- SQL Server Agent Job Failures
  o Alarm condition: any job failure
  o Alarm: Yes; occurrences: Immediate; incident ticket: Yes; polling cycle: 15 mins
- Blocker Procs
  o Alarm condition: any blocking process, if the blocking persists for more than 30 secs
  o Alarm: Yes
- SQL Agent Status
  o Alarm condition: when the service is down
  o Alarm: Yes
- SQL Server Status
  o Alarm condition: when the service is down
  o Alarm: Yes
- Cache Hit Ratio
  o Alarm condition: < 90
  o Alarm: Yes
- Long Running Trans
  o Alarm condition: > 300 sec. The alert should contain the session id executing the transaction, along with the user name for the session.
  o Alarm: Yes
- Deadlock
  o Alarm condition: warning alert for any deadlock. The alert needs to contain the session ids of all the sessions involved in the deadlock.
  o Alarm: Yes

Disk Space Monitoring for Databases:

- Maintain 50% free storage space for all production DB servers
- Warning alert threshold set at 25% free space
  o Email, alerting and escalation
- Critical alert threshold set at 20% free space
  o Email, alerting and escalation

TMART MONITORING:

We run a synthetic transaction named HLAL, which goes through Homepage, Login, Application Listing and Logout, to check whether the AR application is working correctly and to measure any performance degradation. Availability is checked from within the data centre and, on a case-by-case basis, we also run the transaction from remote data centres.
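The shape of a synthetic transaction such as HLAL can be sketched as an ordered list of timed steps. This is only an illustration under stated assumptions, not the TMART or Silk Performer implementation: the function `run_transaction`, the step names, and the placeholder callables are hypothetical, and a real script would drive a browser or HTTP client against the actual Remedy URLs.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_transaction(
    steps: List[Step],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run the steps in order, timing each one, and stop at the first failure."""
    timings: Dict[str, float] = {}
    available = True
    for name, action in steps:
        start = clock()
        try:
            ok = action()
        except Exception:
            # An exception in a step counts as the step being unavailable.
            ok = False
        timings[name] = clock() - start
        if not ok:
            available = False
            break
    return {"available": available, "timings": timings}


# Hypothetical stand-ins for the four HLAL steps; each would normally
# perform the real page request and return True on success.
hlal_steps: List[Step] = [
    ("homepage", lambda: True),
    ("login", lambda: True),
    ("application_listing", lambda: True),
    ("logout", lambda: True),
]
result = run_transaction(hlal_steps)
```

Stopping at the first failed step keeps later timings from being recorded for pages that could not meaningfully be reached, so the per-step timings stay comparable between runs.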

ALERTING:

Production:

- HLAL: Availability or Accuracy < 100% for 2 consecutive cycles. Requiring two cycles avoids an increase in false alerts caused by a network glitch or by a browser or monitoring-application issue.
- HLAL_perf: Login Response Time > 10 seconds for 5 consecutive cycles. An analysis over multiple ITSM systems found a 10-second login time to be the benchmark for deciding whether performance is indeed getting worse. We run the script every 2 seconds.

Dev & QA:

- HLAL: Availability or Accuracy < 100% for 2 consecutive cycles. There is no performance alerting for Dev and QA URLs.

INDIVIDUAL AR SERVER MONITORING USING AR SERVER KM:

The PATROL Knowledge Module for AR Server is used to monitor the availability of individual AR Servers. It is configured when an AR Server group is implemented. The KM uses Java-based drivers to connect to each individual AR Server and detects its basic performance and availability.

DEV & QA ENVIRONMENT MONITORING:

Development and Quality Assurance environments are monitored for availability using TMART; only disk utilization and McAfee services are monitored in BPPM. Availability of the URLs is monitored using the TMART transaction HLAL, which goes to the Homepage, performs a Login, opens the Application Listing, and finally performs a Logout.
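The "N consecutive cycles" rule used in the ALERTING section above can be sketched as a small counter. The class name `ConsecutiveBreachAlert` and its method are hypothetical and are not part of TMART or BPPM; only the cycle counts (2 and 5) and the limits (100% and 10 seconds) come from the text.

```python
class ConsecutiveBreachAlert:
    """Signal an alert only after `required` consecutive breaching cycles."""

    def __init__(self, required: int) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.streak = 0

    def observe(self, breached: bool) -> bool:
        """Record one cycle; return True while the streak is at or above the limit."""
        # A single healthy cycle resets the streak, which filters out one-off glitches.
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.required


# Production rules from the text: availability below 100% for 2 cycles,
# login response time above 10 seconds for 5 cycles.
availability_alert = ConsecutiveBreachAlert(required=2)
login_time_alert = ConsecutiveBreachAlert(required=5)

availability_fired = [availability_alert.observe(pct < 100) for pct in (100, 50, 100, 0, 0)]
login_fired = [login_time_alert.observe(sec > 10) for sec in (11, 12, 9, 11, 11, 11, 11, 11)]
```

In this sketch the alert stays raised on every further breaching cycle; a production tool would typically raise the event once and clear it when the streak resets.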

BPPM is used only to monitor disk space and the McAfee-related services. The following parameters are monitored.

Windows OS Monitoring

Parameter                           Major Event   Critical Event   Occurrences   Incident Ticket   Polling Cycle
Logical Disks [Free space %]
  D:                                < 15%         < 10%            Immediate     Yes               2 mins
  C:                                < 15%         < 10%            Immediate     Yes               2 mins

Services Status (Up/down): no major event is defined; a critical event is raised when any of the following services is down.

- McAfee Framework Service
- McAfee McShield
- McAfee Task Manager

Acknowledgements

With thanks to:

- Franco Ferrero
- Bob Mosely
- Nick Goff
- Theodore Cory

1. For each of the monitoring layers outlined below we want to know the specific monitoring targets and their default thresholds.

   a. Mid-tier/application tier
      i. BMC APM Application Diagnostics
         Question: What app threshold do you monitor for?
      ii. BPPM for Internet Servers (monitors the web server, such as Apache, Microsoft IIS, etc.)
         Question: Which Apache thresholds? JMX monitoring points? Etc.

   b. Remedy applications (e.g. Incident Management, Change Management)
      i. BMC PATROL Knowledge Module for Remedy AR Server
         Question: What specific things is the KM for Remedy AR Server monitoring? We want the details and default thresholds, please.
         Answer: The KM monitors AR Application status and AR Server Statistics. As of now, no thresholds are set.
         Metrics of AR Server Statistics

   c. Back-end database (Oracle, Sybase, MS SQL Server, etc.)
      i. BPPM for Databases (monitors all databases that Remedy supports)
         Question: We use Oracle Enterprise Manager (OEM) for DB monitoring. We want to ensure we have the default Oracle DB monitoring points and thresholds provided, so that we can sync them.
         Answer: Database monitoring includes availability of the database, tablespace usage, and utilization of database-related filesystems.

   d. Operating system on which Remedy is running (e.g. Windows, Linux, VMs)
      i. BPPM for Servers/Virtual Servers (monitors the health of the system, such as CPU, memory, disk, Remedy processes/services, logs)
         Question: Again, what Remedy processes, services, and logs are monitored, and what are the default thresholds?
         Answer: OS monitoring for Windows includes total CPU, % of memory used, and disk free space. Processes include arcmdbd, armonitor, arplugin, arrecond, arserver, arsvcdsp, slmbrsvc, slmcollsvc. The arerror log is monitored for plugin errors.
         Default OS threshold for Windows

Apart from this, we use a custom PATROL KM to create blackouts that suppress alerts during changes.
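The idea of a blackout can be illustrated with a minimal sketch: an event is suppressed when its timestamp falls inside a change window. This does not describe the custom PATROL KM, whose behaviour the source does not detail; the function `in_blackout` and the sample window are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Tuple

Window = Tuple[datetime, datetime]


def in_blackout(event_time: datetime, windows: Iterable[Window]) -> bool:
    """Return True when event_time falls inside any [start, end) change window."""
    return any(start <= event_time < end for start, end in windows)


# Hypothetical change window used only for illustration.
change_windows = [(datetime(2014, 6, 7, 22, 0), datetime(2014, 6, 8, 2, 0))]

suppressed = in_blackout(datetime(2014, 6, 7, 23, 30), change_windows)
delivered = not in_blackout(datetime(2014, 6, 8, 2, 0), change_windows)
```

Treating the window end as exclusive means an event stamped exactly at the end of the change is delivered rather than suppressed.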