Whitepaper on Business Service monitoring approach - Harish Jadhav Page 1 of 15
Copyright

Copyright 2013 Tecknodreams Software Consulting Pvt. Ltd. All Rights Reserved.

Restricted Rights Legend

This document may not, in whole or in part, be copied, photocopied, reproduced, translated, or reduced to any electronic medium or machine-readable form without prior consent, in writing, from Tecknodreams. Information in this document is subject to change without notice and does not represent a commitment on the part of Tecknodreams.
Introduction

Performance monitoring helps organizations accurately monitor all aspects of their IT infrastructure, identify the root cause of problems, and notify the IT infrastructure administrator. Traditionally, IT administrators monitored in silos, i.e. server monitoring, network monitoring, etc. Increasing business dependency on IT now makes it necessary to monitor every component that impacts the business, including servers, networks, devices, applications and mission-critical services.

Business need

IT performance monitoring is a very important area for an IT manager who wants to monitor the organization's IT infrastructure. This discipline has always been an area of high cost and high demands on the IT infrastructure administrator, because there have always been a number of conflicting standards. Managed units have not been compatible with each other, forcing the administrator to use several different management applications to monitor the IT components in the organization. This situation has created the need for a single, simple management application that can monitor all kinds of elements in an IT organization.

To assure business-IT availability, servers, operating systems, network devices, IT services and applications should be proactively monitored round the clock, and corrective action should be taken when any issue arises.

Advantages/Gains to Organization

- Business Continuity - Proactive monitoring helps avoid downtime. Bottlenecks are dealt with before they affect uptime or server response.
- Capacity Planning - Data about system performance helps with benchmarking, evaluating upgrades, identifying under-utilized resources, and planning capacity and hardware acquisition.
- Alerts/Alarms - Automated alerts over multi-channel delivery (email, SMS, etc.) notify business owners about threshold violations and help avoid potential downtime or issues.
- Trend analysis/reports - Monitored values are typically made available, as real-time and historical data, for trend analysis and planning purposes.

Types of monitoring

Performance monitoring systems can collect data from the target system, device or application in three ways: agent-based, agentless and packet-based monitoring.
Agent-based monitoring

In this kind of monitoring, an agent, which is a specially designed application, runs on the devices to be monitored. The agent is responsible for collecting data and sending it to the monitoring system. The agent should have the intelligence to perform threshold analysis and send only the qualifying data to the server.

Packet-based monitoring

In this kind of monitoring, the monitoring system gathers all the packets generated by devices, analyzes them, and reports information such as bandwidth utilization, connection time, etc. However, this solution may not be required by operators of small network segments.

Agentless monitoring

In this kind of monitoring, installation of proprietary agents on the target system is not required. The monitoring application makes use of the native technology supported by the server or device; for example, monitoring applications may support WMI, SNMP, WBEM, IntelAMT, etc. for data gathering. If the server runs the Microsoft Windows operating system, for instance, WMI agents are available as part of the operating system installation. Agentless monitoring is considered the most advantageous of the three, as it takes advantage of the native management support and doesn't involve the overhead of installing and managing proprietary agents.

Monitoring using agentless mechanism

Broadly, the agentless approach can be divided into two types of communication:

- Poll Driven/Active Monitor (Pull Mechanism) - This term refers to the general technique of having the one who wants the information ask for it. The manager asks for the data and the agent returns the required data to the manager.
- Interrupt Driven/Passive Monitor (Push Mechanism) - This term refers to having a device with information that another needs to know decide to send the information of its own volition.

To monitor network entities properly, it is very important that the performance management platform supports both.
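The difference between the pull and push mechanisms can be illustrated with a minimal in-memory sketch. It does not use any real management protocol; the class names (ManagedDevice, Manager), the metric name cpu_percent and the threshold value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ManagedDevice:
    """Hypothetical managed device that holds its current metric values."""
    name: str
    metrics: Dict[str, float]
    thresholds: Dict[str, float] = field(default_factory=dict)
    listeners: List[Callable[[str, str, float], None]] = field(default_factory=list)

    def get(self, metric: str) -> float:
        """Pull: answer a request that the manager initiated."""
        return self.metrics[metric]

    def update(self, metric: str, value: float) -> None:
        """Record a new value and push an event if it exceeds the threshold."""
        self.metrics[metric] = value
        limit = self.thresholds.get(metric)
        if limit is not None and value > limit:
            for notify in self.listeners:
                notify(self.name, metric, value)


class Manager:
    """Hypothetical manager that supports both polling and event reception."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str, float]] = []

    def poll(self, device: ManagedDevice, metric: str) -> float:
        # Poll driven: the manager asks, the device answers.
        return device.get(metric)

    def on_event(self, device_name: str, metric: str, value: float) -> None:
        # Interrupt driven: the device decided to send this on its own.
        self.events.append((device_name, metric, value))


manager = Manager()
router = ManagedDevice("router-1", {"cpu_percent": 35.0}, {"cpu_percent": 90.0})
router.listeners.append(manager.on_event)

polled_value = manager.poll(router, "cpu_percent")  # pull
router.update("cpu_percent", 97.5)                  # triggers a push
```

In a real deployment the pull side would map to requests such as SNMP Get and the push side to SNMP traps or syslog messages; the sketch only shows who initiates the exchange in each case.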
Primarily, the mission-critical components to be monitored can broadly be categorized as servers, devices, applications and services.

Monitoring Servers:

Servers can run on different platforms such as Windows, Solaris and Linux. To monitor such heterogeneous server clusters, it is recommended to take advantage of native manageability support.
Typically, Windows servers support WMI (refer to Appendix A for details), so Windows server performance data can be gathered using WMI. Solaris servers support WBEM (refer to Appendix A for details), which the management application can use for performance monitoring. Linux servers can be monitored using SSH (refer to Appendix A for details). ESX and XEN servers can be monitored using the vijava APIs, and Hyper-V servers using WMI.

Monitoring the operational and health states of servers and storage is very important, and it can be done with WMI, SNMP, WBEM SMASH, IPMI, etc. Server hardware vendors also expose additional manageability interfaces through enterprise SNMP MIB extensions. A manageability platform should be able to extend its monitoring capabilities to take advantage of these vendor extensions. In addition, IPMI (refer to Appendix A for details) can be leveraged for hardware monitoring, such as sensor data, even in the absence of an operating system, by using out-of-band technology.

All server operating systems provide support for SNMP, which can be used for monitoring as well. Hence the choice of management technology is better left to the CIO/IT manager of the organization, based on their preferences and policies, and the management platform should be able to support both the native technologies and SNMP.

WMI can be used for application monitoring as well. This native technology has built-in providers that can be used to monitor critical applications such as Active Directory, IIS server, Exchange server, etc.

Fault monitoring can be achieved through the event logs generated by Windows servers and the syslog messages generated by UNIX-flavored servers. SNMP traps generated by servers also help in finding server faults. WMI additionally provides event-based monitoring for processes, services, etc.

Monitoring Network Devices:

Devices can be routers, switches, firewalls, printers, UPSs, load balancers, etc.
By default, these devices support SNMP as their native technology, so it is easy to monitor them using SNMP. It is very important to understand that, for devices, collecting only basic performance data will not help; rather, we should collect the device-specific parameters that give a complete view of device health. For a printer, for example, we should collect data such as printer jammed status and number of pages printed, apart from basic performance statistics like CPU utilization, interface utilization, etc. As there are many vendors, each of which can have its own enterprise MIB, the management application should provide a configuration mechanism for adding support whenever a new device is encountered in the environment.

In fact, nowadays most well-known device vendors provide WBEM as the native method for managing devices, which helps collect a richer set of data than SNMP. Well-known vendors have provided built-in WBEM support for storage arrays, SAN switches, etc.
Fault monitoring for devices can be achieved through syslog and SNMP traps, by exporting these messages and converting them to management-application-specific events.

Monitoring Applications:

Application monitoring is critical to assure the health of applications. Server and device monitoring provide a view of the underlying hardware and network infrastructure, while application monitoring takes care of the health of the application that delivers the actual service. Applications are delivered with standards-based monitoring interfaces such as JMX, SNMP, WMI, etc. Management platforms should use these interfaces to obtain the application-specific parameters that impact the business and monitor the values against healthy thresholds.

Synthetic Transaction Monitoring / User Experience Monitoring:

Synthetic transaction monitoring, also referred to as user experience monitoring, has emerged as a strong candidate for measuring service and application performance. It provides a view of service or application performance as experienced by the end user: the monitoring application performs the user's action and measures the parameters that matter.

Traditionally, monitoring was limited to networks and servers. However, this did not provide the end-user view. Performance bottlenecks and intermittent service failures were not captured by such approaches, so the IT department and end users held two different opinions about the services delivered. Synthetic transaction monitoring helps the IT department address this gap effectively.

Because it depends on the services (standard and custom) used by the organization, management products provide this feature as a custom plug-in to extend their capability. Standard services such as HTTP, FTP, e-mail, DNS, etc. should be measurable using standard protocols. Custom business transactions, however, have to be scripted, or the operations have to be recorded and played back, to measure the user experience.
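A scripted synthetic transaction can be sketched as a small timing harness: the monitor performs the user action, records whether it succeeded and how long it took, and compares the elapsed time with a threshold. The function and field names below (run_synthetic_transaction, TransactionResult, fake_login) and the 2-second threshold are hypothetical; a real probe would replace fake_login with an actual HTTP, FTP, e-mail or DNS request, or a recorded business transaction.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransactionResult:
    name: str
    succeeded: bool
    elapsed_seconds: float
    within_threshold: bool


def run_synthetic_transaction(
    name: str, action: Callable[[], bool], threshold_seconds: float
) -> TransactionResult:
    """Run one scripted user action and measure it as the end user would see it."""
    start = time.perf_counter()
    try:
        succeeded = bool(action())
    except Exception:
        # Any error during the scripted action counts as a failed transaction.
        succeeded = False
    elapsed = time.perf_counter() - start
    healthy = succeeded and elapsed <= threshold_seconds
    return TransactionResult(name, succeeded, elapsed, healthy)


def fake_login() -> bool:
    """Stand-in for a scripted user action, for illustration only."""
    time.sleep(0.01)
    return True


def broken_checkout() -> bool:
    """Stand-in for a user action that fails."""
    raise ConnectionError("service unreachable")


login_result = run_synthetic_transaction("login", fake_login, threshold_seconds=2.0)
checkout_result = run_synthetic_transaction("checkout", broken_checkout, threshold_seconds=2.0)
```

Running such probes on a schedule gives the IT department the same view of response time and availability that the end user has, independent of the server and network metrics.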
Business Service Monitoring

Business service monitoring is an abstraction layer and a relatively new concept. The emergence of cloud computing, hosted data centers and virtualized servers has changed the way business is conducted in today's world. Businesses depend on hosted applications or servers to deliver services. CIOs are interested in measuring the availability and health of the services they offer to their customers. Escalations and notifications have to be performed based on service availability or health. Business SLAs are tied to services rather than to the physical hardware or application.
The BSM view enables administrators to define services and the dependent components that impact service delivery. The monitoring platform provides the intelligence to monitor all the dependent components, perform the calculations needed to measure the impact, and raise an alarm upon any service degradation. Engineers responsible for a given service can see all the dependent parameters in a single view and make quick decisions about the actions to be performed. SLAs can be defined against service health, which is a derived parameter and directly reflects compliance with business commitments.

Notifications

Management platforms should provide easy and powerful notifications, as they carry information about the problem to the end user. SMS and e-mail notifications are the most common today. Some businesses also require custom applications to be launched upon an error, to perform self-healing or automated issue resolution, so notification engines should support performing more than one action. Mature management products provide the following:

- SMS/Email based alerts (role-based, to avoid dependence on individual contacts)
- Launching of applications to achieve self-healing
- Trouble ticketing

Service desk Integration / Trouble Ticketing

Management platforms should provide a means to raise a trouble ticket, with appropriate severity, upon any fault or threshold violation. Integrated platforms enable automated ticket creation. Most ticketing platforms also support creating tickets from e-mail, which can be used for this purpose. This ensures that faulty conditions are brought to the attention of support personnel through a formal support call that has to be closed within the SLA period, thus ensuring speedy issue resolution.
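The idea of a derived service health that drives more than one notification action can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular product: the roll-up rule (service health equals the worst state among its dependent components) is an assumption, and all names (Health, BusinessService, notify_all, send_email, open_ticket) are hypothetical. The two actions only return strings; they stand in for real e-mail and ticketing integrations.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List


class Health(IntEnum):
    OK = 0
    DEGRADED = 1
    DOWN = 2


@dataclass
class BusinessService:
    """A service defined by the components that impact its delivery."""
    name: str
    components: Dict[str, Health]

    def health(self) -> Health:
        # Assumed roll-up rule: the service is as healthy as its worst component.
        return max(self.components.values(), default=Health.OK)


Action = Callable[[str, Health], str]


def send_email(service: str, state: Health) -> str:
    """Placeholder for a role-based e-mail alert."""
    return f"email: {service} is {state.name}"


def open_ticket(service: str, state: Health) -> str:
    """Placeholder for automated trouble-ticket creation."""
    return f"ticket: {service} is {state.name}"


def notify_all(service: BusinessService, actions: List[Action]) -> List[str]:
    """Run every configured action when the service is not healthy."""
    state = service.health()
    if state == Health.OK:
        return []
    return [action(service.name, state) for action in actions]


banking = BusinessService(
    "online-banking",
    {"web-server": Health.OK, "database": Health.DEGRADED, "core-switch": Health.OK},
)
outcome = notify_all(banking, [send_email, open_ticket])
```

Other roll-up rules, such as weighting components or tolerating the loss of one node in a redundant pair, fit the same structure by changing only the health method.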
SapphireIMS Enterprise Suite

Enterprise businesses need a platform that can gather the statistics that impact business IT performance in a heterogeneous environment and show the comprehensive health of business IT activity.

SapphireIMS is an enterprise-grade IT service management platform that enables organizations to manage their IT in a comprehensive manner. It adopts agentless monitoring as its primary mode of monitoring, along with poll-driven and interrupt-driven communication. The solution provides the combination of performance data considered important for IT administrators monitoring their network, using WMI, SNMP, WBEM, SSH, IntelAMT, IPMI, Hyper-V WMI APIs and VIJAVA APIs. This gives IT administrators a unified view of their enterprise IT.
SapphireIMS is capable of collecting performance-related parameters, digital assets such as inventory and hardware inventory, and all kinds of logs generated by different devices. It also provides threshold monitoring and alert generation, and shares the results with administrators and users through a service information portal and powerful reports.

[Figure: labels include Web Server, Windows, Windows XP, Linux, Mac OS]

SapphireIMS provides the following features to simplify the IT management of a complex environment:

- Standards-based management, avoiding proprietary extensions
- Agentless, off-the-shelf, scalable system
- Heterogeneous and unified infrastructure management
- Supports Windows, Unix and devices for comprehensive management
- Availability and performance monitoring of distributed IT infrastructure, including servers, operating systems, network devices and business applications
- Dashboard view for executives to understand the overall network performance quickly
- Threshold monitoring & alerts
- Service desk integration for automated trouble ticketing
- Powerful reporting engine
Sample screenshots have been provided as a reference.

[Screenshot: Dashboard View - Performance Monitoring]
[Screenshot: Dashboard View - Event Log]
[Screenshot: Report Health Alarms & Notifications]

The SapphireIMS IT service management platform is extensible and scalable, and is available in flexible business models that can be easily deployed to manage IT environments. The exemplary support makes the experience unique.

For more details about the SapphireIMS suite, please contact sapphire@tecknodreams.com and visit http://www.sapphireims.com
Appendix A

Simple Network Management Protocol (SNMP)

SNMP is essentially a request-reply protocol running over UDP (ports 161 and 162), though TCP operation is possible. The management station retrieves the required data from hosts using simple requests such as Get, GetNext and GetBulk, sent to port 161 on the host. It can therefore be considered a PULL mechanism. SNMP usually provides the bulk of the performance and asset metrics needed for switches, routers, firewalls, power equipment, servers and most other infrastructure components. It is considered the most widely used protocol/framework in the network management area.

SNMP Traps/Notification

Traps generated by SNMP-capable devices are monitored on port 162. The manager should support collection of SNMPv1 traps and SNMPv2 notifications. A trap is an event sent from the host to the manager, so this is considered a PUSH mechanism.

Windows Management Instrumentation (WMI)

WMI, designed for enterprises with a significant volume of Windows-based systems, provides management data for Windows systems. WMI is the Microsoft implementation of Web-Based Enterprise Management (WBEM), an industry initiative to develop a standard technology for accessing management information in an enterprise environment. WMI uses the Common Information Model (CIM) industry standard, a language-independent programming model that uses object-oriented techniques to describe an enterprise, to represent systems, applications, networks, devices and other managed components. CIM is developed and maintained by the Distributed Management Task Force.

The ability to obtain management data from remote computers is what makes WMI useful. Remote WMI connections are made through DCOM. An alternative is Windows Remote Management (WinRM), which obtains remote WMI management data using the SOAP-based WS-Management protocol.
Event Notification

By using WMI event notifications, you can monitor the state of any WMI-managed resource and respond to an issue much earlier, perhaps before it is even noticed by your users. A WMI notification can take the form of sending a mail, logging a message in the application event log, or executing a script.

Web-Based Enterprise Management (WBEM)

WBEM is an industry-wide standard, and more and more reliable information from UNIX platforms is being made available through it every day. WMI is simply Microsoft's version of WBEM. WBEM comes by default as part of the OS installation for Solaris and HP-UX. For Linux, an open source provider is available, e.g. WBEM SBLIM (Standards Based Linux Instrumentation for Manageability), which has to be installed before Linux systems can be monitored using this standard. Hardware management can be done using WBEM Systems Management Architecture for Server Hardware (SMASH).

Secure Shell (SSH)

SSH is a standards-based, widely known secure protocol. Monitoring over SSH relies on the SSHD daemons running on the monitored systems and uses standard BSD-compliant commands to get the required data from the remote system. The standard port 22 is used for the connection to the SSHD server. Encryption provides confidentiality and integrity for the data transferred between the managed system and the management system.

Syslog

This is again a kind of event forwarding, from all the devices to the manager, on port 514. Syslog is widely known as the workhorse of heterogeneous network logging. In syslog terminology, the manager is called the collector, also known as the syslog daemon or server. Other syslog roles are the device, which generates a message, and the relay, which is responsible for collecting and forwarding messages.

IntelAMT

This is a hardware-based technology for remote management. It can manage PCs at any time, even if the PC is powered off, the OS is inoperable, management agents are missing, or hardware has failed. When integrated into a third-party management solution, PCs with Intel vPro processor technology ease the work of administration.

Intelligent Platform Management Interface (IPMI)

This is a hardware-based technology for remote management. It can manage servers at any time, even if the server is powered off, the OS is inoperable, management agents are missing, or hardware has failed. IPMI can be used to monitor platform status such as system temperatures, voltages, fans, power supplies and chassis intrusion; to query inventory information; to review hardware logs; etc.