Best of Breed of an ITIL-based IT Monitoring: the System Management strategy of NetEye. By Georg Kostner, 5/11/2012.
IT Services and IT Service Management: An IT service provides added value to customers by supporting them in achieving their desired results. The customer is not responsible for the direct costs, the applied technology, or the risks of service delivery. IT Service Management is a set of specific organizational skills used to generate added value for customers in the form of IT services.
ITIL Continual Service Improvement: IT services need to be re-classified and adjusted to changing business requirements. Metrics fall into three groups. Technology: component / application. Service-oriented: end-to-end services. Processes: IT Service Management.
System Monitoring with NetEye: technology metrics. Technology (component / application): availability, reliability, performance.
Monitoring with NetEye: view into a datacenter. [Diagram labels. Roles: Service Level Manager, Application Manager, Facility Manager, Network Admin, System Admin, Users. Business services: Mail Service, CRM Service, ERP Service. Infrastructure: DMZ, datacenter, LAN, proxy, CRM, ERP, mail server, application server, server virtualization, facilities, data storage (SAN, NAS), Active Directory, DNS, DHCP, file server, print server.]
System Monitoring with NetEye: technology metrics. Check hardware for servers (HP, Dell, Fujitsu, HP-UX, Solaris, IBM, Super Micro, AS/400); disk monitoring (RAID controller, SMART status); check Terminal Services and Citrix; virtual server monitoring (VMware ESX, Xen, KVM, Hyper-V, ...); monitoring the availability and performance of databases (Oracle, MS SQL, MySQL, DB2, PostgreSQL); verify the I/O load and latency on disks, SAN, ...; performance checks for servers (Windows, Unix, ...); check the services of Active Directory, DNS, DHCP, SMTP, RADIUS, NTP, ... Check your systems: over 2500 monitoring plugins and over 250 add-ons.
System Monitoring with NetEye: technology metrics. [Graphs: CPU utilization; Exchange 2010 Store I/O latency; Exchange 2010 RPC latency; NTP peers.]
IT Service monitoring for email service: alert correlation. [Diagram labels. The email IT service combines Real User Monitoring (email), System Service Monitoring and Network Monitoring. Network: core switch, DMZ Ethernet, LAN Ethernet. Firewall Service: FW 1. Mail Relay (AND): host status, disk space, load, mail queue. Exchange Server (AND): host status, SMTP, memory, CPU load, paging file usage, disk space, Exchange service. DNS Service (OR): primary domain controller, backup domain controller. Also shown: Active Directory.]
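The AND/OR correlation on this slide can be sketched as a small rule tree. This is a minimal illustration in plain Python, not NetEye's configuration or API: the `Node` class, the `evaluate` function and the leaf names are hypothetical, and the top-level AND over mail relay, Exchange server and DNS service is an assumption, since the flattened slide does not show how the groups are joined.

```python
from dataclasses import dataclass, field
from typing import Dict, List

OK = "OK"
CRITICAL = "CRITICAL"


@dataclass
class Node:
    """A check (LEAF) or a correlation rule (AND / OR) over child nodes."""
    name: str
    op: str = "LEAF"
    children: List["Node"] = field(default_factory=list)


def leaves(*names: str) -> List[Node]:
    return [Node(n) for n in names]


def leaf_names(node: Node) -> List[str]:
    """Collect the names of all leaf checks below a node."""
    if node.op == "LEAF":
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def evaluate(node: Node, states: Dict[str, str]) -> str:
    """AND needs every child OK; OR needs at least one child OK."""
    if node.op == "LEAF":
        return states.get(node.name, CRITICAL)  # unknown check counts as failed
    results = [evaluate(child, states) for child in node.children]
    if node.op == "AND":
        return OK if all(r == OK for r in results) else CRITICAL
    if node.op == "OR":
        return OK if any(r == OK for r in results) else CRITICAL
    raise ValueError(f"unknown operator: {node.op}")


def build_email_service() -> Node:
    relay = Node("Mail Relay", "AND", leaves(
        "relay host status", "relay disk space", "relay load", "relay mail queue"))
    exchange = Node("Exchange Server", "AND", leaves(
        "exchange host status", "exchange smtp", "exchange memory",
        "exchange cpu load", "exchange paging file usage",
        "exchange disk space", "exchange service"))
    dns = Node("DNS Service", "OR", leaves(
        "primary domain controller", "backup domain controller"))
    # Assumption: the email service needs all three groups to be healthy.
    return Node("Email IT Service", "AND", [relay, exchange, dns])
```

With this shape, losing only the primary domain controller leaves the email service OK because the backup satisfies the OR rule, while a single failed Exchange check turns the whole service critical.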
Network Monitoring: what you can check with NetEye. Network health, latency and bandwidth monitoring: point to point, network interface in/outbound. Analyze your network: Cacti graphs for in/outbound traffic (min., avg., max. values) on switches and routers; definition of active/passive checks (SNMP requests, SNMP traps); NfSen; NetFlow; SNMP trap handler.
Network Monitoring with NetEye: network metrics. Latency; bandwidth; in/outbound traffic; network traffic on application level with NetFlow. [Diagram labels. Headquarters: server, clients, switch with port mirror / SPAN port, network probe, router, NetFlow collector, System Management Solution. Links: WAN, Internet. Branch office: router, server, clients.]
Monitor your network: detailed graphs. [Graphs: latency; inbound and outbound usage; bandwidth.]
Network Traffic Monitoring: details on packets, bytes and IP/port. Network traffic analysis based on protocols; source IP and destination IP identification; filtering on single TCP/UDP ports; network analysis by packets and bytes per IP/port.
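The per-port filtering and per-IP/port totals listed on this slide can be illustrated on a handful of flow records. This is a minimal sketch in plain Python, not the NfSen or NetFlow collector interface: the `Flow` record layout, the function names and the sample addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record."""
    src_ip: str
    dst_ip: str
    proto: str      # "TCP" or "UDP"
    dst_port: int
    packets: int
    n_bytes: int


def filter_by_port(flows: Iterable[Flow], proto: str, port: int) -> List[Flow]:
    """Keep only flows addressed to a single TCP or UDP port."""
    return [f for f in flows if f.proto == proto and f.dst_port == port]


def totals_per_endpoint(flows: Iterable[Flow]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Sum (packets, bytes) per destination (ip, port)."""
    totals: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for f in flows:
        key = (f.dst_ip, f.dst_port)
        totals[key][0] += f.packets
        totals[key][1] += f.n_bytes
    return {key: (pkts, size) for key, (pkts, size) in totals.items()}


sample_flows = [
    Flow("192.168.1.10", "10.0.0.5", "TCP", 80, 10, 1000),
    Flow("192.168.1.11", "10.0.0.5", "TCP", 80, 5, 500),
    Flow("192.168.1.10", "10.0.0.9", "UDP", 53, 2, 120),
]
```

Calling `totals_per_endpoint(sample_flows)` groups the two web flows under `("10.0.0.5", 80)`, and `filter_by_port(sample_flows, "UDP", 53)` isolates the DNS flow.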
Service Monitoring with NetEye: service metrics. Service (end to end, from the user perspective): availability, reliability, performance (latency).
Real User Monitoring: Real User Monitoring (RUM) aims to measure the end user's experience, providing data on the availability, response time and reliability of the IT services actually in use (e-shop, CMS, SAP, SharePoint, mail, applications, ...). Availability and reliability are typically verified by active monitoring, such as intelligent robots simulating user interactions. Response time is typically measured by passive monitoring, i.e. the RUM device collects HTTP(S) traffic without any impact on the deployed applications (no trace, no debug, no performance impact). User experience: simulation of real user interactions to check availability and reliability; application latency monitoring to check the response time.
Monitoring and the Cloud: services are getting more important. [Diagram labels. Roles: Service Level Manager, Application Manager, Facility Manager, Network Admin, System Admin, Users. Business services: Mail Service, CRM Service, ERP Service. Infrastructure: DMZ, datacenter, LAN, proxy, CRM, ERP, mail server, application server, server virtualization, facilities, data storage (SAN, NAS), Active Directory, DNS, DHCP, file server, print server.]
RUM with WÜRTHPHOENIX NetEye: NetEye provides Real User Monitoring through: 1. WebInject: automated testing of web applications and web services. 2. Watir WebDriver or Selenium HQ: automated testing of web applications and web services on a deeper level. 3. Virtual User: an automated testing tool that allows a computer to emulate a human user, performing actions such as clicking the mouse and typing keys. 4. Application Latency Monitoring: measures the response time of each user transaction by analyzing the communication performance to obtain key performance indicators (network latency, application server latency).
Future monitoring targets: What is causing slow performance, the network or the application? Future monitoring requirements: protocol recognition extends NetFlow; latency is measured at protocol level; ntop computes the network KPM to be monitored by NetEye. [Diagram labels. A user sends a request; latency is shown at the firewall and proxy and towards the CRM server, database and storage. Protocol visibility: HTTP, HTTP(S), DNS; Oracle, MS SQL Server, MySQL; iSCSI, FCoE (SAN, NAS).]
Real End User Experience Monitoring: the approach of WÜRTHPHOENIX NetEye. NetEye aims to provide real end user experience monitoring using KPM metrics from nprobe. Application latency monitoring measures the response time of each user transaction by analyzing the network communication to obtain the key performance indicators: client network latency, server network latency, application server latency, client page load time, TCP retransmissions, TCP fragmented, transmitted bytes, transmitted packets.
Real End User Experience Monitoring: how to calculate the response time. [Diagram labels: client, LAN, router, WAN, router, LAN, application server, nprobe, NetEye, System Admin, HTTP request.] Example values: client latency 20 ms and server latency 1 ms give a network latency of 21 ms; with an application latency of 29 ms, the total response time is 50 ms.
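The figures on this slide are consistent with a simple sum: client latency (20 ms) plus server latency (1 ms) gives the network latency (21 ms), and adding the application latency (29 ms) gives the total response time (50 ms). The sketch below only restates that arithmetic; the function name and the dictionary keys are hypothetical and do not come from NetEye or nprobe.

```python
from typing import Dict


def response_time_ms(client_ms: float, server_ms: float, application_ms: float) -> Dict[str, float]:
    """Combine the measured latencies as read from the slide's example.

    network latency     = client latency + server latency
    total response time = network latency + application latency
    """
    network_ms = client_ms + server_ms
    return {
        "network_ms": network_ms,
        "application_ms": application_ms,
        "total_ms": network_ms + application_ms,
    }


# Values taken from the slide: client 20 ms, server 1 ms, application 29 ms.
slide_example = response_time_ms(client_ms=20, server_ms=1, application_ms=29)
```

Splitting the total this way answers the question of whether the network (21 ms) or the application (29 ms) contributes more to the 50 ms a user experiences.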
Alerts generated on latency deviation: how to record the baselines. The system runs under normal network and application conditions for some days to record the baselines. It calculates the average client/server/application latency from the requests in the defined period. A periodic check then runs (e.g. every 5 minutes), comparing the average latency with the corresponding baseline. Warning and critical alerts are generated based on customizable percentage thresholds. Minimum and maximum watermarks can be configured to keep the statistics reasonable (e.g. if the average latency is very low, such as 5 ms, a percentage is not a reliable mechanism for the check). Summary: baseline definition; calculation of the average client/server/application latency; warning and critical notifications based on percentage thresholds.
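The baseline check described on this slide can be sketched as follows. This is a hedged illustration, not NetEye's implementation: the function names, the default thresholds (50 % warning, 100 % critical) and the watermark values are hypothetical. The slide does not say how the watermarks act, so the sketch assumes that an average below the minimum watermark is always OK and an average above the maximum watermark is always critical.

```python
from statistics import mean
from typing import Sequence


def record_baseline(samples_ms: Sequence[float]) -> float:
    """Average latency over the requests seen during the recording period."""
    return mean(samples_ms)


def check_latency(current_avg_ms: float,
                  baseline_ms: float,
                  warn_pct: float = 50.0,
                  crit_pct: float = 100.0,
                  min_watermark_ms: float = 10.0,
                  max_watermark_ms: float = 2000.0) -> str:
    """Compare the current average latency with its baseline.

    Assumption: watermarks override the percentage rule, so tiny absolute
    latencies never alert and very large ones always do.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    if current_avg_ms <= min_watermark_ms:
        return "OK"
    if current_avg_ms >= max_watermark_ms:
        return "CRITICAL"
    deviation_pct = (current_avg_ms - baseline_ms) / baseline_ms * 100.0
    if deviation_pct >= crit_pct:
        return "CRITICAL"
    if deviation_pct >= warn_pct:
        return "WARNING"
    return "OK"
```

For example, with a 3 ms baseline a current average of 9 ms is a 200 % deviation, yet under the assumed minimum watermark of 10 ms it still reports OK, which is the low-latency case the slide warns about.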
Real End User Latency Monitoring: recorded baselines. Monitoring metrics for each application.
Real End User Experience Monitoring: latency indicators aggregated by locations (application, LAN, user IP) and aggregated by clients.
Real End User Experience Monitoring: latency indicators aggregated by applications. Drill down to URL details.
Real End User Experience Monitoring: real OS and browser utilization.
Real End User Experience Monitoring: real content types and return codes.
Statistics on real end user experience.
ITIL Process Monitoring with NetEye: process metrics in IT Service Management. Metrics in IT Service Management: reachability of the service desk; first response and solution time; distribution per priority (major, minor, normal); number of incidents and problems per time period; success audits for problems; time to implement standard changes; responses based on the knowledge database. Simplify service desk activities with the Action Launchpad app in NetEye. Open source tools: add-ons to extend and customize process flows; asset management for software/hardware; license management.
Incident life cycle: incident resolution. [Diagram labels. Areas: Monitoring and Event Management, Incident Management, CMDB, CI monitoring, asset management. Numbered steps: 1 alert generation; 2-3 acknowledge (interruption of re-notifications and escalations) and automatic ticket creation; 4 ticket assignment / open; 5 issue identification and evaluation; 6 ticket resolution and closure; 7 recovery: alert status set to OK.]
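The numbered life cycle on this slide can be modelled as a linear sequence of states. This is only a sketch: the `Step` and `Incident` names are hypothetical, and the relative order of acknowledgement and automatic ticket creation (steps 2 and 3) is not recoverable from the flattened slide, so the order used here is an assumption.

```python
from enum import Enum


class Step(Enum):
    ALERT_GENERATED = 1
    ACKNOWLEDGED = 2        # assumed position; may be step 3 on the slide
    TICKET_CREATED = 3      # assumed position; may be step 2 on the slide
    TICKET_ASSIGNED = 4
    ISSUE_EVALUATED = 5
    TICKET_CLOSED = 6
    RECOVERED = 7           # alert status set back to OK


class Incident:
    """Tracks one alert on a configuration item through the life cycle."""

    def __init__(self, ci_name: str) -> None:
        self.ci_name = ci_name
        self.step = Step.ALERT_GENERATED
        self.renotify = True  # re-notifications and escalations are active

    def advance(self) -> Step:
        """Move to the next step; acknowledging stops re-notifications."""
        if self.step is Step.RECOVERED:
            raise ValueError("incident already recovered")
        self.step = Step(self.step.value + 1)
        if self.step is Step.ACKNOWLEDGED:
            self.renotify = False
        return self.step
```

An incident created for a hypothetical configuration item such as `Incident("mail-relay")` starts at alert generation and reaches recovery after six calls to `advance()`.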
Hierarchical escalation: OTRS follows the ITIL standards. [Diagram labels. Channels between user and service desk: SMS, web, email, phone call, reply. Processes: service requests, processing requests, Incident Management, Problem Management. Support levels: 1st, 2nd and 3rd level support. Functional escalation: expert groups A, B and C. Hierarchical escalation: management. Also shown: NetEye, FAQ consultation, solution traceability, Service Level Agreement measurement.]
IT System Management with NetEye: get the right metrics to improve IT services. Thanks for your attention!