A company introduces itself: Nagios in large environments





Agenda
- About ITdesign
- Introduction
- Customer environments and requirements
- Heterogeneous environment: how to get data from end systems?
- 350 servers (> 25 000 measurements): optimized plugin design
- 550+ routers and switches: ITdesign solution for interface measurement

About ITdesign
- Consulting company founded in 2000 as a spin-off from DEC/COMPAQ, located in Vienna
- 38 people in total, working on infrastructure and software projects
- Focus on high-availability infrastructure (Novell, Microsoft, VMware, Citrix) and programming (Perl, C#, Java, AS/400 RPG), etc.
- 5 people (+ 1 external) working on Nagios
- Profitable every year, with a growth rate of 10-20% per year (in people and cash)

Contact information
ITdesign Software Projects & Consulting GmbH
Anton Freunschlag-Gasse 49, A-1230 Wien
Tel.: +43(1)699 33 99-0, Fax: +43(1)699 33 99-33
E-Mail: office@itdesign.at
Werner Neunteufl, Technical Consultant
Mobile: +43(664) 230 45 33
werner.neunteufl@itdesign.at

Customer requirements: Management
- Service Level Agreements
- End-to-end performance monitoring (e.g. SAP client)
- Reports / statistics
- Service views, no technical details
- Service monitoring
- DON'T forget the management: they pay for this!

Customer requirements: Technical
- Platform independent: monitor everything, including Windows, UX*, IBM host, VMware, applications, log files, etc.
- Monitoring must not send too many mails (no notification overload)
- Handle vacation / attendance of people
- Easy maintenance
- Nice technical view
- No agents (which could negatively influence end systems)

3 cool customer environments
- Heterogeneous environment: AS/400, VMware ESX 3.0, end-to-end application performance, Unix, Windows, SLA calculation
- 350 servers (> 25 000 measurements): monitor volumes, NLMs loaded, DirXml, Timesync, SLP, LDAP, etc.
- 550+ Cisco routers and switches: monitor each interface with all properties, replace Cacti with Nagios


Heterogeneous environment: Customer requirements
- Integration of all end systems
- AS/400, i5, iSeries
- Business applications
- End-to-end application performance measurement
- Environment (UPS, air conditioning, etc.)
- Databases
- VMware ESX
- Backup software
- HW monitoring (e.g. HP Systems Insight Manager)

Heterogeneous environment: How to get data?
- Active: active checks with plugins (e.g. SNMP, SSH, WMI)
- Passive:
  - SNMP traps (from any device)
  - nsca (e.g. AIX monitoring)
  - mail (e.g. backup software)
  - FTP (e.g. QSYSOPR messages from AS/400)
  - End-to-end measurement from clients
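Whatever the transport, a passive result ends up in Nagios as one line written to its external command file. The following is a minimal sketch of formatting such a line, assuming the standard PROCESS_SERVICE_CHECK_RESULT external command. The host and service names are hypothetical, and the slides do not show how ITdesign's own interface builds these lines.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result(host, service, code, output, now=None):
    """Format one passive service check result as an external command line."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {code}")
    timestamp = int(time.time() if now is None else now)
    # A newline would split the command, so flatten the plugin output.
    text = " ".join(output.splitlines())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}"


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(passive_result("as400-prod", "QSYSOPR messages", WARNING,
                         "1 message waiting for reply"))
```

The resulting line would be appended to the command file configured for the Nagios instance; the path of that file depends on the installation.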

Heterogeneous environment: Passive interface design
- Generic solution for all network transports
- Customizing on the Nagios side and/or on the end system
- Handle performance data like active plugins
- Simplify parsing of input data
- Configuration instead of programming
- Modular design for future extensions

Heterogeneous environment: Passive interface layers (diagram)
- Output layer: Nagios passive / interface to performance data
- Parsers: XML parser, CSV parser, TXT parser, nsca
- Transport layer: nsca, mail, SNMP, file, FTP
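The layering can be illustrated with a small parser registry: transports only deliver a payload and a format name, and a lookup table picks the parser. This is a sketch under assumptions, not ITdesign's implementation; the record layout (service, state, value), the XML element name `m` and the function names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_csv(payload):
    """Rows of 'service,state,value' become measurement dicts."""
    return [
        {"service": row[0], "state": int(row[1]), "value": float(row[2])}
        for row in csv.reader(io.StringIO(payload)) if row
    ]


def parse_xml(payload):
    """<m service=".." state=".." value=".."/> elements become measurement dicts."""
    root = ET.fromstring(payload)
    return [
        {"service": m.get("service"), "state": int(m.get("state")),
         "value": float(m.get("value"))}
        for m in root.iter("m")
    ]


# Supporting a new format means adding one entry here; transports stay untouched.
PARSERS = {"csv": parse_csv, "xml": parse_xml}


def handle(payload, fmt):
    """Entry point for any transport (mail, ftp, file, ...) once it has the payload."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no parser configured for format {fmt!r}") from None
    return parser(payload)
```

Because every parser returns the same record shape, the output layer needs only one code path to turn records into passive results and performance data.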

Heterogeneous environment: Example 1 (AS/400 integration)
- Operators do not allow any kind of access to the machine!
- Software runs on the AS/400
- It reads data with IBM's APIs for collecting performance data
- Warning and critical thresholds are set on the AS/400
- It transfers the data with FTP from the AS/400 to the Nagios machine
- The passive event interface takes the data, processes the performance data and sends a passive event to Nagios

Heterogeneous environment: Example 1 (AS/400 integration)

Heterogeneous environment: Example 2 (end-to-end performance monitoring)
- Measurement is done on dedicated clients
- Robot software collects data from applications
- We convert the data into XML and CSV and send it by mail to the Nagios server
- The passive event interface collects the performance data and triggers events in Nagios

Heterogeneous environment: Example 2 (end-to-end performance monitoring)


350 Servers (> 25 000 measurements): Problems
- We had never faced such a large environment before and had only one Nagios server available
- The host-down problem stopped the scheduling queue
- Performance problems everywhere: CPU, network (WAN)
- Web view is overloaded
- Performance data graphs
- Solution -> design something new, but what?

350 Servers (> 25 000 measurements): Design requirements - general
- Perl instead of C
- Use existing Perl know-how
- No embedded Perl (some tests fail)
- Perl costs performance! The design must compensate for this!
- No modification of Nagios source code
- No modification of Nagios plugins
- Avoid conflicts with upward compatibility
- Avoid conflicts with the GPL (checked this with a lawyer)
- Better views (web output)

350 Servers (> 25 000 measurements): Design requirements - technical
- Host down must not stop the scheduling queue (no longer a problem with Nagios 3.0)
- Plugins are success factor #1
- Plugins cooperate
- Plugins must reduce network traffic wherever possible
- Plugins must cache data on disk
- Performance data must not influence CPU load
- The graph engine for performance data must not influence CPU load

350 Servers (> 25 000 measurements): Example - information from remote system
- 1 process running? 10 TCP packets / ~30 ms
- 2 processes running? 11 TCP packets / ~30 ms
- 3 processes running? 12 TCP packets / ~30 ms
- Conclusion: read more information than you need right now and store it on disk (plugin caching)
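By the numbers above, each additional process in the same query costs about one extra packet, while a separate query costs about ten, so fetching everything once and caching it pays off quickly. A minimal sketch of such a disk cache follows, assuming a JSON file per host and a caller-supplied `fetch` function; both are illustrative choices, and the original plugins are written in Perl.

```python
import json
import os
import tempfile
import time


def cached_fetch(cache_path, max_age, fetch):
    """Return fetch() data, reusing the on-disk copy while it is younger than max_age seconds."""
    try:
        if time.time() - os.path.getmtime(cache_path) < max_age:
            with open(cache_path) as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # missing or unreadable cache: fall through and query the host
    data = fetch()
    # Write to a temp file and rename it, so a parallel plugin never reads a half-written cache.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, cache_path)
    return data
```

A process check and a memory check for the same host could then both call `cached_fetch` with the same path, and only the first call within `max_age` seconds would touch the network.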

350 Servers (> 25 000 measurements): Plugins cooperate (diagram)
- Nagios process, disk plugin, mem plugin, server properties
- Nagios output > link to CGI program
- Cache data on disk
- Collect performance data
- Write performance data to disk

350 Servers (> 25 000 measurements) Online demo of optimized plugins on notebook

350 Servers (> 25 000 measurements) Traditional setup each measurement separately

350 Servers (> 25 000 measurements) ITdesign solution: one plugin does multiple measurements
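One way to picture a plugin that performs several measurements is a function that grades each value, reports the worst state and emits all values as performance data on a single output line. The sketch below assumes the standard Nagios plugin conventions (return codes 0/1/2 and `label=value;warn;crit` performance data after a `|`); the labels and thresholds are made up, and this is not the ITdesign plugin itself.

```python
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def grade(value, warn, crit):
    """Higher is worse: compare one value against its thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def combined_check(measurements):
    """measurements: list of (label, value, warn, crit). Returns (exit_code, output_line)."""
    states = [(label, grade(value, warn, crit)) for label, value, warn, crit in measurements]
    worst = max((state for _, state in states), default=OK)
    problems = [f"{label} {STATE_NAMES[state]}" for label, state in states if state != OK]
    summary = ", ".join(problems) if problems else f"{len(states)} measurements fine"
    perfdata = " ".join(f"{label}={value};{warn};{crit}"
                        for label, value, warn, crit in measurements)
    return worst, f"{STATE_NAMES[worst]} - {summary} | {perfdata}"


if __name__ == "__main__":
    # Hypothetical values; a real plugin would read them from the cached host data.
    code, line = combined_check([("cpu", 35, 80, 95), ("mem", 91, 85, 95)])
    print(line)
    raise SystemExit(code)
```

Nagios then schedules one service check per host instead of one per measurement, which is the reduction the slide title refers to.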

350 Servers (> 25 000 measurements) Drill down into Operating system details

350 Servers (> 25 000 measurements): Writing and collecting performance data
- Plugins write the current measurement to disk and mark each measurement with an ID
- A highly optimized scheduled cron job takes all measurement data and stores it in the filesystem or an SQL database
- To avoid a huge amount of data, only changes (deltas) are stored
- Example: process availability of httpstkd
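Storing only deltas can be sketched as follows: a value is appended only when it differs from the last stored value for the same measurement ID. The class and its in-memory list are hypothetical stand-ins for the filesystem or SQL storage mentioned above.

```python
_MISSING = object()


class DeltaStore:
    """Keeps one row per change instead of one row per measurement."""

    def __init__(self):
        self.rows = []   # stand-in for a file or SQL table: (id, timestamp, value)
        self._last = {}  # measurement ID -> last stored value

    def record(self, measurement_id, timestamp, value):
        """Store the value if it changed; return True when a row was written."""
        if self._last.get(measurement_id, _MISSING) == value:
            return False
        self._last[measurement_id] = value
        self.rows.append((measurement_id, timestamp, value))
        return True


if __name__ == "__main__":
    store = DeltaStore()
    # Availability of httpstkd sampled every 5 minutes (1 = running, 0 = not running).
    for ts, up in [(0, 1), (300, 1), (600, 0), (900, 0), (1200, 1)]:
        store.record("httpstkd.avail", ts, up)
    print(store.rows)
```

For a process that is almost always running, this keeps a handful of rows per outage rather than one row per polling interval.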

350 Servers (> 25 000 measurements): Graphing performance data
- RRD databases and graphs are created when the user clicks on the appropriate link
- We call this feature "RRD graphs on demand"
- + CPU load only for a very short time
- + RRD databases are created on click
- + Change graphs on the fly (no need to recreate RRD databases)
- + Graphs do not lose measurement details
- + Zoom in / out implemented on the server side
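The "on demand" idea can be illustrated without RRD: when a graph is requested, the stored changes are expanded into evenly spaced points for exactly the requested time window, so zooming is just a new window and step. This sketch deliberately swaps RRDtool for plain Python lists; the function name and data layout are assumptions.

```python
from bisect import bisect_right


def series_for_window(changes, start, end, step):
    """Expand stored changes into regular samples for one graph request.

    changes: list of (timestamp, value) sorted by timestamp, one entry per change.
    Returns [(t, value)] for t = start, start + step, ... up to end; the value is
    None for points before the first stored change.
    """
    times = [t for t, _ in changes]
    points = []
    t = start
    while t <= end:
        i = bisect_right(times, t)  # number of changes at or before t
        points.append((t, changes[i - 1][1] if i else None))
        t += step
    return points


if __name__ == "__main__":
    changes = [(0, 1), (600, 0), (1200, 1)]
    print(series_for_window(changes, 0, 1200, 300))   # overview
    print(series_for_window(changes, 550, 650, 50))   # zoomed in around the outage
```

Since the points are derived from the raw changes each time, a narrower window shows the exact change timestamps rather than an averaged value.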

350 Servers (> 25 000 measurements) Example: Graphing performance data with zoom in


550+ routers and switches
- Generic solutions do not work because:
  - Reducing network traffic is the biggest challenge
  - Caching data on disk is not enough
  - Execution time is a problem (network polling)
- Sometimes it's easier to write a special plugin
- Write an application for reading interfaces via SNMP: the interface_table.pl plugin

550+ routers and switches
- No need to know each interface: only the SNMP community string is required, and each interface is monitored automatically (plug and play)
- Warning and critical thresholds can also be set on throughput to recognize link overload
- Find changes on each interface (e.g. the ISDN backup link goes up, or remote support from the provider dials in)
- Interfaces can be included or excluded
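Two of these points lend themselves to a short sketch: deriving throughput from two readings of a 32-bit SNMP octet counter (such as ifInOctets), and comparing the current interface table with the previous one to spot changes. The function names and the snapshot layout are hypothetical; interface_table.pl itself is a Perl plugin whose internals are not shown in the slides.

```python
COUNTER32_MODULUS = 2 ** 32


def throughput_bps(prev_octets, curr_octets, seconds):
    """Bits per second between two 32-bit octet counter readings, allowing one wrap."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta * 8 / seconds


def interface_changes(previous, current):
    """Compare two {interface name: oper status} snapshots and describe the differences."""
    changes = []
    for name in sorted(previous.keys() | current.keys()):
        old, new = previous.get(name), current.get(name)
        if old is None:
            changes.append(f"{name}: new ({new})")
        elif new is None:
            changes.append(f"{name}: gone (was {old})")
        elif old != new:
            changes.append(f"{name}: {old} -> {new}")
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots: the ISDN backup link came up and a dialer interface appeared.
    before = {"Fa0/1": "up", "BRI0": "down"}
    after = {"Fa0/1": "up", "BRI0": "up", "Di1": "up"}
    print(interface_changes(before, after))
    print(throughput_bps(0, 750000, 60))
```

The modulo handles a single counter wrap between polls; on fast links with long polling intervals a 32-bit counter can wrap more than once, which this sketch cannot detect.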

550+ routers and switches Online demo of optimized plugins on notebook

550+ routers and switches interface_table plugin measures a complete network device

550+ routers and switches
- The interface_table.pl plugin evolved into the most wanted plugin we have, because:
  - Some customers use it as an inventory and even as an add-on to their network documentation
  - It monitors one complete device, no more checks required
  - Very short deployment time
- The command line looks like: interface_table.pl -C public -H router1 -w <> -c <>


Summary Nagios

Questions?

Thanks for your attention