Advanced System Monitoring



Advanced System Monitoring with Nagios, PNP4Nagios and NConf
Josh Malone, Systems Administrator
National Radio Astronomy Observatory, Charlottesville, VA

Nagios is great. It checks your servers. It tells you when there are problems.

But services keep expanding.

We work in larger teams. We all want to work on things at the same time.

Management demands data.

You need the right tools

We Need to Engineer a Monitoring Solution That Goes to 11!

The Right Addons
PNP4Nagios: graph the data from your service checks
  https://github.com/lingej/pnp4nagios
  https://docs.pnp4nagios.org/pnp-0.6/
NConf: web-based Nagios configurator
  http://www.nconf.org/dokuwiki/doku.php
  https://github.com/nconf/nconf

The Right Plugins
Online plugin repositories: Nagios Exchange, Icinga Exchange, Monitoring Plugins.
But if you want something done right, write it yourself! And write it RIGHT!

PNP4Nagios Performance Data + Graphing

Nagios Performance Data
Check plugins can optionally return performance data ("perfdata").
Perfdata is simply any metric associated with a check, for example:
  Response time (seconds, ms)
  Web page size (bytes, KB)
  Network throughput (bits/sec, kb/sec, Mb/sec)
  Room temperature (°F, °C)

Perfdata Output
  ./check_ping -H 184.6.0.1 -w 100,2% -c 200,5%
  PING OK - Packet loss = 0%, RTA = 56.56 ms|rta=56.563000ms;100.000000;200.000000;0.000000 pl=0%;2;5;0
All output is on STDOUT.
A vertical bar (|) separates the screen output from the performance data.
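To make the format concrete, here is a small Perl sketch (core Perl only) that splits a plugin's first output line at the vertical bar and breaks the perfdata into its fields. It assumes the usual 'label'=value[UOM];warn;crit;min;max layout from the Nagios plugin guidelines; split_output and parse_perfdata are illustrative names, not part of Nagios or PNP4Nagios.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one line of plugin output at the first vertical bar into
# (screen text, perfdata string); perfdata is '' when there is none.
sub split_output {
    my ($line) = @_;
    my ($text, $perf) = split /\|/, $line, 2;
    $text = '' unless defined $text;
    $perf = '' unless defined $perf;
    s/^\s+|\s+$//g for $text, $perf;    # trim whitespace and newline
    return ($text, $perf);
}

# Parse a perfdata string into a list of hashes, one per
# 'label'=value[UOM];warn;crit;min;max item (trailing fields are optional).
sub parse_perfdata {
    my ($perf) = @_;
    my @metrics;
    while ($perf =~ /('[^']*'|[^\s=]+)=(\S+)/g) {
        my ($label, $rest) = ($1, $2);
        $label =~ s/^'(.*)'$/$1/;       # strip optional single quotes
        my ($val, $warn, $crit, $min, $max) = split /;/, $rest;
        my ($num, $uom) = $val =~ /^([-+]?[\d.]+)(.*)$/;
        push @metrics, {
            label => $label, value => $num, uom => $uom,
            warn  => $warn,  crit  => $crit, min => $min, max => $max,
        };
    }
    return @metrics;
}

my ($text, $perf) = split_output(
    'PING OK - Packet loss = 0%, RTA = 56.56 ms|rta=56.563000ms;100.000000;200.000000;0.000000 pl=0%;2;5;0');
print "status text: $text\n";
for my $m (parse_perfdata($perf)) {
    printf "%s = %s %s (warn %s, crit %s)\n",
        $m->{label}, $m->{value}, $m->{uom}, $m->{warn}, $m->{crit};
}
```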

Support by Plugins
Not all plugins report performance data.
Some plugins require a command-line flag to activate perfdata output.
Some plugins output values that could be perfdata, but only in the screen output.
  Wrap these plugins in a script that parses the screen output and reformats it as proper perfdata.
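A wrapper of that kind might look like the following Perl sketch. The wrapped plugin and its "queue length is N" message are hypothetical, as are the function names; adapt the regular expression to whatever your plugin actually prints. The wrapper passes the plugin's exit code through unchanged, because that is what Nagios uses for the state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: the wrapped plugin is hypothetical; it prints a line such as
# "OK - queue length is 17" but emits no perfdata of its own.
sub add_perfdata {
    my ($screen, $label) = @_;
    chomp $screen;
    return $screen if index($screen, '|') >= 0;    # already has perfdata
    if ($screen =~ /queue length is (\d+)/) {
        return "$screen|$label=$1";
    }
    return $screen;                                # nothing to extract
}

# Run the plugin (list form, no shell), rewrite the first output line,
# print everything, and return the plugin's own exit code.
sub run_wrapped {
    my ($label, @cmd) = @_;
    my $fh;
    unless (open($fh, '-|', @cmd)) {
        print "UNKNOWN - cannot run $cmd[0]: $!\n";
        return 3;
    }
    my @lines = <$fh>;
    close $fh;                     # sets $? to the child's wait status
    my $code = $? >> 8;
    @lines = ('') unless @lines;
    $lines[0] = add_perfdata($lines[0], $label) . "\n";
    print @lines;
    return $code;
}

# Usage (hypothetical path): wrap_queue.pl /usr/local/nagios/libexec/check_queue
exit run_wrapped('queue', @ARGV) if @ARGV;
```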

Performance Data Handling
Nagios does not natively do much with performance data.
Perfdata must be passed to an add-on for it to be useful.
Nagios comes with sample commands for processing perfdata: process-host-perfdata and process-service-perfdata.

Getting Perfdata into PNP
misccommands.cfg: redefine the perfdata commands
  define command {
      command_name  process-service-perfdata
      command_line  /usr/local/nagios/libexec/process_perfdata.pl
  }
  define command {
      command_name  process-host-perfdata
      command_line  /usr/local/nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
  }

Understanding RRDs
RRD is a Round Robin Database.
Data in an RRD is stored as sets of averages: 1 min, 5 min, 15 min, 1 hr, 6 hr, 12 hr, etc.
The file never grows, but resolution is lost with time.
The maximum time to hold data is set when the RRD is created (the number of slots for each time bin).
PNP4Nagios holds enough data for 4 years by default.
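The "fixed size, decreasing resolution" trade-off can be illustrated with a toy round-robin archive in Perl. This is not how rrdtool stores data on disk (a real RRD keeps several consolidated archives in one binary file); the ToyRRA class is hypothetical and only shows why storage never grows while older data survives at a coarser resolution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ToyRRA;

# A fixed number of rows; each row is the average of 'steps' raw samples.
sub new {
    my ($class, %args) = @_;
    return bless {
        slots   => $args{slots},   # rows kept (the archive's fixed size)
        steps   => $args{steps},   # raw samples consolidated into one row
        rows    => [],             # consolidated averages, oldest first
        pending => [],             # raw samples not yet consolidated
    }, $class;
}

sub update {
    my ($self, $value) = @_;
    push @{ $self->{pending} }, $value;
    return if @{ $self->{pending} } < $self->{steps};
    my $sum = 0;
    $sum += $_ for @{ $self->{pending} };
    push @{ $self->{rows} }, $sum / $self->{steps};
    @{ $self->{pending} } = ();
    # Drop the oldest row once full: storage never grows past 'slots'.
    shift @{ $self->{rows} } if @{ $self->{rows} } > $self->{slots};
}

sub rows { return @{ $_[0]{rows} } }

package main;

my $fine   = ToyRRA->new(slots => 5, steps => 1);   # full resolution, short history
my $coarse = ToyRRA->new(slots => 5, steps => 4);   # 4-sample averages, 4x history
for my $v (1 .. 20) {
    $fine->update($v);
    $coarse->update($v);
}
print "fine:   @{[ $fine->rows ]}\n";     # 16 17 18 19 20
print "coarse: @{[ $coarse->rows ]}\n";   # 2.5 6.5 10.5 14.5 18.5
```

Both archives use the same five rows of storage, but the coarse one covers four times as much history at a quarter of the resolution, which is the same trade-off an RRD makes across its archives.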

Multi-value graphs
Graphs can overlay multiple values from one RRD.

Perfdata Processing Modes
Synchronous mode (easy)
  The PNP processor is invoked after each and every service check.
  RRDs are updated immediately after each service check.
  The number of Perl execs can cause high load.
Bulk mode (not as easy)
  Perfdata is accumulated in a flat file after each service check.
  The PNP processor is called every 30 seconds and handles all data from the file.
  Reduced PNP load.
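The point of bulk mode is that one processor run handles every result accumulated since the previous run. This Perl sketch assumes spool lines in a tab-separated KEY::VALUE layout similar to the service_perfdata_file_template example in the PNP4Nagios bulk-mode documentation (DATATYPE::SERVICEPERFDATA, TIMET::..., HOSTNAME::..., SERVICEPERFDATA::... and so on); parse_spool_line and process_batch are illustrative names, not PNP4Nagios internals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one spool line of tab-separated KEY::VALUE fields into a hashref.
sub parse_spool_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    for my $field (split /\t/, $line) {
        my ($key, $value) = split /::/, $field, 2;
        $rec{$key} = $value if defined $value;
    }
    return \%rec;
}

# Handle a whole batch in one pass: every line accumulated since the last
# run goes through the same handler, so only one process is started.
sub process_batch {
    my ($fh, $handler) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        $handler->(parse_spool_line($line));
        $count++;
    }
    return $count;
}

# Usage (hypothetical spool path given on the command line).
if (@ARGV) {
    open(my $spool, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!\n";
    my $n = process_batch($spool, sub {
        my ($r) = @_;
        printf "%s/%s: %s\n", $r->{HOSTNAME} // '?', $r->{SERVICEDESC} // '?',
            $r->{SERVICEPERFDATA} // '';
    });
    print "processed $n results\n";
}
```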

Increase Graph Data Age
PNP4Nagios shows graphs out to 1 year by default.
The default RRDs hold data for 4 years; all that's missing is some links for the older data.
These are defined in the $views array in config_local.php:
  $views[] = array('title' => 'Two Years', 'start' => (3600*24*740));
The start value is in seconds: 3600*24 seconds per day, times 740, roughly the number of days in 2 years.

Using PNP4Nagios

PNP4Nagios Overview

PNP4Nagios Menus
Switch to a different host right from the PNP screen.
Select a date range.
Create a PDF export.

Using the Basket
The basket can be used to combine graphs from multiple hosts into a single page.
Use it in combination with PDF export to generate printable/mailable summaries for others: management, vendors, etc.

Templates
Templates define how the perfdata is displayed.
PNP4Nagios looks for a template with the same name as the check command, and falls back to a default if none is found.
Templates define how to present values from the RRDs.
They are written in PHP, so you can do any kind of processing you like (scaling, coloring, etc.).

Using templates to tune graphs
Define command-line options to rrdtool:
  $opt[$key] = "-X 0 --height 200 --vertical-label foo --title \"Graph Title\" ";
  This tells rrdgraph not to power-scale the Y axis, sets the Y-axis label and graph title, and makes graphs taller.
Divide a value by 1024 and call the result gb (converts MB to GB):
  $def[$key] .= "CDEF:gb=var1,1024,/ ";
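CDEF expressions are written in reverse Polish notation: var1,1024,/ pushes var1, pushes 1024, then divides the first by the second. The Perl sketch below evaluates such expressions with a small stack to show the order of operations; eval_rpn is a hypothetical helper that supports only + - * / and is no replacement for rrdtool's own RPN engine, which has many more operators.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Evaluate a comma-separated RPN expression the way a CDEF reads it:
# left to right, pushing operands and applying operators to the top two.
sub eval_rpn {
    my ($expr, %vars) = @_;
    my %ops = (
        '+' => sub { $_[0] + $_[1] },
        '-' => sub { $_[0] - $_[1] },
        '*' => sub { $_[0] * $_[1] },
        '/' => sub { $_[0] / $_[1] },
    );
    my @stack;
    for my $tok (split /,/, $expr) {
        if (exists $ops{$tok}) {
            die "stack underflow at '$tok'\n" if @stack < 2;
            my $rhs = pop @stack;            # second operand is on top
            my $lhs = pop @stack;
            push @stack, $ops{$tok}->($lhs, $rhs);
        } elsif (exists $vars{$tok}) {
            push @stack, $vars{$tok};        # a named value, like var1
        } elsif ($tok =~ /^-?\d+(?:\.\d+)?$/) {
            push @stack, $tok;               # a numeric literal
        } else {
            die "unknown token '$tok'\n";
        }
    }
    die "malformed expression\n" unless @stack == 1;
    return $stack[0];
}

# 5120 MB expressed in GB
print eval_rpn('var1,1024,/', var1 => 5120), "\n";   # 5
```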

NConf Web-based GUI configurator

NConf
Web-based GUI configurator for Nagios.
Stores config objects in a MySQL database.
Generates Nagios config files from the database for deployment to Nagios servers.
Deployment is scriptable (SCP, rsync, etc.).
NConf need not run on the Nagios server itself.

Installation: Prerequisites
MySQL with InnoDB
OS packages:
  apt-get install libdbi-perl php5-mysql gcc
  yum install perl-DBI perl-DBD-MySQL
PHP settings:
  short_open_tag = On
  register_globals = Off
  magic_quotes_gpc = Off

Install
Un-tar the files into the web server document area.
  config/mysql.php: database server, user and password
  config/authentication.php: AD, SQL, file or basic auth
  config/deployment.ini: how to deploy config files to the Nagios instance

Local Deployment
  [local deployment]
  type           = local
  source_file    = /etc/nconf/output/nagiosconfig.tgz
  target_file    = /etc/nagios
  action         = extract
  reload_command = sudo /etc/init.d/nagios reload

Importing Existing Configs
NConf can import existing config files, but the process must be done in multiple steps.
Each type of object (hosts, services, commands, contacts, etc.) must be imported separately and in the correct order (contacts before contact groups).
The Nagios object cache lists all objects sorted by type.
See the Import Guide.
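One way to prepare per-type import files is to split the object cache by definition type and emit the types in dependency order. In this Perl sketch the sequence in @IMPORT_ORDER is an assumption (referenced objects such as timeperiods, commands and contacts before the groups and hosts that use them), not taken from NConf; check the NConf Import Guide for the exact order. The output file names are hypothetical, and the objects.cache location depends on your nagios.cfg.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED import order, dependencies first; confirm against the NConf Import Guide.
my @IMPORT_ORDER = qw(timeperiod command contact contactgroup host hostgroup service);

# Group the raw "define TYPE { ... }" blocks of an objects.cache-style file
# by TYPE. Returns a hashref: type => [ block text, ... ].
sub split_by_type {
    my ($fh) = @_;
    my (%by_type, $type, @buf);
    while (my $line = <$fh>) {
        if (!defined $type) {
            next unless $line =~ /^\s*define\s+(\w+)\s*\{/;
            ($type, @buf) = ($1, $line);
            next;
        }
        push @buf, $line;
        if ($line =~ /^\s*\}\s*$/) {          # closing brace ends the block
            push @{ $by_type{$type} }, join('', @buf);
            undef $type;
        }
    }
    return \%by_type;
}

# Known types in import order first, then any others alphabetically.
sub ordered_types {
    my ($by_type) = @_;
    my %rank;
    @rank{@IMPORT_ORDER} = (0 .. $#IMPORT_ORDER);
    return sort { ($rank{$a} // 99) <=> ($rank{$b} // 99) or $a cmp $b } keys %$by_type;
}

# Usage: split_objects.pl /path/to/objects.cache
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!\n";
    my $groups = split_by_type($in);
    for my $t (ordered_types($groups)) {
        open(my $out, '>', "import-$t.cfg") or die "cannot write import-$t.cfg: $!\n";
        print {$out} @{ $groups->{$t} };
        close $out;
        print "import-$t.cfg: ", scalar @{ $groups->{$t} }, " objects\n";
    }
}
```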

Extending the Schema
Some Nagios configuration attributes aren't supported by NConf out of the box.
Luckily, the configuration schema/data model used by NConf is extensible: Administration > Attributes > Add.
Back up your database before changing the schema!

Extending the Schema: example attribute
  contacts ("Contacts"): people to notify about this host
  class: host; type: assign-many; assigned class: contact

Check Plug-Ins

Must-have plugins
check_openmanage: monitor Dell servers with OMSA

check_netappfiler.py
  Old, but still works great
  Uses SNMP; compatible with ONTAP 7-Mode
  Comes with PNP templates
  https://github.com/wampire/check_netappfiler

check_logfiles (https://github.com/lausser/check_logfiles)
  Scans logfiles for patterns indicating Warning, Critical or OK states
  Handles rotated logfiles
  Detects recovery strings as well
  Can use external config files for complex checks
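The core idea of pattern-based log checking fits in a few lines of Perl. This is a toy illustration, not check_logfiles' implementation: scan_lines (hypothetical) classifies new lines by critical, warning and recovery patterns, where a recovery match clears what came before, and resume_offset (hypothetical) treats a file that is now smaller than the saved offset as rotated. A real tool must also notice a replaced file that has already grown past the old offset (for example by comparing inode numbers) and read the remaining lines of the rotated file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify new log lines. Critical beats warning; a recovery (ok) match
# resets the counters, so only problems after the last recovery count.
sub scan_lines {
    my ($lines, %pat) = @_;
    my ($crit, $warn) = (0, 0);
    for my $line (@$lines) {
        if (defined $pat{ok} && $line =~ $pat{ok}) {
            ($crit, $warn) = (0, 0);
        } elsif (defined $pat{critical} && $line =~ $pat{critical}) {
            $crit++;
        } elsif (defined $pat{warning} && $line =~ $pat{warning}) {
            $warn++;
        }
    }
    return $crit ? 'CRITICAL' : $warn ? 'WARNING' : 'OK';
}

# Where to resume reading: a file smaller than the saved offset is assumed
# to have been rotated or truncated, so start again from the beginning.
sub resume_offset {
    my ($saved, $current_size) = @_;
    return $current_size < $saved ? 0 : $saved;
}

my %patterns = (critical => qr/ERROR/, warning => qr/WARN/, ok => qr/recovered/);
print scan_lines(['db ERROR: connection lost', 'db recovered'], %patterns), "\n";
```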

check-cisco.pl: Cisco router/switch CPU, PSU, temperature
  https://github.com/ranl/monitor-utils
Synology status (check_snmp_synology): health, RAID, disk temperatures, storage
  Available on Nagios Exchange

Writing Check Plug-ins
Have no fear: write exactly the plugin you need.

Custom Plugins
Nagios can monitor anything you can write a script to check.
Simple API: you can write plugins in ANY language you choose!
  bash, Python, Tcl, Expect
  Perl (Nagios has an embedded Perl interpreter for speed)
  C, C++

Plugin API
The exit code determines the check state:
  0 - OK
  1 - Warning
  2 - Critical
  3 - Unknown
Text on stdout is a human-readable notice; Nagios displays it but does not use it to determine the state.
Perfdata is also written on stdout, after a vertical bar.
Multiple lines are allowed, up to 4 KB.
http://nagios.sourceforge.net/docs/3_0/pluginapi.html
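A small helper keeps a plugin consistent with this API. In the Perl sketch below, render (a hypothetical name) builds the stdout text, a status line with optional perfdata after the vertical bar followed by optional extra lines, and returns the exit code that matches the state name. An unrecognised state name is a bug in the plugin itself, so it is reported as UNKNOWN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Exit codes defined by the Nagios plugin API.
my %STATE_CODE = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Returns (stdout text, exit code).
sub render {
    my ($status, $text, $perfdata, @long) = @_;
    my $code = $STATE_CODE{$status};
    if (!defined $code) {                      # plugin bug: bad state name
        ($status, $code) = ('UNKNOWN', $STATE_CODE{UNKNOWN});
    }
    my $out = "$status - $text";
    $out .= "|$perfdata" if defined $perfdata && length $perfdata;
    $out .= "\n" . join('', map { "$_\n" } @long);
    warn "plugin output exceeds 4 KB\n" if length($out) > 4096;
    return ($out, $code);
}

# A plugin would end like this:
#   my ($out, $code) = render('WARNING', 'disk 85% full', 'used=85%;80;90');
#   print $out;
#   exit $code;
```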

Writing plugins in Perl
Nagios provides utils.pm, which provides the %ERRORS hash mapping status names to exit codes, e.g. $ERRORS{'CRITICAL'}.
You can use my template as a starting point: https://github.com/48kram/nagios-plugins/tree/master/template
  Command-line parsing, threshold parsing, output formatting
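Threshold parsing is easy to get subtly wrong. The sketch below follows the range syntax described in the Monitoring Plugins development guidelines ("10" alerts outside 0..10, "10:" below 10, "~:10" above 10, "10:20" outside 10..20, "@10:20" inside 10..20, endpoints inclusive). It is an illustration, not the code in the template linked above; parse_range, range_alerts and check_value are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Parse "10", "10:", "~:10", "10:20" or "@10:20" into { lo, hi, inside };
# an undef bound means unbounded on that side.
sub parse_range {
    my ($spec) = @_;
    die "empty range\n" unless defined $spec && length $spec;
    my $inside = ($spec =~ s/^\@//) ? 1 : 0;
    my ($start, $end) = $spec =~ /:/ ? split(/:/, $spec, 2) : ('', $spec);
    my $lo = $start eq '~' ? undef : ($start eq '' ? 0 : $start);
    my $hi = $end eq '' ? undef : $end;
    for my $bound (grep { defined } $lo, $hi) {
        die "non-numeric bound '$bound'\n" unless looks_like_number($bound);
    }
    die "range start > end\n" if defined $lo && defined $hi && $lo > $hi;
    return { lo => $lo, hi => $hi, inside => $inside };
}

# True if $value should raise an alert for this range.
sub range_alerts {
    my ($r, $value) = @_;
    my $in = (!defined $r->{lo} || $value >= $r->{lo})
          && (!defined $r->{hi} || $value <= $r->{hi});
    return $r->{inside} ? $in : !$in;
}

sub check_value {
    my ($value, $warn_spec, $crit_spec) = @_;
    return 'CRITICAL' if range_alerts(parse_range($crit_spec), $value);
    return 'WARNING'  if range_alerts(parse_range($warn_spec), $value);
    return 'OK';
}

print check_value(85, '80', '90'), "\n";
```

For a lower-is-worse metric such as UPS runtime, thresholds in this syntax would be written as -w 40: -c 20:, so check_value(30, '40:', '20:') returns WARNING.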

Writing Good Plugins
Keep default output short and to the point:
  suitable for SMS messages, pagers, etc.
  easy to parse in a time-critical situation.
Remember: Nagios should help you fix the problem!
Call external binaries by their full path, and make that path configurable on the command line or in a variable at the top of the script.

Writing Good Plugins
Watch out for long runtimes or hung processes:
  Perl: use alarm (a built-in function)
  Bash/sh: use timeout (from coreutils)
Avoid temp files, in case your disk is full, you are out of file handles, etc.
Validate your command-line arguments:
  Is it legal for warn to be higher than crit?
  Are numeric arguments really numeric?
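Both points can be handled with core Perl. In this sketch, with_timeout wraps a code block in an alarm so that a hung query or external command cannot stall the check, and validate_thresholds rejects missing or non-numeric values and enforces warning > critical. That rule suits a lower-is-worse metric such as UPS runtime; for higher-is-worse metrics the comparison flips. Both function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Run $code with a deadline. Returns ($result, undef) on success or
# (undef, $error) on timeout or failure.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # never leave an alarm pending if $code died
    return ($result, undef) if $ok;
    return (undef, $@ eq "timeout\n" ? "timed out after ${seconds}s" : $@);
}

# Returns an error message, or undef when the thresholds are usable.
sub validate_thresholds {
    my ($warn, $crit) = @_;
    for my $pair ([warning => $warn], [critical => $crit]) {
        my ($name, $v) = @$pair;
        return "$name threshold is missing" unless defined $v;
        return "$name threshold '$v' is not numeric" unless looks_like_number($v);
    }
    return "warning ($warn) must be greater than critical ($crit)" unless $warn > $crit;
    return;
}
```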

Writing Safe Plugins
Nagios's embedded Perl interpreter (ePN) requires special care:
  Plugins must work under use strict.
  Perl should be run with -w.
  Close all opened files (ePN never exits).
  Initialize all variables before using them (ePN caches).
  Don't use global variables in subroutines.

When to Use Unknown
Unknown is a special exit status in Nagios for when an error occurred in the plugin itself:
  a missing Perl module or client binary, etc.
  illegal command-line options.
Do not use Unknown to indicate that the service is in an unknown state (hostname unknown, etc.).
Use Warning or Critical in that case, because the service is not OK!

Minimal Nagios Check Plugin
    #!/usr/bin/perl -w
    # Check remaining runtime on an APC Symmetra UPS via SNMP
    use strict;
    use Net::SNMP;
    use lib qw(. /usr/lib/nagios/libexec);
    use utils qw(%ERRORS);
    use Getopt::Long qw(:config no_ignore_case);

    my ($host, $community, $warn, $crit);
    GetOptions(
        'H|host=s'      => \$host,
        'C|community=s' => \$community,
        'w|warning=s'   => \$warn,
        'c|critical=s'  => \$crit,
    ) or print_help();
    print_help() unless defined $host && defined $warn && defined $crit;
    $community = 'public' unless defined $community;
    if ($warn <= $crit) {
        print "Error: Warning must be > critical!\n";
        exit $ERRORS{'UNKNOWN'};
    }

    # upsAdvBatteryRunTimeRemaining, returned as TimeTicks (1/100 s)
    my $runtimeoid = '.1.3.6.1.4.1.318.1.1.1.2.2.3.0';
    my ($s, $error) = Net::SNMP->session(
        -hostname  => $host,
        -community => $community,
        -timeout   => 10,
        -version   => 1,
        -translate => [ -timeticks => 0x0 ],   # raw ticks, not a formatted string
    );
    if (!defined $s) {
        print "SNMP Error: $error\n";
        exit $ERRORS{'UNKNOWN'};
    }
    my $res = $s->get_request(-varbindlist => [$runtimeoid]);
    if (!defined $res) {
        print 'SNMP Error: ', $s->error(), "\n";
        $s->close();
        exit $ERRORS{'UNKNOWN'};
    }
    $s->close();
    my $runminutes = $res->{$runtimeoid} / 100 / 60;

    # Begin plugin logic: lower runtime is worse
    my $status = 'OK';
    $status = 'WARNING'  if $runminutes <= $warn;
    $status = 'CRITICAL' if $runminutes <= $crit;
    my $screenout = sprintf('%s: %d minutes estimated runtime', $status, $runminutes);
    my $perfdata  = sprintf('runtime=%dminutes;%d;%d', $runminutes, $warn, $crit);
    print "$screenout|$perfdata\n";
    exit $ERRORS{$status};

    sub print_help {
        print "Usage: $0 -H <host> [-C <community>] -w <minutes> -c <minutes>\n";
        exit $ERRORS{'UNKNOWN'};
    }

Minimal Nagios Check Plugin: sample run
    root@host# ./check_apc_run -H 10.1.63.34 -C public -w 40 -c 20
    OK: 64 minutes estimated runtime|runtime=64minutes;40;20
    root@host# echo $?
    0

One Final Word

Only You Can Change the Culture of Systems Administration
No service is truly production-ready until it is:
  Acceptance-tested
  Backed up
  Monitored
  Documented

Credits
Nagios, the Nagios logo, and Nagios graphics are the servicemarks, trademarks, or registered trademarks owned by Nagios Enterprises.
APC and Symmetra are registered trademarks of American Power Conversion Corporation. This project is not affiliated with American Power Conversion Corporation.
People image by netalloy. Public Domain. Courtesy openclipart.org
Clock images by hypocore. Public Domain. Courtesy openclipart.org
Profit Chart Curve by simpletutorials.net. Public Domain
Tools image by sev. Public Domain. Courtesy openclipart.org
Some images by unknown authors taken from http://clipart-finder.com/