An Introduction to Monitoring with Nagios




Laurent Andrey, Rémi Badonnel
LORIA - INRIA Grand Est
ISSNSM 2008, Zurich

Outline
- Local checks
- Remote checks
- Conclusions

References
- www.nagios.org: official distribution (core, plugins and documentation)
- www.nagiosexchange.org: lots of complementary plugins
- W. Barth. Nagios, System and Network Monitoring. Open Source Press GmbH, first edition, 2006. ISBN: 1-59327-070-4.

What is Nagios? What is Nagios useful for?
- A widely used monitoring tool for troubleshooting
  - Simple and open source
  - Network, system and service levels
  - With a sophisticated (?) notification system to inform administrators when something goes wrong
- Nagios helps administrators detect problems before users (including the boss!) do
  - Mail server failure
  - Hard drive overload
  - Network outage

- Colored area concept
  - Green/Yellow/Red (OK/Warning/Critical)
  - No performance analysis or display (a priori)
- Checks using external commands (plugins)
  - Various possibilities for remote checks
  - Possibility for passive checks (from managed resources)
- Web interface + notifications

[Figure: Nagios architecture. Labelled components: config, data reaping, resource state calculation, web interface, log, notification system (filters, contact delivery).]

Architecture at run-time
- Data reaping + notification system = nagios processes
  - Can be run as a service (rcx.d, soft runlevel)
- Web interface
  - External web server (Apache)
  - Bunch of CGI scripts (part of Nagios)
- Configuration
  - Simple text files
  - Or a PostgreSQL database
- Logs (local files)
- Named pipe (FIFO) that lets nagios receive commands (from the CGI scripts, passive asynchronous events)

Service, service check
- Service
  - Service delivered by a piece of software
  - Percentage of free space on a partition
  - Bandwidth usage on a network interface
  - ...
- Service check
  - Provides state information on a service
  - Returns a value: OK, WARNING, CRITICAL (exit status 0, 1, 2) or UNKNOWN (exit status 3, due to a timeout or plugin runtime trouble), reflecting the Nagios view of this service
  - Can be local (OS calls) or remote (ICMP, NRPE, SNMP...)
  - Is implemented by a plugin (external command/script)
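To make the plugin contract concrete, here is a minimal sketch in Python of how a check could map a measured value to the four return states. Only the exit statuses 0 to 3 come from the slide above; the names `evaluate` and `main`, the "higher is worse" threshold convention and the output text are illustrative assumptions.

```python
# Exit statuses a plugin reports back to Nagios (from the slide above).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a state; higher values are worse (assumption)."""
    if value is None:
        return UNKNOWN  # the measurement could not be obtained
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    """Hypothetical command line: plugin.py <value> <warn> <crit>."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: the plugin itself is in trouble.
        print("UNKNOWN - usage: plugin.py <value> <warn> <crit>")
        return UNKNOWN
    state = evaluate(value, warn, crit)
    print(f"{LABELS[state]} - value is {value}")
    return state
```

A real plugin script would end by passing the result of `main(sys.argv)` to `sys.exit`, so that the state reaches Nagios as the process exit status.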

Service states
- Service states mirror what nagios observes
  - States: OK, WARNING, CRITICAL, UNKNOWN
  - Transitions from one state to another are based on the results provided by checks
- Critical and warning states are shadowed by related soft states
  - A service first goes to a soft state
  - An attempt count mechanism leads to a definitive hard state
  - User notification can only occur when a hard state is reached

Service state diagram
[Figure: state diagram with the states OK, Soft Warning, Hard Warning, Soft Critical and Hard Critical. Transitions are labelled with the check result (ok, warning, critical), the attempt count update (ac++, ac←1), the guard (ac<mca, ac==mca) and the notification generated (problem, recovery).]

Service state diagram legend
- HS: hard state, using the normal check interval between 2 checks
- S: soft state, using the retry check interval between 2 checks
- RS: transition triggered by a check with a return status RS ∈ {ok, warning, critical}
- ac++: Attempt Count is incremented when the transition is triggered

Service state diagram legend (ctd)
- ac<mca: transition is triggered if Attempt Count is smaller than the service configuration attribute max_check_attempts
- ac==mca: transition is triggered if Attempt Count is equal to the service configuration attribute max_check_attempts
- ac←1: Attempt Count is set to 1 when the transition is triggered
- notification: a user notification ∈ {problem, recovery} is generated when this transition is triggered
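The soft/hard mechanism described by the legend can be sketched as a small state machine. This is a simplified reading of the diagram, not Nagios's actual implementation: it assumes the guard (ac<mca or ac==mca) is evaluated before ac++ is applied, and that a service already in a hard problem state stays hard without a new notification when further non-ok results arrive. The class name `ServiceState` and the method `feed` are hypothetical.

```python
OK, WARNING, CRITICAL = "ok", "warning", "critical"


class ServiceState:
    """Tracks one service: status, soft/hard flag and attempt count (ac)."""

    def __init__(self, max_check_attempts=3):
        self.mca = max_check_attempts
        self.status = OK
        self.hard = True
        self.ac = 1

    def feed(self, result):
        """Apply one check result; return 'problem', 'recovery' or None."""
        if result == OK:
            # Any ok result resets the attempt count (ac <- 1).
            was_hard_problem = self.hard and self.status != OK
            self.status, self.hard, self.ac = OK, True, 1
            return "recovery" if was_hard_problem else None
        if self.hard and self.status != OK:
            # Simplification: already in a hard problem state, stay hard.
            self.status = result
            return None
        # Coming from OK or from a soft state.
        self.status = result
        if self.ac < self.mca:
            self.ac += 1          # ac++ on a soft transition
            self.hard = False
            return None
        self.hard = True          # ac == mca: definitive hard state
        return "problem"
```

With max_check_attempts set to 3, three consecutive warning results are needed before a problem notification is produced, and a later ok result produces a recovery notification.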

Object-oriented representation
- A nagios object describes a specific unit: a service, a host, a contact, a contactgroup...
  - with attributes and values
  - with a kind of inheritance mechanism and dependencies amongst objects
- Set of configuration files
  - Main file: nagios.cfg (refers to the other cfg files)
  - 1 file per object type: services.cfg, hosts.cfg, contacts.cfg, checkcommands.cfg, misccommands.cfg, timeperiods.cfg...
  - Requires a priori knowledge
- Configuration can also be stored in a database

Host
- A service has to be linked to a host
- Only UP and DOWN states
  - User notification (problem, recovery)
- Same external checks as for services
  - UP = (WARNING or OK), DOWN = CRITICAL
  - Typically: ICMP-based checks
  - No active checks if related services are OK
- Host group (cosmetic...)

Other Nagios elements
- contact, contactgroup: to specify who to notify, how to notify and when to notify
  - Used in host and service definitions
- command: to execute check plugins or to send user notifications
  - Links Nagios attributes found in definitions (e.g. host name) to plugin parameters
  - Already provided for most common plugins (see checkcommands.cfg file)
  - Basic wrapping to send emails (misccommands.cfg file)

[Figure: example network used in the following slides. Labels: dnsext 152.81.144.14; webloria 152.81.144.22 (http://www.loria.fr); network 152.81.144.0/20; cat-481; Internet; network 152.81.0.0/20; nagios 152.81.9.211; dns1, preny 152.81.1.25; dns2, hugo 152.81.1.128.]

Host definition (example)
hosts.cfg

    define host{
        host_name              webloria
        alias                  webloria linux machine
        address                152.81.144.22
        check_command          check-host-alive
        max_check_attempts     3
        check_period           24x7
        notification_interval  180      ; 3 hours
        notification_period    24x7
        notification_options   d,r,f,u  ; down, recovery, flapping, unreachable
        contact_groups         administrators
    }

Service definition (example 1)
services.cfg

    define service{
        host_name              webloria
        service_description    http service
        check_command          check_http
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        notification_options   w,c,r,f,u  ; warning, critical, recovery, flapping, unknown
        contact_groups         administrators
    }
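To illustrate how such object definitions are structured, the following Python sketch reads `define <type>{ ... }` blocks into dictionaries. It is an assumption-laden toy parser (the names `parse_objects` and `SAMPLE` are hypothetical), not the parser Nagios uses: it only handles one attribute per line, treats everything after `;` as a comment, and skips lines starting with `#`.

```python
import re

# One "define <type>{ ... }" block; the body is everything up to the next "}".
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, attributes) pairs found in text."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        attrs = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # attribute name, then its value
            attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, attrs))
    return objects


SAMPLE = """
define host{
    host_name           webloria
    alias               webloria linux machine
    address             152.81.144.22
    max_check_attempts  3   ; inline comment
}
"""

for obj_type, attrs in parse_objects(SAMPLE):
    print(obj_type, attrs["host_name"], attrs["address"])
```

Keeping the attributes as plain strings mirrors the text-file nature of the configuration; any typing (integers, comma-separated lists) would be a further step.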

Command definitions (example 1)
checkcommands.cfg

    define command{
        command_name  check-host-alive
        command_line  $USER1$/check_icmp -H $HOSTADDRESS$
    }

    define command{
        command_name  check_http
        command_line  $USER1$/check_http -H $HOSTADDRESS$
    }

Service definition (example 2)
services.cfg

    define service{
        host_name              dnsext
        service_description    dns service
        check_command          check_name_for_given_dns!www.loria.fr!152.81.144.22
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        notification_options   w,c,r,f,u
        contact_groups         administrators
    }

Command definition (example 2)
checkcommands.cfg

    define command{
        command_name  check_name_for_given_dns
        command_line  $USER1$/check_dns -H $ARG1$ -a $ARG2$ -s $HOSTADDRESS$
        # $ARG1$: fully qualified name
        # $ARG2$: IP address
    }

Contact definition (example)
contacts.cfg

    define contact{
        contact_name                   andrey
        alias                          laurent andrey
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   w,u,c,r
        host_notification_options      d,r
        service_notification_commands  notify-by-email
        host_notification_commands     host-notify-by-email
        email                          andrey@loria.fr
    }
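Returning to the command definition (example 2): the check_command value of a service packs the command name and its arguments, separated by `!`, and the arguments fill $ARG1$, $ARG2$, ... in the command line. A rough Python sketch of that substitution follows. The function name `expand_command`, the `commands` dictionary and the default plugin directory used for $USER1$ are assumptions made for illustration; real Nagios supports many more macros.

```python
def expand_command(check_command, commands, host_address,
                   user1="/usr/local/nagios/libexec"):
    """Expand a 'name!arg1!arg2' check_command into a full command line.

    commands maps a command_name to its command_line template.
    user1 stands in for the $USER1$ plugin directory (assumed path).
    """
    name, *args = check_command.split("!")
    line = commands[name]
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for index, arg in enumerate(args, start=1):
        macros[f"$ARG{index}$"] = arg
    for macro, value in macros.items():
        line = line.replace(macro, value)
    return line


commands = {
    "check_name_for_given_dns":
        "$USER1$/check_dns -H $ARG1$ -a $ARG2$ -s $HOSTADDRESS$",
}
print(expand_command("check_name_for_given_dns!www.loria.fr!152.81.144.22",
                     commands, host_address="152.81.144.14"))
```

Here the host address is the one of dnsext, the host the service is attached to, so the DNS server being queried is the monitored host itself.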

Contactgroup definition (example)
contacts.cfg

    define contactgroup{
        contactgroup_name  administrators
        alias              group of administrators
        members            andrey
    }

Local checks
- Getting information about your local system
- Plugins based on system commands (such as ps, df, uptime)

    check_disk -w 30% -c 15% -p /var
    check_load -w 2.0,1.0,0.5 -c 4.0,2.0,1.0
    check_procs -w 150 -c 250 --metric=procs
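As an illustration of what a local check does, here is a minimal Python sketch in the spirit of `check_disk -w 30% -c 15%`: it measures the percentage of free space and compares it with a warning and a critical threshold. It is not the check_disk plugin itself; the function names `free_percent` and `disk_state` are hypothetical, and the thresholds are read as "minimum free space", which is an assumption about the intended semantics.

```python
import shutil


def free_percent(path):
    """Percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def disk_state(free_pct, warn_pct, crit_pct):
    """Return 0 (OK), 1 (WARNING) or 2 (CRITICAL); less free space is worse."""
    if free_pct < crit_pct:
        return 2
    if free_pct < warn_pct:
        return 1
    return 0
```

For example, `disk_state(free_percent("/var"), 30, 15)` would evaluate the /var partition with the thresholds used on the slide.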

Direct network checks
- Checking network services of remote hosts
- Plugins based on network protocols
- Directly executed from the Nagios machine

    check_icmp -H 1.14.1.2 -w 100.0,20% -c 200.0,40%
    check_tcp -H boston.loria.fr -p 7000
    check_ftp -H ftp.inria.fr -p 21 -e 220
    check_http -H www.esial.uhp-nancy.fr
    check_smtp, check_imap, check_pop

Checks using SSH (Secure Shell)
[Figure: on the Nagios machine, the check_by_ssh plugin (priv.key) reaches, across the network, the sshd daemon on the client (pub.key), which runs the check_xyz plugin.]
- Nagios executes the check_by_ssh plugin
- To run a plugin deployed on the remote machine
- Based on asymmetric keys, to log in without typing a password
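In contrast with remote execution over SSH, a direct network check needs nothing installed on the monitored host. The Python sketch below shows the idea behind a check like check_tcp: try to open a TCP connection and translate the outcome into a state. It is a simplified stand-in, not the actual plugin; the function name `check_tcp`, the default timeout and the message format are assumptions.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Return (state, message): 0 if the port accepts a connection, else 2."""
    try:
        # create_connection resolves the name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return 0, f"TCP OK - connected to {host}:{port}"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return 2, f"TCP CRITICAL - {host}:{port}: {exc}"
```

A call such as `check_tcp("boston.loria.fr", 7000)` corresponds to the check_tcp line on the direct network checks slide.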

Checks using NRPE (Nagios Remote Plugin Executor)
[Figure: on the Nagios machine, the check_nrpe plugin sends a reference (ref) over a TCP connection, across the network, to the nrpe daemon on the client. The daemon holds pre-configured invocations (ref1: check_xyz arg1 arg2; ref2: check_abc arg3 arg4; ...) and runs the matching plugin (check_xyz).]
- Nagios executes the check_nrpe plugin
- To interact with a dedicated daemon called nrpe
- Based on pre-configured plugin invocations

Other remote checks
- Checks using SNMP
  - Collecting management information from SNMP agents
  - check_snmp plugin → net-snmp snmpget → SNMP reply + warning and critical limits → service state
- Checks using NSCA (Nagios Service Check Acceptor)
  - Passive method where checks are initiated by the resources themselves (close to SNMP traps)
  - An NSCA daemon waits for incoming check results (on the Nagios machine), while the send_nsca program (on the remote machines) sends messages containing check results
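For the passive NSCA method, the monitored machine has to hand a check result to send_nsca. The sketch below builds one such result line in Python, assuming the tab-separated layout (host name, service description, return code, plugin output, newline) that send_nsca reads on its standard input. The function name `nsca_line` and the validation rules are illustrative assumptions.

```python
def nsca_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    for field in (host, service, output):
        # Tabs and newlines are the separators, so fields cannot contain them.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


print(nsca_line("webloria", "http service", 2, "CRITICAL - connection refused"),
      end="")
```

On the remote machine, the resulting line would be piped to the send_nsca program, which forwards it to the NSCA daemon on the Nagios machine.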

Making configuration simpler
- Monitoring the same service on several hosts
  - setting the host_name attribute of the service to a comma-separated list of host names
  - or setting a hostgroup_name attribute for the service
- Defining template-based objects
  - Notion of inheritance
  - Factoring out many low-interest attributes
  - register attribute to define a template
  - use attribute to inherit from a template

Service and hostgroup (example)

    define hostgroup{
        hostgroup_name  dns_hosts
        alias           hosts supporting DNS
        members         dns1,dns2,dnsext
    }

    define service{
        hostgroup_name         dns_hosts
        service_description    dns service
        check_command          check_name_for_given_dns!www.loria.fr!152.81.144.22
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        notification_options   w,c,r,f,u
        contact_groups         administrators
    }
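The template mechanism (register and use attributes) mentioned on the first of the two slides above can be sketched in Python as a recursive merge: an object first takes the attributes of the template named by use, then overrides them with its own. This is a simplified model, not Nagios's resolver; the function name `resolve`, the `templates` dictionary keyed by template name, and the template name generic-service are assumptions, and the sketch assumes that register itself is not inherited.

```python
def resolve(obj, templates):
    """Return obj's attributes after applying its template chain."""
    merged = {}
    parent_name = obj.get("use")
    if parent_name is not None:
        inherited = resolve(templates[parent_name], templates)
        inherited.pop("register", None)  # being a template is not inherited
        merged.update(inherited)
    # The object's own attributes override inherited ones.
    merged.update({k: v for k, v in obj.items() if k not in ("use", "name")})
    return merged


templates = {
    "generic-service": {          # hypothetical template
        "name": "generic-service",
        "max_check_attempts": "3",
        "check_period": "24x7",
        "register": "0",          # marks the object as a template
    },
}
dns_service = {
    "use": "generic-service",
    "hostgroup_name": "dns_hosts",
    "service_description": "dns service",
    "max_check_attempts": "5",    # overrides the template value
}
print(resolve(dns_service, templates))
```

The low-interest attributes live once in the template, and each concrete service only states what differs.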

Specifying dependencies
- Reminder: a service in critical state triggers a host check
- How does Nagios tell the difference between:
  - case 1: webloria is really down
  - case 2: webloria is unreachable due to the network?
- Solution: defining dependency relationships amongst objects (parents attribute)
  - If a host is detected as down, the parent is checked
  - If the parent is OK, the initial host is really declared down
  - If not, the host is declared unreachable
  - Ex: cat-481 router defined as a parent of webloria

Conclusions
- Open source monitoring solution
  - Based on simple concepts: checks, states, notifications
  - Easily extensible and integrable
  - No discovery mechanism
- Experience it during lab exercises!
  - Monitoring hosts and their services
  - Developing and testing our own plugin
  - Experimenting with state transitions and notifications