Network Monitoring Tools (Nagios, MRTG)
CSD Fall 2010
Version: 1.3
Identifier: ISP-003
Project owners: Björn Pehrson, Sven Jonsson, Amos Nungu
Project coach: Hans Eriksson
Team members | Contact | ECTS credits
Anand Kannan | anandk@kth.se | 15 cr.
Biniam Goshu Mekonnen | biniamgm@kth.se | 30 cr.
Boris Ristov | ristov@kth.se | 24 cr.
Christina Sidiropoulou | csid@kth.se | 15 cr.
Ekaterina Garbaruk | garbaruk@kth.se | 15 cr.
Manxing Du | manxing@kth.se | 30 cr.
Shabnam Sadat Jalalinia | ssja@kth.se | 15 cr.
01/07/2011
Version history
Version | Release date | Changes | Author(s)
1.3 | 01/07/2011 | How to perform simulations was added to the Nagios configuration part | Anand Kannan, Shabnam Sadat Jalalinia
1.2 | 12/30/2010 | Document reviewed | Boris Ristov
1.1 | 12/29/2010 | More additions were made to the Nagios configuration part and the MRTG part was added to the document | Anand Kannan, Shabnam Sadat Jalalinia
1.0 | 11/22/2010 | First version of the document created | Anand Kannan, Shabnam Sadat Jalalinia
Table of contents
1. Introduction
1.1. Purpose of document
1.2. Scope of document
1.3. Audience of document
2. Nagios
2.1. About Nagios
2.2. Nagios configuration
2.3. Nagios plugins
3. Host and Service checks
4. MRTG (The Multi Router Traffic Grapher)
5. References
1. Introduction
A network monitoring system is an essential part of any network; it is used to detect and report failures of devices and connections. Network bandwidth, host connectivity, DHCP offers, DNS response time and ping results are some of the parameters reported by network monitoring tools. In the CareNet network, Nagios is used for network monitoring and MRTG (the Multi Router Traffic Grapher) is used to check the status of the routers.

1.1. Purpose of document
The purpose of this document is to describe the network monitoring applications used in the CareNet network: Nagios and MRTG.

1.2. Scope of document
This document covers the Nagios and MRTG monitoring tools in the CareNet network. It describes the Nagios configuration, the Nagios plugins, and the host and service checks in the CareNet monitoring system, as well as the way MRTG is used in CareNet.

1.3. Audience of document
This document is aimed at the next CareNet technical team.

2. Nagios
2.1. About Nagios
Nagios is a powerful open source network monitoring tool used in the CareNet network. It monitors the entire network infrastructure to ensure that systems, applications and network devices are functioning properly. In case of a failure, Nagios alerts the NOC team within 30 seconds. The Nagios functions used in the CareNet network include:
1) Scheduled host downtime for maintenance
2) Scheduled service downtime for maintenance
3) Process information to check the status of each process
4) Program-wide performance information
5) Daily, weekly and monthly availability reports
6) Real-time and archived event logs and notifications
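Scheduled host downtime (item 1) is normally set from the Nagios web interface, but it can also be scripted by writing an external command to Nagios's command file. The Python sketch below only formats and submits a SCHEDULE_HOST_DOWNTIME command. It assumes that external commands are enabled (check_external_commands=1 in nagios.cfg) and that the command file is /var/lib/nagios3/rw/nagios.cmd, the usual location for the Debian nagios3 package; both assumptions, and the helper names host_downtime_command and submit, are hypothetical and should be checked against the CareNet server.

```python
import time

# Assumption: usual command-file path for Debian's nagios3 package;
# the real path is the command_file setting in nagios.cfg.
COMMAND_FILE = "/var/lib/nagios3/rw/nagios.cmd"


def host_downtime_command(host, start, end, author, comment, now=None):
    """Format a fixed SCHEDULE_HOST_DOWNTIME external command.

    start/end are Unix timestamps. fixed=1 means the downtime runs exactly
    from start to end; trigger_id=0 means it is not triggered by another
    downtime. The duration field only matters for flexible downtime.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    now = int(time.time()) if now is None else int(now)
    fields = ["SCHEDULE_HOST_DOWNTIME", host, str(int(start)), str(int(end)),
              "1", "0", str(int(end - start)), author, comment]
    return f"[{now}] " + ";".join(fields) + "\n"


def submit(command, path=COMMAND_FILE):
    """Write one command line to the Nagios command file (a named pipe)."""
    with open(path, "a") as pipe:
        pipe.write(command)
```

For example, host_downtime_command("server01", start, start + 3600, "admin", "kernel upgrade") describes a fixed one-hour downtime for server01, and submit() hands it to Nagios. Scheduled service downtime (item 2) works the same way with the SCHEDULE_SVC_DOWNTIME command, which takes the service description right after the host name.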
2.2. Nagios configuration
Nagios is installed in /etc/nagios3/, and most of its configuration files are located under this directory. Nagios uses the following configuration files:
1) Main configuration file
2) Resource file(s)
3) Object definition file(s)
4) CGI configuration file(s)
The main configuration file controls how the Nagios daemon operates. It is read by both the Nagios daemon and the CGIs. The main configuration file is named nagios.cfg and is located in the /etc/nagios3/ directory.
Note: After any change to the configuration, a consistency check (simulation) of the configuration must be performed, and Nagios should be restarted only when no errors are reported. The check is performed with the -v option:
/usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
The configuration file variables used in the CareNet network are:
1) Log file
This variable defines where Nagios creates its main log file. It should be the first variable defined in the main configuration file, because Nagios writes any errors it finds in the rest of the configuration to this file.
log_file=/var/log/nagios3/nagios.log
2) Object configuration files
These files contain the object definitions that Nagios uses for monitoring: hosts, host groups, contacts, contact groups, services, commands, etc.
For the command definitions: cfg_file=/etc/nagios3/commands.cfg
For changes to the hosts: cfg_file=/etc/nagios3/conf.d/hosts.cfg
For changes to the services: cfg_file=/etc/nagios3/conf.d/services.cfg
For changes to the host groups: cfg_file=/etc/nagios3/conf.d/hostgroups.cfg
For changes to the service groups: cfg_file=/etc/nagios3/conf.d/servicegroups.cfg
3) Object cache file
This variable specifies where the object definitions are cached when the Nagios service is started or restarted. The CGIs read object definitions from the objects.cache file, which prevents inconsistencies when the configuration files are modified after the service has been started or restarted.
object_cache_file=/var/cache/nagios3/objects.cache
4) Pre-cached object file
This variable specifies the location of the pre-cached object file. If Nagios is run with the -p command line option, it preprocesses the object configuration files and writes the cached configuration to this file. Nagios can then be started with the -u option to read the object definitions from this file instead of from the standard object configuration files.
precached_object_file=/var/lib/nagios3/objects.precache
5) Resource file
This file contains the $USERn$ macro definitions. The CGIs do not attempt to read resource files, so sensitive information (usernames, passwords, etc.) can be defined as macros in this file and restrictive permissions (600) can be placed on it.
resource_file=/etc/nagios3/resource.cfg
6) Status file
This file contains the current status of all monitored hosts and services. The CGIs read it to report the current monitoring status in the web interface. The file is recreated every time the Nagios service is started and is updated while Nagios runs.
status_file=/var/cache/nagios3/status.dat
7) Notification file
This file contains the notification settings used when something behaves abnormally in the network. The e-mail and SMS alerts, which have a real-time reaction time of 10 seconds, are defined here. The file is located at /etc/nagios3/commands.cfg. [1]
The files above are the main configuration files that a network administrator will use regularly. Many other options are available, and they can all be seen in the nagios.cfg file. For more information on each configuration variable, refer to:
http://nagios.sourceforge.net/docs/3_0/configmain.html#cfg_file

2.3. Nagios plugins
Nagios has no internal mechanism for checking hosts and services. Instead, it relies on external programs called plugins, which makes the plugins the most essential part of a Nagios installation. Nagios executes a plugin whenever a scheduled host or service check is due, and the plugin returns the result to Nagios.
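To show what a plugin looks like from Nagios's side, the hypothetical Python plugin below prints a single status line and exits with one of the return codes 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) listed in Table 1. The measured value (the 1-minute load average read from /proc/loadavg on Linux), the thresholds 4.0 and 8.0 and the function names are illustrative assumptions, not part of the CareNet configuration.

```python
import sys

# Nagios plugin return codes (Table 1).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit, label="LOAD"):
    """Map a measurement to (return_code, status_line); higher values are worse."""
    if value is None:
        return UNKNOWN, f"{label} UNKNOWN - no measurement available"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    perfdata = f"{label.lower()}={value:g};{warn:g};{crit:g}"
    return code, f"{label} {LABELS[code]} - value={value:g} | {perfdata}"


def read_load_average(path="/proc/loadavg"):
    """Hypothetical measurement: 1-minute load average on Linux, or None."""
    try:
        with open(path) as f:
            return float(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def main():
    code, line = evaluate(read_load_average(), warn=4.0, crit=8.0)
    print(line)      # Nagios displays this first line of output
    sys.exit(code)   # and derives the host/service state from the exit code


if __name__ == "__main__":
    main()
```

Keeping the threshold logic (evaluate) separate from the measurement makes the decision part easy to test without Nagios; a real plugin would be placed in the plugin directory and referenced from a command definition.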
The plugin scripts are located in the /usr/lib/nagios/plugins/ directory, and the commands that invoke them are defined under the /etc/nagios-plugins/config/ directory. Configuration details for the NagiosGrapher add-on can be found under the /etc/nagiosgrapher/ directory. More information on the available plugins and add-ons can be found at:
http://nagios.sourceforge.net/docs/3_0/plugins.html
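The command definitions under /etc/nagios-plugins/config/ use Nagios's "define command { ... }" syntax, pairing a command_name with the command_line that runs a plugin. The Python sketch below collects those pairs so an administrator can see which plugin each check command calls. It is a simplified parser written for illustration: it assumes one directive per line and no "}" characters inside values, and the function names parse_commands and load_all are hypothetical.

```python
import glob
import re

# One "define command { ... }" block; the body may span several lines.
# Simplification: assumes no '}' appears inside a directive value.
DEFINE_RE = re.compile(r"define\s+command\s*\{(.*?)\}", re.S)


def parse_commands(text):
    """Return {command_name: command_line} for each command definition in text."""
    commands = {}
    for body in DEFINE_RE.findall(text):
        fields = {}
        for raw in body.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blank lines and comments
            parts = line.split(None, 1)  # directive name, then the rest of the line
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if "command_name" in fields and "command_line" in fields:
            commands[fields["command_name"]] = fields["command_line"]
    return commands


def load_all(pattern="/etc/nagios-plugins/config/*.cfg"):
    """Merge the command definitions from every matching file."""
    commands = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            commands.update(parse_commands(f.read()))
    return commands
```

For instance, applied to a definition of check_ping, parse_commands() returns its full command line (including macros such as $HOSTADDRESS$ and $ARG1$) under the key check_ping.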
All plugins follow the return code scheme shown in Table 1.
Plugin return code | Service state | Host state
0 | OK | UP
1 | WARNING | UP or DOWN/UNREACHABLE
2 | CRITICAL | DOWN/UNREACHABLE
3 | UNKNOWN | DOWN/UNREACHABLE
Table 1: Nagios plugin return codes

3. Host and Service checks
The hosts monitored in the CareNet monitoring system are the main Proxmox servers, ids, the virtual servers (sip, domain, management, portal), the Kista wireless LAN, the routers and the router interfaces. The hosts monitored in Nagios are:
domain, hr, hr-eth0, hr-eth2, hr-eth3, hr-eth4, hr-eth5, ids, kis-wlan, kr, kr-eth0, kr-eth2, management, portal, server01, server02, sip, vr, vr-eth1, vr-eth2, vr-eth3, vr-eth5 and vr-eth7
Hosts are grouped into host groups according to their location and type. For example, the host group Kista includes the domain, ids, kis-wlan, kr, kr-eth0, kr-eth2, management, portal, server01, server02 and sip hosts.
A host check tests the reachability of each host: if the host answers ping, its state is shown as UP. To change the hosts and host groups in the CareNet monitoring system, edit /etc/nagios3/conf.d/hosts.cfg and /etc/nagios3/conf.d/hostgroups.cfg.
Each host or host group has related services to be checked. Some services are common to several hosts, while others are specific to a single host. Changes to services and service groups should be made in the /etc/nagios3/conf.d/services.cfg and /etc/nagios3/conf.d/servicegroups.cfg files.
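Because every change to hosts.cfg, hostgroups.cfg, services.cfg or servicegroups.cfg must pass the consistency check described in section 2.2 (nagios3 -v) before Nagios is restarted, the two steps can be wrapped in a small script. The Python sketch below runs the check, reads the "Total Warnings" and "Total Errors" lines that Nagios prints, and restarts Nagios only when no errors are reported. The restart command /etc/init.d/nagios3 restart is an assumption based on the Debian nagios3 package, and the function names parse_totals and preflight are hypothetical.

```python
import re
import subprocess

# Paths taken from section 2.2 of this document.
NAGIOS_BIN = "/usr/sbin/nagios3"
MAIN_CFG = "/etc/nagios3/nagios.cfg"


def parse_totals(output):
    """Return (warnings, errors) from nagios -v output; None if a line is missing."""
    def count(label):
        match = re.search(rf"Total {label}:\s*(\d+)", output)
        return int(match.group(1)) if match else None
    return count("Warnings"), count("Errors")


def preflight(nagios_bin=NAGIOS_BIN, main_cfg=MAIN_CFG):
    """Run the consistency check; True only if it exits 0 and reports 0 errors."""
    result = subprocess.run([nagios_bin, "-v", main_cfg],
                            capture_output=True, text=True)
    warnings, errors = parse_totals(result.stdout)
    print(f"Pre-flight check: {warnings} warning(s), {errors} error(s)")
    return result.returncode == 0 and errors == 0


if __name__ == "__main__":
    if preflight():
        # Assumption: Debian-style init script shipped with the nagios3 package.
        subprocess.run(["/etc/init.d/nagios3", "restart"], check=True)
    else:
        raise SystemExit("Pre-flight check failed; Nagios was not restarted.")
```

Warnings do not block the restart in this sketch; only a non-zero exit status or a non-zero error count does, which mirrors the rule in section 2.2.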
Below is a list of the hosts and their related services:
Host: server01 domain management sip portal ids kis-wlan kr kr-eth0 kr-eth2 vr vr-eth1 vr-eth2 vr-eth3 vr-eth5 vr-eth7 hr hr-eth0 hr-eth2 hr-eth3 hr-eth4 hr-eth5
Service(s): Aptitude HTTP Aptitude DHCP DNS NTP VPN Aptitude HTTP MySQL SNMP Aptitude TCP UDP Aptitude SNMP Bandwidth SNMP Bandwidth Bandwidth SNMP Bandwidth
Table 2: Hosts and related services

4. MRTG (The Multi Router Traffic Grapher)
In the CareNet network, MRTG monitors the routers using SNMP and draws the corresponding traffic graphs. The MRTG configuration is kept under /etc/mrtg/, and its configuration file is /etc/mrtg/mrtg.cfg. MRTG will most likely not work properly when the environment variable LANG is set to a UTF-8 locale, so it should be started with LANG=C:
env LANG=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
A real-time view of the MRTG graphs for the CareNet network is available on our website:
http://vm-199.xen.ssvl.kth.se/csdlive/content/network-usage
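MRTG does not receive traffic rates from the routers directly: for an interface it polls SNMP octet counters (such as ifInOctets and ifOutOctets) at a fixed interval, five minutes by default, and plots the average rate between two readings. The Python sketch below shows that arithmetic, including the case where a 32-bit counter wraps around between polls. It illustrates the calculation under these assumptions and is not MRTG's own code; the function names and sample readings are hypothetical.

```python
def octet_rate(previous, current, interval_s, counter_bits=32):
    """Average rate in bytes/s between two readings of an SNMP octet counter.

    If the counter wrapped once between the polls (current < previous),
    the modulo arithmetic adds 2**counter_bits back. More than one wrap
    within an interval cannot be detected from two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = (current - previous) % (2 ** counter_bits)
    return delta / interval_s


def to_bits_per_second(bytes_per_second):
    """Convert bytes/s to bit/s, the unit usually shown for link traffic."""
    return bytes_per_second * 8


if __name__ == "__main__":
    # Hypothetical readings five minutes (300 s) apart, with a counter wrap.
    rate = octet_rate(4_294_967_000, 299_704, 300)
    print(f"{rate:.0f} bytes/s = {to_bits_per_second(rate):.0f} bit/s")
```

For example, two readings 300 seconds apart that differ by 300,000 octets give 1,000 bytes/s, i.e. 8,000 bit/s. On fast links a 32-bit counter can wrap more than once in five minutes, which is why 64-bit counters are preferred where the router supports them.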
5. References
[1] Nagios, "Main Configuration File Options," 08/28/2010. [Online]. Available: http://nagios.sourceforge.net/docs/3_0/configmain.html (viewed 12/29/2010).