log, syslog, logrotate SNMP tools for monitoring




log, syslog, logrotate SNMP tools for monitoring ASI Master M2 ASR - Luiz Angelo STEFFENEL - L Steffenel 2008 1

Syslog and Log files

Outline
- Log files
  - What needs to be logged
  - Logging policies
  - Finding log files
- Syslog: the system event logger
  - How syslog works
  - Its configuration file
  - The software that uses syslog
  - Debugging syslog

What should be logged?
- The accounting system, the kernel, and various utilities all produce data that needs to be logged.
- Most of this data has a limited useful lifetime and needs to be summarized, compressed, archived, and eventually thrown away.

Logging policies
- Throw away all data immediately
- Reset log files at periodic intervals
- Rotate log files, keeping data for a fixed time
- Compress and archive to tape or other permanent media

Which one to choose
- It depends on:
  - how much disk space you have
  - how security-conscious you are
- Whatever scheme you select, regular maintenance of log files should be automated using cron.

Throwing away log files
- Not recommended:
  - Security: accounting data and log files provide important evidence of break-ins.
  - Logs are helpful for alerting you to hardware and software problems.
- In general, keep one or two months of logs.
  - In the real world, it may take one or two weeks for a system administrator (SA) to realize that a site has been compromised and that the logs need to be reviewed.

Throwing away (cont.)
- Most sites store each day's log info on disk, sometimes in a compressed format.
- These daily files are kept for a specific period of time and then deleted.
- One common way to implement this policy is called rotation.

Rotating log files
- Keep backup files that are one day old, two days old, and so on:
    logfile, logfile.1, logfile.2, ..., logfile.7
- Each day, rename the files to push older data toward the end of the chain.
- A script that keeps three days of files follows.

    #!/bin/sh
    # Shift each generation one step toward the end of the chain.
    cd /var/log
    mv logfile.2 logfile.3
    mv logfile.1 logfile.2
    mv logfile logfile.1
    # Start a fresh, empty log file.
    cat /dev/null > logfile

Some daemons keep their log files open all the time, so this script can't be used with them. To install a new log file, you must either signal the daemon or kill and restart it.

    #!/bin/sh
    cd /var/log
    # compress(1) names its output with a .Z suffix
    mv logfile.2.Z logfile.3.Z
    mv logfile.1.Z logfile.2.Z
    mv logfile logfile.1
    cat /dev/null > logfile
    # Tell the daemon to reopen its log file, then compress the old one.
    kill -signal pid
    compress logfile.1

- signal: the appropriate signal for the program writing the log file
- pid: the process id of that program
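The renaming chain generalizes to any number of generations. The following is a minimal sketch only: `rotate_log` is a hypothetical helper name, it works in a scratch directory rather than /var/log, and it leaves out the signalling and compression steps, which depend on the daemon in question.

```shell
#!/bin/sh
# rotate_log DIR LOG KEEP  (hypothetical helper)
# Keeps KEEP old generations: LOG.1 is the newest, LOG.KEEP the oldest.
# The body runs in a subshell so the caller's working directory is untouched.
rotate_log() (
    dir=$1 log=$2 keep=$3
    cd "$dir" || exit 1
    rm -f "$log.$keep"                 # the oldest generation falls off the end
    i=$keep
    while [ "$i" -gt 1 ]; do           # shift .2 -> .3, .1 -> .2, ...
        prev=$((i - 1))
        if [ -f "$log.$prev" ]; then mv "$log.$prev" "$log.$i"; fi
        i=$prev
    done
    if [ -f "$log" ]; then mv "$log" "$log.1"; fi
    : > "$log"                         # start a fresh, empty log file
)

# Demo in a temporary directory (assumes mktemp -d is available).
demo=$(mktemp -d)
echo "day 1" > "$demo/logfile"
rotate_log "$demo" logfile 3
echo "day 2" > "$demo/logfile"
rotate_log "$demo" logfile 3
ls "$demo"      # lists logfile, logfile.1 and logfile.2
rm -rf "$demo"
```

On a real system the daemon would still have to be signalled after the rename, as the slide notes, and tools such as logrotate already implement this policy.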

Archiving log files
- Some sites must archive all accounting data and log files as a matter of policy, to provide data for a potential audit.
- Log files should first be rotated on disk, then written to tape or other permanent media.

Finding log files
- To locate log files, read the system startup scripts (/etc/rc* or /etc/init.d/*) to see:
  - whether logging is turned on when daemons are run
  - where messages are sent
- Some programs handle logging via syslog.
  - Check /etc/syslog.conf to find out where this data goes.

Finding log files
- Different operating systems put log files in different places:
    /var/log/*
    /var/cron/log
    /usr/adm
    /var/adm
- On Linux, log files are generally found in the /var/log directory.

Outline
- Log files
  - What needs to be logged
  - Logging policies
  - Finding log files
- Syslog: the system event logger
  - How syslog works
  - Its configuration file
  - Debugging syslog
  - The software that uses syslog

What is syslog
- A comprehensive logging system, used to manage information generated by the kernel and system utilities.
- It allows messages to be sorted by their source and importance, and routed to a variety of destinations: log files, users' terminals, or even other machines.

Syslog: three parts
- syslogd and /etc/syslog.conf: the daemon that does the actual logging, and its configuration file
- openlog, syslog, closelog: library routines that programs use to send data to syslogd
- logger: user-level command for submitting log entries

(Diagram: message flow through syslog)
- Syslog-aware programs, using the syslog library routines, write log entries to a special file: /dev/log or /dev/klog.
- syslogd reads from this file, consults /etc/syslog.conf, and dispatches each message to log files, users' terminals, or other machines.

Configuring syslogd
- The configuration file /etc/syslog.conf controls syslogd's behavior.
- It is a text file with a simple format; blank lines and lines beginning with # are ignored.
- Each line has the form:
    selector <TAB> action
  for example:
    mail.info    /var/log/maillog

Configuration file - selector
- A selector identifies:
  - the source: the program ("facility") that is sending a log message
  - the importance: the message's severity level
  e.g. mail.info    /var/log/maillog
- Syntax: facility.level
  - Facility names and severity levels must be chosen from a list of defined values.

Configuration file - Facility names

Facility    Programs that use it
kern        The kernel
user        User processes (default if not specified)
mail        The mail system
daemon      System daemons
auth        Security and authorization related commands
lpr         The BSD line printer spooling system
news        The Usenet news system

Configuration file - Facility names

Facility    Programs that use it
uucp        Reserved for UUCP
cron        The cron daemon
mark        Timestamps generated at regular intervals
local0-7    Eight flavors of local message
syslog      Syslog internal messages
authpriv    Private or system authorization messages
ftp         The ftp daemon, ftpd
*           All facilities except mark

Configuration file - Facility names
- The mark facility can be used to log timestamps at regular intervals (by default, every 20 minutes).
- That way you can figure out that your machine crashed between 3:00 and 3:20 am, not just "sometime last night".
- This can be a big help when debugging problems that occur on a regular basis.

Configuration file - severity level

Level            Approximate meaning
emerg (panic)    Panic situation
alert            Urgent situation
crit             Critical condition
err              Other error conditions
warning          Warning messages
notice           Unusual things that may need investigation
info             Informational messages
debug            For debugging

Configuration file - selector
- A selector can include multiple facilities separated with commas:
    daemon,auth,mail.level    action
- Multiple selectors can be combined with semicolons:
    daemon.level1;mail.level2    action
- Selectors are ORed together: a message matching any selector is subject to the action.
- A selector can contain * or none, meaning all or nothing.

Configuration file - selector
- A level indicates the minimum importance that a message must have in order to be logged.
  - mail.warning matches all messages from the mail system at level warning or above.
- A level of none excludes the listed facilities, regardless of what other selectors on the same line may say.
    *.level1;mail.none    action
  All facilities except mail, at level1 or above, are subject to the action.
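The matching rules can be sketched in a few lines of shell. This is an illustration of the rules as stated on these slides, not syslogd's actual implementation: the function names are hypothetical, "none" is treated as an unconditional veto as described above, and the special handling of the mark facility is ignored.

```shell
#!/bin/sh
# Severity name -> rank; a lower number means a more severe message.
severity_rank() {
    case $1 in
        emerg|panic) echo 0 ;;
        alert)       echo 1 ;;
        crit)        echo 2 ;;
        err)         echo 3 ;;
        warning)     echo 4 ;;
        notice)      echo 5 ;;
        info)        echo 6 ;;
        debug)       echo 7 ;;
        *)           return 1 ;;
    esac
}

# facility_listed "fac1,fac2" FACILITY: true if FACILITY is in the
# comma-separated list, or if the list contains "*".
facility_listed() {
    case ",$1," in
        *,"$2",*|*,\*,*) return 0 ;;
        *)               return 1 ;;
    esac
}

# line_matches "sel1;sel2;..." FACILITY LEVEL
# Returns 0 if the message is subject to the line's action.
line_matches() {
    rest=$1; fac=$2; lvl=$3
    result=1
    while [ -n "$rest" ]; do
        sel=${rest%%\;*}                    # first selector on the line
        case $rest in
            *\;*) rest=${rest#*\;} ;;       # keep the remaining selectors
            *)    rest= ;;
        esac
        sel_fac=${sel%.*}                   # e.g. "daemon,auth"
        sel_lvl=${sel##*.}                  # e.g. "info"
        facility_listed "$sel_fac" "$fac" || continue
        case $sel_lvl in
            none) return 1 ;;               # veto, whatever other selectors say
            \*)   result=0 ;;
            *)    if [ "$(severity_rank "$lvl")" -le "$(severity_rank "$sel_lvl")" ]; then
                      result=0              # at least as severe as the minimum
                  fi ;;
        esac
    done
    return $result
}

if line_matches '*.warning;daemon,auth.info' auth info; then
    echo "auth.info is logged"
fi
```

For example, `line_matches '*.emerg;user.none' user emerg` fails, because user.none vetoes the user facility even though *.emerg would otherwise match.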

Configuration file - action
(Tells what to do with a message)

Action              Meaning
filename            Write message to a file on the local machine
@hostname           Forward message to the syslogd on hostname
@ipaddress          Forward message to the host at IP address
user1, user2, ...   Write message to users' screens if they are logged in
*                   Write message to all users logged in

Configuration file - action
- If a filename action is used, the filename must be an absolute path, e.g. /var/log/messages. The file must exist; syslogd will not create it.
- If a hostname is used, it must be resolvable via a translation mechanism such as DNS or NIS.
- While multiple facilities and levels are allowed in a selector, multiple actions are not allowed.

Config file examples

    # Small network or stand-alone syslog.conf file
    # emergencies: tell everyone who is logged on
    *.emerg                         *
    # important messages
    *.warning;daemon,auth.info      /var/adm/messages
    # printer errors
    lpr.debug                       /var/adm/lpd-errs

    # network client, typically forwards serious messages to
    # a central logging machine
    # emergencies: tell everyone who is logged on
    *.emerg;user.none               *
    # important messages, forward to central logger
    *.warning;lpr,local1.none       @netloghost
    daemon,auth.info                @netloghost
    # local stuff to central logger too
    local0,local2,local7.debug      @netloghost
    # card syslogs to local1 - to boulder
    local1.debug                    @boulder.colorado.edu
    # printer errors, keep them local
    lpr.debug                       /var/adm/lpd-errs
    # sudo logs to local2 - keep a copy here
    local2.info                     /var/adm/sudolog

Sample syslog output

    Dec 27 02:45:00 x-wing netinfod[71]: cann't lookup child
    Dec 27 02:50:00 bruno ftpd[27876]: open of pid file failed: not a directory
    Dec 27 02:50:47 anchor vmunix: spurious VME interrupt at processor level 5
    Dec 27 02:52:17 bruno pingem[107]: moose.cs.colorado.edu has not answered 34 times
    Dec 27 02:55:33 bruno sendmail[28040]: host name/address mismatch: 192.93.110.26 != bull.bull.fr

Syslog's functions
- Liberates programmers from the tedious mechanics of writing log files.
- Puts the SA in control of logging: before syslog, the SA had no control over what info was kept or where it was stored.
- Can centralize logging for a networked system.

Syslogd (cont.)
- A hangup signal (HUP, signal 1) causes syslogd to close its log files, reread its configuration file, and start logging again.
- If you modify the syslog.conf file, you must HUP syslogd to make your changes take effect:
    kill -1 pid

Software that uses syslog

Program         Facility    Levels        Description
amd             auth        err-info      NFS automounter
date            auth        notice        Display and set date
ftpd            daemon      err-debug     ftp daemon
gated           daemon      alert-info    Routing daemon
apache          daemon      err           Internet info server
halt/reboot     auth        crit          Shutdown programs
login/rlogind   auth        crit-info     Login programs
lpd             lpr         err-info      BSD line printer daemon

Software that uses syslog

Program     Facility       Levels           Description
named       daemon         err-info         Name server (DNS)
passwd      auth           err              Password setting programs
sendmail    mail           debug-alert      Mail transport system
rwho        daemon         err-notice       Remote who daemon
su          auth           crit, notice     Substitute UID program
sudo        local2         notice, alert    Limited su program
syslogd     syslog,mark    err-info         Internal errors, timestamps

Final words
- On Linux, check the following files:
  - /etc/syslog.conf : syslog configuration file
  - /etc/logrotate.conf and /etc/logrotate.d/* : logging policy, rotation
  - /var/log/* : log files
- Try the following commands to find out more:
    man logrotate
    man syslogd

SNMP

Overview
- Introduction
- Management Information Base (MIB)
- Simple Network Management Protocol (SNMP)
- SNMP Commands
- Tools
  - SNMPwalk (CLI)
  - MIB Browser (GUI)

Introduction
(1) SNMP
  - Application-layer protocol for managing TCP/IP-based networks.
  - Runs over UDP, which runs over IP.
(2) NMS (Network Management Station)
  - Device that polls SNMP agents for info.
(3) SNMP Agent
  - Device (e.g. router) running software that understands the SNMP language.
(4) MIB
  - Database of info conforming to SMI.
(5) SMI (Structure of Management Information)
  - Standard that defines how to create a MIB.

MIB Management Information Base
MIB breakdown:
- OBJECT-TYPE: string that describes the MIB object; Object IDentifier (OID).
- SYNTAX: defines what kind of info is stored in the MIB object.
- ACCESS: READ-ONLY, READ-WRITE.
- STATUS: state of the object with regard to the SNMP community.
- DESCRIPTION: reason why the MIB object exists.
Standard MIB object:
    sysuptime OBJECT-TYPE
        SYNTAX Time-Ticks
        ACCESS read-only
        STATUS mandatory
        DESCRIPTION
            Time since the network management portion of the system was last reinitialised.
        ::= {system 3}

MIB Management Information Base
Object IDentifier (OID)
- Example: .1.3.6.1.2.1.1
  - iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1)
- Tree view (diagram): iso(1) > org(3) > dod(6) > internet(1). Under internet(1): directory(1), mgmt(2), experimental(3), private(4). Under mgmt(2): mib-2(1), which contains system(1), interfaces(2), ip(4), tcp(6).
Note:
- .1.3.6.1 is ~100% present.
- mgmt and private are the most common.
- MIB-2 is the successor to the original MIB.
- STATUS mandatory: all or nothing in a group.

MIB Management Information Base
system(1) group (located at mib-2(1) > system(1))
- Contains objects that describe some basic information on an entity.
- An entity can be the agent itself or the network object that the agent is on.
system(1) group objects:
- sysdescr(1): description of the entity.
- sysobjectid(2): vendor-defined OID string.
- sysuptime(3): time since net-mgt was last re-initialised.
- syscontact(4): name of the person responsible for the entity.

MIB Management Information Base
MIB - tree view:
- mib-2(1) > system(1), with children sysdesc(1), sysobjectid(2), sysuptime(3), syscontact(4)
MIB - syntax view:
    sysuptime OBJECT-TYPE
        SYNTAX INTEGER
        ACCESS read-only
        STATUS mandatory
        DESCRIPTION
            The time (in hundredths of a second) since the network management portion of the system was last re-initialized.
        ::= {system 3}

MIB Management Information Base
SNMP Instances
- Each MIB object can have an instance.
- A MIB for a router's (entity) interface information:
    iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) interfaces(2) iftable(2) ifentry(1) iftype(3)
- One iftype value is required per interface (e.g. 3).
- One MIB object definition can represent multiple instances through Tables, Entries, and Indexes.

MIB Management Information Base
Tables, Entries, and Indexes
- Imagine tables as spreadsheets.
- Three interface types require 3 rows (index numbers).
- Each column represents a MIB object, as defined by the entry node.
- ENTRY + INDEX = INSTANCE

            iftype(3)        ifmtu(4)    Etc
Index #1    iftype.1:[6]     ifmtu.1
Index #2    iftype.2:[9]     ifmtu.2
Index #3    iftype.3:[15]    ifmtu.3

MIB Management Information Base
Example MIB Query
- If we queried the MIB on iftype we could get:
  - iftype.1 : 6
  - iftype.2 : 9
  - iftype.3 : 15
- Which corresponds to:
  - iftype.1 : ethernet
  - iftype.2 : tokenring
  - iftype.3 : fddi

    iftype OBJECT-TYPE
        SYNTAX INTEGER {
            other(1),
            ethernet(6),
            tokenring(9),
            fddi(15),
            etc
        }
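As a small illustration of ENTRY + INDEX = INSTANCE, the sketch below builds the full instance OIDs for the iftype column and decodes the integer values from the example query. The helper names are hypothetical, and the enumeration covers only the four values shown on the slide.

```shell
#!/bin/sh
# OID of ifentry, following the path
# iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) interfaces(2) iftable(2) ifentry(1)
IFENTRY=.1.3.6.1.2.1.2.2.1

# instance_oid COLUMN INDEX: full OID of one table cell (hypothetical helper)
instance_oid() {
    echo "$IFENTRY.$1.$2"
}

# iftype_name VALUE: decode an iftype integer (only the values shown above)
iftype_name() {
    case $1 in
        1)  echo other ;;
        6)  echo ethernet ;;
        9)  echo tokenring ;;
        15) echo fddi ;;
        *)  echo "unknown($1)" ;;
    esac
}

# The three rows of the example table: "index value" pairs.
for row in "1 6" "2 9" "3 15"; do
    index=${row% *}
    value=${row#* }
    echo "iftype.$index = $(instance_oid 3 "$index") : $(iftype_name "$value")"
done
```

Column 3 is iftype and column 4 is ifmtu, so `instance_oid 4 1` would name the MTU of the first interface.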

Simple Network Management Protocol
- Retrieval protocol for the MIB.
- Data can be retrieved by:
  - CLI (snmpwalk),
  - GUI (MIB Browser), or
  - larger applications (Sun Net Manager) called Network Management Software (NMS).
- An NMS is a collection of smaller applications used to manage a network with illustrations, graphs, etc.
- NMS software runs on Network Management Stations (also abbreviated NMS), which can run several different NMS software applications.

SNMP Commands
SNMP has 5 different functions, referred to as Protocol Data Units (PDUs):
(1) GetRequest, aka Get
(2) GetNextRequest, aka GetNext
(3) GetResponse, aka Response
(4) SetRequest, aka Set
(5) Trap

SNMP Commands [Get]
GetRequest [Get]
- Most common PDU.
- Used to ask an SNMP agent for the value of a particular MIB object.
- The NMS sends out 1 Get PDU for each instance, which is a unique OID string.
- What happens if you don't know how many instances of a MIB object exist?

SNMP Commands [GetNext]
GetNextRequest [GetNext]
- An NMS application uses GetNext to walk down a table within a MIB.
- It is designed to ask for the OID and value of the MIB instance that comes after the one asked for.
- Once the agent responds, the NMS application can increment its count and generate another GetNext.
- This can continue until the NMS application detects that the OID has changed, i.e. it has reached the end of the table.
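The walk can be simulated without any network traffic. In the sketch below the "agent" is just a hard-coded list of OID/value pairs (hypothetical data, reusing the iftype example), `get_next` and `walk` are illustrative names, and awk is used only to compare OIDs arc by arc. A real NMS would send GetNext PDUs over UDP instead.

```shell
#!/bin/sh
# Toy agent: "OID value" pairs in MIB order (hypothetical data).
AGENT_TABLE='.1.3.6.1.2.1.2.2.1.3.1 6
.1.3.6.1.2.1.2.2.1.3.2 9
.1.3.6.1.2.1.2.2.1.3.3 15
.1.3.6.1.2.1.2.2.1.4.1 1500'

# get_next OID: print the first "OID value" pair that sorts after OID.
get_next() {
    printf '%s\n' "$AGENT_TABLE" | awk -v want="$1" '
        function cmp(a, b,    x, y, n, m, i) {    # compare two OIDs arc by arc
            n = split(a, x, "."); m = split(b, y, ".")
            for (i = 1; i <= n && i <= m; i++)
                if (x[i] + 0 != y[i] + 0)
                    return (x[i] + 0 < y[i] + 0) ? -1 : 1
            return (n < m) ? -1 : ((n > m) ? 1 : 0)
        }
        cmp($1, want) > 0 { print; exit }'
}

# walk COLUMN_OID: repeat GetNext until the returned OID leaves the column.
walk() {
    current=$1
    while line=$(get_next "$current") && [ -n "$line" ]; do
        oid=${line%% *}
        case $oid in
            "$1".*) echo "$line"; current=$oid ;;   # still inside the column
            *)      break ;;                        # OID changed: end of table column
        esac
    done
}

walk .1.3.6.1.2.1.2.2.1.3
```

Starting from the column OID itself (which is not an instance) works because GetNext returns whatever comes next in MIB order, which is how the number of instances is discovered without knowing it in advance.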

SNMP Commands [GetResponse]
GetResponse [Response]
- Simply a response to a Get, GetNext or Set.
- The SNMP agent responds to all requests or commands via this PDU.

SNMP Commands [SetRequest]
SetRequest [Set]
- Issued by an NMS application to change a MIB instance to the value carried in the Set PDU.
- For example, you could issue a GetRequest against a KDEG server asking for syslocation.0 and may get ORI as the response.
- Then, if the server was moved, you could issue a Set against that KDEG server to change its location to INS.
- You must have the correct permissions when using the Set PDU.

SNMP Commands [Trap]
Trap
- Asynchronous notification.
- SNMP agents can be programmed to send a trap when a certain set of circumstances arises.
- Circumstances can be viewed as thresholds, i.e. a trap may be sent when the temperature of the core breaches a predefined level.

SNMP Security
SNMP community strings (like passwords), of 3 kinds:
- READ-ONLY: you can send a Get and GetNext to the SNMP agent, and if the agent is using the same read-only string it will process the request.
- READ-WRITE: Get, GetNext, and Set. If a MIB object has an ACCESS value of read-write, then a Set PDU can change the value of that object with the correct read-write community string.
- TRAP: allows administrators to cluster network entities into communities. Fairly redundant.

SNMP Tools
- Command Line Interface, e.g. snmpwalk
- Graphical User Interface, e.g. iReasoning's MIB Browser, via www.ireasoning.com

SNMP MIB Browser (1) Initial set-up
Breakdown:
- LHS is the SNMP MIB structure.
- Lower LHS has details of the MIB structure.
- RHS will present MIB values.

SNMP MIB Browser (2) Discovery
- Subnet: 134.XXX.XXX.*
- Read Community: public
- Start; note the IP address; Stop.

SNMP MIB Browser (3) Navigation
- MIB Tree > System > sysuptime
- Notice the lower LHS.
- Notice the OID.

SNMP MIB Browser (4) SNMP PDUs (1) Get
- Select Go, then Get.
- RHS has the values: OID and Value.

SNMP MIB Browser (5) SNMP PDUs (2) GetNext
- Selected OID is: .1.3.6.1.2.1.1.5
- Returned value: (.1.3.6.1.2.1.1.6) or DSG, O'Reilly Institute, F.35

SNMP MIB Browser (6) SNMP (3) Get SubTree
- Position of MIB: .1.3.6.1.2.1.1 (a.k.a. system)
- RHS values: returns all values below system.

SNMP MIB Browser (7) SNMP (4) Walk
- MIB Location: .1.3.6.1.2.1 (a.k.a. mib-2)
- Returns *ALL* values under mib-2.

SNMP MIB Browser (8) Tables
- MIB Location: .1.3.6.1.2.1.2.2 (or interfaces)
- Select iftable, Go, then Table View.
- Refresh/Poll

SNMP MIB Browser (9) SNMP - Graph
- Select a value from the RHS, say sysuptime.
- Highlight and select Go, then Graph.
- Interval = 1s set.

MRTG/RRDTool Ganglia

MRTG
- The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network links.
- MRTG generates HTML pages containing PNG images which provide an almost live visual representation of this traffic. Check http://oss.oetiker.ch/mrtg/ to see what it does.
- MRTG has been one of the most common network traffic measurement tools among service providers.
- MRTG uses simple SNMP queries at a regular interval to generate graphs.

MRTG
- External readers for MRTG graphs can create other interpretations of the data.
- MRTG can be used not only to measure network traffic on interfaces, but also to build graphs of anything that has an equivalent SNMP MIB, like CPU load, disk availability, temperature, etc.
- Data sources can be anything that provides a counter or gauge value, not necessarily SNMP. For example, graphing round-trip times.
- MRTG can be extended to work with RRDTool.

Running MRTG
- Get the required packages
- Compile and install the packages
- Make cfg files for router interfaces with cfgmaker
- Create html pages from the cfg files with indexmaker
- Trigger MRTG periodically from cron or run it in daemon mode


RRDtool
- Round Robin Database for time series data storage
- Command line based
- From the author of MRTG
- Made to be faster and more flexible
- Includes CGI and graphing tools, plus APIs
- Solves the Historical Trends and Simple Interface problems

Define Data Sources (Inputs)
    DS:speed:COUNTER:600:U:U
    DS:fuel:GAUGE:600:U:U
- DS = Data Source
- speed, fuel = variable names
- COUNTER, GAUGE = variable type
- 600 = heartbeat: UNKNOWN is returned for the interval if nothing is received after this amount of time
- U:U = limits on minimum and maximum variable values (U means unknown, and any value is permitted)
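The difference between the two variable types, and the role of the heartbeat, can be sketched as follows. This is a simplified model, not RRDtool's code: `ds_value` is a hypothetical function, it uses integer arithmetic, and it ignores counter wrap-around and the interpolation RRDtool performs between updates.

```shell
#!/bin/sh
# ds_value TYPE OLD NEW SECONDS_SINCE_LAST_UPDATE HEARTBEAT  (hypothetical helper)
ds_value() {
    # Nothing received within the heartbeat: the interval is UNKNOWN.
    if [ "$4" -gt "$5" ]; then
        echo UNKNOWN
        return
    fi
    case $1 in
        GAUGE)   echo "$3" ;;                      # stored as reported
        COUNTER) echo $(( ($3 - $2) / $4 )) ;;     # stored as a per-second rate
    esac
}

ds_value COUNTER 1000 4000 300 600    # 3000 units in 300 s -> 10 per second
ds_value GAUGE   0    42   300 600    # -> 42
ds_value GAUGE   0    42   900 600    # 900 s > heartbeat -> UNKNOWN
```

A COUNTER suits ever-increasing values such as interface byte counts, while a GAUGE suits readings such as temperature or load.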

Define Archives (Outputs)
    RRA:AVERAGE:0.5:1:24
    RRA:AVERAGE:0.5:6:10
- RRA = Round Robin Archive
- AVERAGE = consolidation function
- 0.5 = up to 50% of consolidated points may be UNKNOWN
- 1:24 = this RRA keeps each sample (an average over one 5-minute primary sample), 24 times (2 hours' worth)
- 6:10 = this RRA keeps an average over every six 5-minute primary samples (30 minutes), 10 times (5 hours' worth)
Clear as mud! It all depends on the original step size, which defaults to 5 minutes.

RRDtool Database Format
RRD file created with --step 300 (5-minute input step size):
- RRA 1:24: recent data, stored once every 5 minutes for the past 2 hours
- RRA 6:10: medium-length data, averaged to one entry per half hour for the last 5 hours
- RRA 288:365: old data, averaged to one entry per day for the last 365 days
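The time span covered by an archive is simply step size x steps per row x number of rows. A quick arithmetic check of the three archives above, using a hypothetical `rra_coverage` helper:

```shell
#!/bin/sh
# rra_coverage STEP_SECONDS STEPS_PER_ROW ROWS: seconds of history the archive holds
rra_coverage() {
    echo $(( $1 * $2 * $3 ))
}

STEP=300    # --step 300, i.e. one primary sample every 5 minutes

echo "RRA 1:24    -> $(( $(rra_coverage $STEP 1 24) / 3600 )) hours"
echo "RRA 6:10    -> $(( $(rra_coverage $STEP 6 10) / 3600 )) hours"
echo "RRA 288:365 -> $(( $(rra_coverage $STEP 288 365) / 86400 )) days"
```

The same formula explains why changing the step size changes what every RRA means: with a 600-second step, the 1:24 archive would cover 4 hours instead of 2.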

Isn't it simple?!

    rrdtool create /var/nagios/rrd/host0_load.rrd -s 600 \
        DS:1MIN-Load:GAUGE:1200:0:100 \
        DS:5MIN-Load:GAUGE:1200:0:100 \
        DS:15MIN-Load:GAUGE:1200:0:100 \
        RRA:AVERAGE:0.5:1:50400 \
        RRA:AVERAGE:0.5:60:43800

    rrdtool create /var/nagios/rrd/host0_disk_usage.rrd -s 600 \
        DS:root:GAUGE:1200:0:U \
        DS:home:GAUGE:1200:0:U \
        DS:usr:GAUGE:1200:0:U \
        DS:var:GAUGE:1200:0:U \
        RRA:AVERAGE:0.5:1:50400 \
        RRA:AVERAGE:0.5:60:43800

    rrdtool create /var/nagios/rrd/apricot-intl_ping.rrd -s 300 \
        DS:ping:GAUGE:600:0:U \
        RRA:AVERAGE:0.5:1:50400 \
        RRA:AVERAGE:0.5:60:43800

    rrdtool create /var/nagios/rrd/host0_total.rrd -s 300 \
        DS:IN:COUNTER:1200:0:U \
        DS:OUT:COUNTER:600:0:U \
        RRA:AVERAGE:0.5:1:50400 \
        RRA:AVERAGE:0.5:60:43800

Ping Latency Graph

Agenda
- Ganglia Monitoring
  - Introduction and Overview
  - Ganglia Architecture
  - Apache Web Frontend
  - Gmond & Gmetad
- Extending Ganglia
  - GMetrics
  - Module Development

Introduction and Overview
- Scalable distributed monitoring system
- Targeted at monitoring clusters and grids
- Multicast-based listen/announce protocol
- Depends on open standards:
  - XML
  - XDR: compact, portable data transport
  - RRDTool: Round Robin Database
  - APR: Apache Portable Runtime
  - Apache HTTPD Server
  - PHP-based web interface
- http://ganglia.sourceforge.net or http://www.ganglia.info

Ganglia Architecture
- Gmond: metric gathering agent installed on individual servers
- Gmetad: metric aggregation agent installed on one or more specific task-oriented servers
- Apache Web Frontend: metric presentation and analysis server
- Attributes:
  - Multicast: all gmond nodes are capable of listening to and reporting on the status of the entire cluster
  - Failover: gmetad has the ability to switch which cluster node it polls for metric data
  - Lightweight and low-overhead metric gathering and transport
- Ported to various platforms (Linux, FreeBSD, Solaris, others)

Ganglia Architecture (diagram): a web client talks to the Apache web frontend, which reads from GMETAD. The GMETAD daemons poll one GMOND node in each of Cluster 1, Cluster 2, and Cluster 3, with failover links to the other GMOND nodes of the same cluster.

Ganglia Web Frontend
- Built around the Apache HTTPD server using mod_php
- Uses presentation templates so that the web site look and feel can be easily customized
- Presents an overview of all nodes within a grid vs all nodes in a cluster
- Ability to drill down into individual nodes
- Presents both textual and graphical views

Ganglia Customized Web Front-end

Deploying Ganglia Monitoring
See http://ganglia.sourceforge.net/docs/ganglia.html
- Install gmond on all monitored nodes
  - Edit the configuration file
    - Add cluster and host information
    - Configure network: udp_send_channel, udp_recv_channel, tcp_accept_channel
  - Start gmond
- Install gmetad on an aggregation node
  - Edit the configuration file
    - Add data and failover sources
    - Add grid name
  - Start gmetad
- Install the web frontend
  - Install the Apache httpd server with mod_php
  - Copy the Ganglia web pages and PHP code to the appropriate location
  - Add appropriate authentication configuration for access control

Gmond Gathering & Gmetad Aggregation Agents

Gmond Metric Gathering Agent
- Built-in metrics: various CPU, network I/O, disk I/O, and memory metrics
- Extensible:
  - Gmetric: out-of-process utility capable of invoking command-line-based metric gathering scripts
  - Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
- Built on the Apache Portable Runtime: supports Linux, FreeBSD, Solaris, and more

Gmond Metric Gathering Agent
- Automatic discovery of nodes
  - Adding a node does not require configuration file changes
  - Each node is configured independently
  - Each node has the ability to listen to and/or talk on the multicast channel
  - Can be configured for unicast connections if desired
  - A heartbeat metric determines the up/down status
- Thread pools
  - Collection threads: capable of running specialized functions for gathering metric data
  - Multicast listeners: listen for metric data from other nodes in the same cluster
  - Data export listeners: listen for client requests for cluster metric data

Gmond Metric Collection Groups
- Specify as many collection groups as you like.
- Each collection group must contain at least one metric section.
- List available metrics by invoking gmond -m.
- Collection_group section:
  - collect_once: specifies that the group contains static metrics
  - collect_every: collection interval (only valid for non-static metrics)
  - time_threshold: maximum data send interval
- Metric section:
  - Name: metric name (see gmond -m)
  - Value_threshold: metric variance threshold (send if exceeded)
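One way to read the two thresholds together: a metric is sent when its value has moved by more than value_threshold, or when time_threshold seconds have passed since the last send. The sketch below models only that reading of the slide; `should_send` is a hypothetical function, it works on integers, and it is not taken from gmond's source.

```shell
#!/bin/sh
# should_send LAST_SENT_VALUE CURRENT_VALUE VALUE_THRESHOLD SECONDS_SINCE_SEND TIME_THRESHOLD
# Returns 0 (true) when the metric should be announced again.
should_send() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        delta=$(( 0 - delta ))          # absolute change since the last send
    fi
    [ "$delta" -gt "$3" ] || [ "$4" -ge "$5" ]
}

if should_send 10 20 5 30 300; then echo "changed by more than 5: send"; fi
if should_send 10 11 5 300 300; then echo "time_threshold reached: send"; fi
if ! should_send 10 11 5 30 300; then echo "small change, recently sent: skip"; fi
```

This combination keeps traffic low for stable metrics while still guaranteeing a periodic update.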

Gmetad Metric Aggregation Agent
- Polls a designated cluster node for the status of the entire cluster
  - Data collection thread per cluster
  - Ability to poll gmond or another gmetad for metric data
  - Failover capability
- RRDTool: storage and trend graphing tool
  - Defines fixed-size databases that hold data of various granularity
  - Capable of rendering trending graphs from the smallest granularity to the largest (e.g. last hour vs last year)
  - Never grows larger than the predetermined fixed size
  - Database granularity is configurable through gmetad.conf

Gmetad Configuration
- Data source and failover designations:
    data_source "my cluster" [polling interval] address1:port address2:port ...
- RRD database storage definition:
    RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
- Access control:
    trusted_hosts address1 address2 DN1 DN2
    all_trusted OFF/on
- RRD files location:
    rrd_rootdir "/var/lib/ganglia/rrds"
- Network:
    xml_port 8651
    interactive_port 8652

Gmetad Configuration Example

    data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
    data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
    data_source "another source" 1.3.4.7:8655 1.3.4.8
    trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
    xml_port 8651
    interactive_port 8652
    rrd_rootdir "/var/lib/ganglia/rrds"