How To Monitor Your Computer With Nagiostee.Org (Nagios)

Transcription:

Host and Service Monitoring at SLAC
Alf Wachsmann
Stanford Linear Accelerator Center
alfw@slac.stanford.edu
DESY Zeuthen, May 17, 2005

History
Monitoring at SLAC: does not really exist
- ranger (formerly known as patrol; modified at DESY: scout)
  - local checks and fixes on Unix machines
  - emails to users or administrators about problems
- BaBar started some monitoring using Ganglia (pretty graphs but no alarming)
Monitoring at Fermilab: NGOP
Monitoring at CERN: LHC Era Monitoring (Lemon)

Monitoring
Why:
- Health status at current time
- Alarming in case of problems (ideally: fix the problem)
- Long-term trend analysis
What:
- Systems being alive and healthy
- Services are running and functional
- Service level agreements are met
How:
- Run a probe against or on a device
- Gather data in a central place
- Display data and allow data mining

What to Monitor
- Networked devices: network gear, printers, computers, appliances, UPS, thermal elements
- Services provided by those devices: OS, disk/memory space, CPU, NFS, AFS, printing, DHCP, electrical power, temperature reading
- Service levels provided by those services: 50% free /tmp space, 90% batch computing, 95% AFS capacity, temperature 25 °C

Nagios
- Open-source framework for monitoring: http://www.nagios.org/
- Scheduler for data collection; alarming engine
- Not part of Nagios core:
  - plug-ins for measurements ("probes")
  - plug-ins for data storage and graphing
- Extensive collection of both as contributed source

Nagios' Architecture...
[Figure: Nagios architecture diagram, credited to Ethan Galstad]

...Nagios' Architecture
[Figure: Performance Data Processing. Labels: Monitoring Host; Nagios Process (Core Logic); nagiostee.sh; PerfParse; MySQL; NagiosGraph; rrd files; Web Front End (two instances)]

Alarming: Built into Nagios
- Visual
- Email
- Pager
- Execute external program
- Internal Event Handler

Data Storage...
Handler for Performance Data from Services
- Nagios can have only one
- I wrote my own that tees data to multiple ones: nagiostee.sh

    #!/bin/sh
    # add these lines to your Nagios checkcommands.cfg file:
    # define command {
    #   command_name process-service-perfdata
    #   command_line /path/to/nagiostee.sh "$HOSTNAME$" "$HOSTALIAS$" "$HOSTADDRESS$" "$HO
    # }
    HOSTNAME=$1
    HOSTALIAS=$2
    HOSTADDRESS=$3
    # etc. etc.
    # Now call the real perfdata handlers:
    # Nagiosgraph:
    /opt/nagios/nagiosgraph/insert.pl "$LASTCHECK\\t$HOSTNAME\\t$SERVICEDESC\\t$OUTPUT\\t$P
    # PerfParse:
    $USER2/bin/perfparse_nagios_pipe_command.pl $USER2/var/perfparse.pipe "$TIMET" "$HOSTNA
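The "tee" idea on this slide can be sketched in a few lines of shell. This is only an illustration under stated assumptions: handler_a and handler_b are hypothetical stand-ins for the real NagiosGraph and PerfParse commands, and the three arguments are a simplified subset of what nagiostee.sh receives.

```sh
#!/bin/sh
# Sketch of the "tee" idea: hand the same performance-data arguments to
# several handlers in turn. handler_a and handler_b are hypothetical
# stand-ins, not the real NagiosGraph or PerfParse commands.
handler_a() { echo "A: $1 $2 $3"; }
handler_b() { echo "B: $1 $2 $3"; }

# Usage: tee_perfdata HOSTNAME SERVICEDESC PERFDATA
tee_perfdata() {
    status=0
    for handler in handler_a handler_b; do
        # One failing handler should not keep the others from running.
        "$handler" "$@" || status=1
    done
    return "$status"
}
```

Running every handler even when an earlier one fails is a design choice for this sketch; it keeps one broken storage back end from silently starving the other.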

...Data Storage...
NagiosGraph
- one rrd file for each service check on each host
- older data gets coarser
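"Older data gets coarser" refers to consolidation: several fine-grained samples are replaced by one summary value. The sketch below only illustrates that averaging idea with awk. consolidate is a hypothetical helper, not an RRDtool command, and RRDtool itself configures consolidation in its archive definitions rather than with a script like this.

```sh
#!/bin/sh
# consolidate is a hypothetical helper, not an RRDtool command. It reads
# one numeric sample per line on standard input and prints the average of
# every N consecutive samples; a trailing incomplete group is dropped.
# Usage: consolidate N
consolidate() {
    awk -v n="$1" '
        { sum += $1; count++ }
        count == n { print sum / n; sum = 0; count = 0 }
    '
}
```

For example, `printf '2\n4\n6\n8\n10\n' | consolidate 2` prints 3 and 7: five samples become two, and the unpaired last sample is left out.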

...Data Storage
PerfParse
- MySQL database
- full resolution for entire data collection
- basic data mining web interface
- easy to write scripts with SQL queries for more
[Image text: "No problem" / "Reboot!"]

Local Data Measurements
- Nagios master pulls (requests) data
- Most probes work locally on a machine
- Each probe decides whether measurement is OK, WARNING, CRITICAL, UNKNOWN
- Each probe can (but does not have to) return "performance data":

    % ./check_users -w 5 -c 10
    USERS OK - 3 users currently logged in |users=3;5;10;0
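A probe of this kind could be sketched in shell as follows. check_value is a hypothetical helper, not one of the distributed plug-ins. It follows the usual plug-in convention of one status line, a "|" before the performance data, and exit codes 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN. Comparing with "greater than" is a choice made for this sketch; real plug-ins define their own threshold rules.

```sh
#!/bin/sh
# Sketch of a Nagios-style probe. check_value is a hypothetical helper,
# not part of Nagios or its plug-in collection.
# Usage: check_value LABEL VALUE WARN CRIT
# Prints one status line with performance data; returns 0 (OK),
# 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
check_value() {
    label=$1 value=$2 warn=$3 crit=$4

    # Anything that is not a non-negative integer yields UNKNOWN.
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "UNKNOWN - arguments must be non-negative integers"
                return 3 ;;
        esac
    done

    # Check the critical threshold first so it wins over warning.
    if [ "$value" -gt "$crit" ]; then
        state=CRITICAL code=2
    elif [ "$value" -gt "$warn" ]; then
        state=WARNING code=1
    else
        state=OK code=0
    fi

    # Text for humans, then "|", then label=value;warn;crit;min.
    echo "$state - $label is $value |$label=$value;$warn;$crit;0"
    return "$code"
}
```

Calling `check_value users 3 5 10` prints `OK - users is 3 |users=3;5;10;0` and returns 0.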

Remote Data Measurements
- Several mechanisms exist to remotely execute a probe; SLAC uses the Nagios Remote Plugin Executor (NRPE)
- NRPE executes a probe on a remote machine on behalf of the Nagios master:

    % ./check_nrpe -H noric10 -c check_users
    USERS OK - 37 users currently logged in |users=37;110;150;0

- Also works with Windows (haven't tried Mac yet)
- Can run over SSL
- Can accept parameters from Nagios master (insecure!)
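The performance data in the output above packs several fields into one item: label, measured value, warning threshold, critical threshold and minimum. A minimal sketch of splitting such an item in shell is shown below. parse_perfdata is a hypothetical helper, and it assumes the simple form label=value;warn;crit;min with no unit suffix and no maximum field.

```sh
#!/bin/sh
# parse_perfdata is a hypothetical helper. It splits one performance-data
# item of the form label=value;warn;crit;min into its fields and prints
# them one per line as name=value pairs.
# Usage: parse_perfdata ITEM
parse_perfdata() {
    item=$1
    label=${item%%=*}      # text before the first "="
    rest=${item#*=}        # text after the first "="

    # Split the remainder on ";" into the positional parameters.
    old_ifs=$IFS
    IFS=';'
    set -- $rest
    IFS=$old_ifs

    echo "label=$label"
    echo "value=$1"
    echo "warn=$2"
    echo "crit=$3"
    echo "min=$4"
}
```

For `users=37;110;150;0` this prints label=users, value=37, warn=110, crit=150 and min=0, each on its own line.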

Passive Data Measurements
- External programs can write measurements into a file which Nagios checks regularly ("push")
- Client-server facility for remotely pushed data (Nagios Service Check Acceptor; NSCA)
- Enables hierarchical monitoring
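A pushed result is a single line in Nagios' external command format, PROCESS_SERVICE_CHECK_RESULT, written to the command file. The sketch below builds and appends such a line. Both function names are hypothetical, the service name Users in the usage note is made up for illustration, and the command file path depends on the local installation. `date +%s` is widely available but not guaranteed by POSIX.

```sh
#!/bin/sh
# format_passive_result is a hypothetical helper that builds one
# PROCESS_SERVICE_CHECK_RESULT line in Nagios' external command format.
# Usage: format_passive_result TIMESTAMP HOST SERVICE RETURN_CODE OUTPUT
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# submit_passive_result (also hypothetical) stamps the result with the
# current time and appends it to the given command file.
# Usage: submit_passive_result CMDFILE HOST SERVICE RETURN_CODE OUTPUT
submit_passive_result() {
    cmdfile=$1
    shift
    format_passive_result "$(date +%s)" "$@" >> "$cmdfile"
}
```

For example, `format_passive_result 1116288000 noric10 Users 0 "USERS OK"` prints `[1116288000] PROCESS_SERVICE_CHECK_RESULT;noric10;Users;0;USERS OK`.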

Own Data Measurements
- Probes for Unix, Windows, Mac OS X, SNMP, HP's JetDirect etc. already exist
- Plug-in repository: http://www.nagiosexchange.org/
- Simple to write your own:
  - AFS::Monitor Perl module: check_afscm.pl, check_ubik.pl, check_afsfs.pl
  - Thermal sensors connected via terminal server: check_thermal.pl
  - More general SNMP plug-in:
    - Put OID and name in configuration file
    - Net::SNMP to do the query
    - Output in Nagios format (incl. performance data)
- Will probably have to write my own Windows probes

Own Web Front-End
- BaBar uses Ganglia to monitor some of their computers and services
  - Effort independent from SCS
  - Presentation of same data seems more pleasing than Nagios'
- Project with 2 Stanford graduate students: Nagios-to-Ganglia data converter, which allows the Ganglia web front-end to display Nagios data
  - It is in production now
  - Only some minor things are not working

Exporting to the Grid
- SLAC participates in the Open Science Grid (OSG)
- Web services interface to Nagios exists: NagiosWS
- No web services interface to PerfParse MySQL DB
  - Summer student will tackle this

Problems
SLAC or Nagios have some problems:
- Many probes are not usable in (large-scale) production
- Works on Windows, but we need to write our own probes
- Would like to have an action that simply reboots a machine
- DB support for configuration was ripped out in version 2.x
- Web interface in version 1.x does not scale very well
  - Much better in version 2.x (still beta)
  - Still browser rendering performance problems

Scalability
Nagios master machine at SLAC:
- Sun V20z: dual 1.8 GHz Opteron, 2 GB main memory
- Sun StorEdge 6120 Array "T4": hardware RAID 5, 2 Gb/s FC, ~1.6 TB usable disk space
- Qlogic QLA2312 Fibre Channel Adapter
- 64-bit Fedora Core 3
- XFS for MySQL DB and RRDtool file partitions
Currently 407 servers with 1096 services configured
Test with all ~2000 SLAC batch machines and 4 services on each showed no problems on master

References
- Nagios: http://www.nagios.org/
- NRPE and other plug-ins (click on the Categories tab): http://www.nagiosexchange.org/
- NagiosGraph: http://nagiosgraph.sourceforge.net/
- PerfParse: http://perfparse.sourceforge.net/
- NagiosWS: http://www.i-xs.de/nagiosws/