Automated System Monitoring




Automated System Monitoring
Josh Malone, jmalone@nrao.edu
Systems Administrator, National Radio Astronomy Observatory, Charlottesville, VA
https://blogs.nrao.edu/jmalone

One night, about 8 or 9 years ago, the chiller in our DC failed. A co-worker arrived in the morning to find the room at 90F ambient. We quickly set up fans to vent the room, then checked the servers and found that the main web server had lost both disks in its OS RAID mirror (15k disks, which ran hot). The main page was being served from memory, but the OS was freaking out. We had minimal monitoring scripts, no environment monitoring, and no disk health checks, so the failure caught us completely by surprise. We decided that we weren't going to let this happen ever again. Over the next year or so we implemented 2 independent monitoring systems: one for servers/services and one for environmentals. We set up each system to also monitor the other.

WHAT IS AUTOMATED MONITORING?
* Some sort of dedicated, automatic instrumentation to check services and/or servers
* Detects and reports service problems and server hardware issues
* Usually provides a central dashboard to track problems
* Can be distributed, but is still under the control of a central daemon

This differentiates it from a bunch of scripts used to check on things, which can't determine cause or eliminate false alarms.
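
One common way a monitoring daemon cuts down on false alarms is to re-check a failing service a few times before it notifies anyone. The sketch below illustrates that idea only; `confirmed_failure`, `make_scripted_check` and `max_attempts` are hypothetical names, not any package's API, and a real daemon would wait between attempts.

```python
from typing import Callable, List


def confirmed_failure(check: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Return True only if `check` fails max_attempts times in a row."""
    for _ in range(max_attempts):
        if check():
            return False  # a single success clears the problem: no alert
    return True


def make_scripted_check(results: List[bool]) -> Callable[[], bool]:
    """Hypothetical stand-in for a real probe: replays canned results."""
    pending = iter(results)
    return lambda: next(pending)


if __name__ == "__main__":
    blip = make_scripted_check([False, True])             # one transient failure
    outage = make_scripted_check([False, False, False])   # a persistent failure
    print(confirmed_failure(blip))    # transient blip: no alert
    print(confirmed_failure(outage))  # persistent failure: alert
```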

Automated Monitoring Workflow

Most packages implement this type of workflow. Not all packages provide event handlers. Ack'ing a page is important: it lets other admins know that someone is working on the problem so they don't step on each other's toes.

Monitoring Packages: Open Source
Pandora FMS Opsview Core Naemon Captialware ServerStatus Core Sensu
All Trademarks and Logos are property of their respective trademark or copyright holders and are used by permission or fair use for education. Neither the presenter nor the conference organizers are affiliated in any way with any companies mentioned here.

Service monitoring is a very crowded space.

Monitoring Packages: Commercial
Nagios XI, Groundwork, Sensaphone (IMS 4000), Statseeker, PRTG network monitor, CopperEgg, WhatsUp Gold, op5 (Naemon)

Your ideal monitoring solution may consist of multiple monitoring platforms. I mentioned at the beginning that we set up 2 parallel monitors. NRAO uses a combination of:
* network monitoring - StatSeeker
* server / service monitoring - Nagios
* environment monitoring - IMS4000 & Nagios

What can monitoring do for you?
* Spot small problems before they become big ones
* Checklist when restoring from a power outage
* Learn about outages before your users do
* Gives you better problem reports than users
* Problems you might never spot otherwise
  * Failed HDDs in RAIDs
  * Full /var partitions
  * Logs not rotating
  * System temperature rising

Monitoring gives you warnings: things are still *working* but they're going to break soon unless you fix them.

Without Monitoring
* "The Internet's down - fix it!!!"
With Monitoring
* dhcp out of leases
* dhcp server down
* dns server not responding
* ethernet switch down
* ISP link down / saturated

Take a typical problem report like "the internet's down!" Proper monitoring knows the difference between these possible causes and can easily narrow the scope of the problem.

Without Monitoring
* "ZOMG! Our web site is down! O Noes!!!"
With Monitoring
* connectivity issues
* web server down
* apache not running
* web server disk full
* server load too high

Take something like the infamous "Oh no - our website is down!" Again, a monitor can often pinpoint the root cause of the problem.

What can monitoring do for you?
* Capacity planning
  * Performance data can generate graphs of utilization: RAM, disk, etc.
* Availability reports - CAUTION
  * Easy to generate -- even easier to generate wrong
  * Make sure your configurations actually catch problems
  * Will also include problems with Nagios itself :(
  * If you're going to quote your availability numbers (SLAs, etc.) make sure you understand what you're actually monitoring.

Beyond just spotting problems, monitoring can be good for capacity planning. In Nagios, graphing requires an add-on (PNP4Nagios); many other packages include it in the base install. Nagios builds a wealth of historical data about your services, and PNP graphs that data so you can visualize it. On availability: make sure Nagios is being honest, and audit your plugins.
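
A small worked example shows how easily an availability number shifts with what is counted. This is an illustrative calculation only: the function name, its parameters, and the idea of tracking "unchecked" time (periods when the monitor itself was not running) separately are assumptions made for the sketch, not how any particular package computes its reports.

```python
def availability(ok_seconds: float, problem_seconds: float,
                 unchecked_seconds: float = 0.0,
                 count_unchecked_as_up: bool = False) -> float:
    """Percentage of the reporting period the service counts as available.

    unchecked_seconds is time the monitor was not checking at all
    (hypothetical category for this sketch).
    """
    total = ok_seconds + problem_seconds + unchecked_seconds
    if total <= 0:
        raise ValueError("reporting period must be longer than zero")
    up = ok_seconds
    if count_unchecked_as_up:
        up += unchecked_seconds  # the optimistic reading
    return 100.0 * up / total


if __name__ == "__main__":
    # 30-day month: 1 hour of detected problems, 2 hours with the monitor down.
    month = 30 * 24 * 3600
    problem, unchecked = 3600, 7200
    ok = month - problem - unchecked
    print(f"{availability(ok, problem, unchecked, count_unchecked_as_up=True):.4f}")
    print(f"{availability(ok, problem, unchecked):.4f}")
```

The two printed figures describe the same month; which one ends up in an SLA report depends entirely on how the unmonitored time is treated.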

ENVIRONMENT MONITORING

Before we get to host and service monitoring, take a quick look at options for environment monitoring.

Environment Monitoring
* Temperature
* Smoke
* Water
* Humidity
* Motion
* Door / closure
* Mains power

What do we mean by environment monitoring? Any of these, plus perhaps many more. Basically: anything about your servers or server room other than the services.

Environment Monitoring: Sensaphone IMS-4000
* Connect sensors to measure desired metrics
* IP-based Nodes can connect remote sensors
* Wireless sensors available
* Notification via POTS line and voice dialer as well as email
* SNMP support
* Use my plugin w/ Nagios!

IMS-4000 is a standalone environment monitoring solution. In order to centralize monitoring and track long-term temperature data I developed a plugin for Nagios; Nagios can pull status and perfdata from the IMS. https://github.com/48kram/nagios-plugins/tree/master/ims4000

Environment Monitoring
* ServersCheck
  * Temp, humidity
  * Wireless (2.4GHz)
* NetBotz
  * Temp, humidity, smoke, water, vibration, doors, cameras

NetBotz 200 ~$450, plus cost of sensors (temp ~$100). Plenty of plugins for Nagios NetBotz integration.

NAGIOS

Nagios
* Open source host / service monitoring package
* "Nagios Ain't Gonna Insist On Sainthood"
* Originally released in 1999 as NetSaint
* Available in 2 versions: Core and XI
  * Nagios Core: open-source, freely available
  * Nagios XI: commercial
    * Free license for up to 7 hosts
    * Available as source installer or VMware appliance

XI is available with support contracts if your company likes having those :) Easy to install: no excuses not to be running good monitoring software.

Nagios Architecture

Nagios gets its super powers from its plugins. 3rd-party add-ons: NConf is a GUI configurator for Nagios.

What's a plugin?
* Plugins actually run the service or host checks.
* Each plugin monitors a different type of service
* Data from the plugin is communicated to Nagios using a (very) simple API
* Plugins can also report Performance Data (perfdata) to be graphed or tracked
  * Requires a perfdata add-on (or Nagios XI)
* Plugins can be written in any language
  * Perl plugins can run using Nagios's embedded perl interpreter for increased performance

Where to Monitor a Service?
* Host ping: Is server host alive?
* TCP port 443: Is Apache listening?
* SSL handshake: Is SSL functional?
* HTTP return code: Is the page found?
* Page load time: Does page load quickly?
* Page content: Is it the right page?

When you're setting up a service monitor, consider how you really want to monitor it. If I'm monitoring a web server, here are 6 different places I could potentially monitor.
* Is the server listening on a TCP port? Catches bind problems, or a web server configured for the wrong IP.
* Is SSL working? Catches an expired SSL certificate.
* High load time? Server overloaded, DoS attack...
* HTTP 200 OK just means it found the page you wanted. But is that the "Welcome to Apache" default page? (A package update might overwrite your config file.)
One service might require multiple checks to monitor everything you care about. You might also use a local agent to check the web server process itself (number of children, memory usage, etc.). Each point tells you different things about that service and can answer a different question. Consider what you want to know about a service.
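
The layered questions above can be sketched as a single probe that stops at the first layer that fails. This is a minimal illustration using only the Python standard library, not one of the stock Nagios plugins; the function names, the thresholds and the expected text are assumptions. The ICMP ping layer is left out because it normally needs raw-socket privileges.

```python
import socket
import ssl
import time
import urllib.error
import urllib.request
from typing import List


def classify_response(status: int, elapsed: float, body: str,
                      expected_text: str, max_seconds: float) -> List[str]:
    """Answer the last three questions: page found, fast enough, right page."""
    problems = []
    if status != 200:
        problems.append(f"HTTP status {status}")
    if elapsed > max_seconds:
        problems.append(f"slow page load ({elapsed:.2f}s)")
    if expected_text not in body:
        problems.append("expected text missing (wrong page?)")
    return problems


def probe(host: str, expected_text: str, port: int = 443,
          timeout: float = 10.0, max_seconds: float = 2.0) -> List[str]:
    """Return a list of problems found; an empty list means all layers passed."""
    # Layer: is anything listening on the TCP port?
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return [f"TCP connect to port {port} failed: {exc}"]
    # Layer: does the TLS handshake succeed (certificate valid, not expired)?
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(raw, server_hostname=host):
            pass
    except (ssl.SSLError, OSError) as exc:
        raw.close()
        return [f"SSL handshake failed: {exc}"]
    # Layers: HTTP return code, page load time, page content.
    start = time.monotonic()
    try:
        with urllib.request.urlopen(f"https://{host}:{port}/", timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        status, body = exc.code, ""
    except OSError as exc:
        return [f"HTTP request failed: {exc}"]
    elapsed = time.monotonic() - start
    return classify_response(status, elapsed, body, expected_text, max_seconds)
```

Calling `probe("www.example.org", "Example Domain")` (a placeholder host and text) would return an empty list when every layer passes, or a description of the first failing layer otherwise.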

Custom Plugins
* Nagios can monitor anything you can write a script to check
* Simple API: just write text to stdout and exit with a value
* You can write plugins in ANY language you choose!
  * bash, python, tcl, expect
  * perl (Nagios has an embedded perl interpreter for speed)
  * C, C++
* Huge collection of plugins available at:
  * http://exchange.nagios.org
  * https://www.monitoringexchange.org
* Be wary of some community plug-ins! Test first!!!

Plugins are the lifeblood of a Nagios system. Nagios is literally useless without them. That script you have to check X? Turn it into a Nagios plugin. Some plugins are even contributed by companies like Dell. Don't be afraid to inspect the code (you might be afraid of what you find, though).
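
As an illustration of "write text to stdout and exit with a value", here is a minimal sketch of a disk-usage plugin in Python. The exit values 0, 1, 2 and 3 are the standard Nagios plugin states (OK, WARNING, CRITICAL, UNKNOWN); the script name, argument order and default thresholds are assumptions made for the example.

```python
import shutil
import sys
from typing import Tuple

# Standard Nagios plugin exit values.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a usage percentage to an exit value and a one-line status text."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {LABELS[code]} - {used_pct:.1f}% used"


def main(path: str, warn: float = 80.0, crit: float = 90.0) -> int:
    """Print the status line for `path` and return the exit value."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - cannot stat {path}: {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(message)
    return code


# Hypothetical invocation: check_disk_pct.py PATH WARN CRIT
# (a fuller plugin would print usage and exit UNKNOWN on bad arguments).
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv[1], float(sys.argv[2]), float(sys.argv[3])))
```

Run by hand as `check_disk_pct.py /var 80 90`, it prints one status line and sets the exit status, which is everything Nagios needs from a plugin.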

Performance Data
* Metrics about the state of the service
* Can be used to generate graphs showing trends, etc.
* Performance data processing requires some external add-on like PNP4Nagios

Example of a perfdata graph in PNP4Nagios.
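
Plugins hand performance data to Nagios on the same output line, after a `|` character, in the form `label=value[UOM];warn;crit;min;max`. The helper below is a sketch of building such a line; the function names are hypothetical, and the temperature values are made up for the example.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def perfdata(label: str, value: Number, uom: str = "",
             warn: Optional[Number] = None, crit: Optional[Number] = None,
             minimum: Optional[Number] = None,
             maximum: Optional[Number] = None) -> str:
    """Format one metric as label=value[UOM];warn;crit;min;max."""
    def fmt(x: Optional[Number]) -> str:
        return "" if x is None else str(x)

    if " " in label:
        label = f"'{label}'"  # labels containing spaces are single-quoted
    fields = [f"{value}{uom}", fmt(warn), fmt(crit), fmt(minimum), fmt(maximum)]
    # Trailing empty fields may be omitted.
    return f"{label}=" + ";".join(fields).rstrip(";")


def plugin_output(text: str, metrics: List[str]) -> str:
    """Join the human-readable status text and the perfdata with a pipe."""
    return f"{text} | {' '.join(metrics)}" if metrics else text


if __name__ == "__main__":
    print(plugin_output("TEMP OK - 23.5 C", [perfdata("temp", 23.5, "", 30, 35)]))
```

An add-on such as PNP4Nagios reads the part after the pipe and stores it for graphing.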

My Plugins
* Framework: https://github.com/48kram/nagios-plugins
* Perl Net::SNMP
* Plugin for APC Smart-UPS,

Constantly evolving. Check regularly.

Agent-less vs Agent-full Checks
* Agent-less
  * No agent installed on the monitored host
  * All check plugins run on the monitoring server
  * Service to be monitored must be network-accessible
  * Default mode of Nagios
* Agent-full
  * Must install agent on server to be monitored
  * Check logic runs on monitored host
  * Can access non-network services
  * SNMP can be a powerful agent for checks
  * Server-specific agents

These plugins implement 2 basic types of checks. Agents: NRPE (remote plugin executor), NSClient++ (Windows system monitor agent). Many, if not most, devices and operating systems (printers, for example) provide an SNMP agent. Dell OMSA is an agent for Dell server health info.
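
The difference between the two styles can be sketched with two tiny checks. The first runs on the monitoring server and only needs network access to the target; the second has to run on the monitored host itself, which is the job an agent such as NRPE performs (the NRPE transport is not shown). Function names and thresholds are assumptions for the example.

```python
import os
import socket
from typing import Tuple


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Agent-less style: probe the service over the network."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def load_state(load1: float, warn: float, crit: float) -> int:
    """Map a 1-minute load average to a Nagios state (0 OK, 1 WARNING, 2 CRITICAL)."""
    if load1 >= crit:
        return 2
    if load1 >= warn:
        return 1
    return 0


def local_load_check(warn: float = 4.0, crit: float = 8.0) -> Tuple[int, float]:
    """Agent style: must run on the monitored host, since load is not a network service."""
    load1, _, _ = os.getloadavg()  # available on Unix-like systems only
    return load_state(load1, warn, crit), load1
```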

USING NAGIOS

About Nagios Replacements
When Nagios went commercial, the open-source community decided that it needed not one, not two, but three replacements for Nagios: Icinga and Naemon (forks of Nagios) and Shinken (a drop-in replacement). Most Linux distros are now shipping one or more of these compatible replacements rather than the official Nagios Core. Not a single distro I checked is shipping Nagios 4. Shinken, Naemon or Icinga should each work the same as the material covered here, but I have only briefly tested Icinga and have not tested Shinken or Naemon at all.

Navbar Overview Main window

Nagios 4.x interface.

The Tactical Overview
* Displays overview of monitored services and hosts
* Shows if
  * Any services / hosts have notifications disabled
  * Any services / hosts are flapping
  * Active / passive checks enabled / disabled
* Warning / Critical / Okay breakdown

The Tactical Overview

(We don't use passive checks - that's why they are disabled.) Useful when lots of people are using Nagios and it's easy to forget you disabled something.

Services View
Host summary, Service summary

Shows 2 service warnings and 3 critical services.

Click on Services - Critical

Shows a very full Lustre filesystem and a DB server with HW and SW issues. The HW problem report is coming from the Dell OMSA agent; the DB replication report is from an agent-less MySQL check.

Host and Service Groups
* Organize services or hosts into groups by function, etc.
* Can disable alerts, schedule downtime, etc. on a whole group
* Can show availability report for a whole group
  * Group services by desired reporting capability
* Groups get a unique URL so you can send a single link to check on a group of hosts
  * Great for PHBs!
  * Also great for delegated IT departments

Service Groups

Acknowledging an Outage
* Click on the service name (or hostname) that has the problem
* Under Service Commands, click "Acknowledge this service problem"
* You must enter a comment about why you are acknowledging the problem (e.g., "Bob is working on it")
* Click Commit
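
The talk acknowledges problems through the web interface. As an alternative not shown in the slides, Nagios Core also accepts an `ACKNOWLEDGE_SVC_PROBLEM` external command written to its command file. The sketch below builds that command line; the helper names, the host and service names, and the command file path (which varies by install) are assumptions.

```python
import time
from typing import Optional


def ack_service_command(host: str, service: str, author: str, comment: str,
                        sticky: bool = True, notify: bool = True,
                        persistent: bool = True,
                        now: Optional[int] = None) -> str:
    """Build an ACKNOWLEDGE_SVC_PROBLEM external command line."""
    if not comment.strip():
        raise ValueError("a comment is required when acknowledging a problem")
    for field in (host, service, author):
        if ";" in field:
            raise ValueError("';' is the field separator and cannot appear here")
    if any("\n" in field for field in (host, service, author, comment)):
        raise ValueError("fields must not contain newlines")
    timestamp = int(time.time()) if now is None else now
    # sticky=2 keeps the acknowledgement until the service returns to OK.
    return (f"[{timestamp}] ACKNOWLEDGE_SVC_PROBLEM;{host};{service};"
            f"{2 if sticky else 1};{int(notify)};{int(persistent)};"
            f"{author};{comment}\n")


def submit(line: str, command_file: str = "/usr/local/nagios/var/rw/nagios.cmd") -> None:
    """Write the command to the Nagios command file (path is an assumption)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```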

Acknowledging an Outage
Click Here

Here I'm going to acknowledge an SSL cert about to expire.

Acknowledging an Outage

Note that I'm waiting to find out if it needs to be renewed.

SMS pages
* Configure a contact to use an email-to-sms gateway
* Some carriers require an MMS gateway to process the From address

This is an example of an SMS page from my monitoring system that I received this morning. Our backup generator is running its monthly exercise / self-test.
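
Paging through an email-to-SMS gateway amounts to sending a very short email to an address the carrier provides. The sketch below composes such a message with the Python standard library; the gateway address, sender, SMTP host and message wording are all placeholders, and the 160-character cut-off reflects the usual single-SMS limit.

```python
import smtplib
from email.message import EmailMessage

SMS_LIMIT = 160  # typical single-message limit


def build_page(sender: str, gateway_address: str, host: str,
               service: str, state: str, detail: str) -> EmailMessage:
    """Compose a short notification addressed to an email-to-SMS gateway."""
    text = f"{state}: {service} on {host} - {detail}"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = gateway_address  # e.g. a carrier-provided number@gateway address
    msg.set_content(text[:SMS_LIMIT])  # truncate so the page fits in one SMS
    return msg


def send_page(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to a local mail relay (host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```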

Add-ons to Consider
* PNP4Nagios - Performance data graphing
* NConf - Web-based configurator for Hosts, Services, etc.
* NagiosQL - Web-based admin tool for Nagios
* NDOUtils - Export data from Nagios to MySQL

THANK YOU!
Previous talks available at: https://blogs.nrao.edu/jmalone/talks/