Supermicro Server Monitoring with SuperDoctor 5 and Nagios Using SNMP Protocol. Version 1.1b





Release: v 1.1b
Document release date: 11/15/2013

Copyright 2013 Super Micro Computer, Inc. All Rights Reserved.

Legal Notices
This software and documentation is the property of Super Micro Computer, Inc., and supplied only under a license. Any use or reproduction of this software is not allowed, except as expressly permitted by the terms of said license. Information in this document is subject to change without notice.

Trademark Notice
All trademarks and copyrights referred to are the property of their respective owners.

Revision History

Date          Rev    Description
Jul-4-2011    1.0    Initial Document.
Sep-20-2012   1.1    Reorganize MIB structures.
Jul-12-2013   1.1a   Change product name to SuperDoctor 5 (SD5).
Nov-15-2013   1.1b   Changed default install folder of SD5.

Contents
1. Introduction
2. Prerequisites
   2.1 Installing Java Runtime Environment (JRE)
   2.2 Installing the check_snmp_health Plug-in
   2.3 SuperDoctor 5 (SD5)
   2.4 Setting Up SNMP Service in Linux
   2.5 Installing Smartctl Utility
3. Getting Started
   3.1 Defining the Hosts
   3.2 Defining a Command
   3.3 Defining the Services
   3.4 Validating the Nagios Configurations
   3.5 Restarting Nagios Service
   3.6 Connecting to the Nagios Web UI
4. Using check_snmp_health
   4.1 -h or --help
   4.2 -bc
   4.3 -cn
   4.4 -co
   4.5 -d
   4.6 -i
   4.7 -t
   4.8 -to
5. Appendix
   5.1 SD5 FAQ
   5.2 How to Reset Memory Error Status?
   5.3 Can I Disable the SD5 Web?
   5.4 Can I Disable the NRPE Protocol?
   5.5 No Health Information from SNMP Was Fetched
Contacting Supermicro

1. Introduction
The Nagios plug-in check_snmp_health uses SNMP to communicate with SuperDoctor 5 and checks the health of the following hardware components:
- Fan
- Processor temperature
- System temperature
- DDR3 temperature
- Power supply failure
- Voltage
- Chassis intrusion
- Physical disk failure
- Memory failure (Linux platforms only, see note 1)
- Processor failure (Linux platforms only, see note 2)
- RAID health (LSI MegaRAID 2108 and 2208 controllers only)

The results of running the check_snmp_health plug-in are shown on the Nagios Web UI.

Note 1: The memory health check covers both correctable ECC (CECC) and uncorrectable ECC (UECC) errors. Both must be supported by the BIOS, and this function is currently available only on Linux platforms.
Note 2: The processor failure check must be supported by the BIOS and is currently available only on Linux platforms.

2. Prerequisites

2.1 Installing Java Runtime Environment (JRE)
The check_snmp_health plug-in is written in Java. To run it, install JRE 1.6 or later on your Nagios server. Then set the JAVA_HOME environment variable to the JRE installation path.

2.2 Installing the check_snmp_health Plug-in
1. Unzip the package file SSMServerPlugin-1.0-build.[xyz].zip into the /usr/local/nagios/libexec/ssmserverplugin folder. This assumes Nagios is installed in /usr/local/nagios.
2. Run chmod +x check_snmp_health.sh to make the check_snmp_health plug-in executable.
3. Run check_snmp_health.sh without any arguments. If the JRE and the plug-in are installed correctly, the following error message appears:
   Invalid options. Three options must be provided for -i (--ip).

2.3 SuperDoctor 5 (SD5)
The check_snmp_health plug-in is designed to work with SuperDoctor 5, which implements an SNMP extension to support the Supermicro MIBs (for details, see 5.3 Supermicro MIB in the SuperDoctor 5 User's Guide). To install SuperDoctor 5, refer to Chapter 2 Setting Up SD5 in the SuperDoctor 5 User's Guide. To install SD5 quickly on many hosts, see 2.1.4 Tips for Deploying a Large Number of SD5s in the same guide.

2.4 Setting Up SNMP Service in Linux
SNMP support requires the Net-SNMP service to be installed and configured on your Linux system. For instructions, refer to 5.2 Setup SNMP Service in Linux in the SuperDoctor 5 User's Guide.

2.5 Installing Smartctl Utility
SD5 uses an open-source program named smartctl to check the health of physical disks. To enable this function, you must install smartctl manually. Download the program from:
http://sourceforge.net/apps/trac/smartmontools/wiki/download
Many Linux distributions provide precompiled packages that make installation simpler. For example, on CentOS 5.x, smartctl is part of the smartmontools package and can be installed with yum:
    yum install smartmontools
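Once smartctl is installed, you may want to confirm that it can report a disk's health before you rely on SD5's disk checks. The following Python sketch runs smartctl -H and interprets the verdict line it prints. It assumes smartctl is on the PATH and that you run it with sufficient privileges. The function names are hypothetical, and the sketch does not reflect how SD5 itself invokes smartctl.

```python
from __future__ import annotations

import re
import subprocess

# Verdict lines printed by `smartctl -H`: the first form appears for ATA/SATA
# drives, the second for SCSI/SAS drives.
_ATA_VERDICT = re.compile(r"self-assessment test result:\s*(\w+)")
_SCSI_VERDICT = re.compile(r"SMART Health Status:\s*(\w+)")


def parse_health(output: str) -> bool | None:
    """Return True if the drive passed, False if it failed, None if no verdict was found."""
    for pattern, healthy_word in ((_ATA_VERDICT, "PASSED"), (_SCSI_VERDICT, "OK")):
        match = pattern.search(output)
        if match:
            return match.group(1) == healthy_word
    return None


def disk_health(device: str) -> bool | None:
    """Run `smartctl -H` on a device such as /dev/sda (usually requires root)."""
    # smartctl reports problems through a bit-mask exit status, so the exit
    # status is not treated as an error here; only the printed verdict is used.
    proc = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return parse_health(proc.stdout)


if __name__ == "__main__":
    sample = "SMART overall-health self-assessment test result: PASSED\n"
    print(parse_health(sample))  # True
```

For example, disk_health("/dev/sda") returns True for a drive that passes. A result of None means smartctl printed no recognizable verdict, for instance when the device does not support SMART or permission was denied.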

3. Getting Started

3.1 Defining the Hosts
Define a host for each SD5 system. In this example, the host is defined in the host1.cfg file:

    define host {
        host_name   10.134.12.36
        alias       10.134.12.36
        address     10.134.12.36
        use         linux-server
    }

Then edit the $NAGIOS_HOME$/etc/nagios.cfg file so that it includes host1.cfg. To do this, add a cfg_file= line that points to the file.

3.2 Defining a Command
Define a command for check_snmp_health. In this example, the command is defined in the commands.cfg file:

    define command {
        command_name    check_snmp_health_all
        command_line    /usr/local/nagios/libexec/ssmserverplugin/check_snmp_health.sh -i $HOSTADDRESS$ -t $ARG1$
    }
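The sketch below makes the relationship between a service's check_command and the command's command_line concrete by modeling how Nagios substitutes macros. The text after each ! in check_command becomes $ARG1$, $ARG2$, and so on, and $HOSTADDRESS$ is taken from the host's address directive. This is a simplified Python illustration written for this guide. It handles only those two kinds of macros and is not Nagios code.

```python
from __future__ import annotations

# Simplified model of Nagios macro substitution for a service check.
# Only $HOSTADDRESS$ and $ARG1$..$ARGn$ are handled here.


def expand_check_command(check_command: str, commands: dict[str, str], host_address: str) -> str:
    # "check_snmp_health_all!dfv" -> command name plus the argument list ["dfv"]
    name, *args = check_command.split("!")
    line = commands[name].replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        line = line.replace(f"$ARG{index}$", value)
    return line


# Command definition from section 3.2.
COMMANDS = {
    "check_snmp_health_all": (
        "/usr/local/nagios/libexec/ssmserverplugin/check_snmp_health.sh"
        " -i $HOSTADDRESS$ -t $ARG1$"
    ),
}

if __name__ == "__main__":
    # A service that checks disk, fan, and voltage status (type letters d, f, v).
    print(expand_check_command("check_snmp_health_all!dfv", COMMANDS, "10.134.12.36"))
    # -> /usr/local/nagios/libexec/ssmserverplugin/check_snmp_health.sh -i 10.134.12.36 -t dfv
```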

3.3 Defining the Services
Define the service to be checked by Nagios. In this example, the service is defined in the host1.cfg file:

    define service {
        use                   local-service
        service_description   check_snmp_health
        host_name             10.134.12.36
        check_command         check_snmp_health_all!a
    }

You can also define a service that checks only one type of monitored item, such as fans, disks, or memory, by specifying the type argument:
- a: all (checks all health items)
- w: power
- f: fan
- c: current
- d: disk
- m: memory
- t: temperature
- v: voltage
- s: switch
- p: processor
- r: RAID

For example, a service that checks fan status is defined as follows:

    define service {
        use                   local-service
        host_name             10.134.12.36
        service_description   check fan status
        check_command         check_snmp_health_all!f
    }

You can also check several types of monitored items at once by concatenating their type letters. A service that checks disk, fan, and voltage status is shown below.

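    define service {
        use                   local-service
        host_name             10.134.12.36
        service_description   check disk, fan, and voltage status
        check_command         check_snmp_health_all!dfv
    }

3.4 Validating the Nagios Configurations
Before you restart Nagios, verify the configuration files. For example:
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg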

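3.5 Restarting Nagios Service
Restart the Nagios service to apply the new configuration, for example with service nagios restart.

3.6 Connecting to the Nagios Web UI
The check results are shown on the Nagios Web UI, as below.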
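Under the standard Nagios plug-in conventions, Nagios determines the state shown on the Web UI from the plug-in's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and uses the first line of the plug-in's output as the status text. The Python sketch below imitates that step so you can try check_snmp_health.sh outside Nagios. The helper name run_check is hypothetical. Reporting every other exit code as UNKNOWN is a simplification made for this sketch.

```python
from __future__ import annotations

import subprocess
import sys

# Standard Nagios plug-in return codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv: list[str], timeout: float = 60.0) -> tuple[str, str]:
    """Run a plug-in and return (state, first output line), roughly as Nagios displays it."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Simplification for this sketch: any code outside 0-3 is reported as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.splitlines()
    return state, lines[0] if lines else ""


if __name__ == "__main__":
    # Stand-in plug-in (not check_snmp_health) that prints a status line and exits with 2.
    fake_plugin = [sys.executable, "-c", "print('CRITICAL - stand-in message'); raise SystemExit(2)"]
    print(run_check(fake_plugin))  # ('CRITICAL', 'CRITICAL - stand-in message')
```

To test against a real host, you could call run_check(["/usr/local/nagios/libexec/ssmserverplugin/check_snmp_health.sh", "-i", "10.134.12.36", "-t", "a"]), using the plug-in path and host address from the examples in this chapter.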

4. Using check_snmp_health

4.1 -h or --help
The -h or --help option shows the help menu, as shown below.

4.2 -bc
Use the -bc option to specify user-defined thresholds for memory and processor checks. The argument is a comma-separated list of rules in the following format:
[type][duration][fail count],...
- [type]:
  m: correctable single-bit ECC errors
  M: uncorrectable ECC errors
  p: processor failures (in the examples below, p denotes correctable and P uncorrectable processor failures)
- [duration]: a number followed by one of these units:
  d: day
  h: hour
  m: minute
  s: second
- [fail count]: the acceptable number of failures. A critical status is triggered only when the number of failures is greater than this value.
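The rule format above can be illustrated with a short Python sketch. It parses a -bc argument and decides whether a list of failure timestamps should produce a critical status. In line with 5.2, it assumes that only failures logged within the trailing check period count. The names Threshold, parse_bc, and is_critical are hypothetical, and this is not the plug-in's actual implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical parser for [type][duration][fail count]. The type letters m, M
# and p come from 4.2; P (uncorrectable processor failures) from its examples.
_RULE = re.compile(r"^([mMpP])(\d+)([dhms])(\d+)$")
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


@dataclass(frozen=True)
class Threshold:
    kind: str          # m, M, p or P
    window: timedelta  # check period
    max_failures: int  # failures allowed before the status turns critical


def parse_bc(arg: str) -> list[Threshold]:
    rules = []
    for item in arg.split(","):
        match = _RULE.match(item.strip())
        if not match:
            raise ValueError(f"invalid -bc rule: {item!r}")
        kind, amount, unit, count = match.groups()
        window = timedelta(**{_UNITS[unit]: int(amount)})
        rules.append(Threshold(kind, window, int(count)))
    return rules


def is_critical(rule: Threshold, events: list[datetime], now: datetime) -> bool:
    # Only failures whose log time falls inside the trailing window are counted.
    recent = [t for t in events if now - rule.window <= t <= now]
    return len(recent) > rule.max_failures


if __name__ == "__main__":
    now = datetime(2013, 11, 15, 12, 0)
    m_rule, big_m_rule = parse_bc("m1d4,M1h0")
    cecc = [now - timedelta(hours=h) for h in (1, 2, 3, 5, 30)]
    print(is_critical(m_rule, cecc, now))  # False: 4 errors in the last day, 4 allowed
    print(is_critical(big_m_rule, [now - timedelta(minutes=10)], now))  # True
```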

Example: The following threshold for memory allows four correctable single-bit ECC errors per 1 GB of RAM within one day (24 hours), i.e., m1d4, and no uncorrectable ECC errors within 1 hour, i.e., M1h0:
    -bc m1d4,M1h0
Note: To reset the memory error status, refer to 5.2 How to Reset Memory Error Status?

Example: The following threshold for processors allows two correctable processor failures within 30 days, i.e., p30d2, and no uncorrectable processor failures within 1 hour, i.e., P1h0:
    -bc p30d2,P1h0

4.3 -cn
Use the -cn option to specify the expected number of processors, memory modules, and hard disks. The argument is a comma-separated list in the following format:
[type][number],...
- [type]:
  p: processor
  m: memory
  d: hard disk drive
- [number]: the expected number of processors, memory modules, or hard disks. An OK status is returned only when this number equals the number of processors, memory modules, or hard disks installed on the monitored system.

Example: The following -cn arguments indicate that the monitored system has one processor, four memory DIMMs, and one hard disk drive:
    -cn p1,m4,d1

4.4 -co
Use the -co option to specify the SNMP community string.
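The -cn comparison described in 4.3 returns OK only when every expected count equals the installed count. It can be sketched in Python as follows. The sketch does not show how the plug-in obtains the installed counts over SNMP; they are simply passed in as a dictionary. parse_cn and mismatches are hypothetical helpers.

```python
from __future__ import annotations

# Type letters accepted by -cn (section 4.3).
_NAMES = {"p": "processor", "m": "memory", "d": "hard disk drive"}


def parse_cn(arg: str) -> dict[str, int]:
    """Parse a value such as "p1,m4,d1" into {"p": 1, "m": 4, "d": 1}."""
    expected = {}
    for item in arg.split(","):
        item = item.strip()
        kind, number = item[:1], item[1:]
        if kind not in _NAMES or not number.isdigit():
            raise ValueError(f"invalid -cn entry: {item!r}")
        expected[kind] = int(number)
    return expected


def mismatches(expected: dict[str, int], installed: dict[str, int]) -> list[str]:
    # One message per type whose installed count differs; an empty list means OK.
    return [
        f"{_NAMES[k]}: expected {n}, found {installed.get(k, 0)}"
        for k, n in expected.items()
        if installed.get(k, 0) != n
    ]


if __name__ == "__main__":
    expected = parse_cn("p1,m4,d1")
    print(mismatches(expected, {"p": 1, "m": 4, "d": 1}))  # [] -> OK
    print(mismatches(expected, {"p": 1, "m": 3, "d": 1}))  # ['memory: expected 4, found 3']
```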

4.5 -d
Use the -d option to show detailed information about the monitoring logic. This option is intended for debugging purposes only and should not be used in Nagios.

4.6 -i
Use the -i option to specify the host name or IP address to be checked.

4.7 -t
Use the -t option to specify the type of monitored items to be checked (see the type letters in 3.3). The default value is all. To check the health status of a RAID controller, including the states of its components such as battery backup units (BBUs), virtual drives, and hard disks, use the -t r option as shown below.

The following figure shows that one virtual drive and one hard disk have raised alerts. As a result, the health status of the RAID controller is critical.
The following figure shows a RAID controller in critical status because its BBU is absent.

4.8 -to
Use the -to option to specify the SNMP timeout value. The default value is 15 seconds. You may need to increase the timeout if the check_snmp_health plug-in cannot retrieve all MIBs. For example, if the host to be checked has several hard disks, checking for physical disk failures may take longer than 15 seconds and cause the check_snmp_health plug-in to time out. To avoid this, specify a larger timeout value with the -to option.
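The value passed to -t (and, in Chapter 3, passed through $ARG1$) is a string of the type letters listed in 3.3. The letters can be concatenated, as in dfv. The Python sketch below expands such a string into item names. The letter table comes from 3.3. Treating a combined with other letters as all is an assumption, and expand_types is a hypothetical helper.

```python
from __future__ import annotations

# Type letters from section 3.3.
TYPES = {
    "a": "all", "w": "power", "f": "fan", "c": "current", "d": "disk",
    "m": "memory", "t": "temperature", "v": "voltage", "s": "switch",
    "p": "processor", "r": "raid",
}


def expand_types(arg: str = "a") -> list[str]:
    """Expand a -t value such as "dfv" into item names (default: all)."""
    unknown = sorted(set(arg) - set(TYPES))
    if unknown:
        raise ValueError("unknown type letter(s): " + "".join(unknown))
    # Assumption: "a" anywhere in the string means every item is checked.
    if "a" in arg:
        return ["all"]
    # Keep the order given, ignoring repeated letters.
    return [TYPES[letter] for letter in dict.fromkeys(arg)]


if __name__ == "__main__":
    print(expand_types("dfv"))  # ['disk', 'fan', 'voltage']
```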

5. Appendix

5.1 SD5 FAQ
Q: I see some error messages in the [SD5 install folder]/wrapper.log file. Do you have a list of all error messages and their solutions?
A: The known error messages are listed below.

A0001
  Message: HealthInfo initialization error. com.supermicro.ssm.tmhealth.model.MotherboardModelNotExistException:
  Root cause: SD5 is running on a non-Supermicro server. Health information is only available on Supermicro servers.
  Solution: Install SD5 on Supermicro servers.

A0002
  Message: Unable to start JVM: No such file or directory
  Root cause: SD5 cannot find the required Java Virtual Machine (JVM) in the [SD5 install folder]/jre folder.
  Solution: Reinstall SD5.

5.2 How to Reset Memory Error Status?
Q: An uncorrectable ECC error was raised on a server, and I have manually replaced the problematic memory module. However, the check_snmp_health plug-in still shows a critical status.
A: The memory error check is based on two conditions:
1. The BIOS event log contains memory error entries.
2. The time at which an entry was generated falls within the check period.
For example, suppose that you check memory errors with the -bc M1d0 option, so any uncorrectable ECC error within one day causes a critical status. Once an uncorrectable ECC error has been found, the status remains critical for one day, even after the problematic memory module has been replaced.
To get an OK status immediately after replacing the memory, follow these steps:

1. Clear the BIOS event logs from the BIOS setup menu.
2. Delete the file [SD5 install folder]/config/bioslogs.txt

5.3 Can I Disable the SD5 Web?
Q: I only use the check_snmp_health plug-in to check the health of a host, and I do not use a browser to view sensor readings through the SD5 Web. Can I disable it?
A: Yes. You can disable the SD5 Web during installation by selecting 2- No at the Setup SuperDoctor 5 Web step. See the figure below.
You can also disable the SD5 Web manually after installation. Use a text editor to open the [SD5 install folder]/plugins/builtin/web/plugin.cfg file, as shown below.

Change the enabled attribute from 1 to 0 and save the file. Exit the text editor and restart SD5 to apply the setting. When the SD5 Web is disabled, TCP ports 8181 and 8444 are not used.

5.4 Can I Disable the NRPE Protocol?
Q: I only use the check_snmp_health plug-in to check the health of a host, and I do not use the NRPE protocol to communicate with SD5. Can I disable NRPE support?
A: SD5 supports three NRPE connection modes:
- Mode A: Plain text with allowed IP (port 5333)
- Mode B: Anonymous SSL connection with allowed IP (port 5666)
- Mode C: SSL encryption with a public key infrastructure (port 5999)
NRPE is the default connection protocol provided by SD5, so it cannot be turned off completely; at least one connection mode must be specified. To modify the connection mode settings, refer to 3.2 SuperDoctor 5 Connection Modes in the SuperDoctor 5 User's Guide.

5.5 No Health Information from SNMP Was Fetched
Q: I run the command check_snmp_health.sh -i [host_ip], and the result shows "No health information from SNMP was fetched." What is the problem?
A: Usually this message indicates that the host being checked does not support the Supermicro MIB. Possible reasons include:

- The operating system's built-in SNMP service (i.e., Net-SNMP) is not running.
- SD5 is not running.
- The SD5 SNMP extension is not installed correctly.
- The SNMP port is blocked by a firewall.
- The default timeout value is too short for the health check (see 4.8 -to).

Contacting Supermicro

Headquarters
Address: Super Micro Computer, Inc.
980 Rock Ave.
San Jose, CA 95131 U.S.A.
Tel: +1 (408) 503-8000
Fax: +1 (408) 503-8008
Email: marketing@supermicro.com (General Information)
support@supermicro.com (Technical Support)
Web Site: www.supermicro.com

Europe
Address: Super Micro Computer B.V.
Het Sterrenbeeld 28, 5215 ML
's-Hertogenbosch, The Netherlands
Tel: +31 (0) 73-6400390
Fax: +31 (0) 73-6416525
Email: sales@supermicro.nl (General Information)
support@supermicro.nl (Technical Support)
rma@supermicro.nl (Customer Support)

Asia-Pacific
Address: Super Micro Computer, Inc.
3F, No. 150, Jian 1st Rd.
Zhonghe Dist., New Taipei City 23511
Taiwan (R.O.C)
Tel: +886-(2) 8226-3990
Fax: +886-(2) 8226-3992
Web Site: www.supermicro.com.tw
Technical Support:
Email: support@supermicro.com.tw
Tel: +886-(2)-8226-3990
