Nagios Introduction
Agenda: Introduction
1. How is our cloud monitoring set up?
2. Which tools are used?
3. How do we access the monitoring dashboard?
4. What are the user ID and password?
5. How do we check whether a received alert is fixed or still exists?
6. Nagios Tactical View
7. Host Trends or Status History
8. Check_mk Status Detail Screen
How is our cloud monitoring set up?
URL: https://cloudmgmt.metricstream.com
Setup: CentOS 6.X, Nagios Core version 4.0.3, Nagios Graph version 1.5
Cybercon and Vxchnge (P2P link): Nagios at https://cloudmgmt.metricstream.com/nagios/
Navisite: Nagios at https://nocnaviste.metricstream.com/nagios/
Which tools are used?
Nagios: Nagios is a powerful, enterprise-class host, service, application, and network monitoring program, designed to be fast, flexible, and rock-solid stable. Nagios runs on *NIX hosts and can monitor Windows, Linux/Unix/BSD, Netware, and network devices.
Check_mk: Check_mk is an extension to the Nagios monitoring system. It offloads work from the Nagios core so that it scales better, allowing more systems to be monitored from a single Nagios server.
OSSEC HIDS: OSSEC is an open source host-based intrusion detection system that performs log analysis, file integrity checking, policy monitoring, rootkit detection, real-time alerting, and active response.
AlertBot: AlertBot is an industry leader in URL monitoring services. It monitors URLs with the goal of gathering performance data and alerting the company of failures.
Dell OpenManage Server Administrator: Server Administrator is designed for system administrators to manage systems locally and remotely on a network. We integrated it with Nagios for hardware health monitoring.
How do we access the monitoring dashboard?
URL: https://cloudmgmt.metricstream.com
Click NOC. It redirects to the datacenter-wise monitoring page. For example, clicking Cybercon and Vxchnge redirects to the login page:
User ID: nocguest
Password: M3tr1c*321
How do we access the monitoring dashboard?
After you provide the login details, you land on the Nagios dashboard:
https://cloudmgmt.metricstream.com/nagios/
The Check_mk dashboard is at:
https://cloudmgmt.metricstream.com/check_mk/
How do we check whether a received alert is fixed or not?
For services: open https://cloudmgmt.metricstream.com/nagios/ and go to Problems > Services (Unhandled).
For hosts: open https://cloudmgmt.metricstream.com/nagios/ and go to Problems > Hosts (Unhandled).
To check individual servers: type <Hostname> in the search box. Example: Sterlingapp*
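The check described above can be sketched in Python as a filter over a list of service states. This is only an illustration of the idea, not how the Nagios CGI pages work internally. The `ServiceStatus` record, its fields, and the host names in the usage example are hypothetical, and "unhandled" is assumed here to mean a non-OK state that is neither acknowledged nor in scheduled downtime. The wildcard matching uses Python's `fnmatch`, which may not match the Nagios search box in every detail.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class ServiceStatus:
    """Hypothetical record for one service check result."""
    host: str
    service: str
    state: str                 # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    acknowledged: bool = False
    in_downtime: bool = False


def unhandled_problems(statuses: List[ServiceStatus],
                       host_pattern: str = "*") -> List[ServiceStatus]:
    """Return non-OK services nobody has handled yet, filtered by host name.

    Assumption: "unhandled" means not acknowledged and not in downtime.
    """
    return [
        s for s in statuses
        if s.state != "OK"
        and not s.acknowledged
        and not s.in_downtime
        and fnmatchcase(s.host, host_pattern)
    ]
```

For example, with made-up hosts `Sterlingapp01` and `Sterlingdb01`, calling `unhandled_problems(statuses, "Sterlingapp*")` returns only the open problems on `Sterlingapp01`. If an alert you received no longer appears in the result, it has either recovered or been acknowledged.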
Nagios Tactical View:
Host Trends or Status History:
Event Logs:
Check_mk Status Detail Screen:
Check_mk Dashboard
Check_mk Hosts
Check_mk Individual Host
Check_mk Alerts
Check_mk Status Detail Screen: Check_mk
1. It can be used as a front end and extension of a Nagios monitoring system, for monitoring the performance and health of networking devices, servers, and infrastructure systems.
2. Auto-detection of the configuration of data points in a monitored system (inventory).
3. Special checks in addition to the standard Nagios plugins.
4. Agentless (SNMP-based) monitoring.
5. Replacement of the standard Nagios GUI and centralized monitoring.
6. Graphical administration of the monitoring system.
7. Filtering, viewing, and alerting for log files and event data such as SNMP traps.
Cloud Monitoring
Agenda:
1. Which system, application, and database alerts are enabled?
2. Which hardware monitoring alerts are enabled?
3. How is Nagios set up for the alert mechanism, and which DLs are enabled for action?
4. How does Nagios help cloud application monitoring?
5. What can it do?
6. Whom do we reach if an alert is server, application, SSL, or network related, or a false alert?
7. How can PE create their own SOPs to handle such alerts and improve their work efficiency?
8. Road map
Which system, application, and database alerts are enabled?
System monitoring enabled:
1. Disk usage
2. System CPU load
3. System uptime
4. System memory usage
Application monitoring enabled:
Service monitoring, such as Apache Tomcat, OpenOffice, and the Oracle DB TNS Listener
SSL expiry monitoring
Oracle health (tnsping, tablespace, invalid objects)
Which hardware alerts are enabled?
Hardware monitoring enabled:
Plugin features: storage components checked:
Controllers
Physical drives
Logical drives
Cache batteries
Connectors (channels)
Enclosures
Enclosure fans
Enclosure power supplies
Enclosure temperature probes
Enclosure management modules (EMMs)
Nagios setup for the alert mechanism:
Hard disk space issue:
If space consumption is below 85%
If space consumption is between 85% and 95%
If space consumption is above 95%
Memory usage issue:
If RAM usage is below 85%
If RAM usage is between 85% and 90%
If RAM usage is above 90%
System uptime:
System running perfectly
Host down
Host up
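The disk and memory bands above can be sketched as a small threshold function. The percentages come from the slide. Mapping the three bands to OK, WARNING, and CRITICAL is an assumption, since the slide's state labels are not in the text. The exit codes 0, 1, and 2 follow the standard Nagios plugin convention. The names `Thresholds`, `classify`, and `plugin_line` are hypothetical, and the boundary values (exactly 85%, 90%, or 95%) are assumed to fall in the middle band.

```python
from dataclasses import dataclass

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


@dataclass(frozen=True)
class Thresholds:
    warn: float  # lower edge of the middle band, in percent
    crit: float  # upper edge of the middle band, in percent


# Bands taken from the slide.
DISK = Thresholds(warn=85.0, crit=95.0)
MEMORY = Thresholds(warn=85.0, crit=90.0)


def classify(usage_percent: float, t: Thresholds) -> int:
    """Map a usage percentage to an (assumed) Nagios state."""
    if usage_percent > t.crit:
        return CRITICAL
    if usage_percent >= t.warn:
        return WARNING
    return OK


def plugin_line(label: str, usage_percent: float, t: Thresholds) -> str:
    """Format a one-line status message in the usual plugin style."""
    state = classify(usage_percent, t)
    return f"{label} {STATE_NAMES[state]} - {usage_percent:.1f}% used"
```

With these bands, 92% usage is still in the middle band for disk space but already above the top band for memory, which is why the two checks carry separate thresholds.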
Nagios setup for the alert mechanism:
Application monitoring (Apache Tomcat, OpenOffice, Oracle):
If the service is running
If the service was stopped or there was any issue
SSL monitoring:
More than 30 days
15 to 30 days
0 to 15 days
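The SSL bands above can be sketched as a function of the days left before a certificate expires. The day ranges come from the slide. The state names, the choice to put exactly 15 days in the middle band, and the treatment of already expired certificates as CRITICAL are assumptions. `days_until_expiry` and `classify_ssl` are hypothetical helper names, and fetching the certificate itself is left out.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days between now and the certificate's expiry time."""
    return (not_after - now).days


def classify_ssl(days_left: int) -> str:
    """Map remaining days to an (assumed) alert state."""
    if days_left > 30:
        return "OK"          # more than 30 days
    if days_left >= 15:
        return "WARNING"     # 15 to 30 days
    return "CRITICAL"        # under 15 days, or already expired
```

For example, a certificate that expires on 21 January, checked on 1 January of the same year, has 20 days left and falls in the middle band.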
How does Nagios help cloud application monitoring?
Avoidance of "too many red flashing lights":
"Just the facts": we only want root-cause failures to be reported, not a cascade of every downstream failure.
It avoids unnecessary checks:
e.g. HTTP responds, therefore there is no need to ping.
e.g. power outage, no ping response, so don't bother trying anything else.
If services are running fine, there is no need to check whether the host itself is alive.
What can it do?
Individual node status:
Is it up?
What is its load?
What is the memory and swap usage?
What is the NFS and network load?
Are the partitions full?
Are applications and services running properly?
How about ping latency?
Aggregated node status:
Same information, but across groups of nodes.
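The "root cause only" idea above can be illustrated with a short sketch: when a host is down, its service failures are not reported separately. This is a simplified illustration and not the Nagios implementation, which handles this through its own host and service dependency configuration. The function name `root_cause_alerts`, the input shapes, and the host names in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def root_cause_alerts(host_up: Dict[str, bool],
                      service_ok: Dict[Tuple[str, str], bool]) -> List[str]:
    """Report host outages, and service problems only on hosts that are up."""
    alerts: List[str] = []

    # A down host is the root cause, so it is always reported.
    for host, up in sorted(host_up.items()):
        if not up:
            alerts.append(f"HOST DOWN: {host}")

    # Service failures on a down host are downstream effects: skip them.
    for (host, service), ok in sorted(service_ok.items()):
        if ok:
            continue
        if not host_up.get(host, True):
            continue
        alerts.append(f"SERVICE PROBLEM: {host}/{service}")

    return alerts
```

For example, if `app01` is down with two failing services and `db01` is up with one failing service, the result contains one host alert for `app01` and one service alert for `db01`, instead of four separate alerts.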
Notification Examples
How can PE create their own SOPs?
Revisit the existing SOPs that PE already follows for outages or customer cases.
Based on the alerts received and the actions to be taken, PE should update the existing SOPs with the monitoring capabilities.
Road map: upcoming Nagios features
1. Merging both Nagios sites, Vxchnge and Navisite, in Check_mk for a single view.
2. Alert notification for MetricStream ECP license expiry.
3. Alert notification for Oracle schema user expiry.
4. Graphical reports on alerts for easy sorting.
5. Grouping hosts by customer for easy viewing.
6. Account-wise grouping for easy viewing.