Configuring an OpenNMS Stand-by Server



WHITE PAPER

Configuring an OpenNMS Stand-by Server
Version 1.2

The OpenNMS Group, Inc.
220 Chatham Business Drive
Pittsboro, NC 27312
T +1 (919) 533-0160
F +1 (503) 961-7746
david@opennms.com
URL: http://blogs.opennms.org/david

Executive Summary

A large corporation contacted The OpenNMS Group for a consulting engagement, requesting assistance with their new network and system management platform, OpenNMS. The engagement was scheduled for 3 days, with the primary focus on developing and implementing a continuity of operations plan in the event of a system failure or localized disaster impacting OpenNMS.

Continuity of Operations Plan

This plan includes automated and manual instructions for creating, maintaining, and implementing two OpenNMS instances in primary, standby, and failover modes. For ease of communication we'll call this the HA configuration.

The HA configuration concepts

The OpenNMS software and its software dependencies (i.e. PostgreSQL) will be installed on 2 geographically remote Debian Linux 4.0 servers: nms-01 and nms-02. The nms-01 server is the primary system, responsible for monitoring the corporate network. The nms-02 server is the backup system that, when in stand-by mode, monitors nms-01 and is configured to notify operations staff when a failover should be initiated. The nms-02 server monitors the corporate network only in failover mode.

[Architecture diagram. Recoverable details:
- nms-01 monitors nms-02 to alert on the status of the standby; nms-02 monitors nms-01 to alert on the status of the primary.
- nms-02 exports the opennms DB from nms-01 via cron, and rsyncs the RRD repository and /etc/opennms from nms-01 via cron.
- nms-01 actively manages the corporate network and its network devices.
- nms-02 manages the corporate network only in failover mode; however, network devices should always send SNMP traps to both systems.]
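In stand-by mode, nms-02's view of nms-01 amounts to an ICMP reachability test plus a check of the OpenNMS web application. The following is an illustration only: in the deployment described here these checks are OpenNMS poller services, not a shell script, and the web UI URL (port 8980) is an assumption.

```shell
#!/bin/sh
# Illustration only: ad-hoc versions of the two checks the standby
# performs against the primary. In the real deployment these are
# OpenNMS poller services; the web port 8980 is an assumption.
PRIMARY=nms-01                              # hostname of the primary NMS
WEB_URL="http://$PRIMARY:8980/opennms/"     # assumed OpenNMSWebApp URL

ICMP_STATE=down
ping -c 1 -W 2 "$PRIMARY" >/dev/null 2>&1 && ICMP_STATE=up

WEB_STATE=down
curl -sf --max-time 5 "$WEB_URL" >/dev/null 2>&1 && WEB_STATE=up

echo "ICMP: $ICMP_STATE  WebApp: $WEB_STATE"
```

When both checks report down, the node itself is considered down, which is the condition under which the standby's notification to the operations staff fires.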

There is a monitoring configuration on nms-02 and two main scripts located on nms-02 that support this configuration.

Script: sync.sh

This script is responsible for keeping the stand-by OpenNMS failover configuration resident on nms-02 consistent with the live nms-01 OpenNMS instance. It is designed to be run as a cron job; the crontab schedule should be maintained in the sync-on.cron file in the $SYNC_HOME/etc/ directory, and the on/off state is maintained by the sync-state.sh script found in the $SYNC_HOME/scripts directory.

(Note: This synchronization can be simplified tremendously if a shared file system can be mounted from a SAN and a DB server or DB server cluster is in place. The synchronization in this document assumes geographically separate, stand-alone OpenNMS instances.)

Phase 1 (Database Sync): The script synchronizes the OpenNMS database and configuration files from nms-01 without affecting nms-02's stand-by mission of only monitoring nms-01. The script is executed on a schedule defined in a crontab and is located on the file system in the ~opennms/failover/scripts directory. It exports the live opennms database instance running on nms-01 to the file system on nms-02 and imports that data into a PostgreSQL database called opennms_failover running on nms-02. This database name is used so as not to conflict with the active opennms database used by OpenNMS in standby mode to monitor nms-01.

Phase 2 (Configuration Sync): The script synchronizes the OpenNMS configuration files from nms-01 for use during the failover mode of operation. The configuration files in $OPENNMS_ETC on nms-01 are rsync'd, via an ssh tunnel, to the $OPENNMS_ETC_FAILOVER directory (/etc/opennms-failover) on nms-02.

Phase 3 (RRD Repository Sync): The script synchronizes the OpenNMS RRD repository on nms-01 for use on nms-02 during failover.
The RRD files in /var/lib/opennms/rrd on nms-01 are rsync'd, via an ssh tunnel, to /var/lib/opennms/rrd-failover/ on nms-02.

Script: failover.sh

This script is responsible for changing the state of OpenNMS running on nms-02 from standby mode to failover mode. The script requires that either start or stop be passed as a parameter. The start parameter stops the automatic synchronization and initiates the failover sequence, causing the standby OpenNMS instance to begin actively monitoring the corporate network. The stop parameter deactivates the failover-mode instance of OpenNMS, resumes OpenNMS in the standby mode of monitoring the nms-01 OpenNMS instance, and reschedules the automatic synchronization.

This script, when passed the start parameter:

- Stops OpenNMS on nms-02

- Changes the symbolic links /etc/opennms and /var/lib/opennms/rrd to point to the respective failover directories created and synchronized by the sync.sh script, so that the OpenNMS configuration files synchronized from nms-01 will operate properly
- Copies a special opennms-datasources.xml file to /etc/opennms/ (this file causes OpenNMS to use the opennms_failover database in the PostgreSQL instance instead of the default opennms database)
- Starts OpenNMS

This script, when passed the stop parameter, restores the symbolic links and the opennms-datasources.xml file and resumes OpenNMS in standby mode.

Script: sync-state.sh

This script is used to control the state of the automatic synchronization maintained in the system crontab of the $SYNC_USER on nms-02. Change the sync-on.cron file in the $SYNC_HOME/etc/ directory to define the schedule.

HA Operational Modes

For the nms-02 server, the OpenNMS instance's normal state is to operate in standby mode. While in stand-by mode, OpenNMS is configured to monitor only the nms-01 server via 2 of its services: the OpenNMS Web application (OpenNMSWebApp) and the ICMP service of the nms-01 ethernet interface. When nms-02 detects that the nms-01 node is down (all services being monitored are down), it should generate a notification to an appropriate destination path.
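The symbolic-link swap performed by failover.sh presumes that /etc/opennms and /var/lib/opennms/rrd already exist as links into the -standby directories. The paper does not spell out that one-time preparation, so the following is a hedged sketch of it, run against a scratch directory rather than the live /etc and /var/lib (on a real nms-02 the paths would be absolute, OpenNMS would be stopped first, and the link targets would be absolute as in failover.sh):

```shell
#!/bin/sh
# One-time layout preparation, sketched in a scratch directory.
# The steps are an assumption inferred from how failover.sh swaps
# the links; on a real nms-02, drop ROOT and operate on / directly.
ROOT=./demo-root
rm -rf "$ROOT"
mkdir -p "$ROOT/etc/opennms" "$ROOT/var/lib/opennms/rrd"

# Move the real directories aside as the -standby copies...
mv "$ROOT/etc/opennms" "$ROOT/etc/opennms-standby"
mv "$ROOT/var/lib/opennms/rrd" "$ROOT/var/lib/opennms/rrd-standby"
mkdir -p "$ROOT/etc/opennms-failover" "$ROOT/var/lib/opennms/rrd-failover"

# ...and point the canonical paths at the standby copies.
ln -s opennms-standby "$ROOT/etc/opennms"
ln -s rrd-standby "$ROOT/var/lib/opennms/rrd"

ls -l "$ROOT/etc" "$ROOT/var/lib/opennms"
```

After this, swapping a link from the -standby to the -failover target (and back) is all failover.sh has to do to repoint OpenNMS at the synchronized copies.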
An OpenNMS administrator should take the following steps to begin the failover mode of operation on nms-02:

- Verify that nms-01 is actually down and not monitoring the corporate network
- Disable synchronization of nms-01 from nms-02 via the opennms user's crontab on nms-02
- Run the failover.sh script on nms-02 with the start parameter

When nms-01 is recovered, the OpenNMS administrator should take the following steps to resume the stand-by mode of operation on nms-02:

- Run the failover.sh script on nms-02 with the stop parameter
- Re-enable synchronization of nms-01 from nms-02 via the opennms user's crontab

Required Configuration

- Access control to PostgreSQL on nms-01 to allow access from nms-02 (pg_hba.conf and iptables)
- The rsync package installed on both nms-01 and nms-02
- A synchronization user (SYNC_USER) account created on both NMSs, as a best practice, as opposed to using root access, with the OpenNMS configuration and RRD structures changed to be owned by this user
- The sudo package installed and/or configured on both NMSs to provide the SYNC_USER account root privileges when needed during failover
- A pass-phrase-less RSA key to be used by the SYNC_USER and the sync.sh script. This is created so that the SYNC_USER account on nms-02 can rsync via an ssh tunnel without interaction.
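The PostgreSQL access control called out above might look like the following fragments on nms-01. The address 192.0.2.12/32 and the md5 authentication method are illustrative assumptions; substitute nms-02's real address, and for unattended cron runs either add a ~/.pgpass entry for the opennms user on nms-02 or choose an authentication method per local security policy.

```
# pg_hba.conf on nms-01: allow the opennms user to connect from nms-02
# (192.0.2.12/32 is a placeholder for nms-02's address)
host    all    opennms    192.0.2.12/32    md5

# postgresql.conf on nms-01: listen beyond localhost
listen_addresses = '*'
```

The pass-phrase-less key could be generated on nms-02 with something like `ssh-keygen -t rsa -N "" -f ~opennms/.ssh/id-rsa-key`, installing the resulting id-rsa-key.pub into ~opennms/.ssh/authorized_keys on nms-01.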

NOTE: The private key stored in the ~opennms/.ssh directory should be strictly controlled. As part of normal procedures taken following changes in personnel, this key should be regenerated and the opennms user account password changed.

Appendix A Failover Scripts

The scripts and files supporting this feature are installed using the following directory structure:

/Users/david/Documents/Business/Engineering/Support/kcpl/failover
./db
./etc
./etc/opennms-datasources-failover.xml
./etc/opennms-datasources-standby.xml
./etc/sync-off.cron
./etc/sync-on.cron
./scripts
./scripts/failover.sh
./scripts/sync-state.sh
./scripts/sync.sh

$SYNC_HOME/scripts/sync.sh

#!/bin/sh
#
# Export data from the primary server.
#
# pg_hba.conf on $PRIMARY_NMS must be changed
# to allow connections from $STANDBY_NMS. Also,
# postgresql.conf on $PRIMARY_NMS must be changed
# so that the postmaster process is configured
# to listen on all IP interfaces, including
# localhost.
#
# NOTE: "~$SYNC_USER" is not tilde-expanded by the shell; in practice
# use an absolute path (e.g. /home/opennms/failover) for SYNC_HOME.

SYNC_USER=opennms
SYNC_HOME=~$SYNC_USER/failover
PRIMARY_NMS=nms-01
STANDBY_NMS=`hostname`
OPENNMS_ETC=/etc/opennms
OPENNMS_ETC_STANDBY=/etc/opennms-standby
OPENNMS_ETC_FAILOVER=/etc/opennms-failover
OPENNMS_RRD=/var/lib/opennms/rrd
OPENNMS_RRD_STANDBY=/var/lib/opennms/rrd-standby
OPENNMS_RRD_FAILOVER=/var/lib/opennms/rrd-failover
FAILOVER_DB=opennms_failover

if [ -f $SYNC_HOME/etc/sync-envvars ]; then
    . $SYNC_HOME/etc/sync-envvars
fi

# Export the db data.
dbsync() {
    echo "Exporting data from $PRIMARY_NMS..."
    if ! test -d "${SYNC_HOME}/db"; then
        mkdir "${SYNC_HOME}/db"
    fi

    pg_dump -i -h $PRIMARY_NMS -U opennms opennms > $SYNC_HOME/db/opennms.sql
    if [ $? -eq 0 ]; then
        echo "Dropping and recreating the $FAILOVER_DB database locally..."
        psql -U opennms template1 -c "DROP DATABASE $FAILOVER_DB;"
        psql -U opennms template1 -c "CREATE DATABASE $FAILOVER_DB ENCODING 'unicode';"
    else
        return 1
    fi

    if [ $? -eq 0 ]; then
        echo "Importing data exported from $PRIMARY_NMS into the recreated $FAILOVER_DB DB..."
        psql -U opennms $FAILOVER_DB < $SYNC_HOME/db/opennms.sql
    else
        return 1
    fi
}

# Now it is important that the OpenNMS configuration files are
# rsync'd from $PRIMARY_NMS.
etcsync() {
    echo "Rsync'ing configuration files from $PRIMARY_NMS..."
    rsync -e "ssh -i ~$SYNC_USER/.ssh/id-rsa-key" -avz \
        $SYNC_USER@$PRIMARY_NMS:$OPENNMS_ETC/* $OPENNMS_ETC_FAILOVER/
}

# Sync the RRD files.
rrdsync() {
    echo "Rsync'ing RRD data files from $PRIMARY_NMS..."
    rsync -e "ssh -i ~$SYNC_USER/.ssh/id-rsa-key" -avz \
        $SYNC_USER@$PRIMARY_NMS:$OPENNMS_RRD/* $OPENNMS_RRD_FAILOVER/
}

# Synchronize the DB.
dbsync || echo "Failed db sync of $PRIMARY_NMS!"

# Synchronize the failover $OPENNMS_ETC_FAILOVER/ with the primary $OPENNMS_ETC/.
etcsync || echo "Failed etc sync of $PRIMARY_NMS!"

# Synchronize the failover $OPENNMS_RRD_FAILOVER/ with the primary $OPENNMS_RRD/.
rrdsync || echo "Failed rrd sync of $PRIMARY_NMS!"

$SYNC_HOME/scripts/failover.sh

#!/bin/bash
# This script will start OpenNMS
# in a configuration that has been
# duplicated from the primary OpenNMS
# system.
#
# It must first stop the OpenNMS that is running and
# monitoring the primary server.

SYNC_USER=opennms
SYNC_HOME=~$SYNC_USER/failover
PRIMARY_NMS=nms-01
STANDBY_NMS=`hostname`
OPENNMS_ETC=/etc/opennms
OPENNMS_ETC_STANDBY=/etc/opennms-standby
OPENNMS_ETC_FAILOVER=/etc/opennms-failover
OPENNMS_RRD=/var/lib/opennms/rrd
OPENNMS_RRD_STANDBY=/var/lib/opennms/rrd-standby
OPENNMS_RRD_FAILOVER=/var/lib/opennms/rrd-failover
OPENNMS_BIN=/usr/share/opennms/bin

if [ -f $SYNC_HOME/etc/sync-envvars ]; then
    . $SYNC_HOME/etc/sync-envvars
fi

case "$1" in
start)
    echo "Stopping automatic synchronization..."
    if ! $SYNC_HOME/scripts/sync-state.sh off; then
        echo "ERROR: Failed to stop synchronization cron job; not failing over!"
        exit 1
    fi

    echo "Stopping OpenNMS in preparation of failover..."
    if ! $OPENNMS_BIN/opennms stop; then
        echo "ERROR: Failed to stop standby OpenNMS instance; not failing over!"
        $SYNC_HOME/scripts/sync-state.sh on
        exit 1
    fi

echo "ERROR:Failed to stop standby OpenNMS instance; not failing over!" $SYNC_HOME/scripts/sync-state.sh on echo "Changing the $OPENNMS_ETC symbolic link to $OPENNMS_ETC_FAILOVER..." rm $OPENNMS_ETC echo "ERROR:Failed to remove symbolic link to the $OPENNMS_ETC; not failing over!\nverify OpenNMS state." $SYNC_HOME/scripts/sync-state.sh on ln -s $OPENNMS_ETC_FAILOVER $OPENNMS_ETC echo "ERROR:Failed to create symbolic link to the $OPENNMS_ETC_FAILOVER; not failing over!\nverify OpenNMS state." $SYNC_HOME/scripts/sync-state.sh on echo "Changing the $OPENNMS_RRD symbolic link to $OPENNMS_RRD_FAILOVER..." rm $OPENNMS_RRD echo "ERROR:Failed to remove symbolic link to the $OPENNMS_RRD; not failing over!\nverify OpenNMS state." $SYNC_HOME/scripts/sync-state.sh on ln -s $OPENNMS_RRD_FAILOVER $OPENNMS_RRD echo "ERROR:Failed to create symbolic link to the $OPENNMS_RRD_FAILOVER; not failing over!\nverify OpenNMS state." $SYNC_HOME/scripts/sync-state.sh on # echo "Setting failover data sources conguration..." # cp -p ~$SYNC_HOME/etc/opennms-datasources-failover.xml $OPENNMS_ETC/opennmsdatasources.xml echo "Restarting OpenNMS..." $OPENNMS_BIN/opennms start ;; stop) echo "Stopping OpenNMS in failover mode, returning to standby mode..." $OPENNMS_BIN/opennms stop White Paper: OpenNMS Stand-by Server 8

        echo "ERROR: Failed to stop failover OpenNMS instance!"
        $SYNC_HOME/scripts/sync-state.sh off
        exit 1
    fi

    echo "Changing the $OPENNMS_ETC symbolic link to $OPENNMS_ETC_STANDBY..."
    if ! rm $OPENNMS_ETC; then
        echo -e "ERROR: Failed to remove the symbolic link $OPENNMS_ETC; not starting standby mode!\nVerify OpenNMS state."
        $SYNC_HOME/scripts/sync-state.sh off
        exit 1
    fi
    if ! ln -s $OPENNMS_ETC_STANDBY $OPENNMS_ETC; then
        echo -e "ERROR: Failed to create the symbolic link to $OPENNMS_ETC_STANDBY; not starting standby mode!\nVerify OpenNMS state."
        $SYNC_HOME/scripts/sync-state.sh off
        exit 1
    fi

    echo "Changing the $OPENNMS_RRD symbolic link to $OPENNMS_RRD_STANDBY..."
    if ! rm $OPENNMS_RRD; then
        echo -e "ERROR: Failed to remove the symbolic link $OPENNMS_RRD; not starting standby mode!\nVerify OpenNMS state."
        $SYNC_HOME/scripts/sync-state.sh off
        exit 1
    fi
    if ! ln -s $OPENNMS_RRD_STANDBY $OPENNMS_RRD; then
        echo -e "ERROR: Failed to create the symbolic link to $OPENNMS_RRD_STANDBY; not starting standby mode!\nVerify OpenNMS state."
        $SYNC_HOME/scripts/sync-state.sh off
        exit 1
    fi

    # echo "Changing failover data sources configuration to the stand-by configuration..."
    # cp -p $SYNC_HOME/etc/opennms-datasources-standby.xml $OPENNMS_ETC/opennms-datasources.xml

    echo "Restarting OpenNMS..."
    $OPENNMS_BIN/opennms start

    echo "Restarting automatic synchronization..."
    $SYNC_HOME/scripts/sync-state.sh on
    ;;

*)
    echo "Usage: $0 {start|stop}"
    ;;
esac

$SYNC_HOME/scripts/sync-state.sh

#!/bin/sh

SYNC_USER=opennms
SYNC_HOME=~$SYNC_USER/failover
PRIMARY_NMS=nms-01
STANDBY_NMS=`hostname`
OPENNMS_ETC=/etc/opennms
OPENNMS_ETC_STANDBY=/etc/opennms-standby
OPENNMS_ETC_FAILOVER=/etc/opennms-failover
OPENNMS_RRD=/var/lib/opennms/rrd
OPENNMS_RRD_STANDBY=/var/lib/opennms/rrd-standby
OPENNMS_RRD_FAILOVER=/var/lib/opennms/rrd-failover
OPENNMS_BIN=/usr/share/opennms/bin

if [ -f $SYNC_HOME/etc/sync-envvars ]; then
    . $SYNC_HOME/etc/sync-envvars
fi

case "$1" in
on)
    echo "Enabling cron..."
    crontab -u $SYNC_USER $SYNC_HOME/etc/sync-on.cron
    crontab -l -u $SYNC_USER
    ;;
off)
    echo "Disabling cron..."
    crontab -u $SYNC_USER $SYNC_HOME/etc/sync-off.cron
    crontab -l -u $SYNC_USER
    ;;
*)
    echo "Usage: $0 {on|off}"
    ;;
esac

$SYNC_HOME/etc/sync-envvars

SYNC_USER=opennms
SYNC_HOME=~$SYNC_USER/failover
PRIMARY_NMS=nms-01
STANDBY_NMS=`hostname`
OPENNMS_ETC=/etc/opennms
OPENNMS_ETC_STANDBY=/etc/opennms-standby
OPENNMS_ETC_FAILOVER=/etc/opennms-failover
OPENNMS_RRD=/var/lib/opennms/rrd

OPENNMS_RRD_STANDBY=/var/lib/opennms/rrd-standby
OPENNMS_RRD_FAILOVER=/var/lib/opennms/rrd-failover
OPENNMS_BIN=/usr/share/opennms/bin
FAILOVER_DB=opennms_failover

$SYNC_HOME/etc/sync-on.cron

# Crontab entry for automatic synchronization of OpenNMS
MAILTO=root
SYNC_USER=opennms
0 0 * * * ~$SYNC_USER/failover/scripts/sync.sh

$SYNC_HOME/etc/sync-off.cron

# Crontab entry for automatic synchronization of OpenNMS (disabled)
MAILTO=root
SYNC_USER=opennms
#0 0 * * * ~$SYNC_USER/failover/scripts/sync.sh