Implementing Highly Available OpenView
Ken Herold, Senior Integration Consultant, Melillo Consulting
Outline
- Overview of High Availability (HA)
- Highly Available NNM
- Highly Available ITO
- Management of HA clusters
- Integrated products
- Performance products
Demand for HA Solutions
- Shift in focus of NSM solutions in the marketplace
  - NSM is now a core IT department function
  - IT departments are accountable to business units
  - Driven by Service Level Agreements (SLAs)
  - Impact of reporting
Highly Available NNM
- Collection station failover
  - Implemented in Distributed Internet Discovery & Monitoring (DIM)
  - Allows a Management Station (MS) to pick up status polling responsibility for a failed Collection Station (CS)
- Highly Available cluster
  - Allows NNM to run on a cluster of two or more servers that provide continuous availability
Distributed Architecture Overview
[Diagram: NNM Station A (Management Station) with NNM Stations B through n (Collection Stations). Discovery & status polling occur locally at each collection station; changes in status and topology are relayed from the collection stations (stations B through n) to management station A.]
Failover Configuration
- Failover filter
  - Allows specified objects to be polled by the MS during a CS failure
- Enabling CS failover:
    xnmtopoconf -failover {stationlist}
- Applying a failover filter (see the example below):
    xnmtopoconf -failoverfilter {filter} {stationlist}
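For example, a minimal sketch of enabling failover for a single collection station; the station name cs1 and the filter name CriticalNodes are hypothetical:

    # Enable failover for collection station cs1 (hypothetical hostname)
    xnmtopoconf -failover cs1

    # Limit failover polling to objects that pass the CriticalNodes
    # filter (hypothetical name, defined in NNM's filter file)
    xnmtopoconf -failoverfilter CriticalNodes cs1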
Limitations of DIM for HA
- Scaling
  - Limited by the number of devices polled and the polling interval
  - Practical size of the object database (primary & secondary objects)
  - Capacity of the network connection between MS & CS
- Event data may be lost, since SNMP traps are not forwarded when a CS fails
- Multiple CS failures may create problems for the MS
Highly Available ITO
- Multiple Managers
  - Manager of Managers
  - Peer Managers
  - Follow-the-Sun
- Hardware Backup
  - Cold Standby
  - Highly Available Cluster
Multiple Managers
- Manager of Managers (MoM)
  - A single ITO server manages multiple ITO servers
- Peer Managers
  - Multiple managers, each with a designated responsibility, providing redundancy to one another
- Follow-the-Sun
  - Management responsibility moves based on the time of day
Limitations of Multiple Managers
- Scalability
  - Number of managed nodes per ITO server
  - Number of messages received in the ITO browser
  - Number of operator logons
  - Speed of network connections to remote sites
- NNM configuration issues
- Agents must be told to report to the new ITO server (a timing issue)
Hardware Redundancy: Cold Backup
- Cost effective
  - One server backs up multiple ITO servers
  - Shared storage device not required
- Downtime may be unacceptable
  - Configuration of the failed server is loaded after failover
- Message issues
  - No synchronization of current message data
  - Latency detecting events while message buffers are cleared
- Need to implement configuration synchronization
Hardware Redundancy: Hot Standby
- ITO servers implemented in an HA cluster
  - Rapid, automated failover of a failed ITO server
  - Configuration and message data shared between nodes
- Upgrades & patches require more effort to install
- Costly solution
  - Requires a shared disk array
  - MC/ServiceGuard software required
- Can provide LAN failover
Basic HA Definitions
- Package: an application and its associated processes; can run on only one node in the cluster at a time
- Service: a process monitored by MC/ServiceGuard
- Original Node: the node where the package ran before failover
- Adoptive Node: the node that takes control of a package
Overview of a Cluster
[Diagram: two nodes bridged across LAN 1 and LAN 2, joined by a heartbeat LAN and shared data storage. Node 1 (IP 10.1.50.25) is the originating node running Package A; Node 2 (IP 10.1.50.26) is the adoptive node. The package answers on a virtual IP, 10.1.50.27.]
Steps to Implement HA
- Create a volume group on the shared device (see the sketch after this list)
- Create logical volumes in that group:
  - /etc/opt/ov/share
  - /var/opt/ov/share
  - /opt/ov/opc_sg
  - /u01/oradata/openview (DB files)
  - /u01/app/oracle/product (DB binaries)
- Create a fully qualified hostname & IP address for the ITO package
- Activate & mount the volume group on the primary node:
    vgchange -a y /dev/{volume_group}
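A minimal sketch of the volume setup with HP-UX LVM, assuming a hypothetical volume group vg_ito on disk c1t2d0 and showing only the first logical volume; the remaining four follow the same pattern:

    pvcreate /dev/rdsk/c1t2d0                   # initialize the shared disk
    mkdir /dev/vg_ito
    mknod /dev/vg_ito/group c 64 0x010000       # LVM group device file
    vgcreate /dev/vg_ito /dev/dsk/c1t2d0        # create the shared volume group
    lvcreate -L 512 -n lv_ov_share /dev/vg_ito  # 512 MB lvol (size is illustrative)
    newfs -F vxfs /dev/vg_ito/rlv_ov_share      # build a filesystem on it
    vgchange -a y /dev/vg_ito                   # activate the VG on the primary node
    mount /dev/vg_ito/lv_ov_share /etc/opt/ov/share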
Steps to Implement HA
- Install Oracle binaries on the primary node
- Install ITO binaries on the primary node
- Install the latest ITO/NNM patches on the primary node
- Configure the ITO database (opcconfig)
  - Select MC/SG installation
  - Use the fully qualified name for the package
  - Shared lvol is /opt/ov/opc_sg
  - Configure the DB automatically
  - Do not enable startup at boot time
- Configure startup of the ITO processes manually in the $OV_LRF directory (see below):
    ovaddobj ovoacomm.lrf
    ovaddobj opc.lrf
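A sketch of that manual startup registration, assuming the LRF files live under $OV_LRF as the slide indicates:

    cd $OV_LRF               # directory holding the local registration files
    ovaddobj ovoacomm.lrf    # register the ITO communication processes with ovspmd
    ovaddobj opc.lrf         # register the ITO manager processes
    ovstart                  # verify that everything comes up under ovspmd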
Steps to Implement HA
- Modify /etc/oratab to enable autostart
- Verify that ITO/NNM starts
- Modify the ov.conf file (a sample is sketched after this list)
  - Clean copy: /opt/ov/newconfig/ovnnm-run/conf/ov.conf
  - Create ov.conf.host1 & ov.conf.host2
  - Modify the HOSTNAME= field in each to match the local hostname
  - Modify NNM_INTERFACE= to match the floating IP
  - Modify USE_LOOPBACK= to ON
- Modify the NNM auth files:
  - ovw.auth
  - ovwdb.auth
  - ovspmd.auth
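A sketch of what ov.conf.host1 might contain, using the floating IP from the cluster diagram and a hypothetical hostname host1; only the three fields named above change, everything else in the clean copy stays as-is:

    HOSTNAME=host1               # this physical node's own hostname
    NNM_INTERFACE=10.1.50.27     # the package's floating (virtual) IP
    USE_LOOPBACK=ON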
Steps to Implement HA
- Verify that ITO/NNM starts
  - First copy ov.conf.host1 to ov.conf
- Install the bits on the 2nd node
  - Do not run opcconfig
  - Verify operation of NNM
- Modify the opcsvinfo file on both nodes to include the lines:
    OPC_SG TRUE
    OPC_SG_NNM TRUE
- Switch the shared volume to the 2nd node (see the sketch below)
  - Shut down OpenView & Oracle
  - Unmount all 5 volumes
  - Deactivate the shared volume group
  - Activate the VG & mount the lvols on the 2nd node
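A sketch of that volume switch, reusing the hypothetical vg_ito volume group from earlier and showing one of the five mounts:

    # On the primary node: quiesce and release the shared storage
    ovstop                       # stop the OpenView processes
    # ...shut down the Oracle instance here...
    umount /etc/opt/ov/share     # repeat for the other four lvols
    vgchange -a n /dev/vg_ito    # deactivate the shared VG

    # On the 2nd node: take over the storage
    vgchange -a y /dev/vg_ito    # activate the VG
    mount /dev/vg_ito/lv_ov_share /etc/opt/ov/share   # repeat for the rest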
Steps to Implement HA
- Copy the following to the 2nd node:
  - /etc/oratab
  - /etc/opt/ov/share/conf/ovdbconf
- Create a new server registration file:
    cd /etc/opt/ov/share/conf/opc/mgmt_sv
    mv svreg svreg.old
    touch svreg
    /opt/ov/bin/opc/install/opcsvreg -add itosvr.reg
    rm svreg.old
- Remove the ovserver file:
    rm /var/opt/ov/share/databases/openview/ovwdb/ovserver
Steps to Implement HA
- Run opcconfig
  - Do not configure the DB automatically
- Create the ITO package (a sketch follows)
  - Modify the monitor scripts to watch the necessary OpenView processes
  - Configure the package control scripts
- Verify operation of ITO on both cluster nodes independently
- Test failover by killing a monitored process
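A sketch of the package creation workflow with standard MC/ServiceGuard tooling; the file names are hypothetical but match the configuration files described on the next slide:

    cmmakepkg -p ito.conf     # generate a package configuration template
    cmmakepkg -s ito.ctl      # generate a package control script template
    # Edit both: package name ITO, the floating IP, the shared VG,
    # and a service entry that runs the ito.mon monitor script.
    cmcheckconf -P ito.conf   # validate the package configuration
    cmapplyconf -P ito.conf   # distribute it to the cluster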
Configuration Files
- ito.ctl
  - Package control file; contains the steps necessary to activate/deactivate ITO in a failover
- ito.mon
  - Describes the processes that the package monitors (a sketch follows)
- ito_create_new_svreg
  - Creates a new server registration file on failover; invoked by ito.ctl
- ito_start_sgtrapi & ito_stop_sgtrapi
  - Start & stop the trap interceptor on the ITO server
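A minimal sketch of the kind of loop ito.mon runs, assuming ovspmd and opcctlm are among the monitored processes (the actual list is site-specific):

    # Exit when a required process disappears; MC/SG treats the
    # service's exit as a failure and fails the package over.
    while true
    do
        for proc in ovspmd opcctlm
        do
            ps -e | grep "$proc" > /dev/null || exit 1
        done
        sleep 30
    done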
Common Problems
- Failure to modify the opcsvinfo file:
    OPC_SG_NNM_STARTUP TRUE
    OPC_SG TRUE
- Failure to remove the ovserver file
  - If in doubt, remove it & let it be re-created
- Failure to modify the NNM auth files
- Failure to modify the ov.conf file
  - HOSTNAME and NNM_INTERFACE fields
Keep in Mind
- Patches must be applied to each node
- Key commands (usage sketched below):
    cmhaltpkg ITO
    cmrunpkg -n {node} -v ITO
    cmviewcl
    cmmodpkg -e ITO
    cmrunnode {node}
- An ovstop will cause a switchover
- Processes to monitor for failover
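For example, a manual switchover of the ITO package, assuming hypothetical node names node1 and node2:

    cmhaltpkg ITO               # halt the package on node1
    cmrunpkg -n node2 -v ITO    # start it on the adoptive node
    cmmodpkg -e ITO             # re-enable package switching after the manual move
    cmviewcl                    # confirm package and node status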
Managing HA Clusters
- Managing the management cluster
  - Trap template
  - Log files
  - Process monitors
- Managing HA packages
  - Process monitors
  - Log file issues
Managing HA Clusters
- Shared log files
  - Monitored from the active node
  - Copy contents to log.node on failover; start with a clean slate (see the sketch after this list)
  - Use "close after read" to avoid corruption
  - Use "read from last file position"
  - Do not use "message on no logfile"
- Monitors
  - Intelligent monitors that read the output of cmviewcl to determine state
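A sketch of the clean-slate rotation the package control script can perform at startup; the log path is hypothetical:

    LOG=/var/opt/ov/share/log/app.log     # hypothetical shared log file
    if [ -f "$LOG" ]
    then
        mv "$LOG" "$LOG.`hostname`"       # preserve the old contents as log.node
        touch "$LOG"                      # start the adoptive node clean
    fi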
Managing HA Clusters
- Trap interceptor (see the sketch below)
  - Assign it to the virtual node
  - Started on the active node with ito_start_sgtrapi
  - Stopped on the inactive node with ito_stop_sgtrapi
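A sketch of where those hooks could fit inside ito.ctl, assuming the customer-defined command functions of a legacy MC/ServiceGuard control script:

    # In ito.ctl (control script):
    function customer_defined_run_cmds
    {
        ito_start_sgtrapi     # start the trap interceptor on the node taking the package
    }

    function customer_defined_halt_cmds
    {
        ito_stop_sgtrapi      # stop it on the node releasing the package
    }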
Notification & Trouble Ticketing in HA Clusters
- Connect to the trouble-ticketing system using the virtual hostname
  - Databases will be synched
- Intelligence needed to move SPI monitoring
  - Include startup/shutdown of the SPI in the package control scripts
- Hardware-based notification requires a connection to each server
- Software-based notification can use the virtual hostname to connect
PerfView & MeasureWare in HA Environments
- Management server
  - PerfView binaries loaded on every cluster node
  - Use a shared data repository for MeasureWare (MW) data collection
  - Each node connects to the MW agent as necessary
- Managed node
  - The MW agent runs on each cluster node to collect performance data
  - Application data is collected on the cluster node that currently runs the package
Thank You for Attending
Ken Herold
Melillo Consulting, Inc.