Monitoring an HP platform based solution, with SNMPv2/v3 alarm forwarding and synchronization with an existing NMS

Roberto Pulvirenti (roberto.pulvirenti@gfmnet.it)
Powered by Antonio Russo
March 15th, 2013
Background

A vendor/system integrator provided a Customer Experience Management (CEM) solution to one of the biggest mobile operators. The solution loads, aggregates and transforms the data from the OpCos into a central repository, where the OpCos can schedule ad-hoc reports.

The solution included an Element Management System offering fault and performance management functions based on OpenNMS, which was configured and customized in order to:

- monitor, in adequate depth, Oracle processes, HP Brocade SAN switches, Cisco switches, HP Blade servers, HP EVA 4400 storage, OS resources (disks, memory, ...) and other relevant processes and services;
- be installed and deployed in high availability, exploiting the clustering functionality offered by the operating system (RHEL Cluster);
- act as the single point of integration for the northbound Umbrella Management System (Netcool);
- forward to Netcool only the alarms that are evaluated as relevant for VF, according to configurable filter criteria;
- forward traps via SNMPv3 or SNMPv2, according to a MIB wrapping the alarm format managed by OpenNMS;
- feature a trap-based heartbeat that periodically notifies Netcool that OpenNMS is alive;
- implement a trap-based synchronization function that allows Netcool to synchronize its alarms on demand.

www.gfmintegration.com
Hardware Architecture

[Diagram labels: SAI CORE (Reporting Server, BOE Server, ETL Server, BODS Server), shown twice; OpenNMS/FTM, shown twice; Oracle Real Application Cluster; LAN switches (SNMP); SAN switches; SAN disk]

- 6 x Blade: 507864-B21 HP BL460c G6
- 2 x Cisco Catalyst 3020 Blade Switch for HP c-class BladeSystem
- 1 x StorageWorks: HP EVA4400
- 2 x HP Brocade 8/24c SAN Switch for HP c-class BladeSystem
- 1 x HP OnBoard Admin Console
High level logical architecture

[Diagram: OpenNMS, in the local NOC in Spain, receives traps from and polls the monitored components, and forwards traps to IBM Netcool/ITNM in the global NOC in Germany]

Monitored components:

- SAI Application incl. Oracle (e.g. ETL jobs, reporting, B&R)
- File Transfer Application (e.g. retry limit exceeded, user unknown)
- VGE Application (e.g. filter failed)
- Infrastructure Components (e.g. HP servers, storage, Cisco, tapes)
Deployment

- 2 Blade servers in active-standby high availability, exploiting the RHEL 5.3 cluster
- OCFS2 as general-purpose shared-disk cluster file system
- Clustered resources:
  o opennms-rg (including PostgreSQL and OpenNMS) with its own VIP
  o ftm-rg (a file transfer/parsing application required by the project) with its own VIP
- OpenNMS version 1.8.10

    [root@nms opennms]# ll $OPENNMS_HOME
    total 36
    drwxr-xr-x 2 root root  4096 May 17 10:23 bin
    drwxr-xr-x 8 root root  4096 May 23 13:44 contrib
    lrwxrwxrwx 1 root root    16 Mar 24 16:28 etc -> /app/opennms/etc
    drwxrwxr-x 7 root root  4096 May 18 15:05 etc.orig
    drwxr-xr-x 5 root root  4096 Mar 22 19:45 jetty-webapps
    drwxr-xr-x 8 root root 20480 May 17 00:50 lib
    lrwxrwxrwx 1 root root    22 Mar 24 16:30 logs -> /app/opennms/data/logs
    lrwxrwxrwx 1 root root    23 Mar 24 16:30 share -> /app/opennms/data/share

    Filesystem          Size    Mounted on           Notes
    /dev/mapper/fc5p1   12 GB   /app/opennms/etc     OpenNMS configuration files
    /dev/mapper/fc5p2   51 GB   /app/opennms/data    OpenNMS logs and rrd files
    /dev/mapper/fc5p3   60 GB   /app/opennms/pgsql   PostgreSQL database
Delivered packages and main installation steps

NSN provides the following packages for installing OpenNMS for the CEM solution:

- Opennms_1.8.10.tar.gz. Contains the packages for installing OpenNMS (release 1.8.10) without any customized upgrades.
- FWSYNC_1.8.10_2.0.5.tar.gz. Contains the jar files that upgrade OpenNMS to cover the customer's requirements.
- config_template_1.0.0.tar.gz. Provides some configuration files proposed as templates for the customized solution.
- VFGUI.tar.gz. Provides a few files that update the OpenNMS web GUI with a logo and colours that better recall the customer's style.

Main installation steps:

- Set up a yum repository from the RHEL ISO CD-ROM image on the OpenNMS nodes, to ease the installation of OS packages.
- Install the net-snmp package on all CEM servers.
- Set up the local yum repository for OpenNMS on the OpenNMS nodes.
- Install the HP SNMP agents on all CEM servers.
- Install the SUN JDK package on the OpenNMS nodes.
- Install PostgreSQL 9.0 on the OpenNMS nodes.
- Install OpenNMS on the OpenNMS nodes.
- Install the NSN customization for forwarding and synchronizing alarms with Netcool on the OpenNMS nodes.
- Deploy the configuration files proposed as templates, to speed up the required configuration and the following provisioning process. This is applied on the shared storage, so it needs to be executed from one node only.
- Add the Vodafone logo to the GUI on the OpenNMS nodes.
- Install and configure the cluster.
OpenNMS features exploited for monitoring CEM solution

- Eventd: all hardware and application MIBs were analyzed, and a detailed Excel file was written to describe the relevant traps to be raised as alarms, deduplicated or cleared. Event XML files were added or heavily changed to reflect the Excel file:
  o CPQHPIM.events.xml (CPQHLTH-MIB, CPQRACK-MIB, CPQRPM-MIB, CPQHOST.MIB, CPQSTSYS.MIB, CPQSINFO.MIB, CPQSTDEQ.MIB, CPQCMC.MIB, CPQSM2.MIB, CPQNIC.MIB, CPQIDA.MIB, CPQFCA.MIB, CPQIODRV.MIB, EVA4400_ABM.MIB)
  o Brocade.fcmgmt.events.xml
  o Cisco.events.xml / Cisco2.events.xml
  o FTM.events.xml (File Transfer Manager application)
  o SAI.events.xml (Serve atOnce Intelligence application)
- Capsd (capsd-configuration.xml)
- Pollerd (poller-configuration.xml)
- Collectd (collectd-configuration.xml, snmp-config.xml, datacollection-config.xml)
- Threshd (threshd-configuration.xml, thresholds.xml, programmatic.events.xml)
- Event Translator (translator-configuration.xml, Service.translator.events.xml)
- Provisiond
What is monitored exactly?

Listed by group name in OpenNMS, then node label, then description.

OpenNMSFTM
  specemusnm01p, specemusnm02p (each):
    - OS services (NTPd, SSH, rgmanager, cman, HP SNMP agents...)
    - HW/OS system traps of this server
    - Threshold events for RAM, CPU, disk and eth port utilization

VIPs
  specemusftp00p:
    - Monitoring FTM services availability (services associated to ftm-rg, like LDAP, GUI service). No traps
  specemusmgmtnm00p:
    - Monitoring the availability of the OpenNMS VIP. No traps

SAI
  specemusbo01p, specemusbo02p (each):
    - OS services (NTPd, SSH, HP SNMP agents...)
    - HW/OS system traps of this server
    - Threshold events for RAM, CPU, disk and eth port utilization

VIPs
  specemusrep00p:
    - Monitoring some CEM services (SAI admin) in high availability
  specemusadm00p:
    - Monitoring other CEM services (SAI reporting) in high availability

Oracle
  specemusdb01p, specemusdb02p (each):
    - OS services (NTP, SFTP...), Oracle services
    - HW/OS system traps of this server
    - Threshold events for RAM, CPU, disk and eth port utilization

FcSwitches
  specemusfcsw01p, specemusfcsw02p:
    - Monitoring Brocade SAN switches (SNMP, ICMP)
    - System traps of this device
    - Threshold events for RAM, CPU, disk and eth port utilization

CiscoSwitches
  specemussw302001p, specemussw302002p:
    - Monitoring Cisco switches (SNMP, ICMP)
    - Threshold events for RAM, CPU, disk and eth port utilization

EvaStorage
  specemussanadm01p:
    - Monitoring EVA storage (SNMP, ICMP)
    - Threshold events for RAM, CPU, disk and eth port utilization

Console
  specemusobadm01p:
    - Monitoring Admin console (SNMP, ICMP)
SNMP v2/v3 alarm forwarding & synchronization: Netcool integration

[Diagram: traps exchanged between OpenNMS and Netcool]

- OpenNMS to Netcool: alarmtrap (normal forwarding)
- OpenNMS to Netcool: heartbeat trap
- Netcool to OpenNMS: syncrequesttrap
- OpenNMS to Netcool: startsynctrap, alarmtrap (synchronization), endsynctrap
Issues analyzed and addressed in OpenNMS 1.8.10 (as per customer reqs)

- Issue: alarms must be forwarded and synchronized according to filterable event/alarm criteria.
  o Additional opennms.scriptd helper classes developed (opennms-services-1.8.10.jar)
- Issue: forwarded traps should support SNMP v2 and v3, but until now traps were forwarded as SNMP v1.
  o Extension jars developed to support SNMP v2c informs and SNMP v3 traps: org.opennms.lib.snmp.api-2.0.5.jar, org.opennms.lib.snmp.joesnmp-2.0.5.jar, org.opennms.lib.snmp.snmp4j-2.0.5.jar
- Issue: Netcool must be given evidence of the deduplication and of the alarms raised/cleared automatically in OpenNMS, but OpenNMS just forwards the events, without the reduction key.
  o New traps to be defined
- Issue: implement the logic for the integration with Netcool according to the requirements, including a heartbeat trap from the active OpenNMS instance to Netcool.
  o New OpenNMS MIB and events definition (opennms.mib, opennmsmib.events.xml)
  o Configure the BeanShell script (scriptd-configuration.xml)
  o Check the OpenNMS status via crontab (opennms_status.sh)
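To illustrate what the heartbeat trap is for, here is a minimal sketch of how a receiving manager could decide that the sender has gone silent. It is not part of the delivered solution: the class name, the 60-second interval and the tolerance of three missed heartbeats are all assumptions made for the example.

```java
// Hypothetical receiver-side watchdog for a periodic heartbeat trap.
final class HeartbeatWatchdog {
    private final long intervalMillis; // expected heartbeat period (assumed value)
    private final int missedAllowed;   // tolerated number of missed periods (assumed value)
    private long lastSeenMillis;       // arrival time of the latest heartbeat

    HeartbeatWatchdog(long intervalMillis, int missedAllowed, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.missedAllowed = missedAllowed;
        this.lastSeenMillis = nowMillis;
    }

    // Call whenever a heartbeat trap arrives.
    void onHeartbeat(long nowMillis) {
        lastSeenMillis = nowMillis;
    }

    // True while the silence since the last heartbeat is within the tolerance.
    boolean isAlive(long nowMillis) {
        return nowMillis - lastSeenMillis <= intervalMillis * missedAllowed;
    }

    public static void main(String[] args) {
        HeartbeatWatchdog watchdog = new HeartbeatWatchdog(60_000L, 3, 0L);
        watchdog.onHeartbeat(60_000L);
        System.out.println(watchdog.isAlive(200_000L)); // 140 s of silence: true
        System.out.println(watchdog.isAlive(300_000L)); // 240 s of silence: false
    }
}
```

Timestamps are passed in explicitly so that the logic can be exercised without waiting for real time to pass.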
OpenNMS scriptd helper classes used in scriptd-configuration.xml

- EventMatch. Interface used to specify the criteria for matching events.
- EventPolicyRule. Its implementation classes decide whether an event should be forwarded or not, through the following three methods:
  o addDropRule(EventMatch eventMatch)
  o addForwardRule(EventMatch eventMatch)
  o filter(org.opennms.netmgt.xml.event.Event event)
- EventSynchronization. Its implementation class performs the synchronization by sending all the active alarms defined in OpenNMS.
- SnmpTrapHelper. This "helper" class provides a convenience interface for generating and forwarding SNMP traps.
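The drop/forward idea behind EventPolicyRule can be sketched in plain Java. This is an illustration only, not the OpenNMS implementation: the class names are hypothetical, and two behaviours are assumptions of the sketch, namely that the first matching rule decides and that an event matching no rule is dropped. The rule order mirrors the one used in scriptd-configuration.xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Ordered drop/forward rules; the first rule that matches decides (assumption).
final class PolicySketch {
    // Hypothetical stand-in for an OpenNMS event: only what the rules look at.
    static final class SketchEvent {
        final String uei;
        final boolean hasAlarmData;

        SketchEvent(String uei, boolean hasAlarmData) {
            this.uei = uei;
            this.hasAlarmData = hasAlarmData;
        }
    }

    private final List<Predicate<SketchEvent>> matches = new ArrayList<>();
    private final List<Boolean> forward = new ArrayList<>();

    void addDropRule(Predicate<SketchEvent> match) { matches.add(match); forward.add(false); }
    void addForwardRule(Predicate<SketchEvent> match) { matches.add(match); forward.add(true); }

    // Returns the event when it should be forwarded, null when it is filtered out.
    SketchEvent filter(SketchEvent event) {
        for (int i = 0; i < matches.size(); i++) {
            if (matches.get(i).test(event)) {
                return forward.get(i) ? event : null;
            }
        }
        return null; // assumption: drop when no rule matches
    }

    // Matches any event whose UEI fits the regular expression.
    static Predicate<SketchEvent> ueiEvent(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return e -> pattern.matcher(e.uei).find();
    }

    // Same, but only for events that carry alarm data.
    static Predicate<SketchEvent> ueiAlarm(String regex) {
        Predicate<SketchEvent> byUei = ueiEvent(regex);
        return e -> e.hasAlarmData && byUei.test(e);
    }

    // Drop internal events, forward alarm-carrying events, drop everything else.
    static PolicySketch example() {
        PolicySketch policy = new PolicySketch();
        policy.addDropRule(ueiEvent("^uei.opennms.org/internal/.*$"));
        policy.addForwardRule(ueiAlarm("^uei.opennms.org/.*$"));
        policy.addDropRule(ueiEvent("^uei.opennms.org/.*$"));
        return policy;
    }

    public static void main(String[] args) {
        PolicySketch policy = example();
        SketchEvent alarm = new SketchEvent("uei.opennms.org/nodes/nodeDown", true);
        SketchEvent plain = new SketchEvent("uei.opennms.org/nodes/nodeDown", false);
        System.out.println(policy.filter(alarm) != null); // forwarded
        System.out.println(policy.filter(plain) != null); // filtered out
    }
}
```

With first-match ordering, the final catch-all drop rule only affects events that neither of the earlier rules claimed, which is why it can safely come last.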
OpenNMS MIB

The OpenNMS MIB version 1.3 could only send OpenNMS events as SNMP v1 traps. It has now been upgraded (and productized) to fully support SNMP v2c and to define the following traps, which support SNMP-based alarm synchronization:

- alarmtrap (OID .1.3.6.1.4.1.5813.1, generic 6, specific 3): the definition of the generic OpenNMS trap, with the addition of alarm information. Two new varbinds have been added: alarmid, an alarm identifier used for alarm reduction and correlation, and synchronization, which specifies whether the trap comes from a sync request.
- heartbeattrap (OID .1.3.6.1.4.1.5813.1, generic 6, specific 4): trap sent periodically by OpenNMS to tell the external SNMP manager that it is alive.
- startsynctrap (OID .1.3.6.1.4.1.5813.1, generic 6, specific 5): trap sent by the OpenNMS station when the synchronization process has started.
- endsynctrap (OID .1.3.6.1.4.1.5813.1, generic 6, specific 6): trap sent by the OpenNMS station when the synchronization process has ended successfully.
- syncrequesttrap (OID .1.3.6.1.4.1.5813.1, generic 6, specific 7): trap sent to OpenNMS to start a synchronization. It was also added to opennmsmib.events.xml as <uei>uei.opennms.org/traps/syncrequesttrap</uei>
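The start and end traps bracket a list of the currently active alarms, so a receiver can tell which of its own alarms are no longer active on the sender. The sketch below shows one way a receiver might use that bracket, keyed by the alarmid varbind. It is a hypothetical illustration: the class and method names are invented for the example and nothing here describes how Netcool actually handles these traps.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical receiver-side bookkeeping for one synchronization session.
final class SyncSessionSketch {
    private final Set<String> active = new TreeSet<>(); // alarm ids held as active
    private Set<String> seenInSession = null;           // non-null only between start and end

    // An alarm trap arrived, either from normal forwarding or from a sync session.
    void onAlarmTrap(String alarmId, boolean cleared) {
        if (cleared) {
            active.remove(alarmId);
            return;
        }
        active.add(alarmId);
        if (seenInSession != null) {
            seenInSession.add(alarmId); // raised or confirmed while a session is open
        }
    }

    // startsynctrap: begin recording which alarms the sender still reports.
    void onStartSync() {
        seenInSession = new HashSet<>();
    }

    // endsynctrap: alarms not reported during the session are stale; remove and return them.
    Set<String> onEndSync() {
        Set<String> stale = new TreeSet<>();
        if (seenInSession != null) {
            stale.addAll(active);
            stale.removeAll(seenInSession);
            active.removeAll(stale);
        }
        seenInSession = null;
        return stale;
    }

    Set<String> activeAlarms() {
        return new TreeSet<>(active);
    }

    public static void main(String[] args) {
        SyncSessionSketch receiver = new SyncSessionSketch();
        receiver.onAlarmTrap("A", false);
        receiver.onAlarmTrap("B", false); // later cleared on the sender without any trap
        receiver.onStartSync();
        receiver.onAlarmTrap("A", false); // the sender reports only A as still active
        System.out.println(receiver.onEndSync());    // [B]
        System.out.println(receiver.activeAlarms()); // [A]
    }
}
```

Alarms raised through normal forwarding while a session is open are also recorded, so they are not mistaken for stale ones when the end trap arrives.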
From Eventd to Scriptd
scriptd-configuration.xml (1/3)

    <?xml version="1.0"?>
    <scriptd-configuration>
      <engine language="beanshell" className="bsh.util.BeanShellBSFEngine" extensions="bsh"/>
      <start-script language="beanshell">
        import org.opennms.netmgt.scriptd.helper.UeiEventMatch;
        import org.opennms.netmgt.scriptd.helper.UeiAlarmMatch;
        import org.opennms.netmgt.scriptd.helper.EventPolicyRuleDefaultImpl;
        import org.opennms.netmgt.scriptd.helper.AlarmEventSynchronization;
        import org.opennms.netmgt.scriptd.helper.DbHelper;
        import org.opennms.netmgt.scriptd.helper.SnmpTrapHelper;
        import org.opennms.netmgt.snmp.SnmpTrapBuilder;
        import org.opennms.netmgt.xml.event.Event;

        log = bsf.lookupBean("log");
        snmptraphelper = new SnmpTrapHelper();

        <!--matching criteria: internal events, all events, all events carrying alarm data-->
        internaleventmatch = new UeiEventMatch("~^uei.opennms.org/internal/.*$");
        alleventmatch = new UeiEventMatch("~^uei.opennms.org/.*$");
        allalarmmatch = new UeiAlarmMatch("~^uei.opennms.org/.*$");

        <!--policy: drop internal events, forward alarms, drop every other event-->
        policy = new EventPolicyRuleDefaultImpl();
        policy.addDropRule(internaleventmatch);
        policy.addForwardRule(allalarmmatch);
        policy.addDropRule(alleventmatch);

        sync = new AlarmEventSynchronization();
scriptd-configuration.xml (2/3)

        void forward(Event event, boolean sync) {
          try {
            long traptimestamp = 0;
            SnmpTrapBuilder trap = snmptraphelper.createV2Trap(".1.3.6.1.4.1.5813.1.3", Long.toString(traptimestamp));
            if (event.alarmData != null) {
              if (event.alarmData.alarmType == 2) {
                severity = "Cleared";
                alarmid = event.alarmData.clearKey;
              } else {
                severity = null;
                alarmid = event.alarmData.reductionKey;
              }
            }
            t_dbid = new Integer(event.dbid).toString();
            if (t_dbid != null)
              snmptraphelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.20.1.1.0", "OctetString", "text", t_dbid);
            else
              snmptraphelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.20.1.1.0", "OctetString", "text", "null");
            if (event.distPoller != null)
              snmptraphelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.20.1.2.0", "OctetString", "text", event.distPoller);
            else
              snmptraphelper.addVarBinding(trap, ".1.3.6.1.4.1.5813.20.1.2.0", "OctetString", "text", "null");
            <!--add other varbinds of the trap-->
            trap.send("xx.xxx.xxx.xxx", 162, "public");
          } catch (e) { }
        }
      </start-script>

Note: what gets forwarded is still an event, but only an event carrying alarm data. The reduction key or the clear key is simply read from the alarmData attribute of the Event object.
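The choice between clear key and reduction key is what lets the receiver pair a clearing trap with the alarm it clears. A minimal sketch of that selection follows; the key strings are made up for the example, and the value 2 for a clearing alarm type is taken from the script above.

```java
// Illustrative only: picks the alarm identifier the way the forward() script does.
final class AlarmIdSketch {
    static final int CLEARING_ALARM_TYPE = 2; // value tested in the script

    // A clearing event is identified by its clear key, any other event by its reduction key.
    static String alarmId(int alarmType, String reductionKey, String clearKey) {
        return alarmType == CLEARING_ALARM_TYPE ? clearKey : reductionKey;
    }

    public static void main(String[] args) {
        // Hypothetical keys: the clearing event's clear key equals the raise's reduction key.
        String raised = alarmId(1, "nodeDown::42", null);
        String cleared = alarmId(2, "nodeUp::42", "nodeDown::42");
        System.out.println(raised.equals(cleared)); // true: both traps carry the same alarmid
    }
}
```

Because both traps end up with the same identifier, a receiver can correlate the clear with the original alarm without knowing anything about OpenNMS event definitions.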
scriptd-configuration.xml (3/3)

      <stop-script language="beanshell">
        snmptraphelper.stop(); <!--executing a stop script-->
      </stop-script>

      <event-script language="beanshell">
        event = bsf.lookupBean("event");
        event = policy.filter(event);
        if (event == null) {
          log.debug("event is filtered: not forwarding");
        } else {
          forward(event, false); <!--forwarding event-->
        }
      </event-script>

      <event-script language="beanshell">
        <uei name="uei.opennms.org/traps/syncrequesttrap" />
        long traptimestamp = 0;
        <!--sending start sync trap-->
        SnmpTrapBuilder trap1 = snmptraphelper.createV2Trap(".1.3.6.1.4.1.5813.1.5", Long.toString(traptimestamp));
        trap1.send("xx.xxx.xxx.xxx", 162, "public");
        for (e : sync.events) { <!--for each synchronized event (current active alarms)-->
          e = policy.filter(e);
          if (e == null) {
            log.debug("sync event is filtered: not forwarding");
          } else {
            forward(e, true); <!--forwarding active alarm during synchronization session-->
          }
        }
        <!--sending end sync trap-->
        SnmpTrapBuilder trap2 = snmptraphelper.createV2Trap(".1.3.6.1.4.1.5813.1.6", Long.toString(traptimestamp));
        trap2.send("xx.xxx.xxx.xxx", 162, "public");
      </event-script>
    </scriptd-configuration>
Forwarding SNMP v3 Alarm Traps

    import org.opennms.netmgt.snmp.SnmpV3TrapBuilder;

    void forward(Event event, boolean sync) {
      try {
        long traptimestamp = 0;
        SnmpV3TrapBuilder trap = snmptraphelper.createV3Trap(".1.3.6.1.4.1.5813.1.3", Long.toString(traptimestamp));
        snmptraphelper.addVarBinding(trap, ..)
        <!-- the arguments are: IP, port, authPriv (SNMPv3 security level), username, authentication passphrase, authentication protocol, privacy passphrase, privacy encryption protocol -->
        trap.send("xx.xxx.xxx.x", 162, 2, "traptest", "mypassword", "SHA", "mypassword2", "AES");
      } catch (e) { }
    }
Final considerations

- Alarms cleared manually on the OpenNMS alarm view page cannot be forwarded automatically, because scriptd receives events, not alarms.
- However, synchronization is requested every day, and after that the issue is healed on Netcool, where the customer implemented logic to reconcile alarms automatically after each synchronization.
- I know alarms could be forwarded via the REST API, but the customer didn't want to implement this simple client.
- The customized OpenNMS solution is currently working in production without any known issues (although using SNMP v2).
- During the acceptance testing phase we didn't get any relevant fault.