Tivoli Netcool Performance Manager 1.3.1 Wireline Component Document Revision R2E1 IBM SNMP Inventory Management and Troubleshooting Guide
Contents

Preface
    Audience
    Organization
    The Tivoli Netcool Performance Manager Product Suite
    Tivoli Netcool Performance Manager Documentation

Chapter 1: Introduction
    Overview
    Discovery
    Metrics and Properties
    Inventory Synchronization and Change Management
    Change Management for Elements
    Change Management for Sub-Elements
    Grouping Sub-Elements
    Where to Go From Here

Chapter 2: SNMP Inventory Troubleshooting
    Overview
    Discovery Troubleshooting
    Discovery Does Not Start
    Discovery Starts But Issues Warning Messages
    Discovery Seems to Hang or Never Finishes
    Synchronization Troubleshooting
    Synchronization (Elements)
    Synchronization (Sub-elements)
    Grouping Troubleshooting
    Monitoring the Tivoli Netcool Performance Manager Log File
    Burned Subelements
    Scenario 1 - Instance Shift Causes Disconnect
    Scenario 2 - Instance Shift Causes Burn
    Where to Go From Here

Chapter 3: SNMP Inventory Management
    Overview
    Routine SNMP Inventory Management Tasks
    Finding Elements and Sub-elements About to Reach Their Retry Limit
    Finding Elements and Sub-elements That Have Been Retired
    Where to Go From Here

Netcool/Proviso SNMP Inventory Management and Troubleshooting Guide, Version 4.4.1
Preface

The purpose of this guide is to help you manage and troubleshoot problems with the Tivoli Netcool Performance Manager SNMP Inventory.

Audience

The audience for this guide is the Tivoli Netcool Performance Manager administrator.

Organization

This guide is organized as follows:

- Preface: General information about the audience, guide, conventions, documentation, and technical support.
- Chapter 1, Introduction: Provides an overview of the Tivoli Netcool Performance Manager SNMP Inventory process.
- Chapter 2, SNMP Inventory Troubleshooting: Provides a list of troubleshooting tasks that the Tivoli Netcool Performance Manager administrator might be called upon to perform during an SNMP Inventory.
- Chapter 3, SNMP Inventory Management: Provides a list of recurring management tasks that the Tivoli Netcool Performance Manager administrator should perform on a regular basis.

The Tivoli Netcool Performance Manager Product Suite

Tivoli Netcool Performance Manager is made up of the following components:

- Tivoli Netcool Performance Manager DataMart is a set of management, configuration, and troubleshooting GUIs that the Tivoli Netcool Performance Manager system administrator uses to define policies and configuration, and to verify and troubleshoot operations.
- Tivoli Netcool Performance Manager DataLoad provides flexible, distributed data collection and data import of SNMP and non-SNMP data to a centralized database.
- Tivoli Netcool Performance Manager DataChannel aggregates the data collected through Tivoli Netcool Performance Manager DataLoad for use by the Tivoli Netcool Performance Manager DataView reporting functions. It also processes online calculations and detects real-time threshold violations.
- Tivoli Netcool Performance Manager DataView is a reliable application server for on-demand, web-based network reports.
- Tivoli Netcool Performance Manager Technology Packs extend the Tivoli Netcool Performance Manager system with service-ready reports for network operations, business development, and customer viewing.

The following figure shows the different Tivoli Netcool Performance Manager modules.

Tivoli Netcool Performance Manager SNMP Inventory Management and Troubleshooting Guide, Version 1.3.1
Figure 1: Tivoli Netcool Performance Manager Modules

- DataLoad collects network data.
- DataChannel computes aggregations and stores data in DataMart.
- DataMart provides data management and applications.
- DataView produces and manages reports.

Tivoli Netcool Performance Manager Documentation

Tivoli Netcool Performance Manager documentation consists of the following:

- release notes
- configuration recommendations
- user guides
- technical notes
- online help

The documentation is available for viewing and downloading on the infocenter at:

http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_pm.doc/welcome_tnpm.html
Chapter 1: Introduction

This chapter provides an introduction to the Tivoli Netcool Performance Manager SNMP Inventory, and is made up of the following topics:

- Overview
- Discovery
- Inventory Synchronization and Change Management
- Grouping Sub-Elements
- Where to Go From Here

Overview

Tivoli Netcool Performance Manager allows the operator to decide how much Tivoli Netcool Performance Manager DataMart will rely upon the OSS Inventory system. The Inventory system can be virtually anything, from a full-featured commercial Inventory package, to an EMS or Node Manager like HP OpenView, to a flat file like /etc/hosts. The minimum required is a list of the IP addresses of the resources to monitor.

Tivoli Netcool Performance Manager can discover both elements (resources that have an IP address, such as a router or a switch) and the sub-elements associated with or contained in them, such as an interface or a port.

Tivoli Netcool Performance Manager supports the following three modes of element and sub-element discovery:

    Mode   Inventory Contains        Tivoli Netcool Performance Manager Discovers
    1      Nothing                   Elements, sub-elements
    2      Elements                  Sub-elements
    3      Elements, sub-elements    Nothing

Most Tivoli Netcool Performance Manager deployments use mode 2. In this mode, Tivoli Netcool Performance Manager imports a list of elements and then walks through the MIB to discover the sub-elements. In mode 1, Tivoli Netcool Performance Manager sweeps the network to discover the elements and their associated sub-elements.
Discovery

Tivoli Netcool Performance Manager's Discovery capabilities include powerful and flexible tools that allow you to determine exactly what Tivoli Netcool Performance Manager will monitor, and how the sub-elements will be labeled and grouped. These capabilities make it possible to automatically initiate data collection, threshold monitoring, and reporting on discovered elements.

Using a formula language, Tivoli Netcool Performance Manager can be configured to walk through an element's MIBs to discover particular MIBs representing users, tunnels, protocols, service classes, or other sub-elements. Particular OIDs can be used to automatically create a label for the sub-element. For example, the sub-element label could be a combination of the element name, the interface, the port, and the customer name, all taken from the MIB.

Metrics and Properties

In addition to the identifier of the sub-element and the metrics collected for it, Tivoli Netcool Performance Manager allows the operator to create any number of user-defined properties. There are two main differences between metrics and properties:

- Metrics come from a monitored resource and are used to calculate statistics that are the basis of performance reports and alarm thresholds. Metrics are generally numeric values that change frequently, like the number of packets transmitted or a resource's availability.
- Properties, by contrast, are values that change less frequently, such as the CIR (committed information rate) or the location of the element. Properties consist of metadata-like identifiers or labels for such things as the customer or the services using a particular sub-element.

The values for properties can be discovered automatically from the monitored resource, or they can be imported from Inventory, provisioning, or another OSS component.

Inventory Synchronization and Change Management

Sub-element properties such as the CIR or customer name can change.
Tivoli Netcool Performance Manager tracks the change and the time of the change, so that reports are displayed correctly. For example, utilization may be calculated against CIR. After the CIR is updated, reports must use the new value for utilization calculations, but reports that cover dates prior to the CIR update must use the old CIR value. Tivoli Netcool Performance Manager applies the value that was in effect for each reporting period automatically.

If a sub-element is assigned to a new customer, the customer property will change. If the sub-element is in a particular customer's group, this can cause the sub-element to move to a new group. This can automatically change the collection, alarm thresholds, and reporting for that sub-element.

Change Management for Elements

Unfortunately, Inventory is not as simple as sweeping a range of IP addresses to identify the network elements. That is just the beginning of the process. The Inventory must track changes so that continuity of the metadata associated with the elements (such as associations to customers, VPNs, and services) can be maintained. Keeping the element Inventory accurate raises further challenges, illustrated by the following two problem statements:

IP address changes
Problem: If you are tracking a router by its IP address, and you discover a router at a new IP address, how do you know if it is a new router, or an existing router with a changed IP address?

Tivoli Netcool Performance Manager solves this problem by associating additional properties with each element, which provide continuity and traceability in the face of IP address changes. These additional properties can be discovered from the device itself, like the SNMP sysName, or gathered externally, like the name resolved from the IP address of the element's management interface.

Name changes

Problem: If you are tracking a router by its name, and you discover that the name has changed, how do you know if it is a new router with that IP address, or an existing router with a changed name?

Tivoli Netcool Performance Manager does not track elements by their name or by any other single property. Instead, by tracking a combination of properties, Tivoli Netcool Performance Manager is able to provide continuity of inventory even when any one of these properties changes.

By automatically tracking changes to an element, rather than discovering it as a new element or forcing the operator to manually update the database, Tivoli Netcool Performance Manager helps reduce operating costs as follows:

- Performance and trend reports for the element show the entire history of the element, without interruption.
- Changes to the element are shown in historical reports so they can be correlated to problems or changes in performance.
- Metadata, such as location, community string, or other properties, remains associated with the element, saving the operator from having to re-enter it.
- Inventory accuracy is improved because the update operation is automatic, not manual, eliminating errors.
- Inventory accuracy is improved because synchronization is automated, eliminating manual delays.
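The idea of tracking a combination of properties rather than a single key can be illustrated with a minimal Python sketch. This is not the product's synchronization algorithm, which this guide does not specify. The `Element` fields, the `find_existing` helper, and the "at least two of three properties must agree" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    # Hypothetical set of tracked properties; the real product may track others.
    ip_address: str
    sys_name: str
    dns_name: str


def matching_properties(known: Element, found: Element) -> int:
    """Count how many tracked properties agree between two records."""
    pairs = [
        (known.ip_address, found.ip_address),
        (known.sys_name, found.sys_name),
        (known.dns_name, found.dns_name),
    ]
    return sum(1 for a, b in pairs if a == b)


def find_existing(inventory: Sequence[Element], found: Element,
                  min_matches: int = 2) -> Optional[Element]:
    """Return the known element that best matches `found`, or None if it looks new.

    The threshold of two matching properties is an assumption for this sketch.
    """
    best = max(inventory, key=lambda e: matching_properties(e, found), default=None)
    if best is not None and matching_properties(best, found) >= min_matches:
        return best
    return None
```

With this rule, a router whose IP address changed but whose sysName and resolved name are unchanged is matched to its existing record, while a device that shares only the IP address is treated as new.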
Change Management for Sub-Elements

In addition to the challenge of detecting and correctly managing changes on sub-elements, it is important to display this information correctly on reports. From an external (customer) point of view, sub-element changes should be invisible. From an internal (network operations) perspective, the change must be visible. Tivoli Netcool Performance Manager manages all of this automatically.

There are many reasons why the identifier (in SNMP, the Object Identifier, or OID) might change for a particular sub-element. Assuming that the sub-element is a port or virtual circuit residing on an interface, some of the changes will be due to failure and recovery scenarios, or to network reconfigurations due to growth:

- Adding or removing an interface card can cause the SNMP indexes to shift for other sub-elements.
- The interface the sub-element resides upon might fail, forcing the service associated with the sub-element to be moved to another interface.
- The service may be moved to a currently unused sub-element.
- The service may be moved to a sub-element that is in use, and the service currently on that sub-element is moved to another sub-element.

Most network changes should be invisible to customers. Their reports should reflect the quality of their service, and moves and changes made to the network to preserve their service should be invisible to them. This is particularly important for SLA reporting. You certainly want to avoid forcing the customer to view two reports, one for the original NIC and a second report for the replacement NIC.

Throughout the network changes, network operations and engineering staff must have an accurate view of the actual sub-elements. For troubleshooting and capacity planning purposes, they should have a historical view of performance and traffic on a particular port, with information on the changes that have occurred.
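The CIR example under Inventory Synchronization and Change Management (old reporting periods use the old CIR, new periods use the new one) amounts to a point-in-time property lookup. The following Python sketch shows one way to model that behavior. The `PropertyHistory` class and `utilization_percent` function are hypothetical names, and the guide does not describe how the product stores property history internally.

```python
from bisect import bisect_right


class PropertyHistory:
    """Keeps every value of one property together with the time it took effect."""

    def __init__(self):
        self._times = []   # change timestamps, kept sorted
        self._values = []  # value in effect from the timestamp at the same index

    def record(self, timestamp, value):
        """Register that the property changed to `value` at `timestamp`."""
        idx = bisect_right(self._times, timestamp)
        self._times.insert(idx, timestamp)
        self._values.insert(idx, value)

    def value_at(self, timestamp):
        """Return the value that was in effect at `timestamp`."""
        idx = bisect_right(self._times, timestamp)
        if idx == 0:
            raise LookupError("no value recorded at or before this time")
        return self._values[idx - 1]


def utilization_percent(bits_per_second, cir_history, timestamp):
    """Utilization against the CIR that was valid when the sample was taken."""
    return 100.0 * bits_per_second / cir_history.value_at(timestamp)
```

For example, if the CIR is 64000 from time 0 and 128000 from time 1000, a 32000 bit/s sample taken at time 500 is reported as 50% utilization, while the same rate at time 1500 is reported as 25%.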
Grouping Sub-Elements

Properties can be used to automatically group sub-elements. For example, sub-elements can be grouped according to technology, customer, service, or site. Groups can be hierarchical, so it is possible to create structures like the following:

- Site/Technology, to see all ATM SVCs in the New York POP.
- Customer/Service, to show all of the services a particular customer has subscribed to.
- Technology/Site, to see which sites are generating the most Frame Relay activity.

Sub-elements can exist in multiple groups simultaneously. For example, a sub-element might be part of a network operations group and a particular customer's group.

Where to Go From Here

For information on troubleshooting tasks to perform after a new SNMP Inventory has been run, see Chapter 2, SNMP Inventory Troubleshooting.

For information on periodic administrative tasks to perform, see Chapter 3, SNMP Inventory Management.
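As a rough illustration of property-driven hierarchical grouping, the following Python sketch builds group paths such as Site/Technology from sub-element properties. The `group_sub_elements` function, the property names, and the sample data are assumptions made for this example. In the product, grouping rules are configured in DataMart rather than written as code.

```python
from collections import defaultdict


def group_sub_elements(sub_elements, hierarchy):
    """Group sub-element names under paths built from the listed properties.

    sub_elements: mapping of sub-element name -> dict of property values.
    hierarchy: ordered property names, e.g. ["site", "technology"].
    Sub-elements that lack one of the properties are left out of this grouping.
    """
    groups = defaultdict(list)
    for name, props in sub_elements.items():
        if all(key in props for key in hierarchy):
            path = "/".join(props[key] for key in hierarchy)
            groups[path].append(name)
    return dict(groups)
```

Calling the function once per hierarchy (for example, once with `["site", "technology"]` and once with `["customer"]`) places the same sub-element in several groups at the same time, which mirrors the statement that sub-elements can exist in multiple groups simultaneously.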
Chapter 2: SNMP Inventory Troubleshooting

This chapter discusses SNMP Inventory troubleshooting, and is made up of the following topics:

- Overview
- Discovery Troubleshooting
- Synchronization Troubleshooting
- Grouping Troubleshooting
- Monitoring the Tivoli Netcool Performance Manager Log File
- Where to Go From Here

Overview

The Tivoli Netcool Performance Manager SNMP Inventory consists of the following three major phases, which usually happen sequentially:

- SNMP Discovery: Detects all resources on a target network and creates a virtual image of the network.
- Synchronization: Compares the virtual network image generated by the Discovery with the records in the Tivoli Netcool Performance Manager database that were created by the previous Inventory run. Any modifications (new, missing, or renamed resources, for example) are then synchronized through the application of various algorithms, and the new network image is written to the database.
- Grouping: Updates the grouping structure in the database, which determines the kind of information that is to be collected on each resource, element, sub-element, and so forth.

In almost all cases, Tivoli Netcool Performance Manager's SNMP Inventory requires virtually no operator intervention. However, under certain circumstances, problems arise that you will need to address. The following sections discuss the more common problems you are likely to encounter and, where possible, provide suggestions for remedial actions.

We strongly suggest that you monitor the logs for potential error messages by doing one of the following:

Running a Discovery from the command line

If you run a Discovery from the command line, redirect STDERR to a log file, as follows:

    inventory -nox -action discovery -name lowell >output 2>error_log

Note: For a complete list of error messages written to the Tivoli Netcool Performance Manager log file, see the Tivoli Netcool Performance Manager Error Messages Guide.
For more information on using the Tivoli Netcool Performance Manager log file, see Monitoring the Tivoli Netcool Performance Manager Log File.
Running a Discovery from the DataMart GUI

If you use the DataMart GUI to initiate a Discovery, error messages appear on the DataMart GUI->Resource tab->Inventory Tool icon->Live Information tab, as shown in Figure 2.

Figure 2: Errors Displayed in the DataMart GUI (callouts: general profile settings; Discovery run ID; command to extract all messages for a specified IP address from $PVMHOME/log/TraceInventory.log; Discovery progress)

The Inventory Tool prints out messages like the following every five seconds:

    2005/12/09 13:46:52 [PL2DBS1, 238 sec, IP done.1/ SNMP done.1/ Elmt 0.1.0/ SubElmt 0.0.0]

These messages explain the progress of the discovery as follows:

- IP done.1: Indicates that the IP phase of the discovery process has completed.
- SNMP done.1: Indicates that the SNMP phase of the discovery process has completed.
- Elmt 0.1.0: Indicates the progress of element discovery, using the following syntax: numberofobjectsininputqueue.numberofthreadsrunning.numberofelementsdiscovered
- SubElmt 0.0.0: Indicates the progress of sub-element discovery, using the following syntax: numberofobjectsininputqueue.numberofthreadsrunning.numberofsubelementsdiscovered

If after two minutes there is no change in these messages, the Inventory Tool displays a more detailed message like the following:

    2005/12/09 13:46:57 Current activity @ 2005.12.09-18.46.54
    2005/12/09 13:46:57 Stage: IP done.1
    2005/12/09 13:46:57 Stage: SNMP done.1
    2005/12/09 13:46:57 Stage: Elmt 0.1.0
    2005/12/09 13:46:57 W: R00004/192.168.80.2
    2005/12/09 13:46:57 Stage: SubElmt 0.0.0

The line that includes the run number and IP address (2005/12/09 13:46:57 W: R00004/192.168.80.2, for example) can be used to troubleshoot possible problems, as explained in Discovery Seems to Hang or Never Finishes.
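When a saved trace is scanned for a stalled Discovery, it can help to turn each progress message into structured counters. The following Python sketch parses the sample progress line shown above. The `parse_progress` function and the dictionary keys are hypothetical names. Reading the first bracketed field as a label and the second as elapsed seconds is an assumption, because the guide does not define those two fields.

```python
import re

# Pattern for lines such as:
# 2005/12/09 13:46:52 [PL2DBS1, 238 sec, IP done.1/ SNMP done.1/ Elmt 0.1.0/ SubElmt 0.0.0]
PROGRESS_RE = re.compile(
    r"\[(?P<label>[^,\]]+),\s*(?P<elapsed>\d+) sec,\s*"
    r"IP (?P<ip>[^/\s]+)/\s*SNMP (?P<snmp>[^/\s]+)/\s*"
    r"Elmt (?P<elmt>\d+\.\d+\.\d+)/\s*SubElmt (?P<subelmt>\d+\.\d+\.\d+)\]"
)


def _counters(text):
    """Split 'queue.threads.discovered' into named integer counters."""
    queued, threads, discovered = (int(part) for part in text.split("."))
    return {"queued": queued, "threads": threads, "discovered": discovered}


def parse_progress(line):
    """Return the progress fields of one message, or None for other lines."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "label": match["label"],                # meaning not stated in the guide
        "elapsed_sec": int(match["elapsed"]),   # assumed to be elapsed seconds
        "ip_done": match["ip"].startswith("done"),
        "snmp_done": match["snmp"].startswith("done"),
        "elements": _counters(match["elmt"]),
        "sub_elements": _counters(match["subelmt"]),
    }
```

Comparing the `elements` and `sub_elements` counters of consecutive messages is one simple way to notice that nothing has changed for a while, which is the condition that triggers the detailed "Current activity" message.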
Discovery Troubleshooting

The following sections address the more common problems that arise during Discovery.

Discovery Does Not Start

The following sections offer the most common solutions to problems with Discovery not starting.

Discovery Fails Because Discovery Server Does Not Run

If the Discovery server fails to start, an error message like the following is returned:

    IIOP: couldn't connect to 192.168.68.251:34024: couldn't open socket: connection refused
    Error: StartInventory Failed for Discovery Server : IDL:omg.org/CORBA/INTF_REPOS:1.0 {minor 0 completion_status COMPLETED_NO}

To troubleshoot this problem, do the following:

1. Log in as pvuser on the system where the channel manager and log server are installed.

2. Change your working directory to the $DC_HOME/bin directory by entering the following command. Note that $DC_HOME is defined as /opt/datachannel by default.

    cd $DC_HOME/bin

3. Check whether the Discovery server is running by entering the following command:

    dccmd -action status -pattern DISC.*.*

If the Discovery server is not running, the dccmd command returns output like the following:

    NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS
    1 DISC unresponsive

ACTION: If the Discovery server is not running, do the following:

1. Restart the Discovery server by entering a command like the following, specifying the Discovery server for your deployment (in this example we use DISC.DEV19.1):

    dccmd -action bounce -pattern DISC.DEV19.1

2. Verify that the Discovery server is running by entering the following command:

    dccmd -action status -pattern DISC.*.*

If the Discovery server is running, the dccmd command returns output like the following:

    NUMBER FACILITY HOST STATUS ES DURATION EXTENDED STATUS
    1 DISC DEV19.QUALLA running 1 running

For more information on using the dccmd command, see the Netcool/Proviso Command Line Interface Guide.
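If the status check above is scripted, the captured dccmd output can be scanned for the two states shown in the sample output. The following Python sketch is a minimal illustration based only on those two samples. The `discovery_states` function is a hypothetical helper, and the exact column layout of real dccmd output is not assumed, which is why the sketch looks for the state words instead of reading fixed columns.

```python
def discovery_states(status_output):
    """Return (facility, state) pairs found in captured dccmd status output.

    Only the two states that appear in the sample output are recognized:
    'unresponsive' and 'running'.
    """
    states = []
    for line in status_output.splitlines():
        tokens = line.split()
        # Data rows start with the row number; skip the header and blank lines.
        if len(tokens) < 3 or not tokens[0].isdigit():
            continue
        for state in ("unresponsive", "running"):
            if state in tokens:
                states.append((tokens[1], state))
                break
    return states
```

A wrapper script could then restart the Discovery server only when a row reports `unresponsive`, and leave a running server untouched.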
Discovery Fails Because Collector Stops During Discovery

If the collector stops during a Discovery, several different error messages are logged. The most common error messages are the following:

    Error: Aborted at March 14, 2005 10:21:58 pm
    Error: Connection refused
    Error: Discovery Server : Status of lowell : invalid CLIENTERR [DC1] R00015 Connection refused (-I 682 -D 0 -profil lowell -collector dev19.quallaby.com:3002 -nbgetifaddress 100 -invfiletxt /opt/datamart/conf/inventory_subelements.txt -vname {} -intcollector 1)

To troubleshoot this problem, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.

2. (Optional) Ensure that the Oracle database and Listener are running. For more information, see the Tivoli Netcool Performance Manager Installation Guide.

3. Enter the following command, replacing DATAMART_ROOT with the root DataMart directory (/opt/datamart by default):

    DATAMART_ROOT/bin/pvm

The DataMart GUI appears, as shown in Figure 3.

Figure 3: DataMart GUI (callouts: Name of the system where Tivoli Netcool Performance; Name of the system where the database is installed)

4. Click on the Collector Information icon. The Collector Information Tool appears, as shown in Figure 4.
Figure 4: Collector Information Tool

The Collectors tab lists all the collectors loaded from the database.

5. Select the collector that you are troubleshooting. To find the collector name, look in the output on the Live Information tab of the Discovery trace, as follows (in this example, the collector is dev19.quallaby.com):

    BeforeDiscovery for lowell : nothing to do
    Running Discovery on profile : lowell (mode 1)
    Enforce SE invariant uniqueness per element : active (from profile )
    Duplicate IpAddress: active with collector number: 1 (dev19.quallaby.com:3002) (from profile )
    Sub-element configuration file: /opt/datamart/conf/inventory_subelements.txt (from profile )
    Discovery collector number: 1 (dev19.quallaby.com:3002) (from profile )
    Start Reload formulas on dev19.quallaby.com:3002...
Figure 5: Stopped Collector

ACTION: Restart the collector by entering the following command on the Collector server:

    /opt/dataload/bin/pvmdmgr start

Note: The collector cannot be restarted from the GUI.

Discovery Does Not Start Because Inventory is Locked

To troubleshoot this problem, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.

2. (Optional) Ensure that the Oracle database and Listener are running. For more information, see the Tivoli Netcool Performance Manager Installation Guide.

3. Enter the following command, replacing DATAMART_ROOT with the root DataMart directory (/opt/datamart by default):

    DATAMART_ROOT/bin/pvm

The DataMart GUI appears, as shown in Figure 3.

4. Click on the Resource tab->Inventory Tool icon. The Inventory Tool appears, as shown in Figure 6.

Figure 6: Inventory Tool

5. Click on the Live Information tab. If the profile is locked, you will see a warning message like the one displayed in Figure 6.

6. To confirm that the profile is locked, return to the DataMart GUI and click on the Monitor tab->DataMart Status icon. The DataMart Status Tool appears, as shown in Figure 7.
Figure 7: DataMart Status Tool

In this example, the Inventory profile lowell is locked by an Inventory process with a PID of 13458.

7. Determine whether the lock is valid, that is, whether the locking process is still running, by entering the following command, replacing LOCK_PROCESS_PID with the PID of the locking process (13458 in our example):

    ps -aef | grep LOCK_PROCESS_PID

ACTION: If the process is active, the lock is valid and you should wait until the Inventory completes before running another Inventory.

ACTION: If the process is inactive, remove the lock and start another Inventory. To remove the lock, click on the Remove Locks icon located in the upper left of the toolbar.
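The check in step 7 can also be done without matching text in the process list, which avoids false positives when the PID digits appear elsewhere in the ps output. The following Python sketch is one way to do it on a UNIX system. The `process_is_running` function is a hypothetical helper that is not part of the product, and it must be run on the host where the locking process was started.

```python
import os


def process_is_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 performs only the existence and permission checks;
        # nothing is delivered to the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

For the example above, `process_is_running(13458)` returning False would suggest that the lock is stale. A PID can be reused by an unrelated process, so confirming with ps that the process is an Inventory process is still worthwhile before the lock is removed.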
The qping Utility Has Incorrect Settings

As part of Discovery in Inventory mode 1, Tivoli Netcool Performance Manager scans IP ranges using the DataLoad qping utility. This utility fails with error messages if the group, user, and execute permissions are not set correctly on the file.

To troubleshoot this problem, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where the DataLoad server is installed.

2. Change your working directory to $PVMHOME/bin (/opt/dataload/bin by default), and enter the following commands to check the settings on the qping utility:

    cd $PVMHOME/bin
    ls -l ./qping

The correct settings are as follows:

    -r-sr-sr-- 1 root pvusers 1537814 Jan 14 15:31 /opt/dataload/bin/qping

ACTION: If the settings are not correct, su to root and enter the following commands:

    chown root $PVMHOME/bin/qping
    chmod 6554 $PVMHOME/bin/qping
    chgrp pvusers $PVMHOME/bin/qping

Verify the settings by entering the ls -l command again.

Important: The qping utility and the DataLoad user must belong to the same group (in our example, pvusers). The setuid and setgid bits (s in the ls -l output) allow the utility to be executed as root by any UNIX user belonging to the pvusers group.
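The expected listing -r-sr-sr-- corresponds to octal mode 6554: setuid and setgid, read and execute for owner and group, and read-only for others. The following Python sketch checks a mode and ownership against the values given above. The `permission_problems` function is a hypothetical helper, and the owner and group names are passed in as plain strings so that the check itself stays independent of the local user database.

```python
import stat

EXPECTED_MODE = 0o6554  # setuid + setgid, r-x owner, r-x group, r-- others


def permission_problems(mode, owner, group,
                        expected_owner="root", expected_group="pvusers"):
    """Return a list of human-readable problems; an empty list means all is well.

    mode: st_mode value as returned by os.stat(); owner, group: account names.
    """
    problems = []
    actual = stat.S_IMODE(mode)  # keep only the permission and set-id bits
    if actual != EXPECTED_MODE:
        problems.append(f"mode is {actual:04o}, expected {EXPECTED_MODE:04o}")
    if owner != expected_owner:
        problems.append(f"owner is {owner}, expected {expected_owner}")
    if group != expected_group:
        problems.append(f"group is {group}, expected {expected_group}")
    return problems
```

On a live system, the inputs could come from `os.stat(path)` together with `pwd.getpwuid(st.st_uid).pw_name` and `grp.getgrgid(st.st_gid).gr_name`. `stat.filemode(mode)` renders the same bits in ls -l style for comparison with the listing above.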
Discovery Starts But Issues Warning Messages

The following sections offer the most common solutions to warning messages that occur during Discovery.

IP Addresses Are Rejected

If an IP address is rejected during Discovery, errors like the following are written to the log or displayed on the Live Information tab of the Inventory Tool:

    Warning: IP Address 10.48.58.45 rejected because this IP address has been excluded.
    Warning: IP Address 10.60.64.230 rejected because this IP address has been excluded.

Important: In most cases, this is not a problem.

To troubleshoot this error message, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.

2. (Optional) Ensure that the Oracle database and Listener are running. For more information, see the Tivoli Netcool Performance Manager Installation Guide.

3. Enter the following command, replacing DATAMART_ROOT with the root DataMart directory (/opt/datamart by default):

    DATAMART_ROOT/bin/pvm

The DataMart GUI appears, as shown in Figure 3.

4. Invoke the Discovery Tool Wizard by doing the following:

4-a. Select the Resource tab->Inventory Tool icon. The Inventory Tool appears with the Configuration tab displayed, as shown in Figure 8.

Figure 8: Inventory Tool Configuration Tab

4-b. Highlight a profile and then select Edit->Profile or click on the edit icon. The Inventory Tool Wizard appears, as shown in Figure 9.
Figure 9: Inventory Tool Wizard

4-c. Click the Next button twice to navigate to the Discovery Tool Wizard, as shown in Figure 10.

Figure 10: Discovery Tool Wizard

ACTION: As shown in Figure 10, check whether the rejected IP addresses, either individually or within a specified range, have been intentionally excluded. If they have not, contact Micromuse support.

Duplicate Elements Are Found

If the same element, a router for example, has more than one IP address associated with it, Tivoli Netcool Performance Manager will discover the element multiple times, reject the duplicate discoveries, and write warning messages to the log. Figure 11, for example, shows Tivoli Netcool Performance Manager rejecting three duplicate elements.
Figure 11: Rejected Duplicate Elements

This is expected behavior.

ACTION: If these warning messages appear often, Discovery performance may be degraded, because Tivoli Netcool Performance Manager must spend a lot of time identifying and eliminating duplicate elements. You may therefore want to exclude the duplicate addresses from the Discovery by doing the following:

1. Invoke the Discovery Tool Wizard (DTW). For instructions on how to invoke the DTW, see step 4 in IP Addresses Are Rejected.

2. Add the duplicated IP addresses to the IP address exclude area of the DTW, as shown in the following figure.

Elements Are Not Identified During Discovery

If DataLoad did not receive an SNMP answer using the community name configured in the Tivoli Netcool Performance Manager profile, warning messages like the following are written to the log and displayed on the Live Information tab of the Inventory Tool:

    Warning: Unidentified Agents= {192.168.1.201,192.168.1.100,192.168.1.84,192.168.1.75,192.168.1.103,192.168.1.93,192.168.1.98,192.168.1.92,192.168.1.60,192.168.64.103}
    Warning: Unidentified Agents= {192.168.1.48,192.168.64.1,192.168.64.113,192.168.64.106,192.168.127.253,192.168.1.242,192.168.1.181,192.168.1.182,192.168.1.254,192.168.1.62}

Devices may fail to respond for reasons like the following:

- The device is not reachable from the DataLoad collector (for example, there is no network route).
- The SNMP agent was not started on the device.
- Tivoli Netcool Performance Manager has the wrong SNMP community name for the device.
- The device access list is preventing Tivoli Netcool Performance Manager DataLoad from acting as an SNMP manager for this device.
- The firewall configuration is preventing SNMP traffic with the device.

To troubleshoot this problem, follow these steps:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.

2. (Optional) Ensure that the Oracle database and Listener are running. For more information, see the Tivoli Netcool Performance Manager Installation Guide.

3. Change your working directory to DATAMART_ROOT/bin, replacing DATAMART_ROOT with the root DataMart directory (/opt/datamart by default).

4. Perform an Internet Control Message Protocol (ICMP) query on the device by entering the following command, replacing IP_ADDRESS_OF_UNIDENTIFIED_DEVICE with the IP address of the unidentified device, and NAME_OF_COLLECTOR_SERVER_DOING_PING with the name of the collector server doing the ping:

    qping IP_ADDRESS_OF_UNIDENTIFIED_DEVICE -S NAME_OF_COLLECTOR_SERVER_DOING_PING

If successful, the qping command returns the IP address of the unidentified device, as follows:

    $ qping 192.168.68.33 -S dev19
    192.168.68.33:10

5. Connect to the device and verify whether the SNMP agent is enabled and running.

6. Connect to the device and verify the community name.

7. (Optional) You might have to change the community name in Tivoli Netcool Performance Manager, by using the DataMart->Inventory->Discovery Wizard to add an "alternate" community name to the profile.
To change the community name, follow these steps:

   7-a. Invoke the Discovery Tool Wizard (DTW). For instructions on how to invoke the DTW, see STEP 4 on page 13.
   7-b. Add an alternate community name in the specified text box.
   7-c. Click the Add button to confirm your choice, as shown in the following figure:
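When many "Unidentified Agents" warnings accumulate, it can help to collect the affected addresses into one list before working through the steps above for each device. The following is a minimal sketch, assuming the warnings have the same single-line format as the examples in this section and contain IPv4 addresses only. The function name and the sample text are illustrative and are not part of the product.

```python
import re

# Assumed format, taken from the warning examples in this section:
#   Warning: Unidentified Agents= {ip1,ip2,...}
AGENTS_RE = re.compile(r"Warning: Unidentified Agents=\s*\{([^}]*)\}")


def unidentified_agents(log_text):
    """Return the de-duplicated IPv4 addresses named in all matching warnings."""
    found = set()
    for match in AGENTS_RE.finditer(log_text):
        for ip in match.group(1).split(","):
            ip = ip.strip()
            if ip:
                found.add(ip)
    # Sort numerically by octet so that .60 is listed before .100.
    return sorted(found, key=lambda ip: tuple(int(part) for part in ip.split(".")))


# Hypothetical sample built from addresses that appear in the warnings above.
sample = (
    "Warning: Unidentified Agents= {192.168.1.201,192.168.1.100,192.168.1.84}\n"
    "Warning: Unidentified Agents= {192.168.1.48,192.168.64.1,192.168.1.84}\n"
)

for address in unidentified_agents(sample):
    print(address)
```

Each address in the resulting list can then be checked with qping and the community name verification described in the steps above.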
Elements Skipped Because No Related Sub-Elements

When Tivoli Netcool Performance Manager discovers an element but cannot discover any related sub-elements, warning messages like the following are written to the log and displayed on the Information Tab of the Inventory Tool:

Skipping 3 elements (Set ('192.168.66.221_jeffs2' '192.168.66.221_default' '192.168.66.221_jeffs1')) in output file, because they don't have related subelements

Important: This is not expected behavior, and should be resolved immediately.

To troubleshoot this problem, follow these steps:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataChannel is installed.
2. Change your working directory to $DC_HOME/log (/opt/datachannel/log by default), by entering the following command:

    cd $DC_HOME/log

3. Enter the following command to search the proviso.log for the Discovery formula of the skipped element, replacing EL_IP_AD with the IP address of the skipped element:

    grep DISC proviso.log | grep R00020 | grep EL_IP_AD | grep CHKDISCOVERY

One of the following results occurs:

- The grep command returns no output.

  ACTION: If the grep command does not return output, check the content of the following files in the $PVMHOME/conf directory ($PVMHOME is defined as /opt/datamart by default), and contact Micromuse support:

    $PVMHOME/conf/inventory_elements.txt
    $PVMHOME/conf/inventory_sub_elements.txt

- The grep command returns output like the following:

2005.03.15-18.58.28 UTC DISC.DEV19.1-13308 2 CHKDISCOVERY R00020/192.168.68.173. Family:Generic~Agent, try discoveryformula Formula (Basic_Element - 7486 string)
2005.03.15-18.58.43 UTC DISC.DEV19.1-13308 2 CHKDISCOVERY R00020/192.168.68.173. Family:1213_Device, try discoveryformula Formula (1213_Device - 4851)
2005.03.15-18.58.45 UTC DISC.DEV19.1-13308 2 CHKDISCOVERY R00020/192.168.68.173. Family:IETF_IF, try discoveryformula Formula (IETF_IF - 9885)

  ACTION: The following may be attempted:

  - If you believe that the device should in fact respond, try to discover the device manually and monitor the trace, using the DataMart->Metric->Formula Editor, as shown in the following figure:
  - Activate the relevant portion of the device MIB (for example, SAA in the Cisco router).
  - In some cases, SNMP attributes do not respond properly, and the Discovery formula for those devices must be changed. In this event, contact Micromuse support.

Discovery Seems to Hang or Never Finishes

If the Discovery server shows no progress for several minutes (up to thirty minutes in some cases), as shown in the following trace log, it has probably encountered problems with a Discovery formula:

Problems, real or apparent, with Discovery formulas can be the result of the following:

- The SNMP agent is slow to respond.
- The network latency is very long.
- The Discovery formula is not well-suited to the SNMP agent.
- The collector has performance issues.

When Tivoli Netcool Performance Manager is in such a state, one of two things happens:

- The Discovery formula finally succeeds and the Inventory continues.
- The Discovery hangs until the Inventory timeout occurs (two hours), at which point the following error message is written to the log file:

Error: Profile is not progressing during the last 7201 seconds, aborting
ACTION: Contact Micromuse support and provide them with the DATACHANNEL_ROOT/datachannel/log/[yyyy.mm.dd]SNMP.log file.

Before contacting Micromuse support, and before the Discovery finishes or times out, do the following:

1. Determine which Discovery formula is causing the problem by doing the following:

   1-a. Log in as pvuser (or the user name that you specified during installation) on the system where DataChannel is installed.
   1-b. Change your working directory to DATACHANNEL_ROOT/log, replacing DATACHANNEL_ROOT with the root DataChannel directory (/opt/datachannel by default).
   1-c. Enter the following command to search the proviso.log for the Discovery formula:

    grep DISC proviso.log | grep R00010 | tail -1

   The command returns output like the following:

    2005.03.15-20.32.16 UTC DISC.PVDEMO2.1-1048 2 CHKDISCOVERY R00010/172.31.0.51. Family:Cisco_CBQoS_Action, try discoveryformula Formula (Cisco_CBQoS_Action - 7784)

   In this example, the Discovery formula is Cisco_CBQoS_Action.

2. Determine the internal request ID that is stuck by doing the following:

   2-a. Log in as pvuser (or the user name that you specified during installation) on the system where the DataMart or DataLoad server is installed.
   2-b. Change your working directory to DATAMART_ROOT/log, replacing DATAMART_ROOT with the root DataMart directory (/opt/datamart by default).
   2-c. Enter the following command to find the internal request ID, replacing SERVER_NAME with the name of the Tivoli Netcool Performance Manager DataMart server:

    statget -S SERVER_NAME | grep -i once

   The command returns output like the following:

    + [33] ID 96627,{CAL none (ONCE)(next=2005/03/15 20:32:16)}(P3) ACTIVE (LastExec): ServiceForm:(Trgt=(string)172.31.0.51)(Form=(form)Cisco_CBQoS_Action) (Inst=)(RComm=public)
    + [40] ID 96683,{CAL none (ONCE)(next=2005/03/15 20:48:19)}(P1) ACTIVE (LastExec): ServiceSTAT (LONG)

   In this example, the internal request ID is 96627.

   Important: If the command returns no output, contact Micromuse support.

3. Enable limited debugging on the collector server in order to populate the log with additional information by doing the following:

   3-a. Change your working directory to /opt/dataload/contribs.
   3-b. Enter the following command, replacing TASKID with the internal request ID you found in STEP 2 on page 20:

    dialogtest2 Debug 6.TASKID

   Tivoli Netcool Performance Manager outputs a message like the following:

    Set debug level to 6 for taskid 96627.
    Debug configuration:
    > Global Level= 1; Mask=FW
    > ID 96627 Level= 6; Mask=FWI1234

WARNING: DO NOT USE THE GLOBAL COLLECTOR DEBUGGER.
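Step 2 above asks you to read the internal request ID out of the statget output by eye. The following is a minimal sketch of how that lookup could be scripted, assuming each request is printed on a single line in the same layout as the sample output in Step 2. The function name and regular expression are illustrative and are not part of the product.

```python
import re

# Assumed layout, taken from the sample statget output in Step 2:
#   ... ID <number>,{...} ... ServiceForm:(Trgt=(string)<ip>)(Form=(form)<formula>) ...
REQUEST_RE = re.compile(
    r"ID (\d+),.*?\(Trgt=\(string\)([^)]*)\)\s*\(Form=\(form\)([^)]*)\)"
)


def find_request_ids(statget_output, formula):
    """Return (request_id, target) pairs for requests that run the given formula."""
    matches = []
    for line in statget_output.splitlines():
        found = REQUEST_RE.search(line)
        if found is None:
            continue  # for example, ServiceSTAT lines that have no target or formula
        request_id, target, form = found.groups()
        if form == formula:
            matches.append((int(request_id), target))
    return matches


# Sample text copied from the statget output shown in Step 2.
sample = (
    "+ [33] ID 96627,{CAL none (ONCE)(next=2005/03/15 20:32:16)}(P3) ACTIVE (LastExec): "
    "ServiceForm:(Trgt=(string)172.31.0.51)(Form=(form)Cisco_CBQoS_Action) (Inst=)(RComm=public)\n"
    "+ [40] ID 96683,{CAL none (ONCE)(next=2005/03/15 20:48:19)}(P1) ACTIVE (LastExec): "
    "ServiceSTAT (LONG)\n"
)

print(find_request_ids(sample, "Cisco_CBQoS_Action"))
```

The formula name passed to the function is the one identified in Step 1, and the returned ID is the TASKID value used in Step 3-b.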
Synchronization Troubleshooting

The following sections address the more common problems that arise during Synchronization.

Synchronization (Elements)

During the synchronization phase, Tivoli Netcool Performance Manager compares the virtual network image generated by the Discovery with the records in the Tivoli Netcool Performance Manager database that were created by the previous Inventory run. Any modifications (new, missing, or renamed resources, for example) are then synchronized through the application of various algorithms, and the new network image is written to the database.

In order to track elements and sub-elements through subsequent inventories, Tivoli Netcool Performance Manager identifies them with a unique, never-changing identifier, called an invariant. The default element invariant, for example, is a concatenation of the following three attributes:

- The MIB II sysname
- The first IP address responding to the ICMP scan (in Inventory mode 1) or the one given in the mode 2 Inventory file
- The first valid physaddress (MAC address) in the MIB II iftable

Table 1 illustrates the logic used to determine if a newly-discovered element is actually new or is an existing element that has moved or changed.

Table 1: Synchronization Invariant Logic

Element Attributes in Database
MIB II sysname | IP Address | physaddress | New Element If...
Defined | Defined | Defined | Two or three attributes have changed.
Empty | Defined | Defined | Two attributes have changed.
Empty | Empty | Empty | The resolved name is different from the resolved name in the database.

Note: If an element does not respond to one of these three MIB II attributes, contact Micromuse support.

Important: To avoid synchronization errors, we strongly recommend that you limit device configuration changes to one attribute between two runs of an Inventory (Discovery and Synchronization).
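The rows of Table 1 can be read as a small decision function. The following is a minimal sketch of that reading only; it is not the product's synchronization code. The dictionary keys, the function name, and the sample values are hypothetical, and combinations of defined and empty attributes that Table 1 does not list are deliberately rejected instead of guessed.

```python
ATTRIBUTES = ("sysname", "ip", "physaddress")


def is_new_element(db, found, db_resolved_name=None, found_resolved_name=None):
    """Apply the three rows of Table 1 to one database record and one discovery result.

    db and found are dicts keyed by ATTRIBUTES; an empty attribute is None or "".
    """
    defined = [name for name in ATTRIBUTES if db.get(name)]
    changed = [name for name in defined if db[name] != found.get(name)]

    if len(defined) == 3:
        # Row 1: all three attributes defined, new if two or three have changed.
        return len(changed) >= 2
    if defined == ["ip", "physaddress"]:
        # Row 2: sysname empty, new if both remaining attributes have changed.
        return len(changed) == 2
    if not defined:
        # Row 3: nothing defined, fall back to comparing resolved names.
        return db_resolved_name != found_resolved_name
    raise ValueError("attribute combination is not covered by Table 1")


# Hypothetical element: only the IP address changed between two Inventory runs.
stored = {"sysname": "router1", "ip": "10.0.0.1", "physaddress": "00:11:22:33:44:55"}
moved = dict(stored, ip="10.0.0.2")
print(is_new_element(stored, moved))

# The sysname and the physaddress changed together: treated as a new element.
swapped = dict(stored, sysname="router2", physaddress="00:11:22:33:44:66")
print(is_new_element(stored, swapped))
```

The second call illustrates why the recommendation above limits configuration changes to one attribute between two Inventory runs.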
Pre-synchronization Summary

As Tivoli Netcool Performance Manager prepares for synchronization, it compiles a list of the following types of elements:

- New
- Updated
- Burned
- Unchanged
- Not Found
- Histo Reject

Figure 12 shows the display written to the DataMart->Inventory Tool->Live Information tab, with this summary information highlighted:

Figure 12: Pre-synchronization Summary

The following sections discuss this summary in more detail.

Identifying Not Found Elements

ACTION: If the Pre-Synchronization summary lists any not found elements (there are three in Figure 12, for example), contact Micromuse support.

Identifying Burned Elements

In cases where the MIB II sysname, IP address, and physaddress are all defined for an element (the first row in Table 1), Tivoli Netcool Performance Manager may encounter a special situation if the MIB II sysname and physaddress have changed, but the IP address remains unchanged.
In this special case, Tivoli Netcool Performance Manager creates a new element and writes it to the database. However, since the IP address for both elements is the same, so too is the resolved name, which means that the elements would be seen by Tivoli Netcool Performance Manager as the same element. To prevent this situation, Tivoli Netcool Performance Manager renames the initial element with a Burned prefix, as shown in Figure 13:

Figure 13: Burned Element

When Synchronization completes, a list of burned elements is written to the following file:

    PROFILE_HOME/PROFILE_NAME/synchro/e_burned.dat

ACTION: If you have an e_burned.dat file, contact Micromuse support. The burned elements and their sub-elements are duplicated, and the statistics attached to these resources are not continuous in reports.

Preventing Burned Elements

Tivoli Netcool Performance Manager relies heavily on the sysname when calculating invariants. To prevent burned elements, do the following:

1. Do NOT change the sysname on a device when the lowest physaddress has also changed.
2. Ensure that the sysname is defined and not left empty.
3. Ensure that the sysname is defined with a unique value.
4. Minimize changes to the sysname once it has been set, unless you want Tivoli Netcool Performance Manager to see the device as new and interrupt the continuity of statistics.

Detecting Too Many New Elements

If there have been extensive configuration changes before an Inventory run (for example, if two attributes were changed simultaneously on a router), Tivoli Netcool Performance Manager will discover and create a new element in the database. The consequences are severe:

- The element and its sub-elements are duplicated.
- The statistics attached to these resources are not continuous in the reports.
- Double polling might happen on "both", until the "first" device is "retired" by the Inventory.

Aside from limiting device configuration changes to one attribute between two runs of an Inventory, there is no way to prevent this situation from occurring.
When Synchronization completes, a list of new elements is written to the following file. Note that $PVMHOME is defined as /opt/datamart by default:

    $PVMHOME/importexport/PROFILE_NAME/synchro/new_e.dat

ACTION: If you have a new_e.dat file, check to ensure that the number of new elements is not unexpected. If the number is unexpected, contact Micromuse support.

Note: A large new_e.dat file is to be expected the first time a network is discovered.

Detecting Different Elements Resolved With the Same Name

If the name resolution system (for example, DNS) assigns identical names to the IP addresses of different devices (a configuration mistake), Tivoli Netcool Performance Manager produces warnings like the following during Synchronization:

Warning: IP Address 10.48.58.45 rejected
Warning: IP Address 10.60.64.230 rejected

When Synchronization completes, a list of rejected elements is written to the following file:

    PROFILE_HOME_DIRECTORY/PROFILE_NAME/synchro/duplicateElement_e.reject

ACTION: If you have a duplicateElement_e.reject file, do the following:

- From the DataMart server, resolve the names of both conflicting IP addresses, in order to confirm that they resolve to the same name.
- Fix the name resolution system and start the same Inventory again.

Do Not Delete Incorrectly Named Elements That Define SNMP Community Names

When initially configuring Tivoli Netcool Performance Manager, you use the SNMP Configuration Tool, as shown in Figure 14, to associate elements with their SNMP Community name so that Tivoli Netcool Performance Manager can read from the devices.

Figure 14: Configuring SNMP Community Names

The SNMP Configuration Tool writes this information to the database.
During the next Inventory run, the synchronization process sees the elements that were newly entered into the database by the SNMP Configuration Tool as missing all three invariant attributes (see row three in Table 1 on page 21). Tivoli Netcool Performance Manager then attempts to determine if a new element needs to be created by comparing the resolved names of the devices, with the following results:

- If the name that was entered into the SNMP Configuration Tool was the correct resolved name of the device, Tivoli Netcool Performance Manager will not create a new element and will instead update the attributes for the existing element with the discovered values.
- If the name that was entered into the SNMP Configuration Tool was the incorrect resolved name of the device, Tivoli Netcool Performance Manager will create a new element. On subsequent Discoveries, Tivoli Netcool Performance Manager will rediscover and eventually update this element and ignore the element with the incorrect resolved name.

ACTION: Do NOT remove the element with the wrong resolved name from the database. Tivoli Netcool Performance Manager continues to use that entry for SNMP Community name mapping. If the entry is removed, Tivoli Netcool Performance Manager will lose the SNMP Community mapping and will not be able to read from that resource.

Failed Synchronization

While monitoring the Synchronization process, pay particular attention to failed elements, as shown in Figure 15.

Figure 15: Failed Elements and the Inventory Tool

ACTION: If Synchronization reports failed elements (greater than 0), contact Micromuse support immediately.
Synchronization (Sub-elements)

The sub-element invariant is defined in the Discovery Formulas referenced in the file $PVMHOME/conf/inventory_sub_elements.txt. Note that $PVMHOME is defined as /opt/datamart by default. This invariant is usually defined differently per object type (for example, an SAA probe) and vendor (Cisco, for example).

The logic used to decide if a newly-discovered sub-element is in fact new depends on the following:

- If the invariant is not defined, the sub-element name (usually the concatenation of the element name and the SNMP instance) is used to determine if a new sub-element should be created or an existing one should be updated.

  Note: Any change to the name between two subsequent Discovery runs triggers the creation of a new sub-element.

  Important: An SNMP index (instance) change will trigger the creation of a new sub-element (because the SNMP index is part of the name), which will cause reporting discontinuity.

- If an invariant is defined, the parameter "Enforce Sub-Element Invariant uniqueness per element" is used, which is set in the DataMart->Inventory->Discovery Wizard, as shown in the following figure:

  Note: If the option highlighted in the figure is checked, uniqueness is verified on a per element basis; if left unchecked, uniqueness is verified across the entire profile.

Identical Sub-element Invariant

If an invariant is defined for a particular type of sub-element (for example, Cisco SAA), Tivoli Netcool Performance Manager will check the uniqueness of that invariant during an SNMP Inventory. If Tivoli Netcool Performance Manager detects two sub-elements with the same invariant, it displays a warning message like the following:

Warning: Identical invariant detected for invariant 172.31.0.51-1_CBQoS-172.31.0.51- class-default-match any -output-1 (172.31.0.51-1,CBQoS<1107><1123> and 172.31.0.51-1,CBQoS<1107><1177>).

Important: When identical invariants are detected, neither sub-element is stored in the database.
ACTION: To resolve this situation, follow these steps:

1. Use the DataMart->Metric->Mib Browser utility to verify that the two invariants are in fact identical.
2. If they are identical, the following needs to be done:

   - The Discovery formula needs to be changed to ensure that the invariant is unique within either the element or the profile. This is usually done by concatenating more OIDs to the invariant.
   - The SNMP OID value used for building the invariant needs to be changed. For example, if you use the Cisco ifalias settable OID, its value might not be unique (among other things, a configuration error may exist due to a non-unique Customer contract number, which should be fixed). Discovery should then be re-triggered, so that both sub-elements can be discovered.

3. Contact Micromuse support for assistance in making these changes.

Checking the TraceInventory.log File

Figure 16 shows the summary information in the $PVMHOME/log/TraceInventory.log file. Note that $PVMHOME is defined as /opt/datamart by default.

Figure 16: TraceInventory.log File

You should review the $PVMHOME/log/TraceInventory.log file and check for the following:

Large Number of Sub-elements Not Found

A large number of missing sub-elements (discovered before, but not rediscovered) may indicate that a large portion of your network is not reachable.

Note: If the same sub-elements are continually not found during subsequent Inventories, they will eventually be retired. For more information, see Finding Elements and Sub-elements About to Reach Their Retry Limit on page 33.

ACTION: If your network is reachable, the sub-elements exist, and you are still receiving error messages, contact Micromuse support.

Rejected Sub-elements

ACTION: If the log reports any of the following types of rejected sub-elements, contact Micromuse support:

- histo reject
- dup reject

Burned sub-elements

A burned sub-element is determined as follows:

- The sub-element has an invariant in the database (for example, inv1).
- The sub-element has a name (for example, name1).
- During an Inventory, a new sub-element is discovered and created with the same name (for example, name1) and a different invariant (for example, inv2).

In this case, two different sub-elements are about to be created in the database with the same name. To prevent this duplication, Tivoli Netcool Performance Manager renames the initial sub-element with a Burned prefix, so that each sub-element can be stored in the database with a unique name. A list of burned sub-elements is written to the following file:

    PROFILE_HOME/PROFILE_NAME/synchro/burned_seinv.dat1

ACTION: If the log reports any burned sub-elements, contact Micromuse support.

Disconnected sub-elements

A disconnected sub-element is determined as follows:

- The sub-element has an invariant in the database (for example, inv1).
- The sub-element has a name (for example, name1).
- During an Inventory, a sub-element is discovered with the same name (for example, name1) and an empty invariant.

Note: An empty invariant typically happens when an invariant is used to assign physical resources (ports, for example) to customers. In this case, the empty invariant is interpreted as a "disconnection" between the customer and the resource.

If a sub-element is determined to be disconnected, no new sub-element is created, and the initial sub-element is marked as disconnected. A list of disconnected sub-elements is written to the following file:

    PROFILE_HOME/PROFILE_NAME/synchro/disconnected_seinv.dat1

ACTION: If the log reports any disconnected sub-elements, contact Micromuse support.

Grouping Troubleshooting

If all problems with Discovery and Synchronization have been solved (or none arise), it is unlikely that you will encounter problems with Grouping.
Monitoring the Tivoli Netcool Performance Manager Log File

The Tivoli Netcool Performance Manager log file is located in the DataChannelInstallROOT/log directory (/opt/datachannel/log by default) and should be monitored on a regular basis.

Note: Currently, only DataChannel and DataLoad write to the Tivoli Netcool Performance Manager log. Log files for other components can be found in $PVMHOME/log (/opt/datamart/log, by default) and SILVERSTREAM_HOME/log (/opt/silverstream/log, by default).

You can write a program to monitor the Tivoli Netcool Performance Manager log file (recommended), or you can check the log file by hand, using the grep command. For example, the following command checks the Tivoli Netcool Performance Manager log file for a fatal loader error:

    grep CMGR proviso.log | grep -w F

The command returns output like the following:

    982348945 2001.02.16-18.42.25 CMGR F ERROR There was a problem loading the domain database

Tivoli Netcool Performance Manager Log File Format

Entries in the Tivoli Netcool Performance Manager log file have the following format:

    <formatted date> <facility> <severity> [<msg code>] <message> <information>

- <formatted date> is a formatted date string YYYY.MM.DD-hh.mm.ss
- <facility> indicates the program that made the log file entry (for example, CMGR for Channel Manager)
- <severity> can be one of the following:

    I   Informational
    W   Warning
    F   Failure
    1   Debug level 1
    2   Debug level 2
    3   Debug level 3

- <msg code> is a registered msg code for fatal and warning log messages
- <message> indicates the basic type of the message
- <information> is an additional text message

Tivoli Netcool Performance Manager Log Messages

For a complete list of log messages written to the Tivoli Netcool Performance Manager log, see the Tivoli Netcool Performance Manager Error Messages Guide.
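As a starting point for the monitoring program recommended above, the following is a minimal sketch that parses log entries and keeps only failures from one facility. It assumes the entry layout described in this section, and additionally tolerates the leading numeric field and the UTC marker that appear in the sample entries in this guide. The function names are illustrative. Unlike the grep example, which matches CMGR anywhere in the line, this sketch matches only the facility field.

```python
import re
from datetime import datetime

ENTRY_RE = re.compile(
    r"^(?:\d+\s+)?"                                  # optional leading numeric field
    r"(\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2})\s+"  # YYYY.MM.DD-hh.mm.ss
    r"(?:UTC\s+)?"                                   # optional time zone marker
    r"(\S+)\s+"                                      # facility
    r"([IWF123])\s+"                                 # severity
    r"(.*)$"                                         # msg code, message, information
)


def parse_entry(line):
    """Return a dict for one log entry, or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    stamp, facility, severity, text = match.groups()
    return {
        "time": datetime.strptime(stamp, "%Y.%m.%d-%H.%M.%S"),
        "facility": facility,
        "severity": severity,
        "text": text,
    }


def failures(lines, facility="CMGR"):
    """Return parsed entries with severity F whose facility starts with the given name."""
    selected = []
    for line in lines:
        entry = parse_entry(line)
        if entry and entry["severity"] == "F" and entry["facility"].startswith(facility):
            selected.append(entry)
    return selected


# Sample entries copied from the examples in this guide.
sample = [
    "982348945 2001.02.16-18.42.25 CMGR F ERROR There was a problem loading the domain database",
    "2005.03.15-18.58.45 UTC DISC.DEV19.1-13308 2 CHKDISCOVERY R00020/192.168.68.173. "
    "Family:IETF_IF, try discoveryformula Formula (IETF_IF - 9885)",
]

for entry in failures(sample):
    print(entry["time"], entry["facility"], entry["text"])
```

A scheduled job could run such a filter against proviso.log and raise an alert whenever the returned list is not empty.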
Burned Subelements

There are two reasons a subelement (SE) can be in a burned state:

- Its Element is burned.
- The SE is not in Discovery, and another SE with Invariant is discovered with the same Element and Instance.

Scenario 1 - Instance Shift Causes Disconnect

Initial Discovery finds three sub-elements, all with Invariant.

Elt dbindex | Elt Name | Instance | SE dbindex | SE Name | Invariant | State | Missing Count
200000431 | alm00142061.elt01 | <04> | 200000450 | alm00142061.elt01.seinv04 | I:01:04 | on | 0
200000431 | alm00142061.elt01 | <05> | 200000449 | alm00142061.elt01.seinv05 | I:01:05 | on | 0
200000431 | alm00142061.elt01 | <06> | 200000448 | alm00142061.elt01.seinv06 | I:01:06 | on | 0

During a subsequent Discovery the following occurs:

- The SE with Invariant I:01:05 is not found.
- The SE with Invariant I:01:06 is found, but at Instance <05>.

Result:

- The SE with Invariant I:01:05 is marked missing because it is not in the Discovery.
- The SE with Invariant I:01:05 becomes Disconnected because it is not in Discovery and another SE was discovered with the same Element and Instance.
- The SE with Invariant I:01:06 is updated (Instance, Name, Label, etc.), but retains the same dbindex and Invariant.

Elt dbindex | Elt Name | Instance | SE dbindex | SE Name | Invariant | State | Missing Count
200000431 | alm00142061.elt01 | <04> | 200000450 | alm00142061.elt01.seinv04 | I:01:04 | on | 0
200000431 | alm00142061.elt01 | <05> | 200000449 | alm00142061.elt01.seinv05 | I:01:06 | on | 0
200000431 | alm00142061.elt01 | Disconnected_200000449_<05> | 200000448 | Disconnected_200000449_alm00142061.elt01.seinv05 | I:01:05 | off | 1

In theory, a Disconnected resource could come back; the Invariant is the unique identifier that indicates the return of the SE. The Disconnected resource attempting to return would also need a different Instance in order to return, because Element Name and Instance are required to be unique across all Sub-elements.
Scenario 2 - Instance Shift Causes Burn

Initial Discovery finds three sub-elements without Invariant and three sub-elements with Invariant.

Elt dbindex | Elt Name | Instance | SE dbindex | SE Name | Invariant | State | Missing Count
200000432 | alm00142061.elt02 | <01> | 200000437 | alm00142061.elt02.seinv01 | (none) | on | 0
200000432 | alm00142061.elt02 | <02> | 200000436 | alm00142061.elt02.seinv02 | (none) | on | 0
200000432 | alm00142061.elt02 | <03> | 200000435 | alm00142061.elt02.seinv03 | (none) | on | 0
200000432 | alm00142061.elt02 | <04> | 200000447 | alm00142061.elt02.seinv04 | I:02:04 | on | 0
200000432 | alm00142061.elt02 | <05> | 200000446 | alm00142061.elt02.seinv05 | I:02:05 | on | 0
200000432 | alm00142061.elt02 | <06> | 200000445 | alm00142061.elt02.seinv06 | I:02:06 | on | 0

During a subsequent Discovery the following occurs:

- The SE that was at Instance <03> is not found in Discovery.
- The SE with Invariant I:02:06 is found, but at Instance <03>.

Result:

- The SE that was at Instance <03> is marked missing because it is not present in Discovery.
- The SE that was at Instance <03> is Burned because it is not present in Discovery and another SE in Discovery has the same Element and Instance.
- The SE with Invariant I:02:06 is updated (Instance, Name, Label, etc.), but retains the same dbindex and Invariant.

Elt dbindex | Elt Name | Instance | SE dbindex | SE Name | Invariant | State | Missing Count
200000432 | alm00142061.elt02 | <01> | 200000437 | alm00142061.elt02.seinv01 | (none) | on | 0
200000432 | alm00142061.elt02 | <02> | 200000436 | alm00142061.elt02.seinv02 | (none) | on | 0
200000432 | alm00142061.elt02 | <03> | 200000445 | alm00142061.elt02.seinv03 | I:02:06 | on | 0
200000432 | alm00142061.elt02 | <04> | 200000447 | alm00142061.elt02.seinv04 | I:02:04 | on | 0
200000432 | alm00142061.elt02 | <05> | 200000446 | alm00142061.elt02.seinv05 | I:02:05 | on | 0
200000432 | alm00142061.elt02 | Burned_200000435_<03> | 200000435 | Burned_200000435_alm00142061.elt02.se03 | (none) | off | 1

Burned resources cannot come back, because there is no Invariant to uniquely identify the SE in Discovery as the one that was Burned. If the Burned resource attempted to come back without Invariant and with the same Instance, Inventory would reject both because of a duplication on Element Name and Instance that cannot be resolved.
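The difference between the two scenarios can be summarized in a few lines of code. The following is a minimal sketch that models only Scenario 1 and Scenario 2 as described above; it is not the product's synchronization algorithm, and it ignores the other cases covered earlier in this chapter. The function name, the dictionary keys, and the state labels are hypothetical.

```python
def classify(db_sub_elements, discovered):
    """Map each database SE dbindex to 'found', 'missing', 'disconnected' or 'burned'.

    Both arguments are lists of dicts with the keys element, instance and
    invariant (None when empty); database records also carry dbindex.
    """
    found_invariants = {se["invariant"] for se in discovered if se["invariant"]}
    # Which invariant (possibly None) now occupies each Element and Instance slot.
    occupied = {(se["element"], se["instance"]): se["invariant"] for se in discovered}

    states = {}
    for se in db_sub_elements:
        slot = (se["element"], se["instance"])
        if se["invariant"]:
            if se["invariant"] in found_invariants:
                state = "found"          # tracked by Invariant, even if it moved
            elif slot in occupied:
                state = "disconnected"   # Scenario 1: slot taken by another SE
            else:
                state = "missing"
        elif slot not in occupied:
            state = "missing"
        elif occupied[slot]:
            state = "burned"             # Scenario 2: no Invariant to track it by
        else:
            state = "found"
        states[se["dbindex"]] = state
    return states


def se(element, instance, invariant, dbindex=None):
    """Small helper to build the records used in the examples below."""
    return {"element": element, "instance": instance,
            "invariant": invariant, "dbindex": dbindex}


# Scenario 1, using the dbindex values from the first table.
elt1 = "alm00142061.elt01"
db1 = [se(elt1, "<04>", "I:01:04", 200000450),
       se(elt1, "<05>", "I:01:05", 200000449),
       se(elt1, "<06>", "I:01:06", 200000448)]
seen1 = [se(elt1, "<04>", "I:01:04"), se(elt1, "<05>", "I:01:06")]
print(classify(db1, seen1))

# Scenario 2: the SE at <03> has no Invariant and its slot is taken by I:02:06.
elt2 = "alm00142061.elt02"
db2 = [se(elt2, "<03>", None, 200000435), se(elt2, "<06>", "I:02:06", 200000445)]
seen2 = [se(elt2, "<03>", "I:02:06")]
print(classify(db2, seen2))
```

In both cases the displaced SE loses its Element and Instance slot; only the presence of an Invariant decides whether it can later be recognized again (Disconnected) or not (Burned).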
Where to Go From Here

For information on periodic administrative tasks to perform, see Chapter 3, SNMP Inventory Management on page 33.
Chapter 3: SNMP Inventory Management

This chapter explains how to manage the Tivoli Netcool Performance Manager SNMP Inventory, and is made up of the following topics:

Topic | Page
Overview | 33
Routine SNMP Inventory Management Tasks | 33
Where to Go From Here | 36

Overview

This chapter describes the monitoring actions that have to be performed regularly to keep Tivoli Netcool Performance Manager's SNMP Inventory process performing optimally.

If your SNMP Inventory is triggered every day, we recommend performing these monitoring actions at least once a week. If your SNMP Inventory is triggered several times a day (for example, every six hours), we recommend performing these monitoring actions at least once a day.

Routine SNMP Inventory Management Tasks

The following sections discuss routine SNMP Inventory management tasks:

Finding Elements and Sub-elements About to Reach Their Retry Limit

When elements and sub-elements reach their retry limit, they are retired from the Inventory, which means that Tivoli Netcool Performance Manager stops collecting on them and removes them from some groups.

Note: The retry limit is set and managed in the DataMart->Inventory Tool->Edit Profile->Synchronization Wizard. The default is three retries and a resource that is older than seven days, as shown in the following figure. BOTH limits must be met in order for a sub-element to be retired.
To find elements and sub-elements about to reach their retry limit, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.
2. Enter the following commands, noting that $PVMHOME is defined as /opt/datamart by default. The commands look for elements and sub-elements approaching their retry limit, using greater than 2 (>2) for the retry count and 3 (the default) to specify the age of the resource.

Elements

    $PVMHOME/bin/resmgr -nohead -export elt -colnames "dbindex origin name profil state missing" -filterrule "%(elt.profil) not like '%bulk%' AND %(state)='on' AND %(missing) >2"

The resmgr command returns output like the following:

    200000004 _ inventory _ skywalker.quallaby.com-1 _ lowell _ on _ 8 _
    200000024 _ inventory _ ducks.quallaby.com-1 _ lowell _ on _ 4 _
    200001383 _ inventory _ 192.168.82.8-1 _ lowell _ on _ 7 _

Sub-elements

    $PVMHOME/bin/resmgr -nohead -export se -colnames "dbindex origin name label elt.profil invariant state missing" -filterrule "%(elt.profil) not like '%bulk%' AND %(state)='on' AND %(missing) >2"

The resmgr command returns output like the following:

    200000029 _ inventory _ bbpser170-1_processid<12074> _ bbpser170-1_"bbpser170" "AMGR_visual" pid 12074 _ unix _ AMGR_visual _ on _ 3 _

Note: For more information on the resmgr command, see the Tivoli Netcool Performance Manager Command Line Interface Guide.
ACTION: If the elements or sub-elements are in the off state, then they will be retired. If they are in the on state, but are approaching both their age limit and retry limit, they are about to be retired. The commands in this section search for elements and sub-elements that are in the on state, but will soon be retired. To address this problem, do the following:

1. Fix the connectivity problem so that Tivoli Netcool Performance Manager can discover the element or sub-element about to be retired.
2. Check to see if there is a problem with the SNMP community name that is preventing Tivoli Netcool Performance Manager from talking to the resource.
3. Increase either the retry count or the resource age setting (or both) in the Synchronization Wizard.

Finding Elements and Sub-elements That Have Been Retired

As explained in Finding Elements and Sub-elements About to Reach Their Retry Limit on page 33, when an element or sub-element exceeds its Inventory Retry and Inventory Timeout limits, it is changed to the off state and retired from the Inventory.

To detect elements and sub-elements that have been retired, do the following:

1. Log in as pvuser (or the user name that you specified during installation) on the system where DataMart is installed.
2. Enter the following commands, noting that $PVMHOME is defined as /opt/datamart by default. The commands look for elements and sub-elements that have been moved to the off state.
Elements

    $PVMHOME/bin/resmgr -nohead -export elt -colnames "dbindex origin name profil state missing" -filter "state(off)"

The resmgr command returns output like the following:

    200028920 _ inventory _ Delete_200028920_192.168.1.5-1 _ lowell _ off _ 72 _

Sub-elements

    $PVMHOME/bin/resmgr -nohead -export se -colnames "dbindex origin name state missing" -filter "state(off)"

The resmgr command returns output like the following:

    200000724 _ inventory _ Delete_200000724_kafka.com-2_If<1> _ off _ 7 _
    200000880 _ inventory _ Delete_200000880_192.168.1.84-2_<NULL> _ off _ 8 _
    200000916 _ inventory _ Delete_200000916_192.168.1.84-2_If<1> _ off _ 8 _
    200001205 _ inventory _ Delete_200001205_192.168.1.84-2_If<2> _ off _ 8 _

Note: For more information on the resmgr command, see the Tivoli Netcool Performance Manager Command Line Interface Guide.

ACTION: Micromuse recommends the following Best Practices when dealing with retired elements and sub-elements:

- Group them in a special group (for example, the retired group) and periodically check its content.
- Delete them using the Resource Editor if you confirm that they have been permanently removed from the network.

Note: Deleting them does not remove their statistics from the database. It only marks them as "deleted" in the database, so that they do not overload the "retired" group, which is then kept at a manageable size.
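In the sample output, each retired resource name appears with a Delete_<dbindex>_ prefix in front of what looks like the original name. When reviewing retired resources against the network, it can be easier to read the list with that prefix removed. The following is a minimal sketch under that assumption: retired_original_names is a hypothetical helper name, the prefix format is inferred only from the samples shown here, and the " _ " field separator is likewise taken from those samples.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): reads resmgr export
# rows for retired resources on standard input and prints
# "dbindex<TAB>name" with the Delete_<dbindex>_ prefix removed.
# ASSUMPTIONS: " _ " field separator, trailing " _" on each row, and the
# Delete_<dbindex>_ naming pattern, all inferred from the sample output.
retired_original_names() {
    awk '
        {
            line = $0
            # Drop the trailing separator that ends each exported row.
            sub(/[ \t]*_[ \t]*$/, "", line)
            n = split(line, f, / _ /)
            if (n < 3) next
            name = f[3]
            prefix = "Delete_" f[1] "_"
            # Strip the prefix only when it matches this row dbindex.
            if (index(name, prefix) == 1) {
                name = substr(name, length(prefix) + 1)
            }
            print f[1] "\t" name
        }
    '
}
```

To use it, pipe the output of either resmgr command from step 2 into retired_original_names. Names that do not start with the expected prefix are printed unchanged, so the helper does not hide rows that follow a different naming pattern.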
Where to Go From Here

For information on troubleshooting tasks to perform after a new SNMP Inventory has been run, see Chapter 2, SNMP Inventory Troubleshooting on page 5.