Understanding IBM Tivoli Monitoring 6.1 Agents in a Microsoft Clustered Environment
06/01/2006

Introduction

The purpose of this document is to describe the IBM Tivoli Monitoring 6.1 agents in a Microsoft Cluster Server environment. The IBM Tivoli Monitoring 6.1 agents discussed are the Windows OS, Microsoft Exchange Server and Microsoft SQL Server agents. The document explains what it takes to get these agents to run in that environment and what behavior to expect from them there. This document assumes the reader is familiar with Microsoft Cluster Server environments. It does not describe the Microsoft Cluster Server environment or how to set up Exchange or SQL in the cluster environment. It also assumes the reader is familiar with the installation and setup of the aforementioned IBM Tivoli Monitoring 6.1 agents in a non-clustered environment, because it describes only the changes required for installation and setup in a clustered environment. IBM Tivoli Monitoring 6.1 Fix Pack 1 is a prerequisite for the cluster support, due to changes in the IBM Tivoli Monitoring 6.1 Windows OS agent and the Tivoli Enterprise Portal Server.

The Cluster Configuration

The IBM Tivoli Monitoring 6.1 cluster environment was comprised of a two-node quorum-based cluster. Applications installed in the cluster consisted of a single Exchange Virtual Server and two SQL Virtual Servers. Each SQL Virtual Server was set up with a different node as its preferred node, so that under normal conditions the two SQL Virtual Servers would not run on the same node. Each node in the cluster had an instance of the Windows OS agent, an instance of the Exchange agent and two (1) instances of the Microsoft SQL Server Agent installed. Each of the Windows OS agents was set up to be always running (auto startup).
The Exchange agent and Microsoft SQL Server Agents were set to manual startup, and the running (start/stop) of these agents was controlled by a cluster resource. The agent cluster resource was added to the Resource Group of the virtual server instance the agent was responsible for monitoring. That way, when the virtual server failed over, the agent monitoring that virtual server instance moved with it to the new (failover) node. In the following figure, each of the blue boxes represents a node in the cluster. Node 1 is the preferred node for the Exchange Virtual Server and the first SQL Virtual Server, while node 2 is the preferred node for the second SQL Virtual Server. The white boxes inside the blue boxes represent the active virtual servers and active IBM Tivoli Monitoring agents. The gray boxes represent the currently inactive IBM Tivoli Monitoring agents.

(1) For the Microsoft SQL Server Agent the number of instances is controlled by setup; only one code install is required.
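The stop-on-one-node/start-on-the-other behavior that the agent cluster resource provides can be sketched as a small model. This is illustrative only; in the real environment the Microsoft Cluster Server Generic Service resource does this work.

```python
def fail_over(resource_group, agents_in_group, old_node, new_node):
    """Toy model of a resource-group move: the cluster stops each agent
    resource in the group on the old node and starts it on the new owner,
    so only one node at a time runs the agents for that virtual server."""
    events = [f"stop {agent} on {old_node}" for agent in agents_in_group]
    events.append(f"move {resource_group} to {new_node}")
    events += [f"start {agent} on {new_node}" for agent in agents_in_group]
    return events

# The Exchange failover shown in Figure 2:
for event in fail_over("Exchange Resource Group", ["Exchange Agent"],
                       "Node 1", "Node 2"):
    print(event)
```

The key design point is that the agent is never managed directly by the administrator during failover; it simply rides along with the Resource Group it was added to.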
Figure 1: Cluster Configuration (Node 1 and Node 2, each holding an OS agent, an Exchange agent and two SQL agents; the active agents run alongside the active virtual servers)

The following figure shows that when a virtual server (in this case Exchange) moves from node 1 to node 2, the Exchange agent is stopped on node 1 and started on node 2 (as a result of the Resource Group moving).

Figure 2: Cluster Configuration after Exchange Failover

With the proper setup, described later in this document, only the active agents are visible in the portal. The following figure shows the navigation area of the system, where the cluster name is SQLCLUSTER.

Note: In addition to the agent setup, this navigation view requires a change to the Tivoli Enterprise Portal Server from IBM Tivoli Monitoring 6.1 Fix Pack 1. If this change is not applied, the agents will appear in the navigation view under the name of the cluster node on which the first agent registers.
Figure 3: Portal View of Cluster

Installing IBM Tivoli Monitoring Agents in a Microsoft Cluster Server Environment

It is assumed that the reader understands how to install and set up the IBM Tivoli Monitoring 6.1 environment. If the reader is not familiar with the basic installation and configuration of IBM Tivoli Monitoring 6.1 and these agents, please refer to the appropriate IBM Tivoli Monitoring 6.1 documentation. This document concentrates on the unique steps required to install and set up IBM Tivoli Monitoring 6.1 agents in a Microsoft Cluster Server environment. For readers familiar with installing and setting up these IBM Tivoli Monitoring 6.1 agents, only three additional steps are required for the cluster environment:

1. setting CTIRA_HOSTNAME to a common value for all agents (usually the cluster name)
2. setting CTIRA_HIST_DIR (2) to a common disk location if history is stored at the TEMA
3. creating an agent cluster resource in the Resource Group of the Virtual Server (3)

On Windows, IBM Tivoli Monitoring 6.1 requires agents to be installed in the same directory path as the OS agent; therefore each node in a cluster must have all agents installed (on the node's system disk) that are required to support the cluster applications that can possibly run on that node.

(2) If history is stored at the Tivoli Enterprise Monitoring Server, setting CTIRA_HIST_DIR is not required. Be aware that storing history at the Tivoli Enterprise Monitoring Server puts a higher burden on that server.
(3) These steps are not required for the Windows OS agent, since it actively runs on all nodes in the cluster.

Setting up the IBM Tivoli Monitoring 6.1 Windows OS Agent

The IBM Tivoli Monitoring 6.1 Windows OS agent should be installed on every node in the cluster. There is no unique setup for the IBM Tivoli Monitoring 6.1 Windows OS agent in a Microsoft Cluster Server environment, since it behaves the same way as it does in a non-clustered environment. One problem related to monitoring disk resources that caused the OS agent to abend was discovered during testing and fixed by IBM Tivoli Monitoring 6.1 Fix Pack 1.

Setting up the IBM Tivoli Monitoring 6.1 Microsoft Exchange Agent

The IBM Tivoli Monitoring 6.1 Microsoft Exchange agent should be installed and configured (4) on each node in the cluster where it is possible for the Microsoft Exchange virtual server to run. Each instance of the Microsoft Exchange agent must be configured with a CTIRA_HOSTNAME (5). It is suggested that the user set this environment variable to the name of the Microsoft Cluster Server cluster. Setting this variable to the cluster name allows the user to navigate in the IBM Tivoli Monitoring 6.1 Portal to a Windows system name that matches the cluster name (6), a much easier way to find the agents in the IBM Tivoli Monitoring 6.1 Portal.

NOTE: When deciding on the value of CTIRA_HOSTNAME, remember that the managed system name is comprised of three parts, CTIRA_SUBSYSTEM_ID, CTIRA_HOSTNAME and CTIRA_NODETYPE, and is limited to 31 characters. By default the Exchange CTIRA_NODETYPE is set to EX and CTIRA_SUBSYSTEM_ID (7) is not used.

If history for the Microsoft Exchange agent is configured to be stored at the TEMA (the agent), each instance of the Microsoft Exchange agent must be configured with a common CTIRA_HIST_DIR that points to a shared disk (8) directory. That way the history file can be maintained from whichever node is running the Exchange agent.
When the agent follows the Microsoft Exchange Virtual Server during failover, the agent can store historical data to that common location and the history data will be contiguous. The agent environment variables can be set using the Manage Tivoli Enterprise Monitoring Services tool (agent config tool), pictured in the figure below. Select the Exchange Agent (right click), select Advanced, and then select Edit Variables.

(4) Do not allow the agent to connect to the Tivoli Enterprise Monitoring Server/Tivoli Enterprise Portal Server before CTIRA_HOSTNAME is set up. If the connection is made with the node's hostname, that managed system will appear in the portal navigation and can be removed manually when CTIRA_HOSTNAME is set appropriately.
(5) In order for the history views in the ITM 6.1 Portal to work properly, the hostname must be the same for each node that monitors a specific instance of the Exchange Virtual Server.
(6) For navigation in the portal to work properly, the Tivoli Enterprise Portal Server must be at the IBM Tivoli Monitoring 6.1 Fix Pack 1 level.
(7) CTIRA_SUBSYSTEM_ID may be required to distinguish instances of the Exchange agent if the cluster supports more than one instance of the Exchange Virtual Server.
(8) The likely candidate for the shared disk is a disk owned by the Microsoft Exchange Virtual Server.
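The 31-character limit on the managed system name, described in the note above, can be checked mechanically before configuring the agents. The sketch below assumes the managed system name is the colon-separated join of its non-empty parts; the helper function is illustrative, not part of the product.

```python
def managed_system_name(hostname, nodetype, subsystem_id=""):
    """Join the non-empty parts into a managed system name and enforce
    the 31-character limit on the result (an assumption of this sketch,
    matching the three-part format described in the text)."""
    name = ":".join(p for p in (subsystem_id, hostname, nodetype) if p)
    if len(name) > 31:
        raise ValueError(f"managed system name exceeds 31 chars: {name!r}")
    return name

# Exchange defaults: CTIRA_NODETYPE is EX, CTIRA_SUBSYSTEM_ID unused.
print(managed_system_name("SQLCLUSTER", "EX"))  # SQLCLUSTER:EX
```

A long cluster name combined with a virtual server name in CTIRA_SUBSYSTEM_ID can exceed the limit, so it is worth checking candidate names before configuring each agent instance.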
Figure 4: Agent Config Tool

First click Add, then enter CTIRA_HOSTNAME or select CTIRA_HOSTNAME (9) from the list of variables, and then specify the cluster name (in this example SQLCLUSTER).

Figure 5: Agent Config Setting CTIRA_HOSTNAME

(9) The TYPE=REG_EXPAND_SZ is required because of a limitation in the agent config tool when creating the registry entry to store the value. The agent config tool currently does not delete the default CTIRA_HOSTNAME registry entry, and Windows will not allow a change in type. This problem occurs because the Microsoft Exchange agent has a default CTIRA_HOSTNAME entry.
Next click Add again, then enter CTIRA_HIST_DIR or select CTIRA_HIST_DIR from the list of variables, and then set it to the directory on a shared disk where you want the agent to store history data.

Figure 6: Agent Config Setting CTIRA_HIST_DIR

The resulting configuration should look something like the figure below.

Figure 7: Agent Config Setting
Next set the startup parameter to manual to allow the cluster resource (cluster resource creation is described below) to control the starting and stopping of the agent. Select the Exchange Agent (right click) and then select Change Startup.

Figure 8: Agent Config Setting Change Startup

Select Manual and click OK.

Figure 9: Agent Config Setting Change Startup Manual

Once you have configured each instance of the agent, you need to set up a cluster resource to control the Microsoft Exchange agent. This resource controls the starting and stopping of the agent.
This resource should be associated with the cluster Resource Group that controls the instance of the Microsoft Exchange Virtual Server. Using the Microsoft Cluster Server cluster administration tool, select the instance of the Microsoft Exchange Virtual Server Resource Group under which you wish to create the IBM Tivoli Monitoring 6.1 Microsoft Exchange agent resource. Select New Resource and specify the name of the IBM Tivoli Monitoring 6.1 Microsoft Exchange agent resource, a description, a resource type of Generic Service, and the Resource Group that controls the Microsoft Exchange Virtual Server.

Figure 10: Exchange Agent Cluster Resource Setup

Take the defaults for possible owners to match those of the Exchange Virtual Server.
Figure 11: Exchange Agent Cluster Resource Setup -- Possible Owners

If the Exchange disk was specified as the disk on which to store agent history, specify that disk resource as a dependency. If the disk resource does not come online, the agent resource will not come online either.

Figure 12: Exchange Agent Cluster Resource Setup -- Disk Dependency

Next, specify the Exchange Agent service name. There are no start parameters for the Exchange Agent.
Figure 13: Exchange Agent Cluster Resource Setup -- Service Name

There are no registry keys, so click Finish and the Exchange cluster agent resource is complete.

Figure 14: Exchange Agent Cluster Resource Setup -- Finished

After the Exchange cluster agent resource is complete, select the resource and its properties. Take the Advanced tab's defaults, except uncheck the Affect the Group checkbox so that an agent failure does not affect Microsoft Exchange Virtual Server Resource Group failover.
Figure 15: Exchange Agent Cluster Resource Advanced Setup

Note: Once control of the agent cluster resource is given to the cluster server, the agent cluster resource has to be taken offline in order to make configuration changes or edit the agent's variables on the node that owns (runs) the agent cluster resource. This is because, to change its configuration parameters, the IBM Tivoli Monitoring agent must be offline. If the agent cluster resource is not offline when the agent config utility takes the agent offline, the cluster server will notice that the agent went offline and attempt to bring it back online. When done with configuration changes for the agent, do not forget to bring the agent cluster resource back online.

Setting up the IBM Tivoli Monitoring 6.1 Microsoft SQL Server Agent

The cluster-unique steps for setting up the IBM Tivoli Monitoring 6.1 Microsoft SQL Server Agent are very similar to those described above for the IBM Tivoli Monitoring 6.1 Microsoft Exchange agent, so detailed screen captures are not provided for the Microsoft SQL Server Agent setup. The IBM Tivoli Monitoring 6.1 Microsoft SQL Server Agent should be installed on each node in the cluster where it is possible for the Microsoft SQL Virtual Servers to run. Since there can be multiple instances of the Microsoft SQL Server Agent, each instance must be configured with a CTIRA_HOSTNAME. It is strongly suggested that the user set the CTIRA_HOSTNAME environment variable to the name of the Microsoft Cluster Server cluster for all agents running in that cluster. Setting CTIRA_HOSTNAME to the same name for all agents in the cluster allows the user to navigate to all the agents for that cluster in the IBM Tivoli Monitoring 6.1 Portal under a Windows system name that matches the cluster name (6).
NOTE: When deciding on the value of CTIRA_HOSTNAME, remember that the managed system name is comprised of three parts, CTIRA_SUBSYSTEM_ID, CTIRA_HOSTNAME and CTIRA_NODETYPE, and is limited to 31 characters. By default, for the Microsoft SQL Server Agent, CTIRA_NODETYPE is set to MSS and CTIRA_SUBSYSTEM_ID is set to the Microsoft SQL Virtual Server name. The CTIRA_SUBSYSTEM_ID is used to distinguish the multiple instances of the Microsoft SQL Server Agent.
If history for the Microsoft SQL Server Agent is configured to be stored at the TEMA (the agent), each instance of the Microsoft SQL Server Agent must be configured with a common CTIRA_HIST_DIR that points to a shared disk directory. Each Microsoft SQL Server Agent's startup parameter should be set to manual to allow the cluster resource to control the starting and stopping of the Microsoft SQL Server Agent. Once these parameters are set for each Microsoft SQL Server Agent instance, cluster resources to control the Microsoft SQL Server Agents must be created. Each Microsoft SQL Server Agent is comprised of two Windows services, KOQAGENTx and KOQCOLLx, where x is the agent instance number. The easiest way to see the relationship is to look at the Windows services. From the test environment the four service names were:

Monitoring Agent for Microsoft SQL Server SQLTEST
Monitoring Agent for Microsoft SQL Server - Collector SQLTEST
Monitoring Agent for Microsoft SQL Server SQLTEST2
Monitoring Agent for Microsoft SQL Server - Collector SQLTEST2

The following figure shows the two services for the SQLTEST instance of the Microsoft SQL Server Agent.
Figure 16: SQLTEST agent services

Note that the service names are KOQAGENT0 and KOQCOLL0 and the start parameter is -Hkey KOQ\610\SQLTEST; these are required for the generic service parameters step of the resource creation. The steps for creating the resources are:

1. From Start --> Administrative Tools --> Cluster Administrator
2. Select the group for the instance being worked on --> SQLTEST
3. Right click --> New --> Resource. Name: KOQAGENT0; Resource Type: Generic Service; Group: SQLTEST
4. Take the default of all available nodes
5. Add any dependencies on the history disk
6. Service name: KOQAGENT0; Start Parameters: -Hkey KOQ\610\SQLTEST; Finish
7. Select the Advanced tab, and then uncheck Affect the Group
8. Right click --> New --> Resource. Name: KOQCOLL0; Resource Type: Generic Service; Group: SQLTEST
9. Take the default of all available nodes
10. Add any dependencies on the history disk
11. Service name: KOQCOLL0; Start Parameters: -Hkey KOQ\610\SQLTEST; Finish
12. Select the Advanced tab, and then uncheck Affect the Group
13. Bring the two agent resources online

Repeat these same steps for the other agent instances; in the test environment: SQLTEST2, with the KOQAGENT1 and KOQCOLL1 services.
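The per-instance naming convention used in the steps above can be captured in a short helper. This is an illustrative sketch; the service names and registry key path follow the pattern described in the text, and the helper itself is not part of the product.

```python
def sql_agent_resources(instance_number, instance_key):
    """Return (service name, start parameters) pairs for the two
    generic-service cluster resources one Microsoft SQL Server Agent
    instance needs: the agent service and the collector service."""
    start_parameters = f"-Hkey KOQ\\610\\{instance_key}"
    return [
        (f"KOQAGENT{instance_number}", start_parameters),  # agent service
        (f"KOQCOLL{instance_number}", start_parameters),   # collector service
    ]

# The two instances from the test environment:
for number, key in ((0, "SQLTEST"), (1, "SQLTEST2")):
    for service, params in sql_agent_resources(number, key):
        print(service, params)
```

Deriving both resource definitions from the instance number and key makes it harder to mismatch a service name and its start parameters when repeating the steps for additional instances.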
Note: Once control of the agent cluster resource is given to the cluster server, the agent cluster resource has to be taken offline in order to make configuration changes or edit the agent's variables on the node that owns (runs) the agent cluster resource. This is because, to change its configuration parameters, the IBM Tivoli Monitoring agent must be offline. If the agent cluster resource is not offline when the agent config utility takes the agent offline, the cluster server will notice that the agent went offline and attempt to bring it back online. When done with configuration changes for the agent, do not forget to bring the agent cluster resource back online.

Windows OS Agent

The OS agent monitors information that is both affected (shared disks, processes, ...) and not affected (memory, CPU, ...) by cluster resources as the resources fail over from node to node in the cluster. Therefore the OS agent actively runs on all nodes in the cluster. The IBM Tivoli Monitoring 6.1 agent was not modified to distinguish between cluster-affected and non-affected resources. OS agent history for those attributes that can move from node to node is only maintained for the time that the node owns the resource. Resources not currently owned by the node may not show at all, or may show with values of zero; this depends on how the OS software represents the information. In most cases the information is not shown on the node that does not own the resource. The physical disk attributes are an example of a monitored resource: the node that does not own the resource shows the disk but reports the attribute values as zero, while the logical disk information and attributes are only shown by the owning node. It should be noted that when a logical disk fails over, the system interface requires a finite amount of time to discover this, and therefore so does the agent. A problem was found during test and fixed in IBM Tivoli Monitoring 6.1 Fix Pack 1.
This problem is related to the movement of a logical disk. If the move takes long enough and the agent tries to sample disk data (specifically TotalSizeMB) while the move is in progress, the agent would terminate. With this fix the agent no longer terminates, and it returns zero for TotalSizeMB when this error path is taken.

How to create situations for Cluster Server entries in the Windows system log

Use the Windows OS agent to monitor the system log for cluster service entries. Specify the following values:
1. The Attribute Group equal to nt_event_log
2. Attribute Item: Log Name (Unicode) equal to System (case sensitive)
3. Attribute Item: Source equal to the source of the log entry, for example ClusSvc
4. Attribute Item: Category equal to the category of the log entry, for example Failover Mgr
5. Attribute Item: Event ID equal to the desired cluster event ID, for example:
a. 1201 - the cluster service brought the resource group online
b. 1204 - the cluster service brought the resource group offline

Exchange Agent

This test used Exchange Server 2003 for the Exchange virtual server. Since the test cluster environment was comprised of only a two-node cluster, only a single instance of the Exchange virtual server was set up. Microsoft states that an Active-Active configuration of Exchange virtual servers is supported only on a two-node cluster. This means that when one node fails, the other node must have the capabilities to run both virtual servers. The Exchange Agent doesn't support two instances of Exchange running on the same system, so an Active-Active configuration is not supported.
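The event-log situation described above (System log, ClusSvc source, event IDs 1201 and 1204) amounts to a conjunction of attribute tests. A minimal sketch of that predicate follows; the dictionary field names are illustrative stand-ins, not the agent's internal attribute names.

```python
def matches_cluster_situation(record, event_ids=(1201, 1204)):
    """True when an event-log record satisfies the situation predicate:
    System log (case sensitive), ClusSvc source, and a watched event ID."""
    return (record["log_name"] == "System"
            and record["source"] == "ClusSvc"
            and record["event_id"] in event_ids)

events = [
    {"log_name": "System", "source": "ClusSvc", "event_id": 1201},
    {"log_name": "System", "source": "ClusSvc", "event_id": 1000},
    {"log_name": "Application", "source": "MSSQLSERVER", "event_id": 1201},
]
print([matches_cluster_situation(e) for e in events])  # [True, False, False]
```

Note that the log-name comparison is case sensitive, mirroring the situation definition: a record whose log name is "system" rather than "System" would not match.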
As part of the testing it was observed that when the Exchange Resource Group was moved from node to node, the server down situation event would fire. This is caused by the fact that the Exchange agent comes online faster than the Exchange server. When the Exchange server comes online, the event clears. Take Action testing was not done for the Exchange agent because the Exchange agent only supports starting and stopping the Exchange Server services, and this is the responsibility of the cluster server. Users should not use the Exchange Agent Start and Stop Take Actions, since they conflict with the actions taken by the cluster server.

Microsoft SQL Server Agent

This test used SQL Server 2000 Enterprise Edition for both instances of the SQL virtual server. Since both the SQL server and the Microsoft SQL Server Agent support multiple instances running on the same node, two SQL virtual servers were used in the test. The failover part of the test verified that one agent would not interfere with the other while running on the same node. As part of the testing it was observed that when the SQL Resource Group was moved from node to node, the server down situation event would fire. This is caused by the fact that the Microsoft SQL Server Agent comes online faster than the SQL server. When the SQL server comes online, the event clears. Users should not use the Microsoft SQL Server Agent Start and Stop Take Actions, since they conflict with the actions taken by the cluster server.

IBM Tivoli Monitoring 6.1 Portal Navigation

With the change to CTIRA_HOSTNAME and the Tivoli Enterprise Portal Server change in IBM Tivoli Monitoring 6.1 Fix Pack 1, navigation to the agents is easier because the cluster agents appear in the navigation under the cluster name (SQLCLUSTER), while the Windows OS agents appear under the cluster node names (TIVVM13 and TIVVM14).
Figure 17: Cluster View

Without the Tivoli Enterprise Portal Server change, the Tivoli Enterprise Portal Server places the agent under the node on which it first registers the agent, even if the agent is moved to another node in the cluster. The example below shows the Microsoft Exchange Server agent's location in the navigator tree if the Tivoli Enterprise Portal Server change was not applied.

Figure 18: Exchange Agent Navigator Tree Location Without the Portal Server Change

Clearing Old System Names from IBM Tivoli Monitoring 6.1 Portal Navigation

If the agent appears in the IBM Tivoli Monitoring 6.1 Portal before CTIRA_HOSTNAME is set to the cluster name, or before the Tivoli Enterprise Portal Server change is applied, the agent will appear in the portal under the node it is running on. Once CTIRA_HOSTNAME is set and the Tivoli Enterprise Portal Server change is applied, the old agent instance must be manually cleared. To clean this up, remove the old instance (the agent whose Managed System Name includes the node name). Navigate to the Enterprise Managed System Status view.
Figure 19: Navigation to Managed Systems

Select the old instance and clear the offline entry.

Figure 20: Clearing Managed Systems

Conclusion

For the IBM Tivoli Monitoring 6.1 Windows OS agent, Microsoft Exchange Server agent and Microsoft SQL Server agent, no significant changes were required to get these agents to work in a simple failover environment. The basic changes are: a change to the Tivoli Enterprise Portal Server, a fix for the Windows OS agent, setting two environment variables for each of the Exchange and Microsoft SQL Server Agents, and creating a cluster resource to control failover for each agent. The changes to the Tivoli Enterprise Portal Server and the fix for the Windows OS agent that enable Microsoft Cluster Server support have been made available in IBM Tivoli Monitoring 6.1 Fix Pack 1.