SPI for MS Active Directory Replication Monitoring
How It Works

Introduction

The HP OpenView SMART Plug-In (SPI) for Microsoft Active Directory is a critical add-on to any Windows 2000 or Windows Server 2003 management infrastructure. It plays an essential role in monitoring and raising alarms on directory health and replication health issues. It can also be used to monitor trust health in heterogeneous UNIX and Windows environments.

This document describes how the SPI policies determine replication latency. The reader should be familiar with directory service concepts; knowledge of Active Directory is also beneficial.

Copyright 2004
Policies

Active Directory relies on regular replication of directory objects to keep its domains healthy. When modifications are made to the directory service, the Windows domain controllers (DCs) propagate those changes to all other domain controllers. The exceptions are tasks that require an Operations Master (FSMO role). One way the SPI for Active Directory ensures a healthy directory is by measuring the time it takes for a change to appear on all domain controllers in the domain.

When the SPI for Active Directory manages a domain controller, it deploys discovery policies. The ADSPI-AutoDiscovery_Rep service auto-discovery policy resides on the managed node and runs every day at 2:00am by default. This policy runs the auto-discovery program for replication monitoring, augmenting the service map created by the WinOS SPI with two new service IDs: Replication and Sysvol.

[Figure: Replication auto-discovery policy.]
[Figure: New services in Service Map.]

Other replication policies, located in a different folder, are needed for generating messages and reporting. These policies are also deployed once the services are discovered. The policies included in the Replication Monitoring folder are:

    ADSPI-Rep_CheckObj
    ADSPI-Rep_InboundObjs
    ADSPI-Rep_ISM_Chk
    ADSPI-Rep_Modify_User_Object
    ADSPI-Rep_ModifyObj
    ADSPI-Rep_TimeSync
    ADSPI-Rep_Mon

The number of inbound connection objects is an important metric to measure. A high value is an indication that a bridgehead is becoming overloaded and that a failure has occurred: it is typical of a large number of DCs having retargeted their requests because of a bridgehead failure. The ADSPI-Rep_InboundObjs policy measures the DRA Inbound Objects/sec counter. By default it generates messages at thresholds of 30 (warning) and 50 (error) objects. These thresholds may be changed to suit customer requirements once the performance of the AD environment has been optimized.

The measurement threshold policy ADSPI-Rep_ISM_Chk monitors the status of the Intersite Messaging (ISM) service. It checks whether the service is running and how many processes of this service are running. If the service does not run properly, intersite replication might have problems and the KCC will be unable to calculate the replication topology. Every 12 minutes this policy checks the service and issues a warning message if it is in any state other than Running. It generates an error message if the service is not in the Running state and there are zero processes.

[Figure: View of Windows Server 2003 services.]

Windows 2000 (Win2K) and Windows Server 2003 use a time service, the Windows Time service (W32Time), to ensure that all such computers on a network use a common time. The Windows default authentication protocol requires the service. In Windows domains, time synchronization is crucial because Kerberos uses workstation time as part of the authentication process. The ADSPI-Rep_TimeSync policy measures the delta between the time master and the local host. If the delta exceeds a given threshold, an alarm is sent to the System Console. The policy runs at a 5-minute interval and launches the program ADSPI_TimeSynch.exe.
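The three checks described above (inbound object rate, ISM service state, and time delta) can be sketched in Python as follows. This is only an illustration of the decision logic, not the SPI's implementation: the function names are hypothetical, the sketch assumes that a reading equal to a threshold already triggers it, and the time synchronization threshold is passed in as a parameter because the text does not state its default value.

```python
from datetime import datetime, timedelta

# Default thresholds quoted in the text for the inbound objects counter.
INBOUND_WARNING = 30
INBOUND_ERROR = 50


def inbound_severity(objects_per_sec):
    """Map an inbound objects/sec reading to a message severity (or None)."""
    # Assumption: a reading equal to a threshold already triggers it.
    if objects_per_sec >= INBOUND_ERROR:
        return "error"
    if objects_per_sec >= INBOUND_WARNING:
        return "warning"
    return None


def ism_severity(state, process_count):
    """Severity for the Intersite Messaging (ISM) service check."""
    if state == "Running":
        return None
    # Not Running with zero processes is the error case; any other
    # non-Running state only produces a warning.
    return "error" if process_count == 0 else "warning"


def time_sync_alarm(master_time, local_time, max_delta):
    """True when the clock difference exceeds max_delta.

    max_delta is a hypothetical parameter; the text gives no default.
    """
    return abs(master_time - local_time) > max_delta
```

For instance, `ism_severity("Stopped", 0)` yields an error while `ism_severity("Paused", 1)` yields only a warning, which mirrors the two message levels described for ADSPI-Rep_ISM_Chk.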
Algorithm

The SPI creates two kinds of objects to monitor replication: OvReplication containers and OvReplication user objects. These objects are used separately for replication monitoring. The OvReplication user object is created by the ADSPI-Rep_Modify_User_Object scheduled task policy and the ADSPI-Rep_GC_Check_and_Threshold policy (located in the GC Monitoring policy group folder) for tracking global catalog replication. These objects are named OvReplication-<hostname>.

[Figure: OvReplication user objects as seen in the Active Directory Users and Computers window.]
There are two policies that run on every domain controller in the enterprise. One policy (ADSPI-Rep_ModifyObj) is responsible for creating a container in the (NTDS) directory that represents the local DC. This object is the OvReplication container, and it is created only the first time this scheduled policy runs. The policy also writes the current date and time to an attribute of the OvReplication container each time it runs, not each time replication occurs. It is a scheduled policy that runs every 15 minutes by default. The container is then replicated throughout the enterprise (via AD), and the SPI can identify when the last replication from each DC occurred. The OvReplication container is found under the server objects in Active Directory Sites and Services.

[Figure: Example of an OvReplication container.]

A second policy (ADSPI-Rep_Mon) queries all the OvReplication containers, reads the date and time attribute, calculates how long ago replication from each other DC was last seen, and applies a threshold to this value.

For example, given two domain controllers, DC1 and DC2, that replicate every 15 minutes:

9:50
    ADSPI-Rep_ModifyObj
        DC1: Scheduled policy deployed to run every hour.
        DC2: Scheduled policy deployed to run every hour.
    ADSPI-Rep_Mon
        DC1: Measurement threshold policy deployed. Does not find an OvReplication container for any domain controller.
        DC2: Measurement threshold policy deployed. Does not find an OvReplication container for any domain controller.
10:00
    DC1: Creates OvReplication container and sets attribute to 10:00am.
    DC2: Creates OvReplication container and sets attribute to 10:00am.
    Scheduled AD replication.
10:15
    Scheduled AD replication.
10:30
    Scheduled AD replication.
10:45
    Scheduled AD replication.
11:00
    DC1: Policy finds OvReplication_DC2, reads attribute 10:00am, calculates the delta as current time minus time attribute (11:00am - 10:00am = 1 hour), and applies the threshold (replication has occurred within the last hour).
    DC2: Policy finds OvReplication_DC1, reads attribute 10:00am, calculates the delta as current time minus time attribute (11:00am - 10:00am = 1 hour), and applies the threshold (replication has occurred within the last hour).
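The delta calculation in the 11:00 step of the example can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the SPI's code: the function names, the dictionary of timestamps standing in for the replicated OvReplication container attributes, the calendar date, and the threshold values are all hypothetical.

```python
from datetime import datetime, timedelta


def replication_deltas(now, stamps, local_dc):
    """Return {dc: now - stamp} for every DC other than the local one.

    stamps maps a DC name to the date/time attribute read from that DC's
    replicated OvReplication container (a stand-in for the directory query).
    """
    return {dc: now - stamp for dc, stamp in stamps.items() if dc != local_dc}


def late_partners(deltas, threshold):
    """Names of DCs whose last-seen timestamp is older than threshold."""
    return sorted(dc for dc, delta in deltas.items() if delta > threshold)


# The 11:00 step as seen from DC1 (the date itself is arbitrary).
stamps = {
    "DC1": datetime(2004, 1, 1, 10, 0),
    "DC2": datetime(2004, 1, 1, 10, 0),
}
now = datetime(2004, 1, 1, 11, 0)

deltas = replication_deltas(now, stamps, local_dc="DC1")
print(deltas["DC2"])                                  # 1:00:00
print(late_partners(deltas, timedelta(hours=1)))      # []
print(late_partners(deltas, timedelta(minutes=45)))   # ['DC2']
```

Because the timestamp is rewritten only when the modifying policy runs, the delta measures the age of the last timestamp that arrived, so a stalled link shows up as a delta that keeps growing on every run.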
Note

The process described above takes place on both intersite and intrasite domain controllers.

The ADSPI-Rep_CheckObj policy is a measurement threshold policy that polls domain controllers daily to find the OvReplication container and issues a message if it is missing. If a link is broken or a connection object is deleted, the delta will increase until a threshold is violated and a status message is sent to the OpenView management console. You can view the link status by using the Active Directory Topology Viewer (ADTV) included with the SPI for MS Active Directory.

Conclusion

The OvReplication user objects are the same as any other object created by a system administrator, but they are different from the OvReplication containers. The SPI places very little resource demand on the domain controllers because it uses normal Active Directory replication. It is a smart way to see which bridgehead domain controllers are not replicating, for example because of a large group update. You can then use the ldp.exe or dnslint utility to diagnose the problem further.