Token-ring local area network management

by BARBARA J. DON CARLOS
IBM Corporation
Research Triangle Park, North Carolina

ABSTRACT

This paper describes an architecture for managing a token-ring local area network, possibly consisting of several token rings joined by MAC-layer bridges in a network containing several communication subsystems. A four-tier network management hierarchy, consisting of stations, management servers, a token-ring LAN manager, and a communications network manager, is presented. Stations participate in the management of the LAN by monitoring themselves and their neighboring stations. Management servers collect configuration and error reports from stations on a single ring segment. The LAN manager can coordinate the activities of the different management servers on its local ring and on rings other than its own, but within one local area network. If the token-ring local area network is part of a larger communications network, possibly a wide-area network, another tier, the communications network manager, is required to provide network-wide management capabilities. An example network configuration is presented in this paper and various scenarios involving the management of that network are described.
INTRODUCTION

The management of token-ring local area networks (LANs) involves informing management entities of errors and configuration changes in the network, honoring requests for status, and changing the state of stations attached to the token ring. Information is collected and distributed from centralized entities to allow for a single point of control from which a human operator could monitor and manage the network.

A hierarchical management architecture is defined to manage stations and connections in this distributed environment. This hierarchy consists of token-ring LAN stations, token-ring management servers, a token-ring LAN manager, and, where the token-ring LAN is part of a larger network, a communications network manager. Figure 1 summarizes the management hierarchy for the token ring.

Figure 1-Token-ring management hierarchy

The token-ring LAN architecture, a specification of the physical layer and the lower half of the data-link control layer (called the medium access control sublayer), provides sophisticated network management functions. This architecture is defined in the IEEE 802.5 Standard [1]. The token-ring medium-access-control protocol provides management messages to change the configuration of stations attached to the ring, change the state of those stations, and report errors from those stations. These management activities are directed and coordinated by management servers.

Three management servers are defined, each performing a different management activity for stations attached to a single token ring. In a local area network consisting of many token rings connected by bridges, each type of management server can be located on each ring to ensure that all aspects of each ring in the LAN are managed. Control and coordination of the activities of the management servers is provided by a token-ring LAN manager.
A token-ring LAN manager's responsibility extends across the token rings monitored by the management servers under its control.

If the token-ring network exists as a part of a larger communications network, possibly including other types of communication subsystems, management of the token ring is included as part of the communications management hierarchy for the parent communications network. Management functions specific to the token ring can be provided by the token-ring LAN manager, while configuration and fault information critical to the connectivity within the entire network can be reported to a higher management entity called the communications network manager.

The concept of a hierarchical management architecture is very useful when several communication subsystems are to be managed from a central location. That is, each subsystem is managed to the extent possible by its subsystem manager; information about configuration and unresolved errors is forwarded to a centralized management facility. The centralized management facility could provide further fault diagnosis, maintain configuration information, and supply information about the health of the network to an operator.

TOKEN-RING LAN ARCHITECTURE OVERVIEW

A token-ring LAN consists of stations connected sequentially by a series of point-to-point physical links to form a ring. Each station is provided fair access to the shared transmission medium through the use of a special bit pattern, called a token. Only one token is present on the ring at any time and it is passed from station to station around the ring. When a station with data to transmit receives (captures) the token, it sends the data it is holding in a frame. The frame consists of a header, the data, and a trailer. The header of the frame contains control information, including the destination address for the information.
All other stations on the ring (not using the token) listen to the traffic on the ring while repeating all frames to their downstream neighbors. When a station recognizes its address in the destination address field of a frame header, it copies the frame from the ring. The station also repeats the frame, which propagates around the ring. The sending station removes its frame from the ring. When it finishes transmitting the frame and receives the header of the frame it sent (from around the ring), the sending station transmits a new token on the ring for use by the first downstream station with data to transmit. In this way, each station obtains fair, deterministic access to the transmission media of the ring.
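
The token-passing access scheme described above can be illustrated with a minimal sketch. It assumes that a station sends exactly one frame each time it captures the token, which the text does not state; the function name and the queue representation are hypothetical and chosen only for illustration.

```python
from collections import deque


def simulate_token_ring(queues, rotations):
    """Simulate token passing on a single ring.

    queues: one deque per station, in ring order; each entry is a
            (destination_index, payload) pair waiting to be sent.
    rotations: number of full token rotations to simulate.
    Returns (sender, destination, payload) tuples in transmission order.
    """
    n = len(queues)
    delivered = []
    holder = 0  # index of the station currently receiving the token
    for _ in range(rotations * n):
        if queues[holder]:
            # Assumption: one frame per token capture.
            dest, payload = queues[holder].popleft()
            # The frame circles the ring, the destination copies it,
            # and the sender removes it when it returns.
            delivered.append((holder, dest, payload))
        # The token is released to the next downstream station.
        holder = (holder + 1) % n
    return delivered


# Station 0 has two frames queued, station 2 has one, station 1 is idle.
queues = [deque([(2, "a1"), (2, "a2")]), deque(), deque([(0, "c1")])]
log = simulate_token_ring(queues, rotations=2)
```

In this run station 0 cannot send its second frame until the token has visited stations 1 and 2, which is the sense in which access is fair and deterministic.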
426 National Computer Conference, 1987

Attaching to the Ring

When a station attaches to the token ring, it registers its addressing information, and identifies the product it attaches to the token ring, with a management server called the ring parameter server (described below). The ring parameter server, if present, responds by sending a frame containing the values for operational parameters in use by stations attached to the ring. If the ring parameter server is not attached to the ring, the attaching station uses the default values for its operational parameters.

Active Monitor Function

One station attached to each ring, called the active monitor, monitors the token on the ring. Any station attached to the token ring can provide the active monitor functions, though only one does so at a time. The token-monitoring functions of the active monitor are necessary to ensure that a usable token is always present on the ring. If the active monitor does not recognize a token on the ring for a specified (relatively short) period of time, it purges the ring. During the purge process, all of the stations attached to the ring are reset. The other stations attached to the ring serve as standby monitors that watch the activities of the active monitor and take over in case of failure. Active monitor errors are reported to a management server called the ring error monitor (described below).

When a station becomes the active monitor (at initialization time or after a monitor error has occurred), it registers with a management server called the configuration report server (described below). The active monitor also initiates a periodic poll, called the neighbor-notification process, in which stations identify themselves to their (downstream) neighboring station. The neighbor-notification process enables stations to isolate faults by reporting errors between them and their nearest active upstream neighbor on the ring.
It also serves to notify stations that an active monitor is present on the ring. During the neighbor-notification process, the active monitor transmits a special identification frame that identifies itself to its downstream neighbor. The downstream neighbor then identifies itself to its own downstream neighbor. This process continues around the ring until it reaches the active monitor again. Then, after a predetermined time, the active monitor initiates the process again.

If a station detects a new upstream neighbor during the neighbor-notification process, it reports the new station address to the configuration report server (described below). Note that the upstream station could have changed because a new station attached to the ring or because the old upstream station left the ring.

Error Detection and Reporting

Two types of errors are detected by token-ring stations: hard errors and soft errors. Hard errors are faults that preclude the operation of a token ring within the normal token-ring LAN protocols. Hard errors detected by token-ring stations include a broken ring and a continually transmitting station.

If a station detects a hard error, it broadcasts a special frame, called a "Beacon" frame. The Beacon frame contains the address of the nearest active upstream neighbor of the frame transmitter. All stations attached to the ring receive the Beacon frame; its destination address is defined to be the all-station group address. Upon receiving the Beacon frame, the station identified as the sender's nearest active upstream neighbor removes itself from the ring and executes a test to determine whether it and its attachment to the ring (lobe) are functioning properly. If an error is detected, the station remains out of the ring, thereby bypassing the fault. Otherwise, it reattaches to the ring. After a predetermined amount of time, the Beacon frame transmitter removes itself from the ring and performs the same test that its upstream neighbor did.
Again, if this station determines that it or its lobe is not operating correctly, it remains out of the ring and the error will have been bypassed. In this manner, many hard errors can be detected and automatically bypassed without interaction from users of stations attached to the token ring. However, if the hard error is still present after both stations have had an opportunity to test themselves and their lobes, the token-ring LAN manager is notified and manual intervention is necessary to recover the token-ring operation.

Soft errors are faults that temporarily degrade the token-ring performance; they are tolerated by the use of error-recovery procedures. Soft errors detected by token-ring stations include cyclic redundancy check (CRC) errors in received and repeated frames, and station-congestion errors (a station recognized a frame as being addressed to it, but could not copy it due to insufficient resources).

Soft errors are divided into two categories: isolating and non-isolating errors. Isolating errors isolate the location of a fault to a pair of adjacent stations and the transmission medium connecting those stations. Isolating errors detected by stations attached to the token ring include: an error in a message, detected through the CRC appended to that message; a bit in the message that does not represent a zero or a one (see the IEEE 802.5 Standard [1] for a description of the differential Manchester encoding used on the token ring); and an early detection of signal loss on the transmission medium. Non-isolating errors cannot isolate the fault on a token ring. The non-isolating errors detected by token-ring stations include lost frames, stations too congested to receive a frame, and two stations attached to a single token ring with the same address.

Token-ring stations periodically report the counts of isolating and non-isolating errors they detect to a management server called the ring error monitor (described below).
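
The neighbor-notification and Beacon procedures described in this section can be sketched as follows. This is a simplified model, not the IEEE 802.5 state machines: the ring is a plain list of station addresses, and `lobe_ok` is a hypothetical map giving the outcome of each station's lobe self-test.

```python
def neighbor_notification(ring):
    """Return {station: nearest active upstream neighbor}.

    ring: station addresses in the order the token visits them,
          with the active monitor first.
    """
    naun = {}
    for i, station in enumerate(ring):
        # ring[i - 1] is the station immediately upstream; index -1
        # wraps around so the first station sees the last one.
        naun[station] = ring[i - 1]
    return naun


def beacon_recovery(ring, beaconer, lobe_ok):
    """Model the self-tests that follow a Beacon frame.

    Returns (ring_after_recovery, manual_intervention_needed).
    """
    upstream = neighbor_notification(ring)[beaconer]
    # The upstream neighbor tests itself first, then the Beacon sender.
    for station in (upstream, beaconer):
        if not lobe_ok[station]:
            # A station that fails its test stays out of the ring,
            # which bypasses the fault.
            return [s for s in ring if s != station], False
    # Both stations passed, so the hard error persists and the
    # LAN manager must be notified.
    return ring, True


ring = ["AM", "S1", "S2", "S3"]
neighbors = neighbor_notification(ring)
bypassed = beacon_recovery(ring, "S2", {"S1": False, "S2": True})
unresolved = beacon_recovery(ring, "S2", {"S1": True, "S2": True})
```

Here `bypassed` describes a ring from which S1 has removed itself, while `unresolved` models the case in which neither self-test finds a fault and an operator has to step in.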
TOKEN-RING LAN MANAGEMENT SERVERS

Management servers collect information from, and distribute information to, stations attached to a token ring. Responsibilities of the management servers may also include analyzing reports from stations on the ring and forwarding the results of that analysis to the token-ring LAN manager. Each management server's responsibility extends only to the stations attached to its ring. The management servers defined for
the token-ring LAN are: the ring error monitor (REM), the configuration report server (CRS), and the ring parameter server (RPS).

Ring Error Monitor

The ring error monitor (REM) collects, analyzes, and may log soft-error reports received from stations attached to its ring. All soft-error reports sent by stations are sent to a well-known functional address reserved for REM. Therefore, if multiple REMs are present on a ring, they all can receive soft-error reports generated by stations attached to that ring.

The function of REM is to determine when a non-random or excessive soft-error condition is present on the ring on which it resides and, if possible, isolate the most probable source of the errors to a fault domain, consisting of two adjacent active stations attached to the ring and the physical medium between them. REM detects excessive soft errors by analyzing soft-error reports sent by stations attached to its ring as they arrive and determining whether soft errors are occurring at a rate that degrades the performance of the token ring. When REM detects such a condition, it may notify the LAN manager, indicating the source of the error.

REM maintains a table of weighted error counts for each station attached to its ring from which it has recently received a soft-error report. The weighted error count accumulated for a station is used as an indication of the likelihood that the station is causing excessive errors on the ring. When a soft-error report is received, the information contained in the isolating error counters is used to accumulate the weighted error count for the sending station and its nearest active upstream neighbor. When the accumulated error count for a station exceeds a threshold, REM may notify the LAN manager that excessive soft errors have been detected on its ring.
In the notification, REM can provide the addresses of the stations in the fault domain in which it has detected the errors, thus providing information to allow a human operator to reconfigure the token ring to bypass noisy sections of the ring.

Since even random errors may cause the accumulated weighted error count for a station to exceed the threshold eventually, a fixed value is periodically subtracted from the weighted error count for each station for which REM is maintaining a count. As a result of this periodic decrementing of the weighted error counts, only the stations continuously accumulating weighted error counts at a rate faster than the decrement rate will have error counts that grow with time.

Configuration Report Server

The configuration report server (CRS) collects reports of changes in the order of stations attached to the ring and notifications from a new active monitor on the ring. CRS may also receive commands from the LAN manager to query the stations attached to its ring for certain information, including addressing information, state information, and information about their attached software or hardware, or to set the values for operational parameters in stations attached to its ring. The LAN manager can instruct CRS to force a station to remove itself from the ring if, for example, the station were part of a fault domain in which excessive soft errors had been detected (by REM). The information collected and distributed by the configuration report server could be used to maintain a configuration database for the token-ring LAN.

Ring Parameter Server

The ring parameter server (RPS) is responsible for initializing and maintaining a consistent set of values for operational parameters in use by ring stations attached to its ring. When a station attaches to a ring, it requests the current set of values for the operational parameters being used by stations attached to that ring.
The station's request is sent to the well-known functional address reserved for RPS. The request for initialization by an attaching station also contains some registration information pertaining to that station and the product it attaches to the ring. The RPS could forward this information to the LAN manager to notify it that a new station has attached to the ring and to report its characteristics.

TOKEN-RING LAN MANAGER

A token-ring LAN manager can provide centralized control for all of the management servers in a token-ring LAN. The management servers may be attached to different rings, connected by MAC-layer bridges (see "MAC Layer Interconnection of IEEE 802 Local Area Networks" [2]). Therefore, a centralized LAN manager could provide management functions, such as monitoring and controlling stations and physical media for different rings in a multi-ring LAN, from a single point.

The token-ring LAN manager coordinates the activities of the management servers in a local area network. It could obtain information about the state of management servers and set the values for operational parameters used by those servers. Examples of operational parameters for which values could be set include counters and counter thresholds maintained in the management servers. Also, for troubleshooting purposes, configuration information and version-level information could be made available by management servers.

The token-ring LAN manager could also receive unsolicited reports from management servers indicating state changes and errors. For example, when a management server's counter meets its threshold value or the management server detects a configuration change, the token-ring LAN manager would be notified.

In this hierarchical management architecture, servers essentially act as surrogates for the token-ring LAN manager on rings other than the one to which the token-ring LAN manager is attached.
When the token-ring LAN manager needs to retrieve the status of a station on a remote ring, for example, it would instruct a configuration report server on that ring to obtain the status from the station. The station would then respond to the configuration report server, which returns the requested information to the token-ring LAN manager.
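
The surrogate relationship just described can be sketched with a few small classes. All class and method names here are hypothetical, and the sketch ignores frame formats and bridging; it only shows that the LAN manager reaches a remote station through the configuration report server on that station's ring.

```python
class Station:
    def __init__(self, address, status):
        self.address = address
        self.status = status

    def report_status(self):
        # The station answers the CRS on its own ring.
        return {"address": self.address, "status": self.status}


class ConfigurationReportServer:
    """Hypothetical CRS model: knows only the stations on its ring."""

    def __init__(self, ring_id, stations):
        self.ring_id = ring_id
        self.stations = {s.address: s for s in stations}

    def query(self, address):
        # Query the station and return its reply to the caller.
        return self.stations[address].report_status()


class LanManager:
    def __init__(self, servers):
        self.crs_by_ring = {crs.ring_id: crs for crs in servers}

    def station_status(self, ring_id, address):
        # The manager does not contact the station directly; it
        # instructs the CRS attached to the station's ring.
        return self.crs_by_ring[ring_id].query(address)


crs_b = ConfigurationReportServer("B", [Station("RS1", "attached")])
manager = LanManager([crs_b])
reply = manager.station_status("B", "RS1")
```

Keeping the per-ring lookup inside the CRS mirrors the rule that each management server's responsibility extends only to the stations attached to its ring.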
Therefore, this architecture provides a framework for information to be exchanged between stations and management servers, and between management servers and a token-ring LAN manager.

COMMUNICATIONS NETWORK MANAGER

The communications network manager receives reports of anomalies in communications subsystems from the communication subsystem managers. The token-ring LAN manager is the communication subsystem manager that is responsible for the token-ring LAN. The token-ring LAN manager notifies the communications network manager about error conditions resulting in a loss of availability of LAN resources to end users. These conditions include excessive soft errors on a token ring, hard errors that are not automatically bypassed, and the automatic removal of stations to bypass a LAN error. Also, the token-ring LAN manager may report conditions that hinder its ability to detect errors on the token ring to which it is attached. For example, if the token-ring LAN manager detects an error in its attachment to the token ring, it notifies the communications network manager.

A single operator, using the communications network manager, can monitor all of the communications subsystems in the network, thus reducing the cost and increasing the reliability and availability of the entire communications network.

AN EXAMPLE

The following example illustrates the use of this hierarchical management framework for the token ring. The configuration on which these example interactions are based is shown in Figure 2. The figure shows a local area network consisting of three token rings, connected by bridges (depicted by straight lines between the rings), another separate communication subsystem, and a host containing a communications network manager. Stations attached to the token rings are shown as boxes, and management servers residing in stations attached to the ring are labeled inside the boxes.
Each ring has stations and the management servers described above attached to it, though only a few stations are shown on each ring. The arrow shown inside ring B indicates the direction of the token and message flows for this example. The token-ring LAN manager for the bridged LAN is shown attached to ring C. The communications network manager has links with both the token-ring LAN manager and the other communication subsystem manager.

Figure 2-An example configuration
LEGEND: RPS: Ring Parameter Server; CRS: Configuration Report Server; REM: Ring Error Monitor; RS: Ring Station

If a soft error occurs on ring B and is detected by ring station 3, then that station logs the error and will periodically send a soft-error report to the ring error monitor for ring B, REM (B). The isolating error counts in the soft-error report are weighted and added to the weighted error counts for ring stations 3 and 4, since the error may have occurred at either station or on the transmission medium between them. The non-isolating error counts in the soft-error report are accumulated in a count of non-isolating errors on ring B.

If soft-error reports containing isolating error indications are received at a high enough frequency from station 3, then the isolating soft-error count will exceed a predetermined threshold value in the ring error monitor. REM (B) could notify the token-ring LAN manager that excessive soft errors are occurring within the fault domain of ring stations 3 and 4 and the connecting medium. Similarly, if REM (B)'s counter for non-isolating errors exceeds a threshold, then the token-ring LAN manager would be notified that a non-isolating error threshold has been exceeded.

If the source of the error can be isolated, the token-ring LAN manager could take action to bypass or correct the fault. This action might involve re-configuring the ring on which the fault was detected.
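
The bookkeeping that REM (B) performs in this scenario can be sketched as follows. The paper does not give the weighting function, the threshold, or the decrement, so the values below (a multiplier of 2, a threshold of 10, a decrement of 1) are assumptions chosen only to make the example concrete, and the class and method names are hypothetical.

```python
class RingErrorMonitor:
    """Hypothetical REM model with weighted, decaying error counts."""

    def __init__(self, threshold=10, decrement=1, weight=2):
        self.threshold = threshold  # assumed value
        self.decrement = decrement  # assumed value
        self.weight = weight        # assumed weighting of isolating errors
        self.weighted = {}          # station -> weighted error count
        self.non_isolating = 0      # ring-wide non-isolating count

    def soft_error_report(self, sender, naun, isolating, non_isolating=0):
        """Process one report; return the fault domain if over threshold."""
        self.non_isolating += non_isolating
        exceeded = False
        # Isolating errors are charged to the sender and to its
        # nearest active upstream neighbor.
        for station in (sender, naun):
            count = self.weighted.get(station, 0) + self.weight * isolating
            self.weighted[station] = count
            exceeded = exceeded or count > self.threshold
        return (naun, sender) if exceeded else None

    def periodic_decay(self):
        # Subtract a fixed value so that random errors do not
        # eventually push a station over the threshold.
        for station in list(self.weighted):
            self.weighted[station] = max(0, self.weighted[station] - self.decrement)


rem_b = RingErrorMonitor()
first = rem_b.soft_error_report("RS3", "RS4", isolating=3)
rem_b.periodic_decay()
second = rem_b.soft_error_report("RS3", "RS4", isolating=3)
```

With these assumed values the first report leaves both counts below the threshold, while the second, arriving before the counts have decayed away, pushes them over it and yields the fault domain of ring stations 4 and 3.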
In the scenario above, the token-ring LAN manager could instruct CRS (B) to remove ring station 1 from the token ring in order to bypass the fault. Otherwise, the token-ring LAN manager could notify the communications network manager that excessive soft errors are occurring on ring B.

The sequences of interactions involving the ring parameter server and the configuration report server are similar. For example, when a ring station attaches to ring B, it requests the values for parameters currently in use by the other stations attached to ring B. RPS (B) responds with the appropriate values. RPS (B), since it then has knowledge of a configuration change on ring B, could notify the token-ring LAN manager of the change. The registration information contained in the original request for parameters could also be forwarded to the token-ring LAN manager.

If ring station 4 in Figure 2 removes itself from ring B, a configuration change is detected by its nearest active downstream neighbor: ring station 3. Ring station 3 reports this change to the configuration report server on ring B, CRS (B), and CRS (B) could forward the information to the token-ring LAN manager, which, in turn, may update a configuration database or display the information.

The token-ring LAN manager may request information about a ring station on ring B, say ring station 1, by instructing CRS (B) to query that ring station. On receipt of such a request, CRS (B) would obtain the information about the ring station and send it to the token-ring LAN manager. Similarly, the token-ring LAN manager could set the values for operational parameters in the ring station, such as the ring number for ring B, by instructing CRS (B) to do so.

If the station attaching the token-ring LAN manager to ring
C detects signal loss, it may report this condition to the communications network manager. The communications network manager could then notify an operator that it cannot manage the token-ring LAN until the fault is corrected. NOTE: If a problem exists in the token-ring LAN manager's station or attachment to the ring, it may not affect the other stations in the network, because of the automatic bypass and recovery procedures built into the token-ring protocol.

All of the servers in the multi-ring network report to the token-ring LAN manager on ring C. It is also important to note that servers on a ring may co-reside within a single station on that ring. This is shown on ring A, where one node houses both the ring error monitor and configuration report server functions. Also, multiple instances of a management server can be attached to a single ring. This case is also shown on ring A, where two ring error monitors are present.

CONCLUSION

To efficiently manage a token-ring network, management functions are distributed throughout the token-ring local area network (in each station). Stations participate in the management of the token ring by monitoring the health of the ring and reporting error conditions to management servers attached to their ring. These distributed management functions are coordinated by management servers, which analyze the reports from stations and may forward the results of this analysis to the token-ring LAN manager. The token-ring LAN manager provides a centralized point of control for management functions in a token-ring local area network, and may report to a communications network manager, which has responsibility for the management of the entire communications network.

REFERENCES

1. IEEE Computer Society. Token Ring Access Method and Physical Layer Specifications, ANSI/IEEE Standard 802.5-1985 (ISO/DIS 8802/5). New York: IEEE, 1985.
2. Bernsten, J. A., J. R. Davin, D. A. Pitt, and N. G. Sullivan.
"MAC Layer Interconnection of IEEE 802 Local Area Networks." Computer Networks, 10(1985)5.