FAULT MANAGEMENT SERVICE IN ATM NETWORKS USING TINA NETWORK RESOURCE ARCHITECTURE

Chetan P. Chiba, Setumo Mohapi, Hu Hanrahan
Centre for Telecommunications Access and Services 1
Department of Electrical Engineering, University of the Witwatersrand, Johannesburg
{c.chiba; s.mohapi; h.hanrahan}@ee.wits.ac.za

ABSTRACT
The TINA Network Resource Architecture (NRA) provides a technology-independent abstraction of potentially heterogeneous underlying networks. TINA also defines a management architecture that covers the management areas of Fault, Configuration, Accounting, Performance and Security (FCAPS). The combination of the TINA NRA and the management architecture allows a technology-independent abstraction of network management functionality. This paper addresses work in progress that aims to design and develop a distributed fault management functionality for an ATM network that is defined, represented and implemented according to the TINA NRA specifications.

Keywords: TINA, Fault Management, Network Resource Architecture (NRA), TMN and ATM.

I. INTRODUCTION
The purpose of network management is the assignment and control of proper network resources, both hardware and software, to address service performance needs and the network's objectives. With the ever-increasing size and complexity of underlying networks and services, it has become impossible to carry out these functions without the support of automated tools. With the advent of new softswitch architectures, such as TINA, JAIN and Parlay, and of service architectures that aim to separate service provision from the underlying networks, there is a need for an appropriate management framework to support these architectures. The TINA (Telecommunications Information Networking Architecture) NRA and Management Architecture offer a generic structure that may be applied across heterogeneous networks to provide this management functionality. The TINA architecture is decomposed into four architectures: Computing, Service, Network and Management.
TINA's Management Architecture covers the principles and concepts for managing TINA systems and networks and draws heavily on the ITU's TMN architecture. The TINA-C Management Architecture follows the functional area organization defined in the OSI Management Framework, namely fault, configuration, accounting, performance, and security management (FCAPS). Although TINA-C embraces all the areas, the work done so far has focused on selected management functional areas.

This paper reports on work in progress that aims to design and develop a fault management service using the generic components of the TINA NRA and specialising it for an ATM network. The proposed solution will define requirements for an ATM NRA implementation, and will design and develop an ATM NRA fault management service, based on TINA specifications, that will be deployed on the SATINA (South African TINA) [6] trial environment.

Section II of this paper discusses the origins of faults. It also provides a fault management flow diagram and describes the types of fault alarms as specified by the OSI Alarm Reporting Function. Section III examines fault management for the TINA environment and highlights the requirements for a fault management system in that environment. Section IV provides a view of how the fault management service fits into the TINA NRA. Section V explains the fault management information model, using the TINA NRIM as reference. Finally, Section VI provides a flow of events (FOE) of the fault management service. A fault scenario is presented to explain the FOE.

1 This work was supported by Telkom SA Limited, Siemens Telecommunications and the THRIP Programme of the Department of Trade and Industry. Authors' address: Department of Electrical Engineering, Private Bag, Wits 050, South Africa. C.P. Chiba is with the ITAS Division of Telkom SA Limited.

II. FAULT MANAGEMENT

A. Origins of Faults
Network faults can be classified into hardware and software faults. Both cause elements to produce incorrect outputs, which in turn can cause overall failure effects in the network, such as congestion.
Examples of hardware faults are failures of an element due to physical failures, malfunctions due to a failing or a weakness in its logical design, and elements malfunctioning due to simple wear and tear or through external forces such as accidents, acts of nature, being mishandled, or being improperly installed. Examples of software faults include failure of elements due to incorrect or incomplete design of their software, failure of the network due to software bugs (e.g., incorrect packet header processing), and slow or faulty service by the network due to incorrect information (e.g., incorrect routing tables).

B. Fault Management Flow
The flow of fault management, shown in Figure 1, can be described as follows:

[Figure 1: The Fault Management Process [1]. Stages shown: Collect Alarms (physical and logical alarms from the network); Filter and Correlate Alarms; Diagnose Faults; Develop and Implement Corrective Plan; Verify Fault is Eliminated; Record Events and Analyse.]

1. The first step in fault management is to collect monitoring and performance alarms. Alarms can be classified into two categories, physical and logical. Physical alarms are hard errors (e.g., a link is down), typically reported through an element manager, and logical alarms are statistical errors (e.g., performance degradation due to congestion). Once the alarms have been reported and collected, adequate service must be maintained through immediate action.
2. The next step is to filter and correlate the alarms. Alarm filtering is a process that analyzes the multitude of alarms received and eliminates the redundant alarms (e.g., multiple occurrences of the same alarm). Alarm correlation is the interpretation of multiple alarms such that new conceptual meanings can be assigned to the alarms, creating derived alarms.
3. Faults are identified by analyzing the filtered and correlated alarms and by requesting tests and status updates from the element managers, which provide additional information for diagnosis.
4. Once a fault has been diagnosed, corrective procedures are undertaken by the network to eliminate the cause of the fault. The fault management system's role in correction is to develop a plan or series of actions, and to initiate this plan with other functions within the network.
5. The correction must be verified through requests sent to the element managers. If the fault does not disappear, more data is analyzed and the diagnostic process is repeated.

C. Alarms and Faults Relevant to the TINA Fault Management Service
The following are the types of fault alarms a fault management service detects [4]:

Communication Alarms - associated with general communication failures. They may be reported by the NE level or may be detected at the resource management level. Examples: loss of signal, loss of frame, framing error, local node transmission error, remote node transmission error, call establishment error, degraded signal, communications subsystem failure, communication protocol error, LAN error.
QoS Alarms - associated with degradation in the quality of service. They may be reported by the NE level or may be detected at the resource management level. Examples: response time exceeded, queue size exceeded, bandwidth reduced, retransmission rate exceeded, threshold crossed, performance degraded, congestion, resource at or nearing capacity.
Processing Alarms - associated with a software or processing fault. Examples: parameter out of range, underlying resource unavailable.
Equipment Alarms - associated with an equipment fault. Example: a fault in an ATM switch.

III. FAULT MANAGEMENT FOR THE TINA ENVIRONMENT
Within the context of TINA, fault management is related to the service management, network resource management and DPE management areas. This paper is concerned with the TINA network resource management area.

D. Functional Requirements
The functional requirements of fault management are to provide information and computational specifications that conform to the TINA-C
telecommunication management. The functional requirements of fault management are []:

Alarm Surveillance: Includes collection and logging of alarms from the network resources, and monitoring/retrieval of data from them.
Fault Localisation: Analyses the collected alarm information, detects the root cause of the alarms, and notifies the result to the clients of the alarm surveillance.
Fault Correction: Is responsible for dealing with the computational objects that represent the resources in which a root cause is detected, in order to restore them or to recover them from the fault condition.
Testing Function: Invokes a test capability of a resource object upon a request from the clients of the service. It may also support a test of a series of resource objects.
Trouble Administration: Enables the reporting of troubles due to fault conditions and the tracking of their status.

IV. APPLICATION OF FAULT MANAGEMENT IN THE TINA NRA
The TINA NRA provides a model of a transport network that is capable of transporting multimedia information over end-to-end connections and deals with heterogeneous types of traffic. It is a complex and broad architecture dealing with aspects such as connection, fault, accounting and network topology management.

The TMN functional layers [M3010] relevant in Network Resource Architecture management are the Network Management Layer (NML) and the Network Element Management Layer (EML), since both networks and network elements are the resources being considered in the Network Resource Architecture.

The ATM Connection Management Architecture is composed of 5 computational object classes [] (see Figure 2), namely Connection Coordinator (CC), Layer Network Coordinator (LNC), Network Management Level Connection Performer (NML-CP), Element Management Level Connection Performer (EML-CP) and Resource Adapter (RA).

[Figure 2 diagram labels: CC; LNC; NML-CP with NML-FM; EML-CP with EML-FM; RA with RA-FM; ATM Switch. Key: CC = Connection Coordinator, CP = Connection Performer, NE = Network Element, LNC = Layer Network Coordinator, NML = Network Management Layer, EML = Element Management Layer.]
Also, in this diagram, fault management (FM) computational objects (COs) are shown attached to the EML-CP and the NML-CP computational objects. This is where the fault management service fits into the TINA NRA. The COs are EML-FM, NML-FM and RA-FM respectively.

Figure 2: ATM Connection Management Architecture Components

All fault management services will perform the 5 fault management activities, i.e. alarm surveillance, fault localisation, fault correction, testing and trouble administration. The NML-FM and the EML-FM COs are further subdivided into three fault management computational objects, i.e. the Alarm Manager, the Fault Coordinator and the Testing/Diagnostic Server. These COs are described further below. The fault management COs in Figure 2 are shown expanded in Figure 3.

[Figure 3: Basic computational model []. Labels shown: NML-FM (NML-AM, NML-FC, NML-TDS); EML-FM (EML-AM, EML-FC); Federation. Key: AM = Alarm Manager, FC = Fault Coordinator, TDS = Testing/Diagnostic Server.]

The network resource fault management services are provided by the interaction of the COs inside and outside the fault management area. The COs identified for the network resource fault manager are []:

1. Alarm Manager (AM)
The Alarm Manager (AM) receives fault-related alarms from Managed Objects (MOs) and performs the relevant procedures for correlating, filtering and forwarding the alarms to the fault coordinator or the fault management service user, and for managing them. Each AM has its own discriminating criteria through which incoming alarms are logged and forwarded to relevant computational objects in the system.

2. Fault Coordinator (FC)
The Fault Coordinator (FC) includes capabilities to internally analyze alarms received from multiple MOs to determine the next possible step for fault localization/correction. For this purpose, the FC correlates all available information to refine information concerning the root cause of the event in question. During this analysis, the TDS can be invoked to run tests as appropriate.

3. Testing/Diagnostic Server (TDS)
The Testing/Diagnostic Server (TDS) is concerned with testing of MOs for the purpose of service and function verification of MOs. From the fault management point of view, the TDS is invoked by either the fault coordinator or the fault management service user. However, the TDS can also be invoked by other computational objects in the system, e.g., resource configuration and connection management objects or a scheduler.

Figure 4 shows the interactions among COs in fault management functions []. The dotted rectangle shows the functions of fault management and the interfaces for fault management services and activities.

[Figure 4: COs in Fault Management []. Labels shown: RC, CM, PM, NTCM; support data (equipment hierarchy, connection topology, performance data); Alarm Manager (alarm register, alarm access, alarm report, alarm summary); Fault Coordinator (localise request/reply, fault localisation access); Testing/Diagnostic Server; MOs in the managed system.]

Figure 4 can be interpreted as showing the interactions provided by the fault management COs that manage the network at various levels.

V. FAULT MANAGEMENT INFORMATION MODEL
The information model defined in TINA-C for the Network Resource Architecture is the Network Resource Information Model (NRIM). The NRIM contains the object classes needed for the representation of network resources. The information model is presented in a number of fragments.
Each fragment shows the related object classes that deal with a particular subject. Fragments are introduced to make the information model easier to understand by grouping a limited number of object class definitions in each fragment. The fault management fragment specifies the management support information objects for fault management. The TINA fault management functional area addresses the five fault management activities discussed in Section III. Some of the object types specified in the FM fragment and shown in Figure 5 are [4,5]:

1. Fault Manageable Resource - Represents the management information that a network resource has to provide so that it can be subject to fault management. This is a subtype of Manageable Resource.
2. Fault Management Domain - Represents a set of Fault Manageable Resource objects that is controlled by a fault management function. Associated with a fault management domain is a set of policies that govern the fault management of all objects in the domain. This is a subtype of Management Domain.
3. AlarmRecord - Represents the information stored in a Log. This is a subtype of LogRecord.
4. CurrentAlarmSummaryControl - Specifies criteria for the generation of a current alarm summary report.
5. AlarmSeverityAssignmentProfile - Specifies the assignment of severity to different types of alarms. Each profile object may specify different severity assignments.

The fault management information model can be seen in Figure 5, together with the fault management COs and how they relate to each other. Figure 5 can be divided into sections. Section 1 contains the following objects: domain, manageable resources, management domain and administrative domain objects. The domain object represents a group of information object instances. Two types of domains are identified in the TINA management architecture, i.e. the management domain and the administrative domain. An administrative domain also contains a number of management domains. The manageable resources share an assignto relationship with the management domain object.
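The object types listed above can be illustrated with a minimal Python sketch. It is not the NRIM specification: the class layout, the attribute names, the `record_alarm` helper and the four severity levels are assumptions made for illustration, and only the object type names and the subtype relationship between AlarmRecord and LogRecord come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    # Assumed severity scale; the paper itself only mentions critical and major.
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    WARNING = 4


@dataclass
class LogRecord:
    logged_at: float  # hypothetical timestamp attribute


@dataclass
class AlarmRecord(LogRecord):
    """Information stored in a log for one alarm (subtype of LogRecord)."""
    resource: str
    alarm_type: str
    severity: Severity


@dataclass
class AlarmSeverityAssignmentProfile:
    """Assigns a severity to each alarm type; unknown types get a default."""
    assignments: Dict[str, Severity]
    default: Severity = Severity.WARNING

    def severity_for(self, alarm_type: str) -> Severity:
        return self.assignments.get(alarm_type, self.default)


@dataclass
class FaultManageableResource:
    name: str


@dataclass
class FaultManagementDomain:
    """A set of fault manageable resources governed by one severity policy."""
    name: str
    profile: AlarmSeverityAssignmentProfile
    resources: List[FaultManageableResource] = field(default_factory=list)

    def assign(self, resource: FaultManageableResource) -> None:
        # Models the assignto relationship between resource and domain.
        self.resources.append(resource)

    def record_alarm(self, resource: FaultManageableResource,
                     alarm_type: str, logged_at: float) -> AlarmRecord:
        if resource not in self.resources:
            raise ValueError(f"{resource.name} is not assigned to {self.name}")
        return AlarmRecord(logged_at=logged_at, resource=resource.name,
                           alarm_type=alarm_type,
                           severity=self.profile.severity_for(alarm_type))


# Usage: one domain, one link, one profile entry.
profile = AlarmSeverityAssignmentProfile({"lossOfSignal": Severity.CRITICAL})
domain = FaultManagementDomain("atm-fm-domain", profile)
link = FaultManageableResource("link-A-B")
domain.assign(link)
record = domain.record_alarm(link, "lossOfSignal", logged_at=0.0)
```

Keeping the severity policy on the domain rather than on each resource mirrors the statement that a set of policies governs the fault management of all objects in the domain.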
[Figure 5: OMT Diagram for Fault Management [4,5]. Labels shown include: Entity; Admin. domain; Manageable resource; Configurable; Testable; Alarmable; Correctable; Localisation; Administration; Alarm Record; Communications Alarmable; Processing Error Alarmable; relationships assignto (element, set), AlarmsurveyedBy and SeverityAssignment; sections numbered 1 to 5.]

Section 2 contains specialised types of manageable resource objects, e.g. fault manageable resources, configurable resources, etc. The management domain object from Section 1 contains specialised management domains, i.e. the fault management domain, the configuration management domain, etc. A number of manageable resources are assigned to the fault management domain.

Section 3 contains objects specific to the fault management service. As an example, the alarmable resource is shown in detail. Associated with this object are the different types of alarms detected, as described in Section II. It also has specialised relationships with other objects in Section 3 that reside in the fault management domain.

VI. FLOW OF EVENTS OF A TYPICAL FAULT MANAGEMENT SERVICE
To explain a typical fault management scenario in the TINA environment, an example of a damaged link between two ATM switches will be used. Figure 6 illustrates the flow graph of the events that take place when a link between two switches is damaged. A further figure illustrates the interaction between the NML-FM and the EML-FM from the time the alarm is received to the time the fault is corrected.

When a link between two switches becomes inoperable, a communication alarm is sent to the EML-FM component of the EML-CP in which the link is contained. The severity of the alarms may range from critical to major [4]. This means that the condition is service affecting and immediate or urgent corrective action is required (e.g., the resource is out of service or degraded).
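The damaged-link scenario can be sketched in a few lines of Python. This is an illustrative assumption, not the TINA computational specification: the `Alarm` and `AlarmManager` classes, the `localise` callback that stands in for a Fault Coordinator, and the resource names are all hypothetical. The sketch only shows an element-level alarm manager logging an alarm, filtering repeats, asking for localisation, and forwarding the alarm to the network level when the fault cannot be localised.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Alarm:
    resource: str    # e.g. "link-A-B"
    alarm_type: str  # e.g. "lossOfSignal"
    severity: str    # "critical" or "major" in this scenario


class AlarmManager:
    """Hypothetical alarm manager: log, filter repeats, localise, forward."""

    def __init__(self, level: str,
                 localise: Callable[[Alarm], Optional[str]],
                 forward: Optional["AlarmManager"] = None) -> None:
        self.level = level
        self.localise = localise  # stands in for the Fault Coordinator
        self.forward = forward    # next-level alarm manager, if any
        self.event_log: List[Alarm] = []
        self._seen: Set[Tuple[str, str]] = set()

    def receive(self, alarm: Alarm) -> Optional[str]:
        """Return a root-cause string, or None if filtered or unresolved."""
        self.event_log.append(alarm)          # every alarm is logged
        key = (alarm.resource, alarm.alarm_type)
        if key in self._seen:
            return None                       # redundant alarm is filtered out
        self._seen.add(key)
        cause = self.localise(alarm)
        if cause is not None:
            return f"{self.level}: {cause}"
        if self.forward is not None:
            return self.forward.receive(alarm)  # escalate to the next level
        return None


# Assumed topology knowledge: the EML only knows its own link.
eml_links = {"link-A-B"}


def eml_fc(alarm: Alarm) -> Optional[str]:
    return f"fault on {alarm.resource}" if alarm.resource in eml_links else None


def nml_fc(alarm: Alarm) -> Optional[str]:
    return f"fault on {alarm.resource}"


nml_am = AlarmManager("NML", nml_fc)
eml_am = AlarmManager("EML", eml_fc, forward=nml_am)

first = eml_am.receive(Alarm("link-A-B", "lossOfSignal", "critical"))
repeat = eml_am.receive(Alarm("link-A-B", "lossOfSignal", "critical"))
escalated = eml_am.receive(Alarm("trail-X-Y", "lossOfSignal", "major"))
```

In this sketch the first alarm is localised at the element level, the repeated alarm is logged but filtered, and the alarm on a resource unknown to the element level is forwarded to the network-level manager.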
[Figure 6: Flow of Events of a Fault Management Service on a Damaged Link [4]. Components shown: EML Alarm Manager, EML Fault Coordinator, EML Testing/Diagnostic Server and EML-NTCM; NML Alarm Manager, NML Fault Coordinator, NML Testing/Diagnostic Server and NML-NTCM; support data (equipment hierarchy, connection topology); managed objects; federation with NML-FCs in other networks. Flows are numbered 1 to 6.]

The flow of events of a typical fault correction process can be viewed in the following six steps:

1. EML-AM - The communication alarm is first received by the EML Alarm Manager (EML-AM). The EML-AM first makes the corresponding event log entry and filters the alarm. If the alarm passes filtering, the EML-AM prepares the corresponding alarm record, which can be referred to for further analysis and reporting purposes. The filtered alarm then passes through the correlation function, which removes redundant alarms and initiates the fault localisation procedure. In the event that the EML-FC cannot localise the fault, the alarm is passed to the NML Alarm Manager (NML-AM) through forwarding functions. The EML-AM interacts with the EML Network Topology Configuration Manager (EML-NTCM) to get the equipment hierarchy and connection topology data.

2. NML-AM - The NML-AM receives alarm reports from the corresponding EML-AM. The alarm reports from an EML-AM do not specify the root cause of the report. The NML-AM interacts with the NML-FC to identify the root cause of the alarms. The fault localisation results are then forwarded to users. The NML-AM makes use of the NML Network Topology Configuration Manager (NML-NTCM) to get the equipment hierarchy and the connection topology information between subnetworks as supporting data.

3. EML-FC - The fault localisation function of the EML-FC receives the fault localisation request from the EML-AM and analyses the current alarms to determine the root cause of a set of related alarms. During this phase, the EML-FC interacts with the EML-TDS for testing/diagnostics over a set of related resources. The EML-FC performs automatic restoration by activating a back-up resource for a faulty resource. This is done by re-routing the information on a different link, for example by using the same VP but a different VC. To find the reconfiguration possibilities for re-routing, the EML-FC interacts with the EML-NTCM to get the equipment hierarchy and connection topology data. A report is then submitted to the EML-NTCM to report the changes in configuration that result from fault correction.

4. NML-FC - In the event that the EML-FC cannot locate or correct the fault, the alarm is sent to the NML-AM. The NML-AM in turn interacts with the NML Fault Coordinator (NML-FC). The NML-FC analyses the current alarms and interacts with the NML-TDS to determine the root cause of a set of related alarms. For the localisation of faults that span multiple networks, the NML-FC interacts with NML-FCs in other networks through federation. During this analysis, the connection topology between subnetworks is used as supporting data.
If the information flow is re-routed, the NML-FC interacts with the NML Network Topology Configuration Manager (NML-NTCM) to report the changes in configuration that result from fault correction.

5. EML-TDS - The EML Testing/Diagnostic Server (EML-TDS) provides capabilities for testing/diagnostics of a set of resources. Testing is a function verification of a set of resources, and diagnostics involves analysing the results of testing to find the main cause of abnormal behaviour within the network. The EML-TDS is activated by the EML-FC, NML fault management COs, and other FM clients.

6. NML-TDS - The NML Testing/Diagnostic Server (NML-TDS) provides capabilities for testing/diagnostics of a set of resources that span multiple subnetworks or multiple networks. The NML-TDS is activated by the NML-FC, NML fault management users, NML-TDSs in other networks, and other FM clients, including resource configuration and connection management.

VII. CONCLUSION
This paper addressed work in progress on the development of a fault management system for an ATM network that is defined, represented and implemented according to the TINA NRA specifications. The proposed solution defines a distributed management functionality that is capable of providing management support across heterogeneous networks. The paper describes the requirements for an ATM NRA implementation. The design approach implements the TINA NRIM and shows how the fault management service can be implemented in the TINA NRA. The FM computational objects, i.e. the Alarm Manager, the Fault Coordinator and the Testing/Diagnostic Server, have been defined, and their interactions in providing a fault management service have been discussed.

VIII. REFERENCES
[1] Gurer DW, Khan I, et al. An Artificial Intelligence Approach to Network Fault Management. http://www.sce.carleton.ca/netmanage/docs/an_ai _Approach.pdf
[2] Fuente LA, Walles T. Management Architecture, Version:.0, Document No. TB_GN.010_.0_94, TINA Consortium, December 1994.
[3] C. Abarca, J. Forslow, T. Hanada, et al., "Network Resource Architecture Version.0," Document No. NRA_v.0_97_0_10, TINA Consortium, 10 February 1997.
[4] Natarajan N, Flinck H, Rosli RM. Network Resource Information Model Specification, Document No. NRIM_v._97_10_1, TINA Consortium, October 1, 1997.
[5] Kawnaoshi M. 94 Report on Fault and Configuration Management, Doc. No. TR_MK.006_1.0_94, TINA Consortium, January 0, 1995.
[6] F. Scholtz, H.E. Hanrahan, R.A. Achterberg, "The South African TINA Trial: SATINA," Proceedings of SATNAC98, 6th-8th September 1999, University of Durban Westville.