The EMSX Platform
A Modular, Scalable, Efficient, Adaptable Platform to Manage Multi-technology Networks

A White Paper
November 2002

Abstract: The EMSX Platform is a set of components that together provide an element management system (EMS) for managing various equipment such as SONET, WDM, and ATM switches. The platform takes advantage of some of the latest software technologies to provide a highly scalable, efficient, fault-tolerant EMS. A modular design was used to promote reuse of code, and to allow extensions such as custom southbound or northbound interfaces to be made without impacting the core EMS modules. The core information model is based on TMF standards, and is extensible.

Prepared by:
Transmission Development Department (TDD)
Optical Network Systems Division (ONSD)
14040 Park Center Road, Herndon, VA 20171
NEC America
1 Overview

This white paper describes the features, capabilities, and architecture of EMSX, a robust platform for developing EMS applications for multi-technology networks.

Over the last several years, as networks have grown larger and more complex, the requirements for the element management system (EMS) have also grown more demanding. For example, it is no longer uncommon for a network to contain thousands of network elements. Considering that hundreds or even thousands of objects may be needed to model a single complex network element, the total number of objects managed by an EMS can easily reach the millions.

While networks have grown to include thousands of elements of various transport technologies, the number of operations within the network has also increased. There is also a need to reduce provisioning time, increase flexibility, and reduce provisioning errors. High availability is also expected of an EMS, to reduce loss of revenue.

In addition to growing larger and more dynamic, networks are also becoming more diverse, often including several different network element products from one or more vendors. This has created the need for the telecommunications industry to develop standards for managing such networks. For example, the TMF (TeleManagement Forum) has developed standards to model multi-technology transport networks. As a result, it is often necessary for an EMS to support standards-based northbound interfaces to a service provider management system. A customer may also impose additional interface requirements, such as remote access to the EMS via the internet. Hence, the EMS must be designed to support multiple external interfaces, as needed.

Another major consideration in the development of an EMS is the length of the development life cycle.
In many cases, since the availability of an EMS is considered as important as the network element itself, an EMS development team must be in a position to develop an EMS in a timely manner. Otherwise, the marketability of the equipment may be impacted.

In light of these and other demanding requirements, the design and architecture of an EMS have become critical issues. EMSX is an EMS platform that is designed to provide a reliable, scalable, fault-tolerant platform for managing network elements.

The following list includes some of the features and benefits of the EMSX platform and its design and architecture. These features and benefits are described in more detail in the remaining sections of this document.
- High availability through a fault-tolerant architecture that protects against hardware, software, and network connectivity failures
- High scalability through a distributed architecture that supports virtually unlimited network growth
- Adaptability to various northbound and southbound interfaces through use of a modular design
- Versatility of a single EMSX instance to manage multiple network element types as well as providing different types of external interfaces
- A MIB (Management Information Base) module designed to provide high performance, data integrity, and ease of programming
- Support for multiple network element technologies based on an industry standard information model designed specifically for this purpose
- A modular design that reduces the costs associated with maintenance and enhancements
- Portability across operating systems via the use of Java, CORBA, XML, and other technologies that support multiple operating systems
2 Key Architectural Features

2.1 Layered Approach

Fig 2.1 shows the layered approach used in EMSX. Essentially, these layers can be thought of as the functional modules of EMSX. The following sub-sections describe the layers in more detail.

[Fig 2.1: Layers of EMSX architecture. Elements shown, top to bottom: Network Management System (NMS); EMSX Platform (gateway layer; core EMS layer with CC sublayer and DC sublayer; communication layer); protocol stack; network elements.]

2.1.1 Gateway Layer

Based on the concepts of the TMN (Telecommunications Management Network) architecture, there is a need for an EMS to interface with the NMS (Network Management System) using standardized objects and protocols. The gateway layer encapsulates this functionality. This lightweight translation module handles requests from and responses to the NMS. Requests from upper NMS systems are forwarded to the core EMS for processing; responses and autonomous events are forwarded to the upper NMS.
By separating this functionality and implementing it as a separate layer, the EMSX platform allows for rapid development of new interfaces that adapt to specific service provider technology requirements. The current version of EMSX includes an implementation of the NML-EML interface specifications of the TeleManagement Forum, specifically the TMF814 specification. This gateway implementation supports a CORBA interface to manage SONET ADM, WDM, and ATM transport network elements. Interfaces can also be developed for other specifications that use protocols such as SNMP, CMIP, etc.

2.1.2 Core EMS Layer

The core EMS layer implements the EMS business logic, including the portions of FCAPS (Fault, Configuration, Accounting, Performance, Security) functionality that are relevant to element and network management. The primary function of this layer is to maintain the MIB for the network that is being managed. This includes the processing of requests from the GUI (Graphical User Interface) and northbound clients, and the processing of autonomous messages such as alarms and performance data. Much of this processing is built upon the EMSX MIB module, which is described in Section 2.4.

To support scalability through distributed processing, the core EMS layer is subdivided into two sublayers: the central controller and domain controller sublayers. These sublayers, and their role in supporting a distributed architecture, are described in Section 2.2, Distributed Architecture.

2.1.3 Mediation Layer

The mediation layer translates MIB operations into NE-specific commands and responses. Likewise, it translates autonomous messages from an NE into the EMS information model. The set of mediation functions for any single type of NE is referred to as an adaptation module for that NE type. Different types of NEs may be managed by a single instance of EMSX by installing the appropriate adaptation module for each of the NE types to be managed.
The separation of the mediation layer from the core EMS layer allows new adaptation modules to be developed and deployed without requiring additional changes to the core EMS. While the EMSX platform is flexible enough to accommodate different management interfaces such as TL1, SNMP, CORBA, etc., the current implementation of the platform provides some specific tools that are useful for developing adaptation modules for network elements that support a TL1 management interface. These tools can be used to build and parse TL1 commands and TL1 autonomous messages for a specific NE, based on the syntax rules provided by XML configuration files for that NE type.
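As a rough illustration of what building a TL1 command involves, the following Java sketch assembles a command string from its blocks. The class name `Tl1Command`, its methods, and the parameter layout (an empty general block followed by a single name=value block) are assumptions made for this example; they are not the EMSX tools, which derive the syntax rules for each NE type from XML configuration files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of EMSX): renders a TL1 input command. */
final class Tl1Command {
    private final String commandCode; // verb and modifiers, e.g. "RTRV-ALM-ALL"
    private final String tid;         // target identifier (the NE name)
    private final String aid;         // access identifier (the entity within the NE)
    private final String ctag;        // correlation tag echoed back in the response
    private final Map<String, String> params = new LinkedHashMap<>();

    Tl1Command(String commandCode, String tid, String aid, String ctag) {
        this.commandCode = commandCode;
        this.tid = tid;
        this.aid = aid;
        this.ctag = ctag;
    }

    /** Adds a name=value parameter; insertion order is preserved. */
    Tl1Command param(String name, String value) {
        params.put(name, value);
        return this;
    }

    /** Joins the blocks with ':' and terminates the command with ';'. */
    String render() {
        StringBuilder sb = new StringBuilder();
        sb.append(commandCode).append(':').append(tid)
          .append(':').append(aid).append(':').append(ctag);
        if (!params.isEmpty()) {
            StringJoiner joiner = new StringJoiner(",");
            params.forEach((name, value) -> joiner.add(name + "=" + value));
            // Assumed layout: empty general block, then one parameter block.
            sb.append("::").append(joiner.toString());
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        // prints "RTRV-ALM-ALL:NODE1:ALL:101;"
        System.out.println(new Tl1Command("RTRV-ALM-ALL", "NODE1", "ALL", "101").render());
    }
}
```

In an adaptation module, the block order and parameter names would come from the NE-specific configuration rather than being hard-coded as they are here.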
2.1.4 Communication Layer

The communication layer is responsible for establishing and maintaining communications between the EMS and the network elements that are being managed. It handles functions such as the network protocol stack interface and NE connection management (establishment, maintenance, retry logic, disconnection, etc.). Network elements usually support one of the following communication stacks: TCP/IP, native OSI, or OSI over TCP. The EMSX platform is able to support these protocol stacks.

2.2 Distributed Architecture

One feature of the EMSX platform that supports high scalability is its distributed architecture. This distributed architecture is implemented in terms of central controllers (CCs) and domain controllers (DCs).

A domain controller is a set of processes and data that is able to manage a set of network elements called a domain. For a relatively small network, a single domain controller may be able to manage the entire network. However, for very large networks, where a single workstation would be overloaded by the task of managing the whole network, the network may be partitioned into a set of domains. Each domain could then be managed by a single domain controller running on a dedicated workstation. Note that there are no software limitations regarding the size of a domain. Rather, the size of a domain is typically limited by physical resources, such as disk space, memory, or the processor speed of a workstation.

A central controller is a set of processes and data that provide a unified interface to EMSX clients. All requests, such as GUI or NMS requests, are sent to the central controller. The central controller then distributes the requests to the appropriate domain controller or domain controllers for completion. Once the processing for the request is complete, the central controller returns the response to the requestor.
Additionally, the central controller handles the autonomous messages received from all domain controllers and forwards them to the EMSX clients. In this manner, the central controller provides a single point of interface to all northbound clients.

Figures 2.2 and 2.3 show two typical EMSX configurations.
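The request routing described in this section can be sketched in Java as follows. The names `DomainController`, `CentralController`, `assign`, and `dispatch` are hypothetical, and the call is synchronous only to keep the example short; EMSX itself exchanges asynchronous messages between processes (see Section 2.5).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical view of a domain controller: handles requests for NEs in its domain. */
interface DomainController {
    String handle(String neName, String request);
}

/** Hypothetical central controller: a single entry point that routes by NE name. */
final class CentralController {
    // Which domain controller manages which network element.
    private final Map<String, DomainController> domainByNe = new HashMap<>();

    /** Records that the given NE belongs to the domain managed by dc. */
    void assign(String neName, DomainController dc) {
        domainByNe.put(neName, dc);
    }

    /** Forwards the request to the responsible DC and returns its response. */
    String dispatch(String neName, String request) {
        DomainController dc = domainByNe.get(neName);
        if (dc == null) {
            throw new IllegalArgumentException("No domain controller manages " + neName);
        }
        return dc.handle(neName, request);
    }

    public static void main(String[] args) {
        CentralController cc = new CentralController();
        cc.assign("NE1", (ne, req) -> "DC-1 handled " + req + " for " + ne);
        cc.assign("NEx1", (ne, req) -> "DC-2 handled " + req + " for " + ne);
        System.out.println(cc.dispatch("NEx1", "getAlarms"));
    }
}
```

Because clients only ever talk to the central controller, repartitioning the network into more domains changes the routing table but not the client-facing interface.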
[Fig 2.2: EMSX single host configuration. The NMS connects to Host1, which runs the gateway layer, the core EMS layer (CC sublayer, DC sublayer, and MIB), and the communication layer, and manages NE1 through NEn.]

Figure 2.2 shows the simple case where the network is sufficiently small that it can be managed by a single workstation. In this case, the central controller and a single domain controller are configured to run on the same workstation.
[Fig 2.3: EMSX distributed configuration. The NMS connects to the CC on Host 1 (gateway layer and CC sublayer). DC-1 on Host 2 and DC-2 on Host 3 each run a DC sublayer with its MIB and communication layer, managing NE Domain1 (NE1 through NEn) and NE Domain2 (NEx1 through NExn) respectively.]

Figure 2.3 shows an example where the network is sufficiently large (or is expected to grow) so that two domains are needed. In this case, each domain controller runs on a dedicated workstation. In this example, the central controller is also configured to run on a dedicated workstation, so that the central controller does not compete with a domain controller for resources.

2.3 Fault Tolerance

EMSX provides fault tolerance against process, host, and site failures. At the process level, each EMSX process is monitored by a system monitor. Should failure of a process be detected, the system monitor automatically restarts the failed process. The system monitor also provides manual functions that allow a system administrator to start and stop any EMSX process. In addition, the system monitor watches other important system resources such as memory, hard disk space, etc.
When a problem is detected or any user-defined threshold is crossed, the system monitor generates an alarm, which is referred to as an EMS alarm, or an alert, which is referred to as an EMS alert.

EMSX can be configured as a fault-tolerant system, in which recovery from host failure is facilitated. An example of this configuration is shown in Fig 2.4.

[Fig 2.4: EMSX in fault-tolerant and distributed configuration. The NMS connects to a CC fault-tolerant pair (active CC sublayer on Host 1, standby CC sublayer on Host 2, kept in synchronization). Two DC fault-tolerant pairs (DC-1 on Hosts 3 and 4, DC-2 on Hosts 5 and 6), each with an active and a standby DC sublayer, MIB, and communication layer kept in synchronization, manage NE Domain1 (NE1 through NEn) and NE Domain2 (NEx1 through NExn).]

Fault tolerance at the DC level is accomplished by configuring a backup DC. In this configuration, two DCs, referred to as a DC pair, are used to manage a single domain. One of the DCs runs in the active mode and the other in the standby mode. In the active mode, a DC operates normally, processing requests from the CC as well as autonomous messages from the NEs in its domain. In the standby mode, a DC keeps its
MIB in synchronization with that of the active DC; it does not do normal processing as the active DC does. If the active DC fails, the standby DC takes over all the processing. This switching process is referred to as a DC switch. Since the standby DC MIB is in synchronization with that of the active DC, a DC switch is fast and reliable. Once a DC pair has been configured, a DC switch is automatically initiated in case of an active DC failure. A DC switch is transparent to end users, such as a GUI operator. However, the user is notified of the DC switch via an EMS alert, and may take any required corrective action.

Fault tolerance at the CC level is similar to fault tolerance at the DC level. A CC pair consists of an active and a standby CC.

Setting fault management policies for both DC and CC pairs is part of the EMSX system administration functions. This can be done in different ways. For example, a DC pair can be configured so that the standby DC automatically takes over if it detects that the active DC is down, or so that the operator always controls the DC switch manually. In addition, the system administrator can force a switch at any time (for example, when a software update needs to be done).

2.4 Multi-technology Management and Information Model

EMSX has the capability to serve multiple northbound NMS interface standards as well as manage diverse southbound network elements (NEs) from multiple vendors. An architectural feature that aids in achieving this flexibility is the powerful MIB (management information base) at the EMSX core, which holds a real-time representation of the network. Together with the MIB module, which provides a programming API for all MIB operations, EMSX provides a robust and flexible information model.
2.4.1 EMSX Management Information Base (MIB)

The EMSX MIB module supports a wide variety of managed objects (MOs), such as managed element, equipment, termination point, topological link, etc., and a range of operations on them. While the designed attributes of the MOs should be sufficient to model the features of most NEs, it is possible to incorporate user-defined attributes into the MOs via XML configuration files. This makes the MIB extensible and adaptable to different NE types.

The MIB is organized in a tree structure (referred to as the management information tree, MIT, or simply tree) that captures the containment relationships of the MOs and provides fast and efficient access to any MO within the MIB.
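A containment tree of this kind can be sketched in Java as below. This is a minimal illustration, not the EMSX MIB API: the `ManagedObject` class, its methods, the slash-separated path syntax, and the attribute name used in the example are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical managed object: a named node holding attributes and contained MOs. */
final class ManagedObject {
    private final String name;
    private final Map<String, ManagedObject> children = new LinkedHashMap<>();
    private final Map<String, String> attributes = new LinkedHashMap<>();

    ManagedObject(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Creates a contained MO under this one and returns it. */
    ManagedObject addChild(String childName) {
        ManagedObject child = new ManagedObject(childName);
        children.put(childName, child);
        return child;
    }

    /** Attributes are free-form here, standing in for user-defined attributes. */
    void setAttribute(String key, String value) {
        attributes.put(key, value);
    }

    String getAttribute(String key) {
        return attributes.get(key);
    }

    /** Resolves a slash-separated path relative to this MO; returns null if absent. */
    ManagedObject find(String path) {
        ManagedObject current = this;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        ManagedObject root = new ManagedObject("network");
        ManagedObject port = root.addChild("NE1").addChild("shelf-1").addChild("port-3");
        port.setAttribute("adminState", "unlocked");
        System.out.println(root.find("NE1/shelf-1/port-3").getAttribute("adminState"));
    }
}
```

Each lookup costs one map access per level of containment, which is one way a tree can give fast access to any MO by its position in the hierarchy.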
2.4.2 MIB Persistence

A persistent MIB is an essential feature of an EMS. Among other things, a persistent MIB plays an important role in system recovery in case of a failure. It also serves as an inventory of the managed network, which can be easily analyzed using SQL queries.

MIB persistence is tightly integrated into the MIB module. Through this tight integration, the MIB module provides automatic persistence of its MO data and takes care of synchronizing the data in memory with the persisted data. Whenever an MO is created, updated, or deleted, the changes are automatically applied to the MIB database. This frees the application programmer from the need to write any database interface code related to the MIB.

The tight integration with MIB persistence also enables the MIB module to provide other features that help the application programmer manage system performance and resources related to MIB processing. For example, although the MIB is maintained in a database, the MIB module can load and keep portions of the MIB, or the entire MIB, in memory to allow fast and efficient execution of certain MIB operations.

To achieve a good balance between memory usage and performance, the MIB module allows the user to specify the depth of the tree to be kept in memory. Once it has loaded all MOs up to the specified depth from the database, the module retains them in memory at all times. The module performs all subsequent operations on those MOs in memory, thus increasing performance for operations on those objects. When objects beyond the specified depth are accessed, the MIB module loads them dynamically from the database. Once the MIB operations on these objects are completed, the corresponding memory is released.

To achieve a good balance between system startup time and performance, another runtime control is provided. This control specifies the depth of the tree that is loaded into memory during application startup.
By specifying a low tree depth, system startup time may be reduced. On the other hand, a higher depth eliminates the MO load time that occurs when any unloaded MO is first accessed.

2.4.3 Transactional Support

The MIB module provides transactional support with standard features such as rollback and commit for all operations (i.e., creating MOs, deleting MOs, and updating MO attributes). This feature allows complex operations spanning multiple MOs to be executed in a reliable manner that ensures data integrity, including synchronization of the data in memory and the database. This feature eliminates the need to write complex transaction management code in the business logic.

2.4.4 Automatic Notifications
When its modification operations (create, delete, and update) are invoked, the MIB module automatically generates certain types of notifications, such as create notifications, attribute value change notifications, and state change notifications. A notification policy specifies the MOs and attributes for which notifications are generated. A notification policy may be specified via an XML file. This feature frees the user from having to write code to generate such notifications.

2.5 Message Oriented Architecture

Typical EMS software may be required to process a huge amount of data and perform a large number of operations per second. IPC (inter-process communication) architectures based on synchronous, method-oriented paradigms do not scale well under such severe stress conditions and often suffer resource and performance problems. These factors are critical when the EMS is based on a highly distributed architecture.

EMSX uses PlatformX, a message-based application development framework, for its IPC needs. Using PlatformX, EMSX processes send asynchronous messages to request actions or perform operations in other processes. Since the responses to these requests are also sent as asynchronous messages, the sending threads need not block while waiting for the responses. This results in better utilization of thread resources and provides better system response. The benefits are significant when processes communicate with each other, and with network elements, across a network.

2.6 Reusable Components

EMSX is built using reusable components where possible. These components serve as a set of services, and they can be used to develop any object-oriented application. Further description of these components can be found in some of the references.

2.6.1 PlatformX

This package is essentially a framework for developing distributed multi-threaded applications.
To promote high performance for demanding applications, PlatformX employs the message-oriented paradigm, using JMS transport, as opposed to the method-oriented paradigm used by technologies such as CORBA and EJB. The framework provides useful classes for tasks such as sending messages and correlating responses to requests, thus eliminating the need for the application programmer to deal with tedious details when developing a message-oriented application. Additionally, the framework provides support for fault-tolerant processing. PlatformX provides a versatile and comprehensive framework for implementing the IPC mechanism in the distributed processing of the EMSX layered components.
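The idea of correlating asynchronous responses with their requests can be sketched in plain Java as follows. This is not the PlatformX API and does not use JMS: the `RequestCorrelator` class, its methods, and the `BiConsumer` standing in for the message transport are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical correlator: matches asynchronous responses to pending requests. */
final class RequestCorrelator {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Stand-in for the real transport: receives (correlationId, payload).
    private final BiConsumer<Long, String> transport;

    RequestCorrelator(BiConsumer<Long, String> transport) {
        this.transport = transport;
    }

    /** Sends a request without blocking; the future completes when the reply arrives. */
    CompletableFuture<String> send(String payload) {
        long id = nextId.getAndIncrement();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply); // register before sending so a fast reply is not lost
        transport.accept(id, payload);
        return reply;
    }

    /** Called by the receiving thread for each response; returns false if unmatched. */
    boolean onResponse(long correlationId, String payload) {
        CompletableFuture<String> reply = pending.remove(correlationId);
        return reply != null && reply.complete(payload);
    }

    public static void main(String[] args) {
        long[] lastSentId = new long[1];
        RequestCorrelator correlator = new RequestCorrelator((id, msg) -> lastSentId[0] = id);
        CompletableFuture<String> reply = correlator.send("getAlarms");
        reply.thenAccept(r -> System.out.println("reply: " + r)); // callback, no blocked thread
        correlator.onResponse(lastSentId[0], "2 alarms");
    }
}
```

The sending thread returns as soon as the message is handed to the transport, which is the property the message-oriented paradigm relies on to keep threads free under load.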
2.6.2 DB Mapper

The DB Mapper package serves as an object-relational mapping tool to persist Java objects to relational databases. This package implements a Data Access Object (DAO) pattern that allows an application programmer to execute the typical create, retrieve, update, and delete (CRUD) operations on a relational database without writing SQL code. The package accomplishes this by using XML configuration files that specify the object-relational (OR) mapping of Java classes, together with the Java reflection API, to generate the necessary SQL statements on the fly. This greatly reduces the effort needed to program the typical CRUD operations used by an application. This package also uses the JDBC interface, so it is portable across any database that implements JDBC. EMSX uses this package for all database-related operations.

2.6.3 System Monitor

Any mission-critical system that may need to be available on a 7x24 basis needs to automatically monitor itself. High system availability is achieved by monitoring various processes and automatically restarting them after failure. In addition to the basic monitoring of the processes themselves, other system resources such as disk usage, memory usage, status of third-party applications, etc., must also be monitored. The system monitor package provides these functions.

The System Monitor package provides the functionality needed to perform basic system tasks such as system startup, system monitoring, and system management. This package uses an XML file that specifies information such as the set of processes to be started and monitored, their command lines, health check interval, etc. Based on this input, the system monitor package is able to perform its functions without any further application programming.
This package can be reused in any programming environment where it is important to coordinate aspects of system startup, such as the startup sequence, as well as continued monitoring and automatic restart of failed processes. The package also provides a user interface to perform all system monitor administration functions.

2.7 Portability

EMSX is developed in Java. Hence, EMSX is portable across most operating systems and hardware platforms. Java has evolved into a mature language and is currently used for developing mission-critical applications. By using current standards of Java technology, EMSX achieves portability and reduces technology obsolescence.
3 Conclusion

The design and architecture of an EMS are critical to meeting the demands and requirements of the telecommunications industry. EMSX provides a software platform that has been developed with many of these requirements in mind. The EMSX product, which is built as a framework of components, provides a strong basis for development of a custom element management system. EMSX incorporates a fault-tolerant, layered, and distributed architecture that provides a good starting point for building a robust product.

EMSX can be used by any vendor or service provider to manage, or provide an interface to manage, networks of varying sizes. It allows an EMS vendor to support customized northbound or southbound interfaces without building the detailed infrastructure required for supporting fault tolerance, recovery, etc. EMSX provides a robust infrastructure upon which FCAPS management functionality can be developed in a rapid and cost-effective manner.
4 List of References

1. dbmapper White Paper
2. dbmapper Users Guide
3. PlatformX White Paper
4. PlatformX Users Guide