OAM
Operations, Administration and Maintenance

IERU Communications Ltd, OAM Rev. A
1. Overview

This white paper describes the Ethernet and Multi-Protocol Label Switching (MPLS) tools and procedures used to accomplish Operations, Administration and Maintenance (or Management) (OAM), along with the OAM features and the unique benefits of IERU's solutions.

OAM is a general term for the processes, activities, tools and standards involved in operating, administering, managing and maintaining any system. The functionality described here addresses the fault management aspects of Fault, Configuration, Accounting, Performance and Security (FCAPS).

The ITU-T introduced the term Telecommunications Management Network (TMN) to describe a separate network that has interfaces to the telecommunication network. TMN defines interconnection points between the two networks and specifies management functionalities. Network management tasks are grouped into functional areas such as FCAPS. Figure 1 describes the relationship between the telecommunications network and the TMN.

Figure 1. Telecommunications Network and the TMN

In summary, OAM identifies failures on the physical connection between two devices and notifies the Network Management System (NMS).
2. OAM Functions and Mechanism

OAM signals belonging to an administrative domain originate and terminate in MEG End Points (MEPs) present within that administrative domain. A MEP at the boundary of an administrative domain prevents OAM signals that correspond to a Maintenance Entity Group (MEG) in that domain from leaking outside it. However, when a MEP is not present or is faulty, the associated OAM signals could leave the administrative domain.

The objectives of OAM are to protect provider revenue by preventing service outages and offering faster service restoration, to maximize revenue growth by enabling richer service offerings, and to reduce operational costs by cutting repair costs and operational overhead. Figure 2 describes the Ethernet OAM.

Figure 2. Ethernet OAM

OAM functions are divided into three main categories:

1. OAM Functions for Fault Management
   - Fault Detection
   - Fault Verification
   - Fault Isolation
   - Fault Notification
   - Protection Switching

2. OAM Functions for Performance Monitoring
   - Frame Loss Measurement
   - Frame Delay Measurement
   - Frame Delay Variation Measurement

3. OAM Functions
   - Discovery
   - Diagnostics
   - Maintenance Channel
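The three performance monitoring measurements listed above can be illustrated with a small Python sketch. It is a simplified illustration and not the exact procedure of any standard: the function names, the use of integer microsecond timestamps, and the choice to express delay variation as the difference between consecutive delay samples are all assumptions made for this example.

```python
def frame_loss_ratio(tx_frames: int, rx_frames: int) -> float:
    """Fraction of frames sent in an interval that did not arrive.

    tx_frames and rx_frames are hypothetical counter deltas taken over
    the same measurement interval at the sender and the receiver.
    """
    if tx_frames <= 0:
        return 0.0
    lost = max(tx_frames - rx_frames, 0)
    return lost / tx_frames


def two_way_frame_delay(t_tx_req: int, t_rx_req: int,
                        t_tx_reply: int, t_rx_reply: int) -> int:
    """Round-trip delay with the responder's processing time removed.

    All four timestamps are assumed to be in microseconds:
    request sent, request received, reply sent, reply received.
    """
    return (t_rx_reply - t_tx_req) - (t_tx_reply - t_rx_req)


def frame_delay_variation(delays: list) -> list:
    """Differences between consecutive delay samples."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]
```

Because the two-way delay subtracts the time the reply spent inside the responder, it does not require the clocks of the two devices to be synchronized; a one-way measurement would.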
3. OAM Process Flow

Figure 3 shows the service provider process flow, from the moment a fault appears in the network until the repair has been verified. Each step should be optimized to protect both the service provider and the end user.

Figure 3. OAM Process Flow

Fault Detection

Fault detection includes mechanisms to detect faults at the device control plane or data plane level. Faults must be detected quickly enough to minimize Time to Recover (TTR). However, detection should be based on an observation window large enough to avoid false fault detections. For example, a management system can become non-responsive for a few microseconds while handling a burst of interrupts. As long as the control plane is restored to a normal state within an acceptable time window, the network element is not considered to have experienced a software failure. OAM handles a wide range of failure scenarios that vary in nature and location, from a software defect to a backhoe tearing apart a fiber conduit by mistake.

Fault Notification

Once the network element layer detects a fault, the fault needs to be conveyed to the entities that will work toward repairing it. The repair can require either human or automated servicing, such as the manual replacement of a faulty transceiver or a Rapid Spanning Tree Protocol (RSTP) reconvergence after a link failure, respectively.
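The trade-off described under Fault Detection, between detecting quickly and using an observation window large enough to avoid false detections, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name ContinuityMonitor, its methods, and the default window of 3.5 heartbeat intervals are choices made for this example, not a description of any particular device.

```python
class ContinuityMonitor:
    """Declares a fault only after heartbeats stay silent for a full window."""

    def __init__(self, interval_s: float, window_factor: float = 3.5):
        # Assumed default: tolerate up to 3.5 heartbeat intervals of silence
        # before declaring a fault, so a brief stall is not reported.
        self.window_s = interval_s * window_factor
        self.last_seen = None

    def heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat from the peer."""
        self.last_seen = now

    def is_faulty(self, now: float) -> bool:
        """True only if the peer has been silent longer than the window."""
        if self.last_seen is None:
            return False  # nothing received yet, so nothing has been lost
        return (now - self.last_seen) > self.window_s
```

With a one-second interval, a peer that last answered at t = 0 is still considered healthy at t = 3.0 and is declared faulty at t = 3.6. A shorter window detects faults sooner but is more likely to report a transient stall as a failure.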
Fault Verification

After notification, the Network Operation Center (NOC) manager should verify the fault and determine whether the condition persists. By the time the link failure indication is received, the Ethernet network will have re-converged. Failover and restoration with IERU's Carrier Ethernet Service Delivery devices takes less than 50 milliseconds (according to CFM standards). Fault verification using on-demand OAM techniques eliminates false failure indications. Not verifying the validity of the fault could lead the network operator to try to isolate a failure that does not exist.

Fault Isolation

Fault isolation consists of determining the exact source, location, and nature of the fault, including the specific network element(s) and network layer(s) experiencing the fault. A failure at a low level may impact higher levels and lead to additional failures. For example, a link failure can break MPLS tunnel connectivity, which in turn impacts all of the MPLS VCs that the tunnel carries. Notification of a low-level failure can be followed or surrounded by higher-level failure notifications. This flood of notifications makes fault isolation more difficult, time-consuming, and costly. Features such as alarm correlation help minimize the cost of isolating a fault by decreasing the number of fault notification messages.

Repair

Repair depends on the efficiency of the OAM process.

Repair Verification

After a remedy is enacted, the same on-demand OAM mechanisms used during fault verification confirm that the fault no longer exists. An IP ping can be used both to verify IP connectivity faults on the control plane and to confirm that connectivity has been restored.
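The alarm correlation idea mentioned under Fault Isolation can be sketched in Python as follows. The sketch assumes a hypothetical dependency map in which each entity names the single lower-level entity it relies on (for example, a VC relies on a tunnel, and a tunnel relies on a link). The function name and the entity names are illustrative only.

```python
def correlate_alarms(alarms, depends_on):
    """Return only the alarms that are not explained by a lower-level alarm.

    alarms:     list of entities currently reporting a failure
    depends_on: dict mapping an entity to the lower-level entity it relies on
    """
    active = set(alarms)
    roots = []
    for entity in alarms:
        seen = {entity}  # guards against cycles in the dependency map
        parent = depends_on.get(entity)
        suppressed = False
        while parent is not None and parent not in seen:
            if parent in active:
                suppressed = True  # a lower-level failure explains this alarm
                break
            seen.add(parent)
            parent = depends_on.get(parent)
        if not suppressed:
            roots.append(entity)
    return roots


# Hypothetical example: two VCs ride a tunnel that rides a physical link.
depends_on = {"vc-1": "tunnel-A", "vc-2": "tunnel-A", "tunnel-A": "link-1"}
alarms = ["vc-1", "tunnel-A", "link-1", "vc-2"]
root_causes = correlate_alarms(alarms, depends_on)  # only "link-1" remains
```

In this example four notifications collapse into one, which is the reduction in fault notification messages that makes isolation cheaper.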
4. OAM Protocols Organization

OAM is a general description for a group of several protocols on different network layers. With the addition of comprehensive OAM capabilities, Ethernet and MPLS offer a complete feature set that allows carriers to maximize Ethernet-based service revenue. IEEE, IETF, ITU-T, and MEF now describe mechanisms that report the status of a given end-to-end service, give the administrator a view of the network, and provide link connectivity information for the network.

With OAM and the process flow defined, Figure 4 shows, in general terms, how standards and protocols are assigned to the steps of the OAM process flow.

Figure 4. OAM Protocol Organization

IEEE 802.3ah Ethernet First Mile (EFM) OAM

Ethernet in the First Mile (IEEE 802.3ah) defines mechanisms for monitoring and troubleshooting Ethernet access links. Specifically, it defines tools for discovery, remote failure indication, remote and local loopbacks, and status and performance monitoring.

OAM Benefits

Auto-Discovery: Eliminates the need for operator configuration.
Uni-Directional Fault Signaling: Enables the detection of a one-way link failure.
Remote Loopback: Provides on-demand link diagnostics, including bit-error rate approximation and link length.
Link Monitoring: Offers proactive, traffic-based threshold link monitoring.
Critical Events: Supports communication of network element conditions that may cause link failure, including power and temperature.
Layer 2 Variable Retrieval: Allows supplemental link statistics collection, augmenting collection by SNMP, RS232, TCP/UDP and RMON.
Organization Specific Extensions: Enables standards development organizations and vendors to expand the scope.

IEEE 802.1ag Connectivity Fault Management

The IEEE 802.1ag Connectivity Fault Management (CFM) standard identifies failures of service continuity by monitoring the logical connection. It uses Continuity Check Messages (CCMs) to detect failures caused by an interruption of a physical path or logical connection, or even by misconfiguration of services. The CFM protocol uses mechanisms similar to ping and traceroute, through Loopback Messages (LBMs) and Linktrace Messages (LTMs), allowing improved recognition of the correct failure point in the network. Different Maintenance Domain (MD) levels allow each operator, service provider, or customer complete management of its own network without interference or exposure. This advanced management capability allows the operator to adopt an individual approach to each customer, isolating and re-establishing circuits without interfering with other services or other clients.

ITU-T Y.1731

Fault management and performance monitoring (ITU-T Y.1731) defines performance monitoring measurements such as frame loss ratio, frame delay and frame delay variation to assist with SLA assurance and capacity planning. For fault management, the standard defines continuity checks, loopbacks, link trace and alarm suppression (AIS, RDI) for effective fault detection, verification, isolation and notification in carrier networks.

IEEE 802.1ab LLDP

The Link Layer Discovery Protocol (LLDP) is a vendor-neutral Data Link Layer protocol used by network devices to advertise their identity, capabilities, and interconnections on an IEEE 802 LAN. The IEEE formally refers to the protocol as Station and Media Access Control Connectivity Discovery, specified in standards document 802.1ab.
LLDP performs functions similar to several proprietary protocols, such as Cisco Discovery Protocol, Extreme Discovery Protocol, Nortel Discovery Protocol (also known as SONMP), and Microsoft's Link Layer Topology Discovery (LLTD).

Information gathered with LLDP is stored in the device as a management information base (MIB) and can be queried with the Simple Network Management Protocol (SNMP) as specified in RFC 2922. The topology of an LLDP-enabled network can be discovered by crawling the hosts and querying this database. Information that may be retrieved includes:

   - System name and description
   - Port name and description
   - VLAN name
   - IP management address
   - System capabilities (switching, routing, etc.)
   - MAC/PHY information
   - MDI power
   - Link aggregation
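Discovering the topology by crawling the hosts, as described above, amounts to a breadth-first traversal over each device's neighbor table. The Python sketch below illustrates the idea only. The query_neighbors callback is a hypothetical stand-in for whatever mechanism reads a device's LLDP neighbor information (for example, an SNMP query); it is not a real library call, and the device and port names are invented for the example.

```python
from collections import deque


def discover_topology(seed, query_neighbors):
    """Crawl outward from a seed device and collect devices and links.

    query_neighbors(host) is a hypothetical callback returning an iterable
    of (local_port, neighbor_name) pairs learned by that host.
    """
    visited = {seed}
    links = set()
    queue = deque([seed])
    while queue:
        host = queue.popleft()
        for local_port, neighbor in query_neighbors(host):
            links.add((host, local_port, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)  # query each device only once
                queue.append(neighbor)
    return visited, links


# Invented neighbor tables standing in for real device queries.
tables = {
    "sw1": [("p1", "sw2"), ("p2", "sw3")],
    "sw2": [("p1", "sw1")],
    "sw3": [("p1", "sw1"), ("p2", "sw4")],
    "sw4": [("p1", "sw3")],
}
devices, links = discover_topology("sw1", lambda host: tables.get(host, []))
```

Each link appears once per direction because both ends report it. A real crawler would also need to handle devices that do not answer, which this sketch treats as having no neighbors.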
5. DMSwitch OAM Implementation

The DMSwitch provides connectivity management that adheres to the latest industry standards, IEEE 802.3ah (OAM) and 802.1ag (CFM), enabling the Metro Ethernet operator to detect, isolate and verify failures end-to-end across the entire network span. The DMSwitch can identify the remote device's configuration parameters and constantly monitors the quality of the physical connection. In the event of a failure or even a power loss (also known as "dying gasp"), the DMSwitch transmits the information to the other switch on the link via predefined messages of the OAM protocol (OAMPDUs). The device allows remote loop tests and validation of the SLA in order to isolate the failure. The DMSwitch provides performance monitoring and measurement of packet loss, packet delay and throughput according to ITU-T Y.1731 recommendations.

In addition to the OAM and CFM standards, an advanced NMS feature provides constant monitoring of physical cable connections, identifying discontinuities and short-circuits immediately and determining the approximate distance to the failure point.

Protection Mechanisms - Classic, Rapid and Multiple Spanning Tree protocols are available, with RSTP providing shorter convergence times, along with the Ethernet Automatic Protection Switching (EAPS) protocol, which is designed for sub-50 ms recovery in Ethernet rings. Port trunking groups physical ports into logical ports with automatic load balancing and typical sub-200 ms recovery times. Together these enable the construction of high-throughput Metro Ethernet application topologies with protection and short failure recovery times.

Robust Security Mechanisms - The DMSwitch series features mechanisms that guarantee security in the operation and maintenance of the installed base. All communication protocols are encrypted, and filters can specify which units of the network may have management access to other equipment.
A highly secure and trustworthy management structure can be built using local and remote Syslog access, while user authentication is provided through RADIUS and TACACS+ servers. E-mail alarm notifications, a single SNTP clock, protection against Denial of Service (DoS) attacks and the IEEE 802.1x port authentication protocol further enhance protection of the devices. For Metro Ethernet applications, the number of MAC addresses may be limited per port and per VLAN. Bandwidth on every port can be limited for broadcast, multicast and Destination Lookup Failure (DLF) traffic. Additionally, Layer 2 and Layer 3 protection mechanisms may be used to avoid network attacks.
For more details and information please contact our staff: market@ieru.net or helpdesk@ieru.net.

IERU Communications Ltd
2 Ha'Haroshet St., PO Box 2197
44641 - Kfar Saba, Israel
Tel: +972 (9) 7662599
Fax: +972 (9) 7651093
e-mail: helpdesk@ieru.net
www.ieru.net