Managing MPLS with Service Assurance Tools

Whitepaper
Prepared by www.infosim.net
August 2006

Abstract

MPLS provides the foundation for next-generation services and applications such as Voice over Internet Protocol (VoIP), video conferencing and streaming, and Virtual Private Networks (VPNs). MPLS supports quality of service (QoS), which providers rely on to guarantee service level agreements (SLAs). With a network management system (NMS), service providers can optimize their networks to meet the unique resource requirements of each class of service as they roll out and operate new, converged services over MPLS core networks. The importance of QoS heightens the need for visibility into service quality parameters across a more complex core network and for the ability to identify the root cause of problems.
1. MPLS Technology

The global service provider and enterprise markets are migrating from traditional circuit-switched networks to next-generation, packet-switched IP networks. Many service providers are implementing or evaluating Multiprotocol Label Switching (MPLS) as part of their migration strategy. MPLS promises to significantly improve network management effectiveness, reduce network operating costs and strengthen providers' competitive position.

In MPLS, each packet carries a virtual circuit identifier, called a label, as a field of a shim header inserted between the IP header and the MAC/link layer header of the packet. A single packet can carry more than one shim header. The set of all shim headers carried by a packet is called the MPLS label stack. MPLS handles labels the same way other virtual circuit switching technologies handle their virtual circuit identifiers.

When a packet arrives at the first MPLS router of the MPLS domain, called the ingress Label Edge Router (ingress LER), the source and destination IP addresses of the packet are analyzed and the packet is classified into a Forwarding Equivalence Class (FEC). All packets within the same FEC use the same virtual circuit, called a Label Switched Path (LSP). The ingress LER then inserts (pushes) an MPLS header onto the packet. Subsequent routers of the MPLS domain update the MPLS header by swapping the label (L1 for L2, L2 for L3, and so on). Finally, the last router of the LSP, called the egress LER, removes (pops) the MPLS header, so that the packet can be handled by subsequent MPLS-unaware IP routers or hosts.
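The push, swap and pop steps described above can be illustrated with a minimal Python sketch. It is not taken from any router implementation: the FEC table, the label values and the two-router LSP are hypothetical, and classification is simplified to a destination-prefix match, whereas a real ingress LER may also consider the source address and other fields.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst_ip: str
    labels: list = field(default_factory=list)  # label stack; top of stack is the last element


# Hypothetical FEC table of the ingress LER: destination prefix -> first label of the LSP.
INGRESS_FEC = {"10.1.": 100, "10.2.": 200}

# Hypothetical swap tables of two transit routers: incoming label -> outgoing label.
SWAP_TABLES = [{100: 101, 200: 201}, {101: 102, 201: 202}]


def ingress(packet):
    """Classify the packet into a FEC and push the label of the matching LSP."""
    for prefix, label in INGRESS_FEC.items():
        if packet.dst_ip.startswith(prefix):
            packet.labels.append(label)
            return
    raise LookupError(f"no FEC matches {packet.dst_ip}")


def transit(packet, swap_table):
    """Swap the top label for the outgoing label of this router."""
    packet.labels[-1] = swap_table[packet.labels[-1]]


def egress(packet):
    """Pop the top label so the packet leaves the MPLS domain as plain IP."""
    packet.labels.pop()


def forward(packet):
    """Send the packet along the LSP and record the label stack after each hop."""
    trace = []
    ingress(packet)
    trace.append(tuple(packet.labels))
    for swap_table in SWAP_TABLES:
        transit(packet, swap_table)
        trace.append(tuple(packet.labels))
    egress(packet)
    trace.append(tuple(packet.labels))
    return trace
```

With these tables, forward(Packet("10.1.5.9")) returns [(100,), (101,), (102,), ()]: one label is pushed at the ingress LER, swapped at each transit router and popped at the egress LER.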
Providers recognize the potential of MPLS to reduce costs by consolidating their core networks onto a common network infrastructure and to shorten the time it takes to deploy new services. Services such as Virtual Private Networks (VPNs) and Voice over IP (VoIP) can be offered at different Quality of Service (QoS) levels. MPLS offers traffic engineering capabilities, such as shaping, that relieve over-utilized (congested) resources by directing traffic to resources that are underutilized.
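One way to picture the idea of directing traffic away from congested resources is a path choice based on the most heavily loaded link. The following Python sketch is an illustration only: the link names, the utilization figures and the selection rule are assumptions, and real traffic engineering uses constraint-based path computation with many more inputs.

```python
def bottleneck(path, utilization):
    """Return the utilization of the most heavily loaded link on the path."""
    return max(utilization[link] for link in path)


def pick_path(candidates, utilization):
    """Choose the candidate path whose busiest link is the least utilized."""
    return min(candidates, key=lambda path: bottleneck(path, utilization))


# Hypothetical link utilization (fraction of capacity in use).
UTILIZATION = {
    ("A", "B"): 0.9,
    ("B", "D"): 0.4,
    ("A", "C"): 0.3,
    ("C", "D"): 0.5,
}

# Two hypothetical candidate paths from A to D.
VIA_B = [("A", "B"), ("B", "D")]
VIA_C = [("A", "C"), ("C", "D")]
```

Here pick_path([VIA_B, VIA_C], UTILIZATION) returns VIA_C, because its busiest link runs at 50 percent while the path through B contains a link at 90 percent.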
2. MPLS Challenges

Service providers implementing MPLS face significant challenges. To migrate to MPLS successfully and fully realize its benefits, providers must closely manage network availability and performance, which adds yet more challenges to overcome. This paper provides a brief overview of how providers can meet these challenges with global-class network management.

Network engineers face many challenges when managing an MPLS service. Ensuring high-quality delivery of all applications and services across an MPLS network, at all times, for all offices and users, is essential for today's enterprises and requires new capabilities. Managing an MPLS core is difficult because one must accurately assess the severity of any failure and its impact on customers.

- How can the impact on customers and services be determined when a network interface goes down? Operators need to know which customers' services are affected and the priority of the failures. With redundant networks, traffic engineering and tunnelling, it is not always clear whether a network failure translates into a customer service failure.
- How can information be obtained on real VPN performance, not just on devices? Operators need to know where VPNs are located, the available bandwidth of each access point and how much traffic is going through each.
- How can VPN-specific service levels be monitored? Service-level thresholds must be monitored instead of just interface-level thresholds, for example the combined error rates and discard rates of all the interfaces that participate in one VPN.

In an MPLS network, faults can occur as a result of router or link failures. When this happens in the core network, LSPs are typically rerouted quickly. The service will not fail, but it may be impacted. This is especially the case in today's over-dimensioned networks. Failures and the resulting traffic reroutes can lead to a lot of noise at the operation center.
Sifting through the deluge of incoming information to identify those events that can affect service is a major management challenge. An easy-to-use root cause analysis capability is required. When a critical error occurs, time is of the essence in identifying the service and customer in jeopardy. A good network management solution aggregates and prioritizes event notifications and presents network information in one comprehensive display, common to all team members, with drill-down capabilities that pinpoint the affected service and customer. This speeds up problem resolution and may avoid degradation altogether.
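The impact question raised in this section (which customers are affected when an interface goes down, and how severely) can be sketched as a lookup from interfaces to LSPs to customers. The Python sketch below is a simplified illustration under stated assumptions: the Lsp record, the interface names and the two severity levels are hypothetical and do not describe the data model of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsp:
    name: str
    customer: str
    primary: tuple      # interfaces on the primary path
    backup: tuple = ()  # interfaces on the backup path; empty if there is none


def assess(lsp, down):
    """Classify one LSP as 'ok', 'rerouted' or 'failed' for a set of down interfaces."""
    if not any(interface in down for interface in lsp.primary):
        return "ok"
    backup_usable = bool(lsp.backup) and not any(i in down for i in lsp.backup)
    return "rerouted" if backup_usable else "failed"


def impact_report(lsps, down):
    """List affected (customer, LSP, state) rows, most severe first."""
    severity = {"failed": 0, "rerouted": 1}
    rows = [(lsp.customer, lsp.name, assess(lsp, down)) for lsp in lsps]
    affected = [row for row in rows if row[2] != "ok"]
    return sorted(affected, key=lambda row: (severity[row[2]], row[0]))


# Hypothetical inventory of three customer LSPs.
LSPS = [
    Lsp("lsp-a", "Acme", ("pe1-ge0", "p1-ge1"), ("pe1-ge1", "p2-ge0")),
    Lsp("lsp-b", "Bolt", ("pe1-ge0", "p1-ge2")),
    Lsp("lsp-c", "Core", ("pe2-ge0",)),
]
```

For the single failure impact_report(LSPS, {"pe1-ge0"}) the result is [("Bolt", "lsp-b", "failed"), ("Acme", "lsp-a", "rerouted")]. The unprotected LSP is listed first, the protected one is reported as rerouted rather than failed, and the unaffected customer produces no event at all, which is the kind of noise reduction the section calls for.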
3. MPLS Management

Assuring QoS and SLAs

It is crucial for service providers to continually meet their established SLAs, as their business success depends upon it. To do so, service providers must continually monitor the network performance of each class of service and verify that the contractual SLAs are being met. An NMS for fault and performance management must be able to manage the entire MPLS core and provide both active testing and passive monitoring. This is especially important for the quality delivery of real-time voice and video applications, which can be affected by minute network issues that have no effect on data transmission. By enabling service event and performance visibility in the MPLS core, the NMS monitors key performance indicators (KPIs) and sets thresholds that alert providers to potential service degradation.

Service Quality Analysis

When an MPLS network is deployed and operational, service providers must ensure conformance to SLAs to maintain user satisfaction and avoid financial penalties. If elements of the network fail or are congested, services and the business can be directly impacted. Performance monitoring is therefore becoming increasingly important and increasingly complex. Networks, especially converged networks, support a growing range of traffic types, each requiring specific traffic engineering and QoS handling at Layers 2 and 3.

For example, when a service provider and its enterprise customer enter into a contract for an MPLS network, they must agree on the service classes. There might be four classes of service for a single customer, or there may be different customers in the same class but in different VPNs. Below-par performance must be attributed to either the provider's or the enterprise's network. Management systems that provide clear real-time and historical answers about root causes and about key performance and quality indicators (KPIs and KQIs) are required to validate service delivery.
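The threshold monitoring per class of service described above can be sketched as a comparison of measured KPIs against SLA limits, with an early warning before the limit is reached. The class names, KPI names, limits and the 80 percent warning ratio in this Python sketch are hypothetical examples, not values from any contract or product.

```python
# Hypothetical SLA limits per class of service.
SLA = {
    "voice": {"latency_ms": 150, "loss_pct": 1.0},
    "data": {"latency_ms": 400, "loss_pct": 2.0},
}


def check_sla(measured, sla=SLA, warn_ratio=0.8):
    """Return (class, kpi, state) for every KPI near or beyond its SLA limit.

    A value above the limit is a 'violation'; a value at or above
    warn_ratio times the limit is a 'warning' of potential degradation.
    """
    alerts = []
    for service_class, kpis in measured.items():
        for kpi, value in kpis.items():
            limit = sla[service_class][kpi]
            if value > limit:
                alerts.append((service_class, kpi, "violation"))
            elif value >= warn_ratio * limit:
                alerts.append((service_class, kpi, "warning"))
    return alerts


# Hypothetical measurements for one reporting interval.
MEASURED = {
    "voice": {"latency_ms": 130, "loss_pct": 0.2},
    "data": {"latency_ms": 450, "loss_pct": 0.5},
}
```

With these figures, check_sla(MEASURED) returns [("voice", "latency_ms", "warning"), ("data", "latency_ms", "violation")]. The voice class is still within its SLA but close enough to the limit to alert the provider before the contract is breached.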
Monitoring the User's Experience

The end user's experience is fundamental. Failure to understand the user's issues represents missed opportunities. The role of CRM represents great leverage for a provider here. In addition, accurately pinpointing an end user's faults and problems has a more direct benefit to the provider. A provider needs to be concerned with an enterprise's QoS to retain its customers.
End-to-End Monitoring of VPNs

MPLS management also needs to address end-to-end monitoring in cases where the infrastructure is provided by different ISPs. Network-to-network connections must be covered where multiple service providers bridge their MPLS networks to span national or even international territories. The ability to provide SLA reporting end to end across multiple MPLS networks is a basic capability for a management solution.

Pinpoint Asset Allocation Inefficiencies

Asset and inventory reporting enables resource reallocation for maximum efficiency by identifying under-utilized assets, out-of-date device software, and other allocation imbalances. As network resources, devices as well as links, are very expensive, it is necessary to use these components efficiently. The ability to manage infrastructure resource usage is an important task in managing an MPLS environment.
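Identifying under-utilized assets, as described above, amounts to comparing average usage against chosen bounds. The Python sketch below assumes utilization samples have already been collected per asset as fractions of capacity; the asset names and the 20 and 80 percent bounds are hypothetical.

```python
def classify_assets(samples_by_asset, low=0.2, high=0.8):
    """Group assets by average utilization (fractions between 0 and 1).

    Assets averaging below 'low' are reported as under-utilized,
    assets averaging above 'high' as over-utilized.
    """
    report = {"under": [], "over": []}
    for name, samples in sorted(samples_by_asset.items()):
        average = sum(samples) / len(samples)
        if average < low:
            report["under"].append(name)
        elif average > high:
            report["over"].append(name)
    return report


# Hypothetical utilization samples per link.
SAMPLES = {
    "link-a": [0.05, 0.10, 0.12],
    "link-b": [0.85, 0.90, 0.95],
    "link-c": [0.40, 0.50],
}
```

For these samples, classify_assets(SAMPLES) returns {"under": ["link-a"], "over": ["link-b"]}, which marks link-a as a candidate for reallocation and link-b as a candidate for relief.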
4. The Service Level Management Tool

General Requirements

The network management system must be able to link a service to the infrastructure and to a customer, and to model the relationships and dependencies among the network, applications, hosts and business processes. It is important for an NMS to provide the following KPIs and monitoring functions:

- End-to-end (or site-to-site) SLA reporting on latency issues that may affect real-time applications such as voice and ERP systems.
- Packet loss reporting, particularly valuable in a voice environment, where retransmissions cannot make up for lost packets in time.
- Monitoring of the Chassis MIB for router operating environment parameters including temperature and utilization of CPU, buffer, and heap.
- Monitoring of the MPLS MIB for label switched path (LSP) performance including volume, throughput, availability, primary path availability, and path transitions and changes.
- Monitoring of the Destination Class Usage MIB for performance data such as throughput, volume, and utilization, all by destination class.

Open Standards

The IETF has released two standards for MPLS monitoring: "Multiprotocol Label Switching (MPLS) Label Switch Router (LSR) Management Information Base" (RFC 3813) and "Multiprotocol Label Switching (MPLS) Traffic Engineering Management Information Base" (RFC 3812). Both were developed as IETF drafts and then published as RFCs. They provide an open framework for managing MPLS. The NMS should support open standards and avoid proprietary technologies that lock customers in to a vendor. The RFCs describe management of the following objects:

- Traffic-engineered tunnels
- Tunnel resources
- Tunnel paths
- Tunnel performance counters

Reports

The NMS has to aggregate MPLS-specific and non-MPLS-specific data and present it as a consolidated time-baseline view of network performance. Both real-time and trend capabilities are mandatory.
The NMS has to provide service providers with the capability to monitor and report on the performance metrics delivered by the devices. Examples include the Juniper Networks Chassis Management Information Base (MIB), Destination Class Usage MIB, Firewall MIB, and MPLS MIB, for use in capacity planning, guaranteeing service level agreements (SLAs), and accounting and billing by usage and service.
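Throughput, volume and utilization figures such as those listed in this section are typically derived from counters that the devices expose and that only ever increase until they wrap around. The Python sketch below shows that derivation under stated assumptions: the function names are hypothetical, the polling itself (for example via SNMP) is left out, and at most one counter wrap per polling interval is assumed.

```python
def counter_delta(previous, current, bits=64):
    """Difference between two counter samples, allowing for one wrap-around."""
    return (current - previous) % (1 << bits)


def throughput_bps(prev_octets, curr_octets, interval_s, bits=64):
    """Average throughput in bits per second between two octet-counter samples."""
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_s


def utilization(prev_octets, curr_octets, interval_s, capacity_bps, bits=64):
    """Average utilization as a fraction of the link or tunnel capacity."""
    return throughput_bps(prev_octets, curr_octets, interval_s, bits) / capacity_bps
```

For two samples of 1,000,000 and 4,750,000 octets taken 300 seconds apart, throughput_bps(1_000_000, 4_750_000, 300) gives 100000.0 bits per second. A 32-bit counter that wraps between samples is handled by the modulo: counter_delta(4294967000, 704, bits=32) gives 1000. Storing such per-interval values over time yields the trend data used for capacity planning and usage-based billing.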
5. Summary

MPLS provides a robust and flexible solution for delivering multiple services across the core network and is seen as the solution for overall convergence of the metro and transport networks. As a growing number of organizations turn to MPLS networks for network convergence and a range of service levels, expert MPLS preparation and delivery of end-to-end quality of service can be a key differentiator.

Service-provider-class MPLS network management is a complex and potentially expensive process. An end-to-end view of network-managed objects is a key requirement for emerging services such as RFC 2547 based IP-VPNs. Service providers require a truly scalable network management solution offering quality of service, traffic engineering and continued support for profitable legacy services.

Choosing one management system to support such a complex migration from legacy networks can make the difference between success and failure. Infosim's StableNet, for example, delivers an easy-to-use set of tools to help manage the migration and operation of MPLS networks.

Using StableNet with MPLS networks for planning and management offers compelling advantages. It helps to:

- determine what to purchase and where to invest
- identify and mark components that need attention
- assess performance and locate bottlenecks
- keep traffic moving smoothly
Infosim StableNet provides real-time end-to-end visibility and accurate troubleshooting. Businesses benefit from the assurance that their networks, systems and applications are up and service levels are met. StableNet gives organizations the security that their IT systems support vital business processes and revenue generation. StableNet provides end-to-end support spanning applications, systems, triple-play, switches and routers, and enterprise management systems.

Infosim is internationally recognized as a technology leader in the OSS market. Infosim develops solutions for optimization of business effectiveness and reduction of operational risks. Patented solutions increase business agility and create competitive advantages. Infosim is a privately owned corporation, headquartered in Germany. The regional headquarters, Infosim Asia Pacific, is located in Singapore.

Infosim and StableNet are registered trademarks of Infosim GmbH & Co KG. All other trademarks and registered trademarks in this document are the properties of their respective owners.

For more information contact:

The Press Officer
Infosim GmbH & Co KG
Friedrich-Bergius-Ring 15
97076 Wuerzburg
Germany
mail: info@infosim.net
fax: +49 931 20592 209

The Press Officer
Infosim Asia Pacific Pte Ltd
11 Collyer Quay
17-03 The Arcade
Singapore 049317
mail: info@asia.infosim.net
fax: +65 6327 4474