Fast Re-Route in IP/MPLS Networks Using the IP Operating System
Introduction:

Today, Internet routers employ several routing protocols to exchange routes. As a router learns its potential routes, it builds a database of next-hops. For every possible destination prefix, the router uses this database to identify where to route a packet. This database is called the Forwarding Information Base (FIB) and is programmed in the line card.

When a new link, router or shared risk link group (SRLG) is added, there is a time delay, known as up convergence, until the new connectivity information has propagated through the network. When a link, router or SRLG fails there is also a time delay, known as down convergence, until information about the connectivity loss has been disseminated throughout the network. Up convergence delays are not as critical as down convergence delays, because during a down convergence delay, packets en route towards the failed path are dropped.

When a local link fails, a router notifies its neighbors via an Interior Gateway Protocol (IGP) and the Border Gateway Protocol (BGP), re-computes new next-hops for all affected prefixes, and then installs those next-hops in the forwarding plane. Until the new next-hops are installed, traffic directed towards the affected prefixes is discarded. This process can take several seconds. The duration between a network failure event and the moment when all routers in that network have updated their forwarding hardware is called convergence time. The process of routers updating next-hop information in their FIBs to bypass a failure is called Re-Routing.

This paper provides an overview of how to achieve Fast Re-Route (FRR) functionality in a network by using the Internet Protocol Operating System (IP Operating System). You will gain an understanding of FRR, as well as how it enables large networks to recover from failure in under a second.
Familiarity with Internet Protocols, Multi Protocol Label Switching (MPLS)/Label Distribution Protocol (LDP) and Virtual Private Networks (VPN) is essential to understanding the contents of this document.

What is FRR?

FRR is the ability of a router to support the following two functionalities:

1. Pre-calculating a backup route to destination prefixes in its next-hop database. This backup route is accessed via a backup next-hop and is activated when the primary route to a destination prefix goes down. When a router with a backup route to a prefix detects a connectivity failure to that prefix, it still exchanges routing information to recalculate a new next-hop, and then updates the FIB and the forwarding hardware, in order to achieve convergence. The advantage of pre-calculating a backup next-hop is that the router can successfully forward packets during convergence (i.e. before the new next-hop to the prefix has been calculated and the FIB has been updated).

Prior to convergence, the network is in a transient state. As a result, different routers have different views of the network, which may cause them to calculate next-hops that are incompatible with the next-hop calculations of other routers in that network. This scenario creates the possibility of forwarding packets in a loop among two or more routers. Because the backup next-hops must ensure correct (but not necessarily optimal) forwarding in a non-converged network, it is essential that any chosen backup next-hop prevents packets from being forwarded in a loop when other routers are using old next-hops, as well as when these routers start using new next-hops. Such a backup next-hop is called a loop free alternate (LFA) next-hop. To avoid forwarding loops during network convergence, all routers in the network must calculate LFA next-hops.

2. Replacing, in the forwarding hardware, the active next-hop to the failed destination prefix with the pre-calculated backup next-hop within tens of milliseconds of detecting the failure of the primary route. Routing protocols download the pre-calculated backup next-hops, along with the primary next-hops, to the forwarding hardware, so that the hardware is aware of the existence of backup next-hops. The hardware monitors adjacent links. When a link fails, the primary next-hop to the unreachable prefixes is replaced with the backup next-hop for these prefixes. For non-adjacent links, the routing protocol in the control plane signals to the forwarding hardware that a destination prefix is down. At that time, the hardware replaces the primary next-hop with the backup next-hop. Since the update occurs in the hardware, it happens much faster than protocol convergence.

A network built with routers that support FRR experiences less traffic loss and less micro-looping than non-FRR networks. But achieving acceptable speed is a concern. Although a router's convergence generally takes several seconds, new media applications that use voice and video are sensitive to traffic losses greater than tens of milliseconds. In order for next generation networks to support these applications and provide 99.999% reliability, these networks need to recover from failure in milliseconds.

Position Paper: FRR in IP/MPLS Networks using 2
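The loop-free requirement on a backup next-hop can be made concrete with a small sketch. It assumes the standard LFA inequality: a neighbor N of router S is a loop free alternate for destination D when dist(N, D) < dist(N, S) + dist(S, D), so that N never sends the packet back through S. The topology, node names and link costs below are hypothetical and are not taken from this paper.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: shortest-path cost from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist

def loop_free_alternates(graph, source, destination):
    """Return the neighbors of source usable as LFA next-hops for destination."""
    dist_s = shortest_distances(graph, source)
    alternates = []
    for neighbor, link_cost in graph[source].items():
        dist_n = shortest_distances(graph, neighbor)
        if destination not in dist_n:
            continue
        # Skip neighbors that are already primary next-hops.
        if link_cost + dist_n[destination] == dist_s[destination]:
            continue
        # Loop-free condition: the neighbor's own shortest path to the
        # destination must not lead back through the source.
        if dist_n[destination] < dist_n[source] + dist_s[destination]:
            alternates.append(neighbor)
    return sorted(alternates)

# Hypothetical topology: S reaches D via A (cost 2). B has its own link to D,
# while C can only reach D by going back through S.
TOPOLOGY = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2},
    "C": {"S": 1},
    "D": {"A": 1, "B": 2},
}

lfas_for_d = loop_free_alternates(TOPOLOGY, "S", "D")
```

In this sketch B qualifies as an LFA for D, while C does not, because C would return the traffic to S.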
FRR in the IP Operating System:

The FRR infrastructure in the IP Operating System is built using the following components:

1. Loop Free Alternate (LFA):

Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Label Distribution Protocol (LDP) and Resource Reservation Protocol (RSVP) support the calculation of LFA routes in the IP Operating System. In the topology shown in Figure 1, OSPF and IS-IS will calculate 1.1.1.2 as the primary next-hop and 2.1.1.2 as a backup next-hop to reach prefix P2 from P1. Similarly, LDP will use Label1 as the primary label and Label2 as a backup label to reach prefix P2 from P1. Both the OSPF and IS-IS implementations in the IP Operating System support per-prefix LFA rather than per-link LFA. This ensures that networks using the IP Operating System have better node protection, better FRR topology coverage and better capacity utilization.

[Figure 1: topology with routers R1/LSR1, R2/LSR2, R3/LSR3 and R4/LSR4 and prefixes P2, P3 and P4. R1/LSR1 has interfaces 1.1.1.1 and 2.1.1.1; the figure labels R4/LSR4 with 1.1.1.2/Label1 and R2/LSR2 with 2.1.1.2/Label2.]

LFA support for OSPF: In the IP Operating System, OSPF supports a per-prefix LFA computation over a point-to-point OSPF interface that is configured for LFA computation. If the point-to-point interface has a full neighbor adjacency, the LFA computation is performed for the entire area. By default, all point-to-point and broadcast interfaces can be used as LFA backups.

LFA support for IS-IS: In the IP Operating System, IS-IS supports a per-prefix LFA computation over a point-to-point IS-IS interface that is configured for LFA computation. IS-IS performs an L1 area/L2 domain LFA computation by executing a Shortest Path First (SPF) computation from the standpoint of each eligible neighboring system.
LFA support for LDP: In the IP Operating System, LDP operates in liberal label retention mode, storing the labels advertised by neighbors that are not currently the primary next-hop, and in downstream unsolicited mode, distributing labels for Forwarding Equivalence Classes (FECs) on paths other than the Shortest Path Tree. The LDP LFA calculation follows the IGP, via the Routing Information Base (RIB), to select the backup next-hop.
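As a minimal sketch of why liberal label retention matters for LDP LFA, the following keeps every label binding learned from every neighbor, and then picks the primary and backup labels by following the next-hops that the IGP placed in the RIB. The class and method names are hypothetical and do not describe the IP Operating System implementation; the neighbor and label names follow Figure 1.

```python
class LdpLabelTable:
    """Hypothetical label store operating in liberal label retention mode."""

    def __init__(self):
        # FEC -> {advertising neighbor: label}; bindings from neighbors that
        # are not the current primary next-hop are retained, not discarded.
        self._bindings = {}

    def learn(self, fec, neighbor, label):
        """Record a label binding advertised by a neighbor for a FEC."""
        self._bindings.setdefault(fec, {})[neighbor] = label

    def select(self, fec, primary_neighbor, backup_neighbor):
        """Return (primary label, backup label) for the IGP-chosen next-hops."""
        labels = self._bindings.get(fec, {})
        return labels.get(primary_neighbor), labels.get(backup_neighbor)

table = LdpLabelTable()
table.learn("P2", "LSR4", "Label1")  # binding from the primary neighbor
table.learn("P2", "LSR2", "Label2")  # binding retained from a non-primary neighbor

# The IGP (via the RIB) reports LSR4 as primary and LSR2 as the LFA backup.
primary_label, backup_label = table.select("P2", "LSR4", "LSR2")
```

Because the binding from LSR2 was retained before any failure, the backup label is available immediately and no additional LDP signaling is needed at failure time.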
[Flow chart: CLI/Configuration; OSPF/IS-IS/LDP/RSVP; the Routing Information Base downloads the next-hops calculated by the routing protocol; the Forwarding Information Base stores the primary and the backup next-hop in the hardware.]

The flow chart provides a logical view of an LFA implementation in the IP Operating System. The IGPs calculate primary and backup next-hops. The RIB downloads the primary and backup next-hops to the FIB, which in turn programs the next-hops in the hardware.

2. Fast local repair with protected Next Hops:

Double Barrel Next-Hop (DBNH): The IGP/BGP/LDP protocols that feed prefix information to the RIB provide a backup next-hop along with the primary next-hop when executing a route add or update. The RIB uses the primary and backup information to create a double-barrel next-hop (DBNH). If there is an Equal Cost Multi Path (ECMP) set of primary paths, then each primary path may optionally have a backup path. When a DBNH is downloaded to the line card, it appears as a next-hop with two next-hop IDs: primary and backup. The line card holds on to this DBNH and downloads only the active part of the next-hop to the network processor for packet forwarding. When a line card receives a trigger, it overwrites the active primary next-hop in the network processor with the backup from the DBNH. The double barrel is also applicable to Label Switched Paths (LSPs), so that LDP can compute and install an LFA backup path/label for a primary path/label.

For the topology shown in Figure 1, a DBNH for an IGP will look as follows. Prefix P1 uses a double barrel that contains a primary next-hop of 1.1.1.2 and a backup next-hop of 2.1.1.2:

Double Barrel Next-Hop
  Primary Next-Hop: 1.1.1.2 (programmed in the Network Processor)
  Backup Next-Hop: 2.1.1.2
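The DBNH behavior described above can be sketched as follows. This is an illustration only: the class names, the dictionary standing in for the network processor and the trigger method are assumptions, not the actual line card software. The next-hop addresses are those of the Figure 1 example.

```python
from dataclasses import dataclass

class NetworkProcessor:
    """Hypothetical stand-in for the forwarding hardware."""

    def __init__(self):
        self.active = {}  # next-hop ID -> next-hop currently used for forwarding

    def program(self, nh_id, next_hop):
        self.active[nh_id] = next_hop

@dataclass
class DoubleBarrelNextHop:
    nh_id: int
    primary: str
    backup: str
    using_backup: bool = False

class LineCard:
    """Holds the full DBNH and programs only its active half in hardware."""

    def __init__(self, network_processor):
        self.np = network_processor
        self.dbnhs = {}

    def download(self, dbnh):
        # Keep both barrels locally; only the primary goes to the hardware.
        self.dbnhs[dbnh.nh_id] = dbnh
        self.np.program(dbnh.nh_id, dbnh.primary)

    def on_failure(self, failed_next_hop):
        # Trigger handling: overwrite the active primary with the backup.
        for dbnh in self.dbnhs.values():
            if not dbnh.using_backup and dbnh.primary == failed_next_hop:
                dbnh.using_backup = True
                self.np.program(dbnh.nh_id, dbnh.backup)

np_hw = NetworkProcessor()
card = LineCard(np_hw)
card.download(DoubleBarrelNextHop(nh_id=1, primary="1.1.1.2", backup="2.1.1.2"))
before = np_hw.active[1]
card.on_failure("1.1.1.2")
after = np_hw.active[1]
```

The switch requires no route recalculation: the backup is already stored on the line card, so the trigger only rewrites one hardware entry.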
Similarly, a DBNH for LDP will look as follows. Prefix P1 uses a double barrel that contains a primary next-hop of Label1 and a backup next-hop of Label2 to get to P2:

Double Barrel Next-Hop
  Primary Next-Hop: Label1 (programmed in the Network Processor)
  Backup Next-Hop: Label2

Double Label Next-Hop (DLNH): In LSP-based networks, egress Provider Edge (PE) routers need to be protected at the ingress PE. A labeled next-hop is a label acting as a next-hop to a Label Switching Router (LSR). A single pair of primary and backup labels (corresponding to primary and backup egress PE routers) can act as next-hop for thousands of external prefixes. In such a situation, a connectivity failure to the primary label requires a backup label update for thousands of prefixes in the line card, which can take up to several seconds. In the IP Operating System, this potential delay is avoided by the use of DLNHs. A DLNH is a next-hop that consists of two labels and points to a DBNH. When the primary next-hop of the DBNH is active, the first label is used. When the backup is active, the second label is used. The advantage of a DLNH is that, in case of a failure, only the single DBNH that the DLNH points to needs to be updated.

Advantages of DBNH and DLNH:

1. Since DBNHs and DLNHs are programmed in the hardware, the switch from the active next-hop to the backup next-hop is achieved in about 50 milliseconds.

2. When a link fails on an edge router that can reach several thousand prefixes using one next-hop, the router can re-route traffic to these prefixes with a single update to the next-hop database in the line card.

3. DBNHs and DLNHs are agnostic of the technology that reports the link failure, such as Bidirectional Forwarding Detection or Fast Failover Notification, and can be used with any underlying link failure detection mechanism.
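The indirection that makes the DLNH scale can be sketched as follows. The sketch assumes that all prefixes share one DLNH object, which in turn references one DBNH. The class names, label names and the (outer, inner) ordering of the returned label pair are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Dbnh:
    """Transport labels towards the primary and backup egress PE."""
    primary_label: str
    backup_label: str
    backup_active: bool = False

    def active_label(self):
        return self.backup_label if self.backup_active else self.primary_label

@dataclass
class Dlnh:
    """Two per-PE labels; which one is used depends on the DBNH state."""
    first_label: str
    second_label: str
    dbnh: Dbnh

    def label_stack(self):
        inner = self.second_label if self.dbnh.backup_active else self.first_label
        return (self.dbnh.active_label(), inner)

# Hypothetical labels: one DBNH and one DLNH shared by 4999 prefixes.
dbnh = Dbnh(primary_label="to-PE-A", backup_label="to-PE-B")
dlnh = Dlnh(first_label="svc-at-PE-A", second_label="svc-at-PE-B", dbnh=dbnh)
fib = {f"P{i}": dlnh for i in range(2, 5001)}

stack_before = fib["P2"].label_stack()

# Failure of the primary egress PE: one update, independent of FIB size.
dbnh.backup_active = True

stacks_after = {entry.label_stack() for entry in fib.values()}
```

After the single update, every prefix in the table resolves to the backup labels, without any per-prefix rewrite.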
Consider a network, as shown in Figure 2, where PE routers LSR3 and LSR4 provide connectivity to a few thousand prefixes. LSR3 allocates Label13 and Label23, and LSR4 allocates Label14 and Label24. LabelP32 and LabelP42 are allocated by a customer edge LSR that is directly connected to prefixes P2...P5000. LabelP35 and LabelP45 are allocated by another customer edge LSR that is directly connected to prefixes P5001...P10000.

[Figure 2: LSR1 and LSR2 each connect to PE routers LSR3 and LSR4, using Label13 and Label14 from LSR1 and Label23 and Label24 from LSR2. The figure also shows R1, prefix P1, the prefix ranges behind LSR3 and LSR4, and labels LabelP32, LabelP35, LabelP42 and LabelP45.]
The DLNH in the FIB on LSR1 will look like:

Prefixes          Double Label Next-Hop     Double Barrel Next-Hop
                  (first, second label)     (primary, backup)
P2...P5000        LabelP32, LabelP42        Label13, Label14
P5001...P10000    LabelP45, LabelP35        Label14, Label13

Prefixes P2...P5000 are reachable through a DLNH containing labels LabelP32 and LabelP42. The DLNH in turn points to a DBNH of Label13 (from LSR3) and Label14 (from LSR4). When LSR3 is active, LabelP32 of the DLNH is used to forward traffic. When the line card on LSR1 detects that connectivity to LSR3 is down, it switches the second label of the DBNH, Label14, to active in the network processor, and from then on LabelP42 of the DLNH is used to forward traffic. A similar explanation applies to prefixes P5001...P10000, which use LSR4 as the primary next-hop. As a result, only a single update is required to change the next-hop for 5000 prefixes and, since this update takes place in the hardware, it is executed in less than 50 milliseconds.

3. Triggers for Fast Re-Route:

The IP Operating System has the ability to detect and propagate adjacent and remote link failures using the proprietary Fast Failover Notification (FFN) and Event Tracking Infrastructure (ETI). FFN is a resilient infrastructure that detects link or router failures for critical services in milliseconds. FFN consists of a trigger mechanism, an infrastructure to propagate the triggered event to all line cards, and several protocols on the control plane that depend on the FFN events to make re-routing decisions. The trigger can be an event related to a link or line card failure, a Bidirectional Forwarding Detection (BFD) failure or a keepalive failure. On the line card, an FFN event indicates that the primary next-hop is not reachable and triggers the line card to switch to the backup next-hop. In addition to FFN, ETI is used to trigger a switch from a primary to a backup next-hop. The IGP/BGP/LDP protocols store an ETI object ID in a DBNH.
When the protocol detects that the next-hop is not reachable, it signals the ETI infrastructure on the line card, and the line card switches to the backup next-hop immediately. The IP Operating System allows multiple clients to use the same primary and backup pair by supporting a different DBNH for each client. An example of two clients needing the same double barrel, but different triggers, is L2VPN and L3VPN using the same PE routers as primary/backup. L2VPN routes are added by LDP, which may want control-plane driven triggering. L3VPN routes are added by BGP, which may want multi-hop BFD as a trigger.

4. Remote Loop Free Alternate (LFA):

An LFA cannot provide protection in ring topologies such as the one shown in Figure 3.

[Figure 3: ring topology with routers R1 to R6 and prefix P1. R6-R2-R1 is the primary path from R6 to P1, and an X marks the failed link. An LDP tunnel from R6 to R3 provides access from R6 to P1.]
In the topology shown in Figure 3, when all links are active, R6 sends traffic to P1 via R2 and R1. If the link between R6 and R2 breaks, R6 is not able to send traffic to P1 via R5, since R5 also uses R6 to send traffic to P1. In the IP Operating System, connectivity from R6 to R1 is restored by using a remote LFA. With remote LFA, R6 is able to dynamically identify R3 as its remote LFA node and sets up a directed Label Distribution Protocol (LDP) session with R3. R6 then uses this LDP session with R3 to send traffic for P1, and thus connectivity from R6 to P1 is restored. The LDP session between the two routers, as well as the related label processing, does not require any prior path provisioning.

5. IP FRR on the Smart Services Router:

The Smart Services Router (SSR), developed using the IP Operating System, is the company's flagship routing and services platform. The SSR leverages the IP FRR functionality in the IP Operating System with the help of high density line cards that support the DBNH and DLNH along with ETI and FFN, and is well positioned to help operators achieve network convergence in under a second.

6. Use Cases for FRR in IP/MPLS networks:

Static IPFRR with BFD:

[Figure 4: R1 (2.1.1.1/32, 3.1.1.1/32, 1.1.1.0/32) connects to R2 (2.1.1.2/32, 4.1.1.1/32) and to R3 (3.1.1.2/32, 4.1.1.2/32).]

In the topology shown in Figure 4, the next-hops of static routes are protected from failure using a DBNH. BFD can be used to monitor the primary next-hop, and when a failure is detected, the line card switches to an existing backup next-hop in a few milliseconds. R1 has a static route to 4.1.1.0/32 via primary next-hop 2.1.1.2 and backup next-hop 3.1.1.2. The FIB has a DBNH (with 2.1.1.2 and 3.1.1.2) for 4.1.1.0/32. Single session BFD is used to monitor connectivity to 2.1.1.2. When connectivity loss to 2.1.1.2 is detected, the line card switches the backup next-hop 3.1.1.2 to active.
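How quickly BFD reports the loss of 2.1.1.2 depends on the negotiated timers. As a rough sketch, assuming BFD asynchronous mode, the local detection time is the remote detect multiplier times the larger of the local required minimum receive interval and the remote desired minimum transmit interval. The timer values and the switch time below are illustrative assumptions, not figures from this paper.

```python
def bfd_detection_time_ms(remote_detect_mult,
                          local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Detection time in BFD asynchronous mode, in milliseconds."""
    # The agreed interval is the slower of what we accept and what the peer sends.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms

def traffic_loss_window_ms(detection_ms, hardware_switch_ms):
    """Simplified loss window: time to detect plus time to activate the backup."""
    return detection_ms + hardware_switch_ms

# Assumed values: 10 ms intervals, multiplier 3, 5 ms to switch the DBNH.
detection_ms = bfd_detection_time_ms(3, 10, 10)
loss_window_ms = traffic_loss_window_ms(detection_ms, 5)
```

With these assumed values the session is declared down after 30 ms, so the timers, rather than the hardware switch, dominate the total loss window.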
IGP FRR with BFD:

[Figure 5: R1 (2.1.1.1/32, 3.1.1.1/32, 1.1.1.0/32) connects to R2 (2.1.1.2/32, 4.1.1.1/32) and to R3 (3.1.1.2/32, 4.1.1.2/32).]

In the topology shown in Figure 5, the next-hops of IGPs such as OSPF and IS-IS are protected using a DBNH whose backup is calculated by LFA. R1 has an IGP route to 4.1.1.0/32 via primary next-hop 2.1.1.2 and backup next-hop 3.1.1.2, computed through LFA. The FIB has a DBNH (with 2.1.1.2 and 3.1.1.2) for 4.1.1.0/32. Single session BFD is used to monitor connectivity to 2.1.1.2. When connectivity loss to 2.1.1.2 is detected, the line card switches the backup next-hop 3.1.1.2 to active.

IPFRR in redundancy solutions with MC-LAG/VRRP:

In the IP Operating System, FRR is used in redundancy solutions based on Multi Chassis (MC) LAG/Virtual Router Redundancy Protocol (VRRP) to re-route traffic through a backup Inter Chassis Redundancy (ICR) link when the active link goes down.

[Figure 6: Redundancy with MC-LAG/VRRP. Hosts H1 and H2, an L2 network and an L3 network, connected through routers R1 and R2.]
In the topology shown in Figure 6, router R1 has an active link to the L2 network and is the primary path for traffic from the L3 network to the L2 network. The link from R2 to the L2 network serves as the backup link to connect the L3 network to the L2 network. The link from R2 to the L3 network has a higher metric than the link from R1 to the L3 network and, as a result, the traffic from the L3 network flows to R1. R2 is also connected to R1 through an ICR link, which serves as a backup for traffic from the L3 network to the L2 network. Therefore, R1 has a DBNH whose backup is via the ICR link to R2. When the primary link on R1 to the L2 network goes down, the link on R2 to the L2 network becomes active, the line card on R1 activates the backup next-hop to R2, and traffic from the L3 network is re-routed to the L2 network through R2 in less than 50 milliseconds. As a result, FRR allows a fast recovery from link failure.

BGP FRR using BGP Best External:

The IP Operating System-based MPLS VPN networks are able to converge in milliseconds using the Best External feature in BGP.

[Figure 7: MPLS VPN network with PE routers PER1, PER2 and PER3, customer edge routers CER1 (customer network) and CER2, and a "Primary PE router" label. PER3 has routes to PER1 and PER2 through BGP Best External.]

In the MPLS VPN network shown above, both PE routers PER1 and PER2 are connected to the customer edge router CER1. The network is set up so that PER1 has a higher local preference. Traffic towards the customer network exits the MPLS VPN network through PER1 to reach CER1. Since both PER1 and PER2 are connected to CER1, when BGP Best External is enabled on PER1 and PER2, BGP calculates the backup best external path to CER1 through PER2 and advertises this path to PER3. As a result, the FIB on PER3 has a DBNH with PER1 as the active next-hop and PER2 as the backup next-hop.
When PER3 detects a loss of connectivity to PER1, the line card immediately switches the backup next-hop of PER2 to active and traffic to CER1 is re-routed via PER2 in less than 50 milliseconds.

BGP FRR using BGP Path Diversity:

The IP Operating System-based BGP networks are able to converge in milliseconds using the Path Diversity feature on BGP Route Reflectors.

[Figure 8: BGP network with PE routers PER1, PER2 and PER4, a Route Reflector (RR), customer edge router CER1 in the customer network, and a "Primary PE router" label. The Route Reflector has a diverse session with its peer PER4.]
In the BGP network shown above, the Route Reflector learns the best path to CER1 through PER1, as well as the backup path through PER2 due to BGP Best External. However, the Route Reflector only reflects the best path through PER1 to its client PER4. When BGP Path Diversity is enabled, the Route Reflector has another, diverse session with its client PER4 and in turn also advertises the backup path through PER2 to PER4. Therefore, PER4 knows both the best path and the backup path to reach CER1. The FIB on PER4 has a DBNH for CER1 with PER1 as the active next-hop and PER2 as the backup next-hop. When PER4 detects a loss of connectivity to PER1, the line card immediately switches the backup next-hop of PER2 to active and traffic to CER1 is re-routed via PER2 in less than 50 milliseconds.

7. IPFRR LFA Analyzer:

A network optimization tool called IPFRR LFA Analyzer is also available for improving IPFRR coverage with LFA. This tool, which costs much less than existing commercial tools, can help to determine whether network coverage can be improved by adding a new link with a higher cost or by modifying existing link costs to maximize the number of protected failure scenarios. A user friendly GUI allows operators to create their networks and assign costs to the links. Using this tool, operators can improve the performance of their existing networks by 10-40% by adding a few new links, and achieve close to perfect LFA coverage via cost optimization.

Conclusions:

By developing an infrastructure based on LFAs and protected next-hops, the IP Operating System is able to detect a failure and re-route traffic in less than 50 milliseconds from the failure event. Independent of the number of prefixes that are reachable from an edge router, the DBNH and DLNH provide the ability to re-route traffic in under a second. Routers running the IP Operating System are able to take advantage of this FRR infrastructure to detect network failure and re-route traffic in milliseconds.
These routers are able to minimize traffic loss during failure and recover from it without causing noticeable service degradation. As a result, the IP Operating System enables networks that are more scalable and have lower downtime.

About the company:

The company is a world-leading provider of communications technology and services. We are enabling a networked society with efficient real-time solutions that will allow 60 billion people to study, work and live more freely in sustainable societies around the world. Our offering comprises services, software and infrastructure within Information and Communications Technology for telecom operators and other industries. Today 40 percent of the world's mobile traffic goes through our networks and we support customer networks servicing more than 2.5 billion subscriptions. Please visit us at http://www.ericsson.com/
Appendix: Acronyms

BGP: Border Gateway Protocol
BFD: Bidirectional Forwarding Detection
CE: Customer Edge
DBNH: Double Barrel Next-Hop
DLNH: Double Label Next-Hop
ECMP: Equal Cost Multi Path
ETI: Event Tracking Infrastructure
FIB: Forwarding Information Base
FRR: Fast Re-Route
FFN: Fast Failover Notification
IP Operating System: Internet Protocol Operating System
IGP: Interior Gateway Protocol
ICR: Inter Chassis Redundancy
LDP: Label Distribution Protocol
LFA: Loop Free Alternate
IS-IS: Intermediate System to Intermediate System
LSP: Label Switched Path
LSR: Label Switching Router
MPLS: Multi Protocol Label Switching
MC: Multi Chassis
OSPF: Open Shortest Path First
PE: Provider Edge
RIB: Routing Information Base
RSVP: Resource Reservation Protocol
SRLG: Shared Risk Link Group
VRRP: Virtual Router Redundancy Protocol
VPN: Virtual Private Network

Inc. 200 Holger Way San Jose, CA 95134 Phone: +1 408 970 2000 www.ericsson.com/us (EUS) Jan 2014 01/287 01-FGB 101 0192 Uen Rev A
Specifications subject to change without notice.