Deploying BGP Fast Convergence / BGP PIC



Similar documents
BGP Convergence in much less than a second Clarence Filsfils - cf@cisco.com

Introduction Inter-AS L3VPN

IPv6 over MPLS VPN. Contents. Prerequisites. Document ID: Requirements

BGP Best Path Selection Algorithm

APNIC elearning: BGP Attributes

Lab 4.2 Challenge Lab: Implementing MPLS VPNs

MPLS VPN Route Target Rewrite

Fast Re-Route in IP/MPLS networks using Ericsson s IP Operating System

BGP Attributes and Path Selection

BGP Link Bandwidth. Finding Feature Information. Prerequisites for BGP Link Bandwidth

Implementing MPLS VPNs over IP Tunnels on Cisco IOS XR Software

Understanding Virtual Router and Virtual Systems

Introducing Basic MPLS Concepts

How To Import Ipv4 From Global To Global On Cisco Vrf.Net (Vf) On A Vf-Net (Virtual Private Network) On Ipv2 (Vfs) On An Ipv3 (Vv

IMPLEMENTING CISCO MPLS V3.0 (MPLS)

MPLS Inter-AS VPNs. Configuration on Cisco Devices

MPLS-based Layer 3 VPNs

BGP DDoS Mitigation. Gunter Van de Velde. Sr Technical Leader NOSTG, Cisco Systems. May Cisco and/or its affiliates. All rights reserved.

How To Fix Bg Convergence On A Network With A Bg-Pic On A Bgi On A Pipo On A 2G Network

APNIC elearning: BGP Basics. Contact: erou03_v1.0

MPLS Implementation MPLS VPN

- Multiprotocol Label Switching -

BGP Link Bandwidth. Finding Feature Information. Contents

How Routers Forward Packets

MPLS-based Virtual Private Network (MPLS VPN) The VPN usually belongs to one company and has several sites interconnected across the common service

Table of Contents. Cisco Configuring a Basic MPLS VPN

Implementing MPLS VPN in Provider's IP Backbone Luyuan Fang AT&T

Notice the router names, as these are often used in MPLS terminology. The Customer Edge router a router that directly connects to a customer network.

Implementing Cisco Service Provider Next-Generation Edge Network Services **Part of the CCNP Service Provider track**

Presentation_ID. 2001, Cisco Systems, Inc. All rights reserved.

BGP Support for Next-Hop Address Tracking

Why Is MPLS VPN Security Important?

IMPLEMENTING CISCO MPLS V2.3 (MPLS)

PRASAD ATHUKURI Sreekavitha engineering info technology,kammam

MPLS VPN over mgre. Finding Feature Information. Prerequisites for MPLS VPN over mgre

Building A Cheaper Peering Router. (Actually it s more about buying a cheaper router and applying some routing tricks)

Implementing MPLS VPNs over IP Tunnels

WAN Topologies MPLS. 2006, Cisco Systems, Inc. All rights reserved. Presentation_ID.scr Cisco Systems, Inc. All rights reserved.

Implementing Cisco MPLS

Multiprotocol Label Switching Load Balancing

l.cittadini, m.cola, g.di battista

How To Make A Network Secure

Cisco Configuring Basic MPLS Using OSPF

Transitioning to BGP. ISP Workshops. Last updated 24 April 2013

MPLS VPN Implementation

AMPLS - Advanced Implementing and Troubleshooting MPLS VPN Networks v4.0

Tackling the Challenges of MPLS VPN Testing. Todd Law Product Manager Advanced Networks Division

Frame Mode MPLS Implementation

Configuring a Load-Balancing Scheme

BFD. (Bidirectional Forwarding Detection) Does it work and is it worth it? Tom Scholl, AT&T Labs NANOG 45

Routing Issues in deploying MPLS VPNs. Mukhtiar Shaikh Moiz Moizuddin

BGP Multipath Load Sharing for Both ebgp and ibgp in an MPLS-VPN

Example: Advertised Distance (AD) Example: Feasible Distance (FD) Example: Successor and Feasible Successor Example: Successor and Feasible Successor

ENSURING RAPID RESTORATION IN JUNOS OS-BASED NETWORKS

How To Understand Bg

MPLS. Cisco MPLS. Cisco Router Challenge 227. MPLS Introduction. The most up-to-date version of this test is at:

IMPLEMENTING CISCO IP ROUTING V2.0 (ROUTE)

Expert Reference Series of White Papers. Cisco Service Provider Next Generation Networks

Bell Aliant. Business Internet Border Gateway Protocol Policy and Features Guidelines

Expert Reference Series of White Papers. Cisco Service Provider Next Generation Networks

Using OSPF in an MPLS VPN Environment

Advances in BGP BRKRST Gunter Van de Velde Sr. Technical Leader

RFC 2547bis: BGP/MPLS VPN Fundamentals

For internal circulation of BSNLonly

BGP: Frequently Asked Questions

Introduction to MPLS-based VPNs

This feature was introduced. This feature was integrated in Cisco IOS Release 12.2(11)T.

Deploying MPLS-based IP VPNs Rajiv Asati Distinguished Engineer BRKMPL-2102

BGP FORGOTTEN BUT USEFUL FEATURES. Piotr Wojciechowski (CCIE #25543)

Configuring a Basic MPLS VPN

Introduction to HA Technologies: SSO/NSF with GR and/or NSR. Ken Weissner / kweissne@cisco.com Systems and Technology Architecture, Cisco Systems

MPLS VPN. Agenda. MP-BGP VPN Overview MPLS VPN Architecture MPLS VPN Basic VPNs MPLS VPN Complex VPNs MPLS VPN Configuration (Cisco) L86 - MPLS VPN

Exterior Gateway Protocols (BGP)

UPDATE = [Withdrawn prefixes (Optional)] + [Path Attributes] + [NLRIs].

DD2491 p MPLS/BGP VPNs. Olof Hagsand KTH CSC

Understanding Route Redistribution & Filtering

Using the Border Gateway Protocol for Interdomain Routing

Cisco Exam CCIE Service Provider Written Exam Version: 7.0 [ Total Questions: 107 ]

To Add Paths or not to Add Paths

Load balancing and traffic control in BGP

Kingston University London

IP/MPLS-Based VPNs Layer-3 vs. Layer-2

Fundamentals Multiprotocol Label Switching MPLS III

MPLS/VPN Overview Cisco Systems, Inc. All rights reserved. 1

OVERLAYING VIRTUALIZED LAYER 2 NETWORKS OVER LAYER 3 NETWORKS

MPLS Concepts. Overview. Objectives

Border Gateway Protocol (BGP)

MPLS WAN Explorer. Enterprise Network Management Visibility through the MPLS VPN Cloud

IPv6 over IPv4/MPLS Networks: The 6PE approach

BGP Terminology, Concepts, and Operation. Chapter , Cisco Systems, Inc. All rights reserved. Cisco Public

Advanced BGP Policy. Advanced Topics

Enterprise Network Simulation Using MPLS- BGP

APNIC elearning: Introduction to MPLS

BGP as an IGP for Carrier/Enterprise Networks

SEC , Cisco Systems, Inc. All rights reserved.

Inter-Autonomous Systems for MPLS VPNs

Multi Protocol Label Switching (MPLS) is a core networking technology that

Advances in BGP BRKRST Oliver Boehmer AS Solutions Architect

HP Networking BGP and MPLS technology training

Transcription:

Deploying BGP Fast Convergence / BGP PIC Oliver Böhmer V2.1

Housekeeping We value your feedback- don't forget to complete your online session evaluations after each session & the Overall Conference Evaluation which will be available online from Thursday Visit the World of Solutions and Meet the Engineer Visit the Cisco Store to purchase your recommended readings Please switch off your mobile phones After the event don t forget to visit Cisco Live Virtual: www.ciscolivevirtual.com 4

Agenda Introduction / Motivation BGP Data Plane Convergence how does it work? Designing for BGP PIC Configuring and Monitoring BGP PIC Advanced BGP PIC Topics Dealing with Summaries/Default Routes BGP Label Allocation and Effect on BGP PIC MPLS-VPN InterAS and BGP PIC 5

Loss of Connectivity LoC Link and/or Node Failures cause packet loss until routing has converged Loss can be caused by black-holes and/or micro-loops, until all relevant nodes in the path have converged Time between failure and restoration is called LoC How much LoC is acceptable? Minutes? Seconds? Milliseconds? How often is LoC acceptable? 6

LoC Impact on sample applications Signalling VoIP call Routing Protocols L1/L2 Transport Convergence Video Tunnels go down TCP Sessio n dies 50 msec 500 msec 1 sec 5 secs 30 secs 1 Minute 7

Routing Convergence Building Blocks Detection (link or node aliveness, routing updates received) Update topology database Compute optimal path Download to HW FIB State propagation (routing updates sent) Switch to newer path 9

traffic loss (msec) Routing Fast Convergence IS-IS Test Results 700 600 500 400 300 200 100 0 c12000, c12k-prp1-eng4p-pre28s-isis-ipip-cpuload-1mpps- IOS pr60-ipcs50-bgp160-ip2ip-remote-nolb Agilent measurements 0 500 1000 1500 2000 2500 3000 3500 prefix nr Sub-second IGP (OSPF/ISIS) convergence can be conservatively achieved 0% 50% 100% average 90% 10

IGP vs. BGP Convergence IGP (OSPF/ISIS) deals with hundreds routes Max a few thousands, but only a few hundreds are really important/relevant BGP is designed to carry millions of routes and a few large customers carry that amount of prefixes! So? Isn t that a routing problem in the global Internet, where we don t really have SLAs for? Well, no 11

BGP Fast Convergence Two Sample Use Cases Larg(er) Layer 3/MPLS VPN Deployments offering critical Enterprise voice or video services Internet-Facing Datacenter/ Hosting Facility using Full Internet Routing table > Couple thousand routes Service Provider A Service Provider B MPLS Network n * 350000 (!!) routes received 12

BGP Control-Plane Convergence Components I: Core failure 1. Core Link or node goes down 2. IGP notices failure, computes new paths to PE1/ 3. IGP notifies BGP that a path to a next-hop has changed 4. PE3 identifies affected paths, runs best path, path to no longer as good as the one to PE1 5. Updates RIB/FIB, traffic continues 3/8, NH 1.1.1.1 NH <> PE1 1.1.1.1 3.0.0.0/8 3/8, NH <PE1> NH <> NH <> PE3 1.1.1.5 3/8, NH 1.1.1.5 NH <PE1> 13

BGP Control-Plane Convergence Components II: Edge Node Failure Edge Node (PE1) goes down 2. IGP notices failure, update RIB, remove path to PE1 3. IGP notifies BGP that path to PE1 is now longer valid 4. PE3 identifies affected paths, runs best path, removes paths via PE1 5. Updates RIB/FIB, traffic continues 3/8, NH 1.1.1.1 NH <> PE1 1.1.1.1 3.0.0.0/8 3/8, NH <PE1> NH <> PE3 1.1.1.5 3/8, NH 1.1.1.5 NH <PE1> 14

BGP Control-Plane Convergence Components III: Edge Neighbour Failure (with next-hop-self) 1. Edge link on PE1 goes down 2. ebgp session goes down 3. PE1 identifies affected paths, runs best path 4. PE1 sends withdraws to other PEs 5. & 3 run best path, update RIB/FIB, traffic continues Withdraw: 3/8, NH <PE1> 3/8, NH 1.1.1.1 NH <> PE1 1.1.1.1 3.0.0.0/8 3/8, NH <PE1> NH <> PE3 1.1.1.5 3/8, NH 1.1.1.5 NH <PE1> 15

BGP Control-Plane Convergence Summary Two out of the three scenarios depend on IGP convergence, including IGP failure detection Can react to this in much less than a second Fast ReRoute techniques can even recover core failures in sub-50 milliseconds BGP is notified of the change, and needs to identify affected paths by scanning the BGP table(s) Scanning implementation can be improved by only scanning affected table(s) or parts of tables (ex: scoped walks in IOS-XR), but scanning effort will still grow with table size Hence: We need to find another approach to achieve predictable fast BGP convergence!!! 16

Control vs. Data Plane Convergence Control Plane Convergence For the topology after the failure, the optimal path is known and installed in the dataplane May be extremely long (table size) Data Plane Convergence Once IGP convergence has detected the failure, the packets are rerouted onto a valid path to the BGP destination While valid, this path may not be the most optimum one from a control plane convergence viewpoint We want this behaviour, in a prefix-independent way that s what this session is all about 17

BGP Data Plane Convergence how does it work

t, Loss Of Connectivity Prefix Independent Convergence (PIC) no PIC PIC n 2n 3n 4n 5n 6n PIC is the ability to restore forwarding without resorting to per prefix operations. Loss Of Connectivity does not increase as my network grows (one problem less to worry about) 19

Prefixes, path lists and paths BGP Entry 110.0.0.0/24 BGP Entry 110.1.0.0/24 BGP Entry 110.2.0.0/24 IGB Entry 10.0.0.3/32 GigE1/0 GigE0/0 Advertised with next hop 10.0.0.3/32 PE1 (10.0.0.3) Path List Path 1 Via 10.0.0.3 Path List Path 1 GigE0/0 Group of prefixes 110.0.0.0/24 110.1.0.0/24 110.2.0.0/24... 20

Non Optimal: Flat FIB BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 OIF OIF BGP Net 110.5.0.0/24 IGP Net 10.0.0.3/32 OIF OIF Each BGP FIB entry has its own local Outgoing Interface (OIF) information (ex: Gig0/0) Forwarding Plane must directly recurse on local OIF Information Original Cisco implementation and still in use both by us and competition FIB changes can take long, dependent on number of prefixes affected 21

Right Architecture: Hierarchical FIB P1 PE1 PE3 BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 BGP pathlist IGP pathlist PE1 via P1 PE1 via P2 P2 Gig1, dmac=x PE1 IGP pathlist Gig2, dmac=y BGP Net 110.5.0.0/24 BGP nexthop(s) via P2 IGP nexthop(s) Output Interface Pointer Indirection between BGP and IGP entries allow for immediate update of the multipath BGP pathlist at IGP convergence Only the parts of FIB actually affected by a change needs to be touched Used in newer IOS and IOS-XR (all platforms), enables Prefix Independent Convergence 22

BGP PIC Core P1 PE1 PIC-acting router PE3 P2 Core Failure: a failure within the analyzed AS IGP convergence on PE3 leads to a modification of the RIB path to PE1 BGP Dataplane Convergence is finished assuming the new path to the BGP Next Hop is leveraged immediately (BGP PIC Core) BGP NHT sends a modify notification to BGP which may trigger BGP Control-Plane Convergence. This may be long without impacting Tight-SLA experience 23

LoC (ms) Characterization P1 PE1 BGP PIC Core PE3 BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 BGP pathlist IGP pathlist PE1 via P1 PE1 via P2 Gig1, dmac=x P2 BGP Net 110.5.0.0/24 PE1 BGP nexthop(s) IGP pathlist via P2 IGP nexthop(s) Gig2, dmac=y Output Interface 100000 Core As soon as IGP converges, the IGP pathlist memory is updated, and hence all children BGP PL s leverage the new path immediately 10000 1000 100 10 PIC no PIC Optimum convergence, Optimum Load- Balancing, Excellent Robustness 1 25000 50000 75000 100000 125000 150000 175000 200000 225000 250000 275000 300000 325000 350000 1 Prefix 24

BGP PIC Edge PE1 does not set next-hop-self P1 PE1 P1 PE1 PE3 PE3 P2 P2 Edge Failure: a failure at the edge of the analyzed AS IGP convergence on PE3 leads to a deletion of the RIB path to BGP Next-Hop PE1 BGP Dataplane Convergence is kicked in on PE3 (BGP PIC Edge) and immediately redirects the packets via BGP NHT sends a delete notification to BGP which triggers BGP Control-Plane Convergence. This may be long, but without impacting Tight-SLA experience 25

0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 Characterization P1 PE1 BGP PIC Edge PE3 BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 BGP pathlist IGP pathlist PE1 via P1 PE1 via P2 Gig1, dmac=x P2 PE1 IGP pathlist Gig2, dmac=y BGP Net 110.5.0.0/24 BGP nexthop(s) via P2 IGP nexthop(s) Output Interface 1000000 msec At IGP Convergence time, the complete IGP pathlist to PE1 is deleted SW FIB walks the linked list of parent BGP path lists and in-place modify them to use alternate next hops (ECMP or backup). BGP Path lists are shared, so this is efficient 100000 10000 1000 100 10 1 250k PIC 250k no PIC 500k PIC 500k no PIC Prefix 26

BGP PIC Edge and Next-Hop Self PE1 does set next-hop-self P1 PE1 PE3 P2 With the edge device setting next-hop to its loopback (next-hop-self), edge link going down does not change next-hop as seen on other routers Failure goes unnoticed by others! However: Next-hop on edge device with failing link goes away, so this device can react in PIC-Edge fashion Traffic re-routed via core to alternate edge 27

Key Take-Aways PIC Core and PIC-Edge leverage hierarchical forwarding structure PIC Core: Path towards Next-Hop changes IGP LoadInfo changed PIC Edge: Next-Hop goes away BGP Path list changed All BGP prefixes (no matter how many!!) converge as quickly as their next-hop Generally, IGP is responsible for next-hop convergence BGP convergence depends on IGP convergence So? What do I need to do to speed up my BGP convergence with BGP PIC??? 29

Designing for BGP PIC

Designing (for) BGP PIC BGP PIC is a forwarding / data plane layer feature, so what s there to design??? Well, there is a bit: BGP data plane convergence depends on how quickly the next-hop(s) converges (or is deleted), which boils down to Fast failure detection Fast IGP convergence BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 BGP Net 110.5.0.0/24 BGP pathlist PE1 BGP nexthop(s) IGP pathlist PE1 via X4 PE1 via P2 IGP pathlist via P2 IGP nexthop(s) Ge1, dmac=x Ge2, dmac=y Output Interface For PIC Edge, we need some form of tunnelling/encapsulation between edge devices For BGP PIC-Edge, we need to have an alternative/backup next-hop BGP Net 110.0.0.0/24 BGP Net 110.1.0.0/24 PE1 IGP pathlist PE1 via X4 PE1 via P2 IGP pathlist Ge1, dmac=x Ge2, dmac=y BGP Net 110.5.0.0/24 BGP nexthop(s) via P2 IGP nexthop(s) Output Interface 31

BGP Net 110.0.0.0/24 BGP PIC Design I: Fast IGP Convergence BGP Net 110.1.0.0/24 BGP pathlist PE1 IGP pathlist PE1 via X4 PE1 via P2 IGP pathlist Ge1, dmac=x Ge2, dmac=y BGP Net 110.5.0.0/24 BGP nexthop(s) via P2 IGP nexthop(s) Output Interface IGP Convergence is outside the scope of this session, but in principle 1. Provide for fast failure detection (BFD, carrier-delay) Avoid tuning down BGP hello/hold timers for failure detection 2. For IOS, tune OSPF/ISIS LSA- and SPF-throttle timers Reduces convergence from ~5 down to <1 sec NX-OS and IOS-XR already tuned reasonably fast 3. Prioritize next-hop prefixes (i.e. PE loopback addresses) to make sure they converge before less important prefixes (i.e. link addresses or the like) 4. Keep the IGP slim and clean, use point-to-point adjacencies, carry customer routes in BGP 5. Evaluate Fast ReRoute techniques (like LFA-FRR) to further improve convergence Please see past Fast Convergence sessions/techtorials for details, visit us in the Design Clinics or meet using Meet the Engineer 32

BGP PIC Design II: PE PE Encapsulation Some BGP-PIC Edge convergence scenarios lead to edge devices forwarding packets on to alternate edges, back via the core Core routers might be unaware of the failure (yet) and send packets back to the previous edge device, causing a loop Loop! Solution: Ensure there is no intermediate IP destination lookup, via means of encapsulation: Direct adjacency between edges (physical link or GRE tunnel) Using MPLS LSPs/Tunnels in the core 33

BGP Net 110.0.0.0/24 BGP PIC Design III: More than one path BGP Net 110.1.0.0/24 PE1 IGP pathlist PE1 via X4 PE1 via P2 IGP pathlist Ge1, dmac=x Ge2, dmac=y BGP Net 110.5.0.0/24 BGP nexthop(s) via P2 IGP nexthop(s) Output Interface When a PE/next-hop goes away, the re-routing node needs already a backup/alternate path in its FIB This sounds rather obvious, but can be non-trivial in some scenarios: Scenario 1: Route Reflectors PE3 PE1 1/8 via PE1 RR 1/8 1/8 via Scenario 2: Active/Standby Routing Policies let s start with this one first... PE4 34

Scenario 2: BGP Active/Standby The Problem (1/3) Initial State: CE just comes up Update: NetA, via PE1, LP: 200 NetA, via e0 LP: 200 i NetA, via PE1 LP: 200 via LP: 100 Update: NetA, via LP: 100 PE1 (active) e0 s0 CE (standby) NetA, via s0 LP: 100 35

BGP Active/Standby The Problem (2/3) Initial State: CE just comes up Update: NetA, via PE1, LP: 200 NetA, via e0 LP: 200 via LP: 100 PE1 (active) e0 CE s0 i NetA, via PE1 LP: 200 via LP: 100 Update: NetA, via LP: 100 (standby) NetA, via s0 LP: 100 via PE1 LP: 200 36

BGP Active/Standby The Problem (3/3) withdraws its ebgp path as it is no longer best But now all other PEs are left with a single path no alternate path for PIC-Edge to converge to!!! NetA, via e0 LP: 200 PE1 (active) i NetA, via PE1 LP: 200 Withdrawal: NetA, via LP: 100 e0 s0 CE (standby) NetA, via s0 LP: 100 via PE1 LP: 200 37

BGP Active/Standby The Solution: BGP Best-External keeps advertising his best external path Requires no BGP protocol change (local change only) Availability: 12.2SRE/15.0S/MR, XE3.1 and XR3.9 Update: NetA, via PE1, LP: 200 NetA, via e0 LP: 200 via LP: 100 PE1 (active) PE3 i NetA, via PE1 LP: 200 via LP: 100 Update: NetA, via LP: 100 e0 s0 (standby) NetA, via s0 LP: 100 via PE1 LP: 200 CE Extra question: What if PE1/ acted as RR towards PE3? router bgp address-family { vpnv4 ipv4 [vrf...] } bgp advertise-best-external 38

Scenario1: ibgp Path Propagation and Reflection Regular BGP BestPath algorithm leads to an RR only reflecting one path, namely its best path Without explicit policy either hot-potato policy prevails (rule 8) or the lowest neighbor address (rule 13) is used as tiebreaker What can we do to propagate both paths to PE3/4? PE3 PE1 1/8 via PE1 RR 1/8 via 1/8 PE4 39

Path Diversity in L3VPN Environments Important sites are dual homed Unique RD allocation ensures both paths are learned, even through route reflectors For active/backup scenarios, best-external is required RD1:1/8 via, Label L1 PE3 PE1 RR RD1 RD2 1/8 PE4 RD2:1/8 via, Label L2 40

Solution 1: Internet in a VRF Unique RD allocation ensures both paths are learned, even through route reflectors Consider per-vrf or per-ce label allocation when advertising full Internet routing table within a VRF RD1:1/8 via, Label L1 PE3 PE1 RR RD1 RD2 1/8 PE4 RD2:1/8 via, Label L2 41

Solution 2: No RR Full ibgp mesh Yes, I am serious, at least for reasonably sized and static environments PE3 PE1 1/8 PE4 42

Solution 3: RR + partial ibgp mesh Done in practice by several operators Very specific and hence difficult to abstract any rule. Just know it exists and analyses the possibility. PE3 PE1 1/8 via PE1 RR 1/8 via 1/8 PE4 43

Solution 4: Engineered RR Some operators place their RR s in the topology to ensure they select different paths. Their PE s are clients to multiple RR s. the top RR is closer to PE1 and selects the black path the bottom RR is closer to and selects the blue path Above behaviour can also be achieved using specific BGP policies PE3 RR PE1 RR 1/8 PE4 44

Solution 5: AddPath New Capability to allow a BGP speaker to advertise more than one path ( The holy grail ) Available in IOS-XR 4.0, IOS-XE 3.7, 15.2(4)S, 15.3T Requires support for this functionality on RR and PEs PE3 1/8 via PE1 PE1 RR 1/8 PE4 1/8 via http://www.cisco.com/en/us/docs/ios-xml/ios/iproute_bgp/configuration/xe-3s/asr1000/irg-additional-paths.html 45

Solution 6: BGP Diverse Paths (aka Shadow RR ) New feature (IOS-XE 3.4S) allows a RR to advertise a 2 nd best path Two deployment models: 1. RR maintains two ibgp session to PEs (shown below) Primary connection advertises best path (PE1) Secondary connection (or secondary RR) advertises next-best-path () 2. Dual RRs, first RR advertises best, the other RR the blue/2 nd best path No update on PEs/RR-clients required PE3 PE1 1/8 via PE1 RR 1/8 via 1/8 PE4 http://www.cisco.com/en/us/docs/ios/ios_xe/iproute_bgp/configuration/guide/irg_diverse_path_xe.html 46

Designing (for) BGP PIC Summary Work on the baseline Improve your IGP convergence Use BFD for speedy ebgp failure detection Consider enabling MPLS to provide PE-PE tunnelling Ensure you have multiple paths and keep this always in the back of your mind, whatever BGP designs/services you are coming up with This one is the hardest one to fix once solutions/etc. are deployed 47

Coming back to: BGP Active/Standby BGP Best-External Extra question: What if PE1/ acted as RR towards PE3? Let s look at A BGP RR always needs to reflect the best path to its clients (PE3) this is the one via PE1!! We need to enable AddPath to advertise the best-external in addition to the best one Update: NetA, via PE1, LP: 200 PE1 (active) PE3 RR reflects: NetA, via PE1, LP: 200 e0 s0 CE i NetA, via PE1 LP: 200 via LP: 100 Update: NetA, via LP: 100 (standby) NetA, via s0 LP: 100 (external) via PE1 LP: 200 (best) 48

Configuring BGP PIC

Enabling BGP PIC Enabling IP Routing Fast Convergence BGP PIC leverages IGP convergence Make sure IGP converges quickly IOS-XR: IGP Timers pretty-much tuned by default IOS: Sample OSPF config: process-max-time 50 ip routing protocol purge interface interface carrier-delay msec 0 negotiation auto ip ospf network point-to-point bfd interval 100 min_rx 100 mul 3 router ospf 1 timers throttle spf 50 100 5000 timers throttle lsa all 0 20 1000 timers lsa arrival 20 timers pacing flood 15 passive-interface Loopback 0 bfd all-interfaces 50

Enabling BGP PIC Core PIC Core is a pure data plane feature, leveraging a hierarchical forwarding/cef plane On IOS devices (C7600, ASR1k), hierarchy needs to be enabled via: cef table output-chain build favor convergence-speed All other platforms supporting PIC core (CRS, XR12k ASR9k, NX-OS) do this natively, there is nothing to configure here 51

Enabling BGP PIC Edge: IOS Two BGP-PIC Edge Flavors: BGP PIC Edge Multipath and Unipath Multipath: Re-routing router load-balances across multiple next-hops, backup nexthops are actively taking traffic, are active in the routing/forwarding plane, commonly found in active/active redundancy scenarios. No configuration, apart from enabling BGP multipath (maximum-paths... ) Unipath: Backup path(s) are NOT taking traffic, as found in active/standby scenarios router bgp... address-family ipv4 [vrf...] or address-family vpnv4 bgp additional-paths install... or implicitly when enabling best external router bgp... address-family bgp advertise-best-external 52

Enabling BGP PIC Edge: IOS-XR As in IOS, PIC-Edge in XR w/ multipath requires no additional configuration PIC-Edge unipath needs to be enabled explicitly: route-policy BACKUP! Currently, only a single backup path is supported set path-selection backup 1 install [multipath-protect] [advertise] end-policy router bgp... address-family ipv4 unicast additional-paths selection route-policy BACKUP! address-family vpnv4 unicast additional-paths selection route-policy BACKUP! 53

Monitoring BGP-PIC: IOS PE3#sh bgp vpnv4 unicast vrf red BGP table version is 43, local router ID is 10.0.0.3 Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale, m multipath, b backup-path, x best-external, f RT- Filter, a additional-path Origin codes: i - IGP, e - EGP,? - incomplete Network Next Hop Metric LocPrf Weight Path Route Distinguisher: 1:1 (default for vrf red) *> 100.0.0.10/32 100.1.10.10 0 0 100 i *bi 110.0.0.11/32 10.0.0.2 0 100 0 110 i *>i 10.0.0.1 0 2000 0 110 I PE3# Backup/Repair Path Use show ip route [vrf..] repair-paths to view backup paths in RIB.. 54

PE1 Monitoring BGP-PIC: IOS PE3 PIC-Edge Enabled PE3#sh bgp vpnv4 unicast vrf red 110.0.0.11/32 BGP routing table entry for 1:3:110.0.0.11/32, version 43 Paths: (2 available, best #2, table red) Additional-path-install Advertised to update-groups: 1 Refresh Epoch 1 110, imported path from 1:1:110.0.0.11/32 10.0.0.2 (metric 21) from 10.0.0.6 (10.0.0.6) Origin IGP, metric 0, localpref 100, valid, internal, backup/repair Extended Community: RT:1:100 Originator: 10.0.0.2, Cluster list: 10.0.0.6, recursive-via-host mpls labels in/out nolabel/16 Refresh Epoch 1 110, imported path from 1:2:110.0.0.11/32 10.0.0.1 (metric 21) from 10.0.0.6 (10.0.0.6) Origin IGP, metric 0, localpref 2000, valid, internal, best Extended Community: RT:1:100 Originator: 10.0.0.1, Cluster list: 10.0.0.6, recursive-via-host mpls labels in/out nolabel/17 PE3# Backup/Repair Path Primary Path PE1 Paths flagged as recursevia-host by default for vpnv4 (See later) 55

PE1 Monitoring BGP-PIC: IOS BGP Path List Primary Entry PE3 PE3#sh ip cef vrf red 110.0.0.11/32 internal i list lock adj buck choic chain contains path extension list path DD944670, path list DD60E02C, share 1/1, type recursive, for IPv4, flags must-belabelled, recursive-via-host path DD9449F0, path list DD60E2AC, share 1/1, type attached nexthop, for IPv4 nexthop 10.1.2.2 Ethernet0/1 label 17, adjacency IP adj out of Ethernet0/1, addr 10.1.2.2 DE6DF558 path DD944A60, path list DD60E2AC, share 1/1, type attached nexthop, for IPv4 nexthop 10.1.5.5 Ethernet0/2 label 18, adjacency IP adj out of Ethernet0/2, addr 10.1.5.5 DE6DF6B8 path DD9446E0, path list DD60E02C, share 1/1, type recursive, for IPv4, flags must-belabelled, repair, recursive-via-host path DD944AD0, path list DD60E2FC, share 1/1, type attached nexthop, for IPv4 nexthop 10.1.5.5 Ethernet0/2 label 19, adjacency IP adj out of Ethernet0/2, addr 10.1.5.5 DE6DF6B8 PE3# IGP Path List BGP Path List Backup (repair) Entry IGP Path list (to backup next-hop) 56

PE1 Monitoring BGP-PIC: IOS-XR PE3 RP/0/5/CPU0:C3#show bgp vrf VPN_1 10.1.0.0/22 BGP routing table entry for 10.1.0.0/22, Route Distinguisher: 100:100 Paths: (2 available, best #1) Not advertised to any peer Path #1: Received by speaker 0 Not advertised to any peer 101 10.1.1.1 (metric 120) from 172.16.1.7 (10.1.1.1) Received Label 66 Origin EGP, localpref 100, valid, internal, best, group-best, import-candidate, imported Received Path ID 0, Local Path ID 1, version 28702 Extended community: RT:100:1 Originator: 10.1.1.1, Cluster list: 172.16.1.7 Path #2: Received by speaker 0 Not advertised to any peer 101 10.2.1.1 (metric 145) from 172.16.1.7 (10.2.1.1) Received Label 66 Backup/Repair Path Origin EGP, localpref 100, valid, internal, backup, add-path, import-candidate, imported Received Path ID 0, Local Path ID 2, version 155502 Extended community: RT:100:1 Originator: 1.2.1.1, Cluster list: 172.16.1.7 57

PE1 Monitoring BGP-PIC: IOS-XR PE3 RP/0/5/CPU0:C3#show route vrf VPN_1 10.1.0.0/22 Routing entry for 10.1.0.0/22 Known via "bgp 100", distance 200, metric 0 Tag 101 Number of pic paths 1, type internal Installed Jan 25 03:21:36.026 for 1d19h Routing Descriptor Blocks 10.1.1.1, from 172.16.1.7 Backup/Repair Path shows up in routing table as well! Nexthop in Vrf: "default", Table: "default", IPv4 Unicast, Table Id: 0xe0000000 Route metric is 0 10.2.1.1, from 172.16.1.7, BGP backup path Nexthop in Vrf: "default", Table: "default", IPv4 Unicast, Table Id: 0xe0000000 Route metric is 0 No advertising protos. RP/0/5/CPU0:C3# 58

PE1 Monitoring BGP-PIC: IOS-XR PE3 RP/0/5/CPU0:C3#show cef vrf VPN_1 10.1.0.0/22 10.1.0.0/22, version 4, internal 0x40040001 (ptr 0x9e1923b8) [1], 0x0 (0x0), 0x4100 (0x9ed2b0d4) Updated Jan 25 03:21:36.793 Prefix Len 22, traffic index 0, precedence routine (0) via 10.1.1.1, 3 dependencies, recursive [flags 0x10] path-idx 0 next hop VRF - 'default', table - 0xe0000000 next hop 10.1.1.1 via 16393/0/21 next hop 10.10.13.1/32 PO0/0/1/0 labels imposed {16393 66} via 10.2.1.1, 2 dependencies, recursive, backup [flags 0x110] path-idx 1 next hop VRF - 'default', table - 0xe0000000 next hop 10.2.1.1 via 16032/0/21 next hop 10.12.16.2/32 PO0/0/0/0 labels imposed {16032 66} RP/0/5/CPU0:C3# 59

Advanced BGP PIC Topics

FIB Recursion and Summaries/Default Routes 10.0.0.3/32 BGP Net 172.16.0.0/16 Next Hops: 10.0.0.3 b: 10.0.0.4 Hmm, should I resolve my primary 10.0.0.3 via the /24 summary, or do I need to invoke the backup???? 10.0.0.4/32 10.0.0.0/24 0.0.0.0/0 IP Routing Table 10.0.0.3/32 10.0.0.4/32 10.0.0.0/24... Default behaviour is for the FIB to recurse through the longest match prefix This could include a summary or a default route, which could be a valid path to the destination However: In most designs, a /32 next-hop going away next-hop router is gone for good, there is no point to find a longer match Similar problem happens on the control plane when tracking the BGP next-hop 61

FIB Recursion and Summaries/Default Routes IOS doesn t recurse on summaries/default when PIC Edge is enabled or explicitly via: when next-hop is directly connected (ebgp) recurse-through-connected flag router bgp... address-family... bgp recursion host IOS-XR No issue for labeled paths (i.e. vpnv4/6/rfc3107 next-hops), always only recursed via /32s New in 4.2: For unlabelled paths, including VRF prefixes: Does not recurse via default route or via BGP summaries, or when candidate prefix length is less than configured: router bgp... address-family... nexthop resolution prefix-length minimum {32 128} 62

BGP Label Allocation Modes BGP Layer3 VPN generally support multiple MPLS Label Allocation modes on the PE One label per prefix (default) p1, L=20 p2, L=61 p3, L=22 p4, L=34 p1 p2 p3 p4 CE1 CE2 One label per CE p5, L=51 p6, L=51 p7, L=67 p8, L=67 p5 p6 p7 p8 CE1 CE2 One label for all routes per VRF Requires aggregate lookup p10, L=88(a) p11, L=88(a) p12, L=88(a)... p10 p11 p12 CE1 CE2 The label allocation mode has an impact to BGP PIC 63

BGP Label Allocation Modes and BGP PIC: Per-Prefix Default label allocation mode Normal label switched path to primary PE FIB: p1 via CE1, local label 111 backup:, L=222 LFIB: L 111: untagged via <ce1> backup: swap with <pe2>,222 PE1 (active) p1 CE1 p1 (standby) p1 via PE1, L=111 backup: CE1, local label 222 L 222: untagged via <ce1> 64

BGP Label Allocation Modes and BGP PIC: Per-Prefix Default label allocation mode Following PE-CE failure, primary PE1 re-routes traffic to standby (PIC-Edge) while control plane converges FIB: LFIB: p1 via CE1, local label 111 backup:, L=222 L 111: untagged via <ce1> backup: swap with <pe2>,222 PE1 (active) p1 CE1 p1 (standby) p1 via PE1, L=111 backup: CE1, local label 222 L 222: untagged via <ce1> 65

BGP Label Allocation Modes and BGP PIC: Per-VRF Per-VRF requires additional IP lookup to find the final destination This creates a transient loop in an active/standby environment with best-external... PE1 allocates per-vrf label 50 p1 via CE1, backup:, L=30 p2 via CE1, backup:, L=30 PE1 (active) p1 p2 L 50: strip label, L3 lookup in <vrf> backup: swap with <pe2>,30 CE1 p1 via PE1, L=50 backup: via CE1 p2 via PE1, L=50 backup: via CE1 p1 p2 (standby) L 30: strip label, L3 lookup in <vrf> PE1 allocates per-vrf label 30 66

BGP Label Allocation Modes and BGP PIC: Per-VRF Per-VRF requires additional IP lookup to find the final destination This creates a transient loop in an active/standby environment with best-external Following PE-CE link failure until has converged via BGP control plane convergence p1 via CE1 backup:, L=30 p2 via CE1 backup:, L=30 Loop! PE1 (active) p1 p2 L 50: strip label, L3 lookup in <vrf> backup: swap with <pe2>,30 CE1 p1 via PE1, L=50 backup: via CE1 p2 via PE1, L=50 backup: via CE1 (standby) p1 p2 L 30: strip label, L3 lookup in <vrf> 67

BGP Label Allocation Modes and BGP PIC: Per-VRF Per-VRF requires additional IP lookup to find the final destination This creates a transient loop in an active/standby environment with best-external Following PE-CE link failure until has converged via BGP control plane convergence p1 via CE1 backup:, L=30 p2 via CE1 backup:, L=30 PE1 (active) p1 p2 L 50: strip label, L3 lookup in <vrf> backup: swap with <pe2>,30 CE1 p1 via PE1, L=50 backup: via CE1 p2 via PE1, L=50 backup: via CE1 (standby) p1 p2 L 30: strip label, L3 lookup in <vrf> 68

BGP Label Allocation Modes and BGP PIC: Per-VRF Per-VRF requires additional IP lookup to find the final destination This creates a transient loop in an active/standby environment with best-external Following PE-CE link failure until has converged via BGP control plane convergence p1 via, L=30 p2 via, L=30 PE1 (active) p1 via CE1 p2 via CE1 p1 p2 p1 p2 CE1 L 30: strip label, L3 lookup in <vrf> (standby) 69

BGP Label Allocation Modes and BGP PIC: Per-CE Per-CE does not require a 2 nd lookup, so no issues with best-external, but: BGP PIC with Per-CE label allocation would require that a) All PEs hosting redundant CE connections are configured with per-ce b) All prefixes are advertised over all redundant connections Example: In below situation: which label (123 or 456) should PE1 swap for label 50 when PE1- CE1 link goes down? PE1 can t be sure! p1 via CE1, per-ce label=50 alternate:, L=123 p2 via CE1, per-ce label=50 alternate:, L=456 PE1 (per-ce label) L 50: untagged, via <ce1> backup: swap with <pe2>,???? p1 p2 CE1 p1 via CE1, per-pfx label=123 alternate:, L=50 p2 via CE2, per-pfx label=456 alternate:, L=50 p1 p2 (per prefix label) L 30: strip label, L3 lookup in <vrf> CE2 70

BGP Label Allocation Modes and BGP PIC: Per-Prefix Per-prefix has a 1:1 correspondence between label and prefix, so no ambiguities when it comes to swapping backup labels Also there are also no issues with aggregate lookups, so no problem with transient loops Full BGP PIC Edge Support!! p1 via CE1, local label 111 backup:, L=222 p2 via CE1, local label 333 backup:, L=444 PE1 (active) p1 p2 L 111: untagged via <ce1> backup: swap with <pe2>,222 L 333: untagged via <ce1> backup: swap with <pe2>,444 CE1 p1 via PE1, L=111 backup: CE1, local label 222 p2 via PE1, L=333 backup: CE2, local label 444 (standby) p1 p2 CE2 L 222: untagged via <ce1> L 444: untagged via <ce1 71

BGP Label Allocation Modes: Summary Per-Prefix Per-CE Per-VRF *) Possible transient BGP PIC Edge Fully supported Not supported loop BGP Best-External Yes Yes Possible transient loop eibgp Multipath Yes Yes No ECMP PE-CE Yes No Yes *) Future: Resilient Per-CE Label compatible with PIC-Edge 72

Inter-AS MPLS-VPN Deployments Essentially three different methods to inter-connect two MPLS-VPN networks: Option A: Individual (sub)interfaces per VPN, each SP treats the other as CE No special PIC considerations ASBRs ebgp ASBRs Option B: One interface for all VPNs, direct ebgp sessions exchanging vpnv4/6 routes One problem, see next slide ebgp Option C: One interface, ebgp session exchanging vpnv4/6 routes between RRs ebgp next-hop information (remote PEs) exchanged via direct ebgp or via IGP+LDP ebgp No special PIC considerations 73

Inter-AS Option B and BGP PIC Edge Using unique RDs on PEs ensures path diversity by making the two updates distinguishable while they are sent through the vpnv4 ibgp cloud But this also prevents the ASBRs in the local AS to make a connection between these updates for PE backup purposes ASBR1 doesn t know that it could use rd2:p1 to back up PE1 We re working on a solution to this problem Common RD and Add-Path could be used rd1:p1 via PE1 rd2:p1 via ASBR1 ASBR2 Hmm, how can I protect CE1 against PE1 failure? rd1:p1 via PE1 rd2:p1 via PE1 p1 74

Wrapping Up

Summary Routing Fast Convergence essential to deliver today s critical applications IGP conservatively meet sub-second convergence requirements Timer tuning required in IOS, IOS-XR and NX-OS optimized by default IGP needs to be kept slim BGP PIC enables BGP to achieve the same level of convergence, no matter how many prefixes we deal with Availability of alternate path needs to be kept in mind whenever designing BGP services 76

BGP PIC Implementation IOS (7600, ASR1k), NX-OS PIC-Core 7600 12.2(33)SRB: IPv4, non-ecmp 12.2(33)SRC: IPv4, non-ecmp + ECMP / vpnv4, non-ecmp 15.0(1)S: IPv4+vpnv4, non-ecmp and ECMP ASR1k: XE2.5.0 NX-OS: 5.2 PIC-Edge 7600, 7200: 12.2(33)SRE ASR1k: XE3.2.0 (v4), 3.3.0(v6) NX-OS: Radar 77

BGP PIC Implementation IOS-XR BGP PIC Core 3.4 CRS, 3.3 12k, 3.7 ASR9k BGP PIC Edge Multipath: 3.5 Unipath: 3.9 Development continues several PIC Core & Edge enhancements have recently been released in 4.2 Like 3107 + Label, additional PIC triggers and many more 78

Practice BGP PIC in LABRST-2007 Visit the Labs in Word of Solutions to get Hands On with BGP PIC BGP Prefix Independent Convergence (PIC) BGP Best-External BGP Diverse Path 79

Related Sessions At CiscoLive London 2013 BRKSPG-2402 - Best Practices to Deploy High-Availability in Service Provider Edge and Aggregation Architectures BRKRST-2333 - Network Failure Detection Previous CiscoLive (Check CiscoLive365.com) BRKRST-3363 - Routed Fast Convergence and High Availability BRKIPM-2000 - Routing Resiliency - Latest Enhancements TECRST-3190 - Fast Convergence Techtorial 80

Call to Action Visit the Cisco Campus at the World of Solutions to experience Cisco innovations in action Get hands-on experience attending one of the Walk-in Labs Schedule face to face meeting with one of Cisco s engineers at the Meet the Engineer center Discuss your project s challenges at the Technical Solutions Clinics

82