BGP Advanced Features and Enhancements
George Wu
TCOM610

Conditional Route Injection
- network: originates a route into BGP if there is a corresponding route in the IP routing table
- aggregate-address: injects an aggregate into BGP if there are corresponding more-specific routes in the BGP table
- redistribute: distributes routes from other routing protocols into the BGP table
- Conditional route injection: injects any route into BGP based on preset conditions (for example, more specifics of an aggregated route)
- Used to inject more-specific routes into BGP based on the existence of an aggregate, or to originate a default route based on the existence of certain routes

Conditional Route Injection
Command syntax:
    bgp inject-map {inject-map-name} exist-map {exist-map-name} [copy-attributes]
Configuration outline:
    bgp inject-map ORIGINATE exist-map LEARNED_PATH
    route-map LEARNED_PATH permit sequence-number
     match ip address prefix-list ROUTE
     ! the route must come from this source
     match ip route-source prefix-list ROUTE_SOURCE
    route-map ORIGINATE permit 10
Verification:
    show ip bgp injected-paths

Example
    router bgp 109
     ! Inject two more specifics based on the existence of the aggregate
     bgp inject-map ORIGINATE exist-map LEARNED_PATH
    !
    route-map LEARNED_PATH permit 10
     match ip address prefix-list ROUTE
     match ip route-source prefix-list ROUTE_SOURCE
    !
    route-map ORIGINATE permit 10
     set ip address prefix-list ORIGINATED_ROUTES
     set community 14616:555 additive
    !
    ip prefix-list ROUTE permit 10.1.1.0/24
    !
    ip prefix-list ORIGINATED_ROUTES permit 10.1.1.0/25
    ip prefix-list ORIGINATED_ROUTES permit 10.1.1.128/25
    !
    ip prefix-list ROUTE_SOURCE permit 10.2.1.1/32

Example
Inject a default route to a BGP neighbor based on the existence of certain routes:
    router bgp 100
     neighbor INT-DEFAULT default-originate route-map CONDITIONS
    !
    ip prefix-list CONDITIONS permit 100.10.0.0/16
    ip prefix-list CONDITIONS permit 115.165.0.0/16
    ip prefix-list CONDITIONS permit 210.18.0.0/16

Load-Balancing
- If multiple routes to the same destination come from different protocols, the one with the lowest administrative distance is chosen (static < eBGP < OSPF < iBGP)
- Connected = 0, static = 1, eBGP = 20, EIGRP = 90, OSPF = 100, IS-IS = 115, external EIGRP = 170, iBGP = 200
- Some protocols support unequal-cost load balancing (IGRP, EIGRP), while others support only equal-cost load balancing (IS-IS, OSPF), up to a certain number of paths
- Load balancing can be either per-packet (not preferred because of out-of-order packets) or per-destination (per-flow)

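The administrative-distance rule above can be sketched in a few lines of Python. This is only an illustration: the distance values are the ones listed on the slide, while the function name select_route and the (protocol, next hop) tuples are assumptions made up for this example.

```python
# Administrative distances as listed on the slide (lower is preferred).
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp": 90,
    "ospf": 100,
    "isis": 115,
    "external-eigrp": 170,
    "ibgp": 200,
}


def select_route(candidates):
    """Pick the (protocol, next_hop) candidate with the lowest distance.

    'candidates' is a hypothetical list of routes to the same prefix,
    each learned from a different protocol.
    """
    return min(candidates, key=lambda c: ADMIN_DISTANCE[c[0]])


candidates = [("ospf", "10.0.0.1"), ("ibgp", "10.0.0.2"), ("ebgp", "10.0.0.3")]
print(select_route(candidates))
```

The comparison happens only between protocols offering the same prefix; within one protocol, that protocol's own metric or decision process picks the path.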
BGP Load-balancing
- BGP usually picks only one best route according to the BGP decision process
- Multiple eBGP links between the same two routers: use loopback-address peering and static routes pointing to the loopback address
- Multiple eBGP links to different routers (but still one router on one side): use eBGP multipath (maximum-paths n)
- Multi-homed to the same ISP over multiple routers: use routing policy to achieve load sharing (AS-path prepend and set local-pref)
- Multi-homed to multiple providers: use routing policy to achieve load sharing

BGP multi-path load-balancing
- BGP usually picks one best route and uses that route for forwarding
- eBGP multipath: when a local router learns the same prefix over two eBGP peerings with the same external AS, it normally picks the route from the peer with the lowest router ID; with eBGP multipath it load-balances over both paths
- iBGP multipath: all BGP attributes must be the same except the next hop; a BGP route reflector voids this setting, since the RR advertises only one best route
- Also applies to the MPLS-VPN case

BGP multi-path commands
    maximum-paths ibgp maximum-number
    maximum-paths eibgp number-of-paths

    Router# show ip bgp 10.22.22.0
    BGP routing table entry for 10.22.22.0/24, version 119
    Paths: (6 available, best #1)
    Multipath: iBGP
    Flag: 0x820
      Advertised to non peer-group peers:
      10.1.12.12
      22
        10.2.3.8 (metric 11) from 10.1.3.4 (100.0.0.5)
          Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
          Originator: 100.0.0.5, Cluster list: 100.0.0.4
      22
        10.2.1.9 (metric 11) from 10.1.1.2 (100.0.0.9)
          Origin IGP, metric 0, localpref 100, valid, internal, multipath
          Originator: 100.0.0.9, Cluster list: 100.0.0.2
      22
        10.2.5.10 (metric 11) from 10.1.5.6 (100.0.0.10)
          Origin IGP, metric 0, localpref 100, valid, internal, multipath
          Originator: 100.0.0.10, Cluster list: 100.0.0.6

BGP link bandwidth
- Advertises the bandwidth of an autonomous system exit link as an extended community
- Supports internal BGP (iBGP) and external BGP (eBGP) multipath load balancing, as well as multipath load balancing in Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPNs)
- The link bandwidth extended community indicates the preference of an autonomous system exit link in terms of bandwidth
- Available for directly connected peers only (no eBGP multihop)
- Requires BGP multipath load balancing
- Used to achieve unequal-cost load balancing over multiple links

BGP Policy Update
- BGP is stateful and built on top of TCP
- BGP exchanges the full routing table only at the beginning of a session and sends incremental updates afterwards
- What if you or your BGP neighbor change routing policy over an established session?
  - Deny some of the existing routes
  - Raise local-pref for some routes
  - Attach a new community to some routes
- Old approach: clear the BGP session and re-establish it from the beginning for the new policy to take effect
  - Causes BGP instability in the network
  - Increases router load
  - What is the best time to clear the session? Time-zone differences

BGP soft reset
- A hard reset invalidates the cache, which hurts network performance while the cached information is unavailable. A hard reset is also disruptive because active BGP sessions are torn down.
- A soft reset, performed on a per-neighbor basis, does not clear the BGP session and allows new policies to be applied.
- There are two methods of performing a soft reset:
  - A dynamic inbound soft reset is used to generate inbound updates from a neighbor.
  - An outbound soft reset is used to send a new set of updates to a neighbor.
- For a dynamic inbound soft reset, both BGP peers must support the route refresh capability

Soft reset commands
    clear ip bgp {* | ip-address | peer-group-name | peer-asn} soft in
    clear ip bgp {* | ip-address | peer-group-name | peer-asn} soft out
    neighbor {ip-address | peer-group-name} soft-reconfiguration inbound

    router bgp 100
     neighbor 131.108.1.1 remote-as 200
     neighbor 131.108.1.1 soft-reconfiguration inbound

BGP Capability Negotiations
- Without capability negotiation, a BGP speaker that receives an OPEN message with one or more unrecognized Optional Parameters must terminate the BGP peering
- A speaker MAY include an Optional Parameter called Capabilities, carried in the OPEN message
- If a capability is not supported, the peer may terminate the peering or establish the session without that optional capability
- Known capability parameters: MP-BGP extensions (AFI/SAFI), Route Refresh, Outbound Route Filtering

Capability Parameters
    +------------------------------+
    | Capability Code (1 octet)    |
    +------------------------------+
    | Capability Length (1 octet)  |
    +------------------------------+
    | Capability Value (variable)  |
    +------------------------------+
New Error Subcode: Unsupported Capability (7)

Route-refresh capability
- Advertised using BGP Capabilities Advertisement [BGP-CAP], with Capability Code 2 and Capability Length 0
- A BGP speaker that is capable of receiving and properly handling the ROUTE-REFRESH message should advertise the Route Refresh Capability to its peer using BGP Capabilities Advertisement
- On receiving a ROUTE-REFRESH message, a BGP speaker shall re-advertise to that peer the Adj-RIB-Out of the <AFI, SAFI> carried in the message, based on its outbound route filtering policy

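The code/length/value layout above is easy to see in bytes. The following Python sketch packs and unpacks capabilities in that layout; the helper names are hypothetical. The Route Refresh code (2, length 0) comes from the slide. The multiprotocol capability (code 1, carrying AFI, a reserved octet, and SAFI) is taken from the MP-BGP specification rather than from these slides.

```python
import struct

CAP_MULTIPROTOCOL = 1   # from the MP-BGP specification, not from the slide
CAP_ROUTE_REFRESH = 2   # from the slide: code 2, length 0


def encode_capability(code, value=b""):
    """Pack one capability as Code (1 octet), Length (1 octet), Value."""
    return struct.pack("!BB", code, len(value)) + value


def decode_capabilities(data):
    """Walk a byte string of back-to-back capabilities, returning (code, value) pairs."""
    caps = []
    offset = 0
    while offset < len(data):
        code, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        caps.append((code, data[offset:offset + length]))
        offset += length
    return caps


route_refresh = encode_capability(CAP_ROUTE_REFRESH)
# AFI = 1 (IPv4), reserved = 0, SAFI = 1 (unicast)
mp_ipv4_unicast = encode_capability(CAP_MULTIPROTOCOL, struct.pack("!HBB", 1, 0, 1))

print(route_refresh.hex())
print(decode_capabilities(route_refresh + mp_ipv4_unicast))
```

A receiver that does not recognise a capability code can skip it using the length field, which is what lets a session come up without that capability instead of being torn down.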
Route-refresh messages
Type: 5 - ROUTE-REFRESH
Message format: one <AFI, SAFI> encoded as
     0       7      15      23      31
     +-------+-------+-------+-------+
     |      AFI      | Res.  | SAFI  |
     +-------+-------+-------+-------+
- AFI: Address Family Identifier (16 bits)
- Res.: Reserved (8 bits). Should be set to 0 by the sender and ignored by the receiver
- SAFI: Subsequent Address Family Identifier (8 bits)

Max-prefix
- Configures a limit on the number of prefixes accepted from a peer
- Protects against accidental misconfiguration or malicious attacks
    neighbor {ip-address | peer-group-name} maximum-prefix maximum [threshold] [restart restart-interval-mins] [warning-only]
- Default threshold: 75% of the maximum
- With restart, the router tries to re-establish the BGP session after the configured number of minutes. Default: the session stays down (no automatic restart)
- You may want to enable warning-only so that engineers are notified to correct the problem rather than having the session shut down

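The max-prefix behaviour described above can be sketched as a small decision function. This is a simplified model, not the router's implementation: the function name and return values are made up, and the exact boundary handling (strictly greater than) is an assumption.

```python
def check_prefix_count(count, maximum, threshold_pct=75, warning_only=False):
    """Return 'ok', 'warn', or 'reset' for a peer's received prefix count.

    threshold_pct defaults to 75, the default threshold from the slide.
    Assumption: a limit is 'exceeded' only when count is strictly greater.
    """
    if count > maximum:
        # Over the hard limit: tear the session down unless warning-only is set.
        return "warn" if warning_only else "reset"
    if count > maximum * threshold_pct / 100:
        # Over the warning threshold but still under the limit.
        return "warn"
    return "ok"


for n in (300, 376, 501):
    print(n, check_prefix_count(n, 500))
print(501, check_prefix_count(501, 500, warning_only=True))
```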
Max-prefix example
    router bgp 101
     neighbor 192.168.6.6 maximum-prefix 500 warning-only

    router bgp 101
     neighbor 192.168.6.6 maximum-prefix 2000 restart 30

Configuration commands (BGP link bandwidth)
    neighbor ip-address dmzlink-bw

    router bgp 100
     no synchronization
     bgp log-neighbor-changes
     neighbor 172.16.1.18 remote-as 200
     neighbor 172.16.1.18 dmzlink-bw
     neighbor 172.16.1.22 remote-as 200
     neighbor 172.16.1.22 dmzlink-bw
     maximum-paths 6

BGP dampening (RFC2439)
- Provides a mechanism for reducing the router processing load caused by instability (rerunning the BGP decision process and adding/removing forwarding entries)
- Prevents sustained routing oscillations without sacrificing route convergence time for generally well-behaved routes
- Two approaches: timer adjustment or route damping

BGP Timer Approach
- Timer approach: no space overhead
- Slows down every route's convergence (even well-behaved routes)
- Needs very long timers (minutes or hours) to effectively reduce the churn from flapping routes
- Timers associated with BGP route advertisement:
  - MinRouteAdvertisementInterval timer: 30 seconds
  - MinASOriginationInterval timer: 15 seconds
- Short timers combined with route damping and BGP update packing provide good performance

BGP Route Flap & Damping
- Keeps state information for every route
- Requires more space per route and more processing overhead
- How to distinguish well-behaved routes from flapping routes?
- It is tolerable to wait for packing if a large number of updates is received in a short time
- The hold time for a flapping route should be proportional to the expectation of the route's future stability
- Use a route's history to predict its future behavior
- If a route with a stable history makes a few quick transitions, that is acceptable; if the transitions continue for an extended time, the route should be suppressed

Route Instability
- The BGP specification originally cited fixed timers as a method to control frequent route changes and to better pack updates
  - MinRouteAdvertisementInterval: 30 seconds
  - MinASOriginationInterval: 15 seconds
- This had the bad side effect of delaying convergence, and it also relied on your peer to do the right thing
    neighbor {ip-address | peer-group-name} advertisement-interval seconds
- Default is 30 seconds for eBGP, 5 seconds for iBGP

Route damping algorithm
- Each route is assigned a figure of merit
- The figure of merit is increased each time the route flaps
- Routes whose figure of merit rises above the suppress threshold are suppressed
- The figure of merit decays exponentially (using a computationally efficient algorithm), controlled by the half-life time
- Once the figure of merit has decayed to the reuse threshold, the flapped route is used again

BGP Damping Parameters

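The figure-of-merit behaviour above can be sketched numerically. The following Python is a minimal model of exponential decay with a fixed penalty per flap; it assumes continuous decay and uses hypothetical function names, so it illustrates the idea rather than any router's actual stepwise implementation. The defaults (penalty 1000, half-life 15 minutes) are the ones given in this deck.

```python
def decayed_penalty(penalty, minutes, half_life=15.0):
    """Penalty remaining after 'minutes' of exponential decay."""
    return penalty * 0.5 ** (minutes / half_life)


def penalty_at(flap_times, now, half_life=15.0, per_flap=1000.0):
    """Figure of merit at time 'now' (minutes) given a list of flap times.

    Each flap adds 'per_flap'; between flaps the value decays exponentially.
    """
    penalty = 0.0
    last = 0.0
    for t in sorted(flap_times):
        penalty = decayed_penalty(penalty, t - last, half_life) + per_flap
        last = t
    return decayed_penalty(penalty, now - last, half_life)


# Three flaps in two minutes, then quiet: watch the value decay.
for minute in (2, 17, 32, 47):
    print(minute, round(penalty_at([0, 1, 2], minute)))
```

A route that flaps once and then stays stable never approaches a suppress limit of 2000, while a burst of flaps does, which is how the history-based approach separates well-behaved routes from unstable ones.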
Parameters
- cutoff threshold
- reuse threshold
- maximum hold-down time
- decay half-life while reachable
- decay half-life while unreachable
- decay memory limit (used to limit internal array sizes)

Default Parameters
- penalty: default 1000 per flap
- half-life: default 15 minutes; the number of minutes it takes for the penalty to decay by 1/2
- suppress limit: default 2000; a route is suppressed when its penalty exceeds this value
- reuse limit: default 750; if a route is suppressed, the penalty must decay to this value before the route is unsuppressed
- maximum suppress time (minutes): default 4 x half-life; the maximum number of minutes a route may be suppressed
Example (half-life 20, reuse 1000, suppress 10000, max-suppress-time 150):
    bgp dampening 20 1000 10000 150

Damping Parameters
- Calculated parameter: max-penalty
- The maximum penalty a route may have that still allows the penalty to decay to reuse-limit within max-suppress-time
    max-penalty = reuse-limit * 2^(max-suppress-time / half-life)
- If half-life is 30, reuse-limit is 800, and max-suppress-time is 60, then max-penalty is 3200. If the penalty were allowed to reach 3201, it could not decay to 800 within 60 minutes.
- IOS generates a warning message if max-penalty is above 20,000 or less than the suppress-limit.

Configuring BGP Damping
    router bgp 1000
     bgp dampening [half-life reuse suppress max-suppress-time] [route-map map]
- bgp dampening: sets global BGP route damping parameters
- set dampening (in route-map configuration): selectively applies dampening
    show ip bgp dampened-paths
    show ip bgp flap-statistics
    clear ip bgp dampening    (in CLI)
    clear ip bgp flap-statistics

BGP Damping
- A route can only be suppressed when an advertisement is received, not when a WITHDRAW is received
- Attribute changes count as half a flap (1/2)
- For a route to be suppressed, both of the following must be true:
  - The penalty must be greater than the suppress-limit
  - An advertisement for the route must be received while the penalty is greater than the suppress-limit
- A route is not automatically suppressed if the suppress-limit is 1000 and the penalty reaches 1200. The route is only suppressed if an advertisement is received while the penalty is decaying from 1200 down to 1000.

Damping Example
- Small suppress window: half-life of 30 minutes, reuse-limit of 800, suppress-limit of 3000, and max-suppress-time of 60
- max-penalty is 3200
- An advertisement must be received while the penalty is decaying from 3200 down to 3000 for the route to be suppressed
- A window of 3 min 45 seconds (rough numbers) exists for an advertisement to be received while the penalty decays from 3200 to 3000

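The max-penalty formula and the suppress window can be checked with a short calculation. The sketch below assumes pure exponential decay and uses hypothetical function names. Under that assumption the window for the example above comes out closer to 2 min 48 s. The 3 min 45 s quoted on the slide matches a straight-line estimate over one half-life (200 / 1600 x 30 minutes = 3.75 minutes), which may be where the rough number comes from.

```python
import math


def max_penalty(reuse_limit, max_suppress_time, half_life):
    """max-penalty = reuse-limit * 2^(max-suppress-time / half-life)."""
    return reuse_limit * 2 ** (max_suppress_time / half_life)


def suppress_window_minutes(reuse_limit, suppress_limit, max_suppress_time, half_life):
    """Minutes the penalty spends between max-penalty and suppress-limit.

    Assumes continuous exponential decay starting from max-penalty.
    Returns 0.0 when max-penalty does not exceed the suppress-limit.
    """
    ceiling = max_penalty(reuse_limit, max_suppress_time, half_life)
    if ceiling <= suppress_limit:
        return 0.0
    return half_life * math.log2(ceiling / suppress_limit)


print(max_penalty(800, 60, 30))
print(round(suppress_window_minutes(800, 3000, 60, 30), 2))
```

When the computed max-penalty is equal to or below the suppress-limit the window collapses to zero, so a configuration like that can never suppress a route no matter how often it flaps.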
Example II
- No window: half-life of 30 minutes, reuse-limit of 750, suppress-limit of 3000, and max-suppress-time of 60
    max-penalty = 750 * 2^(60/30) = 3000
- Here the max-penalty is equal to the suppress-limit
- The penalty can only go as high as 3000. The decay begins immediately, so the penalty will be lower than 3000 by the time an advertisement is received.
- A route could consistently flap several times a minute and never be suppressed

Damping in route-map
- Use set dampening in a route-map configuration to modify dampening parameters based on the match clauses
- Useful to damp specific peers
- Useful to damp differently based on prefix length
- Takes the same parameters as the bgp dampening command

Clearing Damping Stats
Clearing the flap statistics on a route:
    clear ip bgp flap-statistics [{regexp regexp} | {filter-list list} | {address mask}]
    clear ip bgp address flap-statistics
Parameters
- regexp: an AS-path regular expression
- filter-list: an AS-path access-list number
- address and mask: identify an address to be cleared

Show flap stats
Displaying the BGP flap statistics:
    show ip bgp flap-statistics [{regexp regexp} | {filter-list list} | {address mask [longer]}]
Parameters
- regexp: AS-path regular expression
- filter-list: AS-path access-list
- address/mask
- longer: include the more specifics, too

Example
    router bgp 1239
     bgp dampening route-map flap-dampen
    !
    ip as-path access-list 77 deny ^$
    ip as-path access-list 77 deny ^617[4-7]_
    ip as-path access-list 77 permit .*
    !
    access-list 119 permit ip any 255.255.224.0 0.0.31.255
    access-list 124 permit ip any 255.255.255.0 0.0.0.255
    !
    route-map flap-dampen permit 10
     match ip address 124
     match as-path 77
     set dampening 45 125 2000 255
    route-map flap-dampen permit 20
     match ip address 119
     match as-path 77
     set dampening 30 750 2000 45
    route-map flap-dampen permit 30
     match as-path 77
     set dampening 15 750 2000 30

Damping or not?
- Turning on damping might make flapping routes unreachable for an extended time
- If routers can handle the overhead of flapping routes, why turn on damping?
- Damped routes suffer outages
- It might be best to do selective damping on a per-peer basis
- Don't damp customer routes, only damp peer routes?

MP-BGP (RFC2858)
- Plain BGP-4 advertises IPv4 address reachability only
- MP-BGP extends BGP to multiple network layer protocols (IPv6, multicast, MPLS-VPN, CLNS, etc.)
- Introduces optional non-transitive attributes (MP_REACH_NLRI, MP_UNREACH_NLRI)
- Introduces a new BGP peer capability to determine whether a peer supports the MP-BGP extensions; support for MP-NLRI exchange must be negotiated at session setup

New Attributes
- One or more triples: (AFI, SAFI, NLRI)
- MP_REACH_NLRI is used to:
  - advertise a feasible route to a peer
  - advertise the network layer address to be used as the next hop to the destinations listed in the NLRI
  - report the Subnetwork Points of Attachment (SNPAs) within the local system
- MP_UNREACH_NLRI is used to withdraw multiple unfeasible routes

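MP-BGP Reachability Attributes
    +---------------------------------------------------------+
    | AFI (2 octets)                                          |
    +---------------------------------------------------------+
    | Sub-AFI (1 octet)                                       |
    +---------------------------------------------------------+
    | Length of Next Hop Network Address (1 octet)            |
    +---------------------------------------------------------+
    | Next Hop Network Address (variable)                     |
    +---------------------------------------------------------+
    | Number of SNPAs (1 octet)                               |
    +---------------------------------------------------------+
    | Length of first SNPA (1 octet)                          |
    +---------------------------------------------------------+
    | First SNPA (variable)                                   |
    +---------------------------------------------------------+
    | ...                                                     |
    +---------------------------------------------------------+
    | Length of last SNPA (1 octet)                           |
    +---------------------------------------------------------+
    | Last SNPA (variable)                                    |
    +---------------------------------------------------------+
    | Network Layer Reachability Information (variable)       |
    +---------------------------------------------------------+
- AFI: IP=1, IPv6=2, IPX=11
- Sub-AFI: Unicast=1, Multicast=2, Uni/Multi=3, Label=4, VPN=128

MP-BGP Unreachability Attributes
    +---------------------------------------------------------+
    | Address Family Identifier (2 octets)                    |
    +---------------------------------------------------------+
    | Subsequent Address Family Identifier (1 octet)          |
    +---------------------------------------------------------+
    | Withdrawn Routes as NLRI (variable)                     |
    +---------------------------------------------------------+
NLRI Encoding:
    +---------------------------+
    | Length (1 octet)          |
    +---------------------------+
    | Prefix (variable)         |
    +---------------------------+
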
NLRI Encoding
- Length: length in bits of the address prefix. A length of zero indicates a prefix that matches all addresses
- Prefix: the address prefix followed by enough trailing bits to make the end of the field fall on an octet boundary. The value of the trailing bits is irrelevant
- Example: 210.8.0.0/23: len=23, prefix=210.8.0 (three octets, since 23 bits do not fit in two)

MP-BGP Update

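The length-plus-prefix encoding above can be tried out directly. The following Python sketch handles IPv4 prefixes only and uses hypothetical helper names; it shows that a /23 needs three prefix octets and that trailing bits are ignored on decode.

```python
import ipaddress


def encode_nlri(prefix):
    """Encode an IPv4 prefix as Length (1 octet, in bits) + minimal prefix octets."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # round the bit length up to whole octets
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_nlri(data):
    """Decode one IPv4 NLRI entry; trailing bits beyond the length are masked off."""
    length = data[0]
    nbytes = (length + 7) // 8
    raw = data[1:1 + nbytes] + bytes(4 - nbytes)  # pad back to 4 octets
    return str(ipaddress.IPv4Network((raw, length), strict=False))


encoded = encode_nlri("210.8.0.0/23")
print(list(encoded))
print(decode_nlri(encoded))
```

A default route (0.0.0.0/0) encodes to the single length octet 0 with no prefix octets at all, which matches the "length of zero matches all addresses" rule.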
Next-Hop V2
- draft-lefaucheur-mp-nh-01: Cisco draft to use MP-BGP when the next hop and the NLRI belong to different address families
- In [MP-BGP], the SAFI field is used to further qualify the semantics of the NLRI (e.g. unicast, multicast, label, VPN...)
- The SAFI field could potentially also convey the network layer protocol of the next hop information (currently the next hop must have the same AFI)
- NEXT_HOP-v2 attribute (Type Code TBD): contains AFI/SAFI/length etc.
- To be used for cases like IPv4 VPNs over an IPv6 core, IPv4 over an IPv6 core, L2 VPNs over IPv6, and IPv4 VPNs over IPv4 with tunnel encapsulation

MPLS L3-VPN
- RFC2547 specifies L3 BGP/MPLS VPNs
- Follows the peer-to-peer VPN model
- Provides full-mesh connectivity among the sites of the same VPN
- Can handle hub-and-spoke and more complicated VPN topologies
- Handles overlapping IPv4 addresses among different VPNs
- Introduces new attributes:
  - RD: Route Distinguisher
  - VPNv4 address family
  - RT: Route Target
  - Label

Route Distinguisher (RD)
- Differentiates 10.0.0.0/8 in VPN-A from 10.0.0.0/8 in VPN-B
- A 64-bit quantity in total
- Configured in ASN:Index or IPADDR:Index format
- Exists purely to make a route unique
- The unique route is now RD:IPaddr (96 bits) plus a mask on the IPaddr portion
- So customers don't see each other's routes

Route Target
    !
    ip vrf red
     rd 1:1
     route-target export 1:1
     route-target import 1:1
- Controls policy about who sees which routes
- A 64-bit quantity (2 bytes type, 6 bytes value)
- Carried as a BGP extended community
- Typically written as ASN:Index
- Each VRF imports and exports one or more RTs
  - Exported RTs are carried in VPNv4 BGP
  - Imported RTs are local to the box
- A PE that imports an RT installs that route in its routing table

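The import/export rule above comes down to a set intersection: a VRF installs a VPNv4 route when at least one of the route's exported RTs is in the VRF's import list. The Python below is a toy model under that reading; the Vrf class, its fields, and the helper functions are all hypothetical names for illustration, not a router data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    name: str
    rd: str                 # route distinguisher, e.g. "1:1"
    import_rts: set         # RTs this VRF accepts
    export_rts: set         # RTs attached to routes this VRF exports
    routes: dict = field(default_factory=dict)


def export_route(vrf, prefix):
    """Build a VPNv4-style announcement: RD makes it unique, RTs steer import."""
    return {"rd": vrf.rd, "prefix": prefix, "rts": set(vrf.export_rts)}


def import_route(vrf, vpnv4):
    """Install the route if any of its RTs matches the VRF's import list."""
    if vrf.import_rts & vpnv4["rts"]:
        vrf.routes[vpnv4["prefix"]] = vpnv4
        return True
    return False


red_pe1 = Vrf("red", "1:1", {"1:1"}, {"1:1"})
red_pe2 = Vrf("red", "1:1", {"1:1"}, {"1:1"})
blue_pe2 = Vrf("blue", "1:2", {"1:2"}, {"1:2"})

update = export_route(red_pe1, "10.0.0.0/8")
print(import_route(red_pe2, update), import_route(blue_pe2, update))
```

Hub-and-spoke topologies fall out of the same rule by giving hub and spoke VRFs different import and export RT sets instead of one shared value.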
VPNv4
- In BGP for IP, a 32-bit address + mask makes a unique announcement
- In BGP for MPLS-VPN, (64-bit RD + 32-bit address) + 32-bit mask makes a unique announcement
- Since the route encoding is different, a different address family is used in BGP
- VPNv4 = VPN routes for IPv4, as opposed to IPv4 or IPv6 or multicast-rpf, etc.
- A VPNv4 announcement carries a label with the route: "if you want to reach this unique address, get me packets with this label on them"

VPN Operations
[Figure: CE (Paris) - PE-1 - Service Provider Network - PE-2 - CE (London). The CE sends a BGP, OSPF, or RIPv2 update for 149.27.2.0/24 with NH=CE-1 to PE-1. PE-1 sends a VPN-v4 update: RD:1:27:149.27.2.0/24, Next-hop=PE-1, RT=VPN-A, Label=(28).]
- PE routers translate the route into a VPN-v4 route
- Assign an RD, SOO (if configured), and RT based on configuration
- Rewrite the Next-Hop attribute (to the PE loopback)
- Assign a label based on VRF and/or interface
- Send an MP-BGP update to all PE neighbors

VPN Operations
[Figure: CE (Paris) - PE-1 - Service Provider Network - PE-2 - CE (London). The CE sends a BGP, OSPF, or RIPv2 update for 149.27.2.0/24 with NH=CE-1 to PE-1. PE-1 sends a VPN-v4 update: RD:1:27:149.27.2.0/24, Next-hop=PE-1, RT=VPN-A, Label=(28). At PE-2 the VPN-v4 update is translated into an IPv4 address, put into VRF VPN-A because RT=VPN-A, and optionally advertised to any attached sites.]
- Receiving PE routers translate the route back to IPv4
- The route is inserted into the VRF identified by the RT attribute (based on PE configuration)
- The label associated with the VPN-v4 address is set on packets forwarded towards the destination

L3VPN Operations
- Regular IP packets between PE and CE
- Label switching within the provider network, using a label stack
  - Outer label: get this packet to the egress PE
  - Inner label: get this packet to the egress CE
- The MPLS core forwards packets based on the top label
- The outer label comes from LDP/RSVP, the inner label from BGP
- The core does not run VPNv4 BGP! VPNv4 iBGP peering is established between PE routers only
- The CE does not know it is in an MPLS-VPN
- Separate IPv4 BGP peering and VPNv4 MP-BGP peering