DD2491 p2 2011. Intro to BGP. Olof Hagsand KTH CSC



Similar documents
DD2491 p Inter-domain routing and BGP part I Olof Hagsand KTH/CSC

BGP overview BGP operations BGP messages BGP decision algorithm BGP states

BGP Terminology, Concepts, and Operation. Chapter , Cisco Systems, Inc. All rights reserved. Cisco Public

How To Understand Bg

Border Gateway Protocol (BGP)

Inter-domain Routing Basics. Border Gateway Protocol. Inter-domain Routing Basics. Inter-domain Routing Basics. Exterior routing protocols created to:

Routing Protocol - BGP

Module 7. Routing and Congestion Control. Version 2 CSE IIT, Kharagpur

APNIC elearning: BGP Attributes

E : Internet Routing

Exterior Gateway Protocols (BGP)

BGP Attributes and Path Selection

Using the Border Gateway Protocol for Interdomain Routing

BGP Best Path Selection Algorithm

Load balancing and traffic control in BGP

The ISP Column. An Introduction to BGP the Protocol

Examination. IP routning på Internet och andra sammansatta nät, DD2491 IP routing in the Internet and other complex networks, DD2491

Chapter 6: Implementing a Border Gateway Protocol Solution for ISP Connectivity

--BGP 4 White Paper Ver BGP-4 in Vanguard Routers

Internet inter-as routing: BGP

APNIC elearning: BGP Basics. Contact: erou03_v1.0

JUNOS Secure BGP Template

BGP Router Startup Message Flow

BGP4 Case Studies/Tutorial

Chapter 49 Border Gateway Protocol version 4 (BGP-4)

Route Discovery Protocols

BGP Basics. BGP Uses TCP 179 ibgp - BGP Peers in the same AS ebgp - BGP Peers in different AS's Private BGP ASN. BGP Router Processes

Load balancing and traffic control in BGP

Bell Aliant. Business Internet Border Gateway Protocol Policy and Features Guidelines

Border Gateway Protocol BGP4 (2)

Understanding Virtual Router and Virtual Systems

Border Gateway Protocol (BGP-4)

Routing in Small Networks. Internet Routing Overview. Agenda. Routing in Large Networks

How To Set Up Bgg On A Network With A Network On A Pb Or Pb On A Pc Or Ipa On A Bg On Pc Or Pv On A Ipa (Netb) On A Router On A 2

DD2491 p MPLS/BGP VPNs. Olof Hagsand KTH CSC

JNCIA Juniper Networks Certified Internet Associate

BGP Advanced Routing in SonicOS

BGP Routing. Course Description. Students Will Learn. Target Audience. Hands-On

Multihomed BGP Configurations

Introduction to Routing

Configuring BGP. Cisco s BGP Implementation

Lecture 18: Border Gateway Protocol"

Internet inter-as routing: BGP

GregSowell.com. Mikrotik Routing

Interdomain Routing. Project Report

> Border Gateway Protocol (BGP-4) Technical Configuration Guide. Ethernet Routing Switch. Engineering

IK2205 Inter-domain Routing

Basic Configuration Examples for BGP

CS 457 Lecture 19 Global Internet - BGP. Fall 2011

Inter-domain Routing. Outline. Border Gateway Protocol

IPv6 over MPLS VPN. Contents. Prerequisites. Document ID: Requirements

BSCI Module 6 BGP. Configuring Basic BGP. BSCI Module 6

Chapter 33 BGP Configuration Guidelines

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

Gateway of last resort is to network

MPLS VPN. Agenda. MP-BGP VPN Overview MPLS VPN Architecture MPLS VPN Basic VPNs MPLS VPN Complex VPNs MPLS VPN Configuration (Cisco) L86 - MPLS VPN

Understanding Route Redistribution & Filtering

basic BGP in Huawei CLI

Simple Multihoming. ISP/IXP Workshops

BGP: Frequently Asked Questions

Configuring and Testing Border Gateway Protocol (BGP) on Basis of Cisco Hardware and Linux Gentoo with Quagga Package (Zebra)

DD2491 p BGP-MPLS VPNs. Olof Hagsand KTH/CSC

Fireware How To Dynamic Routing

BGP Operations and Security. Training Course

HP Networking BGP and MPLS technology training

CS551 External v.s. Internal BGP

- Border Gateway Protocol -

netkit lab bgp: prefix-filtering Università degli Studi Roma Tre Dipartimento di Informatica e Automazione Computer Networks Research Group

Advanced Networking Routing: RIP, OSPF, Hierarchical routing, BGP

BGP Advanced Features and Enhancements

Simple Multihoming. ISP Workshops. Last updated 30 th March 2015

Application Note: Securing BGP on Juniper Routers

Border Gateway Protocol Best Practices

Active measurements: networks. Prof. Anja Feldmann, Ph.D. Dr. Nikolaos Chatzis Georgios Smaragdakis, Ph.D.

MPLS VPN Route Target Rewrite

Understanding Route Aggregation in BGP

Outline. EE 122: Interdomain Routing Protocol (BGP) BGP Routing. Internet is more complicated... Ion Stoica TAs: Junda Liu, DK Moon, David Zats

Transitioning to BGP. ISP Workshops. Last updated 24 April 2013

Week 4 / Paper 1. Open issues in Interdomain Routing: a survey

6.263 Data Communication Networks

Textbook Required: Cisco Networking Academy Program CCNP: Building Scalable Internetworks v5.0 Lab Manual.

Advanced BGP Policy. Advanced Topics

Dove siamo? Architecture of Dynamic Routing

Today s Agenda. Note: it takes years to really master BGP Many slides stolen from Prof. Zhi-Li Zhang at Minnesota and from Avi Freedman s slides

BGP Techniques for Internet Service Providers

IP Routing Tecnologie e Protocolli per Internet II rev 1

Border Gateway Protocol (BGP) Conformance and Performance Testing

BGP Multihoming Techniques

Routing with OSPF. Introduction

Border Gateway Protocol Security

UPDATE = [Withdrawn prefixes (Optional)] + [Path Attributes] + [NLRIs].

BGP Techniques for Internet Service Providers

What's inside the cloud?!

Interdomain Routing. Outline

Border Gateway Protocol, Route Manipulation, and IP Multicast

Network Level Multihoming and BGP Challenges

BGP Multihoming Techniques

BGP Techniques for Internet Service Providers

BGP. 1. Internet Routing

Configuring BGP Services

Transcription:

DD2491 p2 2011 Intro to BGP Olof Hagsand KTH CSC 1

Literature Main course book Russ White, Danny McPehrson, Srihari Sangli, "Practical BGP", Addison-Wesley, ISBN 0-321-12700-5. Follow reading instructions RFC 4271 Junos Cookbook Many vendor pages See web literature page 2

Inter-domain routing The objective of inter-domain routing is to bind together the hundreds of thousands of independent IP networks that constitute the Internet Perspective from one network: Spread routing information to the outside world Originate and aggregate address prefixes Announce prefixes to other domains Tag prefixes with routing information Receive information from the outside world Receive and choose (filter) between prefixes from other domains Transfer information through your routing domain Received information from one domain may be transferred (and possibly modified) to other domains 3

What is BGP? Border Gateway Protocol version 4 An inter-domain routing protocol Defined in RFC 4271 Uses the destination-based forwarding paradigm No other relations can be expressed: sources, tos, link load Uses path-vector routing Views the Internet as a collection of autonomous systems Exchanges information between peers using TCP as underlying protocol Maintains a database (RIBs) of network layer reachability information (NLRI:s) Supports a toolkit of mechanisms to express and enforce policy decisions at AS level 4

IGP/EGP ISP EGP Customer IGP IGP EGP Exterior Gateway Protocol. Runs between networks/domains (inter-domain) Examples: BGP, static routing Note that BGP can also run internally in a network: IBGP IGP Interior Gateway Protocol. Runs within a network/domain (intra-domain) Examples: RIP, OSPF, IS-IS. 5

Why cant we use an IGP? Router protocols carry reachability, path information and policy IGPs do not carry policy information Do not need to since they all have the same policy within a routing domain Speed of convergence most important in IGP Link-state protocols Link-state protocols do not scale with respect to nodes the internet is too large. You want to separate internal and external routes IGPs reveal too much detail about your network, ie the topology But there have been many research proposals for a more homogenous routing protocol, such as a link-state protocol using 'hiding' of internal topology (eg areas) and adding of policies. See page 3 in 'Practical BGP' 6

Autonomous Systems (AS) A set of routers that has a single routing policy, that run under a single technical administration A single network or group of networks University, business, organization, operator This is viewed by the outside world as an Autonomous System All interior policies, protocols, etc are hidden within the AS Represented in the Internet by an Autonomous System Number (ASN). 0-65535 Example: ASN 1653 for SUNET Currently, operators are switching to four-byte ASNs RFC 4893: BGP Support for Four-octet AS Number Space 7

General AS graph AS1 AS2 AS3 AS4 AS5 AS6 AS7 AS8 AS9 8

Whois example gelimer.kthnoc.net> whois -h whois.ripe.net AS1653 aut-num: AS1653 as-name: SUNET descr: SUNET Swedish University Network import: from AS42 accept AS42 export: to AS42 announce AS-SUNET import: from AS702 accept AS702:RS-EURO AS702:RS-CUSTOMER export: to AS702 announce AS-SUNET import: from AS2603 accept any export: to AS2603 announce AS-SUNET import: from AS2831 accept AS2831 AS2832 export: to AS2831 announce any import: from AS2833 accept AS2833 export: to AS2833 announce any import: from AS2834 accept AS2834 export: to AS2834 announce any gelimer.kthnoc.net> whois -h whois.ripe.net AS-SUNET as-set: AS-SUNET descr: SUNET AS Macro descr: ASes served by SUNET members: AS1653, AS2831, AS2832, AS2833, AS2834, AS2835, AS2837 members: AS2838, AS2839, AS2840, AS2841, AS2842, AS2843, AS2844 members: AS2845, AS2846, AS3224, AS5601, AS8748, AS8973, AS9088 members: AS12384, AS15980, AS16251, AS20513, AS25072, AS28726 members: AS-NETNOD (Edited example) 9

ISP Tiers Tier-1 (TeliaSonera (1299), Sprint(1239), Verizon(701), Deutche Telekom(3320), NTT(2914), Level3(3356), AT&T(7018),...) Bacbone networks that can reach every other network. They do not pay anyone else for transit, since they exchange traffic (peer) with all other tier-1 networks. Everyone else pays to peer with tier-1 ISPs, to get connectivity the whole Internet. Tier-2 Large regional ISPs. They buy transit from Tier-1 networks to reach some parts of the Internet. Engages heavily in peering to avoid paying for transit. Tier-3 Smaller ISPs. Buys transit form either tier-1 or tier-2 networks. Exchanges traffic at a single IX. Often single-homed. (All these definitions are open for dispute.) 10

RIPE All public IP addresses and AS numbers is provided by IANA (Internet Assigned Numbers Authority). www.iana.org. IANA also handles the top level domains and protocol numbers. IANA delegate IP blocks and AS numbers to RIRs (Regional Internet Registry) so they in turn can delegate space to LIRs (Local Internet Registry). RIPE NCC (Réseaux IP Européens Network Coordination Centre) is the European RIR Also handles non European countrys like Israel, Iraq, Iran, Russia and many more. www.ripe.net RIPE NCC is one of five RIRs in the world ARIN, American Registry for Internet Numbers, www.arin.net LACNIC, Latin American and Caribbean Internet Addresses Registry, www.lacnic.net APNIC, Asia Pacific Network Information Centre, www.apnic.net AfriNIC, the African Network Information Centre, www.afrinic.net 11

PI and PA address space There are two classes of IP addresses Provider Independent (PI) Provider Aggregatable (PA) PA space is block of prefixes delegated to a LIR (Local Internet Registry) and to be used by themselves and their customers. If you use PA space and decide to change ISP the address space have to be returned. PI space is addresses that end customers can request directly from RIPE. Good for the end customer, but 'bad' for the ISP. RIPE will also start charging for PI space 12

Cost and peering relations Full Internet connectivity AS1 AS2 You pay for transit traffic. Transit -$ -$ NSPs ISPs AS3 Peer 0 AS4 AS5 +$ +$ Customer Stubs/ Customers AS6 AS7 AS8 AS9 13

Peering relations Full Internet connectivity AS1 AS2 Note where there are no traffic arrows! Transit NSPs ISPs AS3 Peer AS4 AS5 Traffic Customer Stubs/ Customers AS6 AS7 AS8 AS9 14

Peering relations An abstract way of defining peering relations is for example: Prefix sets: Define a customer set, a peering set and a transit set Example rules: Customer prefixes should be announced to transit and peers Peer and transit prefixes should be announced to customers Prefer prefixes from peers over prefixes from transit Do not accept illegal prefixes (RFC 1918 for example), or unknown prefixes from customers Load balance over several transit providers Filter traffic (eg src addresses) according to the prefixes announced 15

Customer / ISP Relations: Stub AS Provider Announced networks, traffic flows in other direction Customer n1, n2 Typical customer topology Can use default route to reach the Provider and Internet Customer can use address block of provider Customer does not need to be a separate AS Typically use static routing but can also use BGP Less common: Use a separate IGP (eg RIP) only to exchange routes between border routers. 16

Multi-homed customer ISP ISP ISP Customer Customer Customer Customer can be multi-homed for reliability and performance reasons Load sharing or geographical traffic distribution Multi-homed non-transit AS Non Transit AS does now allow external traffic to pass through What to think about: How to announce the prefixes Default routes Symmetrical routing Packet filtering, Address aggregation, etc 17

Customer with multiple providers ISP1 ISP2 Customer n1, n2 For several upstream providers address aggregation is an important issue: Which address block should the Customer use? From ISP1 or ISP2? From both? Or an independent address block? 18

Provider: Multi-homed Transit AS ISP1 ISP2 n3, n4 n5, n6 Multi-homed transit AS n1, n2 Transits traffic within own network This the most general configuration and is how a provider works. Need PI addresses 19

BGP sessions BGP connections (peerings) are setup manually and use TCP Two peers must have IP connectivity Things to think about (see Router A): How are routes imported into AS1653? How are routes propagated to AS923? Which are the BGP nexthops? How is external traffic sent through AS1653? AS923 EBGP IBGP AS1653 A EBGP AS42 20

Example JunOS configuration routing-options { autonomous-system 1653 } protocol bgp { group external-peers { type external; peer-as 42; neighbor 192.168.200.13; } group internal-peers { type internal; local-address 192.168.24.1; neighbor 192.168.16.1; neighbor 192.168.6.1; } } AS923 EBGP IBGP AS1653 A EBGP AS42 21

Path vector protocol In a distance-vector protocol, vectors with destination information are distributed between routers: Example: <dst: 10.1.10/24, metric: 5, nexthop: 10.2.3.4> Distance-vector has problems with converging Example: count-to-infinity Path-vector extends the information with a path to the destination This enables immediate loop detection Several other attributes associated with path Also, in BGP, the path vector uses AS:s, not IP addresses This hides internal structure in the domains Loop detection only on AS-numbers! Example: <dst: 10.1.10/24, path: AS1:AS3:AS5, nexthop: 10.2.3.4> 22

BGP Operation neighbor/peer Establish BGP session (TCP port 179) neighbor/peer BGP speaker OPEN messages BGP speaker UPDATE messages KEEPALIVE messages NOTIFICATION messages 23

BGP protocol operation OPEN messages to initiate a connection and exchange capabilities UPDATES contain A set of path attributes A set of prefixes sharing the path attributes A set of withdrawn routes BGP compares the AS path and other attributes to select the best path for a prefix Same prefix may be received from several peers Path attributes describe properties of the route How it was generated, which is the nexthop, various metrics, etc NOTIFICATION to signal errors KEEPALIVE to check liveness of peer 24

BGP Connections All updates are incremental Basic case: No refreshes This assumes reliable delivery! Therefore BGP runs over TCP Fragmentation, Acks, Flow control, Congestion control Byte stream No automatic neighbour discovery This assumes IP connectivity between peers! But via other mechanism than BGP Either IGP, static, or directly connected. So BGP connections can rely on an IGP BGP does not use TCP keepalives (which by default is on the order of hours) A Direct BGP connection C Multi-hop BGP connection B 25

Path attributes Path attributes describe some property of a destination prefix You 'tag' a prefix with path attributes describing some property of the route Categories: Well-known: All BGP implementations must recognize them Mandatory: Must always be present in all updates Discretionary: May or may not be sent in an update Transitive: Must be passed on to next peer Optional: All BGP implementations need not recognize them Optional + Transitive has proven to be an excellent way to introduce new features seamlessly! If a router does not recognize it, it just passes it along to its peers. Therefore, almost all novel attributes are optional + transitive 26

BGP Finite State Machine Active Connect fails Retry timer tcp connect Connect tcp connect TCP connection Close tcp Connect succeeds Send OPEN Idle Capability mismatch Close tcp OpenSent Rcv OPEN Send KEEPALIVE NOTIFICATION NOTIFICATION NOTIFICATION KEEPALIVE NOTIFICATION UPDATE OpenConfirm Established BGP connection KEEPALIVE Practical BGP Figure 1.7 contains a more complete state-machine 27

BGP message header BGP message header format 0 7 15 23 32 Marker Length Type Marker field: Authentication of incoming BGP messages Detect loss of synchronization between two BGP peers Length field: total message length including the header Type field: indicates the message type 28

OPEN message 0 7 15 23 32 Version My autonomous system Hold time BGP identifier Opt parm len Optional parameters (variable) Version: version of BGP message (current is 4) My AS: ASN of the BGP speaker Hold Time: Maximum interval between KEEPALIVE or UPDATE messages BGP Identifier: Sender s BGP ID Optional Parameter length: total length of the Optional Parameter field Optional Parameter: use in BGP session negotiation 29

UPDATE message 0 15 Withdrawn routes length (2bytes) Withdrawn routes Withdrawn routes (variable) Total path attribute length (2bytes) Path attributes (variable) Network Layer Reachability Information (variable) Path attribute NLRI 3 basic blocks of UPDATE message: Withdrawn routes Path attributes Network Layer Reachability Information (NLRI) 30

Withdrawn Routes Withdrawn Routes Length: total length of Withdrawn Routes field 0 means no routes being withdrawn and Withdrawn Routes field is not present in this UPDATE message Withdrawn Routes field: contains list of prefixes that are being withdrawn Each prefix is encode as a 2-tuple of <length, prefix> (CIDR) Withdrawn routes length (2bytes) Withdrawn routes (variable) 31

Path Attributes Total Path Attribute Length: total length of the Path Attribute field '0' indicates that neither NLRI field nor the Path Attribute field is present in the UPDATE message Path Attributes: A sequence of path attributes is presents in every UPDATE message except message that carries only withdrawn routes Each part attribute is a triple of <attribute type, attribute length, attribute value> Total path attribute length (2bytes) Path attributes (variable) 32

Path Attribute type Attribute type: consists of type code and flags Type code: contains the attribute code maintained by IANA Flags Bit 0: well-known (0) or optional (1) Universally known? Bit 1: for optional attribute; non-transitive (0) or transitive (1) Pass the attribute to other neighbors? Bit 2: for optional transitive attribute; complete (0) or partial (1) Is attribute known by all on the path? Bit 3: for attribute length; one octet (0) or two octets (1) Lower-order four bits: unused and always set to 0 0 1 2 3 4 7 15 Attribute flags Attribute type code Unused (must be 0) Optional bit Transitive bit Partial bit Extended Length bit 33

Network Layer Reachability Information (NLRI) Contains list of prefixes that are being advertised Each prefix is encoded as a 2-tuple of <length, prefix> NLRI length = UPDATE message length HDR length Total length of Path Attributes field Total length of Withdrawn Routes field NOTE: RFC 4760: Multiprotocol extensions for BGP-4 places a generalized NLRIs in an NLRI- attribute This means that NLRI for non-ipv4 protocols is obsolete! 34

NOTIFICATION message 0 7 15 23 32 Error Error subcode Data (variable) Error code: indication the type of notification Error subcode: more specific information about the nature of the error Data: contains data relevant to the error e.g. Bad header, illegal ASN Data Length = Message Length - 21 Example 0 7 15 23 32 1 2 5000 Error: 1 Message Header Error Error subcode: 2 Bad Message Length Data: 5000 is the erroneous length 35

KEEPALIVE message Periodically sent to determine whether peers are reachable Sent at a rate that ensures that hold time will not expire Recommended rate is one-third of the Hold Timer Must not be sent more frequently than one per second If Hold Timer is 0, periodic KEEPALIVE must not be sent 36

Path attributes categories Path attributes are characterized according to categories: Well-known: All BGP implementations must recognize them Mandatory: Must always be present in all updates Discretionary: May or may not be sent in an UPDATE Transitive: Must be passed on to next peer Optional: All BGP implementations need not recognize them Optional + Transitive has proven to be an excellent way to introduce new features seamlessly! If a router does not recognize it, it just passes it along to its peers. Therefore, almost all novel attributes are optional + transitive 37

Early BGP path attributes Type code 1: ORIGIN (RFC4271) Type code 2: AS_PATH (RFC4271) Type code 3: NEXT_HOP (RFC4271) Type code 4: MULTI_EXIT_DISC (RFC4271) Type code 5: LOCAL_PREF (RFC4271) Type code 6: ATOMIC_AGGREGATE (RFC4271) Type code 7: AGGREGATOR (RFC4271) Type code 8: COMMUNITY (RFC1997) 38

ORIGIN Well-known mandatory Defines the origin of the path information Types: IGP (0) INCOMPLETE (2) NLRI is internal to the originating AS (eg learnt via IGP) NLRI is learned by some other means (eg static route) BGP prefers the path with the lowest origin type In Junos, for example, ORIGINs are 0 by default 39

ORIGIN example R1 export direct R2 export static R4 export IGP EBGP AS2 RTD R3 learnt via EBPG IBGP AS1 RTB R1 origin: INCOMPLETE R2 origin: INCOMPLETE R3 origin: (defined by origin) R4 origin: IGP R1-R4 are routes imported into AS1s BGP in different ways R1/R2: Direct / static R3: EBGP R4: IGP 40

AS_PATH Well-known mandatory Contains a sequence of AS path segments <path segment type, path segment length, path segment value> AS_SET (1): unordered set of ASes a route traversed AS_SEQUENCE (2): ordered set of ASes a route traversed A BGP speaker prepends its ASN to the AS_PATH list when sending routes to external BGP peers (not to internal peers) Loop detection Shorter AS_PATH is preferred 41

AS_PATH (cont.) AS2 AS1 192.16.1.0/24 X AS3 AS4 The AS-PATH is used to break loops (between AS:s) AS1 announces 192.16.1.0/24 to AS2 and detects its own ASN when received from AS4 42

AS SET AS1 192.16.2.0/24 192.16.2.0/24 AS Path: 1 192.16.2.0/23 AS Path: 3 {1 2} 192.16.3.0/24 AS Path: 2 AS3 AS4 AS2 192.16.3.0/24 As alternative to sequences of AS:s, a set denotes a set of AS:s. Useful in aggregation at AS-level Necessary to detect loops 43

AS_PATH Manipulation AS_PATH often manipulated to affect routing behavior you can effect AS:s far away, not just your neighbor The AS_PATH can be lengthened to make a path less preferable This affects all ASes that receives this prefix update Unlike the MED that only can affect how a neighboring AS sends traffic to you Affects how incoming traffic is routed Is achieved by prepending dummy ASNs to the AS_PATH 44

Routing case before Manipulation AS10 10G AS20 10G AS30 192.16.0.0/24 20 10 192.16.0.0/24-10 10G 192.16.0.0/24 30 20 10 IX 192.16.0.0/24 1G AS40 10G 192.16.0.0/24 40 10 AS10 has two links to the rest of the world 10Gbit/s and 1Gbit/s In this case, traffic from the Internet Exchange IX will follow shortest AS_PATH Traffic will thus use the incoming 1G link How can AS10 steer traffic from the IX to take the 10G link instead? 45

Routing case after manipulation 10G AS20 10G AS30 192.16.0.0/24 30 20 10 192.16.0.0/24 20 10 10G AS10 192.16.0.0/24-10 IX 192.16.0.0/24 1G AS40 10G 192.16.0.0/24 10 10 10 192.16.0.0/24 40 10 10 10 To force incoming traffic to go through the 10G link, AS10 manipulates the AS_PATH Insert dummy AS numbers when sending UPDATEs to AS40 Make the AS_PATH over the 10G link become the shorter one Best practice: bogus AS number should be duplicate of own Otherwise, the number can be misleading and cause routing loops 46

NEXT_HOP Well-known mandatory Defined IP address of the router that should be used as the next hop to the destinations listed in NLRI Some differences between EBGP and IBGP EBGP nexthop typically between directly connected addresses IBGP often between loopback addresses 47

NEXT_HOP Well-known mandatory Defined IP address of the router that should be used as the next hop to the destinations listed in NLRI Next-hop concept for BGP External peer: IP address of the external peer that announced the route Internal peer: Locally originated routes: IP address of the peer that originated the route Routes learned from external: IP address of the external peer from which the route was learned Route on multi-access medium: IP address of interface of the router connected to the medium that originated the route More about this in later slides 48

NEXT_HOP on Multiaccess Media When advertising route on a multi-access media, the next hop can be an IP address of the interface of the router connected to the medium that originated the route This is called third-party next-hop 10.0.0.0/24 EBGP.1.2.3 OSPF 11.0.0.0/24 via 10.0.0.3 11.0.0.0/24 49

MULTI-EXIT-DISCRIMINATOR (MED) Optional non-transitive Also called 'metric' Used on external links to discriminate among multiple links to the same neighboring AS Lower MED is preferred MED received from external peer must not be propagated to other neighboring AS:s 50

MULTI-EXIT-DISCRIMINATOR AS1 RTA RTB Link A MED=70 Link B MED=120 RTD AS2 RTC RTE 192.16.0.0/24 AS2 prefers to receive traffic to 192.16.0.0/24 on link A and use link B as backup RTC announces the prefix with low MED, RTE announces the prefix with high MED. 51

MULTI-EXIT-DISCRIMINATOR (cont.) AS1 RTA RTB 192.16.0.0/24 MED=70 192.16.1.0/24 MED=120 Link A Link B 192.16.0.0/24 MED=120 192.16.1.0/24 MED=70 RTD AS2 RTC 192.16.0.0/24 192.16.1.0/24 RTE AS2 wishes to load-balance traffic using MED. Receive traffic to 192.16.0.0/24 on Link A Receive traffic to 192.16.1.0/24 on Link B Note the de-aggregation. AS1 should aggregate the two prefixes into one. 52

MED in many AS:s AS1 RTA MED=50 RTD AS3 MED=70 MED=120 RTE AS4 RTB AS2 RTC 192.16.0.0/24 AS1 will select RTB over RTC, but chooses between RTB and RTD using other means MEDs can not be compared from different ASs by default You can configure to compare MEDs between ASs 53

Using MED as tie-breaker* The use of MED as tie-breaker is controlled by several sub- settings CISCO for example, has a non-deterministic comparison by default based on age of the routes (newer routes are pairwise compared). cisco-non-deterministic parameter in JunOS. deterministic-med in CISCO You can also set always-use-med to use MED comparisons from different AS:s Can be useful if you are among a group of AS:s that trust each other. 54

LOCAL-PREF Well-known discretionary Used as local policy to set degree of preference of routes when announcing to other internal peers Used locally within the AS A higher local preference is preferred(!) 55

LOCAL_PREF (cont.) AS1 192.16.0.0/24 AS2 AS3 T1 Link T3 Link IBGP Set LOCAL_PREF = 200 RTB AS4 RTC Set LOCAL_PREF = 300 AS4 prefers to send traffic to 192.16.0.0/24 on the T3 Link. 56

Metrics: MED and LOCAL_PREF Multi-Exit Descriminator (MED) is announced to other AS:s Used by your neighbors to tell you how they want to receive traffic LOCAL_PREF announced internally Used by you to steer traffic internally (to your neighbors) LOCAL_PREF overrides MED Lower MED preferred Higher LOCAL_PREF preferred 57

ATOMIC_AGGREGATE Well-known discretionary Set to indicate information loss There may be longer prefixes to AS:s not in AS_PATH Alternative to using AS_SET Receiving BGP speaker must not de-aggregate the route BGP speaker that receives a route with this attribute needs to be aware of that actual path to destinations may not be the path specified in the AS_PATH attribute of the route Should not be set when the aggregate carries some extra information that indication from where the aggregated information came 58

BGP Decision Process If next hop inaccessible, ignore route Prefer highest local preference value Prefer shortest AS_PATH Prefer lowest origin type (IGP, EGP, INCOMPLETE) Prefer lowest MED value (if from same AS) Prefer routes from EBGP over IBGP Prefer routes with lowest IGP metric Nexthop Prefer route from peer with lowest router id Prefer route from peer with lowest address 59

Vendor specific tie-break CISCO has several own rules Weight (cisco-specific) Routes are compared by default pairwise in the order they arrived (non-deterministic). Example: entry1: AS(PATH) 500, med 150, external, routerid 172.16.13.1 entry2: AS(PATH) 100, med 200, external, routerid 1.1.1.1 entry3: AS(PATH) 500, med 100, internal, routerid 172.16.8.4 Entry3 is chosen. Why? Turn this off with bgp deterministic-med Juniper is somewhat more standard compliant 60

IBGP loopback peering AS2 RTB RTC eth0 lo eth1 RTE RTD IBGP peering is usually done with loopbacks. Why? More stable: Not tied to single physical path, if a link/interface goes down, another route may be chosen. But IBGP needs an IGP so that the loopbacks can be reached And TCP connections can be established 61

EBGP peering NEXTHOP is 192.168.200.2 How do I reach it? RTC AS1 IGP IBGP RTA.1 DMZ: 192.168.200.0/30 EBGP.2 RTB AS2 EBGP peering is typically made over a physical directly connected link Demilitarized zone (DMZ) that does not belong to either AS But if routes learned via EBGP uses the DMZ address as nexthop The DMZ must be redistributed via IGP! But the DMZ is not really part of the AS,... Although the DMZ is usually a part of one of the AS:s address ranges 62

NEXTHOP is 10.0.0.1 RTC AS1 IGP IBGP EBGP: Next-hop self lo: 10.0.0.1 RTA.1 NEXTHOP is 192.168.200.2 DMZ: 192.168.200.0/30 EBGP.2 RTB AS2 Alternative: Set next-hop-self Announce routes using the loopback address of the border router as nexthop DMZ does not need to be distributed within the AS But RTA still uses the directly connected DMZ address 63

EBGP nexthop: recursive lookup 12.0.0.0/30.1 RTC RTD IGP IBGP AS1 lo: 10.0.0.1 RTA.1 DMZ: 192.168.200.0/30 EBGP.2 RTB AS2 130.2.3.0/24 RTC:s routing table alternatives Route Nexthop Protocol 130.2.3.0/24 192.168.200.2 IBGP 192.168.200.0/30 12.0.0.1 IGP 12.0.0.0/30 - direct DMZ nexthop Route Nexthop Protocol 130.2.3.0/24 10.0.0.1 IBGP 10.0.0.1/32 12.0.0.1 IGP 12.0.0.0/30 - direct Next-hop self 64

Alternative EBGP peering: redundancy/load balancing AS1 RTA EBGP RTB AS2 Peer between loopback address Set equal cost to nexthop using IGP or static routes Load balance between links Or use one link as redundant link 65

Alternative EBGP peering: redundancy/load balancing AS1 AS2 Peer between loopback address Load balancing between several links/redundant link But next-hop may now be dependent on IGP/IBGP in other AS 66

How to transit traffic? AS1 AS2 RTA EBGP RTB RTC AS3 RTD RTE EBGP RTF How does RTC and RTD know how to forward transit traffic between RTA and RTF? You cannot use default routes. Why? You may use IGP to distribute external routes. But most common is to use IBGP to distribute external routes internally. 67

How to transit traffic using IGP (don't do it) AS1 IGP RTC AS2 RTA EBGP RTB IGP IBGP IGP IGP AS3 RTD RTE EBGP RTF You can inject BGP routes into your IGP as external routes Scales badly High memory consumption for the IGP and will take time to converge There is also a problem with synchronization between the IGP and EBGP Can you announce a route even though your IGP has not converged? 68

Synchronization between IGP and BGP AS1 192.16.124.0/24 IGP RTC AS2 Is 192.16.124.0/24 propagated in my AS? RTA 192.16.124.0/24 EBGP RTB IGP IBGP IGP RTD IGP RTE 192.16.124.0/24 EBGP AS3 RTF If an AS provides transit service to another AS, a BGP speaker should not advertise a route to external peer unless all routers within an AS learned about that route via IGP RTE checks that 192.16.124.0/24 is reachable via IGP before announcing it to RTF via BGP If you ignore this: no synchronization 69

Using IBGP: Full mesh AS1 RTC AS2 RTA EBGP RTB AS3 RTD RTE EBGP RTF With IBGP all transit routers know all external routes. Note that all transit routers need to speak IBGP (what happens if they dont?) But loop prevention in BGP is via AS_PATH There is no change in AS_PATH between internal peers! Loop prevention in IBGP: All IBGP speakers are fully meshed Never reannounce routes to an IBGP peer learned from another IBGP peer 70

IBGP full mesh So IBGP needs to be fully meshed in order for: All internal routes to receive all external routes Loop prevention (no difference in AS_PATH) The number of TCP connections in an AS: n*(n-1)/2 Not practical for large transit networks, but new routers are pushing the limit upwards due to higher route-processor performance Two ways to remove this scaling limitation (other lecture): Route reflectors (RFC 4456) AS confederations (RFC 5065) 71

Route advertisement rules BGP next-hop must be reachable Consequence: If IGP fails, BGP route is not announced Advertise only active BGP routes to peers Juniper specific Consequence: If same route from IGP is active, BGP route is not announced! Turn off in JunOS with: advertise inactive Never forward IBGP routes to IBGP peers (full mesh) 72

Private ASes Sometimes it is not necessary to have a public ASN IANA has reserved the ASN range 64512 65535 for internal use within a system Can be used for customers that are single-homed or multi-homed to the same provider Private ASNs must not be announced globally Providers must strip private ASNs before announcing the prefixes on to the rest of the Internet Purpose of private AS:s is to conserve AS numbers and hide networks 73

Private ASes, cont d 192.16.0.0/24-1 AS1 R2 R2 AS1 R3 AS7 192.16.0.0/24-65001 R1 AS65001 192.16.0.0/24 Prefixes originating from AS65001: AS_PATH of 65001 AS1 propagates the prefix but strips the private ASN 74

Extensions BGP is under constant development New operational problems and new technologies require extensions to the protocol Extensions are introduced, standardized, and implemented Implementation of extensions: Negotiated via BGP capabilities when peering is set up. Sent as optional transitive attributes and either recognized or not 75

Extensions example BGP extensions BGP communities attribute Route refresh capability BGP multipath Capabilities advertisement BGP route reflection Multi-protocol extensions Graceful restart Four-byte AS Autonomous system confederations TCP extension TCP MD5 signature option (RFC1997) (RFC2918) (RFC3107) (RFC3392) (RFC4456) (RFC4760) (RFC4724) (RFC4893) (RFC5065) (RFC2385) 76

Capabilities Advertisement An 'extension to negotiate extensions' Announce supported capabilities to the peer with OPEN message using options parameter Some capabilities are (~same as previous slide) Value Description Reference 0 Reserved RFC3392 1 Multiprotocol Extensions for BGP-4 RFC4760 2 Route Refresh Capability for BGP-4 RFC2918 3 Cooperative Route Filtering Capability 4 Multiple routes to a destination capability RFC3107 5-63 Unassigned 64 Graceful Restart Capability RFC4724 65 Support for 4-octet AS number capability RFC4893 66 Deprecated (2003-03-06) 67 Support for Dynamic Capability (capability specific) 68-127 Unassigned 128-255 Vendor Specific 77

The COMMUNITY attribute RFC 1997 defines a 4-byte COMMUNITY attribute as optional transitive A group of destinations that share some common property Used to simplify routing policies based on logical property rather than IP prefix or ASN Format First 2-bytes ASN, last 2-bytes defines a value (ASN:value) Example 5678:90 (0x162E005A) A route can have more than one community attribute BGP speaker can add and modify a community attribute before passing routes on to other peers 78

NO_EXPORT example AS1 192.16.0.0/24 192.16.0.0/24 NO_EXPORT 192.16.0.0/23 AS2 192.16.0.0/23 192.16.1.0/24 192.16.1.0/24 NO_EXPORT 192.16.0.0/23 NO_EXPORT is a well-known community (0xFFFF FF01) a route should not be advertised to peers outside an AS Other well-known: NO_ADVERTISE: a route should not be advertised to other BGP peers 79

Extended communities I T Type[Subtype] Data 1 byte 1 byte 6-7 bytes Extended communites are 8-bytes with more structure RFC 4360 Type 0: <ASN:2; Data:4> Type 1: <IPv4:4; Data:2> Type 2: <ASN:4; Data:2> Example: Used in VPNs to tag VPN-specific information 80

Use of Communities Communities are used extensively in modern networks for defining policies Both internally and between networks (if they have agreed) You tag a route with a community and use this information to implement a policy Tag backbone routes Tag routes you wish to advertise to peers Tag routes with VPN label Eg: SUNET uses communities to tag routes with academic and nonacademic sites See Practical BGP p 217 ff, for more examples 81

Community configuration example (1) Tagging a community at the edge (or by the other peer): community academic members 3244:11; policy-statement add-academic { route-filter 172.16.0.0/8 upto /16 { community add academic; next policy; } route-filter 192.168.0.0/8 upto /24 { community add academic; next policy; } } } 82

Community configuration example (2) Using the community to implement a policy: community academic members [3244:11]; # regexp policy-statement from-academic { from { protocol bgp; community academic; } then as-path-prepend 201;201 ; } 83

Multiprotocol extension for BGP-4 Support routing of other network layer protocols than IPv4 NLRI and NEXTHOP fields in the UPDATE message are IPv4 specific Use a generalized address form using: AFI - Address Family Identifier SAFI - Subsequent Address Family Identifier Two new attributes replace NLRI and NEXTHOP Multiprotocol Reachable NLRI (MP_REACH_NLRI) Contains NLRI and nexthop Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) Used in multicast, IPv6, VPNs, etc. 84

Multiprotocol extension for BGP-4 Examples IPv4 unicast: AFI=1, SAFI=1 IPv4 multicast: AFI=1, SAFI=2 L3VPN: AFI=1, SAFI=128 IPv6 unicast: AFI=2, SAFI=128 IPX: AFI=11 85

Route refresh capability By default, BGP has no mechanism to dynamically request for re-advertisement of routes from a peer If a route (its attributes) does not change, a BGP speaker does not re-announce it Therefore, a receiver needs to cache all previous routes Even if not required at a specific moment This places a lot of load (memory) on the receiver Suppose a router changes input policy and does not want to store all data from all neighbors just in case With the ROUTE-REFRESH message, a router can request to get the complete Adj-RIB-Out from a neighboring router 86

TCP MD5 Signature option Provides a mechanism for TCP to carry a digest message in each TCP segment using a shared secret MD5 message digest algorithm Verification of authenticy (no encryption) manually configured Protects tcp header, parts of the ip header (addresses) and the whole TCP payload (ie whole BGP message) Helps BGP protect itself from spoofed TCP segments, TCP SYN/RST, data injection, Problem MD5 algorithm has been found to be vulnerable to collision search attack Performance issue from calculating and comparing digests Few actually use it (less than 10% - G Huston 2009) Practical BGP: pages 333-336 87

Graceful restart When a BGP peering goes down (eg TCP connection is reset, closed, or Keepalive fires) a BGP peer stops forwarding to that peer Ie, if the control plane fails, it is seen as a sign that the data-plane fails The other peers switches over the routes to other peers Then when the router comes back up (BGP peering established) they may switch over to the restarted router again But most hw routers have separate forwarding and control planes So forwarding can (in principle) continue without the control plane At least for a limited period of time before forwarding data becomes stale Graceful restart: Tell your peers that: You are going down but you will be back, please continue forwarding to me until I am back. Practical BGP: pages 269-276 88

Route flap damping A route that changes (withdrawn, announce or attribute change) leads to a new UPDATE message. A route that changes too often leads to route flaps that can ripple through the Internet A route can be a bad link that goes up and down, or an instable routing state Some scenarios (clusters of routers) can even magnify flapping Only a small percentage of the Internet routes cause the majority of the flaps Flaps cause many UPDATE messages, that causes recomputation of FIBs that cause change in traffic. Practical BGP: pages 238-241 89

Route flap damping (cont) Idea is to use history of route to predict future Introduce a penalty every time a route changes (eg 1000) Decay penalty exponentially using a half-time (eg 15 minutes) Stop announcing route using two levels: Suppress/Cutoff-threshold to stop announcing (eg 3000) Reuse threshold to start announcing (eg 750) Note: If one specific route is suppressed, a less specific route can be used for traffic Route flap dampening is most effective if everyone uses it towards the edges. Why? But is usually installed towards upstream to protect from instable remote prefixes. 90

Route flap damping example Penalty (flaps) Stop announcing Suppress limit Reuse limit Start announcing Time 91

Four-byte AS numbers 2-byte ASNs are quickly running out 4-byte ASNs have been standardized re-using the AS_PATH and a migration technique using a special 2-byte ASN: 23456. The migration technique maps all 4-byte ASNs to 23456 when NEW speakers (that have four-byte AS capability) talk to OLD speakers (those that do not have 4-byte AS capability) 92

AS numbers in BGP Where does BGP carry AS numbers? In the UPDATE message (my ASN) In the AS_PATH attribute In the Aggregator attribute In Communities attributes A NEW speaker announces 4-byte AS as a capability The capabilty also includes myasn. NEW speakers use the AS_PATH attribute for 4-byte ASNs 93

New attributes Two new attributes (optional transitive) AS4_PATH AS4_AGGREGATOR Only used to tunnel 4-byte AS information over non 4- byte AS clouds. 94

NEW speaking with OLD NEW converts all 4-byte ASNs in AS_PATH to 23456 NEW creates the attribute AS4_PATH to tunnel the 4-byte AS-path to other NEW speakers. When NEW receives a route from OLD with an AS4_PATH attribute, it constructs a new AS_PATH replacing all 23456 with the corresponding 4-byte AS:s in AS4_PATH. 95

AS70000 Example AS50 AS100 NEW OLD NEW AS_PATH: 532 90000 AS_PATH: 23456 AS_PATH: 50 532 23456 AS4_PATH: 70000 70000 532 90000 532 90000 How can loop detection work? Because you only make loop detection concerning your own AS. And that is never AS_TRANS. But AS_PATH looks strange: 23456 may appear many times 96

Neighbours JunOS Routing model Neighbours Import RIB Export Protocols FIB Protocols Note: Export policies may be applied only to active routes! Protocol Default import action Default export action direct and static accept all N/A RIP accept all RIP routes reject all BGP accept all BGP routes export all active BGP routes IS-IS accept all IS-IS routes reject all (IS-IS uses LSAs) OSPF accept all OSPF routes reject all (OSPF uses LSAs) MPLS accept all MPLS routes export all active MPLS routes 97

BGP Routing Process Model Peer Peer Peer import policy decision process RIB export policy Peer Pool of routes received from peers Import policy for filtering and attribute manipulation Decision process to select best routes Pool of routes used by router Export policy for filtering and attribute manipulation Pool of routes that the router advertises 98

BGP Routing Information Bases (BGP RIBs) CISCO version Input Policy Engine Output Policy Engine Adj-RIB-In Adj-RIB-Out Adj-RIB-In Adj-RIB-Out BGP decision process Loc-RIB Adj-RIB-In Adj-RIB-Out Adj-RIB-In Adj-RIB-Out 2001 Cisco Press 99

BGP RIBs BGP routing table consists of three parts Adj-RIB-In One per peer BGP speaker Stores routing information learned from peer Filtered/manipulated input policy engine Loc-RIB Selected best routes by decision process to each available destination Adj-RIB-Out One per peer BGP speaker Stores routing information selected for advertisement to peer Output policy applied to Loc-RIB before going into Adj-RIB-Out This is redistributed if REFRESH capability is used 100

Import/Export Policy Import policy Affects routes received from peer BGP speakers Filtering based on IP prefixes, AS_PATH and other BGP attributes Manipulates path attributes to influence its own decision process Export policy Affects routes in Loc-RIB (candidates for advertisement) In JunoS: only active BGP routes Differentiates between internal and external peers 101

AS1 10.0.0.0/24 0/0 Use 10.0.0.0/24 from AS1 Use 0/0 and 10.2.0.0/24 from AS2 10.0.0.0/24 AS3 AS2 10.0.0.0/24 10.2.0.0/24 0/0 import policy decision process RIB export policy 10.0.0.0/24 10.2.0.0/24 AS4 Deny 0/0 from AS1 Give 10.0.0.0/24 from AS1 better pref Do not propagate 0/0 Do not announce 10.2.0.0/24 to AS3 Give 10.0.0.0/24 metric 10 toward AS4 0/0 AS2 BGP 10.0.0.0/24 AS1 BGP 10.2.0.0/24 AS2 BGP BGP example policies 102

Configuring BGP in JunOS protocol bgp { mtu-discovery Global properties group external-peers { type external; Group properties peer-as 42; neighbor 192.168.200.13; neighbor 192.168.200.14; neighbor 192.168.200.14{ peer-as 93; Peer properties } } } Many configurations can be made on global, group and peer level. More specific is preferred (peer before group before global) See: http://www.juniper.net/techpubs/software/junos/junos93 103

Policy-options statements # set policy-options? as-path name reg-exp Create a named AS-PATH regular expression Example: as-path asp0 65000{4} as-path-group { [as-path] } community name members [ ids ] Example: community c0 members 701:555 damping name [options] policy-statement prefix-list name { ip-addresses } Create a named list of prefixes Example: prefix-list p0 {10.0.0.1; 192.168.1.0/24;} 104

Policy-statement matches # set policy-options policy-statement <name> term <name> from? as-path community family local-preference metric neighbor next-hop origin preference prefix-list protocol route-filter... 105

Policy-statement actions # set policy-options policy-statement <name> term <name> then? accept reject next policy next term trace Side-effects with accept: as-path-prepend community color external load-balance per-packet local-preference metric next-hop origin preference 106

BGP lab Tier1 AS65000.62 192.71.24.32/27 RTX1 1/0/0 0/0/1 1/0/1 1/0/1 1/0/0 RTX2 RTX4 2/0/0 2/0/0 1/0/0 1/0/1 AS6500X-1 RTX3 1/0/1 1/0/0 AS6500X AS6500X+1 107