Outline: BGP overview, BGP operations, BGP messages, BGP decision algorithm, BGP states

BGP overview
- Currently in version 4.
- Inter-AS (interdomain) routing protocol for exchanging network reachability information among BGP routers.
- Uses TCP on port 179 to send routing messages.
- BGP is a path vector protocol: it works like a distance vector protocol, but unlike in RIP, routing messages in BGP contain complete routes (the full AS path).
- Network administrators can specify routing policies.
BGP overview (cont.)
- BGP routers are also called BGP speakers.

BGP operations
- Two BGP routers exchanging information on a connection are called peers.
- Initially, BGP peers exchange the entire BGP routing table.
- A BGP router retains the current version of the entire BGP routing tables of all of its peers for the duration of the connection.
- Subsequently, only incremental updates are sent as the routing tables change.
- Keepalive messages are sent periodically to ensure that the connection between the BGP peers is alive.
- Notification messages are sent in response to errors or special conditions.
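The full-table-then-incremental exchange described above can be sketched as follows. This is a minimal illustration, not a BGP implementation: the `PeerTable` class, its `apply_update` method, and the example prefixes and AS numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PeerTable:
    """Routes learned from one peer, keyed by prefix (hypothetical structure)."""
    routes: dict = field(default_factory=dict)

    def apply_update(self, withdrawn, announced):
        # Remove prefixes the peer no longer offers.
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        # Add new prefixes, or replace the attributes of known ones.
        for prefix, attrs in announced.items():
            self.routes[prefix] = attrs


table = PeerTable()
# Initial exchange: the peer sends its entire table.
table.apply_update([], {
    "10.1.1.0/24": {"as_path": [2]},
    "10.1.2.0/24": {"as_path": [2]},
})
# Later: only the change is sent (one withdrawal, one new route).
table.apply_update(["10.1.1.0/24"], {"10.1.3.0/24": {"as_path": [2, 4]}})
print(sorted(table.routes))
```

Because the receiver keeps the peer's table for the lifetime of the connection, each later message only needs to carry the difference.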
BGP operations (cont.)
- A route is defined as a unit of information that pairs a destination with the attributes of a path to that destination.
- Routes are stored in the Routing Information Bases (RIBs).
- A RIB within a BGP router consists of three distinct parts:
    - Adj-RIBs-In: contains unprocessed routing information that has been advertised to the local BGP router by its peers;
    - Loc-RIB: contains the routes that have been selected by the local BGP router's Decision Process;
    - Adj-RIBs-Out: organizes the routes for advertisement to specific peers by means of the local speaker's UPDATE messages.

eBGP and iBGP
- BGP can also be used within an AS. BGP connections inside an AS are called internal BGP (iBGP), and BGP connections between different ASs are called external BGP (eBGP).
- If an AS has multiple connections to other ASs, multiple BGP speakers are needed.
- All BGP speakers representing the same AS must give a consistent image of the AS to the outside; hence iBGP.
- The purpose of iBGP is to ensure that network reachability information is consistent among multiple BGP routers in the same AS.

[Figure: routers R1 to R4 in AS1, AS2 and AS4, with connections labeled eBGP and iBGP]
BGP messages

BGP header format
Layout: Marker (16 octets), Length (2 octets), Type (1 octet)
- Marker: authenticates incoming BGP messages or detects loss of synchronization between a pair of BGP peers.
- Length: indicates the total length of the message in octets, including the BGP header.
- Type: indicates the type of the message.
- Note: the BGP synchronization rule states that if an AS provides transit service to another AS, BGP should not advertise a route until all of the routers within the AS have learned about the route via an IGP.

OPEN message
Layout: BGP header (Marker, Length, Type=OPEN), Version (1 octet), My autonomous system (2 octets), Hold time (2 octets), BGP identifier (4 octets), Optional parameter length (1 octet), Optional parameters (variable)
- Purpose: first message sent after the TCP connection is opened.
- Version: the protocol version number of the message.
- My autonomous system: the AS number of the sending router.
- Hold time: the maximum number of seconds that may elapse between the receipt of successive KEEPALIVE or UPDATE messages before the connection is considered dead.
- BGP identifier: identifier of the sending BGP router (the IP address of one of its interfaces).
- Optional parameters: a list of optional parameters.
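The header and OPEN layouts above can be packed and unpacked with Python's `struct` module. This is a minimal sketch under stated assumptions: it carries no optional parameters, uses an all-ones Marker (no authentication), and the function names `build_open` and `parse_open` are hypothetical.

```python
import socket
import struct

HEADER_FMT = "!16sHB"                      # Marker (16), Length (2), Type (1)
HEADER_LEN = struct.calcsize(HEADER_FMT)   # 19 octets
OPEN_FMT = "!BHH4sB"                       # Version, My AS, Hold time, BGP id, Opt. param. length
MSG_OPEN = 1                               # Type code of an OPEN message


def build_open(my_as, hold_time, bgp_id, version=4):
    """Build an OPEN message without optional parameters."""
    body = struct.pack(OPEN_FMT, version, my_as, hold_time,
                       socket.inet_aton(bgp_id), 0)
    header = struct.pack(HEADER_FMT, b"\xff" * 16,
                         HEADER_LEN + len(body), MSG_OPEN)
    return header + body


def parse_open(data):
    """Parse an OPEN message; optional parameters are not decoded."""
    _marker, length, msg_type = struct.unpack_from(HEADER_FMT, data, 0)
    if msg_type != MSG_OPEN or length != len(data):
        raise ValueError("not a well-formed OPEN message")
    version, my_as, hold_time, raw_id, opt_len = struct.unpack_from(
        OPEN_FMT, data, HEADER_LEN)
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": socket.inet_ntoa(raw_id),
        "opt_len": opt_len,
    }


# Example values (AS number, hold time and identifier are made up).
msg = build_open(my_as=65001, hold_time=90, bgp_id="10.10.1.3")
print(len(msg), parse_open(msg))
```

The `!` prefix selects network byte order with no padding, so the packed sizes match the octet counts in the layout.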
KEEPALIVE message
Layout: BGP header only (Marker, Length, Type=KEEPALIVE)
- If the hold time is zero, then KEEPALIVE messages will not be sent.

NOTIFICATION message
Layout: BGP header (Marker, Length, Type=NOTIFICATION), Error code (1 octet), Error subcode (1 octet), Data (variable)
- When a BGP speaker detects an error, it sends a NOTIFICATION and then closes the TCP connection.
- Error code: the type of error condition.
- Error subcode: specific information about the nature of the error.
- Data: information used to diagnose the reason for the notification.
- Examples: OPEN message error, UPDATE message error (bad attribute), hold timer expired, etc.
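A minimal sketch of building these two messages, assuming an all-ones Marker. The function names are hypothetical. The one-third ratio in `keepalive_interval` is a suggestion taken from RFC 1771, not something the slides specify.

```python
import struct

HEADER_FMT = "!16sHB"          # Marker (16), Length (2), Type (1)
HEADER_LEN = 19
MSG_NOTIFICATION = 3
MSG_KEEPALIVE = 4
ERR_HOLD_TIMER_EXPIRED = 4     # Error code for "hold timer expired"


def build_keepalive():
    """A KEEPALIVE is just the 19-octet BGP header."""
    return struct.pack(HEADER_FMT, b"\xff" * 16, HEADER_LEN, MSG_KEEPALIVE)


def build_notification(error_code, error_subcode=0, data=b""):
    """Header followed by error code, error subcode and diagnostic data."""
    body = struct.pack("!BB", error_code, error_subcode) + data
    header = struct.pack(HEADER_FMT, b"\xff" * 16,
                         HEADER_LEN + len(body), MSG_NOTIFICATION)
    return header + body


def keepalive_interval(hold_time):
    """Seconds between KEEPALIVEs, or None if none are sent."""
    if hold_time == 0:
        return None            # hold time zero: no KEEPALIVE messages
    return hold_time / 3       # assumption: one third of the hold time


keepalive = build_keepalive()
notification = build_notification(ERR_HOLD_TIMER_EXPIRED)
print(len(keepalive), len(notification), keepalive_interval(90))
```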
UPDATE message
Layout (after the BGP header):
- Unfeasible routes length (2 octets)
- Withdrawn routes (variable): a list of entries, each a Length (1 octet) and a Prefix (variable)
- Total path attribute length (2 octets)
- Path attributes (variable): a list of entries, each an Attribute type, an Attribute length and an Attribute value
- Network layer reachability information (variable): a list of entries, each a Length (1 octet) and a Prefix (variable)

Fields:
- Unfeasible routes length: the total length of the withdrawn routes field in octets.
- Withdrawn routes: a list of IP address prefixes for the routes that need to be withdrawn from BGP routing tables.
- Total path attribute length: the total length of the Path Attributes field in octets.
- Path attributes: a variable-length sequence of path attributes.
- NLRI (Network Layer Reachability Information): a list of IP prefixes.

UPDATE message (cont.)
Each path attribute consists of an Attribute type, an Attribute length and an Attribute value. The Attribute type is an Attribute flag octet (bits O, T, P, E, followed by four unused bits set to 0) and an Attribute type code octet.

Attribute flag (1 octet):
- O bit: the attribute is optional (O=1) or well-known (O=0).
- T bit: an optional attribute is transitive (T=1) or non-transitive (T=0). Well-known attributes are always transitive.
- P bit: the information in the optional transitive attribute is partial (P=1) or complete (P=0).
- E bit: the attribute length is two octets (E=1) or one octet (E=0).

Four types of attributes:
- Well-known mandatory: recognized by all BGP speakers and present in every UPDATE that carries NLRI
- Well-known discretionary: recognized by all BGP speakers, but not necessarily present
- Optional transitive
- Optional non-transitive

A path with an unrecognized optional transitive attribute is passed on, with the attribute, even though the BGP speaker does not recognize it. Unrecognized optional non-transitive attributes should be silently dropped.
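The flag bits above map onto the top four bits of the flag octet. A minimal sketch of decoding one path attribute follows. The function name `parse_attribute` and the returned dictionary keys are hypothetical, and the value is left as raw bytes.

```python
import struct

FLAG_OPTIONAL = 0x80         # O bit
FLAG_TRANSITIVE = 0x40       # T bit
FLAG_PARTIAL = 0x20          # P bit
FLAG_EXTENDED_LENGTH = 0x10  # E bit


def parse_attribute(data, offset=0):
    """Decode one path attribute; return (attribute, offset of the next one)."""
    flags, type_code = struct.unpack_from("!BB", data, offset)
    if flags & FLAG_EXTENDED_LENGTH:
        # E=1: the attribute length field is two octets.
        (length,) = struct.unpack_from("!H", data, offset + 2)
        value_start = offset + 4
    else:
        # E=0: the attribute length field is one octet.
        (length,) = struct.unpack_from("!B", data, offset + 2)
        value_start = offset + 3
    attribute = {
        "optional": bool(flags & FLAG_OPTIONAL),
        "transitive": bool(flags & FLAG_TRANSITIVE),
        "partial": bool(flags & FLAG_PARTIAL),
        "type_code": type_code,
        "value": data[value_start:value_start + length],
    }
    return attribute, value_start + length


# ORIGIN (type code 1) with value 0 (IGP): well-known, transitive, 1-octet length.
origin_attr = bytes([0x40, 1, 1, 0])
attr, next_offset = parse_attribute(origin_attr)
print(attr, next_offset)
```

Returning the next offset lets a caller loop over the whole Path attributes field until the Total path attribute length is consumed.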
Types of attributes
- ORIGIN (type code 1): well-known mandatory. Defines the origin of the NLRI.
    - 0: IGP, the NLRI is interior to the originating AS
    - 1: EGP, the NLRI was learned through EGP (the Exterior Gateway Protocol)
    - 2: INCOMPLETE, the NLRI was learned through some other means
- AS_PATH (type code 2): well-known mandatory.
    - Lists the sequence of ASs that the route has traversed to reach the destination.
    - A BGP speaker propagating a route prepends its own AS to the AS_PATH list.
    - Used to detect loops.
- NEXT_HOP (type code 3): well-known mandatory. Defines the IP address of the border router that should be used as the next hop to reach the destinations listed in the NLRI.
- MULTI_EXIT_DISC (MED, Multi-Exit Discriminator) (type code 4): optional non-transitive.
    - An inter-AS metric (for example, a hop count) that discriminates among multiple entry/exit points to a neighboring AS and gives the neighboring AS a hint about the preferred path.
    - It makes no sense to compare a MED value set by one AS with a MED set by another AS, because metrics vary from AS to AS.

Types of attributes (cont.)
- LOCAL_PREF (type code 5): well-known discretionary.
    - Informs other BGP routers within the same AS of the speaker's degree of preference for an advertised route.
    - Only part of iBGP; not included in eBGP exchanges.
- ATOMIC_AGGREGATE (type code 6): well-known discretionary.
    - A BGP speaker, when presented with a set of overlapping routes from one of its peers to reach a given NLRI, informs other BGP routers that it selected a less specific route without selecting a more specific one that is included in it.
    - Ensures that certain aggregates are not de-aggregated.
    - A route describing a smaller set of destinations (a longer prefix) is said to be more specific than a route describing a larger set of destinations (a shorter prefix).
- AGGREGATOR (type code 7): optional transitive.
    - Specifies the last AS number that formed the aggregate route, followed by the IP address of the BGP router that formed the aggregate route.
    - Advertises which AS and which BGP speaker within that AS performed the aggregation.
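The AS_PATH behavior described above (prepend on propagation, use for loop detection) can be sketched in a few lines. The function names and AS numbers are hypothetical. The sketch treats AS_PATH as a plain ordered list and assumes the prepend happens when advertising to an eBGP peer.

```python
def accept_route(local_as, as_path):
    """Loop detection: reject a route whose AS_PATH already contains our AS."""
    return local_as not in as_path


def propagate(local_as, as_path):
    """AS_PATH to advertise to an external peer: our own AS goes in front."""
    return [local_as] + list(as_path)


# A route originated in AS 4 travels through AS 2, then AS 1.
path_at_as2 = propagate(4, [])           # [4]
path_at_as1 = propagate(2, path_at_as2)  # [2, 4]
path_sent_by_as1 = propagate(1, path_at_as1)

print(path_sent_by_as1)
# If the advertisement ever comes back to AS 2, it is rejected as a loop.
print(accept_route(2, path_sent_by_as1))
```

The length of the resulting list is also what the decision process compares when it prefers the shorter AS_PATH.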
Example

[Figure: AS1 contains R1, R2 and R3, connected by iBGP; AS2 contains R4, R5 and network 10.1.2.0/24; R3 peers with AS2 over eBGP. Labeled addresses: 10.10.1.1, 10.10.1.2, 10.10.1.3, 10.10.4.1, 10.10.4.2, and subnet 10.10.3.0/24. 10.1.1.0/24 is advertised once with MED 100 and once with MED 200, with the annotation "Reach 10.1.1.0/24 via 10.10.1.2".]

- MED: R3 will assume AS2 wants it to use R4 to reach 10.1.1.0/24 because its MED is lower.
- Routing table at R2:
    - Reach 10.1.2.0/24 through 10.10.1.2
    - Reach 10.10.3.0/24 through 10.10.4.1
- Routing table at R3:
    - Reach 10.1.2.0/24 through 10.10.1.2
    - Reach 10.10.3.0/24 through 10.10.4.2
- CIDR notation: 10.10.3.0/24 is in CIDR (Classless Interdomain Routing) notation; 24 is the number of network mask bits, so the network prefix here is 10.10.3 and the mask is 255.255.255.0. 205.100.0.0/22 means the mask is 255.255.252.0, so the prefix covers the four /24 networks from 205.100.0.0 to 205.100.3.0.
- NEXT_HOP:
    - R4 advertises 10.1.2.0/24 to R3 (eBGP) with a next hop of 10.10.1.2 (the IP address of the BGP peer).
    - R3 should advertise 10.1.2.0/24 using iBGP with a next hop of 10.10.1.2 (as in eBGP).
    - The reason is that R3 is not an immediate neighbor of R1 or R2; R1 and R2 should update their routing table entries for 10.1.2.0/24 with the next hop needed to reach 10.10.1.2, based on their IGP information.

The BGP decision algorithm
- After a BGP router receives updates about different destinations from its peers, the protocol has to decide which paths to choose in order to reach a specific destination.
- BGP will choose only a single path to reach a specific destination.
- The decision process is based on different attributes, such as next hop, local preference, the route origin, and so on.
- BGP will always propagate the best path to its neighbors.
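One way to picture an attribute-based decision is as a sort over candidate paths with a tuple key. The sketch below is illustrative only. It follows the Cisco-style ordering listed in this deck, the `Path` fields and the example values are hypothetical, and it assumes all candidates come from the same neighboring AS so that their MED values are comparable.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Path:
    next_hop_reachable: bool
    weight: int               # Cisco-specific, local to the router
    local_pref: int
    locally_originated: bool
    as_path: tuple
    origin: int               # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int
    external: bool            # True if learned over eBGP
    igp_metric: int           # IGP distance to the next hop
    router_id: str            # BGP router ID of the advertising peer


def preference_key(p):
    """Smaller tuples are preferred; 'largest wins' fields are negated."""
    return (
        -p.weight,
        -p.local_pref,
        not p.locally_originated,   # False sorts first: local routes win
        len(p.as_path),
        p.origin,
        p.med,                      # assumption: same neighboring AS
        not p.external,             # False sorts first: eBGP wins
        p.igp_metric,
        int(ipaddress.IPv4Address(p.router_id)),
    )


def best_path(paths):
    """Drop paths with an unreachable next hop, then pick the preferred one."""
    usable = [p for p in paths if p.next_hop_reachable]
    return min(usable, key=preference_key) if usable else None


# Two otherwise equal paths to the same prefix, differing in MED (100 vs 200).
via_r4 = Path(True, 0, 100, False, (2,), 0, 100, True, 10, "10.10.1.2")
via_r5 = Path(True, 0, 100, False, (2,), 0, 200, True, 10, "10.10.1.1")
print(best_path([via_r5, via_r4]) is via_r4)
```

Encoding the order as a tuple keeps each tie-break explicit: a later field is only consulted when every earlier field is equal.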
How BGP selects a path to a destination
BGP selects one path as the best path to a destination, places it in its routing table, and propagates the path information to its neighbors (source: Cisco web page).
1. If the path specifies a next hop that is inaccessible, drop the update.
2. Prefer the path with the largest Weight. (Weight is a Cisco-specific concept that is not part of BGP: a locally assigned number; routes with higher weights are preferred.)
3. If the weights are the same, prefer the path with the largest Local Preference.
4. If the Local Preference is the same, prefer the route that was originated by BGP running on this router.
5. If no route was originated locally, prefer the shorter AS_PATH.
6. If all paths have the same AS_PATH length, prefer the lowest origin code (IGP < EGP < INCOMPLETE).
7. If the origin codes are the same, prefer the path with the lowest MED.
8. If all paths have the same MED, prefer the external (eBGP) path over the internal (iBGP) path.
9. If all paths are still the same, prefer the path through the closest IGP neighbor.
10. Prefer the route with the lowest IP address value as specified by the BGP router ID.

BGP finite state machine
- Idle state: BGP refuses all incoming BGP connections. No resources are allocated to the peer.
- Connect state: BGP is waiting for the transport protocol (TCP) connection to be completed.
- Active state: BGP is trying to acquire a peer by initiating a transport protocol connection. When done, it sends an OPEN message.
- OpenSent state: BGP waits for an OPEN message from its peer.
- OpenConfirm state: BGP waits for a KEEPALIVE or NOTIFICATION message.
- Established state: BGP can exchange UPDATE, NOTIFICATION, and KEEPALIVE messages with its peer.
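The six states above can be sketched as a small transition table. This is a simplified illustration, not the full state machine from the specification. The event names are hypothetical labels, and every event not listed (errors, NOTIFICATION, timer expiry) is assumed to send the session back to Idle.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,   # OPEN is sent here
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,    # OPEN is sent here
    (State.ACTIVE, "connect_retry_expired"): State.CONNECT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def next_state(state, event):
    """Look up the transition; unlisted events fall back to Idle (simplification)."""
    return TRANSITIONS.get((state, event), State.IDLE)


state = State.IDLE
for event in ["start", "tcp_established", "open_received", "keepalive_received"]:
    state = next_state(state, event)
print(state.value)
```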
References
- A. Leon-Garcia and I. Widjaja, Communication Networks, Section 8.7.3
- RFC 1771 (can be obtained from www.ietf.org)
- Using BGP for inter-domain routing: http://www.cisco.com/univercd/cc/td/doc/cisintwk/ics/icsbgp4.htm
- BGP case studies: http://www.cisco.com/warp/public/459/bgptoc.html