Network Level Multihoming and BGP Challenges

Li Jia
Helsinki University of Technology
jili@cc.hut.fi

Abstract

Multihoming has traditionally been employed by enterprises and ISPs to improve network connectivity. Recently, there has been increasing interest in other benefits derived from multihoming. In particular, it can be applied to improve network performance, lower bandwidth costs, and optimize the way in which upstream links are used. Multihoming can be applied at the link layer, network layer, or transport layer of the network protocol stack. This paper first presents an overview of multihoming in the network layer. The focus is on the available deployments of multihoming, namely BGP and NAT. Second, a few challenges of BGP and corresponding proposals to solve them are listed. The aim is to put things in perspective, point out why the challenges are so difficult to solve at present, and summarize the main lessons learned.

KEYWORDS: Multihoming, BGP, NAT

1 Introduction

The current Internet is a decentralized collection of networks. Each of these networks is typically known as an autonomous system (AS). Usually, an AS is under a common routing policy and is managed by a single technical administration. When an AS has multiple connections to the Internet, it is referred to as multihomed. There are several motivations for maintaining multiple connections to the Internet:

Reliability: Unlike a network that has only one connection to the Internet, a multihomed network can continue to operate when one connection fails.

Bandwidth: Multihoming has the potential to aggregate bandwidth by providing multiple paths between source and destination pairs. Thus, it allows a network to support higher data transfer rates than are possible with a single path. Sometimes, a source might use a high-bandwidth but expensive link for its real-time traffic, and a cheaper link for the rest of its traffic.
In this case, multihoming can be used to improve network performance.

Independence: Economic, political, and administrative independence is becoming an increasingly common requirement for enterprises and institutions. Multihoming brings some degree of provider independence. It helps to obtain better service level agreements or lower prices.

Policy: Sometimes traffic is routed according to policies that go beyond technical considerations. For example, an academic institution might direct its commercial traffic to the provider offering global Internet connectivity, while directing its research traffic through a national research network.

This paper presents a survey of protocols and algorithms that have been proposed for multihoming in IPv4. The purpose is to provide a better understanding of multihoming technology and of current research in this area. The remainder of this paper is organized as follows: Sec. 2 presents popular solutions for deploying multihoming, including BGP and NAT. Sec. 3 then presents some challenges raised by BGP and a few existing proposals. Sec. 4 summarizes the paper.

2 Available deployment solutions of multihoming

A network can be classified as a multiattached network or a multihomed network depending on how many upstream Internet Service Providers (ISPs) it connects to (Fig. 1). A multiattached network connects to one ISP with multiple connections. By contrast, a multihomed network connects to more than one ISP[11]. In the figure, stub networks contain hosts that produce or consume IP packets. That is to say, stub networks do not carry IP packets that are neither produced by nor destined to their hosts.

Figure 1: Multihoming: a) Multiattached network; b) Multihomed network

Currently, there are two major solutions for deploying multihoming in a stub network: the Border Gateway Protocol (BGP) and the Network Address Translation (NAT) mechanism. This section introduces these two solutions and compares them.
2.1 BGP Multihoming

Routers in an AS can use interior gateway protocols, such as Intermediate System to Intermediate System (IS-IS) and Open Shortest Path First (OSPF), to exchange routing information inside the AS[2]. On the other hand, routers use an exterior gateway protocol to route packets between ASes. BGP is an inter-autonomous-system routing protocol[13]. It is used to exchange network reachability information with other ASes in TCP/IP networks. BGP chooses the best route based on the preference level and the AS hop count, favoring the shortest AS path. When reachability information is learned by an AS from the exterior, it is distributed within the AS so that every router in the AS can reach the routes advertised by the exterior. When reachability information is exchanged between two routers located in different ASes, the protocol is referred to as external BGP (eBGP). When reachability information is exchanged between routers inside the same AS, the protocol is referred to as internal BGP (iBGP). Next, details of address management, the routing process, and failure handling in multihomed networks are discussed.

To be multihomed using BGP, a stub network must have:

1. An address space of at least a /24, that is, a block identified by an address prefix of 24 bits or shorter.
2. An autonomous system number (ASN). Each AS must have a unique ASN.

Two schemes exist for allocating address space: provider independent (PI) addresses and provider assigned (PA) addresses. An organization that has demonstrated to a registry such as ARIN a requirement for more than a /21 can request a minimum of a /20 of IP address space directly from ARIN. This type of IP address space is known as PI address space. If the requirement for IP addresses is too small to qualify, IP subnets (also known as routes, prefixes, or net blocks) can be obtained from an upstream ISP. These subnets are commonly part of a larger block of address space that ARIN has assigned to the upstream ISP. This type of IP address prefix is known as PA space. Different issues arise when BGP multihoming is deployed with different address schemes.
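Whether a block behaves like PA or PI space from a given provider's point of view comes down to whether it nests inside that provider's own allocation. The following minimal sketch, using Python's standard ipaddress module, checks this together with requirement 1 above. The function names and the sample prefixes are illustrative assumptions; they are not part of any BGP implementation or registry policy.

```python
import ipaddress


def is_sub_block(customer: str, provider: str) -> bool:
    """True if the customer prefix lies entirely inside the provider block."""
    c = ipaddress.ip_network(customer, strict=False)
    p = ipaddress.ip_network(provider, strict=False)
    # subnet_of() raises TypeError when IP versions differ, so compare them first.
    return c.version == p.version and c.subnet_of(p)


def meets_minimum_size(prefix: str, longest_prefix: int = 24) -> bool:
    """True if the block is a /24 or larger, i.e. its prefix is 24 bits or shorter."""
    return ipaddress.ip_network(prefix, strict=False).prefixlen <= longest_prefix


if __name__ == "__main__":
    provider_block = "198.18.0.0/16"  # hypothetical provider allocation
    for prefix in ("198.18.32.0/24", "65.3.10.0/19", "198.18.32.0/25"):
        print(prefix,
              "| inside provider block:", is_sub_block(prefix, provider_block),
              "| large enough:", meets_minimum_size(prefix))
```

A prefix that passes is_sub_block for a provider can be covered by that provider's aggregate; one that fails must be carried as a separate routing table entry by that provider.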
PI addresses imply independence from a stub network's upstream providers. Because of this independence, routes with PI addresses cannot be aggregated by the upstream ISPs. This leads to routing table overhead.

When a network employs PA addresses, one address management mechanism is to use only one address block, assigned by one of its upstream ISPs, which is called the default address block. The other upstream ISPs maintain a specific routing table entry for the route associated with the default address block. This approach does not automatically maintain back-up routes. Another mechanism is to logically separate the whole stub network into several subnetworks, each of which inherits a separate address prefix from the upstream ISP closest to it[11]. The problem here is updating the routing table entries.

Non-aggregated routes can be advertised across multihomed networks with PI addresses, and aggregated routes can be advertised with PA addresses[11]. For instance, in Fig. 2, AS65500 is a non-multihomed network and AS901 is a multihomed network. The route announcement of AS65500, "198.18.32.0/24 65500", is aggregated by its upstream ISP, AS101, because 198.18.32.0/24 is a sub-block of 198.18.0.0/16. By contrast, the route announcement of AS901 cannot be aggregated by its upstream ISPs, AS101 and AS103, because AS901's prefix is not contained in the prefixes of AS101 and AS103.

In Fig. 3, AS101 assigns a PA address block "198.18.1.0/19" to AS901. For outgoing traffic, AS901 sends an announcement "198.18.1.0/19 901". AS101 can combine the route "198.18.1.0/19 901" with the one announced by AS65500 and then send an aggregated route announcement "198.18.0.0/16 101:901". By contrast, AS103 sends "198.18.1.0/19 103:901" because it cannot aggregate the route announcement sent by AS901. For incoming traffic, routers forward packets along the most specific route, in accordance with BGP. The most specific route is the one that covers the smaller address range.
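The most-specific-route rule can be sketched with Python's standard ipaddress module. This is a minimal illustration, not router code: the function name, the route list layout, and the sample destination addresses are assumptions, and the two routes are the announcements quoted for Fig. 3. Because "198.18.1.0/19" has host bits set, the sketch parses prefixes with strict=False, which normalizes it to 198.18.0.0/19.

```python
import ipaddress

# Announcements quoted in the text for Fig. 3: (prefix, AS path).
FIG3_ROUTES = [
    ("198.18.0.0/16", (101, 901)),  # aggregate sent by AS101
    ("198.18.1.0/19", (103, 901)),  # non-aggregated route sent by AS103
]


def most_specific_route(destination, routes):
    """Return the (network, as_path) pair with the longest matching prefix.

    Returns None when no route covers the destination.
    """
    addr = ipaddress.ip_address(destination)
    matching = []
    for prefix, as_path in routes:
        net = ipaddress.ip_network(prefix, strict=False)
        if addr in net:
            matching.append((net, as_path))
    if not matching:
        return None
    # A longer prefix covers a smaller address range, so it is more specific.
    return max(matching, key=lambda item: item[0].prefixlen)


if __name__ == "__main__":
    print(most_specific_route("198.18.5.7", FIG3_ROUTES))
    # Simulate the loss of the AS103 route: only the aggregate remains.
    print(most_specific_route("198.18.5.7", FIG3_ROUTES[:1]))
```

For a destination inside the /19, the route via AS103 is chosen; once that route is removed, the same lookup falls back to the /16 aggregate via AS101.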
In Fig. 3, the most specific route is "198.18.1.0/19 103:901". Therefore, AS901 receives all packets via AS103 unless the links between AS901 and AS103 are unavailable. In that case, the links between AS901 and AS101 are used to forward traffic.

Figure 2: Routing process of BGP using PI address

Figure 3: Routing process of BGP using PA address: adopt one address block

In Fig. 4, AS901 divides itself into two sub-networks and obtains the address blocks 198.18.1.0/19 and 65.3.10.0/19 from the corresponding upstream ISPs. Accordingly, the traffic for these two sub-networks is aggregated by the upstream ISPs, AS101 and AS103, respectively. The two sub-networks are treated by the upstream ASes as separate networks. That is to say, an upstream AS only accepts outgoing traffic with a prefix that has been advertised to it by the stub network. In this example, AS101 only accepts traffic coming from 198.18.1.0/19. The problem is that if the connection between AS101 and the sub-network fails, the hosts in the sub-network become unreachable via interdomain routing.

Figure 4: Routing process of BGP using PA address: multiple sub-networks

RFC2260[1] suggests two methods to handle failures for BGP multihoming with multiple PA-address prefixes. The first method is based on the advertisement mechanism of the eBGP border router. In steady state, the eBGP border router advertises the reachability of an address prefix only to the upstream ISP that assigned the prefix. If the connection to this ISP is down, the eBGP border router advertises the reachability to the other upstream ISPs.

The second method for failure handling is packet encapsulation. The eBGP router of a stub network can also exchange information with provider eBGP routers that are connected to the stub network but are not directly connected to that router. For example, assume that eBGP routers A and B are in AS100, and that eBGP router C belongs to another AS. B is directly connected to C, but A is not. When a failure happens between B and C, C encapsulates all the packets that should be sent to B with the IP prefix of A. Then, C sends the encapsulated packets to A via other connections of AS100. After that, A decapsulates the received packets and routes them to the hosts inside AS100 (see [1] for details).

In addition to the two methods mentioned above, a third choice is to install the routes of both the primary connections and the back-up connections in the BGP routing tables. The routes of the back-up connections are made longer by repeatedly prepending the AS number to the route. When the primary route is down, the back-up routes can be used since they are already available in the BGP routing tables.

2.2 NAT multihoming

The basic function of NAT is to translate between public Internet addresses and internal local network addresses. It can be extended to implement multihoming[9].
Small networks that cannot be multihomed with BGP can be multihomed with the help of NAT. In this case, the hosts in a NAT-multihomed stub network share the network addresses. NAT can map the address blocks assigned by each upstream ISP to the internal address space of the network. The mapping is kept in a NAT router. When an IP packet leaves the network, the NAT router translates the private IP address into a public address, which might belong to any of the ISPs. In this way, the network can be multihomed to several network service providers.

If a NAT-multihomed network does not adopt BGP and is not involved in the inter-AS routing process, the NAT router can handle failures with a pre-set traffic mapping mechanism. However, traffic loss might occur because the mapping cannot be updated automatically after a failure. A second method is to use a DNS server. In this scenario, a host in the NAT network is bound to multiple IP addresses. If one ISP is unavailable, the IP address from another ISP is returned and traffic continues to flow. This method can reduce traffic loss but cannot avoid it.

2.3 Comparison of BGP and NAT

BGP and NAT multihoming differ in at least three aspects:

As the standard Internet inter-domain protocol, BGP provides the broadest support for upper-level applications. By contrast, NAT does not guarantee the uniqueness of IP addresses and does not support all upper-level applications.

NAT multihoming avoids the non-aggregation problem because in most cases the address blocks in a NAT network are assigned by an upstream ISP. This problem may exist in BGP multihoming.

BGP is mainly used by large organizations. NAT is usually recommended for small organizations that are not involved in route control.

3 Challenges associated with BGP

The Internet has expanded greatly in the past few years. First, the number of ASes has increased enormously.
Second, the number and diversity of applications supported on the Internet have increased rapidly as well. This trend has placed pressure on BGP. As BGP provides the information for controlling traffic between ASes, it plays a critical role in Internet efficiency, reliability, and security. However, BGP suffers from several vulnerabilities. This section analyzes the significant challenges faced by researchers in the BGP area today.

3.1 Scalability

Each AS is allowed to choose its own administrative policy to decide the best route. When inter-AS routing takes place, each AS advertises the routing information in its BGP routing table to other ASes. An AS route announcement includes an IP prefix and a series of AS numbers. As mentioned earlier, the number of ASes has increased dramatically, which contributes to routing table overhead. Another main reason for the recent growth is that most stub ASes have chosen to increase their connectivity to the Internet for both resilience and load balancing.

To explain how multihoming affects BGP routing tables, consider the example in Fig. 5. Assume that AS901 aims to achieve load balancing by originating two IP prefixes obtained from its upstream ASes. To balance its inbound traffic, it advertises its prefixes so that:

Traffic targeting 65.3.10.0/19 is primarily delivered through AS103, with AS101 used as a backup path.

Traffic targeting 198.18.1.0/19 is primarily delivered through AS101, with AS103 used as a backup path.

Figure 5: Growth of BGP routing tables: lack of aggregation and load balancing

AS901 prepends its own AS number in its BGP advertisements with the aim of identifying the specific prefixes. As mentioned earlier, the more specific prefix is treated as the best route when the upstream ASes select routes. In this figure, AS101 and AS103 are configured differently. AS101 propagates the two BGP advertisements. AS103 sends an aggregate advertisement for 198.18.0.0/16, since it includes 198.18.1.0/19. As shown in Fig. 5, even though AS901 originates only two prefixes, AS198 receives four routes for three different prefixes. Thus, the size of the BGP routing table at AS198 increases, since it receives more than one route for the same prefix. Despite the prepending operation, all traffic from AS901 toward AS198 will be routed via AS101, because:

The shortest path for 65.3.10.0/19 is via AS101. Therefore, the traffic for 65.3.10.0/19 is sent via AS101.

A BGP router always prefers the more specific prefix when forwarding traffic. In this figure, 198.18.1.0/19 is more specific than 198.18.0.0/16.

In this case, AS103 will stop aggregating AS901's prefix. This non-aggregation causes AS103 to advertise two prefixes to AS198.

To reduce the routing table size, different routes with common characteristics can be aggregated into a single route. However, a multihomed network inherits multiple IP prefixes from different upstream ASes, and thus its prefixes cannot be aggregated by all of the ASes. In this example, prefix 198.18.1.0/19 belongs to AS103, so this prefix cannot be aggregated by another ISP (AS101). Another reason for non-aggregation is that an AS may have to announce several prefixes due to address fragmentation, load balancing, and failure to aggregate[4].
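The interaction between AS number prepending and path selection described in this section can be sketched as follows. This is a deliberately simplified model that uses only the two criteria named in Sec. 2.1 (preference level, then AS hop count); the real BGP decision process has further tie-breaking steps. The Route class, the helper names, and the default preference of 100 are assumptions made for illustration, and the AS paths are hypothetical rather than a reproduction of Fig. 5.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    local_pref: int = 100  # assumed default; real values are set by local policy


def originate(prefix: str, origin_as: int, extra_prepends: int = 0) -> Route:
    """Route as announced by its origin AS, optionally with its ASN repeated."""
    return Route(prefix, (origin_as,) * (1 + extra_prepends))


def propagate(route: Route, via_as: int) -> Route:
    """Route as re-advertised by a transit AS, which puts its own ASN in front."""
    return Route(route.prefix, (via_as,) + route.as_path)


def best_route(routes: Iterable[Route]) -> Route:
    """Simplified decision: highest local preference, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


if __name__ == "__main__":
    prefix = "65.3.10.0/19"
    # Primary announcement via AS103, unmodified.
    primary = propagate(originate(prefix, 901), 103)
    # Backup announcement via AS101, with AS901 prepended two extra times.
    backup = propagate(originate(prefix, 901, extra_prepends=2), 101)
    print("primary:", primary.as_path)
    print("backup: ", backup.as_path)
    print("selected:", best_route([backup, primary]).as_path)
```

In this sketch the prepended backup path loses only because it is longer. A receiving AS that assigns a higher local preference to the backup route would still select it, which is one reason prepending gives the origin AS only limited control over inbound traffic.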
The example in Fig. 5 illustrates the non-aggregation caused by load balancing. Most ISPs filter advertisements of long prefixes to cope with the routing table problem. For example, some ISPs do not allow prefixes longer than /22 to be advertised to the global Internet. However, this strategy does not tackle the root of the problem but just works around it. Some efforts are being made to deal with this problem in IPv6. For IPv4, the problem is largely unsolved. SIMPLER[7] forces address prefix aggregation over the whole network, but it uses NAT to assign multiple addresses.

3.2 Lack of Multipath Routing

A BGP router can receive multiple advertisements for the same route from multiple upstream routers. For instance, in Fig. 5, the router in AS198 received two advertisements for the prefix 65.3.0.0/19. Thus, the router needs to run its BGP decision process to select the best path. BGP selects only one best path. Accordingly, a BGP router advertises to its peers only the best route to any given destination. This behavior causes at least two limitations. First, a single best route conflicts with the concept of load balancing. In response, some vendors support multipath extensions in their BGP implementations. Second, given that a BGP router only advertises the best route, many alternative paths that could potentially have been used remain unknown[10]. This introduces problems for the current interdomain routing paradigm from the end-to-end quality of service (QoS) and traffic engineering (TE) viewpoints[3]. Efforts have been made to allow a BGP router to advertise multiple routes for the same destination to its peers. However, this mechanism makes the existing problems of BGP multihoming more difficult to tackle. For example, multipath increases the size of the routing table dramatically, which further aggravates the scalability issue.

3.3 Slow convergence

Two BGP routers have to establish a BGP session to exchange reachability information.
This session is supported by a TCP connection through which the routers exchange different types of messages:

OPEN: opens a BGP session.
UPDATE: transfers reachability information.
NOTIFICATION: reports a detected error. The BGP session is shut down after this message is sent.
KEEPALIVE: verifies that the peer is reachable.

The OPEN message helps to determine whether the BGP session is an iBGP or an eBGP session. When a session starts, each peer advertises its entire set of routes. After that, only incremental updates and KEEPALIVE messages are exchanged.

Convergence time, the time required to reroute packets when a failure happens, is an important performance metric for a routing protocol. A study[8] shows that BGP converges rather slowly. One important reason is that a single link failure can force BGP routers to exchange a large number of advertisements while exploring alternative paths toward the affected destinations. This problem is referred to as path exploration. Routers may exchange several advertisements concerning the same prefix in the process of BGP convergence. To limit this problem, most BGP routers use a timer called the minimum route advertisement interval. The default value of this timer is 30 seconds. It prevents a BGP router from sending a new advertisement for the same prefix within 30 seconds. In this way, the number of BGP advertisements is reduced. However, it introduces another problem: delay. In some cases, important BGP advertisements are unnecessarily delayed, which has a significant influence on network performance.

Some new proposals have been brought up to solve this problem. For example, BGP-RCN[12] reduces the number of BGP messages exchanged during convergence by adding an identifier to each BGP message. This identifier indicates the root cause of the BGP message. When a failure happens, distant routers can avoid selecting a path that is affected by the failure. However, this additional information is not built into BGP advertisements and works against the scalability of BGP. Another solution is ghost flushing[6]. This method improves convergence by distributing messages that carry bad news quickly, while good news is distributed slowly. However, it only tries to speed up the convergence of BGP instead of tackling the root of the problem, i.e., path exploration.

3.4 Lack of QoS Support

Most studies of QoS have been based on non-multihomed networks. BGP does not have built-in QoS capabilities since it was designed to exchange reachability information. Some applications, such as VoIP, require strong QoS across domains[5]. New proposals have been put forward in recent years, but none has been appealing enough to be deployed in practice. One reason is that ISPs prefer to over-provision their networks to manage QoS. More issues have to be considered before ISPs decide to use a QoS management mechanism. Such considerations include the monetary cost of deploying and maintaining QoS and the possible new businesses that might yield tangible profit for ISPs. From the technical side, all the proposals concerning QoS have strong limitations at the interdomain level.

3.5 Optimizing route selection

Route optimization refers to distributing traffic among a stub network's multiple connections to the Internet. Two aspects must be considered in order to select an optimal route.
First, the most qualified upstream provider must be chosen. Second, the traffic should be distributed among the multiple connections, which is the load-balancing problem. Selection for inbound traffic is difficult. Mechanisms that implement load balancing for outbound traffic are available, but no mechanism is available to implement load balancing for inbound traffic without NAT. One limitation of NAT is that it does not support non-client/server applications, since it was initially designed in the context of a client/server environment.

4 Conclusion

Multihoming can help enterprises meet their Internet performance, reliability, and redundancy goals. It also helps to reduce dependency on a single provider, giving them considerably greater opportunities for bandwidth cost control and contract flexibility. Despite its promising role in future Internet connectivity solutions, multihoming still has many unsolved challenges.

The purpose of this paper is to review the deployment solutions available for multihoming and to discuss the challenges faced by BGP multihoming. As an important interdomain routing protocol, BGP has several limitations. These limitations have become more and more noticeable in the last few years due to the explosive growth of the network. Current research concentrates on scalability, route selection, convergence, QoS, etc. In addition to the technical factors, the routing management and policies of different ISPs also contribute to these problems. Usually, ISPs are reluctant to introduce changes if there is no promising source of revenue. This makes the existing problems associated with BGP more difficult to tackle.

References

[1] T. Bates and Y. Rekhter. Scalable Support for Multihomed Multi-provider Connectivity. RFC 2260, 1998.
[2] S. H. Cisco Systems. BGP4 Case Studies/Tutorial. 1995.
[3] B. H. et al. Distance Metrics in the Internet. 2002.
[4] GLOBECOM. On Characterizing BGP Routing Table Growth, January 2002.
[5] IEEE. Challenges in Enabling Interprovider Service Quality in the Internet, June 2005.
[6] IEEE INFOCOM. Improved BGP Convergence via Ghost Flushing.
[7] IEEE INFOCOM. Practical Routing-Layer Support for Scalable Multihoming, 2005.
[8] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian. Delayed Internet Routing Convergence. 9(3):293-306, June 2001.
[9] P. Morrissey. Route Optimizers: Mapping Out the Best Route. December 2003. http://www.networkcomputing.com/showitem.jhtml?docid=1425f2.
[10] Network, IEEE. Open Issues in Interdomain Routing: a Survey, November 2005.
[11] Network, IEEE. A Survey of Multihoming Technology in Stub Networks: Current Research and Open Issues, May 2007.
[12] D. Pei, M. Azuma, N. Nguyen, J. Chen, D. Massey, and L. Zhang. BGP-RCN: Improving BGP Convergence Through Root Cause Notification. Technical report, UCLA Computer Science Department, 2003.
[13] Y. Rekhter, T. Li, and S. Hares. A Border Gateway Protocol 4 (BGP-4). Technical report, The Internet Engineering Task Force, January 2006.