Measuring I-BGP Updates and Their Impact on Traffic




SPRINT ATL TECHNICAL REPORT TR2-ATL-5199

Chen-Nee Chuah, Supratik Bhattacharyya, and Christophe Diot

Abstract

Snapshots of BGP tables and updates have been studied in the past to understand convergence time and global routing stability at the protocol level. However, very little has been done to carefully analyze the causes behind these BGP updates and their impact on traffic. We bridge the gap by presenting a systematic approach for correlating Internal BGP (I-BGP) updates with packet traces in a large-scale operational backbone. The I-BGP updates were collected from a route reflector in the backbone, while the packet traces were collected from several OC-12 links at a Point-of-Presence (POP). We observe that continuous I-BGP noise (1-25 updates/minute) is present at all times, but there are periods of high churn affecting a significant portion of the BGP table. We focus on the subset of updates that can potentially affect the traffic trajectory (e.g., Nexthop changes), and find that during churn periods, as few as 6% of these updates affect network prefixes that carry as much as 8% of the traffic. Preliminary analysis shows no discernible correlation between I-BGP churn and packet loss/reordering. We also discuss our initial effort to identify the possible sources of I-BGP updates, including changes in network reachability and internal versus external BGP policies.

I. INTRODUCTION

The Border Gateway Protocol (BGP) [1] is the de facto inter-domain routing protocol used to exchange reachability information between thousands of autonomous systems (ASes). Many studies have examined BGP routing properties in terms of routing table growth [2] and protocol dynamics such as the convergence time after routing changes [3], [4]. In recent years, research has also begun to focus on understanding global routing instability [5], [6].
Towards this end, snapshots of BGP routing tables and real-time route update messages have been collected from External BGP (E-BGP) peering sessions at various Internet exchange points. Prominent efforts in this area include Oregon Route Views [7], RIPE-NCC [8] and Sprint IPMON [9]. Previous work on BGP churns [6] has identified periods of high instability in the global routing system, which manifest themselves in very high volumes of BGP routing updates and affect most of the prefixes in Internet-wide BGP tables. It is widely held that such BGP storms, coupled with the slow convergence behavior of BGP (around tens of minutes), can cause severe rippling instability across large portions of the Internet. These BGP storms may lead to loss of reachability and routing loops, resulting in increased packet loss, reordering and delays. Another concern is that high volumes of BGP updates may overload router CPUs and cause router melt-downs, thereby leading to outages and disruptions in traffic flow.

However, little has been done to verify or quantify how BGP updates really impact Internet traffic. We attempt to bridge this gap by studying how BGP updates impact traffic forwarding in a large operational backbone. Our interest lies in examining the day-to-day behavior and impact of these updates, and not just exceptional BGP storm events as reported in [6].

Our contribution is three-fold. First, we perform a systematic analysis of I-BGP updates collected from a BGP route reflector in our backbone over a number of months. We observe that there is continuous BGP noise (around 5-2 updates/minute) interspersed with high churn periods (9 updates/minute). By studying the path attribute changes reported in these updates, we are able to differentiate between updates that have different effects on the network, e.g., those that affect traffic trajectory within the backbone, those that cause loss of reachability, etc.
In the second part of our work, we examine the effect of I-BGP churn on traffic entering our backbone. Our traffic data consists of day-long packet traces collected from a number of OC-3 and OC-12 links at a single Point-of-Presence. We find that a very small subset of the I-BGP updates (less than 1%) are for network prefixes that carry as much as 8% of the traffic. On further examination, we find no discernible correlation between the volume of BGP updates during a given interval and packet loss/reordering during that same interval. We conclude that the impact of I-BGP churns in our backbone is mostly limited to the control plane (router CPU overload, protocol overhead, etc.) and does not disrupt packet forwarding in a significant way.

Finally, we present some initial results on inferring the cause of the I-BGP updates that we observe. Our main goal is to distinguish between two types of updates: ones triggered by external (i.e., outside our backbone) events such as BGP peering failure, policy change or traffic engineering in a remote AS, and ones triggered by internal events such as internal policy changes, internal routing changes, etc.

The rest of the paper is structured as follows. Section II describes the methodology we use to collect and analyze the routing data and traffic traces. Sections III and IV summarize our observations. In Section V, we describe our initial effort to infer the possible causes of BGP updates. Section VI concludes our work and discusses future directions.

II. DATA COLLECTION AND ANALYSIS

In this section, we describe how we collect and analyze routing updates and traffic data.

A. Routing Data

We use the Python Routeing Toolkit (PyRT) (footnote 1) to collect the I-BGP messages from the backbone. PyRT includes a BGP listener that establishes a peering session with a BGP router and receives updates from it. The listener is passive because it does not send any updates to its peer. In this work, the PyRT listener was installed on a Linux PC in one of the backbone POPs to collect updates from a BGP route reflector (RR) in the backbone. Our listener appears as a route-reflector client to this particular router. Each update received is prepended with a header in MRTD (footnote 2) format (extended to include a time-stamp of micro-second granularity) and then dumped to a file. The results reported in this paper are based on continuous data collected between November 2001 and April 2002. For the sake of comparison, we used separate instantiations of the PyRT listener to also collect External BGP (E-BGP) updates from two other backbone routers in our network during the same period.

In contrast to previous work that uses data from E-BGP sessions, we focus on analyzing I-BGP data in this work for two reasons. First, our goal is to assess the effect of route announcements on the traffic in our own AS and not global routing behavior.
In our AS, I-BGP is the protocol used to disseminate inter-domain routing information learned from external peers to all the routers in a hierarchical fashion with route reflectors [11]. Second, from our experience of monitoring both I-BGP and E-BGP sessions, we found that the former tends to report two or three times more updates, because a subset of route announcements coming from providers, peers and customers are filtered out or aggregated before they are exported to a neighboring AS via an E-BGP session. In general, I-BGP is richer than E-BGP in terms of the types of path attributes seen, and reveals more about internal network policy. To our knowledge, this is the first study of its kind that examines I-BGP within an operational network, even though E-BGP updates (e.g., from Routeviews [7]) have been analyzed before.

Footnote 1 (PyRT): http://www.sprintlabs.com/department/IP-Interworking/Routing/PyRT/
Footnote 2 (MRTD): http://www.mrtd.net/mrt_doc/

B. Traffic Traces

To study the effect of I-BGP updates on packet forwarding across our AS, we consider traffic entering our backbone at the POP where we collect our I-BGP data. The packet traces examined in this paper were collected as part of the Sprint IPMON [9] project. We consider six traces from three OC-12 access links collected on two separate days: November 8, 2001 and February 3, 2002. Each trace contains 1-15 hours of data, consisting of the first 44 bytes of every packet, time-stamped with a GPS clock.

Packet loss and reordering were determined from the packet traces in the following manner. Since we do not have end-to-end flow measurements, we rely on hints such as TCP sequence numbers to examine individual flow performance. Because TCP flows consistently constitute 95-97% of the traffic across all six traces, this approach still gives a representative picture of performance. We identify the occurrences of out-of-sequence (OOS) packets for TCP flows by locating gaps in the sequence numbers of a particular flow.
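The out-of-sequence detection described above can be illustrated with a minimal sketch. It is a simplified heuristic, not the method of the report: it assumes each packet has already been reduced to a hypothetical (flow_id, seq, payload_len) tuple, ignores sequence-number wraparound and SYN/FIN flags, and flags a packet as out of sequence when its sequence number falls below the highest byte already seen for its flow.

```python
from collections import defaultdict


def count_out_of_sequence(packets):
    """Count out-of-sequence (OOS) TCP packets per flow.

    `packets` is an iterable of (flow_id, seq, payload_len) tuples in
    capture order. This tuple layout is an assumption for illustration.
    Returns {flow_id: (oos_count, total_count)}.
    """
    next_expected = {}          # flow_id -> next in-order sequence number
    oos = defaultdict(int)      # flow_id -> number of OOS packets
    total = defaultdict(int)    # flow_id -> number of packets seen

    for flow_id, seq, payload_len in packets:
        total[flow_id] += 1
        expected = next_expected.get(flow_id)
        if expected is not None and seq < expected:
            # The packet starts before data we already saw: either a
            # retransmission or a reordered packet filling a gap.
            oos[flow_id] += 1
        else:
            # In-order packet (or one that opens a gap): advance the mark.
            next_expected[flow_id] = seq + payload_len

    return {flow: (oos[flow], total[flow]) for flow in total}
```

Dividing the OOS count by the total count for a flow gives the kind of per-flow upper bound on loss that the text refers to, under the simplifications stated above.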
Observations of OOS events can be attributed either to losses and retransmissions or to packet reordering. The frequency of OOS events gives us an upper bound on the loss rate seen by an individual flow. We compute the overall loss rate across time, and correlate that with the I-BGP update rate [10].

III. ANALYSIS OF I-BGP UPDATES

We begin with some general observations about I-BGP updates. Figure 1a shows the volume of I-BGP updates (aggregated over 1-minute intervals) across a representative week, January 13-20, 2002. A relatively low number of I-BGP updates (around 1-25/minute) is present at all times, but there are periods of high churn. A typical routing table has around 13K entries. During the normal noisy period, as few as 2.2% of these prefixes are affected, but during a high churn period more than a quarter of the routing table is affected (Figure 2). There is no general time-of-day trend such as the afternoon peaks reported in [5]. However, analysis of the I-BGP data from November 2001 to April 2002 shows a strong weekly trend: temporally localized spikes around midnight (e.g., Sunday and Thursday) that coincide with the routine maintenance schedule of several network operators. Figure 1a also shows a non-periodic feature: an order of magnitude higher BGP churn rate (up to 9/minute) on January 16 (Wednesday).

[Fig. 1. I-BGP vs. E-BGP updates over a representative week: January 13-20, 2002 (PST). Panels: (a) I-BGP session, (b) E-BGP session. Axes: day of week (Sun to Sun) vs. churns/minute.]

[Fig. 2. Portion of routing table responsible for churns. Axes: percentage of routing table vs. contribution to churns (%). Curves: BGP background noise (2.2%) and BGP high churn period (25.8%).]

For comparison, we also plot the E-BGP updates during the same period (Figure 1b). On average, I-BGP (mean=215/minute) is twice as noisy as E-BGP (mean=133/minute). This is not surprising, since common practice dictates that routes learned from a provider or peer are not re-advertised to other peers or providers. In addition, more specific customer routes that belong to the same address space are aggregated into a single route announcement to external BGP peers. We also notice that several spikes (e.g., mid Monday, Wednesday and Friday morning) observed in the I-BGP session are absent in the E-BGP counterpart. This implies that the instability is due either to non-transit routes learned from a peer or provider, or to IGP changes that only affect the intra-domain paths to different exit points (i.e., Nexthop changes). We need to look at the associated path attributes to differentiate the effect on traffic in these cases (Section V).

There are three ways in which I-BGP updates can potentially affect traffic:
1. A high volume of updates (churn) may result in extra load on router CPU/memory, thereby disrupting packet forwarding.
2. The route to a specific network prefix may be withdrawn, resulting in loss of reachability.
3. The traffic trajectory across the network may change due to changes in the Nexthop (i.e., the exit point announced for a given network prefix).
The first and second cases may cause packet loss and increased delays, while the third case may also result in packet reordering. The extent to which router CPU/memory overload disrupts packet forwarding can be understood only through a deeper analysis of CPU load levels and memory usage statistics. We do not address this issue in this paper. Instead, we focus on the second and third cases above. In order to do so, we classify the updates as follows:

Explicit withdrawal: This results in loss of reachability to a specific sub-net and may lead to packet losses.

Nexthop change: The Nexthop attribute is usually the loop-back address of the last BGP router (egress point) of a local AS, or the adjacent E-BGP peer that announces the path. When the Nexthop of a prefix changes, traffic entering our network may follow a different path to a new exit point, giving rise to variations in link loads and potential packet reordering or losses.

Others: Updates that are not of the above two types. Note that changes in ASPath (footnote 3) may cause changes in traffic trajectory across inter-domain paths, over which we have no control. We will address the implications of ASPath and other path attributes in Section V, where we analyze the causes behind I-BGP updates.

We classify the I-BGP updates across four weeks in January 2002, and find that 2-45% of the updates carry Nexthop changes, and 35-63% involve explicit route withdrawals. Among the explicitly withdrawn routes, we found that 86.7% are withdrawn for less than 1 hour, but a small number of routes (1.4%) remain absent for longer than a day. The average duration of unreachability for a network prefix is 3 minutes. In one instance, a prefix is absent from the routing table for 136 hours. To evaluate how these updates impact traffic forwarding and performance in our network, we need to determine whether these updates affect destination prefixes that carry the majority of the traffic.
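The three update classes above can be sketched in a few lines. This is an illustrative sketch rather than the authors' tooling: it assumes each update has already been reduced to a hypothetical (prefix, nexthop) pair, with a nexthop of None standing for an explicit withdrawal, and it labels a re-announcement that follows a withdrawal as "other" because no previous Nexthop is kept to compare against.

```python
def classify_updates(updates):
    """Label updates as 'withdrawal', 'nexthop_change', or 'other'.

    `updates` is an iterable of (prefix, nexthop) pairs in arrival order;
    nexthop is None for an explicit withdrawal. The input format and the
    label names are assumptions made for this sketch.
    """
    current = {}   # prefix -> Nexthop of the route currently installed
    labels = []

    for prefix, nexthop in updates:
        if nexthop is None:
            # Explicit withdrawal: the prefix loses its route.
            labels.append("withdrawal")
            current.pop(prefix, None)
        elif prefix in current and current[prefix] != nexthop:
            # Same prefix, different exit point.
            labels.append("nexthop_change")
            current[prefix] = nexthop
        else:
            # First announcement, re-announcement after a withdrawal,
            # or a change in some attribute other than the Nexthop.
            labels.append("other")
            current[prefix] = nexthop

    return labels
```

Counting the labels over a time window gives a breakdown of the kind reported in the text; a finer split of the "other" class would need the remaining path attributes (ASPath, MED, LocPref), which this sketch does not model.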
We address this issue in Sections IV-A and IV-C.

So far, we have discussed the general patterns of I-BGP churns and identified a subset of updates (Nexthop changes and explicit withdrawals) that can potentially affect traffic trajectory or network reachability. We will shed some light on the possible causes behind different types of updates in Section V.

Footnote 3: The ASPath attribute contains the list of ASes traversed by the route to a destination prefix.

IV. IMPACT OF I-BGP CHURNS ON TRAFFIC

In this section, we examine what fraction of the I-BGP churns actually affect the traffic coming into our AS. From our trace analysis, we have observed that the majority of the traffic is routed to a relatively small number of destination prefixes, around 2% of the 13K routing entries in the global BGP table. Hence, we focus on analyzing the effect of Nexthop changes and explicit withdrawals on these prefixes. To do so, we first classify all input traffic by BGP prefix, and identify the elephant prefixes whose aggregate contribution accounts for 8% of the traffic volume. For example, 2168 such elephant prefixes are identified using this method on the November 8, 2001 trace. Then, we extract the I-BGP updates that advertise Nexthop changes for these elephant prefixes. We have examined a total of six traces, but only present results from two representative traces.

A. Traffic Re-routing

Figure 3 compares the number of raw I-BGP updates and the subset that affect elephant prefixes (sampled at 1-minute intervals) across 6 hours on November 8, 2001. At most, only 6% of the total updates affect the elephant prefixes carrying as much as 8% of the traffic. Another observation is that a huge BGP spike does not always lead to traffic re-routing. Among the elephant prefixes, only 5% are volatile, i.e., see a new BGP route announcement during the entire day of November 8, 2001. We order the prefixes according to the total number of updates they see, and plot the total raw updates and Nexthop changes for each prefix in Figure 4. The area between the solid line and the bars shows the amount of BGP updates that have no impact on the forwarding path of the traffic.

[Fig. 3. Subset of I-BGP routes that contain Nexthop changes for elephant prefixes, November 8, 2001 (PST). Panels: (a) total observed churn rate (BGP churn/minute), (b) updates affecting elephant prefixes (Nexthop change/minute).]

[Fig. 4. Nexthop changes for volatile elephant prefixes on November 8, 2001. Axes: index of elephant prefix with decreasing churn rate vs. BGP updates and Nexthop changes.]

B. Traffic Performance

One question we would like to address is the speculation that high I-BGP churn leads to a router meltdown and impairs the router's ability to forward traffic. From the traffic traces, we determine the upper bound for the packet loss experienced by TCP flows, as described in Section II-B. In Figure 5, we plot the total churns seen by the router and the packet loss/reordering seen by the traffic coming from a peer and a customer during a 1-hour monitoring period on November 8, 2001. There is no apparent correlation between the sudden surge of BGP updates and the packet loss/reordering. Similar results are found from analyzing data on February 3, 2002 (Figure 6). The cross-correlation between the BGP churn rate and this loss bound is not significant: between .6 and .8.

C. Loss of Reachability

When a route is explicitly withdrawn, the connectivity to a network prefix is lost and the associated flows can potentially be black-holed due to unresolved routes or routing loops. On November 8, 2001, 13 of the elephant prefixes are explicitly withdrawn at some point. We examine how long these prefixes are missing from the routing table, and plot the distribution in Figure 7. The maximum and mean durations of unreachability are 91 and 2 minutes, respectively. On the same figure, we also plot the distribution for all the prefixes that are withdrawn at least once. One of the prefixes is unreachable for almost 2 hours, but it carries a negligible amount of traffic. The mean duration averaged over all the affected prefixes is 124 minutes.
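The unreachability durations discussed above can be derived from a time-ordered stream of withdrawals and announcements. The following is a minimal sketch under stated assumptions: events are hypothetical (timestamp_seconds, prefix, kind) tuples, with kind "W" for an explicit withdrawal and "A" for an announcement, and prefixes that are never re-announced within the observation window are simply left out.

```python
def unreachable_durations(events):
    """Compute how long each prefix stays withdrawn.

    `events` is an iterable of (timestamp_seconds, prefix, kind) tuples in
    time order, where kind is 'W' (withdrawal) or 'A' (announcement).
    The tuple layout and the kind codes are assumptions for illustration.
    Returns {prefix: [duration_seconds, ...]}, one entry per completed
    withdraw/re-announce cycle.
    """
    withdrawn_at = {}   # prefix -> time of the first pending withdrawal
    durations = {}

    for timestamp, prefix, kind in events:
        if kind == "W":
            # Keep the earliest withdrawal if several arrive in a row.
            withdrawn_at.setdefault(prefix, timestamp)
        elif kind == "A" and prefix in withdrawn_at:
            gap = timestamp - withdrawn_at.pop(prefix)
            durations.setdefault(prefix, []).append(gap)

    return durations
```

Restricting the input to the elephant prefixes, or leaving it unrestricted, would give the two populations whose distributions the text compares.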
Further analysis shows that the 13 elephant prefixes carry 2.5% of the total traffic, but contribute 5.8% of the total out-of-sequence packets observed. The loss rate suffered by these elephant flows is 3.4%, which is significantly higher than the average seen by other traffic (1.43%).

[Fig. 5. I-BGP churns and packet loss, November 8, 2001 (PST). Panels: BGP churn/minute (mean = 29/minute) and packet loss/reordering (mean = 1.3%).]

[Fig. 6. I-BGP churns and packet loss, February 3, 2002 (PST). Panels: BGP churn/minute (mean = 158/minute) and packet loss/reordering (mean = .74%).]

[Fig. 7. Duration of unreachability (November 8, 2001). Axes: duration (hours) vs. CDF. Curves: all prefixes explicitly withdrawn; elephant prefixes explicitly withdrawn.]

D. Summary

Our results show that most of the I-BGP updates in our backbone only induce additional load on the control plane (e.g., router CPU/memory load, protocol convergence), but have minimal impact on traffic forwarding. However, our conclusions are drawn from analyzing the typical day-to-day I-BGP behavior observed in an ISP, which does not include global routing instability events such as the BGP storms during the Nimda worm attack [6]. In that specific scenario, a few big networks start flapping and generate high volumes of BGP updates, which eventually lead to router melt-downs. We would expect to see traffic disruption during such a global BGP storm.

V. INFERRING CAUSES OF I-BGP UPDATES

TABLE I. CLASSIFICATION OF I-BGP UPDATES.

Event                      Nexthop change   Nexthop & ASPath   Withdrawals
Background noise           34.1%            21.2%              25.6%
January 13, 2002, Peak     21.8%            7.5%               18.3%
January 16, 2002, Peak     31.2%            15.3%              21.2%

We need a good understanding of the route selection process to infer the causes of specific route changes. Router vendors adhere to a de facto standard [12] to select the best path, which depends on various path attribute values (e.g., Local Preference, AS Path, MED), BGP import/export policies and IGP metrics.
Hence, analyzing the changes in these attributes (besides Nexthop or ASPath changes) provides clues as to what may have caused the updates in the first place. For example, changes in LocPref are due to local BGP policies, while MED values reflect load balancing practiced by neighboring ASes.

Our main goal is to distinguish between two types of updates: those triggered by external events and those triggered by internal events. Internal events include changes in intra-domain topology (link or node failures), IGP metrics (IS-IS/OSPF weights) and BGP import/export rules. The first two usually lead to a Nexthop change, while a policy change is reflected in LocPref or community attribute values. External events include changes in inter-domain topology (failures in peering links), BGP policies of external ASes, and traffic engineering. The associated path attributes that may change are ASPath, MED values and, sometimes, Nexthop.

Table I summarizes the classification of I-BGP updates for three specific one-hour windows: a typical background noise period, and the I-BGP peaks on January 13 (Sun) and January 16 (Wed) observed in Figure 1. The break-downs are very different across the three cases. In all three cases, 21.8-34.1% of the updates contain a Nexthop change. However, the majority of the Nexthop changes (18.2-21.2%) are associated with ASPath changes that arise from external events. Further analysis reveals that most of the route announcements during the background noise period originate from customer routers at poorly managed edge networks.

Next, we analyze the correlation between internal topology change and high I-BGP churns (or spikes).

A. Failure-Induced I-BGP Spikes

We focus on understanding what causes the spike on January 16, 2002 (Wednesday) in the following discussion. A significant fraction (18-38%) of the updates observed during the spike do not contain changes in Nexthop or ASPath. In addition, the majority of these updates contain routes that are explicitly withdrawn and then re-announced as reachable. Reference [5] speculates that such events (known as WAdup) reflect transient topological link or router failures.

An adjacent link failure can interfere with the I-BGP sessions seen by a local router in two ways. First, it causes I-BGP session resets between the local router and the I-BGP peers that are connected via the failed link, causing all the routes learned from these peers to be withdrawn. Second, a link failure results in unresolved routes if the link lies on the shortest path between the local router and the Nexthops of its routing entries. The affected network prefixes will be withdrawn and later re-announced when a backup path to the same Nexthops is found after IGP re-convergence. We correlate the WAdup events observed during the high churn period of January 16, 2002 (Figure 1) with the IGP routing data (footnote 4) and confirm that there are indeed link failures between the route reflector that we monitor and other backbone routers during these I-BGP peaks.

B. Summary

Further studies are needed to validate the speculated origins of I-BGP updates. For example, to verify whether a specific set of Nexthop changes does arise from intra-domain topology changes, we need to correlate the I-BGP spikes with IGP routing data, e.g., IS-IS failures. This is part of our on-going work. Our preliminary results show that 33% of the total raw updates are due to internal events, 21.3% of the updates come from external events, and the remainder need more information to be resolved.
Of the raw updates, 15.3% result from intra-domain topology changes such as link failures, and 9.4% are implicit announcements of duplicate routes, which may be due to pathological reasons.

Footnote 4: We also used PyRT to collect IS-IS messages to study intra-domain routing behavior.

VI. CONCLUSION AND FUTURE WORK

This paper presents a systematic approach to examine the impact of I-BGP route announcements on traffic re-routing and end-to-end performance, such as packet losses/reordering for TCP flows. Results show that only 6% of the I-BGP updates affect the intra-domain forwarding path of the elephant prefixes carrying 8% of the traffic. Detailed flow analysis shows no discernible correlation (.6-.8) between I-BGP churn and packet loss/reordering. We also document our initial effort to identify the possible sources of I-BGP updates and whether they are caused by internal events (e.g., link/router failures or local BGP policies) or external events. We found that the majority of the updates affect networks toward the edge of the Internet, which are less stable but carry very little traffic. Many open questions remain to be addressed, and we have identified several future directions: study the interaction between BGP and IS-IS; examine the impact of churns on real-time traffic using continuous active probes; and correlate BGP/IS-IS routing data with the occurrence of routing loops.

VII. ACKNOWLEDGMENTS

The authors thank R. Mortier for the PyRT listener; D. Papagiannaki and G. Iannaccone for the traffic prediction and flow analysis tools; and R. Gass for trace maintenance.

REFERENCES

[1] J. W. Stewart, BGP-4: Inter-Domain Routing in the Internet, Addison-Wesley, 1998.
[2] S. Bellovin, R. Bush, T. G. Griffin, and J. Rexford, "Slowing Routing Table Growth by Filtering Based on Address Allocation Policies," NANOG 22 presentation, May 2001.
[3] T. Griffin and G. Wilfong, "An Analysis of BGP Convergence Properties," ACM SIGCOMM, 1997.
[4] C. Labovitz, A. Ahuja, A. Abose, and F. Jahanian, "An Experimental Study of BGP Convergence," ACM SIGCOMM, 2000.
[5] C. Labovitz, G. R. Malan, and F. Jahanian, "Internet Routing Instability," IEEE/ACM Transactions on Networking, vol. 6, no. 5, October 1998.
[6] J. Cowie, A. Ogielski, BJ. Premore, and Y. Yuan, "Global Routing Instabilities during Code Red II and Nimda Worm Propagation," NANOG 23 presentation, October 2001.
[7] University of Oregon Route Views Project, online data and reports, http://www.routeviews.org/.
[8] RIPE Network Coordination Centre, http://www.ripe.net/ripencc/.
[9] Sprint ATL IP-Monitoring Project, http://www.sprintlabs.com/.
[10] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D. Towsley, "Measurement & Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone," submitted for publication.
[11] S. Halabi and D. McPherson, Internet Routing Architectures, Cisco Press, second ed., 2001.
[12] BGP Best Path Selection Algorithm, http://www.cisco.com/warp/public/459/25.shtml.