FastRoute: A Scalable Load-Aware Anycast Routing Architecture for Modern CDNs

Ashley Flavel (Microsoft, ashleyfl@microsoft.com), Jie Liu (Microsoft Research, jie.liu@microsoft.com), Pradeepkumar Mani (Microsoft, prmani@microsoft.com), Yingying Chen (Microsoft, yinchen@microsoft.com), David A. Maltz (Microsoft, dmaltz@microsoft.com), Oleg Surmachev (Microsoft, olegsu@microsoft.com), Nick Holt (Microsoft, nickholt@microsoft.com)

Abstract

Performance of online applications directly impacts user satisfaction. A major component of the user-perceived performance of an application is the time spent in transit between the user's device and the application running in data centers. Content Delivery Networks (CDNs) are typically used to improve user-perceived application performance through a combination of caching and intelligent routing via proxies. In this paper, we describe FastRoute, a highly scalable and operational anycast-based system that has significantly improved the performance of numerous popular online services. While anycast is a common technique in modern CDNs for providing high-performance proximity routing, it sacrifices control over the load arriving at any individual proxy. We demonstrate that by collocating DNS and proxy services in each FastRoute node location, we can create a high-performance, completely distributed system for routing users to a nearby proxy while still enabling the graceful avoidance of overload on any individual proxy.

1 Introduction

The latency between user requests and computer reactions directly relates to user engagement and loyalty [11, 31]. With the expansion of applications connecting to cloud servers, the network latency between users and cloud servers becomes a major component of the overall application performance. As network delays are largely proportional to the routing distance between clients and servers, application operators often employ the services of Content Distribution Networks (CDNs). A CDN deploys proxy servers in geographically dispersed locations and then tunnels user requests through them. By utilizing these proxies, the CDN provides (among other things) performance improvements via techniques such as caching and split-TCP connections [27, 19]. While the overarching goal of latency reduction is universal, the logic that determines the proxy an individual user is routed to can be CDN-specific, based on what the CDN offers its customers, what the requirements of the application are, and what capacity limits the CDN has. Concentrating solely on the optimal proxy selection for every user based on latency can introduce additional complexity in the routing logic. In contrast, our design goals were two-fold: a) deliver a low-latency routing scheme that performed better than our existing CDN, and b) build an easy-to-operate, scalable and simple system. The operational aspect of the design goal results in an architecture that is willing to sacrifice some user performance in scenarios that occur rarely in order to maintain a simple design pattern. A major early design choice was to utilize anycast routing (see Section 2.2) as it enables each FastRoute node (see Section 3.1) to operate independently of other FastRoute nodes (i.e. no real-time communication between nodes). Anycast routing has also been used successfully to deliver content by other CDNs including Edgecast and Cloudflare [28].
Although anycast routing has its advantages, there is the potential for an individual FastRoute node to become overloaded with user traffic (as the CDN does not control which proxy receives traffic, leaving it at the mercy of the intermixed routing policies of Internet Service Providers). To account for this, the FastRoute architecture utilizes multiple layers of FastRoute nodes, each with its own anycast IP (see Section 3.2.2). When a FastRoute node in a layer is overloaded, traffic is redirected to node(s) in the next layer. This layer-based approach is an example of choosing simplicity over user performance for the expected, but rare, overloaded-node scenario (if overloaded nodes were not rare, it would indicate a capacity build-out issue). I.e. instead of attempting to direct users from an overloaded node to a nearby node regardless of layer, we route users to the closest node in the next layer. An artifact of using anycast routing is that the DNS must choose which anycast IP address to return to a client without knowing which proxy (of the ones announcing that address) the client's traffic will reach.

Consequently, some intelligence is needed to determine which DNS responses must be redirected to the next layer. In Section 3.2.1 we show that by collocating DNS servers with proxy servers in the same node locations, a large percentage of user and DNS traffic lands at the same FastRoute node. Although the likelihood of a DNS query and its associated subsequent HTTP requests landing on the same node is not 100% (73% in our network), for the purposes of offloading traffic this has proven sufficient in production for over 2 years. Consequently, a FastRoute node only needs to know about its own load, preserving the independence of nodes that anycast provides. Although load management is a critical component of the overall system, it is expected to operate rarely. In most situations, users will be routed to the first layer and their performance is based on the intermixed decisions of all ISPs in the Internet. Although it is technically possible to dynamically influence routing decisions in real-time (e.g. [8]), the system needed to do this would require significant development effort and, most critically, introduce additional complexity. Consequently, in Section 4 we introduce several monitoring tools we use for analysing user performance offline that influence our peering policies and who we choose to peer with.

FastRoute has been in operation for several years, improving the performance of a number of popular Internet applications. The initial move to FastRoute from a third-party CDN demonstrated noticeable performance improvements. Further, the simple load management implementation has handled all overload scenarios since its inception, and its fully distributed nature has enabled us to quickly add new proxy locations, further improving our user performance.

Our contributions in this paper are the following novel and interesting aspects of FastRoute:

Architecture:
- A scalable architecture that is robust, simple to deploy and operate, and just complex enough to handle challenges such as overloaded proxies due to anycast routing
- A simple, yet unique DNS-based load management technique that leverages the self-correlation between proxy traffic and DNS traffic in a collocated DNS and proxy system
- Use of multiple anycast rings of FastRoute nodes for load absorption to prevent ping-ponging of load between overloaded proxies

Longitudinal Experimental Results from Production System:
- Data showing that FastRoute works effectively, even in the face of traffic that would overload a simpler system
- Data showing that the DNS-to-proxy correlation is relatively stable and high across time
- Data showing performance (latency) improvements at the 75th and 95th percentiles with our initial limited set of FastRoute nodes, for customers of 10 ISPs in the USA
- Data showing anycast stability is sufficient to run a production system

2 Background

Content Distribution Networks direct users to their nearby proxies to improve their performance. In this section we first examine two fundamental technologies that are core to many of the techniques used to route traffic to the optimal proxy. We then examine several known techniques that CDNs utilize to help select and route users to a nearby proxy.

2.1 DNS

The Domain Name System (DNS) [23] translates human-friendly names into IP addresses.
DNS utilizes a hierarchical system where end-users consult recursive DNS resolvers that identify and query an authoritative DNS resolver to discover the translated IP address. The recursive DNS resolver caches the translation for the duration of the time-to-live (TTL) associated with the response. Other clients that utilize the same recursive DNS resolver will receive the same response for the duration of the TTL.

2.2 Anycast Routing

Anycast is a routing technique historically popular with DNS systems due to its inherent ability to spread DDOS traffic among multiple sites as well as provide low-latency lookups. It utilizes the fact that routers running the de-facto standard inter-domain routing protocol in the Internet (BGP [30]) select the shortest (based on policy and the BGP decision process) of multiple routes to reach a destination IP prefix. Consequently, if multiple locations claim to be a single destination, routers independently examine the characteristics of the multiple available routes and select the shortest one (according to the BGP path selection process). The effect of this is that individual users are routed to the closest location claiming to be the IP prefix (see [28] for a good explanation of anycast routing).
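Returning briefly to the TTL caching behavior in Section 2.1, the toy model below shows why all clients behind one recursive resolver see the same answer until the TTL expires. This is our own illustrative sketch, not code from the paper; the resolver class, names and addresses are hypothetical.

```python
import time

class RecursiveResolver:
    """Toy model of Section 2.1's TTL caching: all clients behind the same
    recursive resolver share one cached answer until the TTL expires."""

    def __init__(self, authoritative_lookup):
        self.authoritative_lookup = authoritative_lookup  # name -> (ip, ttl)
        self.cache = {}                                   # name -> (ip, expiry)

    def resolve(self, name):
        entry = self.cache.get(name)
        if entry and entry[1] > time.time():
            return entry[0]                 # cached answer, shared by all clients
        ip, ttl = self.authoritative_lookup(name)
        self.cache[name] = (ip, time.time() + ttl)
        return ip

# Every client of this resolver sees the same IP for `ttl` seconds, which is
# why DNS-based traffic shifts (Section 3.2) take effect only gradually.
resolver = RecursiveResolver(lambda name: ("203.0.113.10", 30))
assert resolver.resolve("cdn.example.com") == resolver.resolve("cdn.example.com")
```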

Note that latency is not a consideration in the BGP path selection process. However, by tuning anycast routing announcements and negotiating policies with peering ISPs (as in Section 4), BGP routing decisions can align with latency-based routing.

2.3 Proxy Selection Techniques

FastRoute uses anycast TCP to route a user to a proxy. In this section, we describe the general approach a CDN would use with anycast TCP as well as examine several other alternatives (a summary is included in [9]).

2.3.1 Anycast TCP

Anycast TCP is used by many modern CDNs including Edgecast and CloudFlare [28]. This approach has all proxies responding on the same IP address, and Internet routing protocols determine the closest proxy [8]. If a location cannot serve traffic, it withdraws its route and the Internet's routing protocols route users to the next closest location. A difficulty with this approach is that control over where user traffic lands is relinquished to Internet routing protocols. Consequently, avoiding the overload of an individual proxy by controlling routes becomes challenging to accomplish in an automated fashion, as it requires either a traffic engineering technique such as [17, 29, 8] or a DNS-based approach as presented in this paper. A second concern with this approach is that a route change in the middle of a TCP session can result in the user communicating with an alternative proxy mid-session. Attempts can be made to detect rogue TCP flows and route them to the correct proxy [8]; however, much like [22], we analysed the availability of end-users (see Section 5.3), finding the availability of an anycast destination for a small file download was equivalent to unicast, indicating this issue did not warrant implementing such a solution at the time of creation.

2.3.2 Anycast DNS

Utilizing an anycast-based DNS has become the standard mechanism used by major providers to offer a high level of performance and denial-of-service protection. One mechanism a CDN can use to select which proxy to route traffic to takes advantage of information about where the DNS lookup occurs [15]. By co-locating the DNS servers with each proxy, a request landing on a proxy simply returns the unicast IP of the collocated proxy. This is a simple solution that utilizes the Internet's routing protocols to find the shortest route to a proxy location. However, with this approach, the closest proxy selected is based on closeness to the recursive DNS resolver of the user instead of closeness to the user themselves. In practice we found the closest proxy for the DNS and the user (self-correlation) was the same for 73% of requests in our network, based on the analysis described in Section 3.2.1. Although we do not use this unicast-based approach (as it sacrifices the performance benefits of anycast TCP), our architecture can easily be modified to return unicast IPs instead of anycast IPs (making the self-correlation 100%).

2.3.3 Internet Map

An Internet map based approach relies on projecting the entire IP address space onto the set of available proxies (e.g. [13]). By collecting performance data either passively [20, 25], actively through probing mechanisms [12, 24], or statically mapping using geo-to-IP mappings [4], a map can be created that maps the IP address space to proxy locations. The map is updated over time as network conditions change or proxies come up/down.
We did not pursue this approach (in contrast to the anycast TCP approach) as it required global knowledge to analyse user latency data as well as real-time load and health metrics of nodes to decide where to route traffic. Further, the lack of granularity of DNS-based responses [26] (which is the predominant method to route users based on an Internet map), the lack of motivation for ISPs (who still perform a majority of the DNS resolutions for users, 90% in the USA from our calculations) to implement more granular DNS requests [14, 18, 26], and the additional complexity introduced when supporting IPv6 made this approach less appealing. (DNS requests are not guaranteed to be made over the same protocol as the answer they are requesting; i.e. an IPv4 (A record) resolution can be made over IPv6 and vice-versa. Consequently, supporting IPv6 introduces a 4x complexity over IPv4-only by adding an additional 3 maps.)

2.3.4 Other Techniques

Several other techniques are possible, including manifest modification for video providers [7] (not applicable to other content types) or HTTP redirection. The content we are delivering is predominantly dynamic content, making the cost of an HTTP redirect high compared to the transfer time and thus making this approach infeasible.

3 FastRoute Architecture

In this section we describe a) the components within an individual FastRoute node, b) why we can make local decisions on redirecting load away from a node, and c) how the local algorithm makes its decisions.

3.1 FastRoute Node

In this section we describe the services within an individual FastRoute node and the communication between them. As explained in Section 3.2, no communication is needed outside an individual node to route users to a proxy. In Figure 1 we show the four major stand-alone services that exist within a FastRoute node: Load Balancer, Proxy, DNS and Load Manager. Each service may be on independent machines or co-existing on the same physical machines.

The Load Balancer is responsible for spreading user traffic between N instances of the Proxy and DNS traffic between M instances of the DNS service. When the number of healthy proxy or DNS services drops below a threshold, the anycast BGP prefixes of the DNS and proxy are withdrawn. Equivalently, when the number of healthy proxy and DNS services is higher than a threshold, the BGP prefixes are announced. Announcing and withdrawing routes is the mechanism by which a FastRoute node chooses whether or not to receive traffic.

The Proxy service is responsible for handling user traffic (e.g. terminating user TCP sessions, caching, fetching content from origin servers, blocking DDOS traffic, etc.). For each type of traffic it is handling, a counter defining the load is published locally.

The DNS service responds to each DNS query with one of two possible responses: either the anycast IP of its own FastRoute node or a CNAME (redirection) to the next layer (details included in Section 3.2). The probability of returning the CNAME is determined by reading, at regular intervals, a configuration published by the load management service.

The load management service is responsible for aggregating the counters collected across all proxy instances and publishing the probability of returning the redirection CNAME for each DNS name. It operates in a master/slave configuration so all DNS services within the node receive the same offload probability. Details of the algorithm the load management service uses are included in Section 3.2.

Figure 1: The FastRoute node architecture consists of 4 major components: Load Balancer (to balance traffic destined to a single Virtual IP (VIP) across multiple instances), DNS (to answer user DNS queries), Proxy (to serve application traffic) and Load Manager (to determine the offload percentage).

3.2 Distributed Load Management

When no individual proxy is receiving more traffic than it is configured to handle, the operation of the system follows the pure anycast mechanism: all DNS requests are answered with the same IP address and the Internet's routing protocols determine the closest proxy for each user. However, as described in Section 2.3.1, Internet routing protocols have no knowledge of the load on any individual proxy. Consequently, an individual proxy can be overloaded when using anycast if proxy locations aren't significantly over-provisioned relative to the expected traffic load. The ability to over-provision such proxy locations is often limited as power and space come at a premium. Further, adding new capacity to overloaded locations has significant lead-time (weeks to months), hence we must have the ability to dynamically shift load in real-time, even if we expect this situation to be rare. We considered two main techniques for load management: (1) altering BGP routes, and (2) modifying DNS responses.
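Before comparing those two techniques, the per-query decision of the DNS service described in Section 3.1 can be sketched as follows. This is our own hedged illustration, not FastRoute's implementation; the IP address, domain name and function are hypothetical.

```python
import random

# Hypothetical values; in the paper the offload probability is published
# locally by the load management service and re-read at regular intervals.
OWN_ANYCAST_IP = "198.51.100.1"           # anycast IP of this node's layer
PARENT_LAYER_NAME = "layer2.cdn.example"  # domain name of the parent layer

def answer_query(qname, offload_probability):
    """Return a (record_type, value) answer for `qname`.

    With probability `offload_probability`, redirect to the parent layer
    via a CNAME, forcing the recursive resolver to re-resolve against a
    node in that layer; otherwise answer with this layer's anycast IP."""
    if random.random() < offload_probability:
        return ("CNAME", PARENT_LAYER_NAME)
    return ("A", OWN_ANYCAST_IP)

# When the collocated proxy is not overloaded, the probability is 0 and
# every answer is the local anycast IP.
print(answer_query("www.example.com", offload_probability=0.25))
```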
When an individual proxy is overloaded, utilizing BGP techniques such as AS-path prepending or withdrawing routes to one or more peers is one way to reduce the traffic on the proxy. However, such techniques are difficult to perfect as they can suffer from cascading failures (an action from one proxy can cause a traffic swarm to a nearby proxy, causing it to take action, and so on).

A centralized system could be used to manage such actions; however, a significant amount of complexity would need to be introduced to predict where traffic would route to, given a particular action. Further, taking BGP-level actions in an anycast-based system results in route churn and, subsequently, an increased rate of TCP session breakages. Conversely, modifying DNS responses enables the BGP topology to remain unchanged. This has two main attractive features. First, the BGP topology improvements and monitoring described in Section 4.1 remain independent of load management. Second, modifying DNS responses will only affect the routing of new users (users already connected to a proxy will continue their session). Hence, DNS is a less abrupt change for users and a more gradual shift of overall traffic patterns than modifying BGP.

However, there are two primary difficulties when using DNS for load management: first, the DNS server must infer whether its response will result in additional traffic landing on an overloaded proxy; second, given that a DNS server knows which DNS responses to modify to prevent load from landing on an overloaded proxy, what answer does it then respond with to redirect the users causing the excessive load?

To solve the first difficulty, we used an artifact relied on by many traditional CDNs: the LDNS and the users of that LDNS are often in a similar network location. Consequently, we collocated our authoritative DNS servers and proxy servers in a FastRoute node. Our hypothesis was that there would be a high correlation between the location of the proxy receiving the user traffic and the authoritative DNS server receiving the DNS request. Given a high correlation, by altering only the decision of the collocated DNS server, we can divert traffic and avoid overloading the proxy. This has the very appealing characteristic that the only communication needed is between the collocated proxy and DNS in a given FastRoute node. We discuss more on this communication in Section 3.3.2.

The second difficulty is where to direct traffic that would normally be directed to an overloaded proxy. One option is to return the unicast IP address of the next closest non-overloaded proxy; however, this in essence means creating a parallel system like the Internet map described in Section 2.3.3 for the (expected) rare situation of an overloaded proxy. In contrast, we translate the problem from one of where to send the traffic to one of where not to send the traffic. As we only have collocated proxy-DNS pair communication, the only load a DNS server is aware of is its own. Consequently, each DNS server simply knows to direct traffic to "not-me". By configuring multiple anycast IP addresses on different sets of proxies, the DNS server can direct traffic to one of the anycast IPs that it is not responsible for. However, when using such an approach, it is possible that multiple proxies experience high load and direct traffic to each other, causing more load on proxies that are already overloaded.

Figure 2: CDF of self-correlation values observed each day for all FastRoute nodes over a week. Each datapoint is a self-correlation for an individual node for a single day. 90% of datapoints have a self-correlation greater than 50%, with no datapoint less than 27%, justifying FastRoute's decision to use only local decisions for load management.
The underlying problem is that there is a possibility of ping-ponging of traffic among overloaded proxies if we are not careful. We address this concern in Section 3.2.2 by setting up loop-free traffic diversion paths.

3.2.1 Local Controllability of Load

If DNS queries and subsequent user traffic to proxies land on the same FastRoute node, we call such user traffic controllable. We measure the correlation between two FastRoute nodes i and j as the likelihood of the DNS query landing on FastRoute node i (the DNS response is the anycast IP of the proxy) and the subsequent user traffic landing on the proxy in node j. The self-correlation of any node i is a measure of the controllable load on that node. Every node can have a mix of controllable and uncontrollable load. From the data gathered using the approach shown in [16], we can construct a correlation matrix for all the nodes in the system. The diagonal of the correlation matrix gives the self-correlation values for the various nodes. For our solution to be able to handle a given load, we rely on the self-correlation being high enough to offload sufficient traffic to avoid congestion. In Figure 2 we show the CDF of self-correlation values observed each day for every FastRoute node, using over 10 million DNS-to-HTTP request mappings collected over a week in February 2015 from a representative sample of users of a major application utilizing FastRoute.

Each node contributed around 7 data points (one self-correlation value per day) towards computing the CDF. Any FastRoute node receiving fewer than 100 samples on an individual day was excluded from the results. We see that more than 90% of (node, day) pairs have a self-correlation greater than 50%. No node on any day had a self-correlation below 27%. Further, when examining the individual node self-correlation values, they remain relatively constant over the entire week. For the nodes with self-correlation below 50%, we found that the cross-correlation with either one or several other neighboring nodes was relatively high. For example, one node in Europe with a self-correlation of approximately 28% had four other nodes in nearby cities with cross-correlation values of 20%, 18%, 17% and 10%. A distinct North American node, also with a self-correlation of approximately 28%, had a single other node with a cross-correlation value of 50%. We see this pattern (of a small subset of nodes that have high cross-correlation values) consistently throughout other nodes with relatively low self-correlation.

FastRoute does not currently attempt to do anything special for nodes with low self-correlation. This is based on our design principle of simplicity: do not build unnecessary complexity unless absolutely needed (and so far it has not been). The two FastRoute nodes discussed above serve less than 2% of total global traffic and are sufficiently over-provisioned to handle the load they receive. However, if any node (low self-correlation or not) is unable to offload sufficient traffic, we have the ability to alert an operator to manually divert traffic from other nodes (based on the historic non-diagonal terms of the correlation matrix). In the future, if operators are involved often enough to justify the additional complexity, we can implement one or more of the following features:

- Lowly correlated nodes can commit suicide (i.e., withdraw DNS and proxy anycast BGP routes) when an offered load is unable to be sufficiently diverted, resulting in traffic landing (we expect) on nearby nodes that have higher self-correlation values and can divert traffic if necessary. This keeps our current desirable system property of no real-time communication between nodes.

- Lowly correlated nodes can inform the small set of nearby nodes that have a high cross-correlation to start offloading traffic (e.g. in the examples presented earlier, this would increase the ability to offload traffic to 93% for the European node and 78% for the North American node). This breaks our current system property of no real-time communication between nodes, but does limit it to a small subset of nodes.

- Nodes with low self-correlation can be configured in anycast-DNS mode (i.e. DNS served over anycast, but proxy over unicast addresses; see Section 2.3.2). Such nodes could always be configured in this mode, or nodes could automatically transition to this mode when they cannot divert sufficient traffic.

- The proxy can take steps to divert traffic, including reducing its workload (e.g. dropping lower priority traffic) or diverting traffic via HTTP 302 redirects.
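To make the correlation matrix concrete, the following sketch (ours, not the paper's code) builds it from joined (DNS node, proxy node) request pairs such as those gathered via the approach in [16]; the diagonal entries are the self-correlation values discussed above. The data is illustrative.

```python
from collections import Counter, defaultdict

def correlation_matrix(request_pairs):
    """Given (dns_node, proxy_node) pairs for individual requests, return
    a nested dict where matrix[i][j] is the fraction of requests whose DNS
    query landed on node i and whose HTTP traffic landed on node j.
    The diagonal matrix[i][i] is node i's self-correlation."""
    counts = defaultdict(Counter)
    for dns_node, proxy_node in request_pairs:
        counts[dns_node][proxy_node] += 1
    matrix = {}
    for i, row in counts.items():
        total = sum(row.values())
        matrix[i] = {j: n / total for j, n in row.items()}
    return matrix

# Illustrative data only: node "A" controls 75% of its own traffic.
pairs = [("A", "A")] * 3 + [("A", "B")]
m = correlation_matrix(pairs)
assert m["A"]["A"] == 0.75  # self-correlation of node A
```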
As more FastRoute nodes are added, we will continue to monitor the correlation matrix to ensure it is sufficient to handle our traffic patterns.

3.2.2 Loop-free Diversion of Load

So far we have discussed the control over load landing on a proxy with purely local actions (the DNS altering its decision to divert traffic away from the collocated proxy). We now discuss how we determine what the altered response should be. Our approach utilizes anycast layers, where each layer has a different anycast IP address for the DNS and proxy services. Each DNS server knows only the domain name of its parent layer. Under load, it will start CNAME'ing requests to its parent layer's domain name (a CNAME is an alias within the DNS protocol that causes the recursive resolver to undertake a new lookup). By utilizing a CNAME, we force the recursive resolver to fetch the DNS name resolution from a FastRoute node within the parent layer. This mechanism ensures that a parent layer node has control over traffic landing in the parent layer, with the parent layer following the same process if it becomes overloaded.

We see an example setup of anycast layers in Figure 3. Here we see FastRoute nodes 1 and 2 in the outermost layer becoming overloaded. This results in both nodes diverting traffic to the middle layer, resulting in additional traffic landing on nodes 3 and 4. Node 4 determines that it is now being overloaded as a result and diverts load to the innermost layer, with node 5 receiving the additional traffic. From a user perspective, although their DNS requests may be bounced off several nodes, their proxy traffic will not experience the redirects. Higher-level layers are not required to be as close to users as lower-level layers; consequently, they can be in physical locations where space is relatively cheap and it is easy to add capacity (e.g. within large data centers with elastic capacity [21, 1]). Hence, bursts of traffic can be handled by over-provisioning.
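The loop-freedom requirement on diversion paths can be checked offline by walking each layer's parent relationship. The sketch below is our own illustration under assumed layer names; it raises an error if the configured parent chain could ever ping-pong traffic between layers.

```python
# Hypothetical layer configuration: each layer's DNS CNAMEs overloaded
# queries to its parent layer's name. The top layer has no parent and
# must absorb whatever load reaches it.
PARENT = {
    "layer1.cdn.example": "layer2.cdn.example",
    "layer2.cdn.example": "layer3.cdn.example",
    "layer3.cdn.example": None,
}

def assert_loop_free(parent):
    """Walk the parent chain from every layer; revisiting a layer means
    the diversion paths could loop traffic among overloaded proxies."""
    for layer in parent:
        seen = set()
        while layer is not None:
            if layer in seen:
                raise ValueError(f"diversion loop through {layer}")
            seen.add(layer)
            layer = parent[layer]

assert_loop_free(PARENT)  # passes: the chain is a simple directed path
```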

Figure 3: An example configuration with three anycast layers. Solid arrows denote user connections, while dotted arrows denote the effect of diverting traffic by nodes that would otherwise be in overload.

By diverting lower priority traffic from higher layers first (as in Section 3.3.1) we can avoid the perceived user performance impact. Although we have shown a single directed path between the lowest layer and the highest layer, more advanced configurations are possible. Several extensions include: an individual proxy may have two parent layers and offload proportionally between the layers (we did operate in this mode initially, when the highest layer did not have sufficient spare capacity); different applications may have a different relationship between layers; or individual proxies may exist in multiple layers (i.e. a layer may consist of locations that are a subset of a lower layer). The only requirements are that the relationship between layers be loop-free and that the highest layer be able to handle the load with no ability to divert traffic.

3.3 Local Offload Algorithm

In this section, we discuss our approach of using DNS to manage load on a collocated proxy. We begin by defining the notion of load: user traffic hitting a proxy consumes various system resources such as CPU, memory, network bandwidth, etc., in the proxy. We refer to the strain placed on these resources as load. The nature of the load can vary based on the traffic hitting each end point in the proxy (e.g. short HTTP request-response type queries are generally bottlenecked by CPU; file streaming applications are generally bottlenecked by network bandwidth, etc.). We can control the load on a particular resource by controlling the user traffic hitting the end point(s) associated with the load. For every such identified loaded resource associated with an end point (one resource per end point), we define an overload threshold that defines the operating boundary of the proxy, and we consider the proxy overloaded if the load on any resource exceeds the threshold. The goal of FastRoute's load management scheme is to operate the system such that the load on any resource in a given proxy stays under the overload threshold. Also, as each FastRoute load manager instance expects a fraction of traffic that is not controllable locally, multiple instances of the load management service can operate on different endpoints hosted on the same physical machine even if they utilize each other's bottlenecked resource (e.g. a file-streaming application may be bottlenecked by network bandwidth, but still consumes CPU). This behavior simply alters the fraction of uncontrollable load each load manager instance sees.

3.3.1 When to Divert Load?

In our design it is up to an individual node to discover when it is overloaded and divert some traffic. The load management algorithm that controls offload should be able to quickly recognize an overload situation and divert (just enough) load to another layer so as to bring load under the overload threshold; equally important is for the algorithm to recognize that the overload situation has passed, and reduce or stop the offloading as appropriate. Also, it is important to note that any delay in offloading during overload will cause measurable user impact (it may cause service unavailability), while any delay in bringing back the traffic once overload has passed has a relatively smaller penalty and user impact (e.g. higher latency due to being served from a farther layer).
The two types of load to expect are:

- Slowly increasing/decreasing load. This load is caused by the natural user patterns throughout the day. Generally, over a day a diurnal pattern is seen based on users' activities in the timezone of the proxy's catchment zone. Figure 4 shows the diurnal traffic pattern observed over a period of 3 days in a proxy in production.

- Step changes in load. This is caused by a nearby proxy going down and all traffic from that proxy hitting the next closest proxy. We show an example of one such occurrence from our production system in Figure 5.

Figure 4: Traffic pattern over a period of 3 days for a single node. The Y-axis represents traffic volume. We have removed the values for confidentiality purposes.

Figure 5: At around 17:00, a neighboring proxy (top) fails and as a result the closest proxy (bottom) is hit with all the load. The Y-axis represents traffic volume. We have removed the values for confidentiality purposes.

Consequently, our algorithm that determines which DNS answer to return must handle the above two scenarios. Challenges in this algorithm surround the limitations of the control mechanism. These include:

- The TTL on a DNS response causes a delayed response to changes. Though it was shown in [16] that load management using DNS is feasible, the delay due to the TTL is unavoidable.

- Local DNS servers have differing numbers of users behind them.

- A user's DNS lookup may not land on the same proxy as their TCP traffic (see Section 3.2.1 for analysis). Consequently, some load on a proxy (from its perspective) is uncontrollable.

Given the limitations of the control mechanism we have, we would like our control algorithm to be able to:

- Quickly reduce load when a step change forces traffic above a desired utilization.

- Prioritize low-value traffic to be offloaded.

- Alert if the "uncontrollable" load becomes too large to maintain load under the desired utilization.

Many algorithms can support these characteristics. We present a simplified version of our algorithm in production. Let S be the current load on a given resource at node i, T be the overload threshold that is set as the operating boundary for this resource, under which we expect the proxy to operate at all times, and x be the fraction of traffic being offloaded to the next higher layer (the offload probability). In order for the load management control loop to function effectively, the load sampling interval is set to more than twice the TTL of the DNS responses (note that the TTL on the responses reflects the desired responsiveness; i.e. a longer TTL implies that a sustained overload condition and slower reaction to overload are acceptable).

- If S > T, node i is overloaded. The offload probability x is increased super-linearly (maximum value = 1).

- If S < T, node i is NOT overloaded. The offload probability x is decreased linearly (minimum value = 0).

As an extension, we implemented priority-based offloading of different end points that have the same load characteristics (no results shown in this paper). Among end points that contend for the same resource, we defined load management policies such that offload happens in some desired priority order. For example, say the proxy is the end point for two domains whose traffic both consumes CPU, with one domain more important than the other, and say the overload threshold is set to 70%. When overload occurs (i.e. CPU use exceeds 70%), the system begins offloading customers of the less important domain in an effort to control the load on the system. If the overload persists, customers of the less important domain are fully offloaded before offloading any customers of the more important domain. If overload persists even after offloading 100% of the customers of both endpoints, then manual intervention is sought by engaging personnel from the FastRoute operations team to artificially increase the measured load on highly cross-correlated neighboring nodes, causing an increased diversion of traffic away from the overloaded node.
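The increase/decrease rule above can be expressed as a small control loop. The following is a simplified, illustrative sketch; the growth factor and step constants are our own assumptions, since the paper specifies only super-linear increase and linear decrease.

```python
def update_offload_probability(x, load, threshold,
                               growth=2.0, step_up=0.05, step_down=0.01):
    """One iteration of the simplified rule from Section 3.3.1.

    `x` is the current offload probability, `load` the sampled load S and
    `threshold` the overload boundary T. The growth/step constants are
    illustrative placeholders, not the production values."""
    if load > threshold:
        # Overloaded: grow super-linearly so step changes are shed quickly.
        x = min(1.0, x * growth + step_up)
    elif load < threshold:
        # Healthy: bleed traffic back slowly; returning late costs only
        # some extra latency from the farther layer.
        x = max(0.0, x - step_down)
    return x

# The sampling interval must exceed twice the DNS TTL so each adjustment
# observes the effect of the previous one.
x = 0.0
for load in [60, 85, 95, 90, 65, 60]:   # sampled load; threshold T = 70
    x = update_offload_probability(x, load, threshold=70)
    print(f"load={load} -> offload probability {x:.2f}")
```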

3.3.2 Scalability and Independent Operation

Given that (a) our operating assumption is that each node has sufficient self-correlation, and (b) the DNS and proxy are collocated in the FastRoute node, it follows that the load management system situated at any given node only needs to monitor the various aspects of load on the local node. Once it has collected the load data, the load management system computes the offload fraction for a given end point, and it only needs to communicate the results to the local DNS. Thus, all communication needed to make load management work effectively is contained fully within the same FastRoute node, without the need for any external input or sharing of global state. This makes the operation of FastRoute nodes completely independent of one another, and allows for simplified and easy-to-scale deployment.

4 Improving Anycast Routes over Time

We chose an anycast TCP based approach for FastRoute due to its simplicity and low dependence on DNS for optimal proxy selection. Consequently, we rely heavily on BGP (the de-facto standard inter-domain routing protocol used in the Internet) to best direct users to the closest proxy. The underlying assumption when using anycast is that the shortest route chosen by BGP is also the lowest latency route. However, due to the way the BGP route selection process operates, this may not always be the case. Although it is possible to implement a real-time system that adapts to the current network topology to modify route announcements or fall back to unicast, this would introduce additional complexity, something we wished to avoid. Consequently, we opt for a primarily offline approach to monitoring the behavior of anycast. We utilize user performance telemetry to analyse daily user performance (see Section 4.1), to prioritize network peering improvements and to identify performance changes for a set of users (see Section 4.2). Availability is most critical, hence we monitor availability in real-time via active Internet-based probes such as [3, 2, 5] and internal probes (from within the node itself).

4.1 Identifying Performance Problems

One of the most valuable visualization techniques we developed as part of FastRoute was to overlay performance data collected from users of our production application(s) on top of a Bing map. Multiple views of user performance data were then plotted on top of this map, providing unprecedented insight into how our users were experiencing our service. The most basic view we created using this technique is shown in Figure 6. Here we see users in Washington state. Navigation timing data [6] for these users are aggregated based on the user's geographic location, the ISP they are connected to, and the proxy that they connect to. We display this data by sizing the bubble based on the relative number of users in the aggregate group and coloring the bubble based on the relative performance the users receive (red (worst), orange, yellow, light green, dark green (best)).

Figure 6: User performance grouped by ISP, geographic location and proxy location displayed on a Bing map. The size of the bubble represents the relative number of users. The color of the bubble represents the relative performance.
From the example in Figure 6 we can quickly determine that we have a significant user base in the Seattle region, as expected due to the large population in this area (note that we have introduced random jitter around the actual geolocation of the user population to avoid bubbles being drawn directly on top of each other). We can also see that one particular ISP is experiencing lower levels of performance than others in the same region. Upon further investigation (with data contained in the flyout box that appears when hovering over this bubble), we found this ISP was a cellular provider, expected to have slower performance than a cable or DSL network. This display quickly identified large user populations that were receiving a lower level of performance than others. By filtering by individual proxies, it became immediately obvious when users were routed to a suboptimal location (e.g. if European users were being routed to North America). We found the performance of major ISPs to be relatively constant day-over-day. Consequently, by identifying the ISPs whose users were being sub-optimally routed (and prioritizing them based on user populations), our ISP peering team could prioritize their efforts to best satisfy our users (improving the performance of users accessing all applications of the Microsoft network, not just those running through FastRoute).
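The aggregation behind Figure 6 amounts to a simple group-by: bubble size is the number of users in a (location, ISP, proxy) group, and color buckets the group's relative latency. The sketch below is our own illustration; the sample format, color bucketing and values are assumptions, not the production pipeline.

```python
from collections import defaultdict

# Hypothetical sample format: (geo, isp, proxy, latency_ms) per page load,
# as collected via the W3C navigation timing API [6].
samples = [
    ("Seattle,WA", "CableCo", "sea01", 45),
    ("Seattle,WA", "CableCo", "sea01", 55),
    ("Seattle,WA", "CellNet", "sea01", 180),
]

COLORS = ["dark green", "light green", "yellow", "orange", "red"]  # best -> worst

def bubbles(samples):
    """Aggregate per (geo, ISP, proxy): bubble size is the user count,
    color is the group's mean latency bucketed against all groups."""
    groups = defaultdict(list)
    for geo, isp, proxy, ms in samples:
        groups[(geo, isp, proxy)].append(ms)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    lo, hi = min(means.values()), max(means.values())
    span = (hi - lo) or 1.0
    return {k: {"size": len(groups[k]),
                "color": COLORS[min(4, int(4 * (means[k] - lo) / span))]}
            for k in groups}

for key, bubble in bubbles(samples).items():
    print(key, bubble)   # the cellular ISP group comes out sized 1 and red
```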

4.2 Identifying Changes in Performance

The above map-based view of performance is highly beneficial for analyzing a snapshot of performance. However, it is not as beneficial when trying to identify performance changes. Our goal is also to continually improve the performance for all our users. By considering the current performance of users as a benchmark, we can identify performance degradations, correlate the changes with known network changes, and revert them if necessary. For example, in Figure 7 we see several ISPs' latency dramatically increase in the middle of the time series. This was the result of an alteration in our peering relationship with another ISP resulting in congestion. By identifying an anomaly in the expected performance of users from this ISP, we were able to quickly rectify the issue, ensuring the effect on our users was minimized. Conversely, the addition of new peering relationships and their impact on user performance was directly attributable, providing business justification for the (possible) additional cost. In a similar way, we can identify black-holing (or hijacking) of traffic (e.g. [10]). By monitoring the current user traffic volumes, we can identify anomalies in the expected volumes of traffic from particular ISPs.

Figure 7: Latency vs time for several ISPs. A peering link change on Day 5 resulted in a substantial increase in latency. Part-way through Day 6 the link was restored, resulting in the expected performance returning.

4.3 Active Monitoring

Passive analysis of users reaching our service provides the best aggregate view of the performance our users are receiving. However, active probing mechanisms from third-party sources (e.g., [3, 2, 5]) provide additional sources of data. We found that utilizing systems that exist outside of our own infrastructure avoids circular dependencies and enables us to have information that is normally unavailable using passive monitoring (e.g. traceroutes).

5 FastRoute in Production

FastRoute was designed to replace a third-party CDN that was in operation for our Internet applications. However, in order to do so, we had to prove that FastRoute was not only functional, but that there was a performance improvement and no drop in availability when compared to the third-party CDN. We describe in Section 5.1 how we compared the two systems, presenting data from our initial comparison. A critical component of FastRoute is its ability to handle an overloaded proxy. This is expected to be a rare scenario given appropriate capacity planning, but it prevents availability drops under load. In Section 5.2 we examine how the load manager has operated in production. A concern when using anycast is the availability of anycast in comparison to unicast given route flaps. In Section 5.3 we see no difference in the availability of a third-party unicast-based CDN and our anycast solution.
5.1 Onboarding to FastRoute

We took a two-step process to ensure we reliably onboarded our first application onto FastRoute: first, compare the availability and performance of non-production traffic served through FastRoute vs our existing third-party CDN, before gradually increasing the fraction of production traffic that was directed to FastRoute instead of the third-party CDN, ensuring real-user performance and availability were not degraded throughout the transition.

5.1.1 Non-Production Traffic Comparison

One method of comparison for two CDNs is through the use of active monitoring probes from agents spread throughout the Internet [2, 3, 5]. However, active probes come from a very limited set of locations and do not reflect the network location of our users. Consequently, we utilized our existing user base as our probing set. We achieved this by placing a small image on both the third-party CDN as well as FastRoute. We then directed a small random sampling (~5%) of users to download the image from both the CDN and from FastRoute (after their page had loaded) and report the time taken (utilizing the javascript approach described in [16]).
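A comparison like this reduces to computing latency percentiles over the two sets of reported download times. The sketch below is illustrative only; the timing values are made up, and the nearest-rank percentile is our choice, not necessarily the paper's analysis method.

```python
def percentile(values, p):
    """Nearest-rank percentile; enough for a coarse CDN comparison."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[rank]

# Hypothetical reported download times (ms) from the sampled users,
# one measurement per CDN per sampled page view.
fastroute_ms = [80, 95, 110, 130, 200]
thirdparty_ms = [90, 120, 140, 180, 260]

for p in (50, 75, 95):
    print(f"p{p}: FastRoute={percentile(fastroute_ms, p)}ms "
          f"third-party={percentile(thirdparty_ms, p)}ms")
```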

These measurements demonstrated that FastRoute frequently delivered the image faster than our third-party CDN, which was sufficient justification to initiate the delivery of the actual application through FastRoute.

5.1.2 Production Traffic Comparison

The above non-production traffic experiment indicated that performance improvements were possible using FastRoute; however, there are many differences between a small image download and our production application. Consequently, we were cautious when moving to FastRoute. Our first production traffic moved onto FastRoute was a small percentage of a single US-based ISP. We configured our DNS servers to direct a random small fraction of users from the ISP's known LDNS IPs to FastRoute (leaving the remainder on the third-party CDN). By analysing the performance data for the random sampling of users and comparing it with the third-party CDN, we were able to ascertain the performance difference between the two CDNs (note that the performance improvement shown is for the entire FastRoute system, user to proxy to data center, not just for proxy selection). This also enabled us to gather confidence that we were functionally equivalent to the third-party CDN. We repeated this flighting of different sets of users at different times and for different durations. We see in Figure 8 that for 10 major ISPs contributing more than 60% of traffic within the United States, all experienced a performance improvement with FastRoute. This initial comparison was undertaken with our initial deployment of only 10 FastRoute nodes throughout the United States (nodes throughout the world were present, but for this analysis we focus on the United States). This data was sufficient to justify increasing the percentage of users directed to FastRoute until 100% of users now pass through FastRoute. Since the time of analysis, we have increased the number of FastRoute nodes, added new applications, and improved our network connectivity to ISPs to further improve user performance.

5.2 Load Management in Production

We designed FastRoute's load manager with a single configurable parameter for each application: the threshold that a metric must be kept under (see Section 3). This metric is collected periodically and the load manager reacts based on the current and previous values of the metric. We see in Figure 9 the traffic patterns of one application running on FastRoute. This application has a particularly spiky metric that had a threshold set to 70%.

Figure 9: One application running on FastRoute had a very spiky traffic pattern within its diurnal. The load manager reacted automatically to divert the appropriate amount of traffic when load crossed the threshold, bringing it back when it had subsided sufficiently.

If the metric went above a hard limit of 100%, it would result in the loss of user traffic. We can see that the spiky nature of the burst in traffic resulted in the load manager offloading traffic quickly to bring the load back under the threshold. Some oscillation occurs around the threshold due to the delayed effects of DNS TTLs, but we control the traffic around the threshold. FastRoute's load management has been in operation for over 2 years.
During this time we have seen a number of scenarios resulting in overloaded proxies (usually on the order of a few incidents per week), including nearby proxies going down, naturally spiky user traffic patterns, and code bugs in the proxy or DNS. FastRoute's load management scheme has provided the required safety net to handle all scenarios during this time without requiring manual intervention to modify routing policies or alter DNS configurations.

5.3 Anycast Availability

A concern when utilizing an anycast-based solution is that the availability of the endpoint will be lower due to route fluctuations. In Figure 10 we show results from a synthetic test where approximately 20,000 Bing toolbar clients throughout the Internet downloaded a small image from an anycast IP announced from all 12 nodes (the full set of nodes at the time of the experiment) and the same image from a third-party (unicast) CDN over a period of a week. Although one datapoint showed the anycast availability dropping to 99.65%, we saw overall availabilities of 99.96% and 99.95% for anycast and the third-party CDN respectively. These results, the success of other anycast TCP based CDNs (e.g. Cloudflare, Edgecast), the work presented in [22], and the lack of issues found in over 2 years serving production traffic (even as the set of nodes grows) indicate that anycast in the Internet is stable enough to run a production network on.

Figure 8: Performance improvements in 10 major US ISPs (contributing above 60% of all user traffic in the US) when using FastRoute compared to a third-party CDN. This data was collected when only 10 FastRoute nodes were in operation and no nodes were overloaded.

Figure 10: Anycast availability over a week compared to a third-party CDN. Note the y-axis starts at 99.4%. The availability over the entire week was 99.96% vs 99.95% respectively.

6 Future Work

Within this paper we have described the architecture of FastRoute, concentrating on its proxy selection mechanism. Future work includes:

- Analyzing the impact that local decisions, made when diverting load, have on the global traffic patterns. In particular, we would like to understand the degree of sub-optimality introduced by making local decisions, compared to making globally optimal decisions centrally.

- Studying the distributed load management algorithm from a control-theoretic perspective, to understand limits on correlation and user traffic for stable system operation.

- Examining the selection of data center for user traffic landing on a proxy, as well as techniques used to prioritize and multiplex user traffic to achieve optimal performance.

- Analyzing the impact to the self-correlation of DNS and proxy traffic when supporting IPv6.

7 Conclusion

We have presented FastRoute, an anycast routing architecture for CDNs that is operational and provides high performance for users. FastRoute's architecture is robust, simple to deploy and operate, scalable, and just complex enough to handle overload conditions due to anycast routing. We highlighted performance gains obtained from our production system when routing users through FastRoute instead of a major third-party CDN. We described a novel load management technique in FastRoute, which uses anycast DNS and multiple anycast proxy rings for load absorption: excess traffic from one layer is directed to another, higher layer using the collocated DNS. We provided data from our production system.


More information

Web Application Hosting Cloud Architecture

Web Application Hosting Cloud Architecture Web Application Hosting Cloud Architecture Executive Overview This paper describes vendor neutral best practices for hosting web applications using cloud computing. The architectural elements described

More information

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution Peer-to-Peer Networks Chapter 6: P2P Content Distribution Chapter Outline Content distribution overview Why P2P content distribution? Network coding Peer-to-peer multicast Kangasharju: Peer-to-Peer Networks

More information

A Link Load Balancing Solution for Multi-Homed Networks

A Link Load Balancing Solution for Multi-Homed Networks A Link Load Balancing Solution for Multi-Homed Networks Overview An increasing number of enterprises are using the Internet for delivering mission-critical content and applications. By maintaining only

More information

John S. Otto Fabián E. Bustamante

John S. Otto Fabián E. Bustamante John S. Otto Fabián E. Bustamante Northwestern, EECS AIMS-4 CAIDA, SDSC, San Diego, CA Feb 10, 2012 http://aqualab.cs.northwestern.edu ! CDNs direct web clients to nearby content replicas! Several motivations

More information

Truffle Broadband Bonding Network Appliance

Truffle Broadband Bonding Network Appliance Truffle Broadband Bonding Network Appliance Reliable high throughput data connections with low-cost & diverse transport technologies PART I Truffle in standalone installation for a single office. Executive

More information

TRUFFLE Broadband Bonding Network Appliance. A Frequently Asked Question on. Link Bonding vs. Load Balancing

TRUFFLE Broadband Bonding Network Appliance. A Frequently Asked Question on. Link Bonding vs. Load Balancing TRUFFLE Broadband Bonding Network Appliance A Frequently Asked Question on Link Bonding vs. Load Balancing 5703 Oberlin Dr Suite 208 San Diego, CA 92121 P:888.842.1231 F: 858.452.1035 info@mushroomnetworks.com

More information

Load Balancing. Final Network Exam LSNAT. Sommaire. How works a "traditional" NAT? Un article de Le wiki des TPs RSM.

Load Balancing. Final Network Exam LSNAT. Sommaire. How works a traditional NAT? Un article de Le wiki des TPs RSM. Load Balancing Un article de Le wiki des TPs RSM. PC Final Network Exam Sommaire 1 LSNAT 1.1 Deployement of LSNAT in a globally unique address space (LS-NAT) 1.2 Operation of LSNAT in conjunction with

More information

AKAMAI WHITE PAPER. Delivering Dynamic Web Content in Cloud Computing Applications: HTTP resource download performance modelling

AKAMAI WHITE PAPER. Delivering Dynamic Web Content in Cloud Computing Applications: HTTP resource download performance modelling AKAMAI WHITE PAPER Delivering Dynamic Web Content in Cloud Computing Applications: HTTP resource download performance modelling Delivering Dynamic Web Content in Cloud Computing Applications 1 Overview

More information

Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network

Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network White paper Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network July 2001 Executive Summary Rich media content like audio and video streaming over the Internet is becoming

More information

EECS 489 Winter 2010 Midterm Exam

EECS 489 Winter 2010 Midterm Exam EECS 489 Winter 2010 Midterm Exam Name: This is an open-book, open-resources exam. Explain or show your work for each question. Your grade will be severely deducted if you don t show your work, even if

More information

Meeting Worldwide Demand for your Content

Meeting Worldwide Demand for your Content Meeting Worldwide Demand for your Content Evolving to a Content Delivery Network A Lucent Technologies White Paper By L. R. Beaumont 4/25/01 Meeting Worldwide Demand for your Content White Paper Table

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

DATA COMMUNICATOIN NETWORKING

DATA COMMUNICATOIN NETWORKING DATA COMMUNICATOIN NETWORKING Instructor: Ouldooz Baghban Karimi Course Book: Computer Networking, A Top-Down Approach, Kurose, Ross Slides: - Course book Slides - Slides from Princeton University COS461

More information

Network-Wide Class of Service (CoS) Management with Route Analytics. Integrated Traffic and Routing Visibility for Effective CoS Delivery

Network-Wide Class of Service (CoS) Management with Route Analytics. Integrated Traffic and Routing Visibility for Effective CoS Delivery Network-Wide Class of Service (CoS) Management with Route Analytics Integrated Traffic and Routing Visibility for Effective CoS Delivery E x e c u t i v e S u m m a r y Enterprise IT and service providers

More information

networks Live & On-Demand Video Delivery without Interruption Wireless optimization the unsolved mystery WHITE PAPER

networks Live & On-Demand Video Delivery without Interruption Wireless optimization the unsolved mystery WHITE PAPER Live & On-Demand Video Delivery without Interruption Wireless optimization the unsolved mystery - Improving the way the world connects - WHITE PAPER Live On-Demand Video Streaming without Interruption

More information

DEPLOYMENT GUIDE Version 1.1. DNS Traffic Management using the BIG-IP Local Traffic Manager

DEPLOYMENT GUIDE Version 1.1. DNS Traffic Management using the BIG-IP Local Traffic Manager DEPLOYMENT GUIDE Version 1.1 DNS Traffic Management using the BIG-IP Local Traffic Manager Table of Contents Table of Contents Introducing DNS server traffic management with the BIG-IP LTM Prerequisites

More information

ALTO and Content Delivery Networks dra7- penno- alto- cdn

ALTO and Content Delivery Networks dra7- penno- alto- cdn ALTO and Content Delivery Networks dra7- penno- alto- cdn Stefano Previdi, sprevidi@cisco.com Richard Alimi, ralimi@google.com Jan Medved, jmedved@juniper.net Reinaldo Penno, rpenno@juniper.net Richard

More information

Where Do You Tube? Uncovering YouTube Server Selection Strategy

Where Do You Tube? Uncovering YouTube Server Selection Strategy Where Do You Tube? Uncovering YouTube Server Selection Strategy Vijay Kumar Adhikari, Sourabh Jain, Zhi-Li Zhang University of Minnesota- Twin Cities Abstract YouTube is one of the most popular video sharing

More information

Distributed Systems. 25. Content Delivery Networks (CDN) 2014 Paul Krzyzanowski. Rutgers University. Fall 2014

Distributed Systems. 25. Content Delivery Networks (CDN) 2014 Paul Krzyzanowski. Rutgers University. Fall 2014 Distributed Systems 25. Content Delivery Networks (CDN) Paul Krzyzanowski Rutgers University Fall 2014 November 16, 2014 2014 Paul Krzyzanowski 1 Motivation Serving web content from one location presents

More information

Network Level Multihoming and BGP Challenges

Network Level Multihoming and BGP Challenges Network Level Multihoming and BGP Challenges Li Jia Helsinki University of Technology jili@cc.hut.fi Abstract Multihoming has been traditionally employed by enterprises and ISPs to improve network connectivity.

More information

Indirection. science can be solved by adding another level of indirection" -- Butler Lampson. "Every problem in computer

Indirection. science can be solved by adding another level of indirection -- Butler Lampson. Every problem in computer Indirection Indirection: rather than reference an entity directly, reference it ( indirectly ) via another entity, which in turn can or will access the original entity A x B "Every problem in computer

More information

1 2014 2013 Infoblox Inc. All Rights Reserved. Talks about DNS: architectures & security

1 2014 2013 Infoblox Inc. All Rights Reserved. Talks about DNS: architectures & security 1 2014 2013 Infoblox Inc. All Rights Reserved. Talks about DNS: architectures & security Agenda Increasing DNS availability using DNS Anycast Opening the internal DNS Enhancing DNS security DNS traffic

More information

Internet Protocol: IP packet headers. vendredi 18 octobre 13

Internet Protocol: IP packet headers. vendredi 18 octobre 13 Internet Protocol: IP packet headers 1 IPv4 header V L TOS Total Length Identification F Frag TTL Proto Checksum Options Source address Destination address Data (payload) Padding V: Version (IPv4 ; IPv6)

More information

A Topology-Aware Relay Lookup Scheme for P2P VoIP System

A Topology-Aware Relay Lookup Scheme for P2P VoIP System Int. J. Communications, Network and System Sciences, 2010, 3, 119-125 doi:10.4236/ijcns.2010.32018 Published Online February 2010 (http://www.scirp.org/journal/ijcns/). A Topology-Aware Relay Lookup Scheme

More information

Inter-domain Routing Basics. Border Gateway Protocol. Inter-domain Routing Basics. Inter-domain Routing Basics. Exterior routing protocols created to:

Inter-domain Routing Basics. Border Gateway Protocol. Inter-domain Routing Basics. Inter-domain Routing Basics. Exterior routing protocols created to: Border Gateway Protocol Exterior routing protocols created to: control the expansion of routing tables provide a structured view of the Internet by segregating routing domains into separate administrations

More information

Denial of Service Attacks and Resilient Overlay Networks

Denial of Service Attacks and Resilient Overlay Networks Denial of Service Attacks and Resilient Overlay Networks Angelos D. Keromytis Network Security Lab Computer Science Department, Columbia University Motivation: Network Service Availability Motivation:

More information

Content Delivery Networks. Shaxun Chen April 21, 2009

Content Delivery Networks. Shaxun Chen April 21, 2009 Content Delivery Networks Shaxun Chen April 21, 2009 Outline Introduction to CDN An Industry Example: Akamai A Research Example: CDN over Mobile Networks Conclusion Outline Introduction to CDN An Industry

More information

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect ashutosh_shinde@hotmail.com Validating if the workload generated by the load generating tools is applied

More information

Intelligent Routing Platform White Paper

Intelligent Routing Platform White Paper White Paper Table of Contents 1. Executive Summary...3 2. The Challenge of a Multi-Homed Environment...4 3. Network Congestion and Blackouts...4 4. Intelligent Routing Platform...5 4.1 How It Works...5

More information

Experimentation with the YouTube Content Delivery Network (CDN)

Experimentation with the YouTube Content Delivery Network (CDN) Experimentation with the YouTube Content Delivery Network (CDN) Siddharth Rao Department of Computer Science Aalto University, Finland siddharth.rao@aalto.fi Sami Karvonen Department of Computer Science

More information

The Importance of High Customer Experience

The Importance of High Customer Experience SoftLayer Investments Drive Growth and Improved Customer Experience A Neovise Vendor Perspective Report 2010 Neovise, LLC. All Rights Reserved. Executive Summary Hosting and datacenter services provider

More information

Cisco IOS Flexible NetFlow Technology

Cisco IOS Flexible NetFlow Technology Cisco IOS Flexible NetFlow Technology Last Updated: December 2008 The Challenge: The ability to characterize IP traffic and understand the origin, the traffic destination, the time of day, the application

More information

Networking Topology For Your System

Networking Topology For Your System This chapter describes the different networking topologies supported for this product, including the advantages and disadvantages of each. Select the one that best meets your needs and your network deployment.

More information

Testing & Assuring Mobile End User Experience Before Production. Neotys

Testing & Assuring Mobile End User Experience Before Production. Neotys Testing & Assuring Mobile End User Experience Before Production Neotys Agenda Introduction The challenges Best practices NeoLoad mobile capabilities Mobile devices are used more and more At Home In 2014,

More information

Scaling with Zeus Global Load Balancer

Scaling with Zeus Global Load Balancer White Paper Scaling with Zeus Global Load Balancer Zeus. Why wait Contents Introduction... 3 Server Load Balancing within a Datacenter... 3 Global Server Load Balancing between Datacenters... 3 Who might

More information

Whitepaper. A Guide to Ensuring Perfect VoIP Calls. www.sevone.com blog.sevone.com info@sevone.com

Whitepaper. A Guide to Ensuring Perfect VoIP Calls. www.sevone.com blog.sevone.com info@sevone.com A Guide to Ensuring Perfect VoIP Calls VoIP service must equal that of landlines in order to be acceptable to both hosts and consumers. The variables that affect VoIP service are numerous and include:

More information

Demand Routing in Network Layer for Load Balancing in Content Delivery Networks

Demand Routing in Network Layer for Load Balancing in Content Delivery Networks Demand Routing in Network Layer for Load Balancing in Content Delivery Networks # A SHRAVANI, 1 M.Tech, Computer Science Engineering E mail: sravaniathome@gmail.com # SYED ABDUL MOEED 2 Asst.Professor,

More information

Global Load Balancing with Brocade Virtual Traffic Manager

Global Load Balancing with Brocade Virtual Traffic Manager WHITE PAPER Global Load Balancing with Brocade Virtual Traffic Manager Introduction Every year, global enterprises suffer application downtime due to failures in software or infrastructure, whether the

More information

Tunnel Broker System Using IPv4 Anycast

Tunnel Broker System Using IPv4 Anycast Tunnel Broker System Using IPv4 Anycast Xin Liu Department of Electronic Engineering Tsinghua Univ. lx@ns.6test.edu.cn Xing Li Department of Electronic Engineering Tsinghua Univ. xing@cernet.edu.cn ABSTRACT

More information

The changing face of global data network traffic

The changing face of global data network traffic The changing face of global data network traffic Around the turn of the 21st century, MPLS very rapidly became the networking protocol of choice for large national and international institutions. This

More information

Content Delivery Networks (CDN) Dr. Yingwu Zhu

Content Delivery Networks (CDN) Dr. Yingwu Zhu Content Delivery Networks (CDN) Dr. Yingwu Zhu Web Cache Architecure Local ISP cache cdn Reverse Reverse Proxy Reverse Proxy Reverse Proxy Proxy L4 Switch Content Content Content Server Content Server

More information

TrustNet CryptoFlow. Group Encryption WHITE PAPER. Executive Summary. Table of Contents

TrustNet CryptoFlow. Group Encryption WHITE PAPER. Executive Summary. Table of Contents WHITE PAPER TrustNet CryptoFlow Group Encryption Table of Contents Executive Summary...1 The Challenges of Securing Any-to- Any Networks with a Point-to-Point Solution...2 A Smarter Approach to Network

More information

Internet Content Distribution

Internet Content Distribution Internet Content Distribution Chapter 4: Content Distribution Networks (TUD Student Use Only) Chapter Outline Basics of content distribution networks (CDN) Why CDN? How do they work? Client redirection

More information

The Effectiveness of Request Redirection on CDN Robustness

The Effectiveness of Request Redirection on CDN Robustness The Effectiveness of Request Redirection on CDN Robustness Limin Wang, Vivek Pai and Larry Peterson Presented by: Eric Leshay Ian McBride Kai Rasmussen 1 Outline! Introduction! Redirection Strategies!

More information

VMDC 3.0 Design Overview

VMDC 3.0 Design Overview CHAPTER 2 The Virtual Multiservice Data Center architecture is based on foundation principles of design in modularity, high availability, differentiated service support, secure multi-tenancy, and automated

More information

Network Positioning System

Network Positioning System Network Positioning System How service provider infrastructure can support rapid growth of video, cloud and application traffic Stefano Previdi sprevidi@cisco.com Distinguished Engineer Cisco Systems 1

More information

A Framework for Scalable Global IP-Anycast (GIA)

A Framework for Scalable Global IP-Anycast (GIA) A Framework for Scalable Global IP-Anycast (GIA) Dina Katabi, John Wroclawski MIT Laboratory for Computer Science 545 Technology Square Cambridge, MA 02139 {dina,jtw}@lcs.mit.edu ABSTRACT This paper proposes

More information

HUAWEI OceanStor 9000. Load Balancing Technical White Paper. Issue 01. Date 2014-06-20 HUAWEI TECHNOLOGIES CO., LTD.

HUAWEI OceanStor 9000. Load Balancing Technical White Paper. Issue 01. Date 2014-06-20 HUAWEI TECHNOLOGIES CO., LTD. HUAWEI OceanStor 9000 Load Balancing Technical Issue 01 Date 2014-06-20 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved. No part of this document may be

More information

CS514: Intermediate Course in Computer Systems

CS514: Intermediate Course in Computer Systems : Intermediate Course in Computer Systems Lecture 7: Sept. 19, 2003 Load Balancing Options Sources Lots of graphics and product description courtesy F5 website (www.f5.com) I believe F5 is market leader

More information

Backbone Capacity Planning Methodology and Process

Backbone Capacity Planning Methodology and Process Backbone Capacity Planning Methodology and Process A Technical Paper prepared for the Society of Cable Telecommunications Engineers By Leon Zhao Senior Planner, Capacity Time Warner Cable 13820 Sunrise

More information

Optimize Application Delivery Across Your Globally Distributed Data Centers

Optimize Application Delivery Across Your Globally Distributed Data Centers BIG IP Global Traffic Manager DATASHEET What s Inside: 1 Key Benefits 2 Globally Available Applications 4 Simple Management 5 Secure Applications 6 Network Integration 6 Architecture 7 BIG-IP GTM Platforms

More information

Akamai CDN, IPv6 and DNS security. Christian Kaufmann Akamai Technologies DENOG 5 14 th November 2013

Akamai CDN, IPv6 and DNS security. Christian Kaufmann Akamai Technologies DENOG 5 14 th November 2013 Akamai CDN, IPv6 and DNS security Christian Kaufmann Akamai Technologies DENOG 5 14 th November 2013 Agenda Akamai Introduction Who s Akamai? Intelligent Platform & Traffic Snapshot Basic Technology Akamai

More information

Facility Usage Scenarios

Facility Usage Scenarios Facility Usage Scenarios GDD-06-41 GENI: Global Environment for Network Innovations December 22, 2006 Status: Draft (Version 0.1) Note to the reader: this document is a work in progress and continues to

More information

Content Delivery Networks

Content Delivery Networks Content Delivery Networks Terena 2000 ftp://ftpeng.cisco.com/sgai/t2000cdn.pdf Silvano Gai Cisco Systems, USA Politecnico di Torino, IT sgai@cisco.com Terena 2000 1 Agenda What are Content Delivery Networks?

More information

TRUFFLE Broadband Bonding Network Appliance BBNA6401. A Frequently Asked Question on. Link Bonding vs. Load Balancing

TRUFFLE Broadband Bonding Network Appliance BBNA6401. A Frequently Asked Question on. Link Bonding vs. Load Balancing TRUFFLE Broadband Bonding Network Appliance BBNA6401 A Frequently Asked Question on Link Bonding vs. Load Balancing LBRvsBBNAFeb15_08b 1 Question: What's the difference between a Truffle Broadband Bonding

More information

DOMINO Broadband Bonding Network

DOMINO Broadband Bonding Network 2 DOMINO AGGREGATION DE VOIES ETHERNET N 1 Bridging to the Future par [Hypercable] DOMINO DOMINO Broadband BondingTM Network Appliance With cellular data card failover/aggregation capability DANS CE NUMERO

More information

Efficient and low cost Internet backup to Primary Video lines

Efficient and low cost Internet backup to Primary Video lines Efficient and low cost Internet backup to Primary Video lines By Adi Rozenberg, CTO Table of Contents Chapter 1. Introduction... 1 Chapter 2. The DVP100 solution... 2 Chapter 3. VideoFlow 3V Technology...

More information

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches Introduction For companies that want to quickly gain insights into or opportunities from big data - the dramatic volume growth in corporate

More information

DNS, CDNs Weds March 17 2010 Lecture 13. What is the relationship between a domain name (e.g., youtube.com) and an IP address?

DNS, CDNs Weds March 17 2010 Lecture 13. What is the relationship between a domain name (e.g., youtube.com) and an IP address? DNS, CDNs Weds March 17 2010 Lecture 13 DNS What is the relationship between a domain name (e.g., youtube.com) and an IP address? DNS is the system that determines this mapping. Basic idea: You contact

More information

Making the Internet fast, reliable and secure. DE-CIX Customer Summit - 2014. Steven Schecter <schecter@akamai.com>

Making the Internet fast, reliable and secure. DE-CIX Customer Summit - 2014. Steven Schecter <schecter@akamai.com> Making the Internet fast, reliable and secure DE-CIX Customer Summit - 2014 Steven Schecter What is a Content Distribution Network RFCs and Internet Drafts define a CDN as: Content

More information

Validating the System Behavior of Large-Scale Networked Computers

Validating the System Behavior of Large-Scale Networked Computers Validating the System Behavior of Large-Scale Networked Computers Chen-Nee Chuah Robust & Ubiquitous Networking (RUBINET) Lab http://www.ece.ucdavis.edu/rubinet Electrical & Computer Engineering University

More information

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs Decoding DNS data Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs The Domain Name System (DNS) is a core component of the Internet infrastructure,

More information

Week 3 / Paper 2. Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, Steve Uhlig ACM IMC 2010.

Week 3 / Paper 2. Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, Steve Uhlig ACM IMC 2010. Week 3 / Paper 2 Comparing DNS Resolvers in the Wild Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, Steve Uhlig ACM IMC 2010. Main point How does ISP DNS compare with Google DNS and OpenDNS?

More information

Measuring CDN Performance. Hooman Beheshti, VP Technology

Measuring CDN Performance. Hooman Beheshti, VP Technology Measuring CDN Performance Hooman Beheshti, VP Technology Why this matters Performance is one of the main reasons we use a CDN Seems easy to measure, but isn t Performance is an easy way to comparison shop

More information

Technical Bulletin. Enabling Arista Advanced Monitoring. Overview

Technical Bulletin. Enabling Arista Advanced Monitoring. Overview Technical Bulletin Enabling Arista Advanced Monitoring Overview Highlights: Independent observation networks are costly and can t keep pace with the production network speed increase EOS eapi allows programmatic

More information

SwiftStack Global Cluster Deployment Guide

SwiftStack Global Cluster Deployment Guide OpenStack Swift SwiftStack Global Cluster Deployment Guide Table of Contents Planning Creating Regions Regions Connectivity Requirements Private Connectivity Bandwidth Sizing VPN Connectivity Proxy Read

More information

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Scalability of web applications. CSCI 470: Web Science Keith Vertanen Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches

More information

F5 Intelligent DNS Scale. Philippe Bogaerts Senior Field Systems Engineer mailto: p.bogaerts@f5.com Mob.: +32 473 654 689

F5 Intelligent DNS Scale. Philippe Bogaerts Senior Field Systems Engineer mailto: p.bogaerts@f5.com Mob.: +32 473 654 689 F5 Intelligent Scale Philippe Bogaerts Senior Field Systems Engineer mailto: p.bogaerts@f5.com Mob.: +32 473 654 689 Intelligent and scalable PROTECTS web properties and brand reputation IMPROVES web application

More information

How To Manage A Network On A Network With A Global Server (Networking)

How To Manage A Network On A Network With A Global Server (Networking) HIGH AVAILABILITY STRATEGY - GLOBAL TRAFFIC MANAGEMENT PROTOTYPE REPORT Version 1-00 Document Control Number 2460-00004 11/04/2008 Consortium for Ocean Leadership 1201 New York Ave NW, 4 th Floor, Washington

More information

Alteon Global Server Load Balancing

Alteon Global Server Load Balancing Alteon Global Server Load Balancing Whitepaper GSLB Operation Overview Major Components Distributed Site Monitoring Distributed Site State Protocol Internet Topology Awareness DNS Authoritative Name Server

More information

Photonic Switching Applications in Data Centers & Cloud Computing Networks

Photonic Switching Applications in Data Centers & Cloud Computing Networks Photonic Switching Applications in Data Centers & Cloud Computing Networks 2011 CALIENT Technologies www.calient.net 1 INTRODUCTION In data centers and networks, video and cloud computing are driving an

More information

Introduction. The Inherent Unpredictability of IP Networks # $# #

Introduction. The Inherent Unpredictability of IP Networks # $# # Introduction " $ % & ' The Inherent Unpredictability of IP Networks A major reason that IP became the de facto worldwide standard for data communications networks is its automated resiliency based on intelligent

More information