AKAMAI WHITE PAPER Delivering Dynamic Web Content in Cloud Computing Applications: HTTP resource download performance modelling
Overview
This white paper models the time it takes a client (e.g. a web browser) to fetch an HTTP resource from the origin server, based on a small set of parameters, and compares the expected performance of the direct (BGP) client-origin path with that of a CDN (e.g. Akamai SureRoute) client-edge-intermediary-origin path. Along the way, we highlight the main parameters impacting performance and provide origin settings recommendations to improve the end-user experience. Based on this model, we created an interactive web page where the reader can visualize the expected download time of a dynamic website resource as a function of the client-server distance.

Introduction
Despite the relentless progress in processing power and ever-increasing network bandwidths, web pages on average are taking longer to load (see the HTTP Archive). Yet it is now well established that end-user satisfaction is directly impacted by the speed at which a page is served. An obvious solution is to cache content at the edge, considerably reducing its latency. However, even cached pages have to be refreshed from the origin once in a while, and whole categories of resources cannot be cached at all, either because they cannot afford any staleness or because they hold non-shareable information. To address the acceleration of non-cached resources, Akamai uses its SureRoute technology, which picks the fastest of up to 3 concurrent paths from an edge server to the origin infrastructure: one of them going directly (BGP-driven) to the origin (when not protected by SiteShield), the other 2 going through intermediary servers.
Given the myriad of parameters contributing to web resource latencies and the complexity inherent in the TCP protocol, it is not surprising that little information has been made available on the internet so far that allows a reader to fully understand, let alone predict, the time it takes a client device to download a web resource from its origin server.

[Figure: Main Parameters Impacting HTTP Download Performance. Global settings: HTTP vs. HTTPS, resource size, maximum transmission unit. Client settings: RWIN (TCP receive buffer size), new or existing connection, connection persistence. Last mile: bandwidth, RTT, loss, ISP/carrier exit point. Middle mile: RTT, loss. First mile (origin infrastructure): response generation, TCP initial congestion window, connection persistence.]
This document aims to fill this gap. Using empirical measurements from the Akamai platform, well-documented TCP behaviors and the latest State of the Internet report data, we implemented a predictive model and layered a graphical representation on top of it, so the reader can visually gauge the performance differences, whether or not Akamai SureRoute is used to fetch content from the origin.

Round Trip Time (RTT)
This is one of the main parameters influencing download times, yet it has proven difficult to model either mathematically or empirically. RTT does have a lower bound: the time it would take a ray of light in a vacuum to travel in a straight line between the client and its origin server and back. In real-life situations, however, the RTT of a single packet is much higher, for the following reasons:

First mile: In most cases, the origin infrastructure's first mile does not incur significant additional RTT, and we'll therefore omit it for the rest of the document.

Middle mile:
- Non-straight lines between the source and the destination. This is primarily the result of ISP business peering relationships, which are not always motivated by ensuring the fastest IP packet delivery! It is common to see the public internet route, driven by BGP tables, be more than twice as long as the geometric distance.
- The speed of light in fiber optics is about 40% slower than in a vacuum.
- Congestion and delays incurred in routers and other internet hops.
- Unreliable networks: hardware failures, DDoS attacks, misconfigurations, cable cuts or de-peering can all combine to increase latency, or even cause outright failures in extreme cases.

Last mile: Many residential users connect to the internet backbone through a less efficient line provided by their ISP, incurring a last-mile RTT penalty, typically in the 20 to 45 ms range. On mobile devices, this penalty jumps to 60 ms for 4G LTE and up to 300 ms for older 3G networks.
Let's call RTT Ratio, or RTTR, the ratio between the observed RTT on the public internet and the fastest theoretical RTT. To model its middle-mile value, we leveraged the large footprint of the Akamai platform and conducted sets of ping measurements spread across the world. Throughout the US and Europe, we tracked RTTR values against the geometric distance between source and destination. We found that, on average and for distances greater than 500 miles:
- RTTR is geographically uniform and fairly independent of the source-target distance, at least throughout the Americas and Europe
- It holds the same value over time
- It has a median value of 2.9, with the vast majority of data points between 2 and 4
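The RTTR definition above can be sketched as a short worked computation (ours, not Akamai's): the round trip over a straight line of D miles covers 2·D miles at the speed of light in a vacuum, about 186,282 miles per second.

```python
# Theoretical lower bound on RTT, and the RTT Ratio derived from it.
# Helper names are ours, for illustration only.

SPEED_OF_LIGHT_MILES_PER_MS = 186.282

def vacuum_rtt_ms(distance_miles):
    """Fastest theoretical RTT over a straight line of distance_miles."""
    return 2 * distance_miles / SPEED_OF_LIGHT_MILES_PER_MS

def rttr(observed_middle_mile_rtt_ms, distance_miles):
    """RTT Ratio: observed middle-mile RTT over the theoretical minimum."""
    return observed_middle_mile_rtt_ms / vacuum_rtt_ms(distance_miles)

# vacuum_rtt_ms(1000) is about 10.7 ms; an observed 31 ms middle-mile RTT
# over 1000 miles therefore corresponds to an RTTR of about 2.9.
```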
From these experimental results, we can therefore model:

Observed RTT (ms) = Last-mile RTT (ms) + RTTR * 0.0108 * Distance (miles)

or, using the empirical median RTTR of 2.9:

Middle-mile RTT (ms) ~ 3.1% * Distance (miles)

We can now plot the expected RTT as a function of the client-origin distance for common use cases:

[Figure: Expected RTT (ms, 0 to 280) vs. distance (miles, 0 to 2500) for three cases: lower-bound RTT (5 ms LAN RTT, RTTR = 2), average RTT (30 ms ISP RTT, RTTR = 3) and higher-bound RTT (200 ms mobile RTT, RTTR = 4).]

Throughput
Next to RTT, throughput is another important factor determining a resource's download latency. It is the maximum amount of data that can be downloaded per unit of time. 3 factors limit it:
- The maximum amount of data yet to be acknowledged by the client (in flight): Throughput <= RWIN / RTT, where the client receive window (RWIN) is commonly set to 65.7 KB.
- The connection's packet loss probability: Throughput <= C * MSS / (RTT * sqrt(packet loss)), where the maximum segment size (MSS) is 1460 bytes for the vast majority of connections and 1 <= C <= 1.3; we'll use C = 1 for the remainder of this document.
- The maximum bandwidth provided by the ISP to the user, usually in the 2 to 20+ Mbps range for residential users and much higher in LAN settings.
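The RTT model and the first two throughput bounds can be sketched in a few lines of Python. The loss-based bound is the well-known Mathis et al. approximation; parameter names and default values here are ours, for illustration only.

```python
# Sketch of the RTT model and the two TCP throughput bounds above.
# 0.0108 ms/mile is the round-trip time of light in a vacuum per mile
# of straight-line distance (2 / 186.282 miles-per-ms).

MSS_BYTES = 1460   # maximum segment size
C = 1.0            # Mathis-formula constant, 1 <= C <= 1.3

def observed_rtt_ms(distance_miles, last_mile_rtt_ms=30.0, rttr=3.0):
    """Observed RTT = last-mile RTT + RTTR * vacuum round-trip time."""
    return last_mile_rtt_ms + rttr * 0.0108 * distance_miles

def max_throughput_kb_per_ms(rtt_ms, loss_prob, rwin_kb=65.7):
    """Throughput ceiling: min of the RWIN bound and the loss bound."""
    rwin_bound = rwin_kb / rtt_ms                                       # RWIN / RTT
    loss_bound = C * (MSS_BYTES / 1000.0) / (rtt_ms * loss_prob ** 0.5)  # Mathis
    return min(rwin_bound, loss_bound)

# Middle-mile RTT over 1000 miles at the median RTTR of 2.9:
# 2.9 * 0.0108 * 1000 ~ 31 ms, i.e. ~3.1% of the distance in miles.
```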
Let's review their relative impact. The graph below represents the maximum throughput as a function of the packet loss percentage for various values of RTT:

[Figure: Maximum throughput (Mbps, 0 to 24) vs. percent packet loss (log scale) for RTT = 25, 50, 100 and 200 ms.]

The line break is hit at a loss probability of about 0.05%. For smaller loss probabilities, RWIN limits the throughput; for higher ones, the loss probability bounds it.

From information gathered by Akamai in its State of the Internet quarterly reports (Q4 2013), we know that the average worldwide connection speed for end users is about 4 Mbps (10 Mbps in the US), with peak average values over 20 Mbps (over 40 Mbps in the US). 20 Mbps is also the maximum throughput allowed by an RWIN of 65.7 KB at an RTT of 25 ms. Since the last-mile connection time alone is 25 ms or more for most users, it is reasonable to conclude that the maximum bandwidth provided by the ISP is rarely a limiting factor. Users with close or direct access to the internet backbone, with virtually no last-mile time penalty, usually also enjoy very large bandwidths, so even for shorter RTTs the bandwidth is rarely a bottleneck.

Similarly, because most RTT values are greater than 25 ms and loss probabilities are greater than 0.05%, the RWIN size is generally not a limiting factor for TCP object downloads. For users close to the backbone, enjoying a virtually lossless connection and very high bandwidth, the prevalent 65.7 KB RWIN may become a bottleneck for large file downloads, and such users may consider increasing it. For these users, however, performance is generally not a significant issue, so the need for optimization is not as critical.

To conclude this chapter, we've shown that for most users and most websites, the time it takes to download objects is limited neither by the last-mile advertised bandwidth nor by the standard client receive buffer size.
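The position of the line break can be derived directly (our own check, using the bounds from the throughput section): setting RWIN / RTT equal to C * MSS / (RTT * sqrt(p)), the RTT cancels, so the crossover loss probability is independent of RTT, as the graph shows.

```python
# Crossover loss probability where the RWIN bound meets the loss bound:
# RWIN / RTT = C * MSS / (RTT * sqrt(p))  =>  p* = (C * MSS / RWIN) ** 2

MSS = 1460.0     # bytes
RWIN = 65_700.0  # bytes (the common 65.7 KB receive window)
C = 1.0

crossover_loss = (C * MSS / RWIN) ** 2
print(f"{crossover_loss:.4%}")  # about 0.05%, matching the graph
```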
Resource Download Time
Before we can compute the overall latency of a downloaded object, and due to TCP's implementation behavior, we have to consider 2 separate cases: whether the object is downloaded through an existing connection or whether it initiates a new one.

New Connection
The amount of data transferred during each round trip is dictated by the TCP slow-start mechanism, in which the payload doubles in size with each round trip after the initial connection is established. This initial connection takes one round trip for non-secure (HTTP) communications, or 3 round trips for secure ones (HTTPS). With a maximum segment size of 1460 bytes, we can graph the amount of data transferred per round trip as a function of the initial window size (in number of packets, or frames):

[Figure: Data transferred per round trip (KB, 0 to 80) over a lossless connection, for round trips 1 to 5, with initcwnd = 3 or 10, over HTTP or HTTPS.]

We readily see the significant impact of the initial window size on the amount of data transferred over new connections. A draft IETF recommendation to raise the default size to 10 segments has been submitted for this purpose. The graph also emphasizes the benefit, for both clients and servers, of keeping connections open so they can be reused for subsequent object downloads.

Existing Connections
The graph above is only valid at the beginning of the connection. At some point in time, the throughput becomes bound by the formula highlighted in the throughput section above and reaches an average steady-state value:

Average Steady-State Throughput (KB/ms) = min[ 1.46 / (RTT * sqrt(packet loss)), RWIN / RTT ]
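The slow-start behavior described above can be sketched as follows, assuming textbook doubling on a lossless connection (our simplification; real stacks cap the window and react to loss). Function names are ours, for illustration only.

```python
# How much data a new connection carries per round trip under slow start,
# and how many round trips it needs to deliver a resource of a given size.

MSS_KB = 1.46  # 1460-byte segments

def data_per_round_trip_kb(initcwnd, n):
    """KB carried in the n-th data-bearing round trip (n = 1, 2, ...):
    the window doubles each round trip, starting at initcwnd segments."""
    return initcwnd * (2 ** (n - 1)) * MSS_KB

def new_connection_rtts(size_kb, initcwnd=10, https=False):
    """Round trips to set up the connection and deliver size_kb of data.
    Setup costs 1 RTT for HTTP, 3 for HTTPS, as noted above."""
    rtts = 3 if https else 1
    sent, n = 0.0, 1
    while sent < size_kb:
        sent += data_per_round_trip_kb(initcwnd, n)
        rtts += 1
        n += 1
    return rtts
```

For example, a 40 KB resource over HTTP takes 5 round trips with initcwnd = 3 but only 3 with initcwnd = 10, which is the benefit the graph illustrates.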
CDN Intermediaries' Impact on Latency
Since the majority of high-traffic websites leverage the services of Akamai or another CDN, it is worth investigating their impact on the delivery of non-cached website content, especially over long distances. When SureRoute is used, an HTTP request flows from the client to an edge server located close to the user (or to his/her ISP exit point), through an intermediary, and on to the origin infrastructure. As such, 2 extra sets of connections are established compared to the direct-to-origin route:

[Figure: the public (BGP) route goes directly from the client to the origin server; the optimized (CDN) route goes from the client to a CDN edge server, then through a CDN intermediary, and on to the origin server.]

Overall latency is computed by adding the latency from the client to the edge server, the latency from the edge server to the intermediary server and, finally, the latency from the intermediary to the origin. Although the initial request spawns 3 distinct connections on its way to the origin infrastructure, the positioning of the Akamai servers relative to the client and origin, their internal TCP settings and their ability to pick alternate routes (compared to the default one picked by BGP) make this alternate path faster in most instances.

Putting It All Together
Based on the data modelling detailed above, we created an interactive web page where the reader can visualize the expected download time of a website's dynamic resources as a function of the client-server distance, with and without SureRoute intermediaries.

As the global leader in Content Delivery Network (CDN) services, Akamai makes the Internet fast, reliable and secure for its customers. The company's advanced web performance, mobile performance, cloud security and media delivery solutions are revolutionizing how businesses optimize consumer, enterprise and entertainment experiences for any device, anywhere.
To learn how Akamai solutions and its team of Internet experts are helping businesses move faster forward, please visit www.akamai.com or blogs.akamai.com, and follow @Akamai on Twitter. Akamai is headquartered in Cambridge, Massachusetts, in the United States, with operations in more than 40 offices around the world. Our services and renowned customer care enable businesses to provide an unparalleled Internet experience for their customers worldwide. Addresses, phone numbers and contact information for all locations are listed on www.akamai.com/locations. ©2015 Akamai Technologies, Inc. All Rights Reserved. Reproduction in whole or in part in any form or medium without express written permission is prohibited. Akamai and the Akamai wave logo are registered trademarks. Other trademarks contained herein are the property of their respective owners. Akamai believes that the information in this publication is accurate as of its publication date; such information is subject to change without notice. Published 05/15.