Topic 5: Content Distribution




Topic 5: Content Distribution. 1. Introduction. 2. Architectures: client-server, Web proxies, content replication. 3. Caching and load balancing. 4. A case study: the Akamai network. Bibliography: [GIL11] Gilbert Held, A Practical Guide to Content Delivery Networks. [FLU95] Fluckiger, Understanding Networked Multimedia. Network architectures for content distribution.

1. Introduction. Objective: to understand the mechanisms and architectures for the efficient and scalable distribution of content on the Internet. To that end, we will review the different content distribution architectures on the Internet and analyze their advantages and disadvantages. We will focus on the concept of a content distribution network (CDN): its definition and its distribution, redirection, and management mechanisms. We will examine a success story: the Akamai network.

[Figure: single-site, single-server architecture.]

Single-site, single-server. Advantages: reduced HW/SW cost. Disadvantages: server failure; HW/SW maintenance while in service; users experience unequal access delays; networking and processing scalability problems.

Single-site, multiple servers (server farm). Advantages: resource load balancing; improved error resilience; HW/SW upgrades without service disruption. Disadvantages: users experience unequal access delays; networking scalability problems (?); increased HW/SW cost.

Multiple sites, single server (mirrors). Advantages: content is closer to users; fast response; tolerates a network failure at the origin server site. Disadvantages: content must be kept up to date; the source server site requires redundant network access services; increased HW/SW cost (?).

Client-side devices: Web proxies. Web/content proxies are client-side agents that access Web content. Caching content saves network resources, reduces server load, and speeds up Web responses.

Web proxies are intermediaries. Proxies play both roles: a server to the client and a client to the server. [Figure: a proxy placed between clients and www.google.com and www.cnn.com.]

Proxy caching. Client #1 requests http://www.foo.com/fun.jpg: the client sends GET fun.jpg to the proxy; the proxy sends GET fun.jpg to the server; the server sends the response to the proxy; the proxy stores the response and forwards it to the client. Client #2 requests http://www.foo.com/fun.jpg: the client sends GET fun.jpg to the proxy; the proxy sends the response to the client from its cache. Benefits: faster response time for clients; lower load on the Web server; reduced bandwidth consumption inside the network.
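The two-client exchange above can be sketched in a few lines of Python. This is a toy model, not a real HTTP proxy: the class name CachingProxy, the fake_origin function, and the origin_requests counter are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy proxy: answers from its cache when it can, otherwise asks the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stands in for "GET to the server"
        self._cache: Dict[str, bytes] = {}   # URL -> stored response body
        self.origin_requests = 0             # how many requests reached the origin

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, served_from_cache)."""
        if url in self._cache:
            return self._cache[url], True
        body = self._fetch(url)
        self.origin_requests += 1
        self._cache[url] = body              # store, then forward to the client
        return body, False


def fake_origin(url: str) -> bytes:
    # Hypothetical origin server: returns a placeholder body for any URL.
    return f"contents of {url}".encode()


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.foo.com/fun.jpg")    # client #1: cache miss
second = proxy.get("http://www.foo.com/fun.jpg")   # client #2: cache hit
```

Only the first request reaches the origin, which is where the lower server load and the bandwidth saving come from.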

Getting requests to the proxy. Explicit configuration: the browser is configured to use a proxy and directs all requests through it. Problem: requires user action. Transparent proxy (or "interception proxy"): the proxy lies in the path from the client to the servers, intercepts packets en route to the server, and interposes itself in the data transfer. Benefit: does not require user action.

Challenges of transparent proxies. All packets must pass by the proxy; this is ensured by placing it at the only access point to the Internet, e.g., at the border router of a campus or company. Overhead of reconstructing the requests: the proxy must intercept the packets as they fly by and reconstruct them into an ordered byte stream. May be viewed as a violation of user privacy: the user does not know the proxy lies in the path, and the proxy may be keeping logs of the user's requests.

Other functions of Web proxies. Anonymization: the server sees requests coming from the proxy address rather than from the individual user IP addresses. Transcoding: converting data from one form to another, e.g., reducing the size of images for cell-phone browsers. Prefetching: requesting content before the user asks for it. Filtering: blocking access to sites, based on URL or content.

Content providers/consumers. Content providers and consumers are interested in being able to offer and access content efficiently, reliably, securely, and inexpensively. Providers deploy server farms and replicas; consumers deploy Web proxies. But there is an alternative solution.

Third parties: content delivery networks

Business model: content distribution networks (CDN). A content provider such as www.cnn.com or Yahoo pays a CDN company (such as Akamai) to get its content to the requesting users with short delays. A CDN provides a mechanism for replicating content on multiple servers in the Internet and for giving clients a means to determine the servers that can deliver the content fastest.

CDN terminology. Content: any publicly accessible combination of text, images, applets, frames, MP3, video, Flash, virtual reality objects, etc. Content provider: any individual, organization, or company that has content it wishes to make available to users. Origin server: the content provider's server, where the content is first uploaded. Surrogate server (sometimes called edge server): the content distributor's server, where the replicated content is kept. Full-site delivery: all the contents are delivered by the CDN (including HTML, images, and other objects). Partial-site delivery: only images, streaming media, and other bandwidth-intensive objects are delivered by the CDN.

CDNs and content. Content suitable for CDNs: images, streaming media, Java applets, static information. Content not suitable: dynamic information, personalized information.

CDN players. Content providers: Yahoo, MSNBC, CNN. H/W and S/W vendors: Cisco, Lucent, Inktomi, CacheFlow. Content distributors: Akamai, Digital Island, AT&T. Hosting provider: Exodus.

CDN: distribution. The CDN company places hundreds of CDN servers in Internet hosting centers. The CDN replicates its customers' content in the CDN servers. Whenever a customer updates its content (e.g., a Web page), the CDN redistributes the fresh content to the CDN servers. The CDN provides a mechanism so that when a user requests content, the content is provided by the CDN server that can deliver it to the user most rapidly. This can be the CDN server closest to the user (perhaps in the same ISP as the user), or it may be a CDN server with a congestion-free path to the user.

[Figure: CDN distribution. The origin server in North America pushes content to a CDN distribution node, which pushes content to CDN servers in South America, Europe, and Asia.]

CDN: functional components. Distribution service; redirection service; accounting and billing system.

CDN: distribution service. The content provider determines which of its objects it wants the CDN to distribute. The content provider tags and then pushes this content to a CDN node, which in turn replicates and pushes the content to all its CDN servers. When a browser in a user's host is instructed to retrieve a specific object (specified using a URL), how does the browser determine whether it should retrieve the object from the origin server or from one of the CDN servers? As an example, suppose the hostname of the content provider is www.cnn.com and the hostname of the CDN company is www.akamai.com.

CDN: redirection. Users get an HTML document from www.cnn.com; this could be index.html. The file index.html uses a modified URL for content that has been replicated. Example: if the GIF files are what has been replicated, then <img src="http://cnn.com/af/x.gif"> may be modified as follows: <img src="http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif">. The browser needs to resolve the axyz.g.akamaitech.net hostname for replicated content. DNS is configured so that all queries about g.akamaitech.net are sent to its authoritative DNS server. This is referred to as an Akamai DNS server (authoritative DNS server).
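A minimal sketch of the URL modification described above, assuming the replicated objects are identified by file extension. The function name akamaize and its parameters are hypothetical; the CDN host and the /7/23 path prefix are copied from the slide's example and are not meant as a description of Akamai's actual tooling.

```python
import re

CDN_HOST = "a73.g.akamaitech.net"   # host from the slide's example
CDN_PREFIX = "/7/23"                # control path from the slide's example


def akamaize(html: str, origin_host: str, extensions=(".gif",)) -> str:
    """Rewrite src URLs on origin_host so replicated objects point at the CDN."""
    pattern = re.compile(
        r'(src=")http://' + re.escape(origin_host) + r'(/[^"]*)"'
    )

    def rewrite(match: "re.Match[str]") -> str:
        path = match.group(2)
        if not path.lower().endswith(tuple(extensions)):
            return match.group(0)   # not replicated: leave the URL untouched
        return f'{match.group(1)}http://{CDN_HOST}{CDN_PREFIX}/{origin_host}{path}"'

    return pattern.sub(rewrite, html)


page = '<img src="http://cnn.com/af/x.gif"><img src="http://cnn.com/story.html">'
rewritten = akamaize(page, "cnn.com")
```

The original hostname and path survive inside the rewritten URL, so a surrogate that misses in its cache still knows where to fetch the object.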

CDN: redirection. When the Akamai DNS server receives the query, it extracts the IP address of the requester (in practice, the address of the local DNS server that forwards the browser's query). Based on that IP address and on information that it has about the Internet (called a map), it returns the IP address of an Akamai server (surrogate server) chosen according to a policy, e.g., select the server that is the fewest hops away. The Akamai DNS server's IP address is now in the cache of the local DNS server, so it is not always necessary to go to the root DNS server. The TTL associated with the IP address of an Akamai server (surrogate) is relatively small. This is done for performance reasons. Akamai content distribution servers are caches.

[Figure: CDN redirection. The client sends GET www.cnn.com/index.html to CNN.com and receives index.html, which contains <img src="http://www.cdn.com/cnn/images/1.gif">. The client's local DNS server then sends the DNS query "cdn.com?" to the authoritative DNS server for cdn.com, which answers 64.236.24.28.]

CDN redirection: what if the content is not there? If the requested content is not found, the surrogate asks other surrogates within a specified region for it. If the requested content is still not found, or is stale, a request is made to the original Web site.

CDN selection. The tricky issue is selecting which local content server to use for a particular request. We want to spread load evenly, and we want minimal impact if a server is added or removed. In Akamai, each surrogate server sends measurement results to the Network Operations Command Center (NOCC). Measurement results include the number of active TCP connections, the HTTP request arrival rate, bandwidth availability, etc. This information is used by the Akamai DNS server.
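One way a DNS server could use such measurements is sketched below. The selection rule (skip servers with no spare bandwidth, then prefer the fewest active TCP connections and the lowest request rate) is an assumption made for illustration, as are the names SurrogateReport and pick_surrogate and the example addresses; the slides do not state Akamai's actual policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurrogateReport:
    """Measurements a surrogate reports, as listed on the slide."""
    address: str
    active_tcp_connections: int
    http_requests_per_sec: float
    free_bandwidth_mbps: float


def pick_surrogate(reports: List[SurrogateReport]) -> SurrogateReport:
    """Hypothetical policy: least-loaded server that still has spare bandwidth."""
    candidates = [r for r in reports if r.free_bandwidth_mbps > 0]
    if not candidates:
        raise LookupError("no surrogate with spare bandwidth")
    return min(
        candidates,
        key=lambda r: (r.active_tcp_connections, r.http_requests_per_sec),
    )


reports = [
    SurrogateReport("192.0.2.10", 120, 300.0, 40.0),
    SurrogateReport("192.0.2.11", 35, 90.0, 75.0),
    SurrogateReport("192.0.2.12", 5, 10.0, 0.0),   # idle but saturated link
]
chosen = pick_surrogate(reports)
```

Returning the chosen address with a short TTL lets the decision be revisited as soon as the measurements change.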

Accounting mechanism. Accounting mechanisms collect and track information related to request routing, distribution, and delivery. Information is gathered in real time and put into log files for each CDN component. These logs are sent to the Network Operations Command Center (NOCC).

How well do CDNs work? [Figure: Internet topology with sites, ISPs, backbone ISPs, exchange points (IX), and hosting centers; nodes are labeled S, OS, CS, and C.]

How well do CDNs work? Recall that the bottleneck links are at the edges. Even if CSs are pushed towards the edge, they are still behind the bottleneck link! [Figure: same topology as on the previous slide.]

Reduced latency improves TCP performance. Round trips per download: DNS (1 round trip); TCP handshake (2 round trips); slow start (~8 round trips to fill a DSL pipe, 128 Kbytes in total). Compare this with 56 Kbytes for the cnn.com home page: the download finishes before slow start completes. Total: 11 round trips. The RTT between UMH and Berkeley University is about 200 ms; the RTT measured last night between UMH and the nearest CDN (Akamai) node was about 20 ms. That is one order of magnitude improvement in RTT. With CDN support, the 11 RTTs amount to 11 x 20 = 220 ms instead of 11 x 200 = 2200 ms, saving 1980 ms of download response time. Certainly noticeable.
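The round-trip arithmetic can be checked with a short calculation. It uses only the figures on the slide (1 DNS round trip, 2 for the handshake, about 8 for slow start, and RTTs of 200 ms and 20 ms) and deliberately ignores transmission and server processing time; the function and constant names are illustrative.

```python
# Round trips counted on the slide: DNS lookup + TCP handshake + slow start.
DNS_ROUND_TRIPS = 1
HANDSHAKE_ROUND_TRIPS = 2
SLOW_START_ROUND_TRIPS = 8
TOTAL_ROUND_TRIPS = DNS_ROUND_TRIPS + HANDSHAKE_ROUND_TRIPS + SLOW_START_ROUND_TRIPS


def download_time_ms(rtt_ms: float) -> float:
    """Latency-bound download time: every round trip costs one full RTT."""
    return TOTAL_ROUND_TRIPS * rtt_ms


origin_ms = download_time_ms(200)   # origin server, RTT of about 200 ms
cdn_ms = download_time_ms(20)       # nearest CDN node, RTT of about 20 ms
saving_ms = origin_ms - cdn_ms
```

With these inputs the origin download takes 2200 ms and the CDN download 220 ms, a difference of 1980 ms.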

Topic 5: Content Distribution. 1. Introduction. 2. Architectures: client-server, Web proxies, content replication. 3. Caching and load balancing. 4. The Akamai network. Bibliography: [FLU95] Fluckiger, Understanding Networked Multimedia. [SEI04] R. Seifert, Gigabit Ethernet: Technology & Applications for High-Speed Networks. [GAN04] A. Ganz, Z. Ganz and K. Wongthavarawat, Multimedia Wireless Networks: Technologies, Standards and QoS.

Some interesting observations. The top 1% of all documents account for 20%-35% of proxy requests. The top 10% account for 45%-55% of requests. It takes 25% to 40% of all documents to account for 70% of requests, and 70% to 80% of all documents to account for 90% of requests.

Web caching. As an example, we use the Web to illustrate caching and other related issues. [Figure: browsers exchange requests and responses with a Web proxy cache, which in turn exchanges requests and responses with Web servers.]

Web browser caching. Web browsers have their own caches. When a page is downloaded from a site, the page is put into the browser cache. This is especially useful when the back button is pressed. If a new copy is needed, a refresh can be done. No page stays permanently in the cache, because there is limited room. A replacement algorithm is needed to determine which cached page should be purged.
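The slides do not say which replacement algorithm browsers use. As one common choice, the sketch below purges the least recently used (LRU) page; the class name LRUPageCache and the tiny capacity are assumptions for illustration only.

```python
from collections import OrderedDict
from typing import Optional


class LRUPageCache:
    """Bounded page cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)         # mark as most recently used
        return self._pages[url]

    def put(self, url: str, page: bytes) -> None:
        self._pages[url] = page
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # purge the least recently used page


cache = LRUPageCache(capacity=2)
cache.put("/a", b"page a")
cache.put("/b", b"page b")
cache.get("/a")              # "/a" is now more recent than "/b"
cache.put("/c", b"page c")   # cache is full, so "/b" is purged
```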

Web browser caching. Client pull: the server provides the content with instructions on when the client should ask for a refreshed copy of the content, or on whether the content should be cached. Server push: the server transmits page information to the screen. The browser application displays the information and leaves the connection to the server open. With an open connection, the server can continue to push updated pages for your screen to display on an ongoing basis. You can close the connection by closing the page. The server is in control. Browser caches are different from proxy caches (discussed next).

Web caching. A proxy cache (also called a proxy server) intercepts HTTP requests from clients and serves the object if it is in its cache. If not, it goes to the object's home server: on behalf of the user, it gets the object and possibly deposits it in its cache before returning it to the user. Proxy caches are usually deployed at the edges of a network. They provide wide-area bandwidth savings, improved response time, and increased availability of static Web-based objects. A browser may have to be configured to point to the proxy server. Usually a proxy cache is purchased and installed by an ISP.

Push-based approach. The server tracks all proxies that have requested objects. If a Web page is modified, it notifies each proxy. Notification types: indicate that the object has changed [invalidate], or send the new version of the object [update]. How to decide between invalidates and updates? Pros and cons? One approach: send updates for the more frequently accessed objects and invalidates for the rest. [Figure: the Web server pushes to the proxy.]

Push-based approaches. Advantages: they provide tight consistency [minimal stale data]; proxies can be passive. Disadvantages: state must be maintained at the server (recall that HTTP is stateless, so mechanisms beyond HTTP are needed); state may need to be maintained indefinitely; not resilient to server crashes. These disadvantages are the reason why push-based approaches are not used.

Pull-based approaches. The proxy is entirely responsible for maintaining consistency. The proxy periodically polls the server to see whether the object has changed. It uses If-Modified-Since HTTP messages: this type of message can be used by a proxy to tell a remote server to return a copy only if the object has been modified. Key question: when should a proxy poll? With server-assigned time-to-live (TTL) values, there is no guarantee about whether the object will change in the interim. [Figure: the proxy polls the Web server and receives a response.]
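The conditional request can be illustrated with a toy origin server written as a plain function. No network is involved: origin_respond and its arguments are hypothetical, and only the If-Modified-Since header and the 200/304 status codes come from HTTP itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def origin_respond(last_modified: datetime, headers: Dict[str, str]) -> Tuple[int, bytes]:
    """Toy origin: send the body only if the object changed after the given date."""
    since = headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, b""                        # Not Modified: proxy keeps its copy
    return 200, b"<html>fresh copy</html>"     # full response with the new version


# The proxy cached its copy on 10 Jan 2024 and now polls with that date.
cached_at = datetime(2024, 1, 10, tzinfo=timezone.utc)
poll_headers = {"If-Modified-Since": format_datetime(cached_at, usegmt=True)}

unchanged = origin_respond(datetime(2024, 1, 5, tzinfo=timezone.utc), poll_headers)
changed = origin_respond(datetime(2024, 1, 15, tzinfo=timezone.utc), poll_headers)
```

A 304 reply carries no body, so a poll that finds nothing new costs one small message instead of a full transfer.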

Pull-based approach. The proxy can dynamically determine the polling interval, computing it from past observations: start with a conservative poll interval; increase the interval if the object has not changed between two successive polls; decrease the interval if the object is updated between two polls. The scheme is adaptive: no prior knowledge of object characteristics is needed. Advantages: the server remains stateless; resilient to both server and proxy failures. Disadvantages: weaker consistency guarantees (objects can change between two polls, and the proxy will contain stale data until the next poll); high message overhead.
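A minimal sketch of the adaptive rule, assuming a multiplicative adjustment. The doubling/halving factor and the one-minute and one-day bounds are illustrative choices that do not come from the slides, and next_poll_interval is a hypothetical name.

```python
def next_poll_interval(
    current_s: float,
    changed: bool,
    min_s: float = 60.0,
    max_s: float = 86400.0,
    factor: float = 2.0,
) -> float:
    """Shrink the interval after a change, grow it after a quiet poll."""
    proposed = current_s / factor if changed else current_s * factor
    return max(min_s, min(max_s, proposed))   # clamp to [min_s, max_s]


interval = 60.0                                          # conservative start
interval = next_poll_interval(interval, changed=False)   # unchanged: poll less often
interval = next_poll_interval(interval, changed=False)
after_quiet = interval
interval = next_poll_interval(interval, changed=True)    # updated: poll more often
after_change = interval
```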

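A hybrid approach: leases. Lease: the duration of time for which the server agrees to notify the proxy of modifications. The server issues a lease on the first request and sends notifications until expiry; the lease must be renewed upon expiry. Leases offer a smooth tradeoff between the state kept and the messages exchanged: a zero-duration lease is equivalent to polling, and an infinite lease is equivalent to server push. Efficiency depends on the lease duration. Limited use. [Figure: the client reads through the proxy; the proxy sends GET + lease request to the server; the server returns reply + lease and later sends invalidate/update messages.]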

Cooperative caching. A caching infrastructure can have multiple Web proxies. Proxies can be arranged in a hierarchy or in other structures, and can cooperate with one another to answer client requests and to propagate server notifications. Cooperative caching uses a combination of HTTP and ICP (Internet Cache Protocol). ICP can be used by one cache to quickly ask another cache whether it has an object; HTTP is used to actually retrieve the object.

Problems. Caching proxies do not serve all Internet users. Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies. There are accounting issues with caching proxies; for example, www.cnn.com needs to know the number of hits to the advertisements displayed on the Web page.

DNS query in Web download. The user types or clicks on a URL, e.g., http://www.cnn.com/2006/leadstory.html. The browser extracts the site name, e.g., www.cnn.com. The browser calls gethostbyname() to learn the IP address, which triggers resolver code to query the local DNS server. Eventually, the resolver gets a reply and returns the IP address to the browser. Then the browser contacts the Web server: it creates and connects a socket, and sends the HTTP request.
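The first two steps can be sketched with Python's standard library. The helper names site_name and resolve are hypothetical; socket.gethostbyname is the standard wrapper around the resolver call named on the slide, and it needs a working DNS configuration, so it is defined here but not called.

```python
import socket
from urllib.parse import urlsplit


def site_name(url: str) -> str:
    """Extract the host name the browser must resolve before connecting."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in URL: {url!r}")
    return host


def resolve(url: str) -> str:
    """Ask the system resolver (and so the local DNS server) for an IPv4 address."""
    return socket.gethostbyname(site_name(url))


host = site_name("http://www.cnn.com/2006/leadstory.html")
```

Calling resolve("http://www.cnn.com/2006/leadstory.html") on a connected machine returns one address as a dotted string, which the browser would then use to open its socket.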

[Figure: DNS resolution of www.cnn.com in ten numbered steps, involving the user PC and the browser's cache, the local name server, the root (InterNIC), the .com and .net servers, and the cnn.com DNS servers.]

Multiple DNS queries. Often a Web page has embedded objects, e.g., an HTML file with embedded images. Each embedded object has its own URL and potentially lives on a different Web server, e.g., http://www.myimages.com/image1.jpg. The browser downloads embedded objects, usually automatically unless configured otherwise. This requires learning the IP address for www.myimages.com.

When are DNS queries unnecessary? The browser is configured to use a proxy: e.g., the browser sends all HTTP requests through a proxy, and the proxy takes care of issuing the DNS request. The requested Web resource is locally cached: e.g., the cache has http://www.cnn.com/2006/leadstory.html, so there is no need to fetch the resource and no need to query. The browser recently queried for this host name: e.g., the user recently visited http://www.cnn.com/, so the browser already called gethostbyname() and may be locally caching the resulting IP address.

Directing Web clients to replicas. Simple approach: different names (www1.cnn.com, www2.cnn.com, www3.cnn.com). But this requires users to select specific replicas. More elegant approach: different IP addresses. A single name (e.g., www.cnn.com) maps to multiple addresses (e.g., 64.236.16.20, 64.236.16.52, 64.236.16.84). The authoritative DNS server returns many addresses, and the local DNS server selects one address. The authoritative server may vary the order of the addresses.
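Varying the order of the addresses can be sketched as a simple rotation. The class name RoundRobinAuthority is hypothetical and the model ignores TTLs and caching at the local DNS server; the addresses are the ones listed on the slide.

```python
from collections import deque
from typing import Iterable, List


class RoundRobinAuthority:
    """Toy authoritative server: same address set, rotated on every answer."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self._addresses = deque(addresses)

    def answer(self) -> List[str]:
        reply = list(self._addresses)
        self._addresses.rotate(-1)   # next reply starts with the following address
        return reply


authority = RoundRobinAuthority(["64.236.16.20", "64.236.16.52", "64.236.16.84"])
first_reply = authority.answer()
second_reply = authority.answer()
third_reply = authority.answer()
```

A resolver that simply takes the first address of each reply ends up spreading clients across the three replicas.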

Clever load balancing schemes. Selecting the best IP address to return: based on server performance, on geographic proximity, or on network load. Example policies: round-robin scheduling to balance server load; U.S. queries get one address, Europe another; tracking the current load on each of the replicas.

Topic 5: Content Distribution. 1. Introduction. 2. Architectures: client-server, Web proxies, content replication. 3. Caching and load balancing. 4. The Akamai network. Bibliography: [FLU95] Fluckiger, Understanding Networked Multimedia. [SEI04] R. Seifert, Gigabit Ethernet: Technology & Applications for High-Speed Networks. [GAN04] A. Ganz, Z. Ganz and K. Wongthavarawat, Multimedia Wireless Networks: Technologies, Standards and QoS.

The Akamai network. Akamai started its commercial service in April 1999 with Yahoo! as its first customer. It currently offers content delivery services to more than 1200 of the world's leading electronic commerce organizations. According to Akamai, between 15% and 20% of all Web traffic is delivered by Akamai servers. Akamai's content delivery service is based on caching and replicating content on its servers, which are spread around the world. It also supports adaptive-bitrate HD video streaming (Akamai HD Network).

Problems with the centralized approach. Slow: content must traverse multiple backbones and long distances. Unreliable: delivery may be prevented by congestion or backbone peering problems. Not scalable: usage is limited by the bandwidth available at the master site. Inferior streaming quality: packet loss, congestion, and narrow pipes degrade stream quality.

The Akamai solution. A multi-site, multi-server distributed content approach. It caches, replicates, and distributes all forms of content and supports applications. It monitors the Internet and routes around trouble spots. It provides feedback on hit counts to content providers.

Advantages of the Akamai solution. Fast: content is served from locations near to end users. Reliable: no single point of failure; automatic fail-over. Scalable: the master site no longer requires massive available bandwidth.

Typical page content (page served by Akamai). Total page: 87,550 bytes. Total Akamai-served: 68,756 bytes (78%). Components: logos, 3,395 bytes; navigation bar, 9,674 bytes; banner ads, 16,174 bytes; GIF links, 22,395 bytes; fresh content, 17,118 bytes.

Network deployment: 105,000+ servers, 1,900+ networks, 78+ countries.

Results: Web site performance, typical improvement with Akamai. [Figure: daily measurements at noon from May 15 to May 27, comparing a Web object delivered without Akamai and a Web object delivered by Akamai.]

Over 1300 Web sites are now Akamaized.

Akamai CDN: how it works. HTML title page for www.xyz.com with embedded objects: <html> <head> <title>Welcome to xyz.com!</title> </head> <body> <img src="http://www.xyz.com/logos/logo.gif"> <img src="http://www.xyz.com/jpgs/navbar1.jpg"> <h1>Welcome to our Web site!</h1> <a href="page2.html">Click here to enter</a> </body> </html>

Downloading www.xyz.com before Akamai. The user enters www.xyz.com; the browser requests the IP address for www.xyz.com; the DNS server returns the IP address (10.10.123.8); the browser requests the HTML; the content provider's Web server (10.10.123.8) returns the HTML; the browser obtains IP addresses for the hostnames listed in the URLs of objects embedded on the page; the browser requests the embedded objects; the content provider's Web server returns the embedded objects.

Downloading www.xyz.com the Akamai way. The user enters www.xyz.com; the browser requests the IP address for www.xyz.com; the DNS server returns the IP address; the browser requests the HTML; the content provider's Web server returns the page with Akamaized URLs; the browser obtains the IP address of the optimal Akamai server for the embedded objects; the browser obtains the objects from the optimal Akamai server.

Content delivery using Akamai. Embedded URLs are converted to ARLs: <html> <head> <title>Welcome to xyz.com!</title> </head> <body> <img src="http://www.xyz.com/logos/logo.gif"> <img src="http://www.xyz.com/jpgs/navbar1.jpg"> <h1>Welcome to our Web site!</h1> <a href="page2.html">Click here to enter</a> </body> </html>

Akamai caching services. ARL: Akamai Resource Locator. Example: http://a620.g.akamai.net/7/620/16/259bf4ed29de/www.cnn.com/i/22.gif. Host part: a620.g.akamai.net/. Akamai control part: /7/620/16/259bf4ed29de/. Content URL: /www.cnn.com/i/22.gif.

ARL: Akamai Resource Locator (I). http://a620.g.akamai.net/7/620/16/259bf4ed29de/www.cnn.com/i/22.gif. The content provider (CP) selects which content will be hosted by Akamai. Akamai provides a tool that transforms the CP URL (/www.cnn.com/i/22.gif) into this ARL.

ARL: Akamai Resource Locator (II). http://a620.g.akamai.net/7/620/16/259bf4ed29de/www.cnn.com/i/22.gif. The host part (a620.g.akamai.net) in turn causes the client to access Akamai's content server instead of the origin server.

ARL: Akamai Resource Locator (III). http://a620.g.akamai.net/7/620/16/259bf4ed29de/www.cnn.com/i/22.gif. If Akamai's content server doesn't have the content in its cache, it retrieves it using the content URL (/www.cnn.com/i/22.gif).

ARL control part: /7/620/16/259fdbf4ed29de/ in http://a620.g.akamai.net/7/620/16/259fdbf4ed29de/www.cnn.com/i/22.gif. Labeled fields: customer number (i.e., CNN, Yahoo); type code (different types will have different contents); one field marked "???"; content checksum (may be used for identifying changed content; may also validate content???).
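The three-part structure of an ARL can be made concrete with a small parser. It assumes the control part always has four path segments, as in the slide's example, and returns them positionally rather than by name, because the slide does not say unambiguously which segment is which. The names ParsedARL and parse_arl are hypothetical.

```python
from typing import NamedTuple, Tuple
from urllib.parse import urlsplit


class ParsedARL(NamedTuple):
    host: str                 # host part, resolved through Akamai's DNS
    control: Tuple[str, ...]  # control part, kept as raw path segments
    content_url: str          # where a surrogate fetches the object on a miss


def parse_arl(arl: str, control_fields: int = 4) -> ParsedARL:
    """Split an ARL into host part, control part, and content URL."""
    parts = urlsplit(arl)
    segments = parts.path.lstrip("/").split("/")
    if parts.hostname is None or len(segments) <= control_fields:
        raise ValueError(f"not a well-formed ARL: {arl!r}")
    control = tuple(segments[:control_fields])
    content_url = "http://" + "/".join(segments[control_fields:])
    return ParsedARL(parts.hostname, control, content_url)


parsed = parse_arl(
    "http://a620.g.akamai.net/7/620/16/259fdbf4ed29de/www.cnn.com/i/22.gif"
)
```

For the example ARL, the content URL comes out as http://www.cnn.com/i/22.gif, which is the address a surrogate would use to fetch the object from the origin on a cache miss.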

ARL host part: a620.g.akamai.net in http://a620.g.akamai.net/7/620/16/259fdbf4ed29de/www.cnn.com/i/22.gif. But why such a complex domain name?

ARL host part (II): hierarchical DNS architecture. The .net gTLD level points to ~8 akamai.net DNS servers (random ordering; TTL on the order of hours to days). The akamai.net level attempts to select ~8 g.akamai.net DNS servers near the client (using BGP?; TTL on the order of 30 minutes to 1 hour). The g.akamai.net level, which resolves a620.g.akamai.net, makes a very fine-grained load-balancing decision among local content servers (CS); TTL on the order of 30 seconds to 1 minute.

[Figure: DNS resolution for a user request for www.xyz.com, in 16 numbered steps. The user PC and browser's cache query the local DNS server, which contacts the root DNS (InterNIC) and the .com and .net servers. Queries and answers shown: xyz.com? answered by the xyz.com DNS server (10.10.123.5); akamai.net? (15.15.125.6); g.akamai.net? answered by the Akamai high-level DNS servers (20.20.123.55); a212.g.akamai.net? answered by the Akamai low-level DNS servers (30.30.123.5, TTL: 30).]

Let's look at a study of CDN performance. Zhang, Krishnamurthy and Wills, AT&T Labs. Traces were taken in Sept. 2000 and Jan. 2001. The study compared CDNs with each other and against non-CDN sites.

Methodology. Selected a set of CDNs: Akamai, Speedera, Digital Island (note: most of these are gone now). Selected a number of non-CDN sites for which good performance could be expected, of U.S. and international origin. U.S.: Amazon, Bloomberg, CNN, ESPN, MTV, NASA, Playboy, Sony, Yahoo. Selected a set of images of comparable size for each CDN and non-CDN site, to compare apples to apples. Downloaded the images from 24 NIMI machines.

Response time results (II), including DNS lookup time. [Figure: cumulative probability of response time.]

Response time results (II), including DNS lookup time. [Figure: cumulative probability of response time, with an annotation at about one second.] Author conclusion: CDNs generally provide much shorter download time.

Other findings of the study. Each CDN performed best for at least one (NIMI) client. Why? Because of proximity? The best origin sites were better than the worst CDNs. CDNs with more servers don't necessarily perform better (note that the authors don't know the load on the servers). HTTP 1.1 improvements (parallel download, pipelined download) help a lot, even more so for origin (non-CDN) cases. Note that not all origin sites implement pipelining.

Another study. Keynote Systems, "A Performance Analysis of 40 e-business Web Sites". Doing measurements since 1997 (all from one location, as near as I can tell). Latest measurement: January 2001.

Historical trend: clear improvement. [Figure]

Performance breakdown. Basically says that smaller content leads to shorter download times (duh!). [Figure: three cases, with average content sizes of 12 Kbytes, 44 Kbytes, and 99 Kbytes.]

Effect of CDN. Note: non-CDN sites can work well (a CDN is not always better). [Figure]