Content Delivery Networks (CDN) Dr. Yingwu Zhu




Web Cache Architecture
[Figure: browser caches, an intranet cache, a local ISP cache, the CDN, and, in the data center, reverse proxies and an L4 switch in front of the content servers.]

History
- 1998: The first CDNs appear. Putting more web sites on a CDN saves money and provides reliability and scalability without expensive hardware and management.
- 1999: Several companies (Akamai, Mirror Image) become specialists in fast, reliable delivery of Web content, earning large profits.
- 2000: In the U.S. alone, CDNs are a huge market generating $905 million, reaching $12 billion by 2007.
- 2001: Flash crowd events (numerous users accessing a web site simultaneously), e.g., Sept. 11, 2001, when users flooded popular news sites and made them unavailable. Flash events drive more revenue to CDN sales.
- 2002: Large-scale ISPs (e.g., AT&T) tend to build their own CDN functionality, providing customized services.
- 2004: More than 3000 companies use CDNs, spending more than $20 million monthly. CDN providers doubled their revenue from streaming media operations compared to 2003.
- 2005: CDN revenue for both streaming video and Internet radio is estimated to grow at 40%, with more than $450 million spent on delivery of news, film, sports, music and entertainment.

Content Delivery - a bit of History
- Individual Web servers
- Increase in Web content
- Web server farms
- Issue of flash crowds
- Replication of the same Web content around the globe in a network of Web servers
- Not financially viable for individual content providers (say, bbc.com) to set up their own server networks: expensive hardware, maintenance, energy cost?

Content Delivery Networks (CDN)
- What: a geographically distributed network of Web servers around the globe, run by an individual provider (e.g., Akamai), with many ISP points of presence (POPs).
- Why: improve the performance and scalability of content retrieval.
- How: allow content providers to replicate their content in a network of servers.

Conventional CDN Architecture
Classical example: Akamai
Figure ref: http://arxiv.org/pdf/cs/0609027

Conventional CDN Architectures
- Commercial CDN: centralized client-server architecture; owned by corporate companies; e.g., Akamai
- Academic CDN: peer-to-peer architecture; designed to reduce cost; e.g., Globule

What is CDN?
- CDNs are a means to offload some or all of the content delivery burden (mainly static content) from the origin server.
- A replica server that delivers content on behalf of the origin server is called a CDN server.
- Aimed to address:
  - client-perceived latency (e.g., in web browsers)
  - capacity management of the server
- Caching as a side effect.

What is CDN?
- A CDN is an architecture for efficient delivery of (web) content to a large number of clients.
- CDNs are operated by companies that charge content providers for the delivery services.
- CDNs are mostly transparent to the end user: you can see CDNs being used only if you look at the actual DNS requests or read the HTML source of a page.
- Commercial CDNs for actual content delivery: Akamai, Panther Express, SAVVIS, VitalStream
- Academic CDNs for research on content delivery: CoDeeN, CoralCDN, Globule

A Big Picture

Advantages of using CDN
- Reduce customers' need to invest in web site infrastructure and decrease the operational cost of managing that infrastructure
- Bypass traffic jams on the web
  - Requested data is close to the clients
  - Avoid traversing bottleneck links
- Improve content delivery quality, speed, and reliability
- Reduce load on the original server
- Load balancing?

CDN why?
- One of the main goals of CDNs is to put the content provider in control over how her content is cached
  - The content provider signs a contract with the CDN
  - The contract specifies how content can be cached
  - The contract also means the CDN will follow what the content provider wants
- CDNs typically charge per byte of traffic served
- CDNs can be used for any kind of content
  - Typically the main use is web content
  - Streaming media has also been delivered over CDNs

CDN--How?
- Original servers
- A set of surrogate servers (CDN servers)
  - Geographically distributed worldwide
  - Cache the original servers' content
- Routers
  - Deliver clients' requests to the best-fitting CDN server (latency, load balancing, etc.)
- Network elements
  - Distribute content from the original servers to the surrogate/CDN servers
- Accounting mechanism
  - Provides logs and accounting information to the original servers

How does CDN work?
- Users send requests to the origin server
- Requests are somehow intercepted by a redirection service
- The redirection service forwards the user's request to the best CDN content server
- Content is served from the CDN content server

CDN - Design Issues
- The CDN operates CDN content servers
- Content servers are placed close to users, in terms of network distance
- Some or all of the content from the content provider (original server) is replicated on the content servers
  - Different content servers might have different content
- Users access content from the nearest content server
- Challenges:
  - How to redirect clients (request redirection)?
  - How to replicate content?
    - Usually happens over a private network
    - Can optimize according to many criteria

Request Redirection
- Key to CDNs: select the most appropriate CDN content server for user requests
- DNS redirection
  - Complete/full
  - Partial
- URL rewrite

Request Redirection
- DNS redirection
  - The authoritative DNS server is controlled by the CDN infrastructure.
  - It uses a DNS trick to distribute the load across the CDN servers according to some policy (e.g., round robin, least-loaded CDN server, geographical distance).
- URL rewriting
  - The main page still comes from the origin server, but the URLs of embedded objects (e.g., images, clips) are rewritten to point to one of the CDN servers.
  - Some vendors rewrite using a hostname; others use an IP address directly.
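The three policies named on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular CDN implements its nameserver: the `Server` record, the load values and the coordinates are hypothetical, and the IP addresses are borrowed from the example figure in these slides.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Server:
    ip: str
    load: float  # hypothetical fraction of capacity in use (0.0 to 1.0)
    lat: float   # hypothetical latitude in degrees
    lon: float   # hypothetical longitude in degrees


# Hypothetical CDN servers (roughly New York, London, Tokyo).
SERVERS = [
    Server("10.20.30.1", 0.70, 40.7, -74.0),
    Server("10.20.30.2", 0.20, 51.5, -0.1),
    Server("10.20.30.3", 0.55, 35.7, 139.7),
]


def round_robin(servers):
    """Return an endless iterator that hands out servers in turn."""
    return itertools.cycle(servers)


def least_loaded(servers):
    """Pick the server with the lowest reported load."""
    return min(servers, key=lambda s: s.load)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def geographically_closest(servers, lat, lon):
    """Pick the server with the smallest great-circle distance to (lat, lon)."""
    return min(servers, key=lambda s: haversine_km(s.lat, s.lon, lat, lon))
```

With these sample values, `least_loaded(SERVERS)` picks 10.20.30.2, and `geographically_closest(SERVERS, 48.9, 2.4)` (a point near Paris) also picks 10.20.30.2. In a real CDN the chosen address would be returned in the DNS answer, and geographic distance is only a proxy for network distance.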

Full Site DNS redirection example
[Figure: the client asks for the IP of yahoo.com; the CDN-controlled DNS server (CNAME DNS record) answers with CDN server 10.20.30.1 rather than the origin server 111.222.100.1; the client then sends GET index.html for www.yahoo.com to 10.20.30.1 and receives the HTML from it. CDN servers shown: 10.20.30.1, 10.20.30.2, 10.20.30.3, 10.20.30.4.]
Vendors: Adero (Full), Akamai and Digital Island (Partial)

DNS Redirection
- The client's DNS request comes to the CDN's nameserver
  - Somehow; see below for two possibilities
- Typically the request has to go through some steps in the CDN's DNS hierarchy
  - Each step redirects the client to a nearby nameserver
  - Finally, the last nameserver returns the address of a nearby content server
- For the infrastructure, the CDN needs to measure the state of the network
  - Needed to determine which servers are the closest
  - Network measurements determine the current state

Two DNS Redirection Types
- Full redirection
  - Any request for the origin server is redirected to the CDN
  - Basically, the CDN takes control of the content provider's DNS zone
  - Benefit: all requests are automatically redirected
  - Disadvantage: may send lots of traffic to the CDN, hence expensive for the content provider ($ per byte)
- Partial redirection
  - The content provider marks which objects are to be served from the CDN
  - Typically, larger objects like images are selected
  - Refer to images as: <img src="http://cdn.com/foo/bar/img.gif">
  - When the client wants to retrieve the image, the DNS request for cdn.com gets resolved by the CDN and the image is fetched from the selected content server
  - Pro: fine-grained control over what gets delivered
  - Con: have to (manually) mark content for the CDN

Two DNS Redirection Types
- Full redirection
  - All requests are redirected to content servers
- Partial redirection
  - Get the HTML page from the origin server, images from a content server
  - Need to open a new TCP connection for images

DNS Redirection: other issues
- DNS redirection has one (big) problem
  - Because redirection is based on DNS queries, the content server is chosen based on who sent that query
  - DNS queries do not come from clients, but from the DNS servers used by the clients
- Why is this a problem?
  - In many cases it's not a problem
    - For example, clients in a university use the university's nameserver
  - In many cases, it's a big problem
    - Larger ISPs might run only a few nameservers
    - Especially in the US for dial-up users, DNS lookups are concentrated
- This means the content server is optimized for the nameserver, not the actual client
  - The difference can sometimes be very large
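The mismatch between nameserver and client can be made concrete with a toy calculation. The latency table below is entirely hypothetical (made-up names and millisecond values); it only illustrates how a server chosen for the resolver can be a poor choice for a client located elsewhere.

```python
# Hypothetical latencies in milliseconds from each vantage point to each content server.
LATENCY_MS = {
    "resolver-nyc": {"cdn-nyc": 5, "cdn-lax": 70},
    "client-nyc": {"cdn-nyc": 8, "cdn-lax": 72},
    "client-lax": {"cdn-nyc": 75, "cdn-lax": 6},
}


def pick_server(vantage):
    """Return the content server with the lowest latency from the given vantage point."""
    table = LATENCY_MS[vantage]
    return min(table, key=table.get)


def penalty_ms(client, resolver):
    """Extra latency the client pays because the CDN only sees the resolver.

    The CDN picks the server that is best for the resolver; the penalty is
    the client's latency to that server minus its latency to the server
    that would have been best for the client itself.
    """
    chosen = pick_server(resolver)
    best = pick_server(client)
    return LATENCY_MS[client][chosen] - LATENCY_MS[client][best]
```

With these numbers, a client near its resolver pays no penalty (`penalty_ms("client-nyc", "resolver-nyc")` is 0), while a client far from its resolver is sent to the wrong coast (`penalty_ms("client-lax", "resolver-nyc")` is 69).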

URL rewrite
- Modify pages at the origin server on the fly
- Change embedded URLs based on up-to-date knowledge of the network and CDN server loads
- Does not require additional DNS lookups
- Examples: Fasttide, Clearway

Partial DNS redirect/URL rewriting example

index.html:

    <HTML>
    <BODY>
    <A HREF="/about_us.html"> About Us </A>
    <IMG SRC="www.clearway1.net/www.yahoo.com/img1.gif">
    <IMG SRC="www.clearway2.net/www.yahoo.com/img2.gif">
    <IMG SRC="10.20.30.2/www.yahoo.com/img3.gif">
    </BODY>
    </HTML>

Vendors: Clearway (URL RW)
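A rough sketch of the rewriting step, in Python: the function below rewrites relative image URLs so that they point at a CDN host, following the host/origin/path layout of the example above. The function name, the regular expression and the `http://` scheme are assumptions made for illustration; a production rewriter would use a real HTML parser and would pick the CDN host from current load and network data.

```python
import re

# Matches the src attribute of an <img> tag whose value is double-quoted.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*")([^"]+)(")', re.IGNORECASE)


def rewrite_images(html, origin_host, cdn_host):
    """Point relative <img> URLs at cdn_host, keeping the origin host in the path."""

    def repl(match):
        prefix, url, suffix = match.groups()
        if url.startswith(("http://", "https://", "//")):
            return match.group(0)  # already absolute: leave untouched
        new_url = f"http://{cdn_host}/{origin_host}/{url.lstrip('/')}"
        return prefix + new_url + suffix

    return IMG_SRC.sub(repl, html)
```

For example, `rewrite_images('<img src="/img1.gif">', "www.yahoo.com", "www.clearway1.net")` returns `<img src="http://www.clearway1.net/www.yahoo.com/img1.gif">`, while links in `<a href=...>` tags are left pointing at the origin server.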

CDN: other issues
- Content server placement
- Content selection
- Content outsourcing

Content Server Placement
- Minimize user-perceived latency
  - Put content servers close to the users
- Minimize cost
  - Content outsourcing cost
- Algorithms to achieve both
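The slides do not name a specific placement algorithm. As one possible illustration, the sketch below uses a generic greedy heuristic: with a budget of k servers (standing in for the cost constraint), it repeatedly adds the candidate site that most reduces the total latency clients see to their nearest chosen server. The site names, client names and latency values in the usage note are hypothetical.

```python
def greedy_placement(latency, k):
    """Greedily choose up to k sites.

    latency[site][client] is the latency in ms from a candidate site to a client.
    Returns (chosen_sites, total_latency), where total_latency sums each
    client's latency to its nearest chosen site.
    """
    sites = list(latency)
    clients = list(next(iter(latency.values())))
    chosen = []
    best = {c: float("inf") for c in clients}  # best latency seen so far per client

    for _ in range(min(k, len(sites))):
        def total_if_added(site):
            return sum(min(best[c], latency[site][c]) for c in clients)

        pick = min((s for s in sites if s not in chosen), key=total_if_added)
        chosen.append(pick)
        for c in clients:
            best[c] = min(best[c], latency[pick][c])

    return chosen, sum(best.values())
```

With the hypothetical table `{"nyc": {"a": 10, "b": 80, "c": 90}, "lon": {"a": 70, "b": 15, "c": 60}, "tok": {"a": 95, "b": 85, "c": 12}}` and k = 2, the heuristic picks "lon" first and then "nyc", for a total latency of 85 ms. A greedy choice like this is not guaranteed to be optimal.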

Content selection
- How much content should be replicated to a content server?
- Full site replication
  - Simple, but high storage cost and outsourcing cost
- Partial replication
  - Content grouping based on correlation or access frequency
  - Replicate content groups
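Partial replication by access frequency can be sketched as follows. This is a simplified illustration, not a method taken from the slides: it treats each object as its own group, ranks objects by hit count, and replicates the most popular ones that fit in a storage budget. The object names, sizes and hit counts in the usage note are hypothetical.

```python
def select_for_replication(objects, budget_bytes):
    """Choose objects to replicate on a content server.

    objects is a list of (name, size_bytes, hits) tuples. Objects are
    considered in order of decreasing hits, and each one is taken if it
    still fits in the remaining storage budget.
    Returns (chosen_names, bytes_used).
    """
    chosen, used = [], 0
    for name, size, hits in sorted(objects, key=lambda o: o[2], reverse=True):
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen, used
```

For instance, with objects `[("logo.gif", 20_000, 900), ("clip.mpg", 5_000_000, 300), ("banner.jpg", 80_000, 700), ("old.gif", 30_000, 5)]` and a 150,000-byte budget, the function selects logo.gif, banner.jpg and old.gif (130,000 bytes) and skips the large clip. Grouping by correlation, as the slide mentions, would need access-log analysis beyond this sketch.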

Content Outsourcing
- Cooperative push-based
  - Content is prefetched to the content servers from the original server
  - Content servers cooperate in order to reduce the replication and update cost
  - CDNs maintain the mapping between content and content servers

Some Facts...
- CDNs are mainly used for image files (static content).
- Content served by the CDN is static in nature: only 0.3% of content changed for existing URLs, and at most 13% new URLs were introduced.
- Large increase in CDN deployment between Nov 1999 (only 1-2% of the top 670 sites) and Dec 2000 (25% of the popular sites).
- Akamai seems to be the most popular CDN vendor.
- Images are 96-98% of CDN-served objects, but only 40-46% of CDN-served bytes. Is the rest dynamic content?
- The cache-hit rate for CDN images is 30-80%.
- CDNs cannot be used for anything that involves authentication, etc.