Content Distribution Networks (CDNs)
  Jennifer Rexford
  COS 461: Computer Networks
  Lectures: MW 10-10:0am in Architecture N101
  http://www.cs.princeton.edu/courses/archive/spr12/cos461/

Second Half of the Course
  - Application case studies
      - Content distribution and multimedia streaming
      - Peer-to-peer file sharing and overlay networks
  - Network case studies
      - Home, enterprise, and data-center networks
      - Backbone, wireless, and cellular networks
  - Network management and security
      - Programmable networks and network security
      - Internet measurement and course wrap-up

Single Server, Poor Performance
  - Single server
      - Single point of failure
      - Easily overloaded
      - Far from most clients
  - Popular content
      - Popular site
      - "Flash crowd" (aka "Slashdot effect")
      - Denial of Service attack

Skewed Popularity of Web Traffic
  - Zipf or power-law distribution
  - Source: "Characteristics of WWW Client-based Traces," Carlos R. Cunha, Azer Bestavros, Mark E. Crovella, BU- CS- - 01

Web Caching

Proxy Caches
  - Reactively replicate popular content
  - Smaller round-trip times to clients
  - Reduce load on origin servers
  - Reduce network load and bandwidth costs
  - Maintain persistent TCP connections
  [Figure: clients exchange requests and responses with a proxy cache, which exchanges requests and responses with the origin server]
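To see why reactive caching pays off when popularity is skewed, here is a minimal simulation sketch. It assumes a Zipf exponent of 1.0, an LRU eviction policy, and made-up object and request counts; none of these parameters come from the slides, and the function names are hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_trace(num_objects, num_requests, alpha=1.0, seed=0):
    """Generate object IDs whose popularity follows a Zipf-like law.

    Object 0 is the most popular; the weight of rank r is 1 / r**alpha.
    alpha and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    cumulative = list(accumulate(weights))
    return rng.choices(range(num_objects), cum_weights=cumulative, k=num_requests)


def lru_hit_rate(trace, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)         # mark as most recently used
        else:
            cache[obj] = True              # reactive: cache only after a miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace)


trace = zipf_trace(num_objects=10_000, num_requests=50_000)
small = lru_hit_rate(trace, capacity=100)    # cache holds 1% of the objects
large = lru_hit_rate(trace, capacity=1_000)  # cache holds 10% of the objects
print(f"hit rate with 100 slots:  {small:.2f}")
print(f"hit rate with 1000 slots: {large:.2f}")
```

Because a few objects draw most of the requests, even a cache that holds a small fraction of the objects can absorb a large share of the traffic.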
Forward Proxy
  - Cache "close" to the client
      - Improves client performance
      - Reduces network provider's costs
  - Explicit proxy
      - Requires configuring browser
  - Implicit proxy
      - Service provider deploys an "on path" proxy
      - ... that intercepts and handles Web requests
  [Figure: clients exchange requests and responses with a forward proxy near them]

Reverse Proxy
  - Cache "close" to the server
      - Improve client performance
      - Reduce content provider cost
      - Load balancing, content assembly, transcoding, etc.
  - Directing clients to the proxy
      - Map the site name to the IP address of the proxy
  [Figure: requests and responses pass through a reverse proxy in front of the origin servers]

Google Design
  [Figure: clients reach reverse proxies over the Internet; the reverse proxies connect over a private backbone to routers and servers in the data centers]

Limitations of Web Caching
  - Much content is not cacheable
      - Dynamic data: stock prices, scores, web cams
      - CGI scripts: results depend on parameters
      - Cookies: results may depend on passed data
      - SSL: encrypted data is not cacheable
      - Analytics: owner wants to measure hits
  - Stale data
      - Or, overhead of refreshing the cached data

Content Distribution Networks

Content Distribution Network
  - Proactive content replication
      - Content provider (e.g., CNN) contracts with a CDN
  - CDN replicates the content
      - On many servers spread throughout the Internet
  - Updating the replicas
      - Updates pushed to replicas when the content changes
  [Figure: origin server in North America, CDN distribution node, and CDN servers in S. America, Asia, and Europe]
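The difference from reactive caching is that replicas receive content before any client asks for it. The sketch below illustrates the push model only; the `Origin` and `Replica` classes and the region names used as labels are hypothetical and do not describe how any particular CDN is implemented.

```python
class Replica:
    """A CDN server that stores whatever the origin pushes to it."""

    def __init__(self, region):
        self.region = region
        self.store = {}

    def receive(self, path, body):
        self.store[path] = body

    def serve(self, path):
        # Returns None when the object was never pushed to this replica.
        return self.store.get(path)


class Origin:
    """The content provider's server; pushes every change to all replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.content = {}

    def publish(self, path, body):
        self.content[path] = body
        for replica in self.replicas:
            replica.receive(path, body)  # proactive: no client request needed


replicas = [Replica(r) for r in ("North America", "S. America", "Europe", "Asia")]
origin = Origin(replicas)
origin.publish("/index.html", "v1")
origin.publish("/index.html", "v2")  # an update replaces the stale copy everywhere
```

Pushing on change avoids the stale-data problem of reactive caches, at the cost of sending updates to replicas that may never serve them.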
Server Selection Policy
  - Live server
      - For availability
  - Lowest load
      - To balance load across the servers
  - Closest
      - Nearest geographically, or in round-trip time
  - Best performance
      - Throughput, latency, ...
  - Cheapest bandwidth, electricity, ...
  Note: requires continuous monitoring of liveness, load, and performance

Server Selection Mechanism
  - Application
      - Redirection
  - Advantages
      - Fine-grain control
      - Selection based on client IP address
  - Disadvantages
      - Extra round-trips for TCP connection to server
      - Overhead on the server
  [Figure: the client's first request is answered with "Redirect"; the request to the selected server is answered with "OK"]

Server Selection Mechanism
  - Routing
      - Anycast routing
  - Advantages
      - No extra round trips
      - Route to nearby server
  - Disadvantages
      - Does not consider network or load
      - Different packets may go to different servers
      - Used only for simple request-response apps
  [Figure: two locations both announce the prefix 1.2..0/24]

Server Selection Mechanism
  - Naming
      - DNS-based server selection
  - Advantages
      - Avoid TCP set-up delay
      - DNS caching reduces overhead
      - Relatively fine control
  - Disadvantages
      - Based on IP address of local DNS server
      - "Hidden load" effect
      - DNS TTL limits adaptation
  [Figure: a DNS query through the local DNS server returns the address 1.2..4]

How Akamai Works

Akamai Statistics
  - Distributed servers
      - Servers: ~61,000
      - Networks: ~1,000
      - Countries: ~0
  - Many customers
      - Apple, BBC, FOX, GM, IBM, MTV, NASA, NBC, NFL, NPR, Puma, Red Bull, Rutgers, SAP, ...
  - Client requests
      - Hundreds of billions per day
      - Half in the top 4 networks
      - 1-20% of all Web traffic worldwide
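The selection policies listed earlier (live, lowest load, closest) can be combined in many ways. The following is one possible sketch, not a description of any deployed system: it discards dead servers, prefers servers below an assumed load threshold, and then picks the smallest round-trip time. The `Server` fields, the 0.8 threshold, and the sample values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    alive: bool    # result of liveness monitoring
    load: float    # utilization between 0.0 and 1.0
    rtt_ms: float  # measured round-trip time to the client


def select_server(servers, max_load=0.8):
    """Pick a live server, preferring lightly loaded ones, then the closest."""
    live = [s for s in servers if s.alive]
    if not live:
        return None
    # Fall back to every live server when all of them exceed the threshold.
    candidates = [s for s in live if s.load <= max_load] or live
    return min(candidates, key=lambda s: (s.rtt_ms, s.load))


servers = [
    Server("a", alive=False, load=0.10, rtt_ms=5.0),   # closest, but down
    Server("b", alive=True, load=0.95, rtt_ms=10.0),   # close, but overloaded
    Server("c", alive=True, load=0.30, rtt_ms=40.0),   # farther, healthy
]
choice = select_server(servers)
print(choice.name)
```

Every input to this function has to be measured continuously, which is why the policy depends on ongoing monitoring of liveness, load, and performance.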
How Akamai Uses DNS
  [Figure sequence; recoverable labels, in order of appearance:]
  - Page "index.html" references http://cache.cnn.com/foo.jpg
  - DNS servers labeled "global" and "regional"
  - DNS lookup cache.cnn.com
  - ALIAS: g.akamai.net
  - DNS lookup g.akamai.net
  - ALIAS a.g.akamai.net
  - DNS a.g.akamai.net
  - Address 1.2..4
  - Request: /foo.jpg, Host: cache.cnn.com
  - Request: /foo.jpg, Host: cache.cnn.com
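The alias chain in the DNS walkthrough above can be traced with a small table-driven sketch. The three names are taken from the slides; the record table, the `resolve` function, and the final address are assumptions (192.0.2.4 is a reserved documentation address standing in for the address on the slide, which is not fully legible).

```python
# Toy record table. In the real system each answer comes from a different
# DNS server; here a single dictionary stands in for all of them.
RECORDS = {
    "cache.cnn.com": ("ALIAS", "g.akamai.net"),
    "g.akamai.net": ("ALIAS", "a.g.akamai.net"),
    "a.g.akamai.net": ("A", "192.0.2.4"),  # placeholder documentation address
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow aliases until an address record is found.

    Returns (address, chain), where chain lists every name that was queried.
    """
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value, chain
        name = value        # alias: restart the lookup with the new name
        chain.append(name)
    raise RuntimeError("alias chain too long")


address, chain = resolve("cache.cnn.com")
print(" -> ".join(chain), "=>", address)
```

The client still sends `Host: cache.cnn.com` in its request, so the server it reaches can tell which customer's content is being asked for even though the address came from an akamai.net name.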
How Akamai Uses DNS
  [Figure, final step; recoverable labels: "index.html", "global", "regional"]

How Akamai Works: Cache Hit
  [Figure; recoverable labels: "/cnn.com/foo.jpg", "high-level", "low-level", "hash-chosen"]

Mapping System
  - Equivalence classes of IP addresses
      - IP addresses experiencing similar performance
      - Quantify how well they connect to each other
  - Collect and combine measurements
      - Ping, traceroute, BGP routes, server logs
          - E.g., over 100 TB of logs per day
      - Network latency, loss, and connectivity

Mapping System
  - Map each IP class to a preferred server cluster
      - Based on performance, cluster health, etc.
      - Updated roughly every minute
  - Map client request to a server in the cluster
      - Load balancer selects a specific server
      - E.g., to maximize the cache hit rate

Adapting to Failures
  - Failing hard drive on a server
      - Suspends after finishing "in progress" requests
  - Failed server
      - Another server takes over for the IP address
      - Low-level map updated quickly
  - Failed cluster
      - High-level map updated quickly
  - Failed path to customer's origin server
      - Route packets through an intermediate node

Transport Optimizations
  - Bad Internet routes
      - Overlay routing through an intermediate server
  - Packet loss
      - Sending redundant data over multiple paths
  - TCP connection set-up/teardown
      - Pools of persistent connections
  - TCP congestion window and round-trip time
      - Estimates based on network latency measurements
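The two-level mapping described earlier (IP class to cluster, then request to a server within the cluster) can be sketched as follows. The slides say only that the server is "hash-chosen"; this sketch swaps in rendezvous (highest-random-weight) hashing as one plausible way to do that. The prefixes, cluster names, and server names are invented for illustration.

```python
import hashlib
import ipaddress

# High-level map: IP class -> preferred cluster (hypothetical entries that
# use reserved documentation prefixes).
HIGH_LEVEL_MAP = {
    ipaddress.ip_network("198.51.100.0/24"): "cluster-east",
    ipaddress.ip_network("203.0.113.0/24"): "cluster-west",
}

# Servers in each cluster (hypothetical names).
CLUSTERS = {
    "cluster-east": ["east-1", "east-2", "east-3"],
    "cluster-west": ["west-1", "west-2"],
}


def pick_cluster(ip):
    """Return the preferred cluster for an address, or None if unmapped.

    With DNS-based selection, this address belongs to the client's local
    DNS server rather than to the client itself.
    """
    addr = ipaddress.ip_address(ip)
    for network, cluster in HIGH_LEVEL_MAP.items():
        if addr in network:
            return cluster
    return None


def pick_server(cluster, url, failed=frozenset()):
    """Low-level choice: the live server with the highest hash for this URL."""
    live = [s for s in CLUSTERS[cluster] if s not in failed]
    if not live:
        return None

    def score(server):
        return hashlib.sha256(f"{server}|{url}".encode()).hexdigest()

    return max(live, key=score)


cluster = pick_cluster("198.51.100.7")
server = pick_server(cluster, "/foo.jpg")
print(cluster, server)
```

Sending every request for the same URL to the same server keeps one cached copy per cluster, which raises the hit rate. With rendezvous hashing, a failed server only remaps the URLs it was serving, which fits the goal of updating the low-level map quickly.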
Application Optimizations
  - Slow download of embedded objects
      - Prefetch when HTML page is requested
  - Large objects
      - Content compression
  - Slow applications
      - Moving applications to edge servers
      - E.g., content aggregation and transformation
      - E.g., static databases (e.g., product catalogs)
      - E.g., batching and validating input on Web forms

Conclusion
  - Content distribution is hard
      - Many, diverse, changing objects
      - Clients distributed all over the world
      - Reducing latency is king
  - Content distribution solutions
      - Reactive caching
      - Proactive content distribution networks
  - Next time
      - Multimedia streaming applications