Measuring the Web: Part I - - Content Delivery Networks Prof. Anja Feldmann, Ph.D. Dr. Ramin Khalili Georgios Smaragdakis, PhD
Acknowledgement Material presented in these slides is borrowed from presentajons by: Mike Freedman (Princeton University), Ravi Sundaram (Northeastern University, ), Craig Labovitz (Arbor Networks, DeepField), Fabian Bustamante (Northeastern University), Tobias Flach (University of South California) 2
Today s Internet Traffic Web becomes main transport for video and everything else Thousands small web sites subsumed by large content providers 3
Single Server, Poor Performance Single server Single point of failure Easily overloaded Far from most clients Popular content Popular site Flash crowd (aka Slashdot effect ) Denial of Service axack 4
The Web: Simple on the Outside Content Providers End Users Internet
But ProblemaBc on the Inside Content Providers Peering Points Network Providers End Users IXP IXP
For TCP Distance MaXers 7
UJlizing Caching: 8 Proxy Caches Proxy server origin server client client 8
Forward Proxy Cache close to the client Under administrajve control of client- side AS Explicit proxy Requires configuring browser client Proxy server Implicit/Transparent proxy client Service provider deploys an on path proxy that intercepts and handles Web requests 9
Reverse Proxy Cache close to server Either by proxy run by server or in third- party content distribujon network (CDN) DirecJng clients to the proxy Map the site name to the IP address of the proxy Proxy server origin server origin server 10
Content DistribuJon Network ProacJve content replicajon Content provider (e.g., CNN) contracts with a CDN CDN replicates the content On many servers spread throughout the Internet origin server in North America CDN distribution node UpdaJng the replicas Updates pushed to replicas when the content changes CDN server in S. America CDN server in Europe CDN server in Asia 11
Live server Server SelecJon Policy For availability Lowest load To balance load across the servers Closest Nearest geographically, or in round- trip Jme Best performance Throughput, latency, Requires conjnuous monitoring of liveness, load, and performance. Monitoring includes traceroutes, pings, BGP updates etc Cheapest bandwidth, electricity, 12
Server SelecJon Mechanism ApplicaJon HTTP redirecjon GET Redirect GET OK Advantages Fine- grain control SelecJon based on client IP address Disadvantages Extra round- trips for TCP connecjon to server Overhead on the server 13
Server SelecJon Mechanism RouJng Anycast roujng 1.2.3.0/24 1.2.3.0/24 Advantages No extra round trips Route to nearby server Disadvantages Does not consider network or server load Different packets may go to different servers Used only for simple request- response apps 14
Server SelecJon Mechanism Naming DNS- based server selecjon 1.2.3.4 Advantages Avoid TCP set- up delay DNS caching reduces overhead RelaJvely fine control DNS query local DNS server 1.2.3.5 Disadvantage Based on IP address of local DNS server Hidden load effect DNS TTL limits adaptajon 15
CDN example: Content Providers Servers at Network Edge End Users IXP IXP
StaJsJcs Distributed servers Servers: ~150,000 Networks: ~1,100 Countries: ~80 Many customers - Web portals - Streaming - E- commerce Client requests - Hundreds of billions per day - 15-20% of all Web traffic worldwide 17
18 How Uses DNS cnn.com (content provider) DNS root server GET index. html http://cache.cnn.com/foo.jpg 1 2 HTTP HTTP global DNS server regional DNS server cluster End user Nearby cluster
19 How Uses DNS cnn.com (content provider) DNS TLD server 1 2 HTTP DNS lookup cache.cnn.com 3 4ALIAS: g.akamai.net global DNS server regional DNS server cluster End user Nearby cluster
20 How Uses DNS cnn.com (content provider) DNS TLD server 1 2 HTTP 3 4 6 ALIAS a73.g.akamai.net DNS lookup g.akamai.net 5 global DNS server regional DNS server cluster End user Nearby cluster
21 How Uses DNS cnn.com (content provider) DNS TLD server 1 2 HTTP End user 5 3 4 6 DNS a73.g.akamai.net Address 1.2.3.4 7 8 global DNS server regional DNS server Nearby cluster cluster
22 How Uses DNS cnn.com (content provider) DNS TLD server 1 2 HTTP 3 5 4 6 7 global DNS server regional DNS server cluster 8 End user GET /foo.jpg Host: cache.cnn.com 9 Nearby cluster
23 How Uses DNS cnn.com (content provider) DNS TLD server GET foo.jpg 11 12 1 2 HTTP 3 5 4 6 7 global DNS server regional DNS server cluster 8 End user GET /foo.jpg Host: cache.cnn.com 9 Nearby cluster
24 How Uses DNS cnn.com (content provider) DNS TLD server 11 12 1 2 HTTP 3 5 4 6 7 global DNS server regional DNS server cluster 8 End user 9 10 Nearby cluster
25 How Works: Cache Hit cnn.com (content provider) DNS TLD server 1 2 HTTP 3 4 global DNS server regional DNS server cluster End user 5 6 Nearby cluster
OpJmizaJons to Improve Performance and Increase Cache Hit - Terminate the connecbon close to the end- user - UBlize proprietary protocols between the servers 26
Example $ dig www.audi.com ; <<>> DiG 9.6- ESV- R4- P3 <<>> www.audi.com ;; global opjons: +cmd ;; Got answer: ;; - >>HEADER<<- opcode: QUERY, status: NOERROR, id: 47918 ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 8, ADDITIONAL: 8 ;; QUESTION SECTION: ;www.audi.com. IN A RedirecBon ;; ANSWER SECTION: www.audi.com. 900 IN CNAME www.audi.com.edgesuite.net. www.audi.com.edgesuite.net. 8850 IN CNAME a1805.r.akamai.net. a1805.r.akamai.net. 20 IN A 23.62.61.73 a1805.r.akamai.net. 20 IN A 23.62.61.65 ;; AUTHORITY SECTION: r.akamai.net. 5386 IN NS n5r.akamai.net. r.akamai.net. 5386 IN NS n6r.akamai.net. r.akamai.net. 5386 IN NS n7r.akamai.net. r.akamai.net. 5386 IN NS n3r.akamai.net. r.akamai.net. 5386 IN NS n4r.akamai.net. r.akamai.net. 5386 IN NS n0r.akamai.net. r.akamai.net. 5386 IN NS n1r.akamai.net. r.akamai.net. 5386 IN NS n2r.akamai.net. 27 ;; ADDITIONAL SECTION:
Example $ dig www.audi.com ; <<>> DiG 9.6- ESV- R4- P3 <<>> www.audi.com ;; global opjons: +cmd ;; Got answer: ;; - >>HEADER<<- opcode: QUERY, status: NOERROR, id: 47918 ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 8, ADDITIONAL: 8 ;; QUESTION SECTION: ;www.audi.com. IN A RedirecBon ;; ANSWER SECTION: www.audi.com. 900 IN CNAME www.audi.com.edgesuite.net. www.audi.com.edgesuite.net. 8850 IN CNAME a1805.r.akamai.net. a1805.r.akamai.net. 20 IN A 23.62.61.73 a1805.r.akamai.net. 20 IN A 23.62.61.65 ;; AUTHORITY SECTION: r.akamai.net. 5386 IN NS n5r.akamai.net. r.akamai.net. 5386 IN NS n6r.akamai.net. Try: r.akamai.net. 5386 IN NS n7r.akamai.net. r.akamai.net. 5386 IN NS n3r.akamai.net. r.akamai.net. 5386 IN NS n4r.akamai.net. curl H Host:www.audi.com hxp://23.62.61.73/index.html r.akamai.net. 5386 IN NS n0r.akamai.net. r.akamai.net. 5386 IN NS n1r.akamai.net. r.akamai.net. 5386 IN NS n2r.akamai.net. 28 ;; ADDITIONAL SECTION:
Measuring CDNs UJlize a number of vantage points or Open DNS resolvers Every e.g., 60 secs, each vantage point queries an appropriate URL delivered by, or CNAMES (e.g., *.akamai.net) Similar technique can be used for other CDNs (e.g., Limemight) Low-Level DNS Server Edge Server 1 Edge Server 2. 29 PL Node Edge Server 3
Measuring CDNs By ujlizing a large number of vantage points or open resolvers it is possible to collect all the IPs of the CDNs! Example of measurement in 2009: AKAMAI CDN: Country # of IP United States 16,843 United Kingdom 1,690 Japan 1,622 Germany 1,103 Netherlands 857 France 722 Australia 514 Canada 438 Sweden 396 Hong Kong SAR 370 Others 3018 Total 27,573 Limelight CDN: Country # of IP United States 2,830 Germany 314 United Kingdom 300 Netherlands 199 Japan 126 Canada 121 France 120 Hong Kong SAR 83 China 53 Australia 1 Total 4,147 30
Measuring CDNs Berkeley Purdue day night 31
Where to Measure CDN and Web traffic? OpJon 1: IXP infrastructure Complex system Centralized monitoring Source: DE- CIX, 2012
Internet exchange Point (IXP) AS1 AS2 AS3 AS4 Layer-2 switch AS5 AS6
EsJmaJon of CDN and Web traffic AS (AS1) Link IXP AS2 AS4 AS3
IXP Results Around 70% of the traffic is Web (Large IXP, 2012) A relajve small number of large CDNs, Hosters, and Streaming providers are responsible for >50% of the traffic 35
Where to Measure CDN and Web traffic? OpJon 2: Private peering locajons ARBOR study: - 110+ ISPs / content providers - Including 3,000 edge routers and 100,000 interfaces - And an esjmated ~25% all inter- domain traffic AT&T Backbone study: Backbone, access, and mobile network Deutsche Telekom Study: Passive measurements from 20K residenjal users Source: DE- CIX, 2012
A Few Large CDNs and Datacenters are responsible for most of the Web Arbor Study (2009-13): ConsolidaJon of Web Traffic %Traffic due to CDNs 2009: 25% 2011: 35% 2013: 50% AT&T Study (2010)
A Few Large CDNs and Datacenters are responsible for most of the Web Deutsche Telekom Study (2010)
..and the CDN traffic increases 39