Uncovering the Big Players of the Web
3rd TMA Workshop, Vienna, March 12
Vinicius Gehlen, Alessandro Finamore, Marco Mellia, Maurizio M. Munafò
TMA COST Action
Introduction
- Nowadays Internet traffic volume is mainly HTTP + P2P
- [Figure: breakdown of downstream traffic of residential customers; legend note: mainly SSH, VoIP, DNS, email, etc.]
- + A plethora of services!
Methodology
- Focus only on HTTP traffic
- Rely on Tstat to generate flow-level HTTP logs
  - L4: #bytes, #pkts, RTT, etc.
  - L7: service type and meta-data (e.g. video)
- Rely on an organization database
  - Each server IP is associated with its owner
  - e.g. 92.122.208.73 → AKAMAI TECHNOLOGIES
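The IP-to-owner step can be sketched as a prefix lookup followed by a per-organization aggregation. This is only an illustration: the prefix table, the `owner_of` and `aggregate` helpers, and the sample flows are hypothetical, and the real organization database and log format are not described on the slide.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-owner table (assumption); a real database is far larger.
ORG_PREFIXES = {
    "92.122.208.0/24": "AKAMAI TECHNOLOGIES",  # made-up prefix covering the slide's example IP
    "198.51.100.0/24": "EXAMPLE HOSTING",      # documentation range, invented owner
}
NETWORKS = [(ipaddress.ip_network(p), org) for p, org in ORG_PREFIXES.items()]

def owner_of(server_ip):
    """Return the organization owning server_ip, or UNKNOWN if no prefix matches."""
    addr = ipaddress.ip_address(server_ip)
    for net, org in NETWORKS:
        if addr in net:
            return org
    return "UNKNOWN"

def aggregate(flows):
    """flows: iterable of (server_ip, bytes). Returns {org: [total_bytes, n_flows]}."""
    totals = defaultdict(lambda: [0, 0])
    for server_ip, nbytes in flows:
        entry = totals[owner_of(server_ip)]
        entry[0] += nbytes
        entry[1] += 1
    return dict(totals)

# Invented flow records for illustration only.
flows = [("92.122.208.73", 5000), ("92.122.208.10", 1500),
         ("198.51.100.7", 300), ("203.0.113.9", 40)]
totals = aggregate(flows)
```

A linear scan over prefixes is enough for a sketch; a production lookup would more likely use a longest-prefix-match structure.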
Dataset
- 3 vantage points (VPx) of an ISP in Italy
- Residential customers: ADSL (VP2, VP3) + Fiber-To-The-Home (VP1)
- 1 week of traffic (20-24 June 2011)
OVERVIEW
Which organizations? Volumes? Popularity?
Top10 (+1) organizations

Rank  Org. Name    % Bytes  % Flows
1     Google       22.7     12.7
2     Akamai       12.3     16.7
3     Leaseweb      6.3      1.1
4     Megaupload    5.5      0.2
5     Level3        4.7      1.9
6     Limelight     3.9      1.6
7     PSINet        3.2      0.2
8     Webzilla      2.9      0.3
9     Choopa        1.5      0.01
10    OVH           1.0      0.7
11    Facebook      0.9      4.2
      Total %      65       40

- Google handles 2x the Akamai volume
- Besides Google and Akamai, many others: known (Level3, Limelight, Leaseweb, Megaupload) and less known (PSINet, Webzilla, Choopa)
- >10k organizations, but 65% of volume is due to only 11 big players
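As a quick sanity check, the totals and the Google-to-Akamai ratio can be recomputed from the table values. The figures are copied from the slide; the variable names are illustrative.

```python
# (% bytes, % flows) per organization, copied from the Top10 (+1) table.
TOP11 = {
    "Google": (22.7, 12.7), "Akamai": (12.3, 16.7), "Leaseweb": (6.3, 1.1),
    "Megaupload": (5.5, 0.2), "Level3": (4.7, 1.9), "Limelight": (3.9, 1.6),
    "PSINet": (3.2, 0.2), "Webzilla": (2.9, 0.3), "Choopa": (1.5, 0.01),
    "OVH": (1.0, 0.7), "Facebook": (0.9, 4.2),
}

total_bytes_pct = sum(b for b, _ in TOP11.values())   # about 65, as on the slide
total_flows_pct = sum(f for _, f in TOP11.values())   # about 40, as on the slide
google_vs_akamai = TOP11["Google"][0] / TOP11["Akamai"][0]  # roughly 2x

print(f"bytes: {total_bytes_pct:.1f}%  flows: {total_flows_pct:.2f}%  "
      f"Google/Akamai: {google_vs_akamai:.2f}x")
```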
Organizations popularity
% of client IPs that have contacted the organization at least once

Organization  % Clients  Video Content  SW Update         Adv. & Others
Google        97.1       YouTube        -                 Google services
Akamai        97.2       Vimeo          Microsoft, Apple  Facebook static content, ebay
Leaseweb      64.3       Megavideo      -                 publicbt.com
Megaupload    15.6       Megavideo      -                 FileHosting
Level3        79.7       YouPorn        -                 quantserve, tinypic, Photobucket
Limelight     72.5       Pornhub, Veoh  Avast             betclick, wdig, trafficjunky
PSINet        44.6       Megavideo      Kaspersky         Imageshack
Webzilla      13.2       Adult Video    -                 Filesonic, Depositfiles
Choopa         5.7       -              -                 zshare
OVH           63.1       Auditude       -                 Telaxo, m2cai
Facebook      90.6       Facebook       -                 Facebook dynamic content

- 97% of clients contact Google and Akamai
- 63% of clients contact OVH (advertisement)
- 90.6% of clients contact Facebook!?
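The popularity metric (share of client IPs that contacted an organization at least once) can be sketched with one set of clients per organization. The `popularity` helper and the sample records are hypothetical; only the definition of the metric comes from the slide.

```python
from collections import defaultdict

def popularity(flows):
    """flows: iterable of (client_ip, organization).
    Returns {organization: % of distinct clients that contacted it at least once}."""
    all_clients = set()
    clients_by_org = defaultdict(set)
    for client, org in flows:
        all_clients.add(client)
        clients_by_org[org].add(client)
    return {org: 100.0 * len(clients) / len(all_clients)
            for org, clients in clients_by_org.items()}

# Invented records: repeated contacts by the same client count only once.
flows = [
    ("10.0.0.1", "Google"), ("10.0.0.1", "Google"), ("10.0.0.2", "Google"),
    ("10.0.0.2", "Facebook"), ("10.0.0.3", "Akamai"), ("10.0.0.4", "Google"),
]
pop = popularity(flows)
```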
Why does FB see 90% of clients?
- You visit nutella.com. Slurp!
- There is an embedded object pointing to the FB fan page
- This generates a connection to FB
- So FB knows that you like Nutella
- Privacy, anyone?!
Why does FB see 90% of clients?
GET /plugins/like.php?href=http%3a%2f%2fwww.facebook.com%2FNutella.Italy&layout=box_count&show_faces=false&width=120&action=like&colorscheme=light&height=65 HTTP/1.1
Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.53.11 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.nutella.it/
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: presence=em331242829euserfa21519421867a2estatefdsb2f0et2f_5b_5delm2fnulleuct2f1331242225BEtrFnullEtwF838980386G331242829049H0EblcF0EsndF1CEchFDsubF_5b0_5dEp_5f1519421867F2CC; p=6; c_user=1519421867; datr=fokptf9njodb9oshmlkp5e_ong..; lu=ggojzj70pxz6grarmpcnirxw; xs=1%3a889419cc0700aee88497c71b1015fa45%3a0%3a1331242820; locale=en_us
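A minimal sketch of what such a request reveals to the receiving server: the page being "liked" (the `href` parameter), the site the user was visiting (the `Referer` header), and a cookie tied to the logged-in session. The request is abridged from the slide; the `what_the_server_learns` helper is hypothetical, and treating `c_user` as the account identifier is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# Abridged from the request on the slide.
REQUEST_LINE = ("GET /plugins/like.php?href=http%3a%2f%2fwww.facebook.com%2FNutella.Italy"
                "&layout=box_count&show_faces=false HTTP/1.1")
HEADERS = {
    "Host": "www.facebook.com",
    "Referer": "http://www.nutella.it/",
    "Cookie": "p=6; c_user=1519421867; locale=en_us",
}

def what_the_server_learns(request_line, headers):
    """Extract the fields that link a user to the page being visited."""
    _method, target, _version = request_line.split(" ")
    query = parse_qs(urlsplit(target).query)  # percent-decoding is done here
    cookies = dict(part.strip().split("=", 1)
                   for part in headers.get("Cookie", "").split(";") if "=" in part)
    return {
        "liked_page": query.get("href", [None])[0],
        "visited_site": urlsplit(headers.get("Referer", "")).hostname,
        "user_id": cookies.get("c_user"),  # assumption: c_user identifies the account
    }

info = what_the_server_learns(REQUEST_LINE, HEADERS)
```

No click on the button is needed: loading the embedding page is enough to trigger the request.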
CONTENT SERVED
VIDEO? SSL/TLS?
Which type of content is served?
- Video content is highly predominant
- 90% of Google is YouTube video download
- The majority of FileHosting is actually video content
Which type of content is served?
- 20% of Akamai is Facebook
- Google, Akamai and Facebook have some HTTPS traffic
Evolution of HTTPS
Compare HTTPS traffic from June 2011 and October 2011

              % Bytes        % Flows
Organization  Jun    Oct     Jun     Oct
Google        1.75   2.67    11.1    17.34
Akamai        3.46   2.95    10.55   17.6
Leaseweb      0.46   0.26    0.55    1.85
Level3        1.22   0.34    1.49    2.48
Limelight     0.05   0.78    1.82    2.16
Megaupload    -      -       -       -
Facebook      13.3   26.8    15.35   24.3

- The number of HTTPS connections is increasing for all the organizations
- About 25% of Facebook connections were HTTPS in October 2011
- +6 to +7 percentage points of HTTPS flows for Akamai and Google in < 6 months
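The change in the share of HTTPS flows between the two snapshots can be recomputed from the table. The numbers are copied from the "% Flows" columns; Megaupload is omitted because the table reports no HTTPS traffic for it, and the variable names are illustrative.

```python
# (June, October) share of HTTPS flows per organization, from the table.
HTTPS_FLOWS = {
    "Google": (11.1, 17.34), "Akamai": (10.55, 17.6), "Leaseweb": (0.55, 1.85),
    "Level3": (1.49, 2.48), "Limelight": (1.82, 2.16), "Facebook": (15.35, 24.3),
}

# Difference in percentage points (October minus June).
delta = {org: round(october - june, 2) for org, (june, october) in HTTPS_FLOWS.items()}

for org, points in sorted(delta.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org:10s} {points:+.2f} percentage points")
```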
Flow size
- The majority of the flows are small
- >50% of connections have < 10 kB
- File Hosting organizations are serving a lot of short flows
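The "share of flows below 10 kB" figure is one point of the empirical distribution of flow sizes. A minimal sketch, assuming sizes are available in bytes; the `fraction_below` helper and the sample values are invented for illustration.

```python
def fraction_below(sizes, threshold_bytes):
    """Fraction of flows whose size is strictly below threshold_bytes."""
    return sum(1 for size in sizes if size < threshold_bytes) / len(sizes)

# Invented flow sizes in bytes (not taken from the dataset).
flow_sizes = [400, 1200, 3500, 8000, 9500, 15000, 60000, 2_000_000]
share_small = fraction_below(flow_sizes, 10_000)
```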
Bulk download rate (1/2)
- Focusing on connections with > 1 MB, what is the download rate?
ORGANIZATIONS' INFRASTRUCTURE
BEHIND THE SCENES
RTT latency towards the Internet
- Minimum RTT is measured by Tstat on a per-flow basis
- Facebook has 2 locations (100 ms and 170 ms)
- Akamai and Limelight are the closest to the ISP (5 ms)
- 3 Google datacenters are preferred
- Only <30% of requests are served by the closest one (12 ms)
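One way to read the "served by the closest one" figure: take the minimum RTT of each flow, then count the flows whose minimum falls near the lowest value seen for the organization. This is a sketch under assumptions; the 2 ms tolerance, the helper names, and the RTT samples are hypothetical, and it does not reproduce how Tstat computes RTT.

```python
def flow_min_rtt(samples_ms):
    """Minimum RTT observed within one flow, in milliseconds."""
    return min(samples_ms)

def share_from_closest(flow_rtts_ms, tolerance_ms=2.0):
    """Fraction of flows whose minimum RTT is within tolerance_ms of the lowest one."""
    closest = min(flow_rtts_ms)
    near = sum(1 for rtt in flow_rtts_ms if rtt - closest <= tolerance_ms)
    return near / len(flow_rtts_ms)

# Invented per-flow RTT samples (ms) towards one organization.
flows = [[12.4, 13.0, 15.2], [30.1, 29.8], [12.1, 12.9], [45.0, 44.2, 47.9], [29.9, 31.0]]
mins = [flow_min_rtt(samples) for samples in flows]
share = share_from_closest(mins)
```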
Number of IPs

Organization  No.     %     No. Top5  %bytes
Google        3678    0.76  135       94.1
Akamai        10445   2.16  1255      86.1
Leaseweb      3833    0.79  546       80.0
Level3        1868    0.39  572       65.8
Limelight     1179    0.24  115       97.2
Megaupload    808     0.17  15        64.0
Facebook      338     0.07  27        74.5
Total         33798   7     4596

- 4596-33800 IP addresses serve 65% of HTTP traffic
- Google handles 2x the Akamai volume with 1/3 of the IPs
- Most of the traffic is served by a few preferred IPs within an organization
Volume served by %IP
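The concentration of traffic on a few server IPs can be sketched by sorting IPs by served volume and summing the top fraction. The `top_share` helper and the per-IP volumes are hypothetical; only the two IP counts used in the last line come from the "Number of IPs" table.

```python
def top_share(bytes_per_ip, top_fraction):
    """Share of total bytes served by the top_fraction busiest IPs (at least one IP)."""
    volumes = sorted(bytes_per_ip.values(), reverse=True)
    k = max(1, round(len(volumes) * top_fraction))
    return sum(volumes[:k]) / sum(volumes)

# Invented per-IP volumes for one organization (documentation address range).
bytes_per_ip = {f"192.0.2.{i}": v
                for i, v in enumerate([500, 300, 100, 40, 20, 15, 10, 8, 5, 2], start=1)}
share_top20 = top_share(bytes_per_ip, 0.20)

# From the table: Google uses roughly one third of the IPs Akamai uses.
google_to_akamai_ips = 3678 / 10445
```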
Bulk download rate (2/2)
- Considering connections with > 1 MB
- All organizations but Akamai (and Facebook) have >90% of connections at >500 kb/s
- Caching policies may have an impact:
  1. Content already available → high bitrate
  2. Content retrieved from backend → low bitrate
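A rough sketch of the bulk download rate computation, assuming the rate of a flow is its size divided by its duration (the slides do not give the exact definition). The helper names and the sample flows are hypothetical.

```python
def bulk_rates_kbps(flows, min_bytes=1_000_000):
    """flows: iterable of (bytes, duration_s).
    Returns the rate in kb/s of each flow larger than min_bytes."""
    return [8 * nbytes / duration / 1000
            for nbytes, duration in flows
            if nbytes > min_bytes and duration > 0]

def share_above(rates, threshold_kbps=500):
    """Fraction of rates strictly above threshold_kbps."""
    if not rates:
        return 0.0
    return sum(1 for rate in rates if rate > threshold_kbps) / len(rates)

# Invented flows: (bytes, duration in seconds). The 800 kB flow is filtered out.
flows = [(5_000_000, 10.0), (2_000_000, 40.0), (800_000, 1.0), (10_000_000, 20.0)]
rates = bulk_rates_kbps(flows)
fast_share = share_above(rates)
```

Restricting to flows above 1 MB keeps short transfers, which never leave TCP slow start, from dragging the rates down.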
Conclusions
- We investigated what the web looks like these days
- Some clear trends are visible
  - The majority of traffic is handled by a few big players
  - HTTPS is becoming very popular
  - Lots of datacenters to manage demand
- It is very difficult to understand
  - Who owns and who serves the content
  - Which policies are used
  - How much data leaks to these players
- This makes a tangled web which is very hard to discern
RTT variation during the day
HTTP traffic breakdown
- HTTP traffic on PDF: the fraction of traffic handled by each organization is constant over time
Comparing days & locations
- Short-term stability, with marginal differences with respect to
  - Days of the week
  - Locations of the users