Uncovering the Big Players of the Web




3rd TMA Workshop, Vienna, March 12
Vinicius Gehlen, Alessandro Finamore, Marco Mellia, Maurizio M. Munafò
TMA COST Action

Introduction
Nowadays, Internet traffic volume is mainly HTTP + P2P.
[Figure: breakdown of downstream traffic of residential customers; the remainder is mainly SSH, VoIP, DNS, email, etc.]
+ A plethora of services!

Methodology
Focus only on HTTP traffic.
Rely on Tstat to generate flow-level HTTP logs:
L4: #bytes, #pkts, RTT, etc.
L7: service type and metadata (e.g., video).
Rely on an organization database: each server IP is associated with its owner, e.g., 92.122.208.73 → AKAMAI TECHNOLOGIES.
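Associating each server IP with its owner amounts to finding the registered address block that contains the IP. A minimal Python sketch, assuming a longest-prefix match over a prefix-to-owner table: the table is hypothetical (the Akamai entry is an illustrative /24 chosen only because it contains the example address above, and the other entries use the RFC 5737 documentation range), and `owner_of` is a helper introduced here, not part of Tstat or of any real organization database.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-owner table (illustration only, not registry data).
ORG_PREFIXES = {
    "92.122.208.0/24": "AKAMAI TECHNOLOGIES",   # assumed block containing the example IP
    "198.51.100.0/24": "EXAMPLE-CDN",           # RFC 5737 documentation range
    "198.51.100.128/25": "EXAMPLE-CUSTOMER",    # more specific block inside the one above
}

# Parse once and sort longest prefix first, so the first hit is the most specific block.
_TABLE = sorted(
    ((ipaddress.ip_network(prefix), org) for prefix, org in ORG_PREFIXES.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def owner_of(server_ip: str) -> Optional[str]:
    """Return the owner of the most specific prefix containing server_ip, or None."""
    addr = ipaddress.ip_address(server_ip)
    for net, org in _TABLE:
        if addr in net:  # False when the IP versions differ
            return org
    return None
```

For example, `owner_of("92.122.208.73")` yields `"AKAMAI TECHNOLOGIES"` with this toy table, while an address in `198.51.100.128/25` resolves to the more specific `"EXAMPLE-CUSTOMER"` entry rather than `"EXAMPLE-CDN"`.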

Dataset
3 vantage points (VPx) of an ISP in Italy.
Residential customers: ADSL (VP2, VP3) + Fiber-To-The-Home (VP1).
1 week of traffic (20-24 June 2011).

OVERVIEW: Which organizations? Volumes? Popularity?

Top 10 (+1) organizations
%B = share of HTTP bytes, %F = share of HTTP flows.

Rank  Organization  %B    %F
1     Google        22.7  12.7
2     Akamai        12.3  16.7
3     Leaseweb      6.3   1.1
4     Megaupload    5.5   0.2
5     Level3        4.7   1.9
6     Limelight     3.9   1.6
7     PSINet        3.2   0.2
8     Webzilla      2.9   0.3
9     Choopa        1.5   0.01
10    OVH           1.0   0.7
11    Facebook      0.9   4.2
      Total         65    40

Google handles 2x the Akamai volume.
Besides Google and Akamai, there are many others: well known (Level3, Limelight, Leaseweb, Megaupload) and less known (PSINet, Webzilla, Choopa).
There are more than 10k organizations, but 65% of the volume is due to only 11 big players.
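%B and %F are per-organization shares of bytes and of flows. Assuming each flow record in the HTTP log has already been tagged with its organization (for instance through an IP-to-owner lookup), both shares come out of a single pass over the records. `org_shares` and the toy records are hypothetical; they only illustrate how an organization can dominate bytes with few flows (as Google does) or carry many flows with little volume (as Facebook does).

```python
from collections import defaultdict


def org_shares(flows):
    """flows: iterable of (organization, bytes) pairs, one per HTTP flow.

    Returns {organization: (%bytes, %flows)}.
    """
    bytes_by_org = defaultdict(int)
    flows_by_org = defaultdict(int)
    total_bytes = 0
    total_flows = 0
    for org, nbytes in flows:
        bytes_by_org[org] += nbytes
        flows_by_org[org] += 1
        total_bytes += nbytes
        total_flows += 1
    return {
        org: (
            100.0 * bytes_by_org[org] / total_bytes if total_bytes else 0.0,
            100.0 * flows_by_org[org] / total_flows,
        )
        for org in bytes_by_org
    }
```

With the toy input `[("A", 600), ("B", 300), ("B", 100), ("B", 0)]`, organization A gets 60% of the bytes but only 25% of the flows, while B gets 40% of the bytes and 75% of the flows.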

Organizations popularity
% Client = percentage of client IPs that contacted the organization at least once.

Organization  % Client  Video content   SW update         Adv. & others
Google        97.1      YouTube         -                 Google services
Akamai        97.2      Vimeo           Microsoft, Apple  Facebook static content, ebay
Leaseweb      64.3      Megavideo       -                 publicbt.com
Megaupload    15.6      Megavideo       -                 FileHosting
Level3        79.7      YouPorn         -                 quantserve, tinypic, Photobucket
Limelight     72.5      Pornhub, Veoh   Avast             betclick, wdig, trafficjunky
PSINet        44.6      Megavideo       Kaspersky         Imageshack
Webzilla      13.2      Adult Video     -                 Filesonic, Depositfiles
Choopa        5.7       -               -                 zshare
OVH           63.1      Auditude        -                 Telaxo, m2cai
Facebook      90.6      Facebook        -                 Facebook dynamic content

97% of clients contact Google and Akamai.
63% of clients contact OVH (advertisement).
90.6% of clients contact Facebook!?
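Popularity is measured over distinct client IPs rather than over flows or bytes, so each organization's value is independent and the column does not sum to 100%. A minimal sketch, assuming the input is a stream of (client IP, organization) pairs extracted from the flow log; `client_popularity` is a hypothetical helper and the addresses are made up.

```python
from collections import defaultdict


def client_popularity(contacts):
    """contacts: iterable of (client_ip, organization) pairs, one per flow.

    Returns {organization: % of distinct client IPs that contacted it at least once}.
    """
    all_clients = set()
    clients_by_org = defaultdict(set)
    for client, org in contacts:
        all_clients.add(client)
        clients_by_org[org].add(client)  # a set, so repeat visits count once
    n = len(all_clients)
    return {org: 100.0 * len(clients) / n for org, clients in clients_by_org.items()}
```

Because one client usually contacts many organizations, values such as 97.1% for Google and 90.6% for Facebook can coexist.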

Why does FB see 90% of clients?
You visit nutella.com. Slurp!
The page embeds an object pointing to the FB fan page.
This generates a connection to FB.
So FB knows that you like Nutella.
Privacy, anyone?!

Why does FB see 90% of clients?
GET /plugins/like.php?href=http%3a%2f%2fwww.facebook.com%2FNutella.Italy&layout=box_count&show_faces=false&width=120&action=like&colorscheme=light&height=65 HTTP/1.1
Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.53.11 (KHTML, like Gecko) Version/5.1.3 Safari/534.53.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.nutella.it/
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: presence=em331242829euserfa21519421867a2estatefdsb2f0et2f_5b_5delm2fnulleuct2f1331242225BEtrFnullEtwF838980386G331242829049H0EblcF0EsndF1CEchFDsubF_5b0_5dEp_5f1519421867F2CC; p=6; c_user=1519421867; datr=fokptf9njodb9oshmlkp5e_ong..; lu=ggojzj70pxz6grarmpcnirxw; xs=1%3a889419cc0700aee88497c71b1015fa45%3a0%3a1331242820; locale=en_us
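The request shows what a single embedded plugin leaks: the Referer header names the page being visited, the `href` query parameter names the fan page, and the Cookie header ties both to a logged-in user. A minimal sketch of pulling those fields out of a raw request with the Python standard library; `leaked_info` is a hypothetical helper, it assumes CRLF line endings, and the test request is a shortened version of the capture with a dummy cookie value.

```python
from urllib.parse import parse_qs, urlsplit


def leaked_info(raw_request: str) -> dict:
    """Extract what the third-party server learns from one HTTP request."""
    head = raw_request.split("\r\n\r\n", 1)[0]  # drop any body
    lines = head.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    query = parse_qs(urlsplit(target).query)  # also percent-decodes %3a, %2f, ...
    return {
        "host": headers.get("host"),
        "referer": headers.get("referer"),       # page the user is visiting
        "href": query.get("href", [None])[0],    # fan page behind the Like button
        "has_cookie": "cookie" in headers,       # request is tied to a user identity
    }
```

Applied to the captured request, the referer is `http://www.nutella.it/` and the decoded `href` is `http://www.facebook.com/Nutella.Italy`, sent together with the user's Facebook cookies.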

CONTENT SERVED: Video? SSL/TLS?

Which type of content is served?
Video content is highly predominant.
90% of Google is YouTube video download.
The majority of FileHosting is actually video content.
20% of Akamai is Facebook.
Google, Akamai and Facebook have some HTTPS traffic.

Evolution of HTTPS
Compare HTTPS traffic from June 2011 and October 2011.

              % Bytes        % Flows
Organization  Jun    Oct     Jun    Oct
Google        1.75   2.67    11.1   17.34
Akamai        3.46   2.95    10.55  17.6
Leaseweb      0.46   0.26    0.55   1.85
Level3        1.22   0.34    1.49   2.48
Limelight     0.05   0.78    1.82   2.16
Megaupload    -      -       -      -
Facebook      13.3   26.8    15.35  24.3

The share of HTTPS connections is increasing for all the organizations.
About 25% of Facebook connections were HTTPS in October 2011.
The HTTPS share of flows grew by about 7 percentage points for Akamai and about 6 for Google in < 6 months.
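The trend statements compare June and October shares, so the natural measure is the percentage-point difference per column. The sketch below simply recomputes those differences from the table values (Megaupload is left out because it has no entries); `HTTPS_SHARE` and `pp_change` are names introduced here for illustration. It shows that the flow share grows for every organization, whereas the byte share falls for Akamai, Leaseweb and Level3.

```python
# (bytes Jun, bytes Oct, flows Jun, flows Oct) in %, copied from the
# June/October 2011 HTTPS comparison; Megaupload omitted (no entries).
HTTPS_SHARE = {
    "Google": (1.75, 2.67, 11.1, 17.34),
    "Akamai": (3.46, 2.95, 10.55, 17.6),
    "Leaseweb": (0.46, 0.26, 0.55, 1.85),
    "Level3": (1.22, 0.34, 1.49, 2.48),
    "Limelight": (0.05, 0.78, 1.82, 2.16),
    "Facebook": (13.3, 26.8, 15.35, 24.3),
}


def pp_change(org):
    """Percentage-point change (October minus June) of HTTPS bytes and flows."""
    b_jun, b_oct, f_jun, f_oct = HTTPS_SHARE[org]
    return round(b_oct - b_jun, 2), round(f_oct - f_jun, 2)


if __name__ == "__main__":
    for org in HTTPS_SHARE:
        d_bytes, d_flows = pp_change(org)
        print(f"{org:10s} bytes {d_bytes:+6.2f} pp   flows {d_flows:+6.2f} pp")
```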


Flow size
The majority of the flows are small: > 50% of connections carry < 10 kB.
File-hosting organizations serve a lot of short flows.
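A statement such as "> 50% of connections carry < 10 kB" is one point read off the empirical CDF of flow sizes. A minimal sketch, assuming flow size is the byte count of each flow and taking 10 kB as 10,000 bytes (the slide does not say whether kB means 1000 or 1024 bytes); `fraction_below` is a hypothetical helper.

```python
def fraction_below(sizes_bytes, threshold=10_000):
    """Empirical CDF at `threshold`: share of flows strictly smaller than it."""
    sizes = list(sizes_bytes)
    if not sizes:
        return 0.0
    return sum(1 for size in sizes if size < threshold) / len(sizes)
```

Running it separately on each organization's flows gives the per-organization curves; a result above 0.5 means that most of that organization's flows are short.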

Bulk download rate (1/2)
Focusing on connections with > 1 MB, what is the download rate?

ORGANIZATIONS' INFRASTRUCTURE: BEHIND THE SCENES

RTT: latency towards the Internet
The minimum RTT is measured by Tstat on a per-flow basis.
Facebook has 2 locations (100 ms and 170 ms).
Akamai and Limelight are the closest to the ISP (5 ms).
3 Google datacenters are preferred; only < 30% of requests are served by the closest one (12 ms).
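The per-flow minimum RTT filters out queueing delay, so it approximates the propagation delay to the serving location, and flows from one organization cluster around a few minimum-RTT values, one per location. A hedged sketch, assuming the RTT samples of each flow are already available (how Tstat obtains them is outside this sketch) and approximating "served by the closest location" as "minimum RTT within a tolerance of the lowest minimum RTT seen for that organization"; the helpers and the 2 ms tolerance are assumptions, not the authors' method.

```python
def flow_min_rtt(samples_ms):
    """Per-flow minimum RTT in ms, taken over that flow's RTT samples."""
    if not samples_ms:
        raise ValueError("flow has no RTT samples")
    return min(samples_ms)


def share_from_closest(flows_samples, tolerance_ms=2.0):
    """Share of an organization's flows whose minimum RTT is within
    `tolerance_ms` of the lowest minimum RTT observed for that organization
    (a proxy for 'served by the closest location')."""
    mins = [flow_min_rtt(samples) for samples in flows_samples]
    if not mins:
        return 0.0
    best = min(mins)
    return sum(1 for m in mins if m - best <= tolerance_ms) / len(mins)
```

For an organization like Google, a result below 0.3 would match the observation that fewer than 30% of requests reach the 12 ms datacenter.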

Number of IPs

Organization  No.    %     No. Top5  %bytes
Google        3678   0.76  135       94.1
Akamai        10445  2.16  1255      86.1
Leaseweb      3833   0.79  546       80.0
Level3        1868   0.39  572       65.8
Limelight     1179   0.24  115       97.2
Megaupload    808    0.17  15        64.0
Facebook      338    0.07  27        74.5
Total         33798  7     4596

About 33,800 IP addresses serve 65% of HTTP traffic.
Google handles 2x the Akamai volume with 1/3 of the IPs.
Most of the traffic is served by a few preferred IPs within each organization.
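"Most of the traffic is served by a few preferred IPs" is a concentration statement that two simple quantities capture: the byte share of an organization's k busiest IPs, and the number of IPs needed to cover a given fraction of its bytes. A minimal sketch with hypothetical helpers and made-up volumes; it is not necessarily how the "No. Top5" and "%bytes" columns were computed.

```python
def top_k_byte_share(bytes_by_ip, k=5):
    """Percentage of an organization's bytes served by its k busiest server IPs."""
    volumes = sorted(bytes_by_ip.values(), reverse=True)
    total = sum(volumes)
    if total == 0:
        return 0.0
    return 100.0 * sum(volumes[:k]) / total


def ips_needed(bytes_by_ip, fraction):
    """Smallest number of IPs (busiest first) that together serve >= fraction of bytes."""
    volumes = sorted(bytes_by_ip.values(), reverse=True)
    target = fraction * sum(volumes)
    served = 0
    for count, volume in enumerate(volumes, start=1):
        served += volume
        if served >= target:
            return count
    return len(volumes)
```

Sweeping `fraction` from 0 to 1 and dividing `ips_needed` by the number of IPs traces a "volume served by %IP" curve for one organization.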

Volume served by %IP
[Figure]

Bulk download rate (2/2)
Considering connections with > 1 MB [figure with two regions marked 1 and 2]:
All organizations but Akamai (and Facebook) have > 90% of connections at > 500 kb/s.
Caching policies may have an impact:
1. Content already available → high bitrate.
2. Content retrieved from the backend → low bitrate.
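The bulk-rate comparison keeps only large flows and then checks how many exceed a rate threshold. A minimal sketch, assuming the rate is total bytes times 8 divided by flow duration, 1 MB = 10^6 bytes and 1 kb/s = 1000 bit/s; Tstat's own throughput definition may differ, and `bulk_rate_kbps` and `share_fast` are hypothetical helpers.

```python
def bulk_rate_kbps(nbytes, duration_s):
    """Average download rate of a flow in kb/s (1 kb = 1000 bits)."""
    return nbytes * 8 / duration_s / 1000


def share_fast(flows, min_bytes=1_000_000, min_rate_kbps=500):
    """flows: iterable of (bytes, duration_s) pairs.

    Among flows larger than min_bytes, return the share whose average
    rate exceeds min_rate_kbps.
    """
    bulk = [(b, d) for b, d in flows if b > min_bytes and d > 0]
    if not bulk:
        return 0.0
    fast = sum(1 for b, d in bulk if bulk_rate_kbps(b, d) > min_rate_kbps)
    return fast / len(bulk)
```

Per organization, a value above 0.9 corresponds to "> 90% of connections at > 500 kb/s"; a lower value, as for Akamai, is consistent with part of the content being fetched from a backend before delivery.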

Conclusions
We investigated what the web looks like these days.
Some clear trends are visible:
the majority of traffic is handled by a few big players;
HTTPS is becoming very popular;
lots of datacenters are used to manage demand.
It is very difficult to understand:
who owns and who serves the content;
which policies are used;
how much data leaks to these players.
This makes a tangled web which is very hard to discern.

RTT variation during the day
[Figure]

Breakdown of HTTP traffic
HTTP traffic on PDF: the fraction of traffic handled by each organization is constant over time.

Comparing days & locations
Short-term stability, with marginal differences with respect to:
days of the week;
locations of the users.