Inside Dropbox: Understanding Personal Cloud Storage Services



Similar documents
How To Write A Blog Post On Dropbox

Inside Dropbox: Understanding Personal Cloud Storage Services

Inside Dropbox: Understanding Personal Cloud Storage Services

Inside Dropbox: Understanding Personal Cloud Storage Services

Inside Dropbox: Understanding Personal Cloud Storage Services

Personal Cloud Storage: Usage, Performance and Impact of Terminals

Monitoring Cloud Services using Flow-Based Measurements. Idilio Drago

Background. Personal cloud services are gaining popularity

A Comparison between Centralized and Distributed Cloud Storage Data-Center Topologies

Workload Models and Performance Evaluation of Cloud Storage Services

Internet Storage Sync Problem Statement

QuickSync: Improving Synchronization Efficiency for Mobile Cloud Storage Services

Measurements for the Paranoid: The Effect of Encrypting Files in Cloud Storage

A First Look at Mobile Cloud Storage Services: Architecture, Experimentation and Challenge

DISSECTING VIDEO SERVER SELECTION STRATEGIES IN THE CDN [ICDCS 2011]

Personal Cloud Storage Benchmarks and Comparison

MOBILE APPLICATIONS AND CLOUD COMPUTING. Roberto Beraldi

Cloud Sync White Paper. Based on DSM 6.0

Modeling the Dropbox Client Behavior

Benchmarking Cloud Storage Systems

The HPS3 service: reduction of cost and transfer time for storing data on clouds

Secure Cloud Computing with FlexCloud

Monitoring commercial cloud service providers

Skyfiles: Efficient and Secure Cloud-assisted File Management for Mobile Devices

Uncovering the Big Players of the Web

Towards Web Service Classification using Addresses and DNS

WAN OPTIMIZATION. Srinivasan Padmanabhan (Padhu) Network Architect Texas Instruments, Inc.

Internet Management and Measurements Measurements

Airlift: Video Conferencing as a Cloud Service using Inter- Datacenter Networks

Efficient Batched Synchronization in Dropbox-like Cloud Storage Services

First Midterm for ECE374 03/24/11 Solution!!

Live Traffic Monitoring with Tstat: Capabilities and Experiences

Configuration Guide. BlackBerry Enterprise Service 12. Version 12.0

Efficient Batched Synchronization in Dropbox-like Cloud Storage Services

Scalable Internet Services and Load Balancing

Assignment # 1 (Cloud Computing Security)

Overview. Timeline Cloud Features and Technology

Monitoring WAAS Using Cisco Network Analysis Module. Information About NAM CHAPTER

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Cisco WAAS Context-Aware DRE, the Adaptive Cache Architecture

Server Installation ZENworks Mobile Management 2.7.x August 2013

SVN5800 Secure Access Gateway

Configuration Guide BES12. Version 12.1

Where Do You Tube? Uncovering YouTube Server Selection Strategy

Configuration Guide BES12. Version 12.2

Technical Overview Simple, Scalable, Object Storage Software

Aspera Direct-to-Cloud Storage WHITE PAPER

CumuLogic Load Balancer Overview Guide. March CumuLogic Load Balancer Overview Guide 1

Dynamic Content Acceleration: Lightning-Fast Web Apps with Amazon CloudFront and Amazon Route 53

Frequently Asked Questions

Berkeley Ninja Architecture

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Cloud Storage Service Benchmarking: Methodologies and Experimentations

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services

SiteCelerate white paper

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Scalable Internet Services and Load Balancing

TCP IN A WORLD OF CLOUD SERVICES

Should Internet Service Providers Fear Peer-Assisted Content Distribution?

BlackBerry Enterprise Service 10. Secure Work Space for ios and Android Version: Security Note

Data Networks Summer 2007 Homework #3

Exploring the Cloud from Passive Measurements: the Amazon AWS Case

The Genealogy Cloud: Which Online Storage Program is Right For You Page , copyright High-Definition Genealogy. All rights reserved.

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins

Testing and Restoring the Nasuni Filer in a Disaster Recovery Scenario

ASPERA HIGH-SPEED TRANSFER SOFTWARE. Moving the world s data at maximum speed

Accelerating Cloud Based Services

Office 365 Migration Performance & Server Requirements

Hitachi Content Platform (HCP)

LDA, the new family of Lortu Data Appliances

Datasheet iscsi Protocol

AKAMAI WHITE PAPER. Delivering Dynamic Web Content in Cloud Computing Applications: HTTP resource download performance modelling

Benchmarking Broadband Internet with Bismark

CloudFTP: A free Storage Cloud

Cloud Computing. Security Practices for General User. Examples of Popular Cloud Service Providers

Network traffic: Scaling

OVERVIEW OF TYPICAL WINDOWS SERVER ROLES

Key Components of WAN Optimization Controller Functionality

Deploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution

Multi-level Metadata Management Scheme for Cloud Storage System

Administering the Web Server (IIS) Role of Windows Server

ESPRESSO: An Encryption as a Service for Cloud Storage Systems

Administering the Web Server (IIS) Role of Windows Server 10972B; 5 Days

WAN Optimization, Web Cache, Explicit Proxy, and WCCP. FortiOS Handbook v3 for FortiOS 4.0 MR3

activecho Frequently Asked Questions

SECURE YOUR DATA EXCHANGE WITH SAFE-T BOX

Predicting Network Performance for Internet Activities Using a Web Browser

Introduction to Cloud Storage GOOGLE DRIVE

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

ANALYSIS OF THE EFFECTIVENESS IN IMAGE COMPRESSION FOR CLOUD STORAGE FOR VARIOUS IMAGE FORMATS

Getting Started with SandStorm NoSQL Benchmark

Cloud Storage Security

Testing & Assuring Mobile End User Experience Before Production. Neotys

Setting Up Resources in VMware Identity Manager

How To Use Attix5 Pro For A Fraction Of The Cost Of A Backup

DISK IMAGE BACKUP. For Physical Servers. VEMBU TECHNOLOGIES TRUSTED BY OVER 25,000 BUSINESSES

Cisco Wide Area Application Services Optimizes Application Delivery from the Cloud

Apache Hadoop. Alexandru Costan

Jay Aikat, Kevin Jeffay Derek O Neill, Ben Newton

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

Transcription:

Inside Dropbox: Understanding Personal Cloud Storage Services Idilio Drago Marco Mellia Maurizio M. Munafò Anna Sperotto Ramin Sadre Aiko Pras IRTF Vancouver

Motivation and goals 1 Personal cloud storage services are already popular Dropbox in 2012 the largest deployed networked file system in history over 50 million users one billion files every 48 hours Little public information about the system How does Dropbox work? What are the potential performance bottlenecks? Are there typical usage scenarios?

Methodology How does Dropbox work? 2 Public information Native client, Web interface, LAN-Sync etc. Files are split in chunks of up to 4 MB Delta encoding, deduplication, encrypted communication To understand the client protocol MITM against our own client Squid proxy, SSL-bump and a self-signed CA certificate Replace a trusted CA certificate in the heap at run-time Proxy logs and decrypted packet traces

How does Dropbox (v1.2.52) work? 3 Clear separation between storage and meta-data/client control Sub-domains identifying parts of the service sub-domain Data-center Description client-lb/clientx Dropbox Meta-data notifyx Dropbox Notifications api Dropbox API control www Dropbox Web servers d Dropbox Event logs dl Amazon Direct links dl-clientx Amazon Client storage dl-debugx Amazon Back-traces dl-web Amazon Web storage api-content Amazon API Storage HTTP/HTTPs in all functionalities

How does Dropbox (v1.2.52) work? 4 Notification Kept open Not encrypted Device ID Folder IDs

How does Dropbox (v1.2.52) work? 4 Client control Login File hash Meta-data

How does Dropbox (v1.2.52) work? 4 Storage Amazon EC2 Retrieve vs. Store Sequential ACKs

Methodology Dropbox characterization 5 Rely on Tstat 1 to export layer-4 flows Isolate Dropbox flows DN-Hunter 2, TSL/SSL certificates, IP addresses Device IDs and folder IDs Use the knowledge from our own decrypted flows to Tag Dropbox flows e.g., storing or retrieving content Estimate the number of chunks in a flow 1 http://tstat.polito.it/ 2 DNS to the Rescue: Discerning Content and Services in a Tangled Web

Datasets 6 Type IP Addrs. Dropbox Flows Vol. (GB) Devices Campus 1 Wired 400 167,189 146 283 Campus 2 Wired/Wireless 2,528 1,902,824 1,814 6,609 Home 1 FTTH/ADSL 18,785 1,438,369 1,153 3,350 Home 2 ADSL 13,723 693,086 506 1,313 Total 4,204,666 3,624 11,561 42 consecutive days in March and April 2012 4 vantage points in Europe

Datasets 6 Type IP Addrs. Dropbox Flows Vol. (GB) Devices Campus 1 Wired 400 167,189 146 283 Campus 2 Wired/Wireless 2,528 1,902,824 1,814 6,609 Home 1 FTTH/ADSL 18,785 1,438,369 1,153 3,350 Home 2 ADSL 13,723 693,086 506 1,313 Total 4,204,666 3,624 11,561 42 consecutive days in March and April 2012 4 vantage points in Europe Number of IP addresses in home probes installations

Datasets 6 Type IP Addrs. Dropbox Flows Vol. (GB) Devices Campus 1 Wired 400 167,189 146 283 Campus 2 Wired/Wireless 2,528 1,902,824 1,814 6,609 Home 1 FTTH/ADSL 18,785 1,438,369 1,153 3,350 Home 2 ADSL 13,723 693,086 506 1,313 Total 4,204,666 3,624 11,561 42 consecutive days in March and April 2012 4 vantage points in Europe Number of IP addresses in home probes installations 11,561 unique devices 2nd capture in Campus 1 in June 2012

How much traffic to personal cloud storage? 7 2400 Number of IP addrs. 1600 800 0 24/03 31/03 07/04 14/04 21/04 28/04 05/05 icloud Dropbox SkyDrive Google Drive Others Server names to check popularity (DN-Hunter) 6 12 % adoption in home networks icloud tops in terms of devices

How much traffic to personal cloud storage? 8 0.2 0.15 YouTube Dropbox Share 0.1 0.05 0 24/03 31/03 07/04 14/04 21/04 28/04 05/05 Date Equivalent to 1/3 of YouTube volume at Campus 2 90 % of the Dropbox traffic is from the native client

How does the storage traffic look like? 9 1 Store 1 Retrieve Flow size CDF 0.8 0.6 0.4 0.2 0 1k 10k 100k 1M 10M100M 1G 0.8 0.6 0.4 0.2 0 1k Campus 1 Campus 2 Home 1 Home 2 10k 100k 1M 10M100M 1G Store: 40 % 80 %<100 kb Small files and deltas Larger retrieve flows

How does the storage traffic look like? 9 1 Store 1 Retrieve Flow size CDF 0.8 0.6 0.4 0.2 0 1k 10k 100k 1M 10M100M 1G 0.8 0.6 0.4 0.2 0 1k Campus 1 Campus 2 Home 1 Home 2 10k 100k 1M 10M100M 1G Store: 40 % 80 % < 100 kb Small files and deltas Larger retrieve flows CDF 1 0.8 0.6 0.4 0.2 0 Store 1 10 100 1 0.8 0.6 0.4 0.2 0 Retrieve Campus 1 Campus 2 Home 1 Home 2 1 10 100 Chunks per flow 80 % 10 chunks Remaining: up to 100 Limited by the client

Where are the servers located? 10 1 Storage 1 Control 0.8 0.8 CDF 0.6 0.4 0.2 0 80 90 100 110 120 Time (ms) 0.6 0.4 Campus 1 0.2 Campus 2 Home 1 0 Home 2 140 160 180 200 220 Time (ms) Minimum RTT per flow stable over 42 days PlanetLab experiments the same U.S. data centers worldwide less than 35 % of our users are from the USA

How is the performance far from the data centers? 11 10M θ 1M Throughput (bits/s) 100k 10k Chunks 1k 1 2-5 6-50 51-100 100 256 1k 4k 16k 64k 256k 1M 4M 16M 64M 400M Upload (bytes) Storage throughput in campuses Most flows experience a low throughput

How is the performance far from the data centers? 11 10M θ 1M Throughput (bits/s) 100k 10k 1k Chunks 1 100 256 1k 4k 16k 64k 256k 1M 4M 16M 64M 400M Upload (bytes) Flows carrying 1 chunk Size 4 MB, RTT 100 ms Most of them finish in TCP slow-start

How is the performance far from the data centers? 11 Application layer sequential ACKs

How is the performance far from the data centers? 11 10M θ 1M Throughput (bits/s) 100k 10k 1k Chunks 1 100 2-5 256 1k 4k 16k 64k 256k 1M 4M 16M 64M 400M Upload (bytes) Flows carrying several chunks Pause between chunks RTT and client/server reaction

How is the performance far from the data centers? 11 10M θ 1M Throughput (bits/s) 100k 10k Chunks 1k 1 2-5 6-50 51-100 100 256 1k 4k 16k 64k 256k 1M 4M 16M 64M 400M Upload (bytes) Flows carrying several chunks Transferring 100 chunks takes more than 30 s RTTs 10 s of inactivity

How is the performance far from the data centers? 11 10M θ 1M Throughput (bits/s) 100k 10k Chunks 1k 1 2-5 6-50 51-100 100 256 1k 4k 16k 64k 256k 1M 4M 16M 64M 400M Upload (bytes) Delaying acknowledgments Bundling chunk deployed after our 1st capture Distributing servers storage traffic is heavy!

How much improvement from chunk bundling? 12 New protocol released on Apr 2012 (v 1.4.0) Small chunks are bundled together Mar/Apr Jun/Jul Median Average Median Average Flow size Store 16.28 kb 3.91 MB 42.36 kb 4.35 MB Retrieve 42.20 kb 8.57 MB 70.69 kb 9.36 MB Throughput (kbits/s) Store 31.59 358.17 81.82 552.92 Retrieve 57.72 782.99 109.92 1293.72 Less small flows less TCP slow-start effects Average throughput is up to 65 % higher

How are other storage systems implemented? 13 Dropbox SkyDrive Wuala Google Drive Cloud Drive Chunking 4 MB variable variable 8 MB Bundling Deduplication Delta encoding Compression always never never smart never Comparison of design choices of different providers Benchmarking Personal Cloud Storage IMC 2013

How are other storage systems implemented? 13 Dropbox SkyDrive Wuala Google Drive Cloud Drive Chunking 4 MB variable variable 8 MB Bundling Deduplication Delta encoding Compression always never never smart never Are batches of files exchanged in a single transaction? Cloud Drive and Google Drive open several TCP connections per file

Are there typical usage scenarios? 14 100G 10G 1G Store (bytes) 100M 10M 1M 100k 10k 1k Home networks only 1k 10k 100k 1M 10M 100M 1G 10G 100G Retrieve (bytes) More downloads download/upload ratio up to 2.4 What about download/upload per user?

Are there typical usage scenarios? 14 100G Store (bytes) 10G 1G 100M 10M 1M Occasional: Users: 31 % Devices per user: 1.22 100k 10k 1k 1k 10k 100k 1M 10M 100M 1G 10G 100G Retrieve (bytes) Abandoned Dropbox clients No storage activity for 42 days

Are there typical usage scenarios? 14 100G Store (bytes) 10G 1G 100M 10M 1M 100k 10k 1k 1k 10k 100k 1M 10M 100M 1G 10G 100G Retrieve (bytes) Backup and content sharing Geographically dispersed devices Upload-only: Users: 6 % Uploads: 11 21 % Devices per user: 1.36 Download-only: Users: 26 % Downloads: 25 28 % Devices per user: 1.69

Are there typical usage scenarios? 14 100G Store (bytes) 10G 1G 100M 10M 1M 100k Heavy: Users: 37 % Uploads: 79 89 % Downloads: 72 75 % Devices per user: 2.65 10k 1k 1k 10k 100k 1M 10M 100M 1G 10G 100G Retrieve (bytes) Synchronization of content in a household

Conclusions 15 1st to analyze Dropbox usage on the Internet Cloud storage is a new data-intensive application Adoption above 6 % in our datasets Architecture and performance Bottlenecks from system design choices Extensive characterization of workload and usage User groups, number of devices, daily activity etc.

Questions? 16 Thank You. Anonymized traces and scripts http://traces.simpleweb.org/dropbox/