FS2You: Peer-Assisted Semi-Persistent Online Storage at a Large Scale



Similar documents
Online Storage and Content Distribution System at a Large-scale: Peer-assistance and Beyond

FS2You: Peer-Assisted Semipersistent Online Hosting at a Large Scale

Understanding the Roles of Servers in Large-scale Peer-Assisted Online Storage Systems

FS2You: Peer-Assisted Semi-Persistent Online Storage at a Large Scale

FS2You: Peer-Assisted Semi-Persistent Online Storage at a Large Scale

Peer-Assisted Online Storage and Distribution: Modeling and Server Strategies

Leveraging the Clouds for improving P2P Content Distribution Networks Performance

Quota: Rationing Server Resources in Peer-Assisted Online Hosting Systems

apt-p2p: A Peer-to-Peer Distribution System for Software Package Releases and Updates

Firewall Security: Policies, Testing and Performance Evaluation

Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic

A Dell Technical White Paper Dell Compellent

Object Request Reduction in Home Nodes and Load Balancing of Object Request in Hybrid Decentralized Web Caching

SharePoint Performance Optimization

Load Balancing in Fault Tolerant Video Server

WAVE: Popularity-based and Collaborative In-network Caching for Content-Oriented Networks

International Journal of Advanced Research in Computer Science and Software Engineering

Understanding Demand Volatility in Large VoD Systems

Self-Compressive Approach for Distributed System Monitoring

Key Considerations and Major Pitfalls

Large-Scale IP Traceback in High-Speed Internet

Improving Deployability of Peer-assisted CDN Platform with Incentive

CDN and Traffic-structure

IBM Global Technology Services September NAS systems scale out to meet growing storage demand.

The HPS3 service: reduction of cost and transfer time for storing data on clouds

Maginatics Cloud Storage Platform for Elastic NAS Workloads

WHITE PAPER SCALABLE NETWORKED STORAGE. Convergence of SAN and NAS with HighRoad. SPONSORED RESEARCH PROGRAM

A Week in the Life of the Most Popular BitTorrent Swarms

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

Configuring SonicWALL TSA on Citrix and Terminal Services Servers

Assistant Professor Office: (780) Department of Electrical and Computer Engineering

Executive Brief for Sharing Sites & Digital Content Providers. Leveraging Hybrid P2P Technology to Enhance the Customer Experience and Grow Profits

Monitoring Large Flows in Network

Should Internet Service Providers Fear Peer-Assisted Content Distribution?

A Novel Load Balancing Optimization Algorithm Based on Peer-to-Peer

Set Up a VM-Series Firewall on the Citrix SDX Server

Moore s Law and Network Optimization

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems

Designing a Cloud Storage System

IMPROVING QUALITY OF VIDEOS IN VIDEO STREAMING USING FRAMEWORK IN THE CLOUD

Introduction to NetApp Infinite Volume

This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.

The Definitive Guide to Cloud Acceleration

MagFS: The Ideal File System for the Cloud

Cloud Based E-Learning Platform Using Dynamic Chunk Size

An Empirical Study of Free-Riding Behavior in the Maze P2P File-Sharing System

From Centralization to Distribution: A Comparison of File Sharing Protocols

Scala Storage Scale-Out Clustered Storage White Paper

AmazingStore: Available, Low-cost Online Storage Service Using Cloudlets

On dynamic server provisioning in multichannel P2P live streaming

Cisco WAAS Express. Product Overview. Cisco WAAS Express Benefits. The Cisco WAAS Express Advantage

Strategies of collaboration in multi-channel P2P VoD streaming

Globule: a Platform for Self-Replicating Web Documents

Deployment Guide Microsoft IIS 7.0

A Measurement Study of Peer-to-Peer File Sharing Systems

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

Proactive Prioritized Mixing of Scalable Video Packets in Push-Based Network Coding Overlays

The Methodology Behind the Dell SQL Server Advisor Tool

White Paper. The Next Generation Video Codec Scalable Video Coding (SVC)

Internet Video Streaming and Cloud-based Multimedia Applications. Outline

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity

Virtual Leased Line (VLL) for Enterprise to Branch Office Communications

DEPLOYMENT GUIDE Version 1.2. Deploying the BIG-IP system v10 with Microsoft Exchange Outlook Web Access 2007

Master of Applied Science, 2010 Thesis: Measurements on Large-Scale Peer-Assisted Live Streaming: A Survival Analysis Approach

Clustering in Peer-to-Peer File Sharing Workloads

Simulating a File-Sharing P2P Network

Java Bit Torrent Client

DAS, NAS or SAN: Choosing the Right Storage Technology for Your Organization

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Architectural Model for Wireless Peer-to-Peer (WP2P) File Sharing for Ubiquitous Mobile Devices

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

The Value of a Content Delivery Network

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD

New Features in PSP2 for SANsymphony -V10 Software-defined Storage Platform and DataCore Virtual SAN

Subtitle. VoIP Migration Strategy. Keys to a Successful Planning and Transition. VoIP Migration Strategy Compare Business Products

Cisco and EMC Solutions for Application Acceleration and Branch Office Infrastructure Consolidation

Internet Content Distribution

DEPLOYMENT GUIDE. Deploying F5 for High Availability and Scalability of Microsoft Dynamics 4.0

DEPLOYMENT GUIDE Version 1.2. Deploying the BIG-IP System v9.x with Microsoft IIS 7.0 and 7.5

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

A Survey on Content Delivery of Web-Pages

Bloom Filter based Inter-domain Name Resolution: A Feasibility Study

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

Data Center Storage Solutions

On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu

Cisco Wide Area Application Services Optimizes Application Delivery from the Cloud

Peak Hosting, founded in 2001, provides comprehensive ITas-a-service

TIME EFFICIENT DISTRIBUTED FILE STORAGE AND SHARING USING P2P NETWORK IN CLOUD

Clustering in Peer-to-Peer File Sharing Workloads

A Case Study of Traffic Locality in Internet P2P Live Streaming Systems

Violin: A Framework for Extensible Block-level Storage

Cloud Data Storage Services Considering Public Audit for Security

Energy Constrained Resource Scheduling for Cloud Environment

AppSense Environment Manager. Enterprise Design Guide

Impact of Control Theory on QoS Adaptation in Distributed Middleware Systems

Dimension Data Public Compute-as-a-Service Regional Rate Plan Pricing Options

Content Delivery Networks. Shaxun Chen April 21, 2009

HyLARD: A Hybrid Locality-Aware Request Distribution Policy in Cluster-based Web Servers

Characterize Performance in Horizon 6

Transcription:

FS2You: Peer-Assisted Semi-Persistent Online Storage at a Large Scale Ye Sun +, Fangming Liu +, Bo Li +, Baochun Li*, and Xinyan Zhang # Email: lfxad@cse.ust.hk + Hong Kong University of Science & Technology *University of Toronto # Roxbeam Inc. April 22, 2009 @ IEEE INFOCOM, Rio de Janeiro, Brazil

Online storage and distribution, peer-assisted and semi-persistence in in nature Design and full implementation of the system Extensive trace and measurement study system dynamics, file characteristics, service quality & server involvement

Online Storage and Content Distribution A new type of content sharing service online storage and distribution has become increasingly popular One-click hosting service: allow end users to easily upload files, of both small and large sizes, onto dedicated servers, to be shared among a group of interested users Ease of use: simple URL that can be shared to others, e.g., discussion forums Rapidshare, Megaupload, etc. Files hosted in either CDNs or dedicated large data centers Rapidshare, 1500 terabytes of online storage in its data centers Skyrocketing server bandwidth costs: yearly 15~20 million USD

Peer-assisted Online Storage and Distribution Design Objective Take advantage of peer bandwidth contributions to mitigate server bandwidth costs without sacrificing the ease of use, reliability, and performance of one-click hosting Balance two extremes of the cost-performance tradeoff P2P File Sharing Good scalability no guarantees on file availability A Seamless Integration Server-based Online Storage Guarantee file availability at the prohibitive cost of server bandwidth and storage Peer-assisted Online Storage and Distribution Couple peer upload contribution & strategic server provisioning in a complementary manner Improve file availability & downloading performance, while conserving substantial server costs

Design Challenges The reduction of server involvement may bring adverse effects on file availability & downloading performance How do we substantially conserve server bandwidth costs, while mitigating such adverse effects and maintaining an adequate level of service quality and user experience? As the system scales up to a large population, we intend to store contents that are as valuable to users as possible, with a limited pool of server storage While files will be available on a semi-persistent basis, how do we mitigate the drawback of degraded file availability with a limited pool of server storage?

FS2You: Architecture Tracking Server Channels (files) Info & MD5 Bootstrapping List of peers in channels Hosting Servers Upload Download Hosting A real-world large-scale peerassisted online storage system One of the most popular online storage systems in China Peers Upload Download

Peer Partnership & Overlay Management Combine coarse-grained tracking servers & decentralized gossiping Peers involved in a file are organized into an overlay for exchanging data availability info & file blocks Either downloading the file or holding a replica of the file Periodic partnership update Resilient to peer dynamics & Bound control overhead Periodic status-report (Peer ID & IP) & top N local channels Periodic refresh of peer list in each channel to assist peers to gain access to active partners that are most helpful

Content Delivery Each file is divided into fixed size blocks of 256 KB A Block Map (BM) the availability of blocks at each peer Periodic exchange of BMs among peers enables them to locate the needed blocks Retrieve distinct blocks from multiple partners Periodic scheduling sequentially request missing blocks up to NO. current active partners Request-from-server conditions No partners (unpopular file or connection fail) None of the partners hold the desired block Aggregate downloading rate from partners < 10 KB/second The threshold is empirically determined to prevent peers from aggressively consuming server bandwidth

Server-side Strategies: Uploading Service Not only provide online storage, but also cooperate with content distribution Uploading service No size/format limitations attract millions of users 500GB~1TB content routinely uploaded per day 1 copy stored in one of the servers that is nearest to the user requesting to upload Avoid consuming excessive storage resources Reduce uploading time Mitigate unnecessary inter-as traffic

Server-side Strategies: Downloading Service Complement peers to supply file blocks, especially to those suffering poor downloading rates How to properly satisfy a potentially large number of requests without incurring prohibitively high bandwidth costs? 1st block user experience Probabilistic serving based on file popularity File popularity index inversely proportional to NO. requests Direct peers in popular channels to largely rely on peer assistance rather than servers Allocate more server resources to unpopular files

Server-side Strategies: Hosting Service Semi-persistent availability with a limited pool of server storage Maintain the availability of recently/frequently accessed files, while replacing less popular & large ones when necessary File reference index File Size / Access Frequency (per day) If < a threshold (empirically 100) either small or frequently accessed remain persistent If > a threshold over a sustained period of time (empirically 5 days) evicted/replaced Store large files only if substantial user interests and popularity persist, in order to avoid excessive use of server storage

Trace Collection To evaluate the performance of FS2You, we have implemented a detailed logging mechanism 350 GB traces from 3.3 million users, from June 21 to July 18, 2008 Each peer reports activities & status to the trace server Download Event Summary (event-driven) Peer ID, Channel IDs, File size Time of open/close/completion Total downloaded volume, downloaded volume from servers File Source Snapshot (periodically, overhead/accuracy)

Peer assistance Peer dynamics and behaviors Reflect user demand & Fine tune server strategies File Characteristics Service Quality User Experience File availability & downloading rate

Overall Scale and Performance A large number of users Up to 80% contributed by P2P alleviate server load The architectural & protocol designs in FS2You can scale to a large number of peers, and to withstand the test of a tremendous volume of traffic (in the order of terabytes per day) over a long period of time The cost of server capacity has been substantially saved by peer assistance

File Characteristics: Popularity 47% compressed archives (e.g., zip/rar) most multimedia content 30% videos, 12% audio, 11% others Flatter than Zipf prediction Immutability of files, and the fetch-at-most-once behavior Well fitted with the stretched exponential distribution Useful for workload synthesis

File Characteristics: Peer Assistance Efficiency In general, popular files enjoy higher P2P efficiency Highly popular ones even enjoy 80%~90% P2P efficiency encouraging! Increasing noises variations in P2P efficiency increase as popularity decreases Interestingly, less popular ones can also enjoy high P2P efficiency, as some of them used to be popular with sufficient replicas among peers

Correlation: File Size & Popularity Users preference in large files 300 MB ~ 1 GB most popular range typical sizes of videos Empirical threshold for evicting large yet unpopular files: 1 GB/10 = 100 MB/hits Large files can survive only if sufficiently popular Many small files can remain available reasonable user demand & have not occupied excessive storage (72.4% < 100 MB, only 21.4% of server storage) Though prefer to download large files, not tend to keep large files in local storage

Server Involvement & Service Quality Valley potential negative effects of the collaboration between current design of request-from-server threshold & server-side probabilistic serving strategy Both files that are completely supplied by servers and those that are mainly supported by P2P enjoy high average downloading performance Most experienced favorable downloading performance: Avg. 66 KB/s; Lowest 40 KB/s

Summary Key challenges involved in peer-assisted online storage and distribution systems Architecture & protocol design of a real-world system: FS2You Couple peer assistance & strategic server provisioning in a complementary & transparent manner The system can conserve substantial server costs, while maintaining high service availability & downloading performance at a large scale Verified by extensive measurement study, supported by real-world traces from millions of users Further reveals many interesting observations on system dynamics, file characteristics, and server involvement

Summary The architectural & protocol designs in FS2You can scale to a large number of peers, and to withstand the test of a tremendous volume of traffic over a long period of time The cost of server capacity has been substantially saved by peer assistance The FS2You design provides excellent file availability & a satisfactory download experience to a large number of users with cost-effective server involvement

Thanks! Q&A