P2P Network Assembling A Model For Success


Improving P2P Applications by Breaking the Architecture Symmetry

Paweł J. Garbacki

Cover design: Joanna Garbacka

Improving P2P Applications by Breaking the Architecture Symmetry

Dissertation

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus prof.dr.ir. J.T. Fokkema, chairman of the Board for Doctorates, to be defended publicly on Tuesday 9 December 2008 at 10:00

by

Paweł Jacek GARBACKI

doctorandus in computer science, born in Warsaw, Poland

This dissertation has been approved by the promotors: Prof.dr.ir. H.J. Sips and Prof.dr.ir. M. van Steen

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof.dr.ir. H.J. Sips, Technische Universiteit Delft, promotor
Prof.dr.ir. M. van Steen, Vrije Universiteit Amsterdam, promotor
Dr.ir. D.H.J. Epema, Technische Universiteit Delft, copromotor
Prof.dr.ir. R.L. Lagendijk, Technische Universiteit Delft
Prof.dr. K. Aberer, EPFL, Switzerland
Prof.dr. P. Felber, Université de Neuchâtel, Switzerland
Dr. Ch. Gkantsidis, Microsoft Research, UK

Published and distributed by: Paweł J. Garbacki

Keywords: peer-to-peer, heterogeneity, asymmetric architecture

Copyright © 2008 by Paweł J. Garbacki

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission of the author.

Printed in The Netherlands by: Wöhrmann Print Service

The work described in this thesis has been carried out in the ASCI graduate school (Advanced School for Computing and Imaging). ASCI dissertation series number 170.

This research was supported by a grant of the Netherlands Organisation for Scientific Research (NWO).

To my parents, with love and gratitude


Contents

1 Introduction
    P2P applications
        File transfer, live streaming and video on demand
        Search for content
        Resource management in content delivery networks
    Heterogeneity of the Internet
    Problem statement
    Asymmetric P2P architectures
    Research contributions and thesis outline

2 A case study of the BitTorrent P2P network
    The BitTorrent content distribution system
    Measurement framework
        Pre-processing
        Measurement
        Post-processing
    Measurement setup
    Measurement results
        Resource heterogeneity
        Configuration heterogeneity
        Social heterogeneity
    Summary

3 Collaborative bandwidth exchange
    Bandwidth exchange incentives in P2P networks
    Related work
    The amortized tit-for-tat protocol
        Mechanisms of ATFT
        Collaborative downloads
        Amortization
        Benefits of ATFT
    Analysis of ATFT
        System model
        Selection of the optimal number of helpers
        Download speedup
        Finding bartering partners
        Incentive for contribution
        Borrowers set size
    Performance evaluation
        Experimental setup
        Collaborative downloads evaluation results
        Amortization evaluation results
    Summary

4 Optimizing peer relationships in a super-peer network
    Organizing peer relationships
    Related work
    The architecture of SOSPNET
        Architecture overview
        System model
        Two-level caching
        The search protocol
        The insert protocol
        Balancing the load among super-peers
        Discussion
    Performance model
        Notation
        Models of the semantic structure
        Optimal caching performance
    Performance evaluation
        Experimental setup
        Results
    Summary

5 Managing resources with brokers
    Resource brokerage in latency-aware P2P networks
    Related work
    The problem of broker placement
        System model
        Formal background
        Problem statement
        An example
    The broker-placement algorithm
        Solution outline
        The algorithm
        The computation of the region weights
    Algorithm analysis and improvements
        Worst-case complexity
        Optimizations
    Performance evaluation
        Experimental setup
        Performance results
    Summary

6 Conclusion
    Conclusions
    Suggestions for future work

Bibliography
Summary
Samenvatting
Acknowledgements
About the author


Chapter 1 Introduction

The peer-to-peer (P2P) paradigm is the basis for the design of a new wave of distributed Internet applications. P2P applications are built from the resources of their users, with users providing service to one another. In this sense, a P2P design is often contrasted with a client-server design, where the roles of the service provider (server) and the service consumer (client) are clearly separated at the architectural level. In P2P applications, the roles of client and server are merged into the single role of peer. The Merriam-Webster Online dictionary [8] defines a peer as "one that is of equal standing with another". This definition reflects well the role of a peer in a traditional, symmetric P2P architecture, which imposes a functional equality of peer roles.

From a practical point of view, however, the architectural symmetry achieved by assigning peers similar roles may shift the balance too far toward one end of a spectrum whose other end is occupied by client-server architectures. In an environment as heterogeneous as the Internet, some diversity of peer roles may be required to fully exploit the differences in peer capacities instead of ignoring them. Assigning different roles to peers breaks the architectural symmetry of traditional P2P applications, and the resulting architectural asymmetry may lead to significant improvements in their robustness and performance. Which heterogeneity aspects of the Internet environment are relevant, and how the architectural asymmetry is reflected in the diversity of peer roles, depends on the type of application. In this thesis we address the challenge of designing and analyzing asymmetric architectures for the dominant P2P application types, namely file downloading, live video streaming, video on demand, content search, and content delivery networks.

The remaining part of this chapter is organized as follows. Section 1.1 describes the main application domains of P2P technology. Section 1.2 categorizes the heterogeneity aspects of the Internet environment. Section 1.3 gives the problem statement of this thesis. Section 1.4 describes the existing asymmetric P2P architectures, while Section 1.5 presents the research contributions and outlines the structure of this thesis.

1.1 P2P applications

The application domain of P2P technology is fairly broad [20, 89, 94, 117, 124]. The most common uses of this technology include file transfer, live streaming and video on demand, content search, and content delivery networks. In this section we briefly describe those uses and give examples of existing P2P applications. More details on the individual P2P applications are provided in later chapters of this thesis.

File transfer, live streaming and video on demand

Early P2P networks such as Napster [10] and Gnutella [116] used simple, single-source, ftp-like protocols for transferring files between peers. The content shared in the early P2P networks consisted mostly of small music files. With the later increase in the popularity of large movie files, the bandwidth required to move files around in a P2P network became the bottleneck. The absence of protocol-enforced incentive mechanisms resulted in a plague of free-riders [16], i.e., users exploiting the (bandwidth) resources of others while refusing to contribute their own resources.

The revolution came with the BitTorrent [40] protocol, the first widely used P2P data transfer protocol that enforces bandwidth contributions from the peers. BitTorrent employs a technique called tit-for-tat that effectively limits a peer's download bandwidth based on its upload bandwidth. Furthermore, peers in BitTorrent obtain data from multiple sources at a time, which allows for data exchange between peers with different link capacities while fully utilizing those capacities. The topology formed by BitTorrent peers has the structure of a random mesh. Subsequent P2P file transfer protocols target some of the limitations of BitTorrent. Slurpie [121] improves the download source selection strategy of BitTorrent in order to utilize bandwidth more efficiently. Avalanche [58] employs a network coding technique to increase the diversity of the data across peers, and thus increases the chance that two randomly selected peers can establish a data exchange connection.

The continuous increase in the Internet bandwidth capacity offered to end users opens up possibilities for live streaming and video on demand (VoD). Video streaming over the Internet has been one of the most widely investigated research areas over the last few years [22, 33, 65, 142]. Most research efforts concentrate on designing video distribution systems that can support a large number of users. Multicasting has been proposed to provide a scalable video streaming service, even in the presence of non-homogeneous receivers [73, 86, 105]. Multicasting is a natural paradigm for live video streaming, but today's Internet does not offer native support for multicasting. In the absence of such support, there have been many proposals to use an overlay multicast for live streaming [3, 32, 38, 64, 133, 143]. Mesh-based P2P networks have also been used for this purpose [92, 146].
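To make the tit-for-tat idea mentioned above for BitTorrent more concrete, the following is a minimal sketch under simplifying assumptions: time is divided into rounds, and in each round a peer uploads only to the few peers that uploaded the most to it in the previous round. The function name `select_unchoked`, the `slots` parameter, and the round-based model are illustrative assumptions and not part of the BitTorrent specification, which includes further mechanisms that are omitted here.

```python
def select_unchoked(received_bytes, slots=4):
    """Pick the peers to upload to in the next round (simplified tit-for-tat).

    received_bytes maps a peer identifier to the number of bytes that peer
    uploaded to us in the previous round. Both the mapping and the round
    model are assumptions made for this sketch.
    """
    # Rank peers by their contribution; break ties by name so the result
    # is deterministic.
    ranked = sorted(received_bytes.items(), key=lambda item: (-item[1], item[0]))
    # Peers that contributed nothing are never rewarded in this model,
    # which is what discourages free-riding.
    return [peer for peer, amount in ranked[:slots] if amount > 0]


# Example: with two upload slots, the two best contributors are served.
previous_round = {"a": 500, "b": 0, "c": 1200, "d": 300}
print(select_unchoked(previous_round, slots=2))
```

Because a peer is served only in proportion to what it gives, its achievable download rate is tied to its own upload rate, which is the effect described above.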

The concept of multicast has been extended to support near video on demand (nVoD) services. In the simplest approach, a new broadcast is started periodically [18]. More elaborate systems propose to divide the video into segments and to distribute each segment in a different multicast channel [14, 65, 137]. A different approach has been suggested in the BASS [47] system, which builds an nVoD service on the BitTorrent protocol. BASS assumes the existence of a streaming server and uses the P2P network to alleviate the load imposed on the server. Even though BASS reduces the load at the server by a significant amount, the design of the system is still server oriented, and hence the bandwidth requirements at the server increase linearly with the number of users. The BiToS [138] system, which is also based on BitTorrent, offers serverless VoD. The main idea of BiToS is to divide the missing pieces into a high-priority set and a remaining-piece set, and to request the pieces in the remaining-piece set with lower priority. The basic idea of piece prioritization has been extended with a more efficient piece distribution scheme in the Give-to-Get system [95]. Mechanisms such as network coding and optimized overlay topology management have also proven their efficiency in a VoD environment [21].

Search for content

Early P2P protocols such as Gnutella [116] and Freenet [39] concentrated mainly on efficient content location. The content was usually present in the form of files shared by the users of a P2P network. The P2P protocols for searching are traditionally divided into two classes: structured and unstructured approaches.

Structured P2P networks assume that each content item is assigned a unique identifier or key. The items can then be located based on the value of the corresponding key.
Due to the conceptual similarity with the data structure called a hash table, structured P2P networks are often referred to as distributed hash tables (DHTs). Each peer in a structured P2P network is responsible for the content items whose keys belong to a certain subspace of the key space. The assignment of key subspaces to individual peers determines the topology of the overlay interconnecting the peers. By exploiting specific properties of the key space, structured P2P networks optimize searching. Typically, the cost of a search in a structured P2P network, measured in the number of hops, i.e., the number of peers consulted while routing the query, is a sublinear function of the number of peers in the system. If the key specified in a search query exists and the system operation is not disrupted, it can be guaranteed that the query will be resolved successfully. The details of the assignment of keys to peers and of the search query routing algorithm differ between designs. The Chord [129] system organizes peers into a logical ring. The Content Addressable Network (CAN) [115] maps keys to points in a multi-dimensional Cartesian space. Pastry [118] and Tapestry [147] represent keys with numerical values and employ a longest-prefix routing technique for locating content.

The use of structured P2P networks reaches beyond file sharing. The Chord system architecture has been applied for file mirroring in the Cooperative File System (CFS) [46], in which multiple providers of content store and serve each other's files. A Chord-based DNS [42] provides a host name lookup service that maps host names to IP addresses and other host-related information. CAN has been used successfully in large-scale storage management systems such as OceanStore [76] and Farsite [17].

Unstructured P2P networks do not route queries based on abstract keys but rather employ a generic search technique such as flooding, random walks, or expanding-ring time-to-live (TTL) search. Each visited peer checks whether its locally stored content meets the criteria of the search query. Compared to the key lookup offered by structured P2P networks, the queries in unstructured P2P networks can be more expressive. In particular, keyword searches and range queries, which are difficult to support in a structured network, are trivial to realize in an unstructured network. However, unstructured approaches pay for this querying flexibility with performance and reliability. A hard guarantee of locating a content item can be provided only if the search query is propagated to all peers in the system.

A number of unstructured P2P network architectures have been proposed to date, but only a few of them have become popular. A typical example of an unstructured search protocol is flooding with a TTL, as implemented in the Gnutella [116] network. Several optimizations of the original Gnutella protocol, intended to improve its load balancing, scalability, and response time, have been proposed in [35]. The loose topological dependencies between peers in an unstructured network open up possibilities for organizing the network topology so as to exploit the semantic properties of the content for more efficient searching.
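The flooding search with a TTL described above can be sketched as follows. This is an illustrative model rather than the Gnutella wire protocol: the overlay is assumed to be a plain adjacency dictionary, peer content is a list of strings, and the names `flood_search`, `neighbors`, and `content` are hypothetical.

```python
from collections import deque


def flood_search(neighbors, content, start, keyword, ttl):
    """Flood a keyword query through the overlay, at most ttl hops from start.

    neighbors: dict mapping a peer to the list of peers it is connected to.
    content:   dict mapping a peer to the list of item names it stores.
    Returns the peers whose local content matches the keyword.
    """
    hits = []
    visited = {start}               # each peer processes the query only once
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # Every visited peer checks its locally stored content.
        if any(keyword in item for item in content.get(peer, ())):
            hits.append(peer)
        if remaining == 0:
            continue                # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits


# A chain a - b - c - d where only d stores a matching item.
overlay = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
stored = {"d": ["linux iso"]}
print(flood_search(overlay, stored, "a", "linux", ttl=2))  # d is out of reach
print(flood_search(overlay, stored, "a", "linux", ttl=3))  # d is reached
```

The two calls illustrate the reliability trade-off noted above: an item that exists in the network is missed when the TTL is too small for the query to reach the peer that stores it.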
Semantic clustering, employed in systems such as Proxsem [31] and Edutella [100], increases the query locality by grouping together content items that are semantically related.

Resource management in content delivery networks

File transfer, live streaming, VoD, and content location can be collectively described as mechanisms for content management. Apart from content management, computer resource management can also be provided with a P2P solution. In many applications, computer resources such as disk space, computing power, and bandwidth available at one peer have value for other peers. Typical examples of such applications are content delivery networks (CDNs) [124, 134], which lower the costs of Web hosting while improving scalability. Moreover, CDNs increase the resilience to a sudden overload following publication, a phenomenon that has been nicknamed the Slashdot effect, after a popular web site that periodically links to under-provisioned servers, driving unsustainable levels of traffic to them.

CDNs place the replicas of Web sites on the peer-managed resources of their users and distribute the load in order to improve the response time. Individual Web sites are usually small compared to the multimedia content exchanged in file sharing P2P networks. Therefore, it is not the disk space or the transfer bandwidth but rather the latency of accessing the Web sites that becomes the bottleneck for which the system is optimized.

The Globule [6] CDN assigns to each peer serving Web site replicas, and to each client requesting those replicas, a point in a metric space in which the distance between points approximates the internode latency [130]. The latency-space abstraction allows for the easy computation of the points where client requests concentrate and for the placement of Web site replicas in the proximity of these points [131]. The clients are sent to close-by replicas with DNS redirection. An alternative to Globule, the Coral [49] CDN, is built on top of a key/value indexing infrastructure called a distributed sloppy hash table (DSHT). A DSHT is similar to a traditional DHT but provides a weaker consistency model. Coral allows clients to locate nearby cached copies of Web sites without querying more distant nodes. It also prevents hot spots in the infrastructure, even under degenerate loads, by using node clustering provided by an epidemic algorithm. Another CDN built on top of a DHT is the SCAN [36] network. SCAN combines dynamic replica placement with a self-organizing application-level multicast tree to meet the service quality and the server resource usage constraints. It utilizes an underlying distributed object routing and location system which is implemented on top of the Pastry DHT [118]. One of the main goals of SCAN is to keep the number of replicas at a minimum in order to reduce the replication overhead.

A number of systems [127, 128] have been designed specifically to address the problem of the Slashdot effect or, in a broader sense, of flash crowds, which are large bursts of requests over a short time period.
These systems focus solely on mitigating the client request load and keep the optimization of the request latency out of their scope.

1.2 Heterogeneity of the Internet

Most P2P applications are designed with the Internet as the intended deployment platform. The peer role equality promoted by pure P2P designs contrasts, however, with the heterogeneity of the Internet environment. A central concept in the Internet philosophy is the diversity of computer and network resources. The distributed ownership of Internet resources results in a multitude of often very different configurations of hardware and software components. The wide applicability of Internet technology results in a user base which is socially diverse. Below we identify instances of the resource, configuration, and social heterogeneity of the Internet and discuss their impact on the performance and robustness of P2P applications.

Resource heterogeneity. The support for heterogeneous computer and network resources is the foundation upon which the Internet is built. The processing speed, memory capacity, and disk space differ across Internet-connected computers. Compute-intensive P2P applications should be able to adjust the volume of the workload to the processing speed and memory capacity. Content exchange P2P protocols have to respect disk-space limits. Apart from the computer resources, the network properties also have to be taken into account in the design of P2P applications. Depending on the type of the application, the relevant bottleneck property of the network is either the bandwidth or the latency. The bandwidth is crucial for applications in which peers exchange high-volume data, such as file sharing P2P networks. The latency, on the other hand, plays an important role in applications providing a live user experience, such as video streaming, and in systems in which the communicating peers exchange content items of relatively small sizes, as in CDNs.

Configuration heterogeneity. Many components of the Internet infrastructure, of individual computers, and of P2P applications can be configured. The ownership of the Internet network infrastructure is partitioned among Internet service providers (ISPs). A P2P application that is aware of this network partitioning can optimize its topology to reduce the volume of the traffic crossing ISP borders. The reduction of cross-ISP traffic has an important economic benefit, as the contracts between ISPs are negotiated based on the amount of traffic coming in and out of their networks. A network configuration aspect which is controlled by the infrastructure owners as well as by the users of individual computers is the connectivity, which determines whether two peers can communicate directly. Direct communication is possible only if at least one of the communicating peers can accept external connections. Due to security risks, many Internet-connected computers are shielded with firewalls and network address translators (NATs) that block incoming connections. Firewalls and NATs are either operated at the ISP level or integrated with the operating systems of the individual computers.
Enabling communication between shielded peers requires techniques such as connection relaying or firewall traversal. Although Internet end users have little influence on the configuration of the network infrastructure, they have full control over the P2P application. The application uptime is determined by the users, who decide for how long the application will run on their computers. The uptime determines the availability periods of the resources controlled by the P2P application. Peers with long uptimes, acting as well-known access points, can be used for bootstrapping newcomers.

Social heterogeneity. The user-centric nature of P2P systems requires addressing the differences in social aspects of the Internet with the same attention as resource and configuration heterogeneity. Social aspects of Internet communities, such as the nature of the relations between users, their content interests, expert knowledge, social commitment, and linguistic, geographical, and cultural backgrounds, differ across individual users. The nature of the social relationships between users can be specified in some P2P applications. A closer relation usually implies a higher level of trust between users, which in the context of a P2P network translates to less restricted access to the user resources. A semantic correlation between user interests is known to exist in content sharing P2P networks. Clustering of users with similar interests improves the performance of content search and provides a basis for content recommendation. Expert knowledge and commitment to the community can be exploited by assigning specific roles to selected users in a P2P system. In some file sharing communities, a group of committed users is assigned the role of moderator. The responsibility of the moderators is to remove inconsistent metadata from the system. By expressing social commitment in the form of a ranking, such as top content injectors or top uploaders, P2P applications stimulate user contributions. The linguistic, geographical, and cultural diversity of the communities, if identified and properly interpreted, can improve P2P system usability.

1.3 Problem statement

The heterogeneity of the Internet environment, if ignored or not handled properly in P2P application design, can have a negative impact on application performance. Some P2P designs try to reduce the negative influence of heterogeneity by masking it from the applications and the users. In this thesis we take a different approach: we do not mask heterogeneity but rather use it in favor of the application. We address the research problem of exploiting Internet heterogeneity by breaking the architectural symmetry of traditional P2P applications in an effort to improve their performance and robustness. This high-level problem statement can be broken down into the following finer-grained questions, which concern assessing the level of heterogeneity in real-world P2P networks and proposing architectural modifications specific to different application types.

How heterogeneous is the environment of Internet-deployed P2P networks? Assessing the extent of the heterogeneity of the environment in which P2P networks are deployed is required to motivate the work presented in this thesis and to provide realistic parameters for the models used to evaluate the solutions that we will devise. The assessment should be done by measuring existing P2P networks. The deployment size of the popular P2P networks poses the practical challenge of building a measurement infrastructure able to deal with systems consisting of millions of peers.

How can we exploit asymmetric architectures to improve file sharing, live streaming and VoD? The heterogeneity of peer bandwidth capacities as well as the diversity of bandwidth usage patterns open up possibilities for improving bandwidth utilization. In particular, we are interested in the performance impact of the discrepancy in bandwidth capacity among peers as well as between the upload and download links of the same peer. The diversity of bandwidth usage patterns opens up possibilities for idle peers to transfer their bandwidth resources to peers that are downloading data. Such a bandwidth transfer requires, however, rethinking the resource-exchange model of current content transfer protocols.

How can we make asymmetric content search networks more efficient and robust? The research on improving the performance of content search by leveraging the heterogeneity of peers in terms of available bandwidth and processing power has resulted in asymmetric architectures based on super-peers. The super-peer architectures proposed to date (see Section 1.4) use a fixed topology in which each peer is connected to a single, randomly selected super-peer. Building a super-peer network with a dynamically organizing topology and with a super-peer selection optimized for search performance and robustness is a challenging research problem.

How do the asymmetric architectures fit into resource management in CDNs? The management of peer resources in a CDN can be seen as a task required by the main functionality of the system, which is serving Web sites. It makes sense to delegate this task to selected peers with higher capacities that can accept additional work without adversely affecting the performance of serving Web sites. The design of an architecture for resource management in CDNs needs to address the issues of resource discovery, allocation, and load balancing.

1.4 Asymmetric P2P architectures

Asymmetric architectures have been an active field of research in P2P systems. The concept of leveraging the heterogeneity of peers by exploiting high-capacity nodes in the system design has proven to have great potential [145]. The resulting architectures break the symmetry of pure P2P systems by assigning additional responsibilities to high-capacity nodes called super-peers. In a super-peer network, a super-peer acts as a server to client (ordinary, weak) peers. Weak peers submit requests to their super-peers and receive service from them. Super-peers are connected to each other by an overlay network of their own, submitting and answering requests on behalf of the weak peers.
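The division of labor in a super-peer network can be illustrated with a small sketch. It assumes, as in the file sharing systems based on super-peers, that each super-peer keeps an index of the files shared by its weak peers and forwards queries to the other super-peers it is connected to. The class `SuperPeer` and its methods are hypothetical names chosen for this example and do not correspond to any particular deployed protocol.

```python
class SuperPeer:
    """A super-peer that indexes the files of its weak peers (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # file name -> set of weak peers sharing that file
        self.neighbors = []  # other super-peers in the super-peer overlay

    def register(self, weak_peer, files):
        """Called when a weak peer connects and announces its shared files."""
        for file_name in files:
            self.index.setdefault(file_name, set()).add(weak_peer)

    def search(self, file_name, _seen=None):
        """Answer a query on behalf of a weak peer.

        The query is resolved against the local index and then forwarded
        through the super-peer overlay; _seen prevents visiting a
        super-peer twice.
        """
        seen = _seen if _seen is not None else set()
        seen.add(self.name)
        results = set(self.index.get(file_name, ()))
        for other in self.neighbors:
            if other.name not in seen:
                results |= other.search(file_name, seen)
        return results


# Two connected super-peers, each serving one weak peer.
sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbors.append(sp2)
sp2.neighbors.append(sp1)
sp1.register("w1", ["a.mp3"])
sp2.register("w2", ["a.mp3", "b.avi"])
print(sorted(sp1.search("a.mp3")))
```

In this sketch the weak peers never see each other's queries; all search traffic is carried by the super-peers, which is what makes the choice of super-peers and of the connections between peers and super-peers relevant for performance.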
KaZaa [83] and Morpheus [9], which are both based on the FastTrack [4] protocol, are widely used file sharing systems that make use of super-peers. Although FastTrack is a proprietary technology with no detailed documentation, it is known that FastTrack peers are automatically elected to become super-peers if they have sufficient bandwidth and processing power (a configuration parameter may allow users to disable this feature). A central bootstrapping server provides new peers with a list of one or more super-peers to which they can connect. Super-peers index the files shared by the peers connected to them, and proxy search requests on behalf of these peers. All queries are therefore initially directed to the super-peers.

An extension of the basic Gnutella [116] system has an architecture based on ultrapeers [123], which are conceptually equivalent to super-peers. Any new peer with enough bandwidth and CPU power immediately becomes an ultrapeer and establishes connections with other ultrapeers, forming an overlay network. A new ultrapeer is required to establish a predefined minimum number of connections to client peers within a specified time. If this minimum is not reached, the ultrapeer turns into a regular client peer and can try to become an ultrapeer again after some period of time.

Super-peer architectures integrate naturally with P2P systems based on semantic clustering [87], such as the Edutella [100] network. Systems for semantic clustering create a logical layer on top of the base P2P network topology by grouping peers with similar content interests. The clustering is performed by matching the semantic information provided by the peers to clusters, with each cluster being maintained by a super-peer. In addition to controlling the internal structure of the cluster, super-peers are also responsible for routing messages between peers from different clusters.

Super-peer architectures have also been proposed for structured P2P networks [57, 97]. Such architectures group nearby peers based on some criterion, such as network latency or adjacency in the key space, and organize the communication between groups using a super-peer layer. To find the peer that is responsible for a key, the top-layer overlay network routes among the super-peers to determine the group responsible for the key. The responsible group then uses an intra-group overlay network to determine the specific peer that is responsible for the key. The lookup time in structured super-peer networks depends on the size of the state maintained by each super-peer and on the total number of super-peers. Some architectures [97] are even able to guarantee a constant-time lookup.

1.5 Research contributions and thesis outline

In this thesis we investigate the possibilities of leveraging the inherent heterogeneity of the Internet environment through asymmetry in P2P application design.
The scope of our research spans from the measurement, analysis, and modeling of deployed P2P networks, through optimizations of existing P2P designs, to proposing novel P2P architectures. Below we describe the main contributions of this thesis.

Measuring P2P networks (Chapter 2). We describe the design, implementation, and deployment of a framework for measuring large-scale P2P networks, which we have used to measure the BitTorrent network and the popular content indexing site Suprnova.org. Over a period of three years we have collected traces spanning several months in total, which we analyze to identify patterns and extract outliers. The analysis builds a case for the characterization of the heterogeneity of P2P networks and provides parameters for the models used in the performance evaluation of the various solutions proposed in this thesis. The content of this chapter is based on our research published in [68, 111].

Improving bandwidth utilization with bandwidth exchange (Chapter 3). We propose a protocol called amortized tit-for-tat (ATFT) that improves the bandwidth utilization in file sharing, live streaming, and VoD P2P networks. The ATFT protocol provides incentives for peers with idle bandwidth to support ongoing downloads, effectively increasing the bandwidth capacity of the P2P system. ATFT implicitly addresses in its design the issue of peers with asymmetric (ADSL) links. We implement ATFT as an integral part of Tribler, a popular P2P client compatible with the BitTorrent protocol. An analytical and experimental study indicates that with realistic settings, ATFT reduces the download time more than threefold compared to the current state-of-the-art transfer protocols. The content of this chapter is based on our research published in [51, 53, 56, 112].

Improving search performance in super-peer networks (Chapter 4). Addressing the limitations of the existing super-peer networks, we propose a self-organizing super-peer network architecture (SOSPNET). Unlike other super-peer networks, SOSPNET adapts the topology of inter-peer connections to reflect the semantic correlation between the content items and the peers in an effort to improve the locality of search requests. In addition to the improved search performance, SOSPNET offers simple solutions to the generally difficult problems of fault tolerance and load balancing. Using a trace-based model of a P2P network, we show that SOSPNET achieves close-to-optimal search performance and offers better robustness than alternative architectures for content searching. The content of this chapter is based on our research published in [52, 54].

Improving resource management in CDNs (Chapter 5). We propose an architecture for resource management in CDNs in which high-capacity peers take the role of resource broker and become mediators between the resource providers and the resource requesters. We propose a method for deciding on the sets of resources that are managed by specific brokers, which makes it easy to locate resources with certain latency properties. We then combine this method with an algorithm that controls the load imposed on the brokers, preventing them from becoming overloaded.
We show that this algorithm is optimal up to a linear factor in the number of resources in the system. The content of this chapter is based on our research published in [55].

Chapter 2

A case study of the BitTorrent P2P network

A proper motivation for the mechanisms proposed in this thesis requires a thorough understanding of the properties of the environments for which those mechanisms are intended. The Internet environment, with its heterogeneity aspects relevant for P2P networks, can be reliably characterized by studying actually deployed P2P networks. At the time of writing this thesis, the most popular P2P network was the BitTorrent system. BitTorrent, originally designed to transfer files, has been successfully employed for other types of P2P applications, including live video streaming and video on demand. A number of services built around the BitTorrent system provide functionality missing in the original BitTorrent design, including content search. We believe that the widespread popularity and the universal applicability of BitTorrent make it a representative P2P system and an excellent subject for a measurement study characterizing P2P environments.

The distributed nature of BitTorrent and its weak dependency on central components preclude the possibility of collecting the system data at a single point. To meet the scale requirements of collecting data from individual peers, we have designed a custom measurement framework and deployed it on a distributed supercomputer. The framework is built as a composite architecture that consists of components that collect data from the central services for peer and content discovery as well as from the individual peers. The data collected during the two measurement periods spans several months in total.

The remaining part of this chapter is organized as follows. Section 2.1 describes the basics of the BitTorrent system and the accompanying content discovery services. Section 2.2 outlines our measurement framework, while Section 2.3 describes the deployment details and the configuration of the framework during the measurements. Section 2.4 presents and interprets the data collected during the measurements. Section 2.5 summarizes this chapter.

2.1 The BitTorrent content distribution system

BitTorrent [40] is a content distribution protocol optimized for the dissemination of large content items (i.e., files) concurrently to many receivers. The files distributed in the BitTorrent network are divided into pieces, which are individually exchanged between peers. Piece exchange is controlled by the tit-for-tat mechanism, which enforces fairness in the bandwidth contributions of the peers. BitTorrent tit-for-tat implements a bartering economy in which a peer chooses to upload pieces to a limited number of peers that have recently contributed the most pieces to it. Peers uploading pieces at a higher rate can therefore expect higher download speeds. The dependency between the upload and the download rates of peers creates incentives for data contributions, and BitTorrent is historically the first widely deployed P2P content transfer protocol that provides such incentives.

BitTorrent uses specific terminology to describe the system elements. The set of peers downloading the same file at the same time is referred to as a swarm. The peers that have obtained the entire file but choose to stay in the network and serve the content to other peers for free are called seeders. The peers whose downloads are still in progress are called leechers. The exchange of file pieces between peers in a BitTorrent system is illustrated in Figure 2.1.

Figure 2.1: File piece exchange in a BitTorrent swarm.

Each file distributed in BitTorrent has an associated metadata file with a name traditionally ending with the extension .torrent. Hence, the metadata files are often referred to as torrents. In addition to the typical information included in content metadata, such as the content file names and sizes, a torrent contains the identity (i.e., the IP address and the port number) of a tracker. A tracker is a component of a BitTorrent network which provides peer discovery functionality. More precisely, a tracker maintains a list of all online and connectable (non-firewalled) peers downloading or seeding a certain set of content files. Upon request from a peer that is interested in some file, a tracker returns a random sample from the list of peers it maintains. For reliability, a torrent can contain a list of several alternative trackers maintaining information about the corresponding swarm. Usually, a tracker is responsible for peer discovery in multiple swarms. The obvious scalability bottleneck inherent to the centralized nature of a tracker has resulted in a recent extension to the base BitTorrent protocol that replaces trackers with the Kademlia DHT [93] for distributed peer discovery [44]. Despite the superior scalability offered by the DHT-based solution, the vast majority of BitTorrent swarms are presently still managed by centralized trackers.

Figure 2.2: Content and peer discovery in a BitTorrent system.

Another important type of meta-information included in the torrents is the set of SHA1 hashes for all pieces of the content. Building hash-based content validation into the protocol provides a mechanism for preventing content pollution. The separation of the hashes from the content itself, achieved by embedding them in torrents, has implications for the way fake content is filtered out from the BitTorrent network.
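The hash-based validation described above can be illustrated with a short sketch. It assumes that the torrent stores one 20-byte SHA1 digest per piece, concatenated in piece order; the tiny piece length and the function names `piece_hashes` and `is_valid_piece` are hypothetical and chosen only to keep the example readable.

```python
import hashlib

PIECE_LENGTH = 4        # hypothetical, unrealistically small piece size
SHA1_DIGEST_SIZE = 20   # a SHA1 digest is always 20 bytes


def piece_hashes(content: bytes, piece_length: int = PIECE_LENGTH) -> bytes:
    """Concatenate the SHA1 digests of all pieces in piece order."""
    digests = []
    for offset in range(0, len(content), piece_length):
        piece = content[offset:offset + piece_length]
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def is_valid_piece(index: int, piece: bytes, hashes: bytes) -> bool:
    """Compare a downloaded piece with the digest stored for its index."""
    start = index * SHA1_DIGEST_SIZE
    expected = hashes[start:start + SHA1_DIGEST_SIZE]
    if len(expected) != SHA1_DIGEST_SIZE:
        return False  # no digest is stored for this piece index
    return hashlib.sha1(piece).digest() == expected


# Hashes as they would be taken from a torrent obtained from a trusted source.
trusted_hashes = piece_hashes(b"abcdefghij")  # pieces: b"abcd", b"efgh", b"ij"

genuine_ok = is_valid_piece(1, b"efgh", trusted_hashes)
polluted_ok = is_valid_piece(1, b"EFGH", trusted_hashes)
```

A piece that differs from the original in any byte fails the comparison, so a peer can discard polluted pieces one at a time without waiting for the whole file.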
In principle, if a torrent has been obtained from a trusted source, a peer can use it to reliably validate every downloaded piece. The sources for torrents are usually torrent file servers, which are accessible through dedicated Web sites. Those Web sites are often mirrored for better scalability. A common technique employed to remove fake files from the Web sites is content moderation. On a moderated site, a community of selected, highly trusted users called moderators manually inspect the submitted torrents for their consistency with the described content and decide which files are posted online. In the example of the Suprnova.org [13] site, moderation has proven to be very reliable in filtering fake content. Unfortunately, manual metadata inspection is expensive and does not scale well with the content availability, and hence few torrent distribution sites are currently moderated. Nevertheless, most of those Web sites offer the possibility of posting comments on the torrents, which can be used for marking fake files.

Apart from fake content filtering, the main function of the torrent distribution Web sites is to provide a content location service. BitTorrent does not support any content search functionality at the protocol level, so the Web sites take the role of gateways to the system, allowing users to locate the content they are interested in. In addition to a typical keyword search, the Web sites also support the categorization of the files. The categories are usually fixed and represent the type of the file content; they are usually defined hierarchically. For instance, the generic category Movies can be divided into more specific subcategories such as Comedy, Action, Adventure, etc. The content categorization allows for locating content by top-down browsing rather than file-name searching. Browsing is more convenient when the user, rather than looking for a specific file, is interested in a certain content type. The interactions between the different BitTorrent components involved in content and peer discovery are shown in Figure 2.2.

2.2 Measurement framework

The extent of control decentralization in P2P systems such as BitTorrent makes the measurement of those systems a challenging task. In principle, gathering detailed peer statistics is possible only by tracking individual peers.
The resource consumption of the measuring system is thus proportional to the size of the measured P2P system and the variety of the measured aspects. To reduce the otherwise prohibitive costs, measurement efforts commonly concentrate on a sample of the peers and capture only certain aspects of the P2P system.

Another way of reducing the cost of the measurements is to modify the P2P client software by adding functionality for collecting the data directly from the peers. The data collected at each peer can then be archived in a central repository, the maintenance of which is essentially the only cost of conducting the measurements. Despite its obvious advantages, including the first-hand accuracy of the collected data and the low operational cost, the intrusiveness of this approach is not acceptable to many users. User reluctance toward any software that collects information about local interactions, commonly known as spyware, is fully understandable from the perspective of privacy protection. Collecting peer statistics locally is in most cases not an option, and more indirect measurement methods are needed.

Considering the pros and cons of the different measurement approaches, we have designed a distributed measurement framework that captures the properties of the BitTorrent network without intruding on the network operation. Our measurement framework does not require any changes to the BitTorrent client, the tracker, or the torrent distribution Web site. All information about the network is obtained by (ab)using the standard communication protocols of the peers, trackers, and Web sites.

Figure 2.3: The measurement framework.

Our measurement framework consists of several components involved in the process of collecting and analyzing data, presented in Figure 2.3. The process itself is divided into three stages: pre-processing, measurement, and post-processing. During pre-processing, high-level statistics describing the BitTorrent swarms are collected from the Web sites. Based on those statistics, swarms are selected for the measurements, which involve direct contact with the tracker and with individual peers. The data collected during the measurements is analyzed during post-processing. The components involved in the three stages of the data collection and analysis process take the form of four data repositories, four custom data-collecting scripts, and one modified BitTorrent client. We now describe the components involved in each stage of the measurement process in the order of their appearance in the figure, going from left to right.

2.2.1 Pre-processing

As we have explained in Section 2.1, BitTorrent uses Web sites as metadata directories. The Web site statistics are collected by the WebCrawler script. WebCrawler traverses the torrent distribution Web sites, parses their content, and extracts the relevant information. A typical Web site describes a torrent with the associated content name, the date and time when the torrent was posted, the size of the content file, the number of seeders and leechers, and the category classifying the content.
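The per-torrent fields listed above map naturally onto a small record type, and the swarm selection step of pre-processing can then be expressed as a filter over such records. The sketch below is an illustration only, not the WebCrawler implementation: the field names, the `select_swarms` helper, the selection criterion (category plus a cap on swarm size), and the sample rows are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class TorrentListing:
    """One row of a torrent listing page (hypothetical field names)."""
    content_name: str
    posted_at: datetime
    size_bytes: int
    seeders: int
    leechers: int
    category: str

    @property
    def swarm_size(self) -> int:
        # Peers reported by the Web site for this torrent.
        return self.seeders + self.leechers


def select_swarms(listings: Iterable[TorrentListing],
                  category: str,
                  max_peers: int) -> List[TorrentListing]:
    """Keep swarms of one category that fit the peer budget, largest first."""
    chosen = [row for row in listings
              if row.category == category and row.swarm_size <= max_peers]
    return sorted(chosen, key=lambda row: row.swarm_size, reverse=True)


# Invented sample rows, not measurement data.
listings = [
    TorrentListing("alpha", datetime(2004, 1, 5, 12, 0), 700_000_000, 40, 160, "Movies"),
    TorrentListing("beta", datetime(2004, 1, 6, 9, 30), 4_400_000_000, 900, 4100, "Movies"),
    TorrentListing("gamma", datetime(2004, 1, 6, 18, 15), 90_000_000, 10, 20, "Music"),
    TorrentListing("delta", datetime(2004, 1, 7, 8, 45), 350_000_000, 5, 45, "Movies"),
]

selected = select_swarms(listings, category="Movies", max_peers=500)
selected_names = [row.content_name for row in selected]
```

Capping the swarm size is one simple way to keep the number of peers that must be contacted within the resources of the measurement software; other objectives would call for other criteria.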
A screenshot of a table with content information as presented on a typical torrent distribution site is shown in Figure 2.4. The data collected by WebCrawler is stored in the Web site stats repository.

Figure 2.4: Information organization on a typical torrent distribution Web site.

The information exposed on the Web sites does not provide a complete description of the associated swarms. In particular, the identities of the trackers, which are required to establish connections with the BitTorrent peers, are not available on the Web. To fill this information gap, WebCrawler also downloads the torrents linked from the Web pages and stores them in the torrents repository.

Even when limiting ourselves to the files indexed by a single torrent distribution site, the total number of peers downloading those files can be prohibitively large for measurement software that has only a limited amount of resources available. The statistics collected in the Web site stats and the torrents repositories make it possible to select the swarms that are relevant for a particular measurement objective.

2.2.2 Measurement

The measurement itself uses two independent approaches to collect peer statistics. These approaches differ in the method of discovering peers in the swarm, and they can be described as the active start and the passive start measurements.

The active start measurements involve the GetPeers script, which connects to the tracker and explicitly requests the list of peers in the swarm. Since the tracker protocol allows for requesting only a small number of randomly selected peers, the GetPeers script continuously polls the tracker for new peers at a constant time interval. The list of discovered peers is passed to the PingPeers script, which contacts individual peers and obtains statistics such as the number of file pieces they possess and their uptime.
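The polling strategy of the active start measurements can be sketched as follows. This is a minimal illustration under stated assumptions, not the GetPeers script itself: the tracker is replaced by a local stand-in function that returns a random sample of a made-up swarm, no network traffic is generated, and the names `poll_tracker` and `fake_tracker` are hypothetical.

```python
import random
import time
from typing import Callable, List, Set, Tuple

Peer = Tuple[str, int]  # (IP address, port)


def poll_tracker(request_peers: Callable[[], List[Peer]],
                 rounds: int,
                 interval_s: float = 0.0,
                 sleep: Callable[[float], None] = time.sleep) -> Set[Peer]:
    """Poll the tracker at a constant interval and accumulate distinct peers."""
    discovered: Set[Peer] = set()
    for round_no in range(rounds):
        discovered.update(request_peers())  # each reply is a small random sample
        if round_no + 1 < rounds:
            sleep(interval_s)               # constant time between two requests
    return discovered


# Stand-in for a tracker: a made-up swarm of 40 connectable peers.
rng = random.Random(7)
swarm: List[Peer] = [(f"10.0.0.{i}", 6881) for i in range(1, 41)]


def fake_tracker(sample_size: int = 5) -> List[Peer]:
    """Return a random sample of the swarm, as a tracker reply would."""
    return rng.sample(swarm, sample_size)


one_round = poll_tracker(fake_tracker, rounds=1)
many_rounds = poll_tracker(fake_tracker, rounds=60)
```

Because every reply covers only a few randomly chosen peers, a single request reveals a small fraction of the swarm, while repeated polling gradually approaches full coverage of the connectable peers. In an actual deployment the interval would be set well above zero to avoid overloading the tracker.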
The peers that are not connectable as a result of being located behind a firewall or a network address translator (NAT) are not registered at the trackers, and even if they were, the PingPeers script would not be able to establish a connection with them. The discovery of non-connectable peers is the task of the passive start measurements. The passive start measurements register at the tracker a number of BitTorrent peers with modified code, called ListenPeers, that act as honeypots, waiting for other peers, including non-connectable ones, to establish a connection with them. The code modifications, apart from preventing ListenPeers clients from downloading any data from the swarm, enable the logging of the IP addresses of the discovered peers.

The peers discovered during the active or the passive start measurements are probed from multiple network locations by the TrackPeers scripts with the purpose of characterizing the topology of the Internet links interconnecting the peers and measuring the corresponding round-trip times (RTTs). The operation of the TrackPeers scripts can be described as a multi-source traceroute in that the measurements are performed concurrently from multiple network locations.

2.2.3 Post-processing

The data describing peers collected during the measurements is analyzed in the post-processing stage. Post-processing aggregates and correlates the peer-level statistics into a system-level view of the P2P network. The data on the piece download progress is used to estimate the peer download bandwidth. The IP addresses of the peers are mapped to the geographical locations of their origin. The connectivity graph depicting the topology of the network interconnecting the peers is constructed. The network graph is then segmented based on the Autonomous System (AS) membership of the network routers, and the estimated RTTs are attached to the network connections.

2.3 Measurement setup

We have performed two independent measurements of the BitTorrent network. The first, long-run measurement spans a period of 8 months from June 2003 to March 2004, while the second, short-run measurement was performed from May 5 to May 11, The objective of the long-run measurement was to observe the trends in the BitTorrent network at a coarse granularity but over a long period of time. The short-run measurements, on the other hand, capture more detailed network statistics but over a short time period. An important decision that could affect the credibility of the measurements is the selection of the torrent distribution site.
In the case of both the long-run and the short-run measurements, we simply select the largest (in terms of the number of torrents) Web site. At the time of performing the long-run measurements, this was Suprnova.org with around 50,000 torrents, and during the short-run measurements this was Piratebay.org [12] with nearly 120,000 torrents. We have used two platforms for the deployment of our measurement framework. The components performing active start measurements have been deployed on 100 of the 400 nodes available on our Distributed ASCI Supercomputer (DAS) [24], located in the Netherlands. For the passive start measurements, we have used 50 PlanetLab [106] nodes to run ListenPeers, and 300 PlanetLab nodes for the deployment of the probes used by the multi-source traceroute invoked by the TrackPeers scripts. The 300 traceroute nodes


More information

Peer-to-Peer: an Enabling Technology for Next-Generation E-learning

Peer-to-Peer: an Enabling Technology for Next-Generation E-learning Peer-to-Peer: an Enabling Technology for Next-Generation E-learning Aleksander Bu lkowski 1, Edward Nawarecki 1, and Andrzej Duda 2 1 AGH University of Science and Technology, Dept. Of Computer Science,

More information

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha

IPTV AND VOD NETWORK ARCHITECTURES. Diogo Miguel Mateus Farinha IPTV AND VOD NETWORK ARCHITECTURES Diogo Miguel Mateus Farinha Instituto Superior Técnico Av. Rovisco Pais, 1049-001 Lisboa, Portugal E-mail: diogo.farinha@ist.utl.pt ABSTRACT IPTV and Video on Demand

More information

JXTA TM : Beyond P2P File Sharing the Emergence of Knowledge Addressable Networks

JXTA TM : Beyond P2P File Sharing the Emergence of Knowledge Addressable Networks JXTA TM : Beyond P2P File Sharing the Emergence of Knowledge Addressable Networks Bernard Traversat tra@jxta.org JXTA Chief Architect Sun Microsystems 2005 JavaOne SM Conference Session 7208 Extended and

More information

Politehnica University of Timisoara. Distributed Mailing System PhD Report I

Politehnica University of Timisoara. Distributed Mailing System PhD Report I Politehnica University of Timisoara PhD Report I Patrik Emanuel Mezo Prof. Dr. Ing. Mircea Vladutiu PhD Student PhD Coordinator ABSTRACT This PhD Report describes the research activity carried on as part

More information

Carsten Griwodz, Thomas Plagemann, Ralf Steinmetz 2nd July 2004

Carsten Griwodz, Thomas Plagemann, Ralf Steinmetz 2nd July 2004 Content Distribution Infrastructures Carsten Griwodz, Thomas Plagemann, Ralf Steinmetz 2nd July 2004 1 Public Outreach Since the early days of the world wide web (WWW), the information infrastructure provided

More information

Using Peer to Peer Dynamic Querying in Grid Information Services

Using Peer to Peer Dynamic Querying in Grid Information Services Using Peer to Peer Dynamic Querying in Grid Information Services Domenico Talia and Paolo Trunfio DEIS University of Calabria HPC 2008 July 2, 2008 Cetraro, Italy Using P2P for Large scale Grid Information

More information

Peer-to-Peer Systems: "A Shared Social Network"

Peer-to-Peer Systems: A Shared Social Network Peer-to-Peer Systems: "A Shared Social Network" Nguyen Hoang Anh Helsinki University of Technology hanguyen@cc.hut.fi Abstract In the last few years, the success of the Napster online music sharing program

More information

Detecting rogue systems

Detecting rogue systems Product Guide Revision A McAfee Rogue System Detection 4.7.1 For use with epolicy Orchestrator 4.6.3-5.0.0 Software Detecting rogue systems Unprotected systems, referred to as rogue systems, are often

More information

A Measurement Study of Peer-to-Peer File Sharing Systems

A Measurement Study of Peer-to-Peer File Sharing Systems CSF641 P2P Computing 點 對 點 計 算 A Measurement Study of Peer-to-Peer File Sharing Systems Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble Department of Computer Science and Engineering University

More information

Content Distribution over IP: Developments and Challenges

Content Distribution over IP: Developments and Challenges Content Distribution over IP: Developments and Challenges Adrian Popescu, Blekinge Inst of Technology, Sweden Markus Fiedler, Blekinge Inst of Technology, Sweden Demetres D. Kouvatsos, University of Bradford,

More information

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks OnurSoft Onur Tolga Şehitoğlu November 10, 2012 v1.0 Contents 1 Introduction 3 1.1 Purpose..............................

More information

On the Penetration of Business Networks by P2P File Sharing

On the Penetration of Business Networks by P2P File Sharing On the Penetration of Business Networks by P2P File Sharing Kevin Lee School of Computer Science, University of Manchester, Manchester, UK. +44 () 161 2756132 klee@cs.man.ac.uk Danny Hughes Computing,

More information

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. RESEARCH ARTICLE ISSN: 2321-7758 GLOBAL LOAD DISTRIBUTION USING SKIP GRAPH, BATON AND CHORD J.K.JEEVITHA, B.KARTHIKA* Information Technology,PSNA College of Engineering & Technology, Dindigul, India Article

More information

A Network Monitoring System with a Peer-to-Peer Architecture

A Network Monitoring System with a Peer-to-Peer Architecture A Network Monitoring System with a Peer-to-Peer Architecture Paulo Salvador, Rui Valadas University of Aveiro / Institute of Telecommunications Aveiro E-mail: salvador@av.it.pt; rv@det.ua.pt Abstract The

More information

Guidance Regarding Skype and Other P2P VoIP Solutions

Guidance Regarding Skype and Other P2P VoIP Solutions Guidance Regarding Skype and Other P2P VoIP Solutions Ver. 1.1 June 2012 Guidance Regarding Skype and Other P2P VoIP Solutions Scope This paper relates to the use of peer-to-peer (P2P) VoIP protocols,

More information

Advanced Peer to Peer Discovery and Interaction Framework

Advanced Peer to Peer Discovery and Interaction Framework Advanced Peer to Peer Discovery and Interaction Framework Peeyush Tugnawat J.D. Edwards and Company One, Technology Way, Denver, CO 80237 peeyush_tugnawat@jdedwards.com Mohamed E. Fayad Computer Engineering

More information

Middleware and Distributed Systems. Peer-to-Peer Systems. Martin v. Löwis. Montag, 30. Januar 12

Middleware and Distributed Systems. Peer-to-Peer Systems. Martin v. Löwis. Montag, 30. Januar 12 Middleware and Distributed Systems Peer-to-Peer Systems Martin v. Löwis Peer-to-Peer Systems (P2P) Concept of a decentralized large-scale distributed system Large number of networked computers (peers)

More information

How To Understand The Power Of A Content Delivery Network (Cdn)

How To Understand The Power Of A Content Delivery Network (Cdn) Overview 5-44 5-44 Computer Networking 5-64 Lecture 8: Delivering Content Content Delivery Networks Peter Steenkiste Fall 04 www.cs.cmu.edu/~prs/5-44-f4 Web Consistent hashing Peer-to-peer CDN Motivation

More information

EDOS Distribution System: a P2P architecture for open-source content dissemination

EDOS Distribution System: a P2P architecture for open-source content dissemination EDOS Distribution System: a P2P architecture for open-source content Serge Abiteboul 1, Itay Dar 2, Radu Pop 3, Gabriel Vasile 1 and Dan Vodislav 4 1. INRIA Futurs, France {firstname.lastname}@inria.fr

More information

Delft University of Technology Parallel and Distributed Systems Report Series. The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis

Delft University of Technology Parallel and Distributed Systems Report Series. The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis Delft University of Technology Parallel and Distributed Systems Report Series The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis Boxun Zhang, Alexandru Iosup, and Dick Epema {B.Zhang,A.Iosup,D.H.J.Epema}@tudelft.nl

More information

Global Server Load Balancing

Global Server Load Balancing White Paper Overview Many enterprises attempt to scale Web and network capacity by deploying additional servers and increased infrastructure at a single location, but centralized architectures are subject

More information

AdvOSS Session Border Controller

AdvOSS Session Border Controller AdvOSS Session Border Controller Product Data Sheet Find latest copy of this document from http://www.advoss.com/pdf/advoss-sbc-productdatasheet.pdf Copyright AdvOSS.com, 2007-2011 All Rights Reserved

More information

PROPOSAL AND EVALUATION OF A COOPERATIVE MECHANISM FOR HYBRID P2P FILE-SHARING NETWORKS

PROPOSAL AND EVALUATION OF A COOPERATIVE MECHANISM FOR HYBRID P2P FILE-SHARING NETWORKS PROPOSAL AND EVALUATION OF A COOPERATIVE MECHANISM FOR HYBRID P2P FILE-SHARING NETWORKS Hongye Fu, Naoki Wakamiya, Masayuki Murata Graduate School of Information Science and Technology Osaka University

More information

Client/server and peer-to-peer models: basic concepts

Client/server and peer-to-peer models: basic concepts Client/server and peer-to-peer models: basic concepts Dmitri Moltchanov Department of Communications Engineering Tampere University of Technology moltchan@cs.tut.fi September 04, 2013 Slides provided by

More information

CSCI-1680 CDN & P2P Chen Avin

CSCI-1680 CDN & P2P Chen Avin CSCI-1680 CDN & P2P Chen Avin Based partly on lecture notes by Scott Shenker and John Jannotti androdrigo Fonseca And Computer Networking: A Top Down Approach - 6th edition Last time DNS & DHT Today: P2P

More information

Lecture 6 Content Distribution and BitTorrent

Lecture 6 Content Distribution and BitTorrent ID2210 - Distributed Computing, Peer-to-Peer and GRIDS Lecture 6 Content Distribution and BitTorrent [Based on slides by Cosmin Arad] Today The problem of content distribution A popular solution: BitTorrent

More information

p2p: systems and applications Internet Avanzado, QoS, Multimedia 2006-2007 Carmen Guerrero carmen.guerrero@uc3m.es

p2p: systems and applications Internet Avanzado, QoS, Multimedia 2006-2007 Carmen Guerrero carmen.guerrero@uc3m.es p2p: systems and applications Internet Avanzado, QoS, Multimedia 2006-2007 Carmen Guerrero carmen.guerrero@uc3m.es Dpto. Ingeniería Telemática Index Introduction Taxonomy Classification of p2p overlay

More information

Sync Security and Privacy Brief

Sync Security and Privacy Brief Introduction Security and privacy are two of the leading issues for users when transferring important files. Keeping data on-premises makes business and IT leaders feel more secure, but comes with technical

More information

Designing a Cloud Storage System

Designing a Cloud Storage System Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015

Distributed Systems. 23. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2015 Distributed Systems 23. Content Delivery Networks (CDN) Paul Krzyzanowski Rutgers University Fall 2015 November 17, 2015 2014-2015 Paul Krzyzanowski 1 Motivation Serving web content from one location presents

More information

Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network

Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network White paper Intelligent Content Delivery Network (CDN) The New Generation of High-Quality Network July 2001 Executive Summary Rich media content like audio and video streaming over the Internet is becoming

More information

On the Penetration of Business Networks by P2P File Sharing

On the Penetration of Business Networks by P2P File Sharing On the Penetration of Business Networks by P2P File Sharing Kevin Lee School of Computer Science, University of Manchester, Manchester, M13 9PL, UK. +44 (0) 161 2756132 klee@cs.man.ac.uk Danny Hughes Computing,

More information

PEER-TO-PEER (P2P) systems have emerged as an appealing

PEER-TO-PEER (P2P) systems have emerged as an appealing IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 4, APRIL 2009 595 Histogram-Based Global Load Balancing in Structured Peer-to-Peer Systems Quang Hieu Vu, Member, IEEE, Beng Chin Ooi,

More information

Copyright www.agileload.com 1

Copyright www.agileload.com 1 Copyright www.agileload.com 1 INTRODUCTION Performance testing is a complex activity where dozens of factors contribute to its success and effective usage of all those factors is necessary to get the accurate

More information

Approximate Object Location and Spam Filtering on Peer-to-Peer Systems

Approximate Object Location and Spam Filtering on Peer-to-Peer Systems Approximate Object Location and Spam Filtering on Peer-to-Peer Systems Feng Zhou, Li Zhuang, Ben Y. Zhao, Ling Huang, Anthony D. Joseph and John D. Kubiatowicz University of California, Berkeley The Problem

More information

D1.1 Service Discovery system: Load balancing mechanisms

D1.1 Service Discovery system: Load balancing mechanisms D1.1 Service Discovery system: Load balancing mechanisms VERSION 1.0 DATE 2011 EDITORIAL MANAGER Eddy Caron AUTHORS STAFF Eddy Caron, Cédric Tedeschi Copyright ANR SPADES. 08-ANR-SEGI-025. Contents Introduction

More information

Improving Query Processing Performance in Large Distributed Database Management Systems

Improving Query Processing Performance in Large Distributed Database Management Systems Norvald H. Ryeng Improving Query Processing Performance in Large Distributed Database Management Systems Thesis for the degree of Philosophiae Doctor Trondheim, November 2011 Norwegian University of Science

More information

Optimizing and Balancing Load in Fully Distributed P2P File Sharing Systems

Optimizing and Balancing Load in Fully Distributed P2P File Sharing Systems Optimizing and Balancing Load in Fully Distributed P2P File Sharing Systems (Scalable and Efficient Keyword Searching) Anh-Tuan Gai INRIA Rocquencourt anh-tuan.gai@inria.fr Laurent Viennot INRIA Rocquencourt

More information

Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems*

Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems* Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems* Alexander Klemm a Christoph Lindemann a Mary K. Vernon b Oliver P. Waldhorst a ABSTRACT This paper characterizes the query behavior

More information

Online Transaction Processing in SQL Server 2008

Online Transaction Processing in SQL Server 2008 Online Transaction Processing in SQL Server 2008 White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 provides a database platform that is optimized for today s applications,

More information

A Topology-Aware Relay Lookup Scheme for P2P VoIP System

A Topology-Aware Relay Lookup Scheme for P2P VoIP System Int. J. Communications, Network and System Sciences, 2010, 3, 119-125 doi:10.4236/ijcns.2010.32018 Published Online February 2010 (http://www.scirp.org/journal/ijcns/). A Topology-Aware Relay Lookup Scheme

More information

Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines

Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines Parallel Processing over Mobile Ad Hoc Networks of Handheld Machines Michael J Jipping Department of Computer Science Hope College Holland, MI 49423 jipping@cs.hope.edu Gary Lewandowski Department of Mathematics

More information

Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding

Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding International Journal of Emerging Trends in Engineering Research (IJETER), Vol. 3 No.6, Pages : 151-156 (2015) ABSTRACT Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding E.ShyamSundhar

More information

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents

More information

THE BITTORRENT P2P FILE-SHARING SYSTEM: MEASUREMENTS AND ANALYSIS. J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips

THE BITTORRENT P2P FILE-SHARING SYSTEM: MEASUREMENTS AND ANALYSIS. J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips THE BITTORRENT P2P FILE-SHARING SYSTEM: MEASUREMENTS AND ANALYSIS J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips Department of Computer Science, Delft University of Technology, the Netherlands j.a.pouwelse@ewi.tudelft.nl

More information

5. Peer-to-peer (P2P) networks

5. Peer-to-peer (P2P) networks 5. Peer-to-peer (P2P) networks PA191: Advanced Computer Networking I. Eva Hladká Slides by: Tomáš Rebok Faculty of Informatics Masaryk University Autumn 2015 Eva Hladká (FI MU) 5. P2P networks Autumn 2015

More information

Peer-to-Peer Networks. Chapter 2: Initial (real world) systems Thorsten Strufe

Peer-to-Peer Networks. Chapter 2: Initial (real world) systems Thorsten Strufe Chapter 2: Initial (real world) systems Thorsten Strufe 1 Chapter Outline Overview of (previously) deployed P2P systems in 3 areas P2P file sharing and content distribution: Napster, Gnutella, KaZaA, BitTorrent

More information

Internet Protocol: IP packet headers. vendredi 18 octobre 13

Internet Protocol: IP packet headers. vendredi 18 octobre 13 Internet Protocol: IP packet headers 1 IPv4 header V L TOS Total Length Identification F Frag TTL Proto Checksum Options Source address Destination address Data (payload) Padding V: Version (IPv4 ; IPv6)

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

Impact of Peer Incentives on the Dissemination of Polluted Content

Impact of Peer Incentives on the Dissemination of Polluted Content Impact of Peer Incentives on the Dissemination of Polluted Content Fabricio Benevenuto fabricio@dcc.ufmg.br Virgilio Almeida virgilio@dcc.ufmg.br Cristiano Costa krusty@dcc.ufmg.br Jussara Almeida jussara@dcc.ufmg.br

More information

VIA COLLAGE Deployment Guide

VIA COLLAGE Deployment Guide VIA COLLAGE Deployment Guide www.true-collaboration.com Infinite Ways to Collaborate CONTENTS Introduction... 3 User Experience... 3 Pre-Deployment Planning... 3 Connectivity... 3 Network Addressing...

More information