Delft University of Technology Parallel and Distributed Systems Report Series. The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis


Delft University of Technology
Parallel and Distributed Systems Report Series

The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis

Boxun Zhang, Alexandru Iosup, and Dick Epema

Completed April 2. To be submitted after revision.

Report number PDS-2-3

PDS ISSN

Published and produced by:
Parallel and Distributed Systems Section
Faculty of Information Technology and Systems
Department of Technical Mathematics and Informatics
Delft University of Technology
Zuidplantsoen BZ Delft
The Netherlands

Information about Parallel and Distributed Systems Report Series: reports@pds.ewi.tudelft.nl
Information about Parallel and Distributed Systems Section:

© 2 Parallel and Distributed Systems Section, Faculty of Information Technology and Systems, Department of Technical Mathematics and Informatics, Delft University of Technology. All rights reserved. No part of this series may be reproduced in any form or by any means without prior written permission of the publisher.

Abstract

Real-world measurements play a key role in studying the characteristics and improving the design of Peer-to-Peer (P2P) systems. Although many P2P measurements have been carried out in the last decade, few traces are publicly accessible, and those that are available come in different formats. This situation hampers researchers in exchanging, studying, and reusing existing traces. As a result, many P2P studies have been based on unrealistic assumptions about the characteristics of P2P systems, and many P2P algorithms and methods still lack a realistic evaluation. To address this problem, in this work we introduce the P2P Trace Archive, which we design as a virtual meeting place for the community to exchange P2P traces. First, we design the Trace Archive, including a single, flexible data format for storing anonymized P2P traces. Using the tools we have developed as part of the Archive, we add to the Archive more than 2 traces collected from 2 P2P communities; the traces capture the characteristics of millions of user sessions between 23 and 2. Second, we make a comparative analysis of traces in the Archive that focuses on content characteristics, peer arrivals and departures, and peer sharing behavior. We find that the characteristics and usage patterns differ significantly among systems and among communities, and that they change significantly over multi-year intervals. Third, we investigate how different methods for identifying peers and sessions in P2P traces may lead to very different analysis results.

Contents

1 Introduction
2 Requirements for a P2P Trace Archive
3 The P2P Trace Archive
  3.1 A Unified Trace Format
  3.2 The Archive Design
4 Traces Currently in the Archive
  4.1 Community Dataset: SuprNova
  4.2 Community Dataset: PirateBay
  4.3 Community Dataset: FileList.org
  4.4 Community Dataset: LegalTorrents.com
  4.5 Community Dataset: etree.org
  4.6 Community Dataset: tlm-project.org
  4.7 Community Dataset: transamrit.net
  4.8 Community Dataset: unix-ag.uni-kl.de
  4.9 Community Dataset: idsoftware.com
  4.10 Community Dataset: boenielsen.dk
  4.11 Community Dataset: alluvion.org
  4.12 Community Dataset: Gnutella
  4.13 Community Dataset: edonkey
  4.14 Community Dataset: PP Live
  4.15 Community Dataset: Skype
5 A Comparative Trace Analysis
  5.1 Content characteristics
  5.2 Peer arrival and departure
  5.3 Bandwidth characteristics
  5.4 Peer Sharing Behavior
6 Identifying Peers and Sessions
  6.1 Peer Identification
  6.2 Session Identification
7 Related Work
8 Conclusion and Ongoing Work
9 Acknowledgements

List of Figures

of the file size in 6 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the file size in 4 communities measured in 25 and
of the file popularity in 6 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the file popularity in 4 communities measured in 25 and 29 (horizontal axis in logarithmic scale)
of the (hourly) peer arrival rate in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the (hourly) peer arrival rate in 4 communities measured in 25 and 29 (horizontal axis in logarithmic scale)
of the peer session length in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the peer session length in 4 traces collected in 29 (horizontal axis in logarithmic scale)
of the peer session length in 4 communities measured in 25 and 29 (horizontal axis in logarithmic scale)
of the peer download speed in 5 traces collected between 23 and
of the peer download speed in 4 communities measured in
Comparison of the peer upload speed distributions in 4 traces collected in 25 (horizontal axis in logarithmic scale)
Comparison of the peer upload speed distributions in 4 communities measured in 29 (horizontal axis in logarithmic scale)
of the download completion of traces collected between 23 and
of the download completion in 4 communities measured in 25 and
of the seeding time in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the seeding time in 4 communities measured in 29 (horizontal axis in logarithmic scale)
of the seeding-after-leeching time in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale)
of the seeding-after-leeching time in 4 communities measured in 29 (horizontal axis in logarithmic scale)
of peer arrival rate for various peer identification intervals (horizontal axis in logarithmic scale)
of session length resulting for various peer identification intervals (horizontal axis in logarithmic scale)
of download speed for various peer identification intervals (horizontal axis in logarithmic scale)
of peer arrival rate resulting from various session identification intervals (horizontal axis in logarithmic scale)
of session length resulting from various session identification intervals (horizontal axis in logarithmic scale)
of download speed resulting from various session identification intervals (horizontal axis in logarithmic scale)

List of Tables

Data format for dynamic peer-level data
Summary of the datasets
File Size Statistics
P-values from KS and AD test for file size distributions
Parameters of fitting distributions for file size
File Popularity Statistics
P-values from KS and AD test for file popularity distributions
Parameters of fitting distributions for file popularity
Peer Arrival Rate Statistics
Peer Arrival Rate Statistics
Peer Arrival Rate Statistics
Session Length Statistics
Session Length Statistics
Session Length Statistics
Download Speed Statistics
Download Speed Statistics
Download Speed Statistics
Upload Speed Statistics
Upload Speed Statistics
Upload Speed Statistics
Download Completion Statistics
Download Completion Statistics
Download Completion Statistics
Seeding Time Statistics
Seeding Time Statistics
Seeding Time Statistics
Seeding-after-Leeching Time Statistics
Seeding-after-Leeching Time Statistics
Seeding-after-Leeching Time Statistics
Hourly Peer Arrival Rate Statistics
Session Length Statistics
Peer Download Speed Statistics
Peer Difference Statistics
Peer Difference Statistics
Peer Difference Statistics
Peer Difference Statistics
Hourly Peer Arrival Rate Statistics
Session Length Statistics
Peer Download Speed Statistics
Session Difference Statistics
Session Difference Statistics
Session Difference Statistics

1 Introduction

Peer-to-Peer (P2P) systems have gained phenomenal popularity in the past few years, and several studies [, 2] show that P2P applications generate large amounts of Internet traffic. Measurement data collected from real P2P systems are fundamental for gaining solid knowledge and understanding of the usage patterns and the characteristics of these systems. Thus, measurement data are important for the modeling, the design, and the evaluation of P2P systems.

Although many P2P measurements have been carried out in the last decade, few measurement results [6, 2, 5] are publicly available, and for these few the data are presented in different formats. This situation makes it difficult for researchers to exchange, study, and reuse existing traces. Furthermore, due to the lack of available datasets, many P2P studies have been based on unrealistic assumptions about the characteristics and usage patterns of P2P systems, and as a consequence, many P2P algorithms and methods still lack a realistic evaluation. Until now, no effort has been put into making existing P2P traces accessible to the research community.

To remedy this situation, in this work we present the P2P Trace Archive (P2PTA): a virtual meeting place that facilitates the collection and exchange of P2P traces. In addition, we perform a comparative analysis of many of the traces in the P2PTA.

One of the main benefits of building the P2PTA is that the Archive paves the way for comparative studies of P2P systems, which may help researchers to consider various (types of) P2P systems and to capture their overall characteristics simultaneously, and to discover the long-term evolution in the behavior of P2P systems. Such studies may lead to better knowledge of the commonalities and differences in usage patterns in P2P systems, so that it becomes possible to anticipate the usage pattern of a new P2P system by looking at those of similar systems.
Another important benefit of the P2PTA is that it complements the current model-based approaches with a trace-based approach. In this way, the hidden patterns that exist in real traces of existing P2P systems will be implicitly used to improve the testing and tuning of P2P systems.

In this work, we first present the P2P Trace Archive. The main design goal of the Archive is to facilitate and simplify the exchange of P2P traces. To achieve this goal, we design a unified data format to represent traces in the Archive, with three main considerations. First, the data format is designed to fully reflect the structure of P2P systems. Second, the data format can be easily extended for new traces, and extending the data format will not affect the traces already stored in the Archive. Third, the data format ensures the anonymization of user information in the Archive. With the tools associated with the data format, we add to the Archive more than 2 traces collected from 2 communities, which capture the characteristics of millions of users between 23 and 29. Besides the unified data format, the Archive also has several software modules for trace collection, anonymization, and processing.

Second, we perform a comparative analysis of traces in the Archive, both across multiple P2P systems and across time. The analysis focuses on content characteristics, peer arrivals and departures, peer bandwidth, and peer sharing behavior. We find that these characteristics differ significantly across communities, and that some characteristics also change dramatically over the years. This result indicates the need to calibrate P2P models and algorithms with a sufficient number of traces. We also investigate how different ways of identifying peers and sessions in traces, in the face of dynamic IP-address reassignment, influence the analysis results.

Our contribution in this work is threefold:
1. We establish the largest P2P trace archive to date, and adopt a unified data format to represent anonymized traces (Section 3).
2. We conduct a multi-angle comparative trace analysis, and we find that P2P systems differ significantly and evolve rapidly over the years (Section 5).
3. We investigate how different ways of identifying peers and sessions in the face of dynamic IP-address reassignment impact the results of analyzing P2P traces (Section 6).

2 Requirements for a P2P Trace Archive

In this section, we formulate five requirements for building a P2P trace archive. The first three of these are for designing the data format used to include traces in the Archive, while the last two are for building the actual Archive.

Requirement 1: Trace Archiving. First, the data format used to include traces in the Archive must reflect as much as possible the structure of P2P systems. Thus, a common set of operational levels must be found across P2P systems. Second, because of the complexity and fast evolution of P2P systems, the format must be flexible and extensible, in order to support not only existing but also future traces. Finally, existing traces in the Archive should not be affected when the data format is extended for new traces.

Requirement 2: Trace Comparison. The data format should ease the process of trace comparison. The data format should organize the traces in such a way that it is straightforward for researchers to compare traces collected from different P2P systems, traces collected from the same P2P system but in different years, and traces collected with different measurement techniques.

Requirement 3: Trace Processing. First, for privacy and ethical reasons, information that can be used to identify users should be anonymized in the Archive. Previous privacy breaches involving AOL [5] and Netflix [9] indicate that simply anonymizing user names is not enough to preserve privacy, as other relevant information can still be used to identify users. Thus, all user-related information in the Archive must be anonymized thoroughly to ensure user privacy. Furthermore, traces originally represented in other formats must be converted into the trace format without losing useful information.

Requirement 4: Trace Using. To facilitate the usage of traces, the Archive should provide a set of tools to extract commonly used properties of P2P traces, such as peer arrival rate and bandwidth.
The Archive should also provide tools for generating input to P2P simulators.

Requirement 5: Trace Sharing. The Archive must host its traces at a place that is accessible to large numbers of users. The Archive should also allow researchers to rank traces and share use cases of these traces. This information will be considered as feedback on and suggestions for improving the Archive, and will provide other prospective trace users with extra information about traces in the Archive, helping them to select the appropriate traces for their research.

3 The P2P Trace Archive

In this section, we present our P2P Trace Archive (P2PTA). We first introduce the data format of the traces, and then we introduce the main software modules in the Archive.

3.1 A Unified Trace Format

In order to simplify the exchange and reuse of traces, we design a unified data format to represent all the traces in the Archive; this data format extends the format proposed in our previous work [26]. We now introduce the main design features of this data format, and also show how this design addresses Requirements 1-3 formulated in Section 2.

To address Requirement 1, in our design the trace data are stored at four different levels: three correspond to the community, the swarm, and the peer levels, and a fourth level stores data that characterize the interaction between the P2P application and the resources it uses (hard disks, bandwidth, etc.). At each level, we distinguish between static and dynamic data, which are stored separately. As an example, Table 1 shows the format for storing dynamic peer-level data. Each peer event, such as starting to download a file or sending a query message, is stored in a record with information identifying the peer, the event type, etc., and one or more data fields. In our experience, three values for each event type are enough for all the traces in the Archive.
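As an illustration of the record layout described above, the following minimal Python sketch models one dynamic peer-level record with its three value slots. The field names follow the table of the dynamic peer-level format; the class name `PeerEvent`, the helper `make_event`, and the rule of choosing a slot by the Python type of the value are assumptions made for this example and are not part of the Archive tools.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class PeerEvent:
    """One dynamic peer-level record (illustrative layout, not the Archive's code)."""
    timestamp: int               # when the data were collected
    swarm_id: int                # swarm or group the measured peer belongs to
    peer_id: int                 # anonymized identifier of the measured peer
    event_id: int                # identifier of the peer event type
    ival: Optional[int] = None   # integer value of the event, if any
    fval: Optional[float] = None # float value of the event, if any
    sval: Optional[str] = None   # string value of the event, if any


def make_event(timestamp: int, swarm_id: int, peer_id: int, event_id: int,
               value: Union[int, float, str, None] = None) -> PeerEvent:
    """Store `value` in the slot that matches its type (hypothetical helper)."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not supported in this sketch")
    if value is None:
        return PeerEvent(timestamp, swarm_id, peer_id, event_id)
    if isinstance(value, int):
        return PeerEvent(timestamp, swarm_id, peer_id, event_id, ival=value)
    if isinstance(value, float):
        return PeerEvent(timestamp, swarm_id, peer_id, event_id, fval=value)
    if isinstance(value, str):
        return PeerEvent(timestamp, swarm_id, peer_id, event_id, sval=value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Example: a download-progress event (event ID 3 is a made-up value).
progress = make_event(timestamp=1000, swarm_id=1, peer_id=7, event_id=3, value=0.5)
```

Keeping one typed slot per value kind means a new event type needs no change to the record layout, only a new event ID.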
We keep a separate event mapping table that records for every event type its ID and additional information about the event type. In this way, new event types can be easily added by adding new entries

into the event mapping table without affecting existing traces in the Archive, which also addresses one part of Requirement 3 of Section 2. Static information, like file names and sizes, is stored in a similar format, without a time stamp.

To address Requirement 2 of Section 2, in our design we distinguish between traces and community datasets. A trace is the result of a single measurement collected from a P2P community. A community dataset is a set of traces collected from the same P2P community by possibly different authors, in different years, and with different measurement techniques. Traces in a community dataset are further grouped by the year when they were collected and by the measurement techniques used to collect them. This design simplifies the study of characteristics of different systems (by comparing different community datasets), the study of the evolution of P2P systems (by comparing traces collected in different years in one community dataset), and also the study of measurement techniques (by comparing traces within one community dataset but collected with different measurement techniques).

To address Requirement 3 of Section 2, in our design we employ user mapping tables, one per trace, to store the relationships between information identifying users (e.g., IP address) and integer user identifiers generated by tools in the Archive. When the user mapping table for a converted dataset is not made public, this approach effectively anonymizes the traces, with the notable drawback of losing information (e.g., (approximate) geographical location). When converting a trace into our format, all event types in the original trace should first be identified and added to the corresponding event mapping table, and then the data related to actual events are stored in the records for dynamic and static data, respectively.
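The conversion step described above can be sketched as follows. This is a minimal illustration, not the Archive's conversion tool: the class `MappingTable`, the function `convert`, the input tuple layout `(timestamp, ip, event_name)`, and the choice of consecutive integers assigned in order of first appearance are all assumptions. The IP addresses in the example come from a range reserved for documentation.

```python
from typing import Dict, Iterable, List, Tuple


class MappingTable:
    """Assigns consecutive integer IDs to keys in order of first appearance."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def id_for(self, key: str) -> int:
        # Reuse the existing ID, or allocate the next free one.
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def __len__(self) -> int:
        return len(self._ids)


def convert(raw_events: Iterable[Tuple[int, str, str]]
            ) -> Tuple[List[Tuple[int, int, int]], MappingTable, MappingTable]:
    """Replace user-identifying strings and event names by integer IDs.

    Returns the converted records together with the user mapping table
    (to be kept private) and the event mapping table (to be published).
    """
    users = MappingTable()
    events = MappingTable()
    records = [(ts, users.id_for(ip), events.id_for(name))
               for ts, ip, name in raw_events]
    return records, users, events


raw = [
    (0, "192.0.2.1", "start_download"),
    (5, "192.0.2.2", "start_download"),
    (9, "192.0.2.1", "send_query"),
]
records, user_table, event_table = convert(raw)
```

Only `records` and `event_table` would be published in this sketch; withholding `user_table` is what makes the integer peer IDs anonymous.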
Since the unique identifiers in the mapping tables are stored as integral values, the mapping tables greatly reduce the storage requirements and significantly increase the speed of processing the stored data, especially for dynamic data. Another benefit of using event mapping tables is that the research community can work together to establish such tables with generic event types of the systems, which will further simplify the trace comparison process. Furthermore, since the data volume of most P2P traces is large, the Archive's trace format is designed to include initially only the minimal amount of information needed to reproduce the original trace accurately. However, through its extensibility features our format can also store information derived through intensive computation, thus reducing the trace processing effort of the Archive users.

ID  Field       Description
1   Time Stamp  Timestamp when data are collected (only for dynamic data)
2   Swarm ID    Unique identifier of the swarm or the group the measured peer belongs to
3   Peer ID     Unique identifier of the measured peer
4   Event ID    Unique identifier of peer event type
5   ival        The integer value of the peer event
6   fval        The float value of the peer event
7   sval        The string value of the peer event

Table 1: Data format for dynamic peer-level data.

3.2 The Archive Design

We envision three main roles for P2PTA members. The contributor is the legal owner of P2P traces, and agrees to offer these traces to the Archive. The archive administrator manages the operation of the P2PTA and helps

contributors to add and convert traces. The trace user uses the traces in the Archive for their research but does not own these traces.

We now introduce the main software modules in the P2PTA; collectively, they make the P2PTA meet Requirements 4 and 5 of Section 2. The trace collection module is responsible for collecting traces from contributors. If the collected trace is already anonymized by the contributor, it will be converted into the unified trace format directly by the trace conversion module. Otherwise, the trace will be anonymized by the trace anonymization module, and mapping tables for user-relevant information will be sent back to the trace contributor but will not be included in the Archive. The trace processing module provides basic functions to extract common features of P2P systems, like peer bandwidth and content popularity, and the simulator module is designed for generating input for simulators. Both these modules are open to the research community, so that they can be improved by the community for future research. We also invite the community to contribute tools for complex trace analysis to the Archive. Finally, the trace sharing module is responsible for hosting the traces in the Archive and providing space for users to rank and comment on traces.

4 Traces Currently in the Archive

In this section we present the traces currently included in the Archive: more than 2 heterogeneous traces collected from the BitTorrent, the Gnutella, and the edonkey P2P systems by various researchers. In particular, the Archive includes a rich collection of traces taken from BitTorrent, one of the most popular file-sharing systems. From the community perspective, the BitTorrent traces cover communities with either general or very specific types of content, and communities that are either accessible to everyone or open only to a small number of users and adopt sharing-ratio enforcement.
From the community size perspective, these traces have been collected from communities ranging from the largest in the world at the time of the data collection down to small ones, both in terms of number of users and number of shared files. Table 2 gives an overview of all the traces in the P2PTA; many of these traces have not been analyzed before. Besides including traces that we have collected ourselves in the Archive, we have also converted traces collected by others, such as the community dataset edonkey (T3 3 and T3 4) [2, 7], the trace T5 5 (small) [4], and the trace T 3 [6]. The trace T2 4 is a subset of the Gnutella trace collected by [5]; because of time constraints, we currently only include data of out of the 56 days of the trace collected in the original measurement, but we plan to include the rest of the data in the Archive in the near future. Below, we describe three community datasets in some detail.

4.1 Community Dataset: SuprNova

Trace (): T 3

This trace was collected from SuprNova during the period between 23 and 24, and it was first studied by [2]. SuprNova was the biggest BitTorrent community at that time and it distributed various types of content. This trace contains detailed peer-level data, which were collected from 2 big swarms during the period between Dec 6, 23 and Jan 7, 24, with a sampling interval of 2.5 minutes; in total, 28,423,47 sessions were captured. For each peer, the trace records the IP address, port number, download progress (number of downloaded chunks), and error messages.

4.2 Community Dataset: PirateBay

Trace (): T2 4

This trace was collected from ThePirateBay during the period between 5 May 25 and May 25, and it was first studied by [3]. The ThePirateBay community distributes various types of content. The trace contains peer-level data, which were collected from 4, swarms with a sampling interval of 2.5 minutes, and in total 35,88,338

sessions were captured, containing each peer's IP address, port number, client ID, download progress (number of downloaded chunks), and error messages. The estimated annual throughput of this community during that period is 2 PB.

ID  Trace description (content type)  Period  Sampling  Files  Sessions  Traffic
T 3 SuprNova, (general) 6 Dec 23 to 7 Jan min 2 28,423,47 n/a
T2 5 ThePirateBay, (general) 5- May min 4,8 35,88,338 2 PB/year
T3 5 FileList.org, (general) 4 Dec 25 until 4 Apr 26 6 min 3, 2,72,738 n/a
T4 5 LegalTorrents.com 22 Mar to 9 Jul 25 5 min 4 n/a 698 GB/day
T4 9 (General) 24 Sep 29 to Feb 2 5 min 83 n/a. TB/day
T5 5 etree.org 22 Mar to 9 Jul 25 5 min 52 65,68 9 GB/day
T5 5 (small) collected by [4] Mar 24 3 min,55 8,584 n/a
T5 (Recorded events & meetings) 24 Sep 29 to Feb 2 5 min 45 69, GB/day
T6 5 tlm-project.org 22 Mar to 3 Apr 25 min ,7 735 GB/day
T6 9 (Linux OS) 24 Sep 29 to Feb 2 min 74 2,529 5 GB/day
T7 5 transamrit.net 22 Mar to 9 Jul 25 5 min 4 3, GB/day
T7 9 (Slackware OS) 24 Sep 29 to Feb 2 5 min 6 6, 84 GB/day
T8 5 unix-ag.uni-kl.de 22 Mar to 9 Jul 25 5 min 279, GB/day
T8 9 (Knoppix OS) 24 Sep 29 to Feb 2 5 min 2 6, GB/day
T9 5 zerowing.idsoftware.com 22 Mar to 9 Jul 25 5 min 3 48,27 9 GB/day
T9 9 (Game demos) 24 Sep 29 to Feb 2 5 min 37 4,697 2 GB/day
T 5 boegenielsen.dk (Knoppix OS) 22 Mar to 9 Jul 25 5 min 5 36,39 38 GB/day
T 3 alluvion.org, (general) [6] Oct to Jan min,476 73,532 n/a
T2 4 Gnutella, (general) Mar 9 24 to Mar n/a 2,896,885 n/a n/a
T3 3 edonkey Oct 4 23 to Oct 6 23 n/a,282,42 n/a n/a
T3 4 (general) Dec 9 23 to Feb 2 24 n/a 23,965,65 n/a n/a
T4 7 PPLive (Streaming & VoD) Jan 27 min n/a 67,5 n/a
T5 7 Skype (VoIP) Sep 25 3 min n/a 29,28 n/a

Table 2: Summary of the traces in the P2PTA.

4.3 Community Dataset: FileList.org

Trace (): T3 5

The trace T3 5 was collected from FileList.org during the period from Dec 4, 25 until Apr 4, 26, and it was first studied by [22].
FileList.org is a private BitTorrent community that distributes various types of content. This community adopts a sharing-ratio enforcement scheme and removes users who do not actively contribute to the community, which is its main difference from most other BitTorrent communities represented in the Archive. This trace contains data collected from 3, swarms in this community; in each swarm, each peer's ID, download and upload amount, download and upload speed, connectivity, and connected time are recorded. In total, the trace captures 2,72,738 sessions. At the time when this trace was collected, the FileList.org community had around, members.

4.4 Community Dataset: LegalTorrents.com

Traces (2): T4 5, T4 9

T4 5 was collected from LegalTorrents.com during the period between 22 Mar 25 and 7 Jul 25, and T4 9 has been collected from this community since 24 Sep 29 with a 5-minute sampling interval. This community mainly distributes general types of content. Both datasets only contain community-level data, namely the number of leechers and seeders, the total number of completed downloads, and the traffic of each swarm. Both datasets also contain descriptive information about the measured torrents, including file name, added time, file size, number of files in each torrent, and description.

In 25, 4 swarms were measured and the daily throughput of this community was 698 GB. In 29, 83 swarms have been measured so far and the daily throughput of this community is . TB.

4.5 Community Dataset: etree.org

Traces (3): T5 5, T5 5 (small), T5 9

T5 5 was collected from etree.org during the period between 22 Mar 25 and 7 Jul 25, T5 9 has been collected from this community since 24 Sep 29 with a 5-minute sampling interval, and T5 5 (small) was collected in a -day duration in 25 May with a 3-minute sampling interval. Both T5 5 and T5 9 were collected by the PDS group of TU Delft, and T5 5 (small) was collected by [4]. This community mainly distributes recorded events and only provides legal content. The datasets only contain swarm-level data, namely each peer's IP address with the last byte blinded, client type, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each swarm. They also contain descriptive information about the measured torrents, including file name, infohash, added time, file size, number of files in each torrent, and torrent description. In 25, 65,68 sessions in 52 swarms were measured and the daily throughput of this community was 9 GB. In 29, 69,768 sessions in 45 swarms have been measured so far and the daily throughput of this community is 43 GB.

4.6 Community Dataset: tlm-project.org

Traces (2): T6 5, T6 9

T6 5 was collected from tlm-project.org during the period between 22 Mar 25 and 3 Apr 25, and T6 9 has been collected from this community since 24 Sep 29 with minute sampling interval. This community mainly distributes various Linux distributions and only provides legal content.
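Several datasets in this section store each peer's IP address with the last byte blinded. The text does not say how the blinding is done; the sketch below assumes the simplest reading, in which the last byte of an IPv4 address is set to zero. The function name `blind_last_byte` is hypothetical, and the example address comes from a range reserved for documentation.

```python
import ipaddress


def blind_last_byte(address: str) -> str:
    """Return the IPv4 address with its last byte set to zero.

    Assumption of this sketch: "blinded" means zeroed. Other schemes,
    such as replacing the byte with a placeholder, are equally possible.
    """
    ip = ipaddress.IPv4Address(address)  # raises ValueError on invalid input
    masked = int(ip) & 0xFFFFFF00        # keep the first three bytes only
    return str(ipaddress.IPv4Address(masked))


blinded = blind_last_byte("192.0.2.77")
```

With this scheme, up to 256 hosts collapse onto one blinded address, so a blinded address identifies a small network rather than a single peer.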
Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, and the traffic of each measured swarm; peer-level data contain each peer's IP address with the last byte blinded, port number, download amount, upload amount, download progress, connected time, and sharing ratio in each swarm, and T6 9 also includes each peer's download and upload speed. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 25, 49,7 sessions in 264 swarms were measured and the daily throughput of this community was 735 GB. In 29, 2,529 sessions in 74 torrents have been measured so far and the daily throughput of this community is 5 GB.

4.7 Community Dataset: transamrit.net

Traces (2): T7 5, T7 9

T7 5 was collected from transamrit.net during the period between 22 Mar 25 and 9 Jul 25, and T7 9 has been collected from this community since 24 Sep 29 with a 5-minute sampling interval. This community mainly distributes Slackware Linux distributions and only provides legal content. Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, and the traffic of each measured swarm; peer-level data contain each peer's IP address with the last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 25, 3,253 sessions in 4 swarms were measured and the daily throughput of this community was 258 GB. In 29, 6, sessions in 6 swarms have been measured so far and the daily throughput of this community

is 84 GB.

4.8 Community Dataset: unix-ag.uni-kl.de

Traces (2): T8 5, T8 9

T8 5 was collected from unix-ag.uni-kl.de during the period between 22 Mar 25 and 9 Jul 25, and T8 9 has been collected from this community since 24 Sep 29 with a 5-minute sampling interval. This community mainly distributes Knoppix Linux distributions and only provides legal content. Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, the total traffic, and the average download progress of all participating peers of each swarm; peer-level data contain each peer's IP address with the last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 25, 279,323 sessions in swarms were measured and the daily throughput of this community was 493 GB. In 29, 6,522 sessions in 2 swarms have been measured so far and the daily throughput of this community is 348 GB.

4.9 Community Dataset: idsoftware.com

Traces (2): T9 5, T9 9

T9 5 was collected from idsoftware.com during the period between 22 Mar 25 and 9 Jul 25, and T9 9 has been collected from this community since 24 Sep 29 with a 5-minute sampling interval. This community distributes demos of games from id Software and only provides legal content. Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders in each swarm; peer-level data contain each peer's IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in each measured swarm.
Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 25, 48,27 sessions in 3 swarms were measured and the daily throughput of this community was 9 GB. In 29, 4,697 sessions in 37 swarms have been measured so far and the daily throughput of this community is 2 GB.

4.10 Community Dataset: boenielsen.dk

Datasets (2): T 5

T 5 was collected from boegenielsen.dk during the period between 22 Mar 25 and 9 Jul 25 with a 5-minute sampling interval. This community mainly distributed Knoppix Linux distributions and only provided legal content. The dataset contains community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, the total traffic, and the average download progress of all peers of each swarm; peer-level data contain each peer's IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in the measured swarms. The dataset also contains descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent. In 25, 36,39 sessions in 5 swarms were measured and the daily throughput of this community was 38 GB.

4.11 Community Dataset: alluvion.org

Trace (): T 3

The trace T 3 was collected from alluvion.org during the period from Dec 4, 25 until Apr 4, 26, and it was collected by [6]. Alluvion.org is a BitTorrent tracker for users of the Something Awful (SA) forums. SA members can upload torrents and anyone can download them. This trace contains data collected from,476 swarms in this community. In each swarm, the peer's ID, download and upload amount, download and upload speed, connectivity, and connected time are recorded; in total, the trace captures 73,532 sessions.

4.12 Community Dataset: Gnutella Trace (): T2 4

The trace T2 4 is a subset of the Gnutella trace collected by [5]; because of time constraints, we currently only include data of out of the 56 days of the trace collected in the original measurement, but we plan to include the rest of the data in the Archive in the near future.

4.13 Community Dataset: eDonkey Trace (2): T3 3, T3 4

Traces T3 3 and T3 4 were collected by [2, 7] in 4-6 October 23 and 9 December 23-2 February 24, respectively. These traces were collected by a fake client that connected to other clients and asked for their lists of files.

4.14 Community Dataset: PP Live Trace: T4 7

The trace T4 7 was collected and studied by [24] in 27 by taking snapshots of the PP Live network. The measurement was conducted in two video channels on PP Live for one day, with minute sampling interval. As a result, 67,5 sessions were collected.

4.15 Community Dataset: Skype Trace: T5 5

The trace T5 5 was collected and studied by [] in 25 by pinging the super nodes in the Skype network. With a 3 minute sampling interval, 29,28 sessions were collected.

5 A Comparative Trace Analysis

In this section, we present a comparative analysis of the traces currently in the Archive. Our analysis focuses on content characteristics, peer arrivals and departures, peer bandwidth, and peer sharing behavior. We show how these characteristics differ across P2P communities and evolve over the years.
In the analysis results, IQR stands for the Inter-Quartile Range of a stochastic variable.

5.1 Content characteristics

The size and the popularity of the content distributed in P2P systems are basic properties for characterizing these systems. We find that the content size distributions differ significantly across P2P communities. In Gnutella and eDonkey, more than 7% of the file sizes are less than MB. In contrast, most of the files distributed in most of the BitTorrent communities are much larger, as shown in Figure 1. We also notice that in some communities the file size distribution changes dramatically over time, and that the evolution trend differs among communities: most of the files distributed in LegalTorrents (T4 5, 9) in 29 were smaller than in 25, while most of the files in etree (T5 5, 9) and id Software (T9 5, 9) were larger in 29 than in 25; the file size distribution of tlm-project (T6 5, 9) remained almost unchanged between 25 and 29, as shown in Figure 2. Statistics of the file size in the traces analyzed in this section further support this finding, as shown in Table 3. The file size distributions of most traces can be fitted with Weibull, Log-Normal, or Gamma distributions, but only the file size distribution of eDonkey (T3) can be fitted with a Pareto distribution, as shown in Table 4. Table 5 shows the parameters of each fitted distribution.

Figure 1: of the file size in 6 traces collected between 23 and 25 (horizontal axis in logarithmic scale). Horizontal axis: File Size (MB).

Figure 2: of the file size in 4 communities measured in 25 and 29. Horizontal axis: File Size (MB).

We also find that the file popularity distributions are very different across P2P communities. Files distributed in most BitTorrent communities are requested by thousands of peers. In contrast, more than 8% of the files in eDonkey and Gnutella are owned or requested by fewer than peers, as shown in Figure 3. As with the file size distribution, the file popularity distributions change significantly over time in some communities, and again the evolution trend differs among communities. Many of the files distributed in tlm-project (T6 5, 9) and unix-ag.uni-kl (T8 5, 9) were requested by far fewer peers in 29 than in 25, some of the files in id Software (T9 5, 9) were requested by more peers in 29 than in 25, and the file popularity distribution in transamrit (T7 5, 9) remained almost unchanged between 25 and 29, as shown in Figure 4. Statistics of the file popularity in the traces analyzed in this section further support this finding, as shown in Table 6.
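As an illustration of the summary columns used in the statistics tables of this section (Max, Mean, StDev, the quartiles, and IQR), the following minimal sketch computes them for a list of values. The function name `summarize` and the sample values are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics` module) is an assumption; the convention used for the tables is not stated here, and other conventions give slightly different quartiles.

```python
import statistics


def summarize(values):
    """Return Max, Mean, StDev, Q1, Median, Q3 and IQR for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,  # Inter-Quartile Range
    }


# Synthetic file sizes in MB, not values from the Archive.
file_sizes_mb = [1, 2, 3, 4, 5, 6, 7, 8, 9]
stats = summarize(file_sizes_mb)
for name, value in stats.items():
    print(f"{name:>6}: {value:.2f}")
```

The IQR is less sensitive to a few very large files than the standard deviation, which is one reason to report both for heavy-tailed quantities such as file size.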

Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T3 5  7,963,53, ,
T4 5,
T4,
T5 5,
T5 9  8,2 6,648 7,75 79,87 5,682 4,89
T6 5  3,
T6 9  2,
T 3  9,
T2 4  4,
T3 4  4,
Table 3: File Size Statistics (MB).

Table 4: P-values from KS and AD tests for file size distributions. Columns: Trace, Exponential, Weibull, Pareto, Log-Normal, Gamma.
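To make the goodness-of-fit idea behind the KS p-values concrete, the sketch below fits an exponential distribution to a sample and computes the Kolmogorov-Smirnov statistic together with an asymptotic p-value. It is only an illustration under several assumptions: the function names are hypothetical, only the exponential candidate is covered, the AD test is not shown, and the p-value uses the large-sample Kolmogorov series. Because the mean is estimated from the same sample, this p-value is optimistic; it is not claimed to be the procedure used to produce the tables.

```python
import math


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov p-value; an approximation for large n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the p-value equals 1 to within rounding in this range
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def fit_exponential_and_test(sample):
    """Fit Exp(mean) by the sample mean, then run the KS test."""
    mean = sum(sample) / len(sample)
    d = ks_statistic(sample, lambda x: 1.0 - math.exp(-x / mean))
    return mean, d, ks_pvalue_asymptotic(d, len(sample))


# Synthetic data, not values from the Archive.
n = 200
exp_like = [-10.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
narrow = [100.0 + 0.05 * i for i in range(n)]

_, d_good, p_good = fit_exponential_and_test(exp_like)
_, d_bad, p_bad = fit_exponential_and_test(narrow)
print(f"exponential-like sample: D={d_good:.4f}, p={p_good:.3f}")
print(f"narrow-range sample:     D={d_bad:.4f}, p={p_bad:.3g}")
```

A high p-value means the candidate distribution cannot be rejected for that trace; a low one means the fit is rejected. Repeating the same steps for each candidate family gives one row of a p-value table.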

Table 5: Parameters of fitting distributions for file size. Columns: Trace, Exp(µ), Wbl(λ, κ), Pareto, LogN(µ, σ), Gam(κ, λ).

Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T2 5  9,
T3 5  3,68 79,
T6 5  8, ,
T6 9  3, ,
T7 5  49,99 9,34 5, ,374 3,29
T7 9  9,485 7,892 2, ,47 6,294
T8 5  29,978 25,393 35,39 5,794,784 27,579 2,785
T  ,98 9,64 42, ,55 9,92 9,845
T9 5  23,493 3,448 5,97 36,445 3,4 3,5
T9 9  9,725 2,8 2, ,289 4,539 3,64
T 4  2,
T2 4  9, 2 4
T3 4  5,
Table 6: File Popularity Statistics (number of requests per file).

Table 7 and Table 8 show the significance values from the GOF tests and the parameters of the fitted distributions for file popularity, respectively.

5.2 Peer arrival and departure

The peer arrival rate is one of the key elements for modeling churn in P2P networks, and we find that it differs significantly across P2P communities. The peer arrival rate in SuprNova (T 3) can reach a few thousand per hour, while in alluvion (T 4) it is less than peers per hour most of the time, as shown in Figure 5. When comparing the peer arrival rate of the same communities in different years, we do not find significant differences in most communities, except that in transamrit (T7 5, 9) the peer arrival rate was lower in 29 than in 25 for most of the time, as shown in Figure 6. Statistics of the peer arrival rate in the traces analyzed in this section are shown in Table 9.
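The hourly peer arrival rate discussed in Section 5.2 can be derived from session start times by counting arrivals per one-hour bin. The sketch below shows one way to do this; it assumes that each session contributes a single start timestamp in seconds and that hours without arrivals count as zero. The function name and the sample timestamps are hypothetical.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def hourly_arrival_counts(session_starts):
    """Count session arrivals in each one-hour bin, including empty hours."""
    if not session_starts:
        return []
    bins = Counter(int(t // SECONDS_PER_HOUR) for t in session_starts)
    first, last = min(bins), max(bins)
    # Fill in hours without any arrival so that quiet periods are not skipped.
    return [bins.get(hour, 0) for hour in range(first, last + 1)]


# Synthetic start times in seconds since the start of a measurement.
starts = [0, 10, 3599, 3600, 7300, 14500]
rates = hourly_arrival_counts(starts)
print(rates)
```

Keeping the empty hours matters for the statistics: dropping them would inflate the mean and the quartiles of the arrival rate in communities with long quiet periods.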

Figure 3: of the file popularity in 6 traces collected between 23 and 25 (horizontal axis in logarithmic scale). Horizontal axis: File Popularity.

Figure 4: of the file popularity in 4 communities measured in 25 and 29 (horizontal axis in logarithmic scale). Horizontal axis: File Popularity.

Table 7: P-values from KS and AD tests for file popularity distributions. Columns: Trace, Exponential, Weibull, Pareto, Log-Normal, Gamma.

Table 8: Parameters of fitting distributions for file popularity. Columns: Trace, Exp(µ), Wbl(λ, κ), Pareto, LogN(µ, σ), Gam(κ, λ).

Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T 3  2,
T3 5  4,
Table 9: Peer Arrival Rate Statistics (number of peers per hour).

Table 10: P-values from KS and AD tests for arrival rate distributions. Columns: Trace, Exponential, Weibull, Pareto, Log-Normal, Gamma.

Table 10 and Table 11 show the significance values from the GOF tests and the parameters of the fitted distributions for the peer arrival rate, respectively.

Session length is another important element for modeling churn in P2P systems. We find that the session length distributions are very different in communities of different types, as shown in Figure 7. We also find that the session length distributions in communities of similar types are very close, as shown in Figure 8. Furthermore, the session length distribution does not change dramatically within one community over the years, as shown in Figure [?]. This result suggests a possible correlation between the session length distribution and the community type. Statistics of the session length in the traces analyzed in this section are shown in Table 12. Table 13 and Table 14 show the significance values from the GOF tests and the parameters of the fitted distributions for session length, respectively.

5.3 Bandwidth characteristics

Bandwidth is one of the most frequently investigated properties in empirical P2P studies, as it is closely related to the service capacity of P2P systems. We find that the peer download speed differs significantly across P2P communities, and that the download speed has increased differently over the years in all measured communities, as shown in Figures and, respectively. Statistics of the peer download speed in the traces analyzed in this

Figure 5: of the (hourly) peer arrival rate in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale). Horizontal axis: Peer Arrival Rate (per hour).

Figure 6: of the (hourly) peer arrival rate in 4 communities measured in 25 and 29 (horizontal axis in logarithmic scale). Horizontal axis: Peer Arrival Rate (hourly).

Figure 7: of the peer session length in 5 traces collected between 23 and 25 (horizontal axis in logarithmic scale). Horizontal axis: Session Length (min).

Figure 8: of the peer session length in 4 traces collected in 29 (horizontal axis in logarithmic scale). Horizontal axis: Session Length (min).
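Session lengths such as those plotted above have to be derived from periodic observations of a peer. The following sketch shows one possible rule, stated as an assumption rather than as the Archive's method: consecutive observations of the same peer belong to one session as long as they are at most `max_gap` minutes apart, and the session length is the time between the first and the last observation. The function names, the gap threshold, and the sample times are hypothetical.

```python
def split_sessions(sample_times, max_gap):
    """Group one peer's observation times into (start, end) sessions.

    A new session starts whenever two consecutive observations are
    more than max_gap apart.
    """
    sessions = []
    for t in sorted(sample_times):
        if sessions and t - sessions[-1][1] <= max_gap:
            sessions[-1][1] = t  # extend the current session
        else:
            sessions.append([t, t])  # open a new session
    return [(start, end) for start, end in sessions]


def session_lengths(sessions):
    """Length of each session as last observation minus first observation."""
    return [end - start for start, end in sessions]


# Synthetic observation times in minutes for one peer, sampled every 5 minutes.
observed_min = [0, 5, 10, 15, 60, 65, 200]
sessions = split_sessions(observed_min, max_gap=10)
lengths = session_lengths(sessions)
print(sessions)
print(lengths)
```

The choice of `max_gap` changes the result: a larger threshold merges short absences into one long session, while a smaller one splits them. A peer seen only once gets length zero under this rule, so the measured lengths are lower bounds limited by the sampling interval.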

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390 The Role and uses of Peer-to-Peer in file-sharing Computer Communication & Distributed Systems EDA 390 Jenny Bengtsson Prarthanaa Khokar jenben@dtek.chalmers.se prarthan@dtek.chalmers.se Gothenburg, May

More information

Evaluating the Effectiveness of a BitTorrent-driven DDoS Attack

Evaluating the Effectiveness of a BitTorrent-driven DDoS Attack Evaluating the Effectiveness of a BitTorrent-driven DDoS Attack Jurand Nogiec University of Illinois Fausto Paredes University of Illinois Joana Trindade University of Illinois 1. Introduction BitTorrent

More information

The Bittorrent P2P File-sharing System: Measurements And Analysis J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips Department of Computer Science,

The Bittorrent P2P File-sharing System: Measurements And Analysis J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips Department of Computer Science, The Bittorrent P2P File-sharing System: Measurements And Analysis J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips Department of Computer Science, Delft University of Technology, the Netherlands BitTorrent

More information

Peer-to-Peer Networks. Chapter 2: Initial (real world) systems Thorsten Strufe

Peer-to-Peer Networks. Chapter 2: Initial (real world) systems Thorsten Strufe Chapter 2: Initial (real world) systems Thorsten Strufe 1 Chapter Outline Overview of (previously) deployed P2P systems in 3 areas P2P file sharing and content distribution: Napster, Gnutella, KaZaA, BitTorrent

More information

The BitTorrent Protocol

The BitTorrent Protocol The BitTorrent Protocol Taken from http://www.cs.uiowa.edu/~ghosh/bittorrent.ppt What is BitTorrent? Efficient content distribution system using file swarming. Usually does not perform all the functions

More information

Improving Deployability of Peer-assisted CDN Platform with Incentive

Improving Deployability of Peer-assisted CDN Platform with Incentive Improving Deployability of Peer-assisted CDN Platform with Incentive GLOBECOM 2009 Dec 2, 2009 Tatsuya Mori, Noriaki Kamiyama, Shigeaki Harada, Haruhisa Hasegawa, and Ryoichi Kawahara NTT Service Integration

More information

Online Storage and Content Distribution System at a Large-scale: Peer-assistance and Beyond

Online Storage and Content Distribution System at a Large-scale: Peer-assistance and Beyond Online Storage and Content Distribution System at a Large-scale: Peer-assistance and Beyond Bo Li Email: bli@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science &

More information

Peer-to-Peer (P2P) applications, including both P2P streaming and P2P

Peer-to-Peer (P2P) applications, including both P2P streaming and P2P Description of Working Group Peer-to-Peer (P2P) applications, including both P2P streaming and P2P file-sharing applications, make up a large fraction of traffic in the Internet today. One way to reduce

More information

SE4C03: Computer Networks and Computer Security Last revised: April 03 2005 Name: Nicholas Lake Student Number: 0046314 For: S.

SE4C03: Computer Networks and Computer Security Last revised: April 03 2005 Name: Nicholas Lake Student Number: 0046314 For: S. BitTorrent Technology How and why it works SE4C03: Computer Networks and Computer Security Last revised: April 03 2005 Name: Nicholas Lake Student Number: 0046314 For: S. Kartik Krishnan 1 Abstract BitTorrent

More information

P2P: centralized directory (Napster s Approach)

P2P: centralized directory (Napster s Approach) P2P File Sharing P2P file sharing Example Alice runs P2P client application on her notebook computer Intermittently connects to Internet; gets new IP address for each connection Asks for Hey Jude Application

More information

Should Internet Service Providers Fear Peer-Assisted Content Distribution?

Should Internet Service Providers Fear Peer-Assisted Content Distribution? Should Internet Service Providers Fear Peer-Assisted Content Distribution? Thomas Karagiannis, UC Riverside Pablo Rodriguez, Microsoft Research Cambridge Konstantina Papagiannaki, Intel Research Cambridge

More information

10 Key Things Your VoIP Firewall Should Do. When voice joins applications and data on your network

10 Key Things Your VoIP Firewall Should Do. When voice joins applications and data on your network 10 Key Things Your Firewall Should Do When voice joins applications and data on your network Table of Contents Making the Move to 3 10 Key Things 1 Security is More Than Physical 4 2 Priority Means Clarity

More information

Trace analysis of Tribler BuddyCast. V. Jantet, D. Epema, M. Meulpolder

Trace analysis of Tribler BuddyCast. V. Jantet, D. Epema, M. Meulpolder Trace analysis of Tribler BuddyCast V. Jantet, D. Epema, M. Meulpolder Trace analysis of Tribler BuddyCast Inter ship report in Computer Science Parallel and Distributed Systems group Faculty of Electrical

More information

A Week in the Life of the Most Popular BitTorrent Swarms

A Week in the Life of the Most Popular BitTorrent Swarms A Week in the Life of the Most Popular BitTorrent Swarms Mark Scanlon, Alan Hannaway and Mohand-Tahar Kechadi 1 UCD Centre for Cybercrime Investigation, School of Computer Science & Informatics, University

More information

From Centralization to Distribution: A Comparison of File Sharing Protocols

From Centralization to Distribution: A Comparison of File Sharing Protocols From Centralization to Distribution: A Comparison of File Sharing Protocols Xu Wang, Teng Long and Alan Sussman Department of Computer Science, University of Maryland, College Park, MD, 20742 August, 2015

More information

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing Department of Computer Science Institute for System Architecture, Chair for Computer Networks File Sharing What is file sharing? File sharing is the practice of making files available for other users to

More information

Java Bit Torrent Client

Java Bit Torrent Client Java Bit Torrent Client Hemapani Perera, Eran Chinthaka {hperera, echintha}@cs.indiana.edu Computer Science Department Indiana University Introduction World-wide-web, WWW, is designed to access and download

More information

Peer-to-peer filetransfer protocols and IPv6. János Mohácsi NIIF/HUNGARNET TF-NGN meeting, 1/Oct/2004

Peer-to-peer filetransfer protocols and IPv6. János Mohácsi NIIF/HUNGARNET TF-NGN meeting, 1/Oct/2004 -to-peer filetransfer protocols and IPv6 János Mohácsi NIIF/HUNGARNET TF-NGN meeting, 1/Oct/2004 Motivation IPv6 traffic is

More information

A Measurement of NAT & Firewall Characteristics in Peer to Peer Systems

A Measurement of NAT & Firewall Characteristics in Peer to Peer Systems A Measurement of NAT & Firewall Characteristics in Peer to Peer Systems L. D Acunto, J.A. Pouwelse, and H.J. Sips Department of Computer Science Delft University of Technology, The Netherlands l.dacunto@tudelft.nl

More information

N6Lookup( title ) Client

N6Lookup( title ) Client CS 640: Introduction Networks AdityaAkella Peer-to-Peer Lecture 24 -to Computer p2p Uses Downloading: Searching Centralized Flooding Smarter Routing file of sharing p2p The (Freenet, (Gnutella, flooding

More information

Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic

Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic William Acosta and Surendar Chandra University of Notre Dame, Notre Dame IN, 46556, USA {wacosta,surendar}@cse.nd.edu Abstract.

More information

Towards an Optimized Big Data Processing System

Towards an Optimized Big Data Processing System Towards an Optimized Big Data Processing System The Doctoral Symposium of the IEEE/ACM CCGrid 2013 Delft, The Netherlands Bogdan Ghiţ, Alexandru Iosup, and Dick Epema Parallel and Distributed Systems Group

More information

Comparative Traffic Analysis Study of Popular Applications

Comparative Traffic Analysis Study of Popular Applications Comparative Traffic Analysis Study of Popular Applications Zoltán Móczár and Sándor Molnár High Speed Networks Laboratory Dept. of Telecommunications and Media Informatics Budapest Univ. of Technology

More information

Modeling and Analysis of Bandwidth-Inhomogeneous Swarms in BitTorrent

Modeling and Analysis of Bandwidth-Inhomogeneous Swarms in BitTorrent IEEE P2P'9 - Sept. 9-, 29 Modeling and Analysis of Bandwidth-Inhomogeneous Swarms in BitTorrent M. Meulpolder, J.A. Pouwelse, D.H.J. Epema, H.J. Sips Parallel and Distributed Systems Group Department of

More information

Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding

Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding International Journal of Emerging Trends in Engineering Research (IJETER), Vol. 3 No.6, Pages : 151-156 (2015) ABSTRACT Optimizing Congestion in Peer-to-Peer File Sharing Based on Network Coding E.ShyamSundhar

More information

AT&T Global Network Client for Windows Product Support Matrix January 29, 2015

AT&T Global Network Client for Windows Product Support Matrix January 29, 2015 AT&T Global Network Client for Windows Product Support Matrix January 29, 2015 Product Support Matrix Following is the Product Support Matrix for the AT&T Global Network Client. See the AT&T Global Network

More information

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD

AUTOMATED AND ADAPTIVE DOWNLOAD SERVICE USING P2P APPROACH IN CLOUD IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 63-68 Impact Journals AUTOMATED AND ADAPTIVE DOWNLOAD

More information

Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover

Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover 1 Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover Jie Xu, Member, IEEE, Yuming Jiang, Member, IEEE, and Andrew Perkis, Member, IEEE Abstract In this paper we investigate

More information

Lecture 6 Content Distribution and BitTorrent

Lecture 6 Content Distribution and BitTorrent ID2210 - Distributed Computing, Peer-to-Peer and GRIDS Lecture 6 Content Distribution and BitTorrent [Based on slides by Cosmin Arad] Today The problem of content distribution A popular solution: BitTorrent

More information

Simulating a File-Sharing P2P Network

Simulating a File-Sharing P2P Network Simulating a File-Sharing P2P Network Mario T. Schlosser, Tyson E. Condie, and Sepandar D. Kamvar Department of Computer Science Stanford University, Stanford, CA 94305, USA Abstract. Assessing the performance

More information

Hybrid network traffic engineering system (HNTES)

Hybrid network traffic engineering system (HNTES) Hybrid network traffic engineering system (HNTES) Zhenzhen Yan, Zhengyang Liu, Chris Tracy, Malathi Veeraraghavan University of Virginia and ESnet Jan 12-13, 2012 mvee@virginia.edu, ctracy@es.net Project

More information

The Challenges of Stopping Illegal Peer-to-Peer File Sharing

The Challenges of Stopping Illegal Peer-to-Peer File Sharing The Challenges of Stopping Illegal Peer-to-Peer File Sharing Kevin Bauer Dirk Grunwald Douglas Sicker Department of Computer Science University of Colorado Context: The Rise of Peer-to-Peer 1993-2000:

More information

A Measurement Study of Peer-to-Peer File Sharing Systems

A Measurement Study of Peer-to-Peer File Sharing Systems CSF641 P2P Computing 點 對 點 計 算 A Measurement Study of Peer-to-Peer File Sharing Systems Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble Department of Computer Science and Engineering University

More information

The Impact of Background Network Traffic on Foreground Network Traffic

The Impact of Background Network Traffic on Foreground Network Traffic The Impact of Background Network Traffic on Foreground Network Traffic George Nychis Information Networking Institute Carnegie Mellon University gnychis@cmu.edu Daniel R. Licata Computer Science Department

More information

Data Deduplication in BitTorrent

Data Deduplication in BitTorrent Data Deduplication in BitTorrent João Pedro Amaral Nunes October 14, 213 Abstract BitTorrent is the most used P2P file sharing platform today, with hundreds of millions of files shared. The system works

More information

Unit 3 - Advanced Internet Architectures

Unit 3 - Advanced Internet Architectures Unit 3 - Advanced Internet Architectures Carlos Borrego Iglesias, Sergi Robles Carlos.Borrego@uab.cat,Sergi.Robles@uab.cat Departament d Enginyeria de la Informació i de les Comunicacions Universitat Autònoma

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

P2P File Sharing: BitTorrent in Detail

P2P File Sharing: BitTorrent in Detail ELT-53206 Peer-to-Peer Networks P2P File Sharing: BitTorrent in Detail Mathieu Devos Tampere University of Technology Department of Electronics & Communications Engineering mathieu.devos@tut.fi TG406 2

More information

The Internet is Flat: A brief history of networking over the next ten years. Don Towsley UMass - Amherst

The Internet is Flat: A brief history of networking over the next ten years. Don Towsley UMass - Amherst The Internet is Flat: A brief history of networking over the next ten years Don Towsley UMass - Amherst 1 What does flat mean? The World Is Flat. A Brief History of the Twenty-First Century, Thomas Friedman

More information

A Comparison of Mobile Peer-to-peer File-sharing Clients

A Comparison of Mobile Peer-to-peer File-sharing Clients 1. ABSTRACT A Comparison of Mobile Peer-to-peer File-sharing Clients Imre Kelényi 1, Péter Ekler 1, Bertalan Forstner 2 PHD Students 1, Assistant Professor 2 Budapest University of Technology and Economics

More information

P2P Filesharing Population Tracking Based on Network Flow Data

P2P Filesharing Population Tracking Based on Network Flow Data P2P Filesharing Population Tracking Based on Network Flow Data Arno Wagner, Thomas Dübendorfer, Lukas Hämmerle, Bernhard Plattner Contact: arno@wagner.name Communication Systems Laboratory Swiss Federal

More information

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS* COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) 2 Fixed Rates Variable Rates FIXED RATES OF THE PAST 25 YEARS AVERAGE RESIDENTIAL MORTGAGE LENDING RATE - 5 YEAR* (Per cent) Year Jan Feb Mar Apr May Jun

More information

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS* COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) 2 Fixed Rates Variable Rates FIXED RATES OF THE PAST 25 YEARS AVERAGE RESIDENTIAL MORTGAGE LENDING RATE - 5 YEAR* (Per cent) Year Jan Feb Mar Apr May Jun

More information

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop R. David Idol Department of Computer Science University of North Carolina at Chapel Hill david.idol@unc.edu http://www.cs.unc.edu/~mxrider

More information

Research on Errors of Utilized Bandwidth Measured by NetFlow

Research on Errors of Utilized Bandwidth Measured by NetFlow Research on s of Utilized Bandwidth Measured by NetFlow Haiting Zhu 1, Xiaoguo Zhang 1,2, Wei Ding 1 1 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China 2 Electronic

More information

Modeling an Agent-Based Decentralized File Sharing Network

Modeling an Agent-Based Decentralized File Sharing Network Modeling an Agent-Based Decentralized File Sharing Network Alex Gonopolskiy Benjamin Nash December 18, 2007 Abstract In this paper we propose a distributed file sharing network model. We take inspiration

More information

BitTorrent Peer To Peer File Sharing

BitTorrent Peer To Peer File Sharing BitTorrent Peer To Peer File Sharing CS290F: Networking for Multimedia Mini PhD Major Area Exam I) Introduction Content distribution is an important topic in networking and has been evolving from the start

More information

P2P Node Setup Guide Authored by: Unitsa Sungket, Prince of Songkla University, Thailand Darran Nathan, APBioNet

P2P Node Setup Guide Authored by: Unitsa Sungket, Prince of Songkla University, Thailand Darran Nathan, APBioNet Automatic Synchronization and Distribution of Biological Databases and Software over Low-Bandwidth Networks among Developing Countries P2P Node Setup Guide Authored by: Unitsa Sungket, Prince of Songkla

More information

CSCI-1680 CDN & P2P Chen Avin

CSCI-1680 CDN & P2P Chen Avin CSCI-1680 CDN & P2P Chen Avin Based partly on lecture notes by Scott Shenker and John Jannotti androdrigo Fonseca And Computer Networking: A Top Down Approach - 6th edition Last time DNS & DHT Today: P2P

More information

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August

More information

The Algorithm of Sharing Incomplete Data in Decentralized P2P

The Algorithm of Sharing Incomplete Data in Decentralized P2P IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.8, August 2007 149 The Algorithm of Sharing Incomplete Data in Decentralized P2P Jin-Wook Seo, Dong-Kyun Kim, Hyun-Chul Kim,

More information

An apparatus for P2P classification in Netflow traces

An apparatus for P2P classification in Netflow traces An apparatus for P2P classification in Netflow traces Andrew M Gossett, Ioannis Papapanagiotou and Michael Devetsikiotis Electrical and Computer Engineering, North Carolina State University, Raleigh, USA

More information

Advanced Application-Level Crawling Technique for Popular Filesharing Systems

Advanced Application-Level Crawling Technique for Popular Filesharing Systems Advanced Application-Level Crawling Technique for Popular Filesharing Systems Ivan Dedinski and Hermann de Meer University of Passau, Faculty of Computer Science and Mathematics 94030 Passau, Germany {dedinski,

More information

Energy Adaptive Mechanism for P2P File Sharing Protocols

Energy Adaptive Mechanism for P2P File Sharing Protocols Energy Adaptive Mechanism for P2P File Sharing Protocols Mayank Raj 1, Krishna Kant 2, and Sajal K. Das 1 1 Center for Research in Wireless Mobility and Networking (CReWMaN), Department of Computer Science

More information

NBN Co Customer Collaboration Forum Wireless Focus Session

NBN Co Customer Collaboration Forum Wireless Focus Session NBN Co Customer Collaboration Forum Wireless Focus Session Presenters: Stephen Wright, Roy Brown, Merv Chessells Disclaimer This document sets out NBN Co s proposals in respect of certain aspects of the

More information

Peer-to-Peer Networks Organization and Introduction 1st Week

Peer-to-Peer Networks Organization and Introduction 1st Week Peer-to-Peer Networks Organization and Introduction 1st Week Department of Computer Science 1 Peer-to-Peer Networks Organization 2 2 Web & Dates Web page http://cone.informatik.uni-freiburg.de/lehre/vorlesung/

More information

How To Scale A Complex Big Data Workflow

How To Scale A Complex Big Data Workflow V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows Bogdan Ghiț, Mihai Capotă, Tim Hegeman, Jan Hidders, Dick Epema, and Alexandru Iosup Parallel and Distributed Systems Group, Delft
