How To Make A Game Of Gnutella A Cooperative Game

Similar documents

BitTorrent Peer To Peer File Sharing

Lecture 6 Content Distribution and BitTorrent

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

Peer-to-Peer Networks. Chapter 2: Initial (real world) systems Thorsten Strufe

The BitTorrent Protocol

Modeling and Analysis of Bandwidth-Inhomogeneous Swarms in BitTorrent

Incentives Build Robustness in BitTorrent

The Bittorrent P2P File-sharing System: Measurements And Analysis J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips Department of Computer Science,

N6Lookup( title ) Client

DDoS Vulnerability Analysis of Bittorrent Protocol

How To Accelerate Peer To Peer File Sharing With Social Networks

Peer-to-peer filetransfer protocols and IPv6. János Mohácsi NIIF/HUNGARNET TF-NGN meeting, 1/Oct/2004

Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic

The Algorithm of Sharing Incomplete Data in Decentralized P2P

CS5412: TORRENTS AND TIT-FOR-TAT

Java Bit Torrent Client

Guaranteeing Performance through Fairness in Peer-to-Peer File-Sharing and Streaming Systems. Alex Sherman

P2P File Sharing: BitTorrent in Detail

Evaluating the Effectiveness of a BitTorrent-driven DDoS Attack

SE4C03: Computer Networks and Computer Security Last revised: April Name: Nicholas Lake Student Number: For: S.

Modeling an Agent-Based Decentralized File Sharing Network

The Internet is Flat: A brief history of networking over the next ten years. Don Towsley UMass - Amherst

RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT

Computation and Economics - Spring 2012 Assignment #3: File Sharing

The Challenges of Stopping Illegal Peer-to-Peer File Sharing

Impact of Peer Incentives on the Dissemination of Polluted Content

Multicast vs. P2P for content distribution

Peer-to-Peer Networks 02: Napster & Gnutella. Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg

Attacking a Swarm with a Band of Liars evaluating the impact of attacks on BitTorrent

A Measurement of NAT & Firewall Characteristics in Peer to Peer Systems

INTRUSION PREVENTION AND EXPERT SYSTEMS

Trace analysis of Tribler BuddyCast. V. Jantet, D. Epema, M. Meulpolder

Balanced Reputation Detective System (BREDS): Proposed Algorithm

BITSTALKER: ACCURATELY AND EFFICIENTLY MONITORING BITTORRENT TRAFFIC

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Searching for Malware in BitTorrent

Delft University of Technology Parallel and Distributed Systems Report Series. The Peer-to-Peer Trace Archive: Design and Comparative Trace Analysis

A Comparison of Mobile Peer-to-peer File-sharing Clients

Peer-to-Peer Systems: "A Shared Social Network"

How To Test The Speed Of Bittorrent On A Bitt Client And Torrent On A Pc Or Ipad (For Free) On A Microsoft Flash Get 2.5 (For A Free) Computer (For Pc Or Mac) On An Ip

Peer-to-Peer Networks Organization and Introduction 1st Week

P2P: centralized directory (Napster s Approach)

CSCI-1680 CDN & P2P Chen Avin

On the Penetration of Business Networks by P2P File Sharing

CNT5106C Project Description

PEER-TO-PEER NETWORK

On the Penetration of Business Networks by P2P File Sharing

Simulating a File-Sharing P2P Network

A Measurement Study of Peer-to-Peer File Sharing Systems

A Week in the Life of the Most Popular BitTorrent Swarms

Peer-to-Peer Data Management

BitTorrent File Sharing in Mobile Ad-hoc Networks

Peer-to-Peer Networks. Chapter 6: P2P Content Distribution

Clustering in Peer-to-Peer File Sharing Workloads

Project Orwell: Distributed Document Integrity Verification

Computers and Media: P2P and Business Models CSCI 1200 COMPUTERS & MEDIA, JAREK SZLICHTA

Peer-to-Peer Networks

Dynamic File Bundling for Large-scale Content Distribution

Using Two-sided Market Approach. to Analyze the Peer-to-peer Economy 1

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Analysis of the Traffic on the Gnutella Network

THE BITTORRENT P2P FILE-SHARING SYSTEM: MEASUREMENTS AND ANALYSIS. J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips

One hop Reputations for Peer to Peer File Sharing Workloads

RatFish: A File Sharing Protocol Provably Secure Against Rational Users

A Torrent Recommender based on DHT Crawling

What Are Certificates?

BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project

A block based storage model for remote online backups in a trust no one environment

MODIFIED BITTORRENT PROTOCOL AND ITS APPLICATION IN CLOUD COMPUTING ENVIRONMENT

The Internet and Network Technologies

Improving Deployability of Peer-assisted CDN Platform with Incentive

How To Handle Uncooperative Peers On P2P Live Streaming On A Live Stream On A Pc Or Mac Or Ipa (For Free) On A Free Rider (For A Free Ride) On Pc Or Ipad (For An Unco

A reputation-based trust management in peer-to-peer network systems

From Centralization to Distribution: A Comparison of File Sharing Protocols

Broadband v Ethernet

THE WINDOWS AZURE PROGRAMMING MODEL

Two-State Options. John Norstad. January 12, 1999 Updated: November 3, 2011.

Leveraging BitTorrent for End Host Measurements

Super-Agent Based Reputation Management with a Practical Reward Mechanism in Decentralized Systems

LiveSwarms: Adapting BitTorrent for end host multicast

Developing P2P Protocols across NAT

Managing Security Risks in Modern IT Networks

Seminar RVS MC-FTP (Multicast File Transfer Protocol): Simulation and Comparison with BitTorrent

Peer-to-Peer: an Enabling Technology for Next-Generation E-learning

PEER TO PEER FILE SHARING USING NETWORK CODING

Masters of Science in Information Technology

TIME EFFICIENT DISTRIBUTED FILE STORAGE AND SHARING USING P2P NETWORK IN CLOUD

Revisiting P2P content sharing in wireless ad hoc networks

RatFish: A File Sharing Protocol Provably Secure Against Rational Users

WINDOWS AZURE AND ISVS

How To Analyse The Edonkey 2000 File Sharing Network

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

I d Rather Stay Stupid: The Advantage of Having Low Utility

P2P File Sharing - A Model For Fairness Versus Performance

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

ANALYSIS OF PEER-TO-PEER TRAFFIC AND USER BEHAVIOUR

How to Turn Your Network into a Strategic Business Asset with Purview EBOOK

You Now Own the Lifetime Giveaway Rights to This Report. You May Give Away This Report

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

Transcription:

CS 186 Lecture 3 P2P File-Sharing David C. Parkes Sven Seuken September 1, 2011 Imagine you need to distribute a software patch to 10 Million users. What s an efficient way for doing so? If you are using a peer-to-peer file sharing system, why would users provide any of their upload bandwidth to the community? What are the incentives at play in popular file-sharing systems like Gnutella or BitTorrent? For many applications that require the distribution of files to a very large number of users, peer-to-peer (P2P) file-sharing networks are an attractive alternative to server-based solutions. If the community of file-sharers cooperates appropriately, very high download rates can be achieved at virtually no cost to the injector of the content. We begin this chapter with a brief history of P2P file-sharing, including the rise and fall of Gnutella, and explain why so many file-sharing networks suffer from free riding. We then focus primarily on BitTorrent, the most successful filesharing network to date, with more than 100 Million users worldwide. We describe the BitTorrent protocol in detail, and present some game-theoretic analyses. Finally, we describe multiple attacks on BitTorrent that increase an individual user s performance, and show how the BitTorrent protocol could be improved to increase overall social welfare. 3.1 Introduction to P2P File-Sharing 3.1.1 File Distribution via P2P Networks Some of the most popular Internet-based services are in the online media domain. We can buy MP3s from the itunes store, we can download ebooks from Amazon, and we can watch videos for free on Youtube and Hulu, or for a fee on Netflix. All of these services are based on the clientserver model, which means that the required tasks are clearly separated: the provider of the service, or more generally of a resource, is the server; the requester of the service, or the resource, is the client. Generally, the client is a standard home computer that does not share any of its resources, and only contacts the server whenever it wants to consume its service. Because millions of MP3s are downloaded and videos are watched every day, the server is often a whole network of very powerful machines. In the case of Youtube, Hulu, Netflix, etc., the number of servers easily exceeds multiple thousands. Obviously, these servers are very expensive and not many companies can afford to use such large-scale server networks for the distribution of media. In the late 1990 s, a very different information sharing paradigm has emerged: peer-to-peer (P2P) file-sharing networks. In a P2P network, the separation of servers and clients is removed: each computer can act as both a server and as a client, often simultaneously. Users of standard home computers around the world may connect to each other and form a P2P network. With the appropriate protocol in place, users can exchange (by uploading and downloading) files with each 1

other. All of this generally takes place without any centralized system infrastructure or control. This decentralization leads to three advantages that P2P systems have over server-based systems: Users enjoy a higher level of anonymity and thus privacy. Large files can be distributed much more efficiently (i.e., with much lower costs). P2P systems are more resilient to various forms of failures. P2P file-sharing networks have often received a bad reputation because a majority of files exchanged in these networks are copyrighted materials, and thus, exchanging them with other users is illegal in most countries. However, there are also many examples of legal uses of P2P file-sharing networks. For example, the newest Linux distributions are generally available in P2P file-sharing networks. Another example is the the game World of Warcraft, which is distributed via the BitTorrent P2P file-sharing network instead of letting users download it from a centralized server. Finally, some Internet TV services like Joost stream their video data to the users via P2P networks. In all of these examples, the distributors of the content incur minimal cost for injecting the data into the P2P network (i.e., uploading it at least once), and the users can download it at very high speed, in particular when millions of users try to obtain the content at the same time. Consider Figure 3.1 which displays average download speeds in the P2P file-sharing network BitTorrent as measured in 2003. As you can see, already in 2003, hundreds of peers had download speeds faster than 1,000 kbps, without using expensive infrastructure. This is in stark contrast to server-based solutions, where allowing millions of users simultaneous high-speed access to the same content would be very expensive. Figure 3.1: The average download speeds of a peer in BitTorrent (Pouwelse et al., 2004). 2

3.1.2 Napster: how everything began... The rise of P2P file-sharing networks started in 1999 with the release of the Napster P2P filesharing client, which became widely popular in early 2000 and reached its peak in February 2001 with more than 25 million active users world-wide. Before Napster, users had already shared files via other networks like IRC, USENET, Bulletin boards, etc., but Napster was unique in that it had a very user-friendly interface and it specialized exclusively in the exchange of music in MP3 format. By using a centralized directory server, Napster also made searching content much easier than it had previously been on IRC or USENET. A user who wanted to share MP3s with others connected to the Napster server and added descriptions for all its files to Napster s database. A user wanting to download files could then easily query the server for content. The actual exchange of the files then happened directly between the clients, or peers, thus the term peer-to-peer. Due to legal difficulties, Napster was shut down in July 2001, but many other networks followed with similar models, including Gnutella, FastTrack, edonkey, etc. The Gnutella network deserves special attention, because it was the first truly decentralized P2P file-sharing network, and after Napster was shut down, it was the most widely used P2P file-sharing system for a very long time, with market shares around 40% as of 2007. Because these later P2P networks were completely decentralized, they were also impossible to shut down, despite many different law suits over the years. 3.1.3 The Gnutella Network The basic design and operation of Gnutella is very similar to other P2P file-sharing networks. A user who wants to participate downloads an application (or develops one himself) that adheres to the Gnutella protocol. 1 Depending on the role that a user plays in a particular transaction, we will be speaking about servers and clients, but more generally we will just use the term peers to refer to the computers running the Gnutella application. To join the network, a new peer must first connect to one of several known peers that are almost always online (but that generally do not share any files). These peers then forward a list of IP and port addresses of other Gnutella peers to the new peer. Once connected to the network, a peer can either initiate an interaction by sending a broadcast message, or it can remain passive and simply rebroadcast messages from others. To find a desired file, a peer sends a Query Message describing the content it is looking for. These messages are forwarded (i.e., rebroadcast) from peer-to-peer until a peer with the desired content is found, or until some maximum number of re-broadcasts has occurred. A peer that has the desired file replies to a query message with a Query Response Message, containing that peer s IP and port address, a unique client ID, as well as other information necessary to download the file. These query response messages are propagated backwards along the path that the original query message took, until it reaches the original requester who can then contact the peer who has the file and download it. 3.1.4 Free Riding on Gnutella A central feature of Gnutella and other similar networks is that there is no central server, users are not monitored, and no statistics about user behavior are maintained. Thus, for two peers interacting with each other, every interaction looks just like any other one: it s a one-shot game 1 It is very common to use the same term (e.g., Gnuetlla or BitTorrent) for the file-sharing client, the protocol, as well as the corresponding network. 3

with anonymous players. This is in contrast to the way some users had shared files before the existence of P2P file sharing networks. For example, on Bulletin boards, users often knew each other from prior interactions and there was a strong sense of a close-nit community. Unfortunately, the design features of Gnutella lead to an incentive problem inherent to many of the early file-sharing networks (including Kazaa, edonkey, etc.): the majority of users were free riding : Definition 3.1 (Free Riding). An agent is free riding when he consumes resources without contributing back to the community. Consider the simplified P2P File Sharing Game as shown in Figure 3.2. Each player can either cooperate, i.e., share files with other users in the network, or free ride, i.e., only download files without ever uploading any files. The numbers in the payoff table are just illustrative, but are meant to convey the fact that peers obtain positive utility from downloading a file (here 3), but that they have a small cost for uploading a file (here -1). User may incur a cost for uploading for a variety of reasons: they might have to pay for bandwidth, they might need their upload bandwidth for other applications (e.g., voice-over-ip), they might have a disutility from leaving their computer on when files are being uploaded, or they might be concerned about the legal implications from uploading copyrighted material. Player 1 Player 2 Share Free Ride Share 2, 2 1, 3 Free Ride 3, 1 0, 0 Figure 3.2: The P2P File-Sharing Game: A Prisoner s Dilemma. No matter what the exact benefit-to-cost ratio is for a user, as long as a user obtains positive value from receiving a file, and incurs a cost for providing a file, the resulting game has the same structure as the one displayed in Figure 3.2: it is a Prisoner s dilemma game. As we know from the analysis in Chapter 2, it is a dominant strategy for each player to free ride, and this is the only equilibrium of the game. Thus, it might actually seem surprising that the Gnutella network has worked as well as it has, for more than 10 years now. Considering the incentive structure as described above, the gametheoretic prediction would be that no-one shares any files. And in fact, a study of Gnutella in 2000 confirmed that 66% of the peers shared no files at all (see Figure 3.3). In fact, the top 1 percent of the peers provided 37% of the total files shared. Furthermore, out of those peers that shared files with the community, the top 1 percent provided almost 47% of all query responses, while approximately 63% never provided a query response (most likely because they didn t have any files that other peers were interested in.) Thus, the contributions on Gnutella were heavily skewed, and the large majority of peers was free riding. Given the incentive structure of Gnutella, one might be surprised that the network hasn t collapsed very quickly. Some of the sharing activity in P2P file-sharing networks can be explained by users who leave the default settings of a software client in place, and many clients were configured such that downloaded files would automatically be stored in a folder that was also available for upload afterwards. Other users might simply enjoy sharing files with other people so much, that it outweighs their costs for uploading. 4

Figure 3.3: Rank Ordering of Gnutella Peers by Number of Files Shared (Adar and Huberman, 2000). However, the amount of free riding in Gnutella actually increased over the years! A study conducted in 2005 found that 85% of the peers shared no files. The authors of the study argue that the developers of the various Gnutella clients did not have an incentive to build mechanisms into their software that would prevent free riding, because users would have simply switched to other clients without such restrictions. Other file-sharing networks have tried to introduce incentives for sharing, but with little success. In Kazaa, for example, a client would collect local statistics about how much it had uploaded and downloaded, with this information transmitted to other peers, the idea being, that peers would then give preference to those peers with a better upload to download ratio. However, because this information was stored locally on the clients, it could easily be spoofed, and very quickly Kazaa clients were available where a user could simply enter a number that would be transmitted to the other clients instead of the true number of uploaded kilobytes. Thus, to this day, free riding remains a big problem in Gnutella, leading to a slow but steady decline in the network s performance and market share. 3.1.5 Repeated Games in Gnutella? So far we have described the P2P file-sharing game as a one-shot game. However, in reality, most peers download many files, possibly thousands, over their lifetime. Thus, it might be more appropriate to model the P2P file-sharing as an infinitely repeated game. Now, consider for a moment the folk theorem we discussed in Chapter 2. We saw that for the infinitely repeated prisoner s dilemma game, the tit-for-tat strategy is a Nash equilibrium, and that in the equilibrium, both players always cooperate (for a suitable discount factor δ < 1). Given this positive result, we might hope that suitably-designed file-sharing clients could also implement strategies (like tit-fortat) that leads to cooperative equilibria, and thereby provide agents with an incentive to share. 5

Unfortunately, the problem with the design of early P2P file-sharing clients was that individual peers often only interacted once in their life time, i.e., the rendezvous probability was extremely low. And even if two peers i and j saw each other multiple times, it was unlikely that peer i would have a file that peer j was interested in obtaining right now. This problem was neatly solved by Bram Cohen in 2003, when he introduced a new file-sharing protocol that was based on a completely new paradigm, called BitTorrent. 3.2 BitTorrent: Taking Incentives Seriously At a high level, one of the key differences between BitTorrent and Gnutella is that in BitTorrent, a client that is currently downloading a file, simultaneously uploads pieces of that same file to other peers who are also currently interested in obtaining the same file. On a piece level, the peers that are concurrently downloading the same file are playing a repeated game with each other, and BitTorrent has implemented some version of a tit-for-tat strategy to encourage cooperation in that game. The protocol is designed such that the more a peer uploads to other peers, the faster it will be able to download the same file. Thus, BitTorrent was designed with the goal to provide users with a clear incentive to upload to others, and to discourage free riding. In this section, we describe the BitTorrent protocol in detail, and in the following sections we will analyze its game-theoretic properties to see what incentives are actually at play. 3.2.1 The BitTorrent Protocol: The Basics To participate in BitTorrent, a user must download (or develop) a software application that is compatible with the BitTorrent protocol. There exists a reference client that closely follows the original protocol as specified in 2003, and we will use that when describing the default client. However, many other clients have been developed and released to the public that implement strategies deviating from those of the reference client. In 2011, some of the most popular BitTorrent clients included µtorrent and Vuze. The content that users download via BitTorrent can either be a single file, or a large aggregated collection of files (e.g., all MP3s from a new music album), but for simplicity we will always speak of a file, going forward. The peers that are currently involved in uploading or downloading the same file are the swarm. A file in BitTorrent is divided into pieces, which are further divided into blocks (typically 64-512KB per block). To find content, a user generally goes to a website that maintains a searchable directory of torrents or torrent files. One of the most popular such websites is http://thepiratebay.org/, in particular because it provides access to some of the most popular, yet copyrighted movies currently available. 2 The torrent files are often not hosted by the same website that provides the directory service, but only linked to. The user downloads a.torrent file and opens it with his BitTorrent client. The.torrent file contains all the relevant metadata, including the name and size of the file, as well as a SHA-1 2 We discourage every reader to download copyrighted material. When trying out BitTorrent, please be sure that the content you are downloading is not copyrighted! Downloading copyrighted material is illegal in almost all countries around the world and can have severe consequences, including the shut-down of your Internet connection and multi-million dollar lawsuits. 6

fingerprint of the data blocks. 3 Note that the term.torrent file always refers to the file containing the metadata information, but that the term torrent is used ambiguously. Sometimes, it also refers to the.torrent file, and sometimes it refers to the file (or files) that the user is ultimately interested in downloading. One key element of the metadata contained in the.torrent file is the address of a tracker, which is a computer that acts as a server for this particular torrent, responsible for coordinating the peers in the swarm. The client announces itself to the tracker to obtain a list of peers that are part of the swarm, and the tracker returns a random subset, typically containing 50 active peers of the swarm. The peer re-announces itself to the tracker periodically, usually every 15 to 30 minutes, and also tells the tracker when it is leaving the swarm (i.e., when it becomes inactive). Upon receiving the list of peers from the tracker, a peer can connect to the other peers and start downloading pieces from them. The peers who are currently downloading pieces of a file are called leechers, and the peers who already have the complete file and are still uploading the file are called seeders. Thus, the injection of new content into BitTorrent always begins with one seeder uploading the file to other peers. A peer is only interested in peers that have pieces that it does not currently have itself. A peer a can either initiate a connection to another peer b from the list it received from the tracker, or it can receive a connection request from a peer b that has a in the list of peers that b received from the tracker. Either way, when two peers in a swarm connect for the first time, they become neighbors, and they exchange a bitfield, i.e., a bit array informing each other about which pieces they already have. A peer maintains open connections to all of its neighbors, forming the local neighborhood of that peer, and only closes those connections when itself or the other peer leaves the swarm. Whenever a peer finishes downloading a new piece, it sends per-piece have messages to the other peers in its local neighborhood, who can then update their information. Whenever a peer finishes downloading a new piece, it sends per-piece have messages to the other peers in its neighborhood, who can then update their information. Locally, each peer maintains an estimate of the availability of each piece of the file by counting how many of its neighbors have the piece. When a peer p starts uploading to a leecher l, then l informs p about which blocks l wishes to receive next. The standard strategy is rarest-first, where a leecher tries to download blocks from those pieces that currently have the lowest availability in the swarm. A peer generally only uploads to a small subset of peers in its neighborhood. All peers that it doesn t upload to are called choked, and the few that receive some data are called unchoked, also sometimes called the active set. The number of unchoke slots k is variable, and can be set by the client software. The reference client suggests to set the number of upload slots proportional to upload capacity, however, most implementations use a fixed number (often k = 4). A key aspect of the BitTorrent protocol is determining which peers to unchoke. We will soon see that this aspect is of particular strategic importance in BitTorrent. 3.2.2 The Unchoking Algorithm: Who to upload to? A peer always tries to download from all of the peers in its neighborhood, but is very selective regarding who to upload to, i.e., which client to unchoke. In the reference client, a peer s decision in regard to who to unchoke is based on how much the other peers have uploaded to him in the 3 A digital fingerprint is a concise summary of content with the property that it is computationally hard to generate an alternate content with the same fingerprint. 7

past. To make that decision, each peer considers epochs of 10 seconds each, and the download rate for each neighbor is based on a rolling 20-second average. A peer unchokes the k peers with the highest upload rate, splitting its upload bandwidth equally among these k peers at the so-called equal-split rate. The client makes a new unchoking decision every 10 seconds. The reference client reserves one upload slot as an optimistic unchoking slot. Every 30 seconds, the client unchokes one peer at random from its neighborhood. This serves two goals. First, it helps a client to explore its neighborhood, i.e., find new peers that have higher upload speeds than the peers it currently has unchoked. Second, it helps peers that have just joined the swarm to get off the ground, i.e., to obtain their first pieces, before they have something they can reciprocate with, which is good from a social welfare perspective. 3.2.3 A Tit-for-Tat Analysis BitTorrent is a large game, simultaneously played by hundreds or thousands of peers connected to each other in a swarm through random neighborhoods. Rather than modeling this entire system explicitly, we focus on a simpler repeated Prisoner s Dilemma model. As mentioned above, modeling P2P file-sharing systems like Gnutella as a repeated game is not suitable, because peers rarely interact with each other multiple times during their lifetime. However, the situation is drastically different in BitTorrent. Here, files are split into pieces and pieces into blocks, and leechers are exchanging blocks with each other, not just one, but possibly thousands, thus forming the basis for a repeated game. Of course, a peer is not playing a simple 2- player repeated game, as it is simultaneously connected to many different peers. But note that the low-level interactions in BitTorrent are purely bilateral. As long as two peers have pieces that the other doesn t have, they can continue their bilateral exchange. Effectively, a peer is simultaneously playing a repeated game with many different peers. 4 The unchoking algorithm implemented in the reference client can be interpreted as implementing a kind of tit-for-tat strategy. The unchoking algorithm makes a new unchoking decision every 10 seconds based on average performance of peers over the past 20 seconds. We can indeed model this as a repeated game, where all of the upload activity during one 10 second epoch is modeled as a single action in the stage game. For simplicity, let s assume a client only calculates the other peers upload activity in the last 10 seconds, and let τ t denote a threshold such that, if a peer uploaded more than τ t, then it was among the top 4 uploaders for time period t. In this case, the client will unchoke this peer in the next time period. Thus, the tit-for-tat strategy for a peer i would look like this: If peer j uploaded more than τ t in round t: upload to this peer in round t + 1. If peer j uploaded less than τ t in round t: don t upload to this peer in round t + 1. This is quite analogous to a tit-for-tat strategy: if player j cooperates in round t then i will cooperate in round t + 1, if player j does not cooperate then i will not cooperate. Coupled with the Prisoner s Dilemma style payoffs for this system, we might then expect the reference strategy to form a Nash equilibrium in BitTorrent. Indeed, this strategy does provide another client with some incentive to upload. If client j does not upload to client i then client i will not unchoke client j, except via optimistic unchoking, which happens very rarely. On the other hand, by uploading 4 For now, we do not concern ourselves with further modeling complications, e.g., peers may sometimes not have any pieces that the other peer wants, or a peer may decide to leave the swarm upon completing a file download. 8

to i with high speeds, then j gets unchoked by i. Thus, the general idea of the more you give the more you get is satisfied. However, unlike in the Prisoner s Dilemma game, BitTorrent peers have more than just two actions. They cannot only decide between sharing and not sharing, but they can vary how much bandwidth they upload to each individual peer. However, it turns out that the strategy of the reference client is suboptimal and it is easy to find many attacks, or in other words, better strategies. Thus, using the reference client and sharing is certainly not a Nash equilibrium. In the next section we will explore some of these ways in which BitTorrent can be attacked. 3.3 Attacks on BitTorrent As we have seen in the last section, a BitTorrent client has a very large degree of freedom in making decisions regarding how to interact with other BitTorrent clients. The default strategies are defined via the BitTorrent reference client, but can be improved upon for a client, selfishly. Here is a list of the main decisions a BitTorrent client can make: 1. How often to contact the tracker to receive a list of peers? 2. Which pieces to reveal to which agent? 3. How many unchoking slots to use? 4. Which agent to unchoke? 5. How much upload speed to give to each unchoked agent? 6. What data to upload to each unchoked agent? The preferences of BitTorrent users may vary, where some may prefer to minimize their download time, some may prefer to minimize the upload bandwidth they provide, and others may want to strike a good balance between the two. In the following sections, we will present different attacks on BitTorrent, each addressing one of these two goals, or both, often making different trade-offs between them. 3.3.1 Uploading Garbage Data When a BitTorrent client joins a swarm it has no blocks to upload to other peers and must rely on optimistic unchoking by other peers. Thus, getting started in BitTorrent can be slow, in particular when there are many leechers and few seeders. One attack on BitTorrent that addresses this issue is to upload garbage data. The idea is to let the peer announce to all other peers that it has all pieces of the file, but when those pieces are actually requested, to then upload random data. However, using the SHA-1 fingerprint contained in the.torrent file, clients can detect incorrect data already on the block level, and can thus quickly identify a peer that uploads garbage data. Now, most clients usually ban any traffic coming from a peer that has uploaded garbage data before. Thus, this attack is no longer viable in practice. 9

3.3.2 BitThief: Exploiting Optimistic Unchoking The second attack we consider was implemented and tested via a real-world BitTorrent client called BitThief. This client exploits the optimistic unchoking protocol that most BitTorrent clients follow, and as implemented in the reference client. The goal of BitThief is to download files via BitTorrent without uploading anything in return. Thus, BitThief must rely exclusively on the optimistic unchoking slots of other peers. BitThief does two things differently than the reference client. First, it initially asks the tracker for a list of 200 peers instead of 50, which is the standard. Second, it re-announces itself to the tracker much more frequently than the reference client does (standard is between 15 or 30 minutes). The goal of both these techniques is to grow a client s neighborhood as fast as possible. This is beneficial, because every additional peer in a client s local neighborhood means an additional chance of being optimistically unchoked in the next round. Consider Figure 3.4, which compares the number of open connections in BitThief versus the reference client. We see that BitThief is able to open new connections much faster than the reference client, with a particularly high advantage at the beginning, and thus will receive many more optimistic unchoke slots. Figure 3.4: Number of open connections over time, comparing BitThief with the reference client. (Locher et a., 2006). For typical torrents with a mix of seeders and leechers, BitThief can generally be used to download every torrent successfully, and that the completion time was on average between 2 and 4 times longer than with the reference client. For some torrents, in particular those with small files and a large number of seeders, BitThief can even be slightly faster on average than the reference client, without uploading a single bit of data itself. This is because BitThief s strategy of quickly opening many connections is particularly powerful in the first few minutes of a download process 10

(see Figure 3.4). For small files, the additional optimistic unchoke slots obtained in the first few minutes may be enough to outweigh the lost reciprocation due to not uploading, such that the total download time is reduced. Thus, BitThief shows an obvious vulnerability of the optimistic unchoking strategy of the reference client. However, the exploit relies on the fact that trackers are not very careful about this kind of behavior and don t protect against it. It would not be too complicated to prevent multiple requests from the same IP address in a 30 minute window, which would make this attack mute, unless the attacker has a way to obtain multiple IP addresses, which most users cannot easily do. Furthermore, the attack is only interesting to users who really care about minimizing their upload bandwidth. For most files using BitThief leads to significantly longer download times. 3.3.3 Strategic Piece Revelation The reference client always announces truthfully which pieces it currently has, and follows the rarest-piece first policy when downloading from another peer. Being truthful in revealing what pieces a peer has is not necessarily optimal. The key insight is that a peer j is only interested in peer i as long as i has some pieces that j needs. Perversely, a peer i would like to maximize the amount of time other peers are interested in trading pieces. At first sight, it seems optimal for a peer to reveal all pieces it has, to maximize interest from others. However, this view is myopic, and it turns out that by under-reporting its pieces, a peer can benefit. Consider Figure 3.5 where various scenarios from peer i s perspective are shown, and i indicates i s preferences (i.e., scenarios (a) through (f) are ordered in decreasing preference order). Agent i prefers having more peers interested in i rather than fewer, and thus (a) i (c) i (e), as well as (b) i (d) i (f). Next, it is disadvantageous for i, if other peers trade pieces among each other, because this potentially reduces the other peers interest in i in the future. Thus, (a) i (b), and (c) i (d) and (e) i (f). Lastly, it is always better to have one additional peer being interested in i, even if that peer is also interested in another peer in the swarm. Thus, (b) i (c) and (d) i (e). Figure 3.5: Strategic Piece : Peer i prefers to be as interesting Revelation (Levin et al., 2008). Let β i represent i s bit-field, where β i (k) = 1 if peer i has piece k and 0 otherwise. As before, we assume that i reports its bit-field to new neighbors, and sends updates in the form of have-messages to existing neighbors. We let ˆβ i denote the bit-field that i sends to neighbors, perhaps untruthful. Consider the following algorithm: Note that this piece-revelation strategy is counter to rarest-piece first. Agent i only reveals a new piece to j when j is about to lose interest in i. And even then, i reveals the most common piece that it has and that j does not have. The idea is that providing j with a rare piece could increase 11

Algorithm 1: Strategic Piece Revelation Algorithm (Levin et al., 2008). 1. Let β i represent i s true bit-field, ˆβj denote j s bitfield as j has announced it to i, and L i (j) the list of pieces that i has revealed to j. 2. If there does not exist any piece p such that ˆβ j (p) = 0 and β i (p) = 1 then quit; i cannot truthfully gain j s interest. 3. Find the piece p with ˆβ j = 0 and β i (p) = 1 that maximizes the number of other neighbors l for which (a) l also has piece p, or (b) i has revealed p to l before. 4. Send a have-message to j, revealing that i has piece p, and add p to L i (j). other peers interest in j, thereby detracting interest from i. Thus, by following this algorithm, i tries to maintain the scenario from Figure 3.5 (a) as opposed to (b), (d) or (f). Note that in contrast to the attacks that involve uploading garbage data or contacting the tracker more frequently than the default client, the strategic piece revelation strategy cannot easily be defended against. It is private knowledge of each peer which pieces it has, and a single other peer cannot detect whether a peer underreports pieces or not. Figure 3.6 shows an example run, comparing the standard BitTorrent client with one that uses strategic piece revelation. As we can see, the strategic peer is successful in attracting more interest from other peers over a long period of time, and ultimately completes its download about 30% earlier than the reference client. Figure 3.6: Strategic Piece Revelation: The strategic peer can achieve higher interest from others and finish downloading a file faster (Levin et al., 2008). Even though using strategic piece revelation is beneficial from an individual peer s point of view, 12

a remaining question is its effect on overall social welfare. Experimental studies have shown that if all peers use strategic piece revelation, then the average download time for the whole population increases by 12%. All peers withhold information from each other, thereby leading to suboptimal allocations of bandwidth. It is perhaps surprising that the increase in download times isn t higher. However, peers continue to reveal more and more pieces to each other, to garner each other s interest. Thus, the negative effects are particularly strong at the beginning of the download, but become less severe over time. Yet, from a social welfare perspective, it is unfortunate that the BitTorrent protocol allows clients to benefit in this way. An interesting research question is whether a protocol could be designed that incentivizes truthful piece revelation. 3.3.4 BitTyrant: Strategic Unchoking The last strategic attack we consider is also the most powerful one, in terms of decreasing a BitTorrent peer s average download time. The strategy was implemented in a BitTorrent client called BitTyrant in 2006, its efficacy has been experimentally tested, and the client is freely available for download. The main idea used in BitTyrant is that the unchoking strategy, or more generally, the upload strategy implemented in most BitTorrent clients may be sub-optimal. Remember that most clients use a fixed number (often 4) of unchoking slots, upload to those clients that reciprocate with the fastest download rate, and split their bandwidth equally among all of the unchoked peers, called the equal-split policy. Figure 3.7: Reciprocation probability for a peer as a function of raw upload capacity as well as reference BitTorrent equal split bandwidth (Piatek et al., 2007). Now, consider Figure 3.7, where the reciprocation probability of a peer is shown as a function of 1) raw upload capacity as well as 2) the equal-split capacity, when using the reference BitTorrent client (remember, the equal-split capacity is the amount of upload available per slot when the upload 13

is split equally). For both graphs, we see a clear discontinuity in the reciprocation probability. In terms of equal-split capacity, this occurs at around 10 KB/s. Thus, if a peer provides upload bandwidth of 11 KB/s or 12 KB/s, it is much more likely to be reciprocated than at 10 KB/s, but increasing the upload bandwidth above approximately 14 KB/s, leads to almost no further improvement. This clearly indicates that the equal-split policy is sub-optimal from an individual peer s perspective. Especially for a high-capacity peer, it might be better to split its upload capacity into more than 4 slots, each at 30 KB/s. This fact is illustrated in Figure 3.8, where the expected download performance of a peer is shown as a function of its upload capacity. We see that the graph grows sub-linearly, with high-capacity peers getting less than their fair share of download bandwidth in return for their upload bandwidth. Figure 3.8: Expectation of download performance as a function of upload capacity (Piatek et al., 2007). The BitTyrant client exploits this insight by deviating from the reference client in three different ways: How many upload slots to use? Instead of using a fixed number of upload slots, BitTyrant dynamically adjusts the number of upload slots, dependent on whether using an additional slot increases or decreases performance. Who to unchoke? Instead of unchoking those peers with the fastest download speed, Bit- Tyrant unchokes those peers where the ratio of received download speed to provided upload speed is best. (Thus taking a return-on-investment perspective.) How fast to upload to unchoked peers? Instead of using the equal-split strategy, BitTyrant dynamically adjusts the upload bandwidth provided to every unchoked peer, with the goal to upload at the minimum rate at which that peer is willing to reciprocate. To perform this strategy, the BitTyrant client needs to do a little more bookkeeping than the reference client does and some estimation of certain parameters. In particular, for each peer j, 14

the BitTyrant client i maintains an estimate of d j, the current download rate that j provides its unchoked peers, and u j, the upload rate required to become unchoked at j. If i is unchoked by j, then d j is simply the observed download bandwidth. Otherwise, i must estimate d j based on secondary information. Remember that peers send have-messages to all peers in their neighborhood, every time they complete a new piece, even to peers that are currently unchoked. Based on the frequency with which such have-messages arrive from peer j, i can estimate j s total download bandwidth. BitTyrant then uses this estimate as an estimate for j s total upload capacity. This heuristic is justified given the measurements displayed in Figure 3.8, which suggests a roughly linear relationship between upload and download speed. Assuming an equal-split upload rate for peer j, the total estimated upload capacity is then divided by the total number of upload slots used at this speed (e.g., 4), which provides an estimate of d j, i.e., the download rate that i can expect to get from j. To estimate u j, i.e., the upload rate required to become unchoked at j, the BitTyrant client uses the distribution of equal split capacities observed in prior measurements, for example, as shown in Figure 3.7. Now consider Algorithm 2, where the individual steps of the BitTyrant unchoking strategy are described: Algorithm 2: The BitTyrant Unchoking Algorithm (Piatek et al., 2007). 1. For each peer j, peer i maintains estimates of expected download rate d j and expected upload rate required for reciprocation u j. (a) If client i is unchoked by j, then d j is the observed download bandwidth. Otherwise, d j is inferred indirectly from j s block announcement rate. (b) Initialize u j using the distribution of equal split capacities observed in prior measurements (Figure 3.7). 2. Each round, rank order peers by decreasing ratio d j u j upload capacity is reached. d 0, d 1, d 2, d 3, d 4,... u 0 u 1 u 2 u 3 u } {{ 4 } choose k k j=0 u j cap i 3. At the end of each round, for each unchoked peer j: (a) If peer j does not unchoke i: u j (1 + α)u j (b) If peer j unchokes i: d j observed rate. (c) If peer j has unchoked i for the last r rounds: u j (1 γ)u j and unchoke those of top rank until the By using this algorithm, a client i aims to download from those peers that reciprocate most for the least number of bytes uploaded to them. By ranking peers j in decreasing order of their d j u j ratio, it keeps adding new upload slots, filling them with the most valuable peers, until the upload budget cap i is reached. Of course, the parameters α, γ and r used in the update step are open for fine-tuning. In their experiments, Piatek et al. (2008) decreased u j by γ = 10% if the peer reciprocated for r = 3 rounds, and increased u j by α = 20% if the peer failed to reciprocate after 15

being unchoked during the previous round. How effective is the BitTyrant strategy in improving the download performance for an individual BitTyrant peer, and what are its effect on the whole BitTorrent population? Figure 3.9 compares the download times for a single BitTyrant client with that of a single BitTorrent client (the client Azureus in this case), when all other peers are using the standard BitTorrent client. The upload capacity is plotted on the x-axis, and the achieved download times on the y-axis. The BitTyrant client consistently outperforms the BitTorrent client, with average download times up to half as long (e.g., for an upload cap of 300 KB/s). While the performance of the BitTorrent client saturates at around an upload cap of 100 KB/s, the performance of BitTyrant continues to improve. The fact that download times do not decrease much further for BitTyrant as upload capacities reach 1,000 KB/s is due to the limited swarm size. To allow such a peer to achieve its full potential, it would need a swarm with multiple hundreds of peers, which is rare in practice. The BitTyrant client can in fact discover this point of diminishing returns and cap its upload bandwidth at this point. Figure 3.9: Download times and sample standard deviation comparing performance of a single BitTyrant client and an unmodified Azureus client on a synthetic Planet-Lab swarm (Piatek et al., 2007). What if all users in a swarm used BitTyrant? We distinguish between two cases: 1) the Bit- Tyrant client uses its full upload capacity, even if it has reached its point of diminishing returns; 2) the BitTyrant client caps its upload capacity at the point of diminishing returns. Consider Figure 3.10, where the CDF (cumulative distribution function) of completion times is shown, comparing the two version of the BitTyrant client with the reference client. In this experiment, which was conducted on the PlanetLab testbed, every peer makes the same choice of client. We see that un-capped BitTyrant client provides the highest performance, the reference client lies in the middle, and the capped BitTyrant client achieves lowest performance (not for the individual client, but for the whole population). 16

Figure 3.10: CDFs of completion times for a 350 node PlanetLab experiment. BitTyrant and the reference client assume all users contribute all of their upload capacity. Capped BitTyrant shows performance when high capacity peers limit their contributions to the point of diminishing returns for performance (Piatek et al., 2007). The reason why the un-capped BitTyrant client leads to better overall performance is the skewed distribution of upload capacities in the swarm. Using BitTyrant, high capacity peers obtain more file pieces faster, and can then reciprocate with more pieces, thus effectively increasing the swarm s capacity. On the other hand, when the BitTyrant peers cap their upload bandwidth at the point of diminishing returns, the individual peers do not suffer from setting their own cap, but the overall performance decreases. Every BitTyrant peer that limits its upload bandwidth effectively reduces the total amount of upload bandwidth available in the whole swarm, which necessarily means that other peers get lower download speeds. Thus, if in practice, high capacity peers decide to limit their upload bandwidth (for example because they are participating in multiple swarms, or because they want to use their upload bandwidth for other applications), the performance of a BitTyrant swarm can be lower than that of the default BitTorrent swarm. Another draw-back of using BitTyrant is that new users may experience an extended bootstrapping phase. Unlike the reference client, which uses one upload slot for optimistic unchoking, the BitTyrant client only unchokes peers that send at a fast rate, and thus, new peers only get unchoked via excess capacity in the system. That means, a BitTyrant peer will only ever unchoke a newcomer that is not uploading anything in return if there is no other peer where the BitTyrant peer would get more in return for its upload bandwidth, and if the BitTyrant peer is not using an upload cap. Introducing optimistic unchoking would increase the swarm s performance, however, selfish clients would not have an incentive for this behavior. Thus, while BitTyrant certainly improves an individual peer s download performance, it does not necessarily improve the overall swarm s performance. 17

3.4 Discussion As a concluding remark, it is worth noting that BitTorrent works amazingly well, despite all of the possible ways it can be attacked. As we have seen, many BitTorrent clients that can significantly improve a user s download performance have been proposed, and are freely available for download. Yet, almost nobody uses those clients in practice. The most-widely used clients implement strategies very similar to the reference client. It seems that most BitTorrent users are relatively happy with the performance they obtain using those clients. It is also worth noting that the user experience of those clients is also much better compared to the strategic clients we described in this paper, with graphically appealing user interfaces, integrated search, and much more. Many (rational) users may decide that a nice user interface is more important than a slightly faster download experience. However, some users purposefully select very altruistic settings, i.e., they program their clients to upload at least as much as they download. Thus, there are many open questions regarding what motivates users to share in P2P file sharing communities, even when they don t benefit directly from those actions. We will come back to this topic in Chapter 11, where we talk about behavioral economics, and its role in the design and analysis of computational systems. 3.5 Notes One of the first papers that studied P2P file-sharing was the paper Free Riding on Gnutella by Adar and Huberman (2000), which showed convincingly how prevalent free riding was already in 2000. The follow-up paper Free Riding on Gnutella Revisited: The Bell Tolls? by Hughes et al. (2005) showed that the amount of free riding had increased significantly from 2000 to 2005. For background reading on BitTorrent, the original paper Incentives Build Robustness in BitTorrent by Cohen (2003) gives a short yet good introduction to the protocol design. A good reference for the tit-for-tat strategy, an important element of Cohen s original design, is the paper The Evolution of Cooperation by Axelrod and Hamilton (1981). A detailed measurement study illustrating the performance of BitTorrent can be found in the paper The BitTorrent P2P File-Sharing System: Measurements and Analysis by Pouwelse et al. (2005). The majority of papers on BitTorrent have challenged the idea that BitTorrent s incentives provide robustness against strategic attacks. Shneidman et al. (2004) as well as Liogkas et al. (2006) have described multiple simple attacks, including the exploit based on uploading garbage data. The paper by Locher et al. (2006) introduced the BitThief client, and showed that it is possible to use BitTorrent effectively without ever uploading. The piece revelation strategy was described and shown to be effective by Levin et al. (2008). The BitTyrant client was introduced by Piatek et al. (2007) and has shown the major download speed improvements that can be obtained by using a better unchoking strategy. Due to space constraints, we could not discuss the Propshare Mechanism by Levin et al. (2008), which propose a different unchoking algorithm than BitTyrant, leading to slightly better individual and system-wide performance. The advances from the systems research community regarding exploiting the BitTorrent protocol and developing better strategies have outpaced the theoretical understanding of the incentives in BitTorrent. Some examples of theoretical work includes the paper on coupon replication systems by Massoulié and Vojnović (2005), and the paper on anonymous social networks by Immorlica et al. (2010). A good overview of the related literature and a more realistic model of BitTorrent is provided in Ruberry and Seuken (2011). The theoretical work on BitTorrent may eventually 18

reveal the optimal BitTorrent manipulation/strategy, if everyone else uses the reference client, or if everyone else is best-responding. Perhaps more importantly, it will also inform our design of new distributed file-sharing protocols with better incentives, i.e., protocols that cannot be manipulated. References Eytan Adar and Bernardo A. Huberman. Free riding on gnutella. First Monday, 5(10), 2000. R Axelrod and WD Hamilton. The evolution of cooperation. Science, 211(4489):1390 1396, 1981. Bram Cohen. Incentives build robustness in BitTorrent. In Proceedings of the First Workshop on Economics of Peer-to-Peer Systems, 2003. Daniel Hughes, Geoff Coulson, and James Walkerdine. Free riding on gnutella revisited: The bell tolls? IEEE Distributed Systems Online, 6(6), 2005. N. Immorlica, B. Lucier, and B. Rogers. Emergence of cooperation in anonymous social networks through social capital. In Proceedings of the 11th ACM Conference on Electronic Commerce (EC), 2010. Dave Levin, Katrina LaCurts, Neil Spring, and Bobby Bhattacharjee. BitTorrent is an auction: Analyzing and improving BitTorrent s incentives. In ACM SIGCOMM, 2008. Nikitas Liogkas, Robert Nelson, Eddie Kohler, and Lixia Zhang. Exploiting bittorrent for fun (but not profit). In 5th International Workshop on Peer-to-Peer Systems (IPTPS), 2006. Thomas Locher, Patrick Moor, Stefan Schmid, and Roger Wattenhofer. Free riding in BitTorrent is cheap. In Fifth Workshop on Hot Topics in Networks, 2006. Laurent Massoulié and Milan Vojnović. Coupon replication systems. In Proceedings of ACM SIGMETRICS, 2005. Michael Piatek, Tomas Isdal, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. Do incentives build robustness in BitTorrent? In 4th USENIX Symposium on Networked Systems Design & Implementation, 2007. J.A. Pouwelse, P. Garbacki, D.H.J. Epema, and H.J. Sips. The bittorrent p2p file-sharing system: Measurements and analysis. In Proceedings of the 4th International Workshop on Peer-to-Peer Systems (IPTPS), 2005. Mike Ruberry and Sven Seuken. Sharing in bittorrent can be rational. Working Paper, Harvard University, 2011. Jeffrey Shneidman, David C. Parkes, and Laurent Massouli. Faithfulness in internet algorithms. In Proceedings of the ACM SIGCOMM Workshop on Practice and Theory of Incentives and Game Theory in Networked Systems (PINS), 2004. 19