On the Impact of P2P File Sharing Traffic Restrictions on User Perceived Performance


Ricardo Lopes Pereira, Teresa Vazão
Instituto Superior Técnico, Av. Prof. Dr. Cavaco Silva, Porto Salvo, Portugal

Abstract

Peer-to-Peer (P2P) File Sharing (FS) applications are today the major source of traffic on the Internet. Unlike other traffic types, such as HTTP, whose major sources are identifiable, peers are by definition spread over the Internet, making it hard for ISPs to architect their networks to accommodate P2P traffic. To alleviate the impact of P2P FS traffic on other applications, ISPs often resort to traffic-based pricing strategies or to traffic shaping techniques that restrict P2P FS usage. The success of these initiatives is often limited: they may aggravate customers who do not use P2P FS, and they can be circumvented by some P2P FS applications, which try to disguise their traffic as belonging to other applications. In this paper we study the impact that different methods for P2P FS traffic reduction have on the traffic carried by ISPs and on the download performance perceived by P2P FS users. Through simulation, we compared traffic shaping with the more recent techniques Biased Neighbour Selection (BNS) and Adaptive Search Radius (ASR). We observed that traffic shaping provides ISPs with the smallest traffic savings, especially when compared with the price paid by P2P FS users. BNS and ASR provide similar benefits for users and ISPs. We also observed that BNS and ASR are complementary technologies, which may be combined to achieve higher efficiency.

I. INTRODUCTION

Peer-to-Peer (P2P) File Sharing (FS) applications are widely disseminated. Their popularity is such that several sources indicate that they are responsible for up to 70% of Internet traffic [1], [2].
Contrary to what some expected, the availability of commercial music and video download services and the legal pressure imposed by copyright owners' associations, such as the RIAA, did not slow down P2P FS adoption. P2P FS traffic exhibits a behaviour very distinct from that of traditional applications such as or HTTP, one that most ISP networks were not designed to handle [1], [2]:

Upstream/downstream ratio: Applications such as HTTP exhibit a very high download/upload ratio, which has been used to plan many networks. P2P FS users have incentives to upload as much as they download, and some peers may even operate only as uploaders (seeders).

Time-of-day usage patterns: ISPs expect their user population to follow certain behaviour patterns. For instance, home users are expected to use the network only in the evening and during weekends. P2P FS applications are often left unattended, running 24 hours a day.

Traffic sources: traffic originates mostly from servers within an ISP's own network. HTTP traffic comes primarily from local proxies or from a few well-known popular sites. P2P traffic can originate anywhere: a peer might download a file part from a peer within the same ISP or from one across the world.

Over-subscription ratios: A typical HTTP user will download a page and then spend some time reading it. This allows ISPs to use large over-subscription ratios, as most of the time users are not using the bandwidth made available to them. Downloading a large movie file via P2P may take many hours, while viewing it takes only about two hours, so P2P FS users download and upload almost continuously.

P2P FS traffic increases ISPs' transit and peering costs while degrading the performance of other applications, which can alienate customers and increase customer churn. Upgrading the network to higher bandwidth is not an option, as the increased cost would not be compensated by additional revenue.
Furthermore, as P2P FS applications are designed to consume as much bandwidth as possible, any new bandwidth would rapidly be consumed. Blocking P2P FS traffic is not an option either, as P2P FS applications are one of the drivers behind broadband adoption; doing so would most likely result in the loss of large numbers of subscribers. The increasing use of P2P FS technology for independent and commercial music, video and software distribution means that the problem is here to stay, and ISPs and P2P FS users must find ways to balance their goals.

The main contribution of this paper is a simulation study of some of the options available for minimising the impact of P2P FS traffic on the Internet and on ISPs' business. We compare the trade-off between the traffic savings reaped by an ISP and the performance impact suffered by its P2P-using customers under three different techniques: traffic shaping, Biased Neighbour Selection (BNS) and Adaptive Search Radius (ASR). We found that these techniques are complementary, and we also studied the effect of combining them. The second contribution is the use of a simulation model that implements the full dynamics of a real P2P protocol (eDonkey), taking into account all the effects at the network, transport and application layers, instead of a simple abstraction.

In the next section the traffic-saving techniques are discussed, along with other methods for reducing the costs ISPs incur with P2P FS. Section III presents the simulation study and its findings. Section IV presents the conclusions and plans for future work.

II. COST REDUCTION TECHNIQUES

It has been observed that P2P FS shows strong locality properties, both in terms of network topology and geography, suggesting that self-caching mechanisms could be enacted by forcing peers to download from nearby peers instead of distant ones [3], [4]. Self-caching would potentially result in better performance for users and lower costs for ISPs. BNS and ASR are two techniques that exploit these locality properties. Traffic shaping is a more traditional approach, which limits the amount of bandwidth made available to P2P applications.

A. Biased Neighbour Selection

Biased Neighbour Selection is a technique proposed for use with the BitTorrent protocol [5]. It uses a transparent redirector, which intercepts the communication between peers inside an ISP's network and the tracker. The sources (peers) provided by the tracker are filtered and replaced in order to force each peer to communicate mainly with other peers within the ISP's network. Only a few peers from outside the ISP's network are provided, to ensure that the download will complete successfully. BNS requires ISPs to run the query interceptors at the edge of their networks, using deep packet inspection to intercept communications with the tracker/server and with other peers (due to peer source exchange and the growing use of Kademlia). It is a solution that each ISP may choose to deploy on its own, limiting the amount of traffic exchanged with other ISPs while benefiting users, who are expected to download faster from within their own ISP's network. BNS reduces the costs incurred by an ISP when there are enough sources for the file within its network.
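The filtering step performed by the redirector can be sketched as follows. This is a minimal illustration, not the implementation from [5]: the `Peer` type, its `isp` field and the function name `biased_neighbour_selection` are hypothetical, and the defaults (at most 50 sources, up to 40 of them local) are the values used in the simulations of Section III. Packet interception and deep packet inspection are outside the sketch.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Peer:
    address: str
    isp: str  # hypothetical field: the ISP (stub network) the peer belongs to


def biased_neighbour_selection(
    candidates: List[Peer],
    local_isp: str,
    max_peers: int = 50,
    max_local: int = 40,
    rng: Optional[random.Random] = None,
) -> List[Peer]:
    """Rewrite a source list so that most returned peers are inside local_isp.

    Up to max_local internal peers are kept; the remaining slots are filled
    with external peers, so the download can still complete when local
    sources lack some file parts.
    """
    rng = rng or random.Random()
    local = [p for p in candidates if p.isp == local_isp]
    external = [p for p in candidates if p.isp != local_isp]
    rng.shuffle(local)      # pick a random subset of each group
    rng.shuffle(external)
    chosen = local[:max_local]
    chosen += external[: max_peers - len(chosen)]
    return chosen


if __name__ == "__main__":
    peers = [Peer(f"10.0.0.{i}", "isp-a") for i in range(60)]
    peers += [Peer(f"192.0.2.{i}", "isp-b") for i in range(60)]
    reply = biased_neighbour_selection(peers, "isp-a", rng=random.Random(1))
    print(len(reply), sum(p.isp == "isp-a" for p in reply))
```

When fewer than 40 local sources exist, the sketch simply fills the reply with more external peers, which is the situation in which, as noted next, BNS brings no savings.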
When there are not enough local sources, BNS provides no advantage, as the file will have to be downloaded from outside. Furthermore, BNS fails to reduce upload traffic, as it does not prevent outside peers from downloading from the peers within the ISP's network. The basic concept of BNS had already been suggested in studies of Kazaa and BitTorrent traffic traces, where byte hit ratios of up to 63% were estimated to be achievable by exploiting locality within an ISP [6], [7].

B. Adaptive Search Radius

ASR is a peer selection algorithm which can be used with eMule, BitTorrent or other P2P file sharing applications [8]. Instead of downloading from any peer, ASR selects a subset of the peers it knows, the nearest ones (fewest IP network hops), allowing the file download to be performed with a smaller impact on the Internet. ASR uses file availability as its metric, defined as the number of contactable peers sharing the rarest file part the peer does not already have. As uploaders are contacted, their distance (in network hops) and the file parts they share are determined. Minimum and maximum file availability thresholds are defined as constants in the algorithm. An ASR peer maintains a Search Radius for every file: the maximum distance (in network hops) at which a peer may be in order to be considered. After learning (or updating) which file parts a peer shares, the file availability is recalculated. If the file availability is larger than the maximum threshold, the search radius is reduced, one hop at a time, while the file availability remains larger than the minimum threshold. If all peers within the search radius have been contacted but file availability remains below the minimum threshold, the search radius is increased by one hop. ASR will therefore use file sources local to the ISP's network when these are available; when contacting peers outside an ISP's network, the closest ones will be used.
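The search radius update described above can be sketched as follows. This is a hedged reading of the text, not the code from [8]: the names `ContactedPeer`, `file_availability` and `update_search_radius` are hypothetical, the thresholds default to the values used in Section III (grow below 3, shrink above 6), and "reduced one hop at a time while availability remains larger than the minimum" is interpreted as a loop that stops before availability would drop to the minimum threshold.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class ContactedPeer:
    hops: int              # measured network distance to this uploader
    parts: FrozenSet[int]  # file parts this uploader shares


def file_availability(peers: List[ContactedPeer], missing: Set[int], radius: int) -> int:
    """Number of peers within `radius` hops sharing the rarest still-missing part."""
    in_range = [p for p in peers if p.hops <= radius]
    return min(sum(1 for p in in_range if part in p.parts) for part in missing)


def update_search_radius(
    radius: int,
    peers: List[ContactedPeer],
    missing: Set[int],
    all_in_radius_contacted: bool,
    min_avail: int = 3,
    max_avail: int = 6,
) -> int:
    """Return the new search radius for one file after learning about a peer."""
    if not missing:
        return radius  # download complete: nothing left to adapt
    avail = file_availability(peers, missing, radius)
    if avail > max_avail:
        # Shrink one hop at a time while the smaller radius still keeps
        # availability above the minimum threshold.
        while radius > 1 and file_availability(peers, missing, radius - 1) > min_avail:
            radius -= 1
    elif avail < min_avail and all_in_radius_contacted:
        radius += 1  # not enough sources nearby: look one hop further
    return radius


if __name__ == "__main__":
    # Eight uploaders, one at each distance from 1 to 8 hops, all sharing parts 0 and 1.
    peers = [ContactedPeer(h, frozenset({0, 1})) for h in range(1, 9)]
    print(update_search_radius(8, peers, {0, 1}, all_in_radius_contacted=True))
```

With these hypothetical uploaders the radius shrinks from 8 to 4 hops, where four peers still hold every missing part; shrinking to 3 hops would leave availability at the minimum threshold, so the loop stops.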
Figure 1 depicts the behaviour of a P2P file sharing client (node H) while downloading a file. Filled circles represent peers; empty ones represent routers. Solid lines represent links not used by file transfers to peer H, while dotted lines represent links crossed by file transfer traffic. Through a P2P file sharing protocol, node H has learned that nodes A to I share the wanted file. In Figure 1(a), peer H does not use ASR to filter out distant peers, downloading from all of them. Therefore, download traffic crosses most links, contributing to their congestion. Figure 1(b) shows what happens when ASR is used. In this example, it is assumed that ASR calculated a search radius of 4 network hops. This means that peers F (4 hops), G (2 hops) and I (3 hops) together provide the required minimum number of sources for all the pieces peer H still has to download. As ASR peers refrain from downloading from peers outside each file's search radius, information travels fewer network hops, releasing capacity on all other links. Peer H still needs to download a full copy of the file, but the download, being performed from close peers, impacts fewer Internet links. The smaller number of peers to download from is compensated by shorter waits before downloading starts, as upload queues are shorter: since each downloading peer restricts the set of peers it downloads from, each uploader has to satisfy fewer requests. ASR should result in less P2P traffic crossing ISP boundaries while providing users with faster downloads. However, ASR is not a method that ISPs can deploy on their own; instead, it would have to be built into P2P software.

C. Traffic shaping

To control the impact of P2P FS traffic on their networks, many ISPs have deployed traffic shaping equipment. This equipment performs deep packet inspection on incoming or outgoing traffic in order to identify the flows that correspond to P2P FS sessions.
Having identified P2P FS traffic, QoS techniques may be applied to it. ISPs may choose to limit the aggregate rate of P2P FS traffic or the rate of individual users' traffic, or to give all P2P FS traffic less-than-best-effort treatment. This allows ISPs to limit the effect P2P FS has on their networks, benefiting other applications without requiring additional investments in bandwidth. However, P2P FS traffic is difficult to identify, as developers continuously try to circumvent detection. This means that some P2P FS traffic will always get past unidentified. There is also a risk of false positives. Furthermore, the processing required for deep packet inspection may add delay to all traffic [2], which may alienate some subscribers. Traffic shaping affects all P2P FS traffic equally. Even though users may have no right to complain that their illegal downloads are slow, they will protest when shaping affects their legitimate downloading activities [1]. This may cause legal problems for ISPs, which may be accused of favouring other commercial content distribution methods, especially while network neutrality is being discussed in some countries.

Fig. 1. Links crossed by file download traffic by node H: (a) without ASR; (b) with ASR, using a search radius of 4.

D. Other methods

Some ISPs have abandoned unlimited-traffic subscription plans or increased their prices. Users are now offered different subscription plans, with different monthly traffic allowances. Extra traffic is charged at high rates, which discourages heavy P2P FS usage. This transfers the costs associated with P2P FS traffic back to the users responsible for it. However, it also impacts other heavy users, not only P2P FS users. Furthermore, it overlooks that P2P FS traffic has been one of the drivers behind broadband adoption. This method only works when no competitor offers unlimited plans; otherwise, subscribers will leave for the competition. ISPs have also taken advantage of the reciprocity of P2P FS protocols, which give peers download speeds correlated with their upload speed. Many ISPs restrict the upload speed well below the limit imposed by the access technology used to connect their subscribers.
One such example is ADSL, where, even though the maximum download speed is often offered, only a fraction of the possible upload speed is provided. P2P FS caching is a concept similar to HTTP caching, with which ISPs are familiar. Caches intercept P2P connections going outside the ISP's network and impersonate remote peers. If the content is present in the cache, it is served locally; otherwise the cache contacts the remote peer and keeps a copy of the content while delivering it to the requesting peer. Byte hit ratios as high as 80% are claimed by P2P caching solution vendors [1]. The use of caches provides P2P FS users with similar or better performance while releasing bandwidth for other applications, allowing ISPs to avoid or delay investments in network upgrades. However, running a cache may implicate an ISP in the illegal sharing of copyrighted content, exposing it to legal problems. Also, it has been observed that large video files are the fastest-growing file type in P2P networks, representing the largest portion (65%) of bytes transferred. These are the files a cache system should address; however, their size and number require too large a storage system for caching to be easily performed outside the P2P network [6].

III. EVALUATION

A. Simulation scenario

We evaluated BNS, ASR and traffic shaping using the SSFNet 2.0 network simulator, which provides layers 2, 3 and 4 [9], on which we implemented the eDonkey/eMule protocol. We used the GT-ITM topology generator to create a transit-stub network with 350 routers and 840 P2P FS peers [10]. Hosts are connected to their routers using 0.5 and 1 Mb/s links. Stub routers are connected among themselves using 5 and 10 Mb/s links. Stub networks are connected to each other and to transit networks using 2 and 5 Mb/s links. Transit routers are connected using 10, 20 and 50 Mb/s links. Transit networks are connected using 10 and 20 Mb/s links.
We simulated the distribution of a 100 MB file, seeded by 7 peers to the other 834. This depicts a situation where a company uses P2P FS to distribute some content (for instance, e-learning material) to its various offices, or a flash crowd in which hosts continue to share after finishing their downloads.
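To put the download times reported later in context, a rough lower bound follows from the access link capacity alone. The sketch below is an illustration under stated assumptions: decimal units (1 MB = 8 Mb), no IP/TCP/eDonkey overhead, no traffic shaping and no waiting in upload queues; `min_download_time_s` is a hypothetical helper, not part of the simulator.

```python
def min_download_time_s(file_size_mb: float, access_link_mbps: float) -> float:
    """Lower bound on download time if the access link were the only limit.

    Assumes decimal units (1 MB = 8 Mb) and ignores protocol overhead,
    shaping and competition for uploaders, so real times will be higher.
    """
    return file_size_mb * 8 / access_link_mbps


if __name__ == "__main__":
    for link in (1.0, 0.5):  # the two host access link capacities in the scenario
        print(f"{link} Mb/s -> {min_download_time_s(100, link):.0f} s")
```

For the 100 MB file this gives 800 s on a 1 Mb/s access link and 1600 s on a 0.5 Mb/s one.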

Each time a peer asked the server for file sources, the server would reply with at most 50 peers. When using BNS, up to 40 of these 50 peers, if available, would be within the requesting peer's own ISP. ASR was configured to increase its search radius when file availability was below 3 and to reduce it when file availability was greater than 6. Traffic shaping was implemented by limiting the aggregate P2P FS bandwidth permitted on the stub-stub and stub-transit links, using a bit-bucket. We experimented with limiting the available P2P FS bandwidth to 4, 3, 2, 1 and 0.5 Mb/s.

We also decided to combine BNS and ASR, as these are not conflicting technologies. The use of BNS allows ASR to converge on close-by peers more quickly, as the peers it learns about will be mostly within the same ISP. From the point of view of BNS, the use of ASR allows the peer to choose the closest peers from within the ISP, instead of a random set. Also, while BNS is unable to provide traffic savings when there are no sources within the same ISP, ASR will still download from the closest peers.

B. Result analysis

Table I shows the main metrics gathered from the experiments. We can observe that BNS, ASR and their combination all provide significant improvements on every metric over plain eDonkey/eMule, both from the point of view of the ISPs and of the subscribers. From the subscriber perspective, either alternative alone results in noticeably faster downloads, with having a slight advantage. Their combination provides even better results. This means that users would not feel alienated by either technology, but would rather feel that the ISP embraces P2P FS, which could result in increased customer loyalty. From the point of view of a transit ISP, both BNS and ASR result in very significant traffic savings, which could be translated into better and cheaper services for its regional ISP customers. The combination of BNS and ASR releases more than two thirds of the used bandwidth.
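The aggregate traffic shaping configuration of Section III-A (P2P FS limits from 4 down to 0.5 Mb/s on stub-stub and stub-transit links) can be sketched with a token bucket, a common way to build such a limiter. This is an assumption: the text only says a "bit-bucket" was used, so the exact shaper in the simulator may differ, and the burst size below is a hypothetical parameter the source does not give.

```python
class TokenBucket:
    """Aggregate rate limiter for all P2P FS traffic crossing one link.

    Tokens are measured in bits, accrue at `rate_bps` and are capped at
    `burst_bits`. A packet that finds too few tokens must wait in the
    shaper's queue (a policer would drop it instead). Packets larger than
    `burst_bits` can never be sent.
    """

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits
        self.last_time = 0.0

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the bucket size.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = now

    def try_send(self, packet_bits: float, now: float) -> bool:
        """Consume tokens and return True if the packet may be sent at `now`."""
        self._refill(now)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False

    def wait_time(self, packet_bits: float, now: float) -> float:
        """Seconds until enough tokens exist for the packet (0 if it fits now)."""
        self._refill(now)
        return max(0.0, (packet_bits - self.tokens) / self.rate_bps)


if __name__ == "__main__":
    # 1 Mb/s P2P FS limit with a hypothetical burst of one 1500-byte packet.
    shaper = TokenBucket(rate_bps=1e6, burst_bits=1500 * 8)
    print(shaper.try_send(12000, now=0.0), shaper.try_send(12000, now=0.0))
```

Because the bucket is shared by all P2P FS flows on the link, lowering `rate_bps` throttles the aggregate rather than any individual user, which matches the aggregate limit described in the scenario.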
From the point of view of the regional ISP, which provides service to the P2P FS users, the use of BNS or ASR results in a reduction of intra-ISP (stub) traffic and in an even more significant reduction of the traffic exchanged with other ISPs (inter-ISP). The combined use of BNS and ASR results in even greater savings, especially in inter-ISP traffic, which is reduced to less than a third. The large reduction in inter-ISP traffic would allow ISPs to significantly reduce their peering and transit costs, which could provide a competitive advantage.

TABLE I. Comparing the different techniques. Metrics: average download time (s), transit traffic (GB), stub traffic (GB), inter-ISP traffic (GB), number of connections (thousands), average hops crossed, efficiency (%).

Also significant is the reduction in the number of connections necessary to distribute the file. Here we see that has an advantage over, but not over their combined use. Even though the number of connections primarily affects the peers, TCP being an end-to-end protocol, having fewer connections to keep track of allows better scaling of any deep packet inspection equipment used by complementary techniques. We can also observe that the average number of network hops crossed by each P2P FS data packet is reduced by the use of and even more by the use of. Their combined use yields the best results. The last metric used was efficiency, where we measure the amount of traffic that crossed every network link against the ideal minimum traffic required to copy the file from the sources to all the other peers. The minimum traffic was calculated by determining the number of links in the minimum spanning tree rooted at all the sources that encompasses all the downloading peers. This number of links was then multiplied by the file size. No protocol could reach 100% efficiency, as this minimum does not take into account any of the overheads of real protocols (IP, TCP, eDonkey/eMule).
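The efficiency metric just described reduces to a simple ratio. The sketch below assumes the link count of the minimum tree has already been computed elsewhere and that per-link traffic counters are available; the function name `efficiency` and its inputs are hypothetical.

```python
from typing import Iterable


def efficiency(min_tree_links: int, file_size_bytes: float,
               link_traffic_bytes: Iterable[float]) -> float:
    """Ideal minimum traffic as a percentage of the traffic actually carried.

    min_tree_links: links in the minimum tree rooted at the sources that
        reaches every downloading peer (an input here, not computed).
    link_traffic_bytes: bytes carried by each network link in the run.
    """
    total = sum(link_traffic_bytes)
    if total <= 0:
        raise ValueError("no traffic recorded")
    ideal = min_tree_links * file_size_bytes  # each tree link carries the file once
    return 100.0 * ideal / total
```

For example, a hypothetical 1000-link tree and a 100 MB file give an ideal of 10^11 bytes; if the links carried 4 x 10^11 bytes in total, efficiency is 25%.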
Once again we can observe that the use of BNS or ASR is advantageous, with having an edge over. Their combination is, once again, very beneficial. Figure 2 shows the evolution of the amount of inter-ISP traffic when the allowed inter-ISP bandwidth is varied by traffic shaping. We can observe that significant savings are only accomplished when the bandwidth is reduced to 1 Mb/s. Reducing the bandwidth from 1 to 0.5 Mb/s results in a faster decrease in traffic consumption. The different techniques maintain their relative positions across bandwidths, except at 0.5 Mb/s, where provides much greater savings than, indicating that copes better with traffic shaping. Figure 3 shows the behaviour of the average download time achieved by the several techniques under different traffic shaping restrictions. It is noticeable that users pay a high price for traffic shaping. With the plain eDonkey/eMule protocol, ISPs could not limit P2P FS traffic below 2 Mb/s in this particular scenario without witnessing massive subscriber churn; however, only below this mark were traffic savings noticeable. shows a much better behaviour, but it would also prevent the use of 0.5 Mb/s, and even 1 Mb/s could be risky. shows a behaviour very close to that of BNS and ASR combined, and both methods would allow P2P FS traffic to be limited to 1 Mb/s without causing major annoyance to subscribers. Figure 4 shows the evolution of the traffic efficiency of the several techniques under different traffic shaping restrictions. It is observable that the relative positions are always maintained. However, we can see that combined with and, especially, by itself, react better to severe bandwidth restrictions, increasing their efficiency. At 0.5 Mb/s, the performance of the plain eDonkey/eMule protocol degraded, as the reduced bandwidth becomes insufficient for both data and control messages, reducing each peer's ability to discover new, closer peers.

Fig. 2. Amount of traffic exchanged among ISPs versus allowed bandwidth (vertical axis: inter-ISP P2P FS traffic, GB).
Fig. 3. Behaviour of download time with different bandwidths (vertical axis: average download duration, s).
Fig. 4. Efficiency of the several techniques under different bandwidths (vertical axis: efficiency, %).

IV. CONCLUSIONS

We analysed the impact of three different bandwidth-saving techniques that have the potential to reduce ISPs' costs with P2P FS traffic: Biased Neighbour Selection, Adaptive Search Radius and traffic shaping. We used a network simulator and a faithful implementation of the eDonkey/eMule protocol to analyse the impact of every combination of these techniques on the traffic carried and exchanged by an ISP and on the download time perceived by subscribers. We concluded that the techniques are complementary and that all three may be combined to achieve the best results. Traffic shaping, the only one of the three that is widely deployed, proved to be the least effective: not only does it provide modest traffic reduction, but it renders P2P FS unusable due to very high download times. BNS and ASR, being very different by design, provide similar results. provided slightly faster downloads and slightly higher inter-ISP traffic savings, which result in more immediate advantages for ISPs. However, generates slightly greater savings for the transit ISPs, which could result in cheaper transit rates for regional ISPs.
also behaves better when combined with extreme traffic shaping, providing greater inter-ISP traffic savings and faster downloads. The best results were, under every circumstance, achieved by the combination of BNS and ASR.

REFERENCES

[1] PeerApp. (2007, Mar.) "Comparing P2P Solutions," White Paper. [Online]. Available:
[2] Sandvine. (2004) "Meeting the Challenge of Today's Evasive P2P Traffic," White Paper. [Online]. Available: general/getfile.asp?fileid=16
[3] J. Chu, K. Labonte, and B. N. Levine, "Availability and Locality Measurements of Peer-to-Peer File Systems," in ITCom: Scalability and Traffic Control in IP Networks,
[4] A. Klemm, C. Lindemann, M. K. Vernon, and O. P. Waldhorst, "Characterizing the query behavior in peer-to-peer file sharing systems," in Internet Measurement Conference, A. Lombardo and J. F. Kurose, Eds. ACM, 2004, pp
[5] R. Bindal, P. Cao, W. Chan, J. Medval, G. Suwala, T. Bates, and A. Zhangan, "Improving Traffic Locality in BitTorrent via Biased Neighbor Selection," in ICDCS, July
[6] P. K. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan, "Measurement, modeling, and analysis of a peer-to-peer file-sharing workload," in SOSP, 2003, pp
[7] T. Karagiannis, P. Rodriguez, and D. Papagiannaki, "Should ISPs fear Peer-Assisted Content Distribution?" in ACM SIGCOMM/USENIX IMC '05, Oct
[8] R. L. Pereira, T. Vazão, and R. Rodrigues, "Adaptive Search Radius - Lowering Internet P2P File-Sharing Traffic through Self-Restraint," in The 6th IEEE International Symposium on Network Computing and Applications (IEEE NCA07),
[9] J. H. Cowie, D. M. Nicol, and A. T. Ogielski, "Modeling the Global Internet," Computing in Science & Engineering, vol. 1, no. 1, pp ,
[10] E. W. Zegura, K. L. Calvert, and S. Bhattacharjee, "How to model an internetwork," in INFOCOM, 1996, pp