How To Analyse The Edonkey 2000 File Sharing Network



Similar documents
The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

Information Searching Methods In P2P file-sharing systems

P2P Filesharing Population Tracking Based on Network Flow Data

Peer-to-Peer File Sharing

DDoS Vulnerability Analysis of Bittorrent Protocol

Peer-to-Peer Networks Status th Week

N6Lookup( title ) Client

Using UDP Packets to Detect P2P File Sharing

Trace Driven Analysis of the Long Term Evolution of Gnutella Peer-to-Peer Traffic

A Measurement-based Traffic Profile of the edonkey Filesharing Service

Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols. Anthony J. Howe Supervisor: Dr. Mantis Cheng University of Victoria

Advanced Application-Level Crawling Technique for Popular Filesharing Systems

Introduction Page 2. Understanding Bandwidth Units Page 3. Internet Bandwidth V/s Download Speed Page 4. Optimum Utilization of Bandwidth Page 8

Multicast vs. P2P for content distribution

Clustering in Peer-to-Peer File Sharing Workloads

An Introduction to Peer-to-Peer Networks

LCMON Network Traffic Analysis

The Challenges of Stopping Illegal Peer-to-Peer File Sharing

Peer-to-Peer Systems: "A Shared Social Network"

The Index Poisoning Attack in P2P File Sharing Systems

Measure wireless network performance using testing tool iperf

A Measurement Study of Peer-to-Peer File Sharing Systems

First Midterm for ECE374 03/24/11 Solution!!

Three short case studies

Bandwidth Management for Peer-to-Peer Applications

Computer Networks CCNA Module 1

P2P Filesharing Systems: Real World NetFlow Traffic Characterization

RESEARCH ISSUES IN PEER-TO-PEER DATA MANAGEMENT

Internet Content Distribution

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Table of Contents. Cisco Blocking Peer to Peer File Sharing Programs with the PIX Firewall

NETWORKS AND THE INTERNET

Mapping of File-Sharing onto Mobile Environments: Feasibility and Performance of edonkey with GPRS

Frequently Asked Questions

PEER TO PEER FILE SHARING USING NETWORK CODING

Examining the Impact of the Earthquake on Traffic on a Macro Level

State of the Art in Peer-to-Peer Performance Testing. European Advanced Networking Test Center

P2P Node Setup Guide Authored by: Unitsa Sungket, Prince of Songkla University, Thailand Darran Nathan, APBioNet

P2P File-sharing Traffic Identification Method Validation and Verification

MINIMUM NETWORK REQUIREMENTS 1. REQUIREMENTS SUMMARY... 1

1 University of Würzburg. Institute of Computer Science Research Report Series

HW2 Grade. CS585: Applications. Traditional Applications SMTP SMTP HTTP 11/10/2009

Telematics. 14th Tutorial - Proxies, Firewalls, P2P

The Effect of Caches for Mobile Broadband Internet Access

Overlay Networks. Slides adopted from Prof. Böszörményi, Distributed Systems, Summer 2004.

P2P: centralized directory (Napster s Approach)

Bandwidth Aggregation, Teaming and Bonding

From Centralization to Distribution: A Comparison of File Sharing Protocols

Unit 3 - Advanced Internet Architectures

LARGE-SCALE INTERNET MEASUREMENTS FOR DIAGNOSTICS AND PUBLIC POLICY. Henning Schulzrinne (+ Walter Johnston & James Miller) FCC & Columbia University

Application Layer. CMPT Application Layer 1. Required Reading: Chapter 2 of the text book. Outline of Chapter 2

Trends and Differences in Connection-behavior within Classes of Internet Backbone Traffic

D. SamKnows Methodology 20 Each deployed Whitebox performs the following tests: Primary measure(s)

BroadCloud PBX Customer Minimum Requirements

Improving Gnutella Protocol: Protocol Analysis And Research Proposals

Peer-to-Peer Networks Organization and Introduction 1st Week

P2P Characteristics and Applications

Ethernet. Ethernet Frame Structure. Ethernet Frame Structure (more) Ethernet: uses CSMA/CD

CSCI Topics: Internet Programming Fall 2008

Exploiting P2P Systems for DDoS Attacks

PUBLIC DOMAIN P2P FILE SHARING NETWORKS CONTENT AND THEIR EVOLUTION

Peer to peer networking: Main aspects and conclusions from the view of Internet service providers

Performance Analysis of IPv4 v/s IPv6 in Virtual Environment Using UBUNTU

Internet measurements at complexnetworks.fr

Managing Virtual Servers

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

Network measurement II. Sebastian Castro NZRS 27 th May 2015 Victoria University

WebEx. Network Bandwidth White Paper. WebEx Communications Inc

Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor

New possibilities for the provision of value-added services in SIP-based peer-to-peer networks

Ethernet. Ethernet. Network Devices

Non-intrusive, complete network protocol decoding with plain mnemonics in English

An apparatus for P2P classification in Netflow traces

LAN Planning Guide LAST UPDATED: 1 May LAN Planning Guide

Inside the Storm: Protocols and Encryption of the Storm Botnet

Peer-to-Peer Networks 02: Napster & Gnutella. Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg

An Architecture Concept for Mobile P2P File Sharing Services

Chapter 3. Internet Applications and Network Programming

SAN/iQ Remote Copy Networking Requirements OPEN iscsi SANs 1

Wharf T&T Limited DDoS Mitigation Service Customer Portal User Guide

PEER-TO-PEER NETWORK

Advanced Peer to Peer Discovery and Interaction Framework

HollyShare: Peer-to-Peer File Sharing Application

1.1 Prior Knowledge and Revision

A Measurement of NAT & Firewall Characteristics in Peer to Peer Systems

A Survey of Peer-to-Peer Network Security Issues

Netgear TA612VMNF & TA612VLD Netgear WGR613VAL. Quality of Service (QOS) function

Peer-to-Peer Systems. Winter semester 2014 Jun.-Prof. Dr.-Ing. Kalman Graffi Heinrich Heine University Düsseldorf

Chakchai So-In, Ph.D.

Best Practices - Monitoring and Controlling Peer-to-Peer (P2P) Applications

INSTITUTIONAL COMPLIANCE REQUIREMENTS PUBLIC LAW

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project

Networks and the Internet A Primer for Prosecutors and Investigators

Voice over Internet Protocol (VoIP) systems can be built up in numerous forms and these systems include mobile units, conferencing units and

MULTI WAN TECHNICAL OVERVIEW

The Algorithm of Sharing Incomplete Data in Decentralized P2P

Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems*

Network-Oriented Software Development. Course: CSc4360/CSc6360 Instructor: Dr. Beyah Sessions: M-W, 3:00 4:40pm Lecture 2

Single Pass Load Balancing with Session Persistence in IPv6 Network. C. J. (Charlie) Liu Network Operations Charter Communications

Internet2 NetFlow Weekly Reports

Transcription:

The edonkey File-Sharing Network Oliver Heckmann, Axel Bock, Andreas Mauthe, Ralf Steinmetz Multimedia Kommunikation (KOM) Technische Universität Darmstadt Merckstr. 25, 64293 Darmstadt (heckmann, bock, mauthe, steinmetz)@kom.tu-darmstadt.de Abstract: The edonkey 2000 file-sharing network is one of the most successful peerto-peer file-sharing applications, especially in Germany. The network itself is a hybrid peer-to-peer network with client applications running on the end-system that are connected to a distributed network of dedicated servers. In this paper we describe the edonkey protocol and measurement results on network/transport layer and application layer that were made with the client software and with an open-source edonkey server we extended for these measurements. 1 Motivation and Introduction Most of the traffic in the network of access and backbone Internet service providers (ISPs) is generated by peer-to-peer (P2P) file-sharing applications [San03]. These applications are typically bandwidth greedy and generate more long-lived TCP flows than the WWW traffic that was dominating the Internet traffic before the P2P applications. To understand the influence of these applications and the characteristics of the traffic they produce and their impact on network design, capacity expansion, traffic engineering and shaping, it is important to empirically analyse the dominant file-sharing applications. The edonkey file-sharing protocol is one of these file-sharing protocols. It is implemented by the original edonkey2000 client [edonkey] and additionally by some opensource clients like mldonkey [mldonkey] and emule [emule]. According to [San03] it is with 52% of the generated file-sharing traffic the most successful P2P file-sharing network in Germany, even more successful than the FastTrack protocol used by the P2P client KaZaa [KaZaa] that comes to 44% of the traffic. Contrary to other P2P file-sharing applications like Gnutella that are widely discussed in literature the edonkey protocol is not well analysed yet. In this paper, we shed light on this protocol with a measurement study. In the next section we describe the protocol itself. In Section 3 our measurements are presented. They were made with the emule client software and with an open-source edonkey server we extended for the purpose of this paper. The results of the measurements are presented in Section 4 before we give a summary and draw the conclusions. 224

2 Protocol Description The edonkey network is a decentral hybrid peer-to-peer file-sharing network with client applications running on the end-system that are connected to a distributed network of dedicated servers. Contrary to the original Gnutella protocol it is not completely decentral as it uses servers; contrary to the original Napster protocol it does not use a single server (farm) which is a single point of failure, instead it uses servers that are run by power users and offers mechanisms for inter-server communication. Unlike super-peer protocols like KaZaa, or the modern Gnutella protocol the edonkey network has a dedicated client/server based structure. The servers are slightly similar to the KaZaa super-nodes, but they are a separate application and do not share any files, only manage the information distribution and work as several central dictionaries which hold the information about the shared files and their respective client locations. In the edonkey network the clients are the only nodes sharing data. Their files are indexed by the servers. If a client wants to download a file or a part of a file, it first has to connect via TCP to a server or send a short search request via UDP to one or more servers to get the necessary information about other clients offering that file. The edonkey network is using 16 byte MD4 hashes to (with very high probability) uniquely identify a file independent of its filename. The implication for searching is that two steps are necessary before a file can be downloaded in the edonkey network. First, a full text search is made at a server for the filename, it is answered with those file hashes that have a filename associated which matches the full text search. In a second step, the client requests the sources from the server for a certain file-hash. Finally, the client connects to some of these sources to download the file. A more detailed protocol analysis based on measurements with TCPDump can be found in our technical report [HB02]. 3 Experiment Description We ran two types of experiments to analyse the edonkey protocol and to collect traffic samples: In the first experiment we measured the traffic at a client connected to 768/128 kbps ADSL connection rsp. a 10 MBit at the TU Darmstadt. The most popular client of the network emule [emule] under Windows XP was used for this experiment. From the measurements we derive traffic characteristics like the protocol overhead, downloading behaviour and message size distributions. For the second experiment we modified an open-source edonkey server [Lus03] that was connected to the university network collecting application level information about the us- 225

age of the network, requested files, offered files and the overall size of the network 1. 4 Results TCP/UDP summary. In the experiments with the client the share of the TCP traffic of the overall network traffic was 94% (number of packets) rsp. 99.8% (payload size). Those values do not differ significantly between ADSL and broadband connections. Note that all TCP ACK packets with zero payload were counted, too. On the ADSL line the packet loss was with 5.5% about 2.25 times higher than on the broadband line (2%). The server-side measurements differ: TCP traffic only forms about 2,4% of the packets and 6% of the payload. The server is mainly busy with handling UDP requests from clients not directly connected: If a search of a client does not deliver enough results from the server it is connected to all known servers can be searched. This behaviour has implications on dial-up connections. If the dial-up IP address was assigned to someone running an edonkey server before, many clients all over the world still reference to that IP address as a server. Server entries are typically kept 1 day or longer in most client implementations. These clients will still send UDP based search queries to that IP address consuming bandwidth. This even lead to a permanent entry in the FAQ section of the computer magazine c t [ct04]. The TCP packet sizes use the whole size spectrum from header only to MTU size. Some characteristic peaks can be identified, e.g., for TCP messages of payload size 24, they are typically used for the frequent QUERY SOURCES messages. In most cases a TCP packet carries only a single protocol message. Protocol messages can be identified quite reliably as the payload starts with E3, for details see [HB02]. The UDP packets are much less variable in size, the size clearly indicating which kind of message is transported. Protocol Messages. Looking at the client s UDP traffic the most observed protocol message by far is QUERY SOURCES with about 65%. Looking at TCP messages they are distributed far more evenly, all common messages have a percentual share between one and ten percent, while - surprisingly - the TCP QUERY SOURCES message has only about 0.008%. On the server side the things the most seen messages are QUERY SOURCES ones, with 36% (TCP) and 95% (UDP) occurrence. Throughput. On the emule/adsl and emule/broadband clients we observed an average overall throughput of 30 respectively 45 kb/s while downloading, which decreased to the preset maximum upload bandwidth (10 kb/s in our case) after downloads were finished. UDP traffic is not very important in this scenario. 1 The server proved to be quite popular and due to the sheer amount of flows was ranked on the second place of the traffic statistic of the whole TU Darmstadt network for the duration of the experiment. 226

On the server, the UDP in/out ratio was 16/2 kb/s after roughly a day (so we have 18 kb/s overall throughput), while TCP is about 0.75/0.25 kb/s after the same time. In the beginning though TCP throughput is 1.5/1 kb/s and decreases significantly after the server gets more known and UDP throughput increases massively thus suppressing the TCP communication. Connection Statistics. We observed about 100 (emule broadband), 85 (emule ADSL) and 150 (server) connection requests per minute. The share of connections actually used for data exchange is 77%, 74% and 72%. The number of simultaneous connections is 30 to 50 for emule/broadband, 30 to 45 for emule/adsl and quite exactly 700 for the server (gathered by TCP trace file analysis). For getting the average bandwidth use per connection we looked at all connections carrying more than 0.5 MB incoming payload. Of those the absolute majority (several hundred) utilises roughly two to four kb/s, with some more (about 40) ranging from 5 to 10 kb/s. About ten more connections used bandwidths up to 55 KB/sec. Those values are connection independent. Our measurements show that an average of roughly four megabytes of data is transferred per (TCP) connection. The maximum size transferred we encountered was 150 MBytes. The average TCP connection time is 30 minutes, the average idle time 875 seconds (ADSL) and 2500 seconds (broadband). User Statistics. We concentrated on getting information about how much the users share in size and file numbers, and what they search most (indicated by search terms and identified files). Our research showed that the average edonkey user shares 57.8 (max. 470) files with an average size of 217 MB (max 1.2 TByte). Most searched-for keywords were media-related words like MP3, AVI and DVD and a very high share of clearly recognisable words from current blockbuster movie titles. Because all files are identified by their unique hash values we were able to identify the most wanted files by analysing QUERY SOURCES messages. The corresponding filenames were extracted from the PUBLISH FILES messages. This showed that the vast majority of the identified files were movies just played in the cinemas at the time of the experiments. Looking at these results we can state that edonkey is mainly a movie network. Geographical Analysis and Network Size. From all clients connected to the broadband test client the IP addresses were reverse resolved to lookup their top-level domains for a rough geographical overview..de dominates easily with 66.21%, followed by the dot-net -Domains (10%), then.fr (6%) and.at (1%), while clients from Taiwan and Guatemala were both seen more times than clients with.uk. While it is probable that clients from a certain region tend towards higher interconnectivity among themselves, e.g. because they exchange movies in their language this is also an indication that the edonkey network is more popular in Germany than in other countries. The latter is supported by 227

To estimate the network size we monitored the number of servers in a server-list provided by ocbmaurice [Ocb04]. This showed that the average size shrinked down from roughly 220 servers in beginning of 2002, over 100 servers in 2003 down to 47 servers in 2004. This indicates that the edonkey network as a whole is shrinking. One reason is probably that since the end of 2002 some people operating an edonkey server had trouble with the police 2. 5 Summary and Conclusion In this paper we presented results from two measurement experiments in the edonkey network. They show that the edonkey network is primarily used for downloading current movie blockbusters. Most of the traffic is caused by long-lived TCP connections. A P2P traffic model will therefore be fundamentally different to a web traffic traffic model. References [San03] [edonkey] [emule] [mldonkey] Sandvine Incorporated. Regional Characteristics of P2P - File Sharing as a Multi- Application, Multinational Phenomenon. White Paper, 2003. edonkey2000 Original Client Homepage. http://www.edonkey2000.com. emule Project Homepage. http://www.emule-project.net. mldonkey Homepage. http://savannah.nongnu.org/projects/mldonkey/. [KaZaa] Sharman Networks. The KaZaa application for the FastTrack network. Official Homepage. http://www.kazaa.com. [Kli02] [Lug04] Alexey Klimkin. pdonkey, an edonkey Protocol Library. http://pdonkey.sourceforge.net. Lugdunum. Building an Efficient edonkey Server on Linux/FreeBSD/Win32. http://lugdunum2k.free.fr/kiten.html 2004. [Lus03] Thomas Lussnig. cdonkey, an Open-Source edonkey Server with Overnet and emule Extensions. http://cdonkey.suche.org. 2003. [Mul02] Tim-Philip Muller. edonkey2000 tools. http://ed2k-tools.sourceforge.net. [Ocb04] Ocbmaurice. edonkey2000 Server-lists and Statistics. http://ocbmaurice.dyndns.org 2004. [HB02] [ct04] Oliver Heckmann, Axel Bock. The edonkey2000 protocol. KOM technical report. http://www.kom.e-technik.tu-darmstadt.de/publications/abstracts/hb02-1.html c t - Magazin für computer technik. FAQ Section Entry - Nervige Port Scans ( Annoying Port Scans ). Heise Verlag. http://www.heise.de/ct/faq/qna/nervigeport-scans.shtml 228