1 Are Distributed Peer-to-Peer Overlay Networks Worth The Effort? Jacob Thebault-Spieker Computer Science Department University of Minnesota, Morris 600 E. 4th St. Morris, MN ABSTRACT Distributed peer-to-peer overlay networks are not dependent on any one point of failure, can be overlayed on top of a traditional network (allowing a subset of machines to form ir own network within larger network), and do not require ir own specific infrastructure. There are a number of distinct types of distributed peer-to-peer overlay networks, including structured, unstructured, and hybrid overlay networks. We will discuss similarities and differences between se types of overlay networks, as well as discuss benefits and downsides to overlay networks as a whole. Furr, we will show how distributed peer-to-peer overlay networks can be applied, given benefits y provide. Categories and Subject Descriptors C.2.1 [Computer-Communication Networks]: Network Architecture and Design General Terms overlay networks, DHT, peer-to-peer Keywords Peer-to-peer, overlay networks, distributed systems, structured, unstructured, overlay 1. INTRODUCTION Distributed peer-to-peer (dp2p) overlay networks can be used for many different things, but ultimately y provide a way of interacting with certain machines on Internet, while excluding ors. Often times, se networks are specifically intended to be exclusive, although this is not a necessary factor. These networks are most commonly used for file sharing, although can be used for anything a normal networked computer would be used for. The basic use for dp2p overlay networks is a simple, distributed key-value store (that is, a peer looks up a key, and receives value attached to key). In this usage case, This work is licensed under Creative Commons Attribution- NonCommercial-NoDerivs 3.0 United States License. To view a copy of this license, visit or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Computer Science Department Senior Seminar Conference 10 Morris, Minnesota USA. value may be a simple message to pass to a destination node, or it may return IP address and file system path for a requested file. In a simple message passing context, message can be considered equivalent to a standard packet in traditional networking. Therefore, dp2p overlay network may be used for anything that a traditional network would be used for. Commonly, dp2p systems are used for filesharing networks (as we have indicated elsewhere), but may be used for wide-area distributed file stores like PAST , or OceanStore . However, because of generic nature of key-value lookup, file storage and sharing is not only use for dp2p networks. Millar et al. demonstrate a peer-to-peer overlay network to allow quick deployment of a mobile ad-hoc emergency response network. In particular, algorithm, referred to as Common Group Aliasing (or CGA) is used in conjunction with Bamboo, a structured dp2p overlay network based on Pastry (see Section 6.1 for discussion of Pastry). Millar et al. developed this system in an attempt to deal with extreme emergency cases ( authors specifically reference EU-FP7 PEACE project as a source for se types of emergency situations). Traditionally, when emergency responders needed use of a network, y would be required to provide ir own infrastructure at site. Overlay networks, in particular dp2p overlay networks are uniquely suited to this type of need, given that y inherently are not dependent on any infrastructure outside of requirements of individual peers, and allow more time to be spent on responding to emergency. Proper discussion of distributed peer-to-peer overlay networks requires specific distinctions to be made between distributed and traditional networks, peer-to-peer and traditional client-server systems, and overlay networks and physical, individual networks. In order to do this, we will first discuss differences between a physical network and an overlay network in Section 2. From re, in Section 3 we will define what a distributed peer-to-peer overlay network is and how it differs from or types of overlay networks. We will n discuss three primary types of distributed peer-to-peer overlay networks (unstructured, hybrid, and structured) and provide examples of se types of networks. This will provide enough context to be able to discuss specific types of dp2p overlay networks, and some of benefits and downsides of dp2p overlay networks when compared to more traditional networking architectures. This will be done in Sections 4, 5, and 6 respectively. We will n finish up in Section 7 with an analysis of research being done on distributed peer-to-peer overlay networks, and potential future research areas.
2 2. BACKGROUND In order to understand overlay networks, we must first make distinction between network and overlay. The physical network consists of network as it exists in a physical sense, including things such as: large backbones of fiber optics, routers, servers, and home user connections. The physical network, on a global scale, makes up Internet, and allows computers from anywhere in world to interact with one anor. Prior to Internet, computers could be networked toger, but those networks were self-contained, and it was impossible for those computers to interact with machines in a separate network (at a different university, for instance). Overlay networks are used as solutions for anything from emergency response systems to applications for smartphones. Ultimately, an overlay network can be used for almost any network requirement, unless solution needs a physical disconnect between it s network and that of rest of world. 2.1 Definition of an Overlay Network An overlay network can be thought of as result of applying a filter to entire Internet. That is, an overlay network is a group of machines connected to Internet which all meet criteria of filter, and form a network with or Internet-connected machines that also meet criteria of filter. This network would be an overlay network, because it is laid over Internet, creating a network which is a subset of global Internet. An overlay network takes advantage of fact that infrastructure needed for participating machines to interact is already in place. For instance, group of people who frequent a website at regular intervals (or, more accurately machines y use to access website) and web server(s) running that website form an overlay network. Unfortunately, this definition includes every clientserver system (web server and browser, server and client, etc.). Additionally, defining an overlay network this way means that a subset of Internet-connected machines interacting with subset of machines smaller than size of Internet is technically interacting via an overlay network. Users may be participating (by doing something as simple as surfing many websites at same time) in multiple overlay networks at once. In this paper, we will focus on a specific type of overlay network: peer-to-peer (P2P), distributed overlay networks. 2.2 What are Distributed Peer-to-Peer Overlay Networks? In distinguishing distributed, peer-to-peer overlay networks from very broad pool of overlay networks, re are two crucial factors. First, in order for a system to be distributed, it cannot have a single point of failure. In a traditional client-server model (web server and browser, for instance), if server fails, client is unable to access information provided by server. Second, within a peer-to-peer system, re is no central node or set of nodes within network, and each node is given designation peer. There are no servers, and re are no clients in a peer-to-peer system, only peers interacting with or peers. A distributed, peer-to-peer overlay network will be referred to hereafter as a dp2p overlay network. A well known example of a dp2p overlay network is file-sharing service BitTorrent.  For a user to be able to gain access to BitTorrent network, user must install BitTorrent specific software on ir computer. This software allows user to join BitTorrent network, and provids m means to understand and use BitTorrent protocol. In order to share files using BitTorrent, one peer must create a torrent file, which provides information about which files are being shared, as well as information about tracker (a server which coordinates distribution of data between peers). BitTorrent operates across Internet, and individual peers within BitTorrent only interact with or BitTorrent peers (within BitTorrent network). The peers within BitTorrent network are not limited to participating only within BitTorrent network, but may also browse web, check ir , etc. However, within context of BitTorrent, machine being used to share files is a single peer within BitTorrent network, connecting to or peers and sharing data with se peers and only se peers. A person visiting GMail website will not improperly receive BitTorrent data unless y are also a part of BitTorrent network. When a user (or peer) requests data from BitTorrent network, software keeps track of or peers within BitTorrent network, and downloads data it requires from peers that have data being requested. BitTorrent downloads sections of total requested data (referred to as data chunks), potentially from entirely different peers, re-assembles this data, and provides user with full requested file upon completion. The primary benefit that BitTorrent provides over a standard client-server model of sharing files is that many data chunks can be downloaded at once, potentially providing better download speeds if data chunks are properly distributed among peers. Software is common method of joining and interacting with dp2p overlay network, as it manages technical requirements of participating within a dp2p network, and allows user to focus on purpose of network. Within BitTorrent, this means that user doesn t need to know which peers have data required, how to request that data from necessary peers, or how to re-assemble data chunks properly. 2.3 Why Distributed Peer-to-Peer Overlay Networks? Distributed, peer-to-peer overlay networks have one primary benefit over centralized, client-server model overlay networks: y are not dependent on a specific piece of server infrastructure being in place. If server providing requested data is unavailable, user becomes unable to obtain that data until a server hosting data becomes available. In a distributed, peer-to-peer overlay network, peers may join and leave as y wish, and network will scale with m. There is no requirement for more infrastructure if a sudden influx of peers wish to join, and re is no wasted infrastructure if number of peers suddenly drops. There is, however, a reason that majority of Internet fits traditional client-server model of networking: simplicity. Running a server, and allowing users to choose to interact with it or not; is a much simpler concept than coordinating all of peers on a network, gracefully managing large changes in numbers of peers on network, and structuring network in such a way that communication between peers is efficient. In context of file sharing, a traditional client-server model would need to act as a central location responding to data requests, a common communication point, and a single point of interaction in order to request data. This model is much simpler than having no central data location (which n leads to issues of
3 data consistency/concurrency), choosing which of peers to download data from (when every peer on network is a potential source), and knowing enough about structure of network to request this data efficiently (finding shortest path from peer A to peer B). These three differences motivate some of primary research topics within dp2p overlay networks, and will be referred to as networking problems for overlay networks. While se are areas of ongoing reserach, se three attributes of dp2p overlay networks also provide areas of contention within research community, and provide opportunities for differentiation between types of dp2p overlay networks. The most common differences between dp2p overlay networks focus on issues of data consistency or network architecture. Data consistency, concurrency, and maintaining proper state among peers is a problem that is inherent in distributed systems as a whole, whereas network structure and architecture is more specific to P2P overlay networks in general. As such, research community seems to have started differentiating between types of dp2p overlay networks based primarily on mechanism for how peers choose which or peers to interact with, and how those interactions are optimized for efficiency. There are also a number of research questions in dp2p overlay networks that address security issues like auntication, encryption, and security among peers on network. Given decentralized nature of dp2p overlay networks, tasks such as aunticating identity of a user and using secure encryption techniques between peers become more difficult. The standard client-server model of auntication assumes a centralized understanding of users (and ir respective access rights/settings within network), but in removing this centralized aspect of network, problem of requesting, verifying, and managing identity of a user becomes much more difficult. One way of dealing with this issue is discussed in , wherein re is a distributed set of auntication servers, and one delegate. When a client needs to aunticate, client interacts with delegate, which n interacts within quorum of servers that are necessary to aunticate client. Similarly, task of encrypting a connection between peers becomes much more difficult when re is no central authority. This task entails aunticating all participating peers to one anor, defining a common encryption scheme, and performing encryption in a way that is secure. Within more traditional networking architectures, a central authority that understands which peers can participate responds to requests for auntication (this is done in a pair-wise way, both client and server aunticate to one anor, in order to verify identity of both client and server) and defines encryption scheme. In contrast, all peers in a dp2p network need to agree upon a specific encryption technique to use. These will be referred to as security problems within overlay networks. Much of current research area of dp2p overlay networks is done on this set of security problems. 3. TYPES OF OVERLAY NETWOKS There are three primary types of dp2p overlay networks: structured dp2p overlay networks, unstructured dp2p overlay networks, and hybrid dp2p overlay networks. Structured dp2p overlay networks are characterized by fact that network as a whole has a mechanism by which network is structured. These will have a very specific architecture, which is defined by system. Conversely, unstructured dp2p overlay networks do not have such a mechanism, and instead take a more bottom-up approach, modifying architecture of network as required by scale of network. These types of overlay networks have no pre-defined structure. In area between se two extremes, re is a third type of network, hybrid dp2p overlay network. The hybrid approach combines both approaches, and may have some over-arching structure, but also has capacity to shift structure should situation require it. Hybrid dp2p overlay networks are also referred to as centralized, peer-to-peer networks, as most common usage hybrid networks have a central server that aids in structuring network . All dp2p overlay networks are a group of nodes, and se nodes can take on role of distributor, router, or destination node. The distributor node is a node or machine that is hosting some object, a router is a node that passes along network traffic as it reaches node, and destination node is ultimate endpoint for network traffic. Nodes may take on one or more of se roles. That is, a node can be both a router and a distributor if y data y are hosting is not data requested, but if node is still participating in routing data to or nodes in network. Additionally, all dp2p overlay networks must define specific behavior for how a node may join network, how network will behave when a node unexpectedly leaves a network, and what needs to occur if traffic does not successfully reach destination node. In following sections, we will discuss specifics of how each type of dp2p network handles se three behaviors. 4. UNSTRUCTURED DISTRIBUTED PEER- TO-PEER OVERLAY NETWORKS The first type of dp2p overlay networks we will discuss are unstructured dp2p overlay networks. These do not have an overarching scheme for how peers fine one anor, nor for how data is routed between peers. Instead, when a peer wishes to interact with anor peer, y must first find a peer to interact with. One well known example of an unstructured dp2p overlay network was original Kazaa filesharing network. If a peer within Kazaa network wanted a file, peer first had to find a second peer that was hosting file. The peer could n contact second peer, and begin download of requested data. When scaled, popular files became very accessible because y were being distributed so widely throughout network. However, items that were less popular eir were unable to be found, or became very difficult to come across. For this reason, as unstructured dp2p overlay networks begin to get large, ir usefulness decreases. In some cases, a peer searching for something would be unsuccessful, because set of peers hosting popular items significantly overshadowed set of peers hosting less popular items. This type of dp2p overlay network, because of it s unstructured nature, cannot guarantee that traffic will reach it s destination. Nor can unstructured dp2p networks guarantee that path traffic takes will be optimal. Furr, unstructured dp2p overlay networks can incur high bandwidth requirements, because of ir need to broadcast data . Furr, due to lack of structure, re is no need to define special join or leave behavior, as nodes can join and leave as y wish without affecting behavior of rest of network. While this allows levels of flexibility between on network, downsides of unstructured dp2p overlay networks make or options more desirable.
4 5. HYBRID DISTRIBUTED PEER-TO-PEER OVERLAY NETWORKS As an attempt to deal with this issue, concept of a hybrid dp2p overlay network came into fruition. Given that unstructured dp2p overlay networks may cause searches to fail in some cases, a simple solution is to provide a centralized mechanism to help a peer searching for something. Within Kazaa example, a centralized lookup index was implemented to allow peers to first ask a server which or peers may be hosting requested data. The server would n respond with a list of peers known to have hosted this data at one point. Upon receiving this list, a peer would n request data from peers on list, allowing peer to successfully retrieve data requested, assuming state of hosting peers had not changed since central index was updated. This method of adding a degree of centralization to an unstructured dp2p overlay network does come with potential downsides, given that, as previously discussed, centralization means a central point of failure. The benefit, however, is that centralization in this method does not make communication between peers impossible if central server were to be unavailable. The central server provides a way to augment capabilities of an unstructured dp2p overlay network, without requiring that network be dependent on server. Due to lack of overall structure, re is no need to define special join or leave behavior, as nodes can join and leave as y wish without affecting rest of network. There are slightly more complex hybrid architectures, that attempt to deal with point of failure problem, while also spreading load across multiple centralized servers.  The two server architectures that are most similar to two extremes (unstructured dp2p overlay networks in Section 4, and structured dp2p overlay networks as described in Section 6) are referred to as full replication architecture and hash architecture. 5.1 Full Replication Architecture The full replication architecture replicates data that would, in simple single-server case, be stored on server to all of servers being used for hybrid network. This architecture allows two primary benefits. First, when a request is made, server that client interacts with will be able to respond to all of requests client is making. This allows client to only interact with one server, and helps cut down on network overhead. Second, because each server maintains a complete copy of lookup index, if any one server goes down, ors may be available to service requests.  However, this structure does mean that concept of world that a single server would be able to maintain needs to be replicated across all servers within hybrid network. One area where this may be an issue is tracking client connections. Consider two servers, Server A and Server B, both are a part of full replication architecture in a hybrid network. Both Server A and Server B host same data, allowing users to interact with one or or, but not necessarily both. If Server A has a sudden influx of users attempting to connect, user connection statistics must be replicated to Server B. This can lead to significant amounts of network overhead between Server A and Server B, and can Server B to need to process significant amounts of incoming data in order to maintain a common state with Server A.  5.2 Hash Architechture The hash architecture is similar to full replication architecture (see Section 5.1) but instead of each server maintaining a fully copy of lookup index, re are instead indicators within lookup index that map to server hosting required information. A client may make a request, which gets routed to Server A. Upon receiving request, Server A will check it s lookup index, and may find that requested data is hosted on a different server (Server B). Server A will n request information from Server B, collect all required information, and return all of this to client.  This architecture also handles problem of maintaining state across servers (mentioned previously in Section 5.1) in a tiered way. That is, as with looking up data, only a subset of servers need to be interacted with in order to notify amounts of users requesting data. As with full replication architecture, this can cause a fair amount of bandwidth usage between servers, but less so than full replication architecture. There are also ways of making this exchange more efficient.  6. STRUCTURED DISTRIBUTED PEER-TO- PEER OVERLAY NETWORKS In order for nodes to minimize overhead and properly route traffic, structured dp2p overlay networks define a structure or algorithm for how traffic is routed throughout network. Within a traditional network, routing is done by specialized hardware, and client is told where to direct outgoing traffic by a specific network service known as DHCP, which is run by ISP (Internet Service Provider) providing client with a connection to network. Within this section we will discuss 3 primary structured dp2p overlay networks, each using a slightly different algorithm for maintaining structure within network, adding and removing nodes, and dealing with failed attempts to route traffic to destination node. All 3 of se use a variation of a distributed key-value hash lookup, known as a Distributed Hash Table (DHT) . A simple example of a key-value lookup is a dictionary, where word being looked up is key, and value is definition. Structured dp2p overlay networks use a distributed keyvalue lookup to obtain information on or nodes, and key is often a hashed name for node. DHTs are used to help define structure of overlay network, and individual DHT structures each require specific join/leave behaviors. 6.1 Pastry Pastry  is a structured dp2p overlay network developed by Microsoft Research in Cambridge, UK, that was intended as a backend for PAST (a distributed peer-to-peer file-store) . Pastry randomly assigns each node within network a 128-bit nodeid, key in DHT, while value is return response (e.g. URL to a file being hosted, specific data requested or an XML response) when a request is received by a given node node. Each node in Pastry network has what is referred to as a state table. This state table is comprised of three sections, leaf set, neighborhood set, and routing table. The leaf set and neighborhood set may contain similar elements. The leaf set is a binary search tree containing L nodes with most similar prefixes and is used for routing. In contrast neighborhood set is unordered, is not used
5 for routing, and instead maintains a list of M nodes that are F R nearest to itself (based on physical network proximity). E C The leaf set will tend to have unique nodeids, but D scheme for referring to neighbor nodes is defined by system being built on top of Pastry. Within PAST, this is A B SHA-1  hashing algorithm. This allows original node Exiting Node to have a diverse set of neighbor node options when deciding how to route data received. When a node needs to send data across network, it requires two pieces of information to do so: first, it needs a numeric key ( nodeid of S New Location Path Old Location Path destination node) to be able to make a decision Figure about5: how Updating location pointers for exiting nodes: Node is about to leave Tapestry network, a to route information. Second, node needs informs to know object server Figure. republishes 1: When node affected B object attempts with new to epoch, leave and in doing network, old location it notifies pointers. destination node S, which steps back so uses last locat what information to pass along, referred to as hop message. address to delete When a node in Pastry network receives data, it will through network to last known location for pass key and message along to neighbor with rely on soft-state toaremove givenit piece over time. of While information, we expect node wide-area R, and network updates to be dynamic, we exp numerically closest nodeid to key. When a new node A attempts to join Pastry only network, a small portion ofneighbor networklists to beaccordingly. entering/exitingtaken overlay from simultaneously. . For this reas it is assumed that new node will already havetapestry knowledge is currently unsuitable for networks that are constantly changing, such as sensor networks. of a nearby node B (one that is already a part of Pastry network). Node A will n request that node leaf set or neighborhood set, but instead maintains a list of 4.2 B Soft-state begin vs. backpointers Explicit Republishing ( set of nodes for which current node routing a special join message, as well as key for is listed as a neighbor). Tapestry also assumes that node A. Node B will n route message to node C While soft-stateset approach of nodes of within republishing network at regular are intervals evenlyisdistributed, an excellent and simplifying solution within its leaf set that has has nodeid most similar to keeping location pointers recommends up-to-date, using it implicitly SHA-1 highlights  hashing tradeoff algorithm betweenwhen bandwidth overhead node A. Node C will continue this process until join message reaches node (Z) in network that republish has operationsassigning and level of nodeids consistency to ensure of location this. pointers. A similar tradeoff exists for use of so most similar nodeid to node A. The destination node state to Z maintain and up-to-date Similarly nodeto information. Pastry, Tapestry This section also routes discusses data se through tradeoffs it s and our mechanis all nodes along path to Z will n send ir for supporting routing mobile peers, objects however, using explicit whererepublishing Pastry doesand routing delete based operations. on numeric proximity to target key, Tapestry routes based on which tables to node A, which is attempting to join For network. example, consider of a large neighbor network nodes of one has million a prefix nodes, that storing mostone closely trillion matches objects of roughly 4KB This gives node A a base routing table to work with, size (one which million objects that per of node). nodeid Assuming of 160destination bit namespaces node. for objects, 120 bits for nodes, both or can n be modified as needed. Node A copies node B s nized as hexadecimal Tapestry digits, an object and Pastry republish use operation similarresults techniques in onefor message adding (a or ObjectID, NodeID neighbor list (and n derives leaf set from re), as tuple) for each logical removing hop en route nodes to from root node, network. for a totalwhen of 40 messages a node attempts of 35 Bytes (160bits+120b node B is assumed to be physically near to node A. Given peer-to-peer nature of system, it each. is common This works out toper join machine Tapestry to be network, it also finds a node on for nodes to leave network without warning. When this. If we set Tapestry republish network, interval which to one day, authors this amount referof tobandwidth, as gateway when amortized, is eq occurs, it will not be noticed until node N makes an to 129kb/s. attempton modern node. high-speed Unlike in ernet Pastry, networks, however, we proximity can expect ahas resulting no bearing location timeout period to route a message through node Z, which has already at least two left. days. on which node is chosen to be gateway node in Tapestry. When a new node N attempts to join network, it will When node N attempts to route information through Clearly, node bandwidth already overhead havesignificantly a nodeid. limits It will n usefulness attempt of our to route soft-state a message we modify to it s our nodeid approach through to one including gateway proactive node, explicit which updates allows in addition to so approach to state ma Z, and node Z does not respond, n node N will contact tenance. As a result, next node (node M) in row containing Z in N s routing table and request nodeid for node that corresponds state republishing. We node take Nthis to gain approach some to insight state maintenance into where of both it ought nodesto andbe objects. logically object located locationon mappings network. to a 3-tuple Node of: N ObjectID, will nserverid, request LastHopID. For ea To support our to node Z (in node N s routing table). If nodegorithms, M is also we modify unavailable, node N will traverse rest of hop row, on aand location path neighbor to some map rootfrom node, each last server node keeps (node L) preceding along nodeid routeas to LastHopID. A attempt same process. In event that none ditionally, of we introduce N s nodeid. notion ofthis epochallows numbers node as anprimitive to haveversioning an initial mechanism neighborfor location poin nodes in row are available, node N will moveupdates. on to In rest map. of this Node section, N we n describe compares how algorithms distance leveraging between se itself mechanisms it s can explic neighbors. Node N chooses neighbor that is closest, and next row until process can be completed. manage While node it s and object state. modifies it s neighbor map to reflect se changes. This possible for this process to fail to find an active node, this process continues until re are no better neighbor nodes is statistically unlikely. The neighbor list also needs to be updated when 4.2.1a node Explicitly Handling availablenode thanstate ones node N is already aware of. After this process, backpointers from node L are traversed, we expect and those nodes nodes to disappear are notified from that Tapestry node duen tohas bothbeen failures and intentio leaves network. This is done by polling all nodes listed in a given node s (node A) neighbor list at a pre-determined In a dynamic network, time interval. If a node does not respond, it will be disconnections. removed In inserted. eir case, These routing nodesinfrastructure n update can ir quickly owndetect neighbor and promote maps, secondary rou fully integrating node N into network. from node A s neighbor list, and node A will request while location pointers cannot automatically recover. Removing a node from Tapestry network is much less neighbor lists from all or nodes in it s neighbor list, calculate approximate distance of new potential neigh- involved. Nodes periodically send messages to all of ir neighbor nodes, and expect12 a response back (in order to bors and add nearest neighbors until its neighborhood set is full. 6.2 Tapestry In a similar way to Pastry, Tapestry  is also intended to be a lower-level system that or systems are built on top of. Tapestry draws it s design from a method introduced by Plaxton et al.,  (referred to as Plaxton method here). Within Tapestry, each node is assigned a 160-bit nodeid, and similarly to Pastry, maintains a list of nearest nodes to it (also referred to as neighbor nodes). However, in contrast to Pastry, Tapestry does not maintain a ensure that all of a given node s neighbors are still available). When a node leaves, it will eir be noticed through this process, or it can notify all of it s neighbor nodes using it s backpointers (as demonstrated in Figure 1). 6.3 Chord While Pastry and Tapestry both structure mselves using a series of interconnected nodes (a series of connected sub-graphs), Chord instead structures its network as a ring. Where Pastry and Tapestry nodes were given an nodeid, Chord gives it s nodes an identifier, which is derived by hashing IP address of machine. The nodes are n ar-
6 ibuted to store ir data me can onsible of pplican load yword om chines ng. In such as chines e for a a filentictories operalement g, and hord to talk to ck. f keys, failure ribes a current se pron of a It uses perties. nodes probnly an tion d load successor(1) = 1 successor(2) = successor(6) = Figure 2: An identifier circle consisting of three nodes 0, 1, and Figure 3. In2: thisan example, identifier key 1circle is located being at node shown 1, key for 2 at three node 3, individual and key 6 atnodes, 0. where successor(x) function returns successor for node X. Taken from . Chord improves scalability of consistent hashing by avoiding requirement that every node know about every or node. ranged numerically in what authors refer to as an identifier circle, by using node identifier modulo 2 n where n A is Chord number node needs of nodes only in a small network amount at of that routing point in information Then, about a key or is nodes. assigned Because to this node information where isidentifier distributed, ei-a node rresolves equals or is hash function next subsequent by communicating identifier withof a few key. or time. nodes. This can In an be seen -node in in network, Figureeach 2. node maintains information onlythis about structure for or node nodes, organization and a lookup means requires that nodes messages. can join or leave network with minimal disruption, as a node Chord only must has update an understanding routing information of itself, when and it s a node successor. joins or leaves If node A network; attempts a join to join or leave requires network, keys previously messages. assigned to node that will become node A s successor get 4.2 assigned Consistent to node A. Likewise, Hashingif node A leaves network, keys that were assigned to node A get assigned to node A s successor. The consistent hash function assigns each node and key an -bit identifier using a base hash function such as SHA-1 . A node s identifier is chosen by hashing node s IP address, while a key 7. CURRENT RESEARCH identifier is produced by hashing key. We will use term key Unfortunately, to refer to both security original key problems and its image mentioned under in Sec- tion 2.3 hash function, as mean meaning that while will dp2p be clear networks from context. currently Similarly, have capacity to operate in same was a a traditional network term node will refer to both node and its identifier under would, traditional networks have advantage of a centralized hash function. security mechanism. The identifier There length is a must significant be large amount enough of to make research probability being doneofinto two auntication nodes or keys hashing schemes to [2, same 17], how identifier peers negligible. know wher or not y can trust information y Consistent are receiving hashing[10, assigns 14, keys 5], and to nodes orassecurity follows. and Identifiers trust are issues ordered within in an a identifier distributed circle system. modulo. Key is assigned to first node whose identifier is equal to or follows ( identifier of) 8. inconclusion identifier space. This node is called successor node of key In this, denoted paper, by wesuccessor have discussed. If identifiers three major are represented types of dis- circle ofpeer-to-peer numbers fromoverlay to networks, nin detail. We have is as atributed first described node clockwise major differences from. between three types of dp2p overlay Figure networks, 2 shows an and identifier we have circledescribed with specific. The circle systems has three within nodes: each0, category. 1, and 3. The Much successor of of research identifier being 1 is node done1, inso key dp2p 1 would overlay be located networks at node is happening 1. Similarly, in structured key 2 wouldp2p be located sys- node(see 3, and Section key 66), at node and 0. we believe se systems are more atems practical Consistent than hashing eir isunstructured designed to letdp2p nodessystems enter and(see leave Section 4) or with hybrid minimal P2Pdisruption. systems (see To maintain Section 5). consistent The unstruc- hash- network ing tured mapping category whenof adp2p node does joinsnot network, guarantee certain that keys dataprevi- ously will be accessible, assigned to and s successor hybrid now dp2p become systems assigned have individual to. When points of failure, which seems to go against overall goal node leaves network, all of its assigned keys are reassigned of a P2P system. to s successor. No or changes in assignment of keys to nodes need occur. In example above, if a node were to join with identifier 9. 7, REFERENCES it would capture key with identifier 6 from node  P. Druschel and A. Rowstron. PAST: a large-scale, with identifier 0. persistent peer-to-peer storage utility. In Proc. HotOS The following results are proven in papers that introduced VIII, pages 75 80, consistent hashing [11, 13]:  W. K. Josephson, E. G. Sirer, and F. B. Schneider. Peer-to-peer auntication with a distributed single THEOREM sign-on service. 1. For any Peer-to-Peer set of nodes Systems and III, keys, pageswith high probability: , Each node is responsible for at most keys  J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Wearspoon, C. Wells, et al. Oceanstore: An architecture for global-scale persistent storage. ACM SIGARCH Computer Architecture News, 28(5): ,  Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search and replication in unstructured peer-to-peer networks. In ICS 02: Proceedings of 16th international conference on Supercomputing, pages 84 95, New York, NY, USA, ACM.  S. Marti and H. Garcia-Molina. Taxonomy of trust: Categorizing P2P reputation systems. Computer Networks, 50(4): ,  G. P. Millar, T. A. Ramrekha, and C. Politis. A peer-to-peer overlay approach for emergency mobile ad hoc network based multimedia communications. In Proceedings of 5th International ICST Mobile Multimedia Communications Conference, pages 1 5,  C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. Theory of Computing Systems, 32(3): ,  A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware 2001: IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, Germany, November 12-16, Proceedings, page 329,  I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of 2001 conference on Applications, technologies, architectures, and protocols for computer communications, pages ,  K. Watanabe, Y. Nakajima, T. Enokido, and M. Takizawa. Ranking factors in peer-to-peer overlay networks. ACM Trans. Auton. Adapt. Syst., 2, September  Wikipedia. BitTorrent (protocol) - Wikipedia, The Free Encyclopedia [Online; accessed 28-October-2010].  Wikipedia. Distributed hash table wikipedia, free encyclopedia, [Online; accessed 28-October-2010].  Wikipedia. SHA-1 - Wikipedia, The Free Encyclopedia [Online; accessed 28-October-2010].  L. Xiong and L. Liu. A reputation-based trust model for peer-to-peer e-commerce communities. In E-Commerce, CEC IEEE International Conference on, pages IEEE,  B. Yang, H. Garcia-Molina, et al. Comparing hybrid peer-to-peer systems. In Proceedings of International Conference on Very Large Data Bases, pages ,  B. Y. Zhao, J. Kubiatowicz, and A. D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Computer, 74:11 20,  L. Zhou, F. B. Schneider, and R. Van Renesse. Coca: A secure distributed online certification authority. ACM Trans. Comput. Syst., 20: , November